Intro

When I discovered CKAN over a year ago it was version 1.7 or 1.8, and the implementation I studied was using Elasticsearch for indexing… I was really confused by the complexity of the setup, and it took me a few attempts to get my “play box” right.

By the time I started working on a project based on CKAN 2.2, the documentation had improved tremendously, and projects such as Data.gov.uk To Go as well as the CKAN Packaging Scripts have really improved the way you can package & deploy the many components of a typical CKAN portal.

A few months ago I gave a short talk on CKAN at a JBug Scotland event and I was really amazed by Ian Lawson’s presentation on OpenShift. A few months after that I discovered Docker, and also found out that OpenShift was going to support Docker.

I thought this was really good news, because it meant that if I “containerised” my CKAN install I could use the same containers in every environment I’m working on: Dev, Test, Staging, Prod, Cloud!… and I wasn’t the only one thinking that way; in May Nick Stenning made a great Pull Request with the first containers for CKAN.

There were a few issues though, such as the absence of the datastore, the inability to set up ckanext-spatial because PostGIS was not installed, the fact that editing the config was complex and not very flexible, and the three containers using different bases, which meant that you were pulling three base images instead of caching one.

Standing on the shoulders of giants

I picked up from there and refactored the containers one by one, starting with Postgres, then Solr & CKAN. When that was done I created another Dockerfile that extends the main CKAN Dockerfile to allow custom configurations based on the core project.

  • All containers extend the same base image, phusion/baseimage (updated to 0.9.13), which means you only pull & cache it once; the first few steps are also identical so they rely on the Docker cache as much as possible

  • The Postgres container installs PostGIS, configures the database, the datastore & PostGIS on the CKAN database. The default names & passwords can easily be overridden with environment variables (see the example after this list).

  • The Solr container has been updated to 4.10.1

  • The CKAN Core container has been updated to configure the datapusher, has all the dependencies required to use the spatial extension, & also uses supervisor to manage tasks.

  • The custom config shows how to extend the Core container to enable common extensions such as ckanext-viewhelpers, ckanext-archiver, ckanext-spatial, ckanext-harvest, and how you can extract services such as redis from the CKAN container and let that service be handled by a separate container.
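For instance, here is a minimal sketch of overriding the Postgres defaults at run time, reusing the CKAN_PASS & DATASTORE_PASS variables that also appear in the fig.yml further down (the values themselves are purely illustrative):

docker run -d --name postgres -h postgres.docker.local \
    -e CKAN_PASS=my_ckan_pass \
    -e DATASTORE_PASS=my_datastore_pass \
    clementmouchet/postgres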

Docker

Building containers is easy, caching is powerful. But you need to cheat sometimes, especially with the ADD command. In the Solr container for instance, I quickly realised that the following command:

	ADD https://archive.apache.org/dist/lucene/solr/$SOLR_VERSION/$SOLR.tgz /opt/

is not cached, whereas

	RUN wget --progress=bar:force https://archive.apache.org/dist/lucene/solr/$SOLR_VERSION/$SOLR.tgz

is.

And since the Solr tar is over 100Mb, installing wget & cheating is really worth it! In some cases like that RUN is more appropriate than ADD, but it really depends on the use case.
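To avoid keeping the tarball in the image, the cached download can also be combined with the extraction & clean-up in a single RUN layer. This is just a rough sketch reusing the $SOLR & $SOLR_VERSION variables above; the /opt install path is an assumption:

	RUN wget --progress=bar:force https://archive.apache.org/dist/lucene/solr/$SOLR_VERSION/$SOLR.tgz && \
	    tar xzf $SOLR.tgz -C /opt && \
	    rm $SOLR.tgz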

Managing containers can be tedious, especially when you’re developing them. There are a lot of tools to help. I’ve not tried Shipyard yet but I will soon. In the meantime docker-cleanup is pretty useful, and the usual docker stop $(docker ps -aq) & docker rm $(docker ps -aq) work great to clean up any running containers.
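Spelled out, that clean-up looks like this; the last line is just an optional extra of mine that removes dangling images left behind by repeated builds:

docker stop $(docker ps -aq)
docker rm $(docker ps -aq)
docker rmi $(docker images -qf dangling=true)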

But when I’m working with a custom Docker container I have to type (or copy & paste) 4 commands to build the containers, 4 commands to run them… and just as many to stop them:

#  build the containers locally
docker build --tag="clementmouchet/postgres" .
docker build --tag="clementmouchet/solr" .
docker build --tag="clementmouchet/ckan" .

# build your custom container
docker build --tag="clementmouchet/ckan_custom" .

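# run the service containers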
docker run -d --name postgres -h postgres.docker.local clementmouchet/postgres
docker run -d --name solr -h solr.docker.local clementmouchet/solr
docker run -d --name redis -h redis.docker.local redis

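# run the custom CKAN container & link it to the services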
docker run \
    -d \
    --name ckan \
    -h ckan.docker.local \
    -p 80:80 \
    -p 8800:8800 \
    --link postgres:postgres \
    --link solr:solr \
    --link redis:redis \
    clementmouchet/ckan_custom

This is a bit tedious, and that’s why I looked at Fig.

Fig

Fig allows you to define all the above in a single YAML file to do the following:

  • start, stop and rebuild services
  • view the status of running services
  • tail running services’ log output
  • run a one-off command on a service

So the 8+ commands above are reduced to one: fig up, thanks to the definition below:

postgres:
  build: ../postgresql
  hostname: postgres
  domainname: docker.local
  environment:
    - CKAN_PASS=ckan_pass
    - DATASTORE_PASS=datastore_pass
solr:
  build: ../../../ckan/config/solr
  hostname: solr
  domainname: docker.local
redis:
  image: redis:2.8
  hostname: redis
  domainname: docker.local
ckan:
  build: .
  hostname: ckan
  domainname: docker.local
  ports:
    - "80:80"
    - "8800:8800"
  links:
    - postgres:postgres
    - solr:solr
    - redis:redis

And Fig can simplify the rest of the Docker commands you want to run, to view logs etc.
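For example, a typical day-to-day session with this fig.yml looks roughly like the following (run from the directory containing the file):

fig up -d          # build (if needed) & start all the containers in the background
fig ps             # view the status of running services
fig logs ckan      # tail the log output of the ckan service
fig run ckan bash  # run a one-off command on a service
fig stop           # stop everything
fig build          # force a rebuild after changing a Dockerfile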

Vagrant

Now you may wonder why you would need or want Vagrant. The whole point of Docker is that containers are not VMs, and Fig has reduced the complexity of managing containers, so why would you want to bring virtualisation back into the picture?

Well the answer is simple: portability. I have a personal Mac, a work PC, and Linux servers… Docker will work on all those operating systems; natively on Linux, and through a proxy VM on OS X & Windows: Boot2docker. I love this project, it’s fast, lightweight & simple to use, but it doesn’t support volumes & shared folders on Windows yet (Boot2docker 1.3 offers partial support on Mac OS X), and it doesn’t really represent your production host.

That’s why I think Vagrant is useful, and I was really excited to see support for Docker added in Vagrant 1.6.

My goal was to make sure that any development environment would represent production and behave exactly the same way. This also helps the portability of the environment, since a single command: vagrant up --provider=docker --no-parallel will create a Linux host running Docker if required (OS X & Windows), build & boot all the containers in order, & mount the source directory on your machine as a volume inside the container.

The development Dockerfile is slightly different & designed to be lightweight: Apache & Nginx are not installed; paster serve does just what you want on a dev box. vagrant ssh also works a treat with the Phusion baseimage, and you can ssh directly into the container.
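Put together, a dev session then looks something like the sketch below; the paster .ini path is only illustrative and depends on where the source volume is mounted inside the container:

# on the host: create the Docker host VM if needed, then build & boot the containers
vagrant up --provider=docker --no-parallel

# ssh directly into the CKAN container (courtesy of the Phusion baseimage)
vagrant ssh

# inside the container: serve CKAN with paster, reloading on code changes
paster serve --reload /usr/lib/ckan/default/src/ckan/development.ini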

Wrap up

That was a great personal journey into containerisation & virtualisation to build consistent & portable development environments. There’s still work to be done on the core Dockerfile to extract Nginx from the main container & link the official Nginx container instead. The example Vagrantfile is really just a template to show what’s possible, but at the moment it only maps the CKAN source directory, so you would have to add new synced folders to build your custom extensions. It’s just one step further, and hopefully it’s just a start.

Next

Check this out on my GitHub repo.

Clément
