Posts


ckan-docker updated
30 Nov 2014


Check out the update to ckan-docker on GitHub

It’s a big update that adds a data container to the project to store CKAN FileStore and Postgres Data & config.

After a few weeks of working with this project for a client, I have realised how important that is: docker cp and pg_dump are not quick enough to keep your productivity up.

It’s very easy to use: in fig.yml, a service called data is built from the data Dockerfile:

data:
  build: docker/data
  hostname: data
  domainname: localdomain

… and the Postgres & CKAN containers inherit the data container’s volumes (/var/lib/ckan, /etc/postgresql/9.3/main and /var/lib/postgresql/9.3/main) with the volumes_from instruction:

  volumes_from:
    - data

Of course, custom locations can be set if you prefer, but I deliberately chose to override the standard locations by default, to make the project easier for anyone to take on. The locations are defined by environment variables when the image is created, so you only have to change their values and the volumes entries to match.
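To give an idea of why the shared volumes help, here is a minimal sketch of archiving all three locations in one go with a throwaway container. It assumes the busybox image is available. The function name backup_data_volumes is my own, not part of ckan-docker, and the container name in the usage comment is only a guess at what Fig generates for your project (check docker ps -a for the real one).

```shell
# backup_data_volumes CONTAINER [DEST_DIR]
# Archives the three volumes exposed by the data container into
# DEST_DIR/ckan-data.tar, using a temporary busybox container.
# CONTAINER is the name of the running or stopped data container (assumption:
# you look it up yourself); DEST_DIR defaults to the current directory.
backup_data_volumes() {
  container="$1"
  dest="${2:-$PWD}"

  if [ -z "$container" ]; then
    echo "usage: backup_data_volumes CONTAINER [DEST_DIR]" >&2
    return 1
  fi

  # --volumes-from mounts the data container's volumes at the same paths;
  # -v maps the destination directory on the host to /backup.
  docker run --rm \
    --volumes-from "$container" \
    -v "$dest:/backup" \
    busybox tar cf /backup/ckan-data.tar \
      /var/lib/ckan \
      /etc/postgresql/9.3/main \
      /var/lib/postgresql/9.3/main
}

# Example (hypothetical container name):
#   backup_data_volumes ckandocker_data_1 /tmp/backups
```

Stop the Postgres container first if you need a consistent copy of the database files.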

Check this out on CKAN Github repo

Cheers,

Clément




CKAN Docker Official Repo
26 Oct 2014


A few days ago I submitted a pull request to share my work on Docker for CKAN with the rest of the community.

But it’s a little too big :) … 37 files changed with 1,266 additions and 115 deletions.

So we’ve decided to move the Docker, Fig & Vagrant stuff out of the main CKAN repo, which makes a lot of sense.

In the meantime I’ve officially become a member of the CKAN organisation on Github which is pretty cool!

Extracting the Docker stuff out got me thinking a lot, and I re-factored even more stuff.

I’ve come up with a radically different tree structure:

├── Dockerfile 			# CKAN Dockerfile
├── LICENCE
├── README.md
├── Vagrantfile			# CKAN Vagrantfile
├── _etc 			# config files copied to /etc
│   ├── README.md
│   ├── apache2
│   │   ├── apache.conf
│   │   └── apache.wsgi
│   ├── ckan
│   │   └── custom_options.ini
│   ├── cron.d
│   │   └── ckan
│   ├── my_init.d
│   │   └── 55_configure
│   ├── nginx
│   │   └── nginx.conf
│   ├── postfix
│   │   └── main.cf
│   └── supervisor
│       └── conf.d
├── _service-provider 		# any service provider used in the portal
│   ├── README.md
│   └── datapusher
├── _solr			# any custom schema
│   ├── README.md
│   └── schema.xml
├── _src 			# CKAN & extensions source code
│   ├── README.md
│   ├── ckan
│   └── ckanext-...
├── docker 			# Dockerfiles & supporting files
│   ├── ckan
│   │   ├── my_init.d
│   │   ├── pip_install_req.sh
│   │   └── svc
│   ├── fig
│   │   ├── Dockerfile
│   │   └── README.md
│   ├── insecure_key
│   ├── nginx
│   │   ├── Dockerfile
│   │   └── cmd.sh
│   ├── postgres
│   │   ├── Dockerfile
│   │   └── svc
│   └── solr
│       ├── Dockerfile
│       └── svc
├── fig.yml
└── vagrant			# Vagrant Docker host (used in OS X & Windows)
    └── docker-host
        └── Vagrantfile

The point of this structure is that it should be easy to manage your entire project: you should be able to wrap everything you need in there and package it.

I’ve also consolidated the various Dockerfiles I wrote for CKAN (default, custom & dev) into one :)

  • All processes are managed by supervisor, which makes it easier to shut down Apache in a development context and use paster instead.
  • Nginx is now gone from the CKAN Dockerfile; this service is handled by another container, as it should be.
  • The requirement of being able to make live edits to the code for development is covered by mounting a volume as the source directory, which overrides the data that was initially copied into the container. Pip requirements remain, so I just have to re-install the packages automatically as part of the init process.
  • I’ve added a lot of ONBUILD triggers that allow building child images for dev & prod, which covers what I was initially doing with my custom Dockerfile.
  • I’ve also extracted the datapusher from the CKAN config into a separate container.

And there is more…

Check this out on CKAN Github repo

Cheers,

Clément




CKAN Development & Deployment using Docker, Fig & Vagrant
19 Oct 2014


Intro

When I discovered CKAN over a year ago it was version 1.7 or 1.8, and the implementation I studied was using Elastic Search for indexing… I was really confused by the complexity of the setup and it took me a few attempts to get my “play box” right.

By the time I started working on a project based on CKAN 2.2, the documentation had improved tremendously, and projects such as Data.gov.uk To Go as well as CKAN Packaging Scripts had really improved the way you can package & deploy the many components of a typical CKAN portal.

A few months ago I gave a short talk on CKAN at a JBug Scotland event and I was really amazed by Ian Lawson’s presentation on OpenShift. A few months after that I discovered Docker, and also found out that OpenShift was going to support Docker.

I thought this was really good news, because it meant that if I “containerised” my CKAN install I could use the same containers in every environment I’m working on: Dev, Test, Staging, Prod, Cloud!… And I wasn’t the only one thinking that way: in May, Nick Stenning made a great Pull Request with the first containers for CKAN.

There were a few issues though: the datastore was absent, ckanext-spatial could not be set up because PostGIS was not installed, editing the config was complex and not very flexible, and the three containers used different bases, which meant pulling three base images instead of caching one.

Standing on the shoulders of giants

I picked up from there and refactored the containers one by one, starting with Postgres, then Solr & CKAN. When that was done I created another Dockerfile that extends the main CKAN Dockerfile to allow custom configurations based on the core project.

  • All containers extend the same base image, phusion/baseimage (updated to 0.9.13), which means you only pull & cache it once. The first few steps are also identical, to rely on the Docker cache as much as possible.

  • The Postgres container installs PostGIS, configures the database, the datastore & PostGIS on the CKAN database. The default names & passwords can easily be overridden with environment variables.

  • The Solr container has been updated to 4.10.1

  • The CKAN Core container has been updated to configure the datapusher, has all the dependencies required to use the spatial extension & also supervisor to manage tasks.

  • The custom config shows how to extend the Core container to enable common extensions such as ckanext-viewhelpers, ckanext-archiver, ckanext-spatial, ckanext-harvest, and how you can extract services such as redis from the CKAN container and let that service be handled by a separate container.

Docker

Building containers is easy, caching is powerful. But you need to cheat sometimes, especially with the ADD command. In the Solr container for instance, I quickly realised that the following command:

	ADD https://archive.apache.org/dist/lucene/solr/$SOLR_VERSION/$SOLR.tgz

is not cached, whereas

	RUN wget --progress=bar:force https://archive.apache.org/dist/lucene/solr/$SOLR_VERSION/$SOLR.tgz

is.

And since the Solr tarball is over 100 MB, installing wget & cheating is really worth it! In cases like that, RUN is more appropriate than ADD, but it really depends on the use case.

Managing containers can be tedious, especially when you’re developing them. There are a lot of tools to help. I’ve not tried Shipyard yet but I will soon. In the meantime docker-cleanup is pretty useful, and the usual docker stop $(docker ps -aq) & docker rm $(docker ps -aq) work great to clean up any running containers.

But when I’m working with a custom Docker container I have to type (or copy & paste) 4 commands to build them, 4 commands to run them… and just as many to stop the containers:
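One caveat: when there are no containers at all, docker ps -aq prints nothing and docker stop then complains about missing arguments. A small sketch of a guard, written as a shell function (the name cleanup_containers is my own, not a Docker command):

```shell
# Stop and remove every container on this host, but only if there are any.
# WARNING: this touches ALL containers, not just the CKAN ones.
cleanup_containers() {
  ids=$(docker ps -aq)

  if [ -z "$ids" ]; then
    echo "no containers to clean up" >&2
    return 0
  fi

  # Word splitting is intended here: $ids holds one container ID per line.
  docker stop $ids
  docker rm $ids
}
```

Calling cleanup_containers from a script is then safe whether or not anything is running.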

# build the containers locally (run each command from the directory holding that image's Dockerfile)
docker build --tag="clementmouchet/postgres" .
docker build --tag="clementmouchet/solr" .
docker build --tag="clementmouchet/ckan" .

# build your custom container
docker build --tag="clementmouchet/ckan_custom" .

docker run -d --name postgres -h postgres.docker.local clementmouchet/postgres
docker run -d --name solr -h solr.docker.local clementmouchet/solr
docker run -d --name redis -h redis.docker.local redis

docker run \
    -d \
    --name ckan \
    -h ckan.docker.local \
    -p 80:80 \
    -p 8800:8800 \
    --link postgres:postgres \
    --link solr:solr \
    --link redis:redis \
    clementmouchet/ckan_custom

This is a bit tedious, and that’s why I looked at Fig.

Fig

Fig allows you to define all the above in a single YAML file to do the following:

  • start, stop and rebuild services
  • view the status of running services
  • tail running services’ log output
  • run a one-off command on a service

So the 8+ commands above are reduced to one, fig up, thanks to the definition below:

postgres:
  build: ../postgresql
  hostname: postgres
  domainname: docker.local
  environment:
    - CKAN_PASS=ckan_pass
    - DATASTORE_PASS=datastore_pass
solr:
  build: ../../../ckan/config/solr
  hostname: solr
  domainname: docker.local
redis:
  image: redis:2.8
  hostname: redis
  domainname: docker.local
ckan:
  build: .
  hostname: ckan
  domainname: docker.local
  ports:
    - "80:80"
    - "8800:8800"
  links:
    - postgres:postgres
    - solr:solr
    - redis:redis

And fig can simplify the rest of the docker commands you want to run, to view logs etc.
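As a rough sketch, the tasks listed earlier map onto Fig subcommands like this. The function name ckan_fig_session is my own; ckan is the service name from the definition above, and env is just an arbitrary one-off command chosen for illustration.

```shell
# Walk through a typical session against the fig.yml in the current directory.
# Each step stops the session if it fails.
ckan_fig_session() {
  fig build        || return 1   # rebuild the images for services that have a build: entry
  fig up -d        || return 1   # create and start all services in the background
  fig ps           || return 1   # show the status of the services
  fig run ckan env || return 1   # run a one-off command in a new ckan container
  fig stop                       # stop the services without removing them
}

# To watch the output of a service, run this separately in another terminal:
#   fig logs ckan
```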

Vagrant

Now you may wonder why you would need or want Vagrant. The whole point of Docker is that containers are not VMs, and Fig has reduced the complexity of managing containers, so why would you want to bring virtualisation back into the picture?

Well, the answer is simple: portability. I have a personal Mac, a work PC, and Linux servers… Docker will work on all those operating systems: natively on Linux, and through a proxy VM on OS X & Windows: Boot2docker. I love this project, it’s fast, lightweight & simple to use, but it doesn’t support volumes & shared folders on Windows yet (Boot2docker 1.3 offers partial support on Mac OS X), and it doesn’t really represent your production host.

That’s why I think Vagrant is useful, and I was really excited to see support for Docker added in Vagrant 1.6.

My goal was to make sure that any development environment would represent production and behave exactly the same. This also helps portability of the environment, since a simple command, vagrant up --provider=docker --no-parallel, will create Linux hosts running Docker if required (OS X & Windows), build & boot all the containers in order & mount the source directory on your machine as a volume inside the container.

The development Dockerfile is slightly different & designed to be lightweight: Apache & Nginx are not installed, since paster serve does just what you want on a dev box. vagrant ssh also works a treat with the Phusion baseimage, so you can ssh directly into the container.

Wrap up

That was a great personal journey into containerisation & virtualisation to build consistent & portable development environments. There’s still work to be done on the core Dockerfile to extract Nginx from the main container & link the official Nginx container instead. The example Vagrantfile is really just a template to show what’s possible; at the moment it only maps the CKAN source directory, so you would have to add new synced folders to build your custom extensions. It’s just one step further, and hopefully it’s just a start.

Next

Check this out on my Github repo

Clément





Working with Jekyll
11 Oct 2014


I’ve spent a few hours playing with this blog and Jekyll itself. It’s pretty cool. I really like the flexibility of the platform, and the fact that you can throw pretty much any markup at it… I have blog pages like this one in Markdown and other pages in HTML, and you can pick and choose the Markdown format as well.

It’s also my first time playing with SASS. I’ve used LESS in the past, when developing custom templates & themes for CKAN, and to be fair I can’t see much difference between the two: both work pretty well and do what you expect (functions, variables etc.).

Jekyll is a really useful tool if you know what you want, because you have the freedom to do whatever you want. The drawback is that you have to do a lot more than you would with a CMS such as Drupal or WordPress… mostly because you can’t rely on a database or index to do search & faceting. I’ll have a look at plugins & modules to cover that at some point.

Anyway the fun is there for sure. I’ll be posting soon about Docker & CKAN containers.

Clément




Say Hi to Jekyll!
10 Oct 2014


Say hi to http://clementmouchet.github.io

I’ve just created this website/blog using Jekyll, JQuery & Bootstrap, loved the simplicity and the free hosting on Github!

Jekyll is really simple, and easy to manage, no databases, just markdown pages, like the one you’re reading now, see below:

---
layout: post
title:  "Welcome to Jekyll!"
date:   2014-10-10 23:51:24
categories: jekyll update
---

Say hi to http://clementmouchet.github.io

I've just created this website/blog using [Jekyll][jekyll], [JQuery][JQuery] & [Bootstrap][Bootstrap], loved the simplicity and the free hosting on Github!

[Jekyll][jekyll] is really simple, and easy to manage, no databases, just markdown pages, like the one you're reading now, see below:
[Visit the jekyll website to find out more][jekyll]

I'll add a few posts as soon as possible with some stuff regarding [Docker][Docker]

Cheers,

Clément

[Docker]:      https://www.docker.com
[Bootstrap]:      http://getbootstrap.com
[JQuery]:      http://jquery.com
[jekyll]:      http://jekyllrb.com

Visit the jekyll website to find out more

I’ll add a few posts as soon as possible with some stuff regarding Docker

Cheers,

Clément