Fixed RabbitMQ cluster in Docker Swarm Mode

Posted in category docker on 2017-01-20

You can find a concrete implementation in the GitHub repository - rabbitmq (Tree: 391be14915).

Setting up RabbitMQ cluster in Docker Swarm (Mode)

The official clustering guidelines suggest that there are a few ways to create a RabbitMQ cluster. There are also options that allow creating clusters whose nodes are discovered automatically through a discovery service like etcd or Consul.

Our runtime environment is Docker Swarm, where we need to mount volumes to ensure that data is persisted and not lost over the course of a RabbitMQ upgrade or restart. It proved highly problematic to make sure that new instances of the RabbitMQ cluster get mounted to the same mount points and at the same time recognize the new cluster configuration (after a restart all RabbitMQ nodes get new hostnames, while e.g. Consul still knows the cluster by its old hostnames). There might be a solution to this that we would like to know about, so if you are willing to contribute, please send us your pull request.

What this project attempts to do is take a declarative approach to defining a RabbitMQ cluster, i.e. to define the nodes that need to be discovered on startup of the Docker image.

Here are the parameters that control how exactly RabbitMQ is going to be configured (the short descriptions are our reading of how they are used in the service definitions below):
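- RABBITMQ_NODENAME - name of this node, e.g. rabbit@rabbit-1
- RABBITMQ_CLUSTER_NODES - space-separated list of peer nodes to join on startup
- RABBITMQ_ERLANG_COOKIE - shared Erlang cookie; must be identical on all cluster nodes
- RABBITMQ_USER and RABBITMQ_PASSWORD - administrative user to create on first start
- RABBITMQ_SETUP_DELAY - seconds to wait before joining the cluster and running setup
- RABBITMQ_FIREHOSE_QUEUENAME and RABBITMQ_FIREHOSE_ROUTINGKEY - queue name and routing key used to tap into the firehose tracer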

Here is how we will do this (Dockerfile):
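The actual Dockerfile is in the repository linked above; a minimal sketch of the idea might look like this (the base image and file names here are assumptions):

    FROM rabbitmq:3.6.6-management

    # python is used by our custom entrypoint for pre-configuration
    RUN apt-get update \
     && apt-get install -y --no-install-recommends python \
     && rm -rf /var/lib/apt/lists/*

    # our own configuration and entrypoint
    COPY rabbitmq.config /etc/rabbitmq/rabbitmq.config
    COPY docker-entrypoint.sh /usr/local/bin/docker-entrypoint.sh

    ENTRYPOINT ["docker-entrypoint.sh"]
    CMD ["rabbitmq-server"]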

We are taking our own rabbitmq.config file that has only one important bit to it - cluster recovery mode:
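In the classic Erlang config format that bit would look roughly like this (assuming autoheal is the recovery mode in question):

    [
      {rabbit, [
        {cluster_partition_handling, autoheal}
      ]}
    ].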

And we change our entry point to our own script where we can pre-configure a lot of things (for that reason we needed to have python installed as part of our image):
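The real script is in the repository; a simplified shell sketch of the startup flow (python pre-configuration omitted, error handling trimmed) could look like this:

    #!/bin/bash
    set -e

    # stagger setup across nodes (see RABBITMQ_SETUP_DELAY below)
    sleep "${RABBITMQ_SETUP_DELAY:-0}"

    # start the broker and wait until the node is up
    rabbitmq-server -detached
    rabbitmqctl wait "/var/lib/rabbitmq/mnesia/${RABBITMQ_NODENAME}.pid"

    # try to join the first reachable peer from RABBITMQ_CLUSTER_NODES
    for peer in ${RABBITMQ_CLUSTER_NODES}; do
      rabbitmqctl stop_app
      if rabbitmqctl join_cluster "$peer"; then
        rabbitmqctl start_app
        break
      fi
      rabbitmqctl start_app
    done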

Notice that by default we create a SyncQs policy that will automatically synchronize queues across all cluster nodes.
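In rabbitmqctl terms this roughly corresponds to the following (the exact pattern may differ in the actual script):

$ rabbitmqctl set_policy --apply-to queues SyncQs '.*' '{"ha-mode":"all","ha-sync-mode":"automatic"}'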

RABBITMQ_SETUP_DELAY is used here to make sure different nodes try to join the cluster and set up other things at different times.

Configuring persistence layer

Let’s now set up the persistence layer so that data stays intact after a RabbitMQ restart. Since we are currently running 3 instances of RabbitMQ, we also need to create a target folder for the mount point used by the RabbitMQ server on each host (let’s say on SERVER1, SERVER3 and SERVER5):
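# assuming SERVER1, SERVER3 and SERVER5 will host nodes 1, 2 and 3 respectively
$ sudo mkdir -p /data/rabbitmq-1   # on SERVER1
$ sudo mkdir -p /data/rabbitmq-2   # on SERVER3
$ sudo mkdir -p /data/rabbitmq-3   # on SERVER5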

Then, we need to label our swarm cluster nodes appropriately.
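With docker node update this can be done as follows (same host-to-node mapping assumed as above):

$ docker node update --label-add rabbitmq=1 SERVER1
$ docker node update --label-add rabbitmq=2 SERVER3
$ docker node update --label-add rabbitmq=3 SERVER5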

You can verify that the label has been set correctly by invoking the following command:
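$ docker node inspect SERVER1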

This will produce a fairly big output; you will need to inspect the Spec > Labels part of it.

And now, after we have configured our labels and created the folders for the mount points, we can revisit the service creation instructions for e.g. a 3-node RabbitMQ cluster:

$ docker service create \
    --name rabbit-1 \
    --network net \
    -e RABBITMQ_SETUP_DELAY=120 \
    -e RABBITMQ_USER=admin \
    -e RABBITMQ_PASSWORD=adminpwd \
    -e RABBITMQ_CLUSTER_NODES='rabbit@rabbit-2 rabbit@rabbit' \
    --constraint node.labels.rabbitmq==1 \
    --mount type=bind,source=/data/rabbitmq-1,target=/var/lib/rabbitmq \
    -e RABBITMQ_NODENAME=rabbit@rabbit-1 \
    -e RABBITMQ_ERLANG_COOKIE=a-little-secret \
    -e RABBITMQ_FIREHOSE_QUEUENAME=trace \
    -e RABBITMQ_FIREHOSE_ROUTINGKEY=publish.# \
    kuznero/rabbitmq:3.6.6-cluster

$ docker service create \
    --name rabbit-2 \
    --network net \
    -e RABBITMQ_SETUP_DELAY=60 \
    -e RABBITMQ_USER=admin \
    -e RABBITMQ_PASSWORD=adminpwd \
    -e RABBITMQ_CLUSTER_NODES='rabbit@rabbit-1 rabbit@rabbit' \
    --constraint node.labels.rabbitmq==2 \
    --mount type=bind,source=/data/rabbitmq-2,target=/var/lib/rabbitmq \
    -e RABBITMQ_NODENAME=rabbit@rabbit-2 \
    -e RABBITMQ_ERLANG_COOKIE=a-little-secret \
    -e RABBITMQ_FIREHOSE_QUEUENAME=trace \
    -e RABBITMQ_FIREHOSE_ROUTINGKEY=publish.# \
    kuznero/rabbitmq:3.6.6-cluster

$ docker service create \
    --name rabbit \
    --network net \
    -p #{HTTP_UI_PORT}:15672 \
    -e RABBITMQ_SETUP_DELAY=20 \
    -e RABBITMQ_USER=admin \
    -e RABBITMQ_PASSWORD=adminpwd \
    -e RABBITMQ_CLUSTER_NODES='rabbit@rabbit-1 rabbit@rabbit-2' \
    --constraint node.labels.rabbitmq==3 \
    --mount type=bind,source=/data/rabbitmq-3,target=/var/lib/rabbitmq \
    -e RABBITMQ_NODENAME=rabbit@rabbit \
    -e RABBITMQ_ERLANG_COOKIE=a-little-secret \
    -e RABBITMQ_FIREHOSE_QUEUENAME=trace \
    -e RABBITMQ_FIREHOSE_ROUTINGKEY=publish.# \
    kuznero/rabbitmq:3.6.6-cluster

This will start three different services (each with a single replica).
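To verify that everything is up, list the services - each of the three should report a single running replica:

$ docker service ls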

Considerations for the RabbitMQ cluster delivery pipeline

All nodes of a RabbitMQ cluster must run the same versions of RabbitMQ and OTP. That imposes some limitations on how upgrades can be performed. The only option for a RabbitMQ cluster upgrade is during non-working hours, when there is no activity, so that it is possible to bring the whole cluster down and upgrade it.
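In practice this means removing the services during the maintenance window and recreating them with the new image tag; the data survives in the bind-mounted folders (the tag below is only an example):

$ docker service rm rabbit rabbit-1 rabbit-2
# ... then recreate the three services as above with the upgraded image,
# e.g. kuznero/rabbitmq:3.6.7-cluster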