Improving Fixed RabbitMQ cluster in Docker Swarm Mode

Posted in category docker on 2017-06-14

You can find a concrete implementation in the GitHub repository - rabbitmq (Tree: 6f2b504008).

This post builds on the previous post in the series, “Building RabbitMQ cluster in Docker Swarm Mode”, which elaborated on how to set up a fixed RabbitMQ cluster in Docker Swarm. Over time it became clear that the approach suggested there has some caveats.

Available images

There is a Docker image available on hub.docker.com - kuznero/rabbitmq:3.6.10-mancluster. It is built on top of the official rabbitmq:3.6.10-management image.

Setting up RabbitMQ cluster in Docker Swarm (Mode)

The set of parameters that control how RabbitMQ is configured has been extended compared to the previous version.

Here is how we will do it (Dockerfile):

FROM rabbitmq:3.6.10-management

COPY rabbitmq.config /etc/rabbitmq/rabbitmq.config
RUN chmod 777 /etc/rabbitmq/rabbitmq.config

ENV RABBITMQ_SETUP_DELAY=5
ENV RABBITMQ_USER=guest
ENV RABBITMQ_PASSWORD=guest
ENV RABBITMQ_LOOPBACK_USERS=guest
ENV RABBITMQ_CLUSTER_NODES=
ENV RABBITMQ_CLUSTER_PARTITION_HANDLING=autoheal
ENV RABBITMQ_CLUSTER_DISC_RAM=disc
ENV RABBITMQ_FIREHOSE_QUEUENAME=
ENV RABBITMQ_FIREHOSE_ROUTINGKEY=publish.#
ENV RABBITMQ_HIPE_COMPILE=false
ENV RABBITMQ_NODENAME=

RUN apt-get update -y && apt-get install -y python

ADD init.sh /init.sh
EXPOSE 15672

CMD ["/init.sh"]

Next comes the rabbitmq.config file, which now carries a lot more static configuration than before:

%% -*- mode: erlang -*-
[
 {rabbit,
  [
   {cluster_partition_handling, [[CLUSTER_PARTITION_HANDLING]]},
   {cluster_nodes, {[[[CLUSTER_NODES]]], [[CLUSTER_DISC_RAM]]}},
   {default_vhost, <<"/">>},
   {default_user, <<"[[USER]]">>},
   {default_pass, <<"[[PASSWORD]]">>},
   {default_permissions, [<<".*">>, <<".*">>, <<".*">>]},
   {default_user_tags, [administrator, management]},
   {hipe_compile, [[HIPE_COMPILE]]},
   {loopback_users, [[[LOOPBACK_USERS]]]},
   {mnesia_table_loading_retry_limit, 10},
   {mnesia_table_loading_retry_timeout, 30000}
   % {log_levels, [{connection, channel, federation, mirroring, debug}]}
  ]}
].

Note that each placeholder of the form [[PLACEHOLDER]] is replaced right before the RabbitMQ server starts, using the value of the corresponding environment variable passed to the container.
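The substitution step can be sketched in isolation as follows. This is a minimal illustration and not the image's actual code: the render_placeholder helper and the temporary file are hypothetical, and the helper additionally escapes the characters that are special in a sed replacement (\, / and &), which a plain sed call does not do. It assumes GNU sed (for -i) and values without newlines.

```shell
#!/usr/bin/env bash

# Hypothetical helper: replace every [[NAME]] in FILE with VALUE.
# Usage: render_placeholder FILE NAME VALUE
render_placeholder() {
  local file="$1" name="$2" value="$3"
  local escaped
  # Escape \, / and & so the value is taken literally in the replacement
  escaped=$(printf '%s' "$value" | sed -e 's|[\\/&]|\\&|g')
  sed -i "s/\[\[$name\]\]/$escaped/g" "$file"
}

# Demo on a throwaway file, with a password that contains / and &
config=$(mktemp)
printf '{default_pass, <<"[[PASSWORD]]">>}\n' > "$config"
render_placeholder "$config" PASSWORD 'p/ss&word'
cat "$config"   # prints: {default_pass, <<"p/ss&word">>}
rm -f "$config"
```

Without the escaping step, a password such as p/ss&word would either break the sed expression or be silently altered, because & in a sed replacement stands for the matched text.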

Notice that init.sh is the command (CMD) that our Dockerfile runs. This script starts rabbitmq-server and pre-configures everything required:

#!/usr/bin/env bash

echo "RABBITMQ_SETUP_DELAY                = ${RABBITMQ_SETUP_DELAY:=5}"
echo "RABBITMQ_USER                       = ${RABBITMQ_USER:=guest}"
echo "RABBITMQ_PASSWORD                   = ${RABBITMQ_PASSWORD:=guest}"
echo "RABBITMQ_LOOPBACK_USERS             = $RABBITMQ_LOOPBACK_USERS"
echo "RABBITMQ_CLUSTER_NODES              = $RABBITMQ_CLUSTER_NODES"
echo "RABBITMQ_CLUSTER_PARTITION_HANDLING = ${RABBITMQ_CLUSTER_PARTITION_HANDLING:=autoheal}"
echo "RABBITMQ_CLUSTER_DISC_RAM           = ${RABBITMQ_CLUSTER_DISC_RAM:=disc}"
echo "RABBITMQ_FIREHOSE_QUEUENAME         = $RABBITMQ_FIREHOSE_QUEUENAME"
echo "RABBITMQ_FIREHOSE_ROUTINGKEY        = $RABBITMQ_FIREHOSE_ROUTINGKEY"
echo "RABBITMQ_HIPE_COMPILE               = ${RABBITMQ_HIPE_COMPILE:=false}"
echo "RABBITMQ_NODENAME                   = $RABBITMQ_NODENAME"

CONFIG=/etc/rabbitmq/rabbitmq.config

nodes_list=""
IFS=' ' read -ra nodes <<< "$RABBITMQ_CLUSTER_NODES"
for node in "${nodes[@]}"; do
  nodes_list="$nodes_list, '$node'"
done
nodes_list=${nodes_list:2}

lbusers_list=""
IFS=' ' read -ra lbusers <<< "$RABBITMQ_LOOPBACK_USERS"
for lbuser in "${lbusers[@]}"; do
  lbusers_list="$lbusers_list, <<\"$lbuser\">>"
done
lbusers_list=${lbusers_list:2}

sed -i "s/\[\[CLUSTER_PARTITION_HANDLING\]\]/$RABBITMQ_CLUSTER_PARTITION_HANDLING/" $CONFIG
sed -i "s/\[\[CLUSTER_NODES\]\]/$nodes_list/" $CONFIG
sed -i "s/\[\[CLUSTER_DISC_RAM\]\]/$RABBITMQ_CLUSTER_DISC_RAM/" $CONFIG
sed -i "s/\[\[HIPE_COMPILE\]\]/$RABBITMQ_HIPE_COMPILE/" $CONFIG
sed -i "s/\[\[USER\]\]/$RABBITMQ_USER/" $CONFIG
sed -i "s/\[\[PASSWORD\]\]/$RABBITMQ_PASSWORD/" $CONFIG
sed -i "s/\[\[LOOPBACK_USERS\]\]/$lbusers_list/" $CONFIG

echo "<< RabbitMQ.config ... >>>"
cat $CONFIG
echo "<< RabbitMQ.config >>>"

(
  sleep ${RABBITMQ_SETUP_DELAY:-5}

  rabbitmqctl \
    set_policy SyncQs '.*' '{"ha-mode":"all","ha-sync-mode":"automatic"}' \
    --priority 0 --apply-to queues
  if [[ -n "$RABBITMQ_FIREHOSE_QUEUENAME" ]]; then
    echo "<< Enabling Firehose ... >>>"
    # Locate rabbitmqadmin independently of the current working directory
    ln -sf "$(find / -type f -iname rabbitmqadmin 2>/dev/null | head -1)" /rabbitmqadmin
    chmod +x /rabbitmqadmin
    echo -n "Declaring '$RABBITMQ_FIREHOSE_QUEUENAME' queue ... "
    /rabbitmqadmin declare queue name="$RABBITMQ_FIREHOSE_QUEUENAME"
    /rabbitmqadmin list queues
    echo -n "'amq.rabbitmq.trace' -> '$RABBITMQ_FIREHOSE_QUEUENAME' ... "
    /rabbitmqadmin declare binding \
      source=amq.rabbitmq.trace \
      destination="$RABBITMQ_FIREHOSE_QUEUENAME" \
      routing_key="$RABBITMQ_FIREHOSE_ROUTINGKEY"
    /rabbitmqadmin list bindings
    rabbitmqctl trace_on
    echo "<< Enabling Firehose ... DONE >>>"
  fi
) & rabbitmq-server "$@"

Notice that by default we create a SyncQs policy that mirrors every queue to all cluster nodes and synchronizes the mirrors automatically.

RABBITMQ_SETUP_DELAY (in seconds) makes sure the setup steps run only after the RabbitMQ server has started. It is typically a small value, such as 5 seconds, and is not meant to delay startup significantly.
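A fixed delay can turn out too short on a slow host. One alternative, shown here only as a sketch and not as what the image does, is to retry a readiness check until it succeeds. The wait_until helper and its arguments are hypothetical, and using rabbitmqctl status as the readiness check is an assumption; a stricter check may be needed if the setup steps require the RabbitMQ application itself to be fully started.

```shell
#!/usr/bin/env bash

# Hypothetical helper: run a command until it succeeds or the attempts run out.
# Usage: wait_until ATTEMPTS PAUSE_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as COMMAND succeeds, 1 if every attempt failed.
wait_until() {
  local attempts="$1" pause="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@" > /dev/null 2>&1; then
      return 0
    fi
    sleep "$pause"
    i=$((i + 1))
  done
  return 1
}

# Possible use inside the background subshell instead of the fixed sleep
# (assumption: `rabbitmqctl status` fails until the node is reachable):
#   wait_until 30 1 rabbitmqctl status || echo "RabbitMQ did not come up in time"
```

The upper bound on attempts keeps the setup subshell from waiting forever if the server never starts.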

If you need a slimmer Docker image, you can drop the Python installation, but only if you are ready to give up the firehose feature, because rabbitmqadmin is a Python script.

Docker Services

$ docker service create \
    --name rabbit-1 \
    --network net \
    --constraint node.labels.rabbitmq-1==on \
    --mount type=bind,source=/data/rabbitmq-1,target=/var/lib/rabbitmq \
    -e RABBITMQ_USER=admin \
    -e RABBITMQ_PASSWORD=adminpwd \
    -e RABBITMQ_CLUSTER_NODES='rabbit@rabbit-1 rabbit@rabbit-2 rabbit@rabbit-3' \
    -e RABBITMQ_CLUSTER_PARTITION_HANDLING=autoheal \
    -e RABBITMQ_CLUSTER_DISC_RAM=disc \
    -e RABBITMQ_NODENAME=rabbit@rabbit-1 \
    -e RABBITMQ_ERLANG_COOKIE=a-little-secret \
    -e RABBITMQ_FIREHOSE_QUEUENAME=trace \
    -e RABBITMQ_FIREHOSE_ROUTINGKEY=publish.# \
    -e RABBITMQ_HIPE_COMPILE=true \
    kuznero/rabbitmq:3.6.10-mancluster

$ sleep 10

$ docker service create \
    --name rabbit-2 \
    --network net \
    --constraint node.labels.rabbitmq-2==on \
    --mount type=bind,source=/data/rabbitmq-2,target=/var/lib/rabbitmq \
    -e RABBITMQ_USER=admin \
    -e RABBITMQ_PASSWORD=adminpwd \
    -e RABBITMQ_CLUSTER_NODES='rabbit@rabbit-1 rabbit@rabbit-2 rabbit@rabbit-3' \
    -e RABBITMQ_CLUSTER_PARTITION_HANDLING=autoheal \
    -e RABBITMQ_CLUSTER_DISC_RAM=disc \
    -e RABBITMQ_NODENAME=rabbit@rabbit-2 \
    -e RABBITMQ_ERLANG_COOKIE=a-little-secret \
    -e RABBITMQ_FIREHOSE_QUEUENAME=trace \
    -e RABBITMQ_FIREHOSE_ROUTINGKEY=publish.# \
    -e RABBITMQ_HIPE_COMPILE=true \
    kuznero/rabbitmq:3.6.10-mancluster

$ sleep 10

$ docker service create \
    --name rabbit-3 \
    --network net \
    --constraint node.labels.rabbitmq-3==on \
    --mount type=bind,source=/data/rabbitmq-3,target=/var/lib/rabbitmq \
    -e RABBITMQ_USER=admin \
    -e RABBITMQ_PASSWORD=adminpwd \
    -e RABBITMQ_CLUSTER_NODES='rabbit@rabbit-1 rabbit@rabbit-2 rabbit@rabbit-3' \
    -e RABBITMQ_CLUSTER_PARTITION_HANDLING=autoheal \
    -e RABBITMQ_CLUSTER_DISC_RAM=disc \
    -e RABBITMQ_NODENAME=rabbit@rabbit-3 \
    -e RABBITMQ_ERLANG_COOKIE=a-little-secret \
    -e RABBITMQ_FIREHOSE_QUEUENAME=trace \
    -e RABBITMQ_FIREHOSE_ROUTINGKEY=publish.# \
    -e RABBITMQ_HIPE_COMPILE=true \
    kuznero/rabbitmq:3.6.10-mancluster

With this setup, nodes reliably rejoin the cluster after outages.
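The three commands above differ only in the node index, so they can be generated from one template. The following is a sketch under that assumption: the create_rabbit_service function and the DOCKER variable are hypothetical names introduced here, and every value is copied from the commands above. Setting DOCKER=echo prints the arguments without contacting Docker.

```shell
#!/usr/bin/env bash

# Hypothetical helper: create the service for node N (1, 2 or 3).
# The DOCKER variable lets a caller swap in `echo` for a preview.
create_rabbit_service() {
  local n="$1"
  "${DOCKER:-docker}" service create \
    --name "rabbit-$n" \
    --network net \
    --constraint "node.labels.rabbitmq-$n==on" \
    --mount "type=bind,source=/data/rabbitmq-$n,target=/var/lib/rabbitmq" \
    -e RABBITMQ_USER=admin \
    -e RABBITMQ_PASSWORD=adminpwd \
    -e "RABBITMQ_CLUSTER_NODES=rabbit@rabbit-1 rabbit@rabbit-2 rabbit@rabbit-3" \
    -e RABBITMQ_CLUSTER_PARTITION_HANDLING=autoheal \
    -e RABBITMQ_CLUSTER_DISC_RAM=disc \
    -e "RABBITMQ_NODENAME=rabbit@rabbit-$n" \
    -e RABBITMQ_ERLANG_COOKIE=a-little-secret \
    -e RABBITMQ_FIREHOSE_QUEUENAME=trace \
    -e 'RABBITMQ_FIREHOSE_ROUTINGKEY=publish.#' \
    -e RABBITMQ_HIPE_COMPILE=true \
    kuznero/rabbitmq:3.6.10-mancluster
}

# Preview only: with DOCKER=echo the arguments are printed, nothing is created.
# For a real run, drop the DOCKER=echo line and keep a `sleep 10` between
# iterations, as in the commands above.
DOCKER=echo
for n in 1 2 3; do
  create_rabbit_service "$n"
done
```

Note that the echo preview loses shell quoting (the cluster node list appears as three separate words), so it is meant for inspection, not for copy and paste.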

Testing

For testing purposes, the start.sh and stop.sh scripts in the ./scripts folder can be used to run a small 3-node RabbitMQ cluster on a local (possibly single-node) Docker Swarm cluster.
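The service commands in this post rely on three things existing beforehand: an overlay network named net, the node labels rabbitmq-1 to rabbitmq-3, and the bind-mount directories under /data. The sketch below shows one way to prepare them on a single-node swarm. It is not the repository's start.sh; the run and prepare_local_swarm helpers and the DRY_RUN variable are hypothetical, and it assumes the swarm has already been initialised with docker swarm init.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Prepare a single swarm node to host all three RabbitMQ services.
# Usage: prepare_local_swarm NODE   (NODE is the node's ID or hostname)
prepare_local_swarm() {
  local node="$1" n
  run docker network create --driver overlay net
  for n in 1 2 3; do
    # Label matched by --constraint node.labels.rabbitmq-N==on
    run docker node update --label-add "rabbitmq-$n=on" "$node"
    # Host directory used as the bind-mount source
    run mkdir -p "/data/rabbitmq-$n"
  done
}

# Dry run: only prints the commands (my-node is a placeholder node name)
DRY_RUN=1 prepare_local_swarm my-node
```

Putting all three labels on one node is what allows the three services, each constrained to a different label, to be scheduled on a single-node swarm.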