Using Docker Swarm to provide Farmerbot redundancy

This post will serve as a guide for anyone wishing to run farmerbot on multiple nodes as a way of providing redundancy. We’ll be installing GlusterFS as a shared storage system, then creating a swarm before spinning up farmerbot across the cluster.

My setup consists of three Raspberry Pi 4s, each running a high endurance SD card. This is not optimal, as GlusterFS prefers to have its shared volumes separate from the root volume, but we’ll be working around that. Your setup may be different, but the steps will be mostly the same.

Before we begin, the assumption is made that you have a working farmerbot configured using Scott’s guide, that the nodes have the hostnames ‘node1’, ‘node2’ and ‘node3’, and IPs of 192.168.50.21, .22 and .23 respectively. We will be working as root, so sudo su.

  • Update the system
    Ensure the system is up to date with:
    apt update && apt upgrade -y

  • Install GlusterFS on all nodes
    apt-get -y install glusterfs-server
    systemctl enable --now glusterd
    systemctl status glusterd

  • Edit the hosts file on all nodes
    nano /etc/hosts
    and add:
    192.168.50.21 node1 #hostname
    192.168.50.22 node2 #hostname
    192.168.50.23 node3 #hostname
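    A quick way to confirm name resolution is working is to ping each peer by name, for example from node1:
    ping -c 1 node2
    ping -c 1 node3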

  • Configure firewall
    Install iptables-persistent with:
    apt install iptables-persistent
    Allow traffic between the nodes with:
    iptables -I INPUT -s 192.168.50.21 -j ACCEPT
    iptables -I INPUT -s 192.168.50.22 -j ACCEPT
    iptables -I INPUT -s 192.168.50.23 -j ACCEPT
    Save firewall rules with:
    iptables-save > /etc/iptables/rules.v4
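    You can verify the rules are in place with:
    iptables -S INPUT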

  • Create a directory for GlusterFS to store its local volume (called a ‘brick’)
    On each node run:
    mkdir /gluster

  • Add peers
    Pick a node to use as the master. We’ll use node1 and run:
    gluster peer probe node2
    gluster peer probe node3
    Check peer status with:
    gluster peer status

  • Create a replicated volume across the bricks - this should be run on the master
    gluster volume create volume1 replica 3 node1:/gluster node2:/gluster node3:/gluster force
    We need to use force because we’re creating the bricks on the root partition. Check the volume with:
    gluster volume list
    Then:
    gluster volume info
    You should see three bricks

  • Start volume
    gluster volume start volume1

  • Ensure Gluster has r/w permissions
    chown -R gluster:gluster /var/lib/glusterd/vols/volume1
    chmod -R 600 /var/lib/glusterd/vols/volume1/*

  • Mount Gluster volume on each node and add to fstab
    This will mount the shared volume at /mnt - you’re free to choose another location if you’d prefer.
    On node1:
    mount -t glusterfs node1:/volume1 /mnt
    echo "node1:/volume1 /mnt glusterfs defaults,_netdev 0 0" | tee -a /etc/fstab
    On node2:
    mount -t glusterfs node2:/volume1 /mnt
    echo "node2:/volume1 /mnt glusterfs defaults,_netdev 0 0" | tee -a /etc/fstab
    On node3:
    mount -t glusterfs node3:/volume1 /mnt
    echo "node3:/volume1 /mnt glusterfs defaults,_netdev 0 0" | tee -a /etc/fstab
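    You can confirm the volume is mounted on each node with:
    df -h /mnt
    or
    findmnt /mnt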

The next step ensures the shared volume is only mounted once GlusterFS has started. It’s needed because Gluster takes longer to start than systemd’s mount handling, and we want to be sure the fstab entry is only acted on after the Gluster service is up and running.

  • Ensure glusterd is started before systemd mounts the /etc/fstab entry
    mkdir /etc/systemd/system/mnt.mount.d/
    Then create override file:
    nano /etc/systemd/system/mnt.mount.d/override.conf
    Containing:
    [Unit]
    After=glusterd.service
    Wants=glusterd.service
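    Then reload systemd so the override is picked up, and confirm it has been applied to the generated mount unit (the unit name follows the mount point, so /mnt becomes mnt.mount):
    systemctl daemon-reload
    systemctl cat mnt.mount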

  • Create replicated farmerbot directories and files
    These should be run on the master node:
    mkdir /mnt/farmerbot && touch /mnt/farmerbot/docker-stack.yaml
    Then:
    mkdir /mnt/farmerbot/config && touch /mnt/farmerbot/config/config.md

Check for successful replication by navigating to /mnt/farmerbot on node2 and node3 and running ls - the directory and files created on node1 should appear on both.
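If you have SSH access between the nodes you can also do this check from node1 without logging in to the others - for example (assuming root SSH is allowed):
ssh root@node2 ls /mnt/farmerbot
ssh root@node3 ls /mnt/farmerbot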

At this stage, GlusterFS is set up and working correctly. Now we move on to the Docker Swarm setup, but before we do, navigate to the location of your original, working farmerbot docker-compose.yaml file and run:
docker compose config
This will output the YAML with the appropriate environment variables pulled in. Copy and paste this entire output into a text editor (Notepad++ or similar) - we need to modify this file, as docker-compose and Docker Swarm do things a little differently. Save the file as docker-stack.yaml. Let’s continue…
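As an aside, rather than copying and pasting you can also write the rendered output straight to a file from the same directory and carry that file over instead:
docker compose config > docker-stack.yaml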

  • Ensure each node has Docker installed
    wget -O docker.sh https://get.docker.com
    sudo sh docker.sh

  • Create the cluster
    Select a node to fulfil the manager role and run:
    docker swarm init --advertise-addr [ip address of manager node]

  • Join worker nodes to the swarm
    Copy and paste the docker swarm join command output by the step above and run it on node2 and node3
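    The command printed by the manager will look something like this, with your own token and manager IP (run docker swarm join-token worker on the manager if you need to print it again):
    docker swarm join --token SWMTKN-1-xxxxxxxx 192.168.50.21:2377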

  • Check swarm status
    Run the following on your manager node:
    docker node ls

The cluster is now built. Remarkably easy, right? From this point on we’re only going to be interacting with the manager node. The next step is optional but allows for easy monitoring of your cluster:

  • Deploy Portainer
    curl -L https://downloads.portainer.io/ce2-18/portainer-agent-stack.yml -o portainer-agent-stack.yml
    Then:
    docker stack deploy -c portainer-agent-stack.yml portainer
    You’ll find the Portainer web UI at [manager IP]:9000 (192.168.50.21:9000 in this example)

Now on to modifying the docker-stack.yaml we saved in the text editor earlier. There are a number of changes we need to make, listed below:

  • replace: name: [farmerbot name] with version ‘3’
  • remove or #comment: depends_on blocks from each container
  • remove or #comment: restart: always from each container
  • add: a deploy: restart_policy: block at the end of each container:
    deploy:
      restart_policy:
        condition: any
  • replace: farmerbot: volumes: block with one line: - /gluster/farmerbot/config:/farmerbot (I feel this should point to the mount point rather than the brick location, but when I specify the line as - /mnt/farmerbot/config:/farmerbot the service refuses to load correctly)
  • replace: grid3_client: ports: block with one line: - “3000:3000”
  • replace: redis: ports: block with one line: - “6379:6379”
  • replace: redis: volumes: block with one line: - db:/data
  • replace: rmbpeer: image: with scottyeager/rmb-peer:1.0.3 (this image is compiled for 64 bit arm)
  • remove or #comment: networks: name: line
  • remove or #comment: volumes: db: name: line

If you’re running a 64-bit Pi, you will also want to ensure the image under rmbpeer is specified as scottyeager/rmb-peer:1.0.3 - Docker Swarm does not play nicely with QEMU, so it needs a native 64-bit rmb-peer image to function correctly.
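To give an idea of where you should end up, here is a trimmed sketch of the overall shape of the modified docker-stack.yaml. The farmerbot and grid3_client image names, environment blocks and any other service settings should be carried over from your own docker compose config output - the placeholders below are purely illustrative:

version: '3'

services:
  farmerbot:
    image: <your farmerbot image>        # keep whatever your compose output specifies
    volumes:
      - /gluster/farmerbot/config:/farmerbot
    deploy:
      restart_policy:
        condition: any

  grid3_client:
    image: <your grid3_client image>     # keep whatever your compose output specifies
    ports:
      - "3000:3000"
    deploy:
      restart_policy:
        condition: any

  redis:
    image: redis                         # keep whatever your compose output specifies
    ports:
      - "6379:6379"
    volumes:
      - db:/data
    deploy:
      restart_policy:
        condition: any

  rmbpeer:
    image: scottyeager/rmb-peer:1.0.3    # native 64-bit arm image
    deploy:
      restart_policy:
        condition: any

volumes:
  db: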

Once these changes have been made, copy and paste them into the /mnt/farmerbot/docker-stack.yaml file we created earlier. Next we’ll need to populate /mnt/farmerbot/config/config.md with the appropriate information, or copy the file over from your original farmerbot instance.

Once this is done we should be ready to fire up the swarm. On the manager node, navigate to /mnt/farmerbot and run:
docker stack deploy -c docker-stack.yaml farmerbot
You’ll see the images being pulled down. You can check the status of each container by running:
docker service ls

You will see one failed farmerbot container; this is due to dependent services not being ready when farmerbot is initially started and can be safely ignored - a second farmerbot container will be created soon after and should run without issue.
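If you want to dig into that history, docker service ps lists every task for a service, including failed ones. Stack services are named <stack>_<service>, so in this example the farmerbot service will most likely be called farmerbot_farmerbot:
docker service ps farmerbot_farmerbot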

The swarm manager will determine where to place each of the four services. If one node fails, the manager will spin up any containers that were running on the failed node somewhere else in the swarm. If you need to take a node offline, the command docker node update --availability drain [nodeID] will tell the manager to move any running services to another node and will prevent new services from starting there.
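For example, to take node2 out of service for maintenance and bring it back afterwards:
docker node update --availability drain node2
docker node update --availability active node2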

There we have it - farmerbot running on a cluster.

More than amazing work, @TheCaptain!!!

So if I understand correctly, you are running those 3 different servers on the same LAN, locally?

If so, I think it would then be possible to run the Docker Swarm you made over a WireGuard VPN with 3 nodes in 3 different locations, as shown in this guide: https://www2.manual.grid.tf/terraform/advanced/terraform_nextcloud_redundant.html#create-a-two-servers-wireguard-vpn-with-terraform

The only difference would be setting up the VPN with 3 different 3nodes, and then replacing the 192.168.50.21, .22 and .23 IP addresses with 10.1.3.2, 10.1.4.2 and 10.1.5.2 (generated by Terraform during deployment).

What do you guys think?

Yes, my Pis are all on the same subnet.

I have no experience with this, but from my understanding Docker Swarm is pretty latency sensitive, so if the manager doesn’t receive a heartbeat within the expected timeframe it’ll mark the node as unreachable and spin up new containers elsewhere. There does appear to be a way to mitigate this by extending the heartbeat interval - docker swarm update --dispatcher-heartbeat 60s, for example. The default heartbeat looks to be 5s.

There are definitely people who’ve successfully run a cluster across the internet, which proves it’s doable.

Love to see it. Nice work :+1: