Install HA Swarm with GlusterFS

Set up Highly Available applications with Docker Swarm and Gluster

Figure: Docker Swarm cluster with a shared GlusterFS replicated volume for HA (../../../img/ha-swarm-diagram.png)

A good design pattern for highly available applications is to deploy the application as a container on a Docker Swarm cluster with persistent storage provided by GlusterFS. GlusterFS is a fast shared filesystem that can keep the container volume in sync between multiple VMs running the Docker Swarm cluster. This pattern ensures high availability for your containerised application. In the event a VM dies, Docker Swarm will spin up the container on another VM. GlusterFS will ensure the container has access to the same data when it comes up.

In this setup, we’ll install GlusterFS on 3 VMs and create a replicated volume with a replication factor of 3. We then run Docker Swarm across the same three VMs. The goal is to use GlusterFS to provide persistent storage to your application container, and Docker Swarm for high availability.

1. Plan and set up the infrastructure

For this setup, we first need three Ubuntu VMs, each with two disks attached. The first disk runs the OS, and the second holds the GlusterFS replicated volume. In my case, the VMs had the root volume on /dev/vda and the second disk on /dev/vdb. Let’s assume the private IPs of these VMs are 10.42.1.50, 10.42.1.51 and 10.42.1.52, and their hostnames are swnode1, swnode2 and swnode3.

Note

All commands are executed with root privileges (via sudo).

nimda@swnode1:~$ sudo lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
sr0     11:0    1 1024M  0 rom
vda    253:0    0   64G  0 disk 
├─vda1 253:1    0    1G  0 part /boot/efi
└─vda2 253:2    0 62.9G  0 part /
vdb    253:16   0   64G  0 disk

Update the /etc/hosts file on each VM so that every hostname maps to its private IP. This is important for GlusterFS: you may run into issues if you give private IPs directly to Gluster volumes instead of hostnames. After editing, the files should look like:

nimda@swnode1:~$ sudo cat /etc/hosts
127.0.0.1 localhost 
10.42.1.50 swnode1
10.42.1.51 swnode2
10.42.1.52 swnode3

nimda@swnode2:~$ sudo cat /etc/hosts
127.0.0.1 localhost
10.42.1.50 swnode1
10.42.1.51 swnode2
10.42.1.52 swnode3

nimda@swnode3:~$ sudo cat /etc/hosts
127.0.0.1 localhost
10.42.1.50 swnode1
10.42.1.51 swnode2
10.42.1.52 swnode3
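
Before moving on, it can help to confirm that every node name really is present in the hosts file on each VM. The following is only a minimal sketch: the function name `missing_hosts` is hypothetical, and it simply checks that each name appears as a field on a non-comment line.

```shell
# missing_hosts FILE NAME...
# Prints each NAME that has no entry in FILE (hosts file format: "IP name [alias...]").
# Returns 0 when every name is present, 1 otherwise.
missing_hosts() {
    hosts_file=$1
    shift
    missing=0
    for name in "$@"; do
        # Skip comment lines, then compare the name against fields 2..NF.
        if ! awk -v n="$name" '
            /^[ \t]*#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit (found ? 0 : 1) }' "$hosts_file"; then
            echo "$name"
            missing=1
        fi
    done
    return "$missing"
}

# Example (run on each VM):
#   missing_hosts /etc/hosts swnode1 swnode2 swnode3 && echo "all node names present"
```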

Format the second disk (/dev/vdb) with the XFS filesystem on each VM in case you haven’t already. You can also use ext4 if you prefer.

sudo mkfs.xfs /dev/vdb

2. Create directories for GlusterFS storage

Set up the directories where the Gluster “bricks” will reside. It is better to name them differently on each node so that it’s easy to identify which node a brick belongs to. Also add an entry to the /etc/fstab file on each VM so that the brick is mounted when the operating system boots or restarts.

nimda@swnode1:~$ sudo mkdir -p /node1/bricks/1
nimda@swnode1:~$ echo '/dev/vdb /node1/bricks/1 xfs defaults 0 0' | sudo tee -a /etc/fstab
nimda@swnode1:~$ sudo mount -a
nimda@swnode1:~$ sudo mkdir /node1/bricks/1/brick

nimda@swnode2:~$ sudo mkdir -p /node2/bricks/2
nimda@swnode2:~$ echo '/dev/vdb /node2/bricks/2 xfs defaults 0 0' | sudo tee -a /etc/fstab
nimda@swnode2:~$ sudo mount -a
nimda@swnode2:~$ sudo mkdir /node2/bricks/2/brick

nimda@swnode3:~$ sudo mkdir -p /node3/bricks/3
nimda@swnode3:~$ echo '/dev/vdb /node3/bricks/3 xfs defaults 0 0' | sudo tee -a /etc/fstab
nimda@swnode3:~$ sudo mount -a
nimda@swnode3:~$ sudo mkdir /node3/bricks/3/brick
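
Appending to /etc/fstab with `>>` adds a duplicate line every time the command is re-run, and the redirection is performed by the calling shell rather than by sudo. A minimal sketch of an append-once helper follows. The name `append_once` is hypothetical, and the caller is assumed to have write access to the file (for example by running the script as root).

```shell
# append_once FILE LINE
# Appends LINE to FILE only if an identical line is not already present.
append_once() {
    file=$1
    line=$2
    # -q: quiet, -x: match the whole line, -F: treat LINE as a fixed string
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example (on swnode1, as root):
#   append_once /etc/fstab '/dev/vdb /node1/bricks/1 xfs defaults 0 0'
```

Running the helper twice with the same line leaves a single entry, so the step can be repeated safely.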

3. Install GlusterFS

Install GlusterFS on all VMs by executing the following commands on each VM:

sudo apt-get -y update && sudo apt-get -y upgrade
sudo apt-get install -y software-properties-common
sudo add-apt-repository ppa:gluster/glusterfs-11 && sudo apt-get update # Latest GlusterFS version
sudo apt-get install -y glusterfs-server
sudo systemctl enable glusterd # automatically start glusterfs on boot
sudo systemctl start glusterd # start glusterfs right now
sudo systemctl status glusterd # Should show status active

4. Peer with other Gluster VMs

Now peer with the other nodes from swnode1:

nimda@swnode1:~$ sudo gluster peer probe swnode2
peer probe: success.
nimda@swnode1:~$ sudo gluster peer probe swnode3
peer probe: success.

nimda@swnode1:~$ sudo gluster peer status
Number of Peers: 2

Hostname: swnode2
Uuid: 6f3d9125-c4ab-4f5a-9ad2-f7bcea684fa5
State: Peer in Cluster (Connected)

Hostname: swnode3
Uuid: f0022060-3378-4848-ba36-f3d5a23e3710
State: Peer in Cluster (Connected)
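
When scripting this step, the peer status output can be reduced to a single number. This is a sketch only: `connected_peers` is a hypothetical helper that counts the `State:` lines shown in the output above, and it assumes the wording of those lines does not differ in your GlusterFS version.

```shell
# connected_peers: reads `gluster peer status` output on stdin and prints
# how many peers report the state "Peer in Cluster (Connected)".
connected_peers() {
    # -c: print the count of matching lines, -F: match a fixed string.
    # grep exits non-zero when the count is 0, so "|| true" keeps the helper's status at 0.
    grep -cF 'State: Peer in Cluster (Connected)' || true
}

# Example (on swnode1); on this three-node cluster the expected value is 2:
#   sudo gluster peer status | connected_peers
```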

5. Set up the Gluster “replicated volume”

GlusterFS has multiple volume types. For our HA architecture, we want a “replicated” volume, which stores a copy of every file on each of the 3 VMs so that the file is available to any app or container running on them. Create the replicated volume named “gfs” with 3 replicas:

nimda@swnode1:~$ sudo gluster volume create gfs \
replica 3 \
swnode1:/node1/bricks/1/brick \
swnode2:/node2/bricks/2/brick \
swnode3:/node3/bricks/3/brick
volume create: gfs: success: please start the volume to access data

nimda@swnode1:~$ sudo gluster volume start gfs
nimda@swnode1:~$ sudo gluster volume status gfs
Status of volume: gfs
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick swnode1:/node1/bricks/1/brick         59641     0          Y       4839 
Brick swnode2:/node2/bricks/2/brick         53330     0          Y       4468 
Brick swnode3:/node3/bricks/3/brick         57670     0          Y       4238 
Self-heal Daemon on localhost               N/A       N/A        Y       4871 
Self-heal Daemon on swnode3                 N/A       N/A        Y       4270 
Self-heal Daemon on swnode2                 N/A       N/A        Y       4500 

Task Status of Volume gfs
------------------------------------------------------------------------------
There are no active volume tasks

nimda@swnode1:~$ sudo gluster volume info gfs

Volume Name: gfs
Type: Replicate
Volume ID: 085ccf18-7a7e-4224-85df-da89d40bfeb4
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: swnode1:/node1/bricks/1/brick
Brick2: swnode2:/node2/bricks/2/brick
Brick3: swnode3:/node3/bricks/3/brick
Options Reconfigured:
auth.allow: 10.42.1.50,10.42.1.51,10.42.1.52
cluster.granular-entry-heal: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
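
The `gluster volume status` output shown above can also be checked mechanically. The sketch below uses a hypothetical helper, `bricks_offline`, and assumes each brick fits on a single line as it does above (very long brick paths may be wrapped by the CLI, which this does not handle).

```shell
# bricks_offline: reads `gluster volume status VOLUME` output on stdin and
# prints every brick whose "Online" column is not "Y".
# Returns 0 when all bricks are online, 1 otherwise.
bricks_offline() {
    awk '
        $1 == "Brick" {
            # Columns: Brick <host:path> <TCP port> <RDMA port> <Online> <Pid>
            if ($(NF - 1) != "Y") { print $2; bad = 1 }
        }
        END { exit (bad ? 1 : 0) }'
}

# Example (on any node):
#   sudo gluster volume status gfs | bricks_offline && echo "all bricks online"
```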

6. Set up security and authentication for this volume

By default, GlusterFS allows any client to connect to a volume. To restrict access, authorize only the private IPs of the three VMs running GlusterFS. These VMs still need to connect to the volume on each node so that the mounts and replication keep working. Execute:

nimda@swnode1:~$ sudo gluster volume set gfs auth.allow 10.42.1.50,10.42.1.51,10.42.1.52
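
If the node list lives in a script, the comma-separated value for `auth.allow` can be built from it rather than typed by hand. A small sketch, with `join_csv` as a hypothetical helper name:

```shell
# join_csv ITEM...: prints the arguments joined by commas.
join_csv() {
    joined=""
    for item in "$@"; do
        # ${joined:+$joined,} expands to "<joined>," only when joined is non-empty.
        joined="${joined:+$joined,}$item"
    done
    printf '%s\n' "$joined"
}

# Example (on swnode1):
#   sudo gluster volume set gfs auth.allow "$(join_csv 10.42.1.50 10.42.1.51 10.42.1.52)"
```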

7. Mount the GlusterFS volume where applications can access the files

We’ll mount the volume onto /mnt on each VM, and also append it to our /etc/fstab file so that it mounts on boot:

nimda@swnode1:~$ echo 'localhost:/gfs /mnt glusterfs defaults,_netdev,backupvolfile-server=localhost 0 0' | sudo tee -a /etc/fstab
nimda@swnode1:~$ sudo mount.glusterfs localhost:/gfs /mnt

nimda@swnode2:~$ echo 'localhost:/gfs /mnt glusterfs defaults,_netdev,backupvolfile-server=localhost 0 0' | sudo tee -a /etc/fstab
nimda@swnode2:~$ sudo mount.glusterfs localhost:/gfs /mnt

nimda@swnode3:~$ echo 'localhost:/gfs /mnt glusterfs defaults,_netdev,backupvolfile-server=localhost 0 0' | sudo tee -a /etc/fstab
nimda@swnode3:~$ sudo mount.glusterfs localhost:/gfs /mnt
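
Containers that bind-mount /mnt will silently write to the local root disk if the Gluster mount is missing, so a guard before deploying can be useful. This is a sketch under assumptions: `is_gluster_mount` is a hypothetical helper, and it relies on the mount table listing the volume with the type `fuse.glusterfs`.

```shell
# is_gluster_mount MOUNTPOINT [MOUNTS_FILE]
# Succeeds if MOUNTPOINT is listed in MOUNTS_FILE (default: /proc/mounts)
# with the filesystem type fuse.glusterfs.
is_gluster_mount() {
    # Mount table columns: <device> <mountpoint> <fstype> <options> <dump> <pass>
    awk -v mp="$1" '
        $2 == mp && $3 == "fuse.glusterfs" { found = 1 }
        END { exit (found ? 0 : 1) }' "${2:-/proc/mounts}"
}

# Example (on each VM):
#   is_gluster_mount /mnt || echo "WARNING: /mnt is not a GlusterFS mount" >&2
```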

8. Verify

Verify that the GlusterFS volume is mounted:

nimda@swnode1:~$ df -Th
Filesystem     Type            Size  Used Avail Use% Mounted on
tmpfs          tmpfs           794M  1.3M  793M   1% /run
efivarfs       efivarfs        256K   80K  172K  32% /sys/firmware/efi/efivars
/dev/vda2      ext4             62G  8.7G   50G  15% /
tmpfs          tmpfs           3.9G     0  3.9G   0% /dev/shm
tmpfs          tmpfs           5.0M     0  5.0M   0% /run/lock
/dev/vda1      vfat            1.1G  6.2M  1.1G   1% /boot/efi
tmpfs          tmpfs           794M   12K  794M   1% /run/user/1000
/dev/vdb       xfs              64G  1.4G   63G   3% /node1/bricks/1
localhost:/gfs fuse.glusterfs   64G  2.0G   62G   4% /mnt

The volume reports 64G of total space even though three 64G disks are attached to GlusterFS. This is due to the replication factor of 3: the raw capacity is 192G, but every file is stored three times, so only 64G is usable.
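
The same arithmetic, written out for equally sized bricks (usable capacity = number of bricks × brick size ÷ replica count). The helper name `usable_gb` is hypothetical and only illustrates the calculation.

```shell
# usable_gb BRICKS BRICK_SIZE_GB REPLICA
# Prints the usable capacity in GB of a replicated volume built from equally sized bricks.
usable_gb() {
    echo $(( $1 * $2 / $3 ))
}

usable_gb 3 64 3   # 3 bricks of 64G with replica 3: prints 64
```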

Test GlusterFS replication:

nimda@swnode1:~$ echo "Hello World!" | sudo tee /mnt/test.txt

nimda@swnode2:~$ sudo cat /mnt/test.txt
Hello World!

nimda@swnode3:~$ sudo cat /mnt/test.txt
Hello World!

Part 2: Verify High Availability Swarm Cluster

We’ll use our Gluster replicated volume to achieve high availability for our stateful containerized application. We’ll test with WordPress.

All commands are executed as root.

We’ll use docker stack to set up a single WordPress container backed by a single MySQL container, and then test whether this setup is resilient to VM failure.
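
The steps below need an existing swarm, which is not shown here. The following is a minimal sketch only, under stated assumptions: Docker Engine is already installed on every VM, the manager address is swnode1’s private IP, and the default swarm port 2377 is used. None of these are confirmed above, and the helper names are hypothetical. The `run` wrapper prints commands instead of executing them unless `DRY_RUN=0` is set.

```shell
# Dry-run wrapper: by default only prints the command; set DRY_RUN=0 to execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

MANAGER_IP="10.42.1.50"   # assumption: swnode1 acts as the swarm manager

# On the manager: create the swarm and advertise its private IP to the other nodes.
init_manager() {
    run sudo docker swarm init --advertise-addr "$MANAGER_IP"
}

# On the manager: print the join token for worker nodes.
worker_token() {
    run sudo docker swarm join-token -q worker
}

# On each remaining node: join the swarm; $1 is the token printed by worker_token.
join_worker() {
    run sudo docker swarm join --token "$1" "$MANAGER_IP:2377"
}
```

Swarm can only reschedule tasks while a majority of its manager nodes is reachable, so the choice of which nodes are managers matters for any failover test.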

1. Create replicated directories for WordPress and MySQL in GlusterFS

nimda@swnode1:~$ sudo mkdir /mnt/wp-content

nimda@swnode1:~$ sudo mkdir /mnt/mysql

2. Create the wordpress-stack.yml file

This stack file exposes WordPress on port 8080 on all swarm nodes, including the swarm manager node. It mounts the directories created for wp-content and mysql as volumes in the containers.

nimda@swnode1:~$ sudo cat wordpress-stack.yml

# wordpress-stack.yml
version: '3.1'
services:
  wordpress:
    image: wordpress
    restart: always
    ports:
      - 8080:80
    environment:
      WORDPRESS_DB_HOST: db
      WORDPRESS_DB_USER: exampleuser
      WORDPRESS_DB_PASSWORD: examplepass
      WORDPRESS_DB_NAME: exampledb
    volumes:
      - "/mnt/wp-content:/var/www/html/wp-content"
    deploy:
      placement:
        constraints: [node.role == worker]
  db:
    image: mysql:5.7
    restart: always
    environment:
      MYSQL_DATABASE: exampledb
      MYSQL_USER: exampleuser
      MYSQL_PASSWORD: examplepass
      MYSQL_RANDOM_ROOT_PASSWORD: '1'
    volumes:
      - "/mnt/mysql:/var/lib/mysql"
    deploy:
      placement:
        constraints: [node.role == worker]

3. Use docker stack to deploy WordPress and MySQL

nimda@swnode1:~$ sudo docker stack deploy -c wordpress-stack.yml wordpress
Ignoring unsupported options: restart
Creating network wordpress_default
Creating service wordpress_db
Creating service wordpress_wordpress
nimda@swnode1:~$ sudo docker stack ps wordpress
ID                  NAME                    IMAGE               NODE                DESIRED STATE       CURRENT STATE           ERROR               PORTS
x5vvrt6ohko2        wordpress_db.1          mysql:5.7           swnode2             Running             Running 5 minutes ago
idree9r7qlxb        wordpress_wordpress.1   wordpress:latest    swnode1             Running             Running 5 minutes ago

Check if WordPress is up by entering http://<any-worker-external-ip>:8080/ in the browser.

Note: in my case, 10.42.1.52 was the external IP of one of my Gluster worker VMs (swnode3).

Go through the install process, choose an admin username and password, and create your first post.

4. Test High Availability by shutting down a VM

Check which VM the WordPress and MySQL containers are running on. We’ll shut down each VM in turn to see whether HA is working properly. In my case, the WordPress container was running on swnode1 and MySQL was running on swnode2.

nimda@swnode1:~$ sudo docker stack ps wordpress
ID                  NAME                    IMAGE               NODE                DESIRED STATE       CURRENT STATE            ERROR               PORTS
x5vvrt6ohko2        wordpress_db.1          mysql:5.7           swnode2             Running             Running 24 minutes ago
idree9r7qlxb        wordpress_wordpress.1   wordpress:latest    swnode1             Running             Running 24 minutes ago

Shut down swnode1 and check what happens. You’ll find that Docker Swarm starts a new container on another worker VM. The website will continue to work and your data will still be there, but you’ll have to log in again, as the session data is lost with the previous container.

nimda@swnode1:~$ sudo docker stack ps wordpress
ID                  NAME                    IMAGE               NODE                DESIRED STATE       CURRENT STATE                ERROR               PORTS
u8s93kowj2mx        wordpress_wordpress.1   wordpress:latest    swnode3             Running             Running 3 seconds ago
x5vvrt6ohko2        wordpress_db.1          mysql:5.7           swnode2             Running             Running 28 minutes ago
idree9r7qlxb        wordpress_wordpress.1   wordpress:latest    swnode1             Shutdown            Running about a minute ago

Start the swnode1 VM again and repeat the HA test with the MySQL host, swnode2. Shut down swnode2, which was running the MySQL container. After the shutdown, you’ll find that Docker Swarm has scheduled MySQL on another worker VM.

nimda@swnode1:~$ sudo docker stack ps wordpress
ID                  NAME                    IMAGE               NODE                DESIRED STATE       CURRENT STATE              ERROR               PORTS
px90rs5q22ei        wordpress_db.1          mysql:5.7           swnode1             Running             Preparing 41 seconds ago
u8s93kowj2mx        wordpress_wordpress.1   wordpress:latest    swnode3             Running             Running 6 minutes ago
x5vvrt6ohko2        wordpress_db.1          mysql:5.7           swnode2             Shutdown            Running 50 seconds ago
idree9r7qlxb        wordpress_wordpress.1   wordpress:latest    swnode1             Shutdown            Shutdown 3 minutes ago

The website will continue to work without any data loss as the MySQL container would have found the replicated volume under the same path (/mnt/mysql).
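
During these failover tests it can be handy to print only the node a service currently runs on instead of reading the full table. The sketch below assumes `docker stack ps` is asked for a two-column format (`{{.Name}} {{.Node}}`) and filtered to tasks whose desired state is running. `node_of` is a hypothetical helper name.

```shell
# node_of SERVICE: reads "<task-name> <node>" lines on stdin and prints the node
# of the first task whose name starts with SERVICE followed by a dot.
node_of() {
    awk -v svc="$1" 'index($1, svc ".") == 1 && !seen { print $2; seen = 1 }'
}

# Example (on a manager node):
#   sudo docker stack ps wordpress --filter desired-state=running \
#       --format '{{.Name}} {{.Node}}' | node_of wordpress_db
```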

Put the three worker VM IPs, with the published port, behind a load balancer (such as HAProxy or AWS ELB) and voilà: a highly available stateful deployment on Docker Swarm using GlusterFS.
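
Before wiring up a load balancer, it may be worth confirming that every node answers on the published port, since swarm’s routing mesh is expected to accept connections for a published port on each node. A rough sketch, assuming the node IPs and port used above and that curl is installed. `node_urls` and `probe_nodes` are hypothetical names.

```shell
NODE_IPS="10.42.1.50 10.42.1.51 10.42.1.52"
PORT=8080

# node_urls: prints one URL per node; these are the backends a load balancer would target.
node_urls() {
    for ip in $NODE_IPS; do
        echo "http://$ip:$PORT/"
    done
}

# probe_nodes: reports which nodes answer on the published port.
# curl flags: -f fail on HTTP errors, -s silent, -S still show errors,
# -o /dev/null discard the body, --max-time per-request timeout in seconds.
probe_nodes() {
    node_urls | while read -r url; do
        if curl -fsS -o /dev/null --max-time 5 "$url"; then
            echo "OK   $url"
        else
            echo "FAIL $url"
        fi
    done
}

# Example:
#   probe_nodes
```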