GlusterFS Replication for Clustering

I was recently looking for a way to simulate shared physical storage in a VPS environment for clustering purposes. In an enterprise data center we can usually expect a SAN to provide shared physical storage, and in that case GFS is a simple way to create a shared file system that can be used as a cluster resource. GlusterFS gives multiple nodes the same kind of functionality when there is no way to give them access to the same physical storage.

The Gluster community site, http://www.gluster.org, is a great resource for anyone wanting to implement the file system.

For the remainder of this post I will be referring to an environment consisting of two CentOS VPS nodes.

Preparing Ext3 File System for Sharing

Gluster will not share raw devices; instead it uses an already mounted file system. I will assume a dedicated ext3 file system mounted at /replicator. If you can't provide a separate storage device for this purpose, you can just use a directory on the root file system for testing.
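
Before going further, it can help to confirm what /replicator actually is on each node. The following is a minimal sketch, not part of GlusterFS: brick_status is a hypothetical helper name, and it assumes a POSIX df and awk are available. It reports whether a directory is missing, is the mount point of its own file system, or is just a plain directory on another file system (fine for testing).

```shell
# Report the state of a directory intended as GlusterFS backing storage:
#   missing          the directory does not exist
#   mounted          the directory is the mount point of its own file system
#   plain-directory  the directory lives on some other file system
# Note: mount points containing spaces are not handled by this sketch.
brick_status() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing"
        return
    fi
    # Resolve symlinks so the path can be compared with df's mount point.
    real=$(cd "$dir" && pwd -P)
    # In "df -P" output, the sixth column of the second line is the mount point.
    mount_point=$(df -P "$real" | awk 'NR == 2 { print $6 }')
    if [ "$real" = "$mount_point" ]; then
        echo "mounted"
    else
        echo "plain-directory"
    fi
}
```

Running brick_status /replicator on each node should print mounted for a dedicated file system, or plain-directory for the testing shortcut.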

Installing GlusterFS Server and Client

The following commands need to be executed on each node to grab and install the necessary RPMs.
wget -r -l 1 http://ftp.gluster.com/pub/gluster/glusterfs/3.0/3.0.0/CentOS/
cd ftp.gluster.com/pub/gluster/glusterfs/3.0/3.0.0/CentOS/
rpm -Uvh glusterfs-*-3.0.0-1.x86_64.rpm

Run the following on either node to generate the necessary configuration files in the current working directory. This creates a client configuration file named along the lines of replicator-tcp.vol, plus one server configuration file per node, each with a name beginning with that node's hostname.

glusterfs-volgen --name replicator --raid 1 node1:/replicator node2:/replicator

Copy the client file to /etc/glusterfs/glusterfs.vol on each node, and copy each node's own server file to /etc/glusterfs/glusterfsd.vol on that node.
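
The copy step can be sketched as a small dry-run helper. This is an illustration under assumptions: print_copy_commands is a hypothetical name, the client file is passed in by name (replicator-tcp.vol in this post), and each server file is assumed to match the pattern <hostname>-*.vol in the current directory, with exactly one match per node. It only prints scp commands, so nothing is copied until the output has been reviewed.

```shell
# Print, without running, the scp commands that put the generated volume
# files in place on each node.
#   $1      client volume file (the same file goes to every node)
#   $2...   node hostnames
# Assumption: each node's server file matches "<hostname>-*.vol".
print_copy_commands() {
    client_vol="$1"
    shift
    for node in "$@"; do
        echo "scp $client_vol $node:/etc/glusterfs/glusterfs.vol"
        echo "scp $node-*.vol $node:/etc/glusterfs/glusterfsd.vol"
    done
}
```

For the two nodes in this post, that would be print_copy_commands replicator-tcp.vol node1 node2. Once the printed commands look right, piping the output to sh runs them.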

Mounting GlusterFS Volumes

The simplest way to configure mounting of the volume is via /etc/fstab. Place the following line in fstab on each node.
/etc/glusterfs/glusterfs.vol /data glusterfs defaults 0 0
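
When the same setup is repeated on several nodes, it is easy to add the line twice. A minimal sketch of an idempotent append follows; ensure_line is a hypothetical helper, not a GlusterFS or system tool, and it relies only on grep's standard -q, -x, and -F options.

```shell
# Append a line to a file only if that exact line is not already present.
#   $1  file to edit (for example /etc/fstab)
#   $2  line to ensure
ensure_line() {
    file="$1"
    line="$2"
    # -F: fixed string, -x: match the whole line, -q: produce no output
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}
```

As root on each node, this could be called as ensure_line /etc/fstab '/etc/glusterfs/glusterfs.vol /data glusterfs defaults 0 0'. Running it a second time leaves fstab unchanged.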

With the fstab entry in place, the shared volume mounts at /data. Try writing a file on one node and watch it appear on the other!

cd /data
dd if=/dev/zero of=/data/test bs=1M count=32

High Availability Implications

At this point I am still vetting Gluster's reliability as an HA solution. It does keep data intact during planned maintenance: if we properly stop the client and server on one node, changes can continue to occur on the other. We can also join a node to active shared storage, and synchronization is automatic.

The real test is whether Gluster will hold up in the not-so-routine situations. Some crude tests that involve yanking network connectivity from a node while it is replicating changes seem to cause some issues. For example, if I start the dd operation above on node1 and kill the connection to node2, one way or another, before it finishes, node1 still completes the operation fine. When I reattach node2, even the active mount on /data seems to synchronize with node1 just fine. Where differences start to appear is in the /replicator directory on node2: it seems to get out of sync, and neither client pays attention to that server any longer.
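
One way to see this kind of drift is to compare checksums of the backing directory on each node. The sketch below is an illustration, not a GlusterFS tool: tree_checksums is a hypothetical helper, and it assumes the replicated files appear as regular files under /replicator at the same relative paths as under /data. It prints a sorted checksum manifest that can be saved on each node and compared with diff.

```shell
# Print "checksum size path" for every regular file under a directory,
# sorted by path, so manifests from different nodes can be diffed.
#   $1  directory to scan (for example /replicator)
tree_checksums() {
    ( cd "$1" && find . -type f -exec cksum {} + | sort -k 3 )
}
```

For example, run tree_checksums /replicator > /tmp/replicator.sums on node1 and on node2, copy one of the files across, and diff the two. Any output from diff points at files that differ between the two servers.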

If Gluster can hold up under software and hardware failures without data corruption, it can certainly be used as shared storage for clustering. I'll continue to explore these options and report back later.