
Posted on March 21, 2013

GlusterFS in AWS

Amazon Web Services provides highly available hosting for our applications, but are the applications themselves prepared to run on more than one server?

When you design a new application, you can follow the AWS best-practice guides. An inherited application, however, either needs many modifications or has to work with shared POSIX storage as if it were local.

That’s where GlusterFS enters the game. Besides adding flexibility to storage through horizontal growth in distributed mode, it has a replicated mode, which lets you replicate a volume (or a single directory in a file system) across multiple servers.


Preliminary considerations

Before building a proof of concept with two servers in different availability zones, replicating an EBS volume with an ext4 filesystem, we will list the cases where GlusterFS should not be used:

  • Sequential files written simultaneously from multiple servers, such as logs. The locking system can lead to serious problems if you store logs within GlusterFS. The ideal solution is to store them locally and then archive them to S3. If necessary, we can consolidate logs from multiple servers before or after storing them in S3.
  • Continuously changing files, e.g. PHP session files or cache. Performance matters for this kind of file, and we cannot burden the application with GlusterFS’s replication layer. If we want to unify sessions, we must use a database (RDS, DynamoDB, SimpleDB) or memcached (ElastiCache). If we cannot modify the application to store sessions externally, we can use a local folder or shared memory (shm) and enable sticky sessions on the ELB. Ideally, caching should be done with memcached or, in its absence, in a local in-memory folder (tmpfs), so that it is transparent to the application.
  • Complex PHP applications without a cache. It is advisable to keep your code in a repository, which gives you version control and makes it easy to deploy across multiple servers. If placing code in GlusterFS is unavoidable, we need a cache such as APC or XCache to avoid performing a stat() for every included file, which would slow down the application.
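
For the tmpfs option mentioned in the list above, the cache directory can be declared in /etc/fstab like any other mount. A minimal sketch: the helper name tmpfs_fstab_line, the path /var/cache/myapp and the 64m size are all illustrative assumptions, not values from this setup:

```shell
# Hypothetical helper: print an /etc/fstab entry for a tmpfs-backed directory.
#   $1 = mount point, $2 = size limit (both illustrative)
tmpfs_fstab_line() {
    printf 'tmpfs\t%s\ttmpfs\tsize=%s,noatime\t0\t0\n' "$1" "$2"
}

# Usage as root (not run here), with an assumed cache path:
#   mkdir -p /var/cache/myapp
#   tmpfs_fstab_line /var/cache/myapp 64m >> /etc/fstab
#   mount /var/cache/myapp
```

Anything stored there is lost on reboot, which is acceptable for a cache but not for data the application cannot regenerate.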


Amazon Linux AMI includes the GlusterFS packages in its main repository, so there’s no need to add external repositories. If yum complains about the GlusterFS packages, just enable the EPEL repo. We can install the packages and start the services on each of the nodes:

yum install fuse fuse-libs glusterfs-server glusterfs-fuse nfs-utils
chkconfig glusterd on
chkconfig glusterfsd on
chkconfig rpcbind on
service glusterd start
service rpcbind start

The FUSE and NFS packages are needed to mount GlusterFS volumes; we recommend mounting over NFS for compatibility.


We prepare an ext4 partition, though any POSIX-compatible filesystem would do. In this case the partition lives on an EBS volume; we could also use ephemeral storage, bearing in mind that at least one instance must stay running for the data to be preserved. These commands must be run on each node:

mkfs.ext4 -m 1 -L gluster /dev/sdg
echo -e "LABEL=gluster\t/export\text4\tnoatime\t0\t2" >> /etc/fstab
mkdir /export
mount /export
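
Before going further it is worth confirming that /export really is a mount point; otherwise the brick would silently end up on the root filesystem. A minimal sketch, assuming a Linux system that exposes /proc/mounts; is_mounted is a hypothetical helper name:

```shell
# Hypothetical helper: succeed only if $1 is listed as a mount point
# in /proc/mounts (second field of each line).
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' /proc/mounts
}

# Usage on each node (not run here):
#   is_mounted /export || echo "/export is not mounted" >&2
```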

Now select one of the nodes to run the commands that create the GlusterFS volume. The instances should have full access to each other, with no firewall or security group restrictions between them:

gluster peer probe $SERVER2
gluster volume create webs replica 2 transport tcp $SERVER1:/export $SERVER2:/export
gluster volume start webs
gluster volume set webs auth.allow '*'
gluster volume set webs performance.cache-size 256MB

We must replace $SERVER1 and $SERVER2 with the instances’ DNS names, 1 being the local instance and 2 the remote one. We can use either the public or the internal DNS name, since Amazon returns the internal IP in either case when the name is resolved from inside EC2. If we do not work within a VPC, we don’t have fixed internal IPs, so we’ll have to use dynamic DNS or assign Elastic IPs to the instances.
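
Since every command depends on both names resolving, a small pre-flight check can save a failed probe. This is only a sketch: check_hosts is a hypothetical helper, and it assumes getent is available, as it is on glibc-based systems such as Amazon Linux:

```shell
# Hypothetical helper: fail if any given host name is empty or does not resolve.
check_hosts() {
    for name in "$@"; do
        if [ -z "$name" ]; then
            echo "empty host name" >&2
            return 1
        fi
        if ! getent hosts "$name" > /dev/null; then
            echo "cannot resolve: $name" >&2
            return 1
        fi
    done
}

# Usage on the node that creates the volume (not run here):
#   check_hosts "$SERVER1" "$SERVER2" && gluster peer probe "$SERVER2"
```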

Two non-default options were set. The first, auth.allow, allows access from all IPs, since we will restrict access with Security Groups. The second, performance.cache-size, sets how much memory is allocated to the read cache to improve performance.
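
To confirm that the peers see each other and the volume is running, we can inspect it from either node. A minimal sketch, assuming that gluster volume info prints a line reading "Status: Started" for a running volume; the helper name volume_started is ours, not part of GlusterFS:

```shell
# Hypothetical helper: reads `gluster volume info` output on stdin and
# succeeds only if it contains a "Status: Started" line (assumed format).
volume_started() {
    grep -q '^Status: Started'
}

# Usage on a node (not run here):
#   gluster peer status
#   gluster volume info webs | volume_started && echo "webs is started"
```

The same gluster volume info output should also list the options that were changed with gluster volume set.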

The volume is now created. Next we select a mount point (creating it if it doesn’t exist), mount the volume, and modify fstab if we want it mounted automatically on reboot. This must be done on both nodes:

mkdir -p /home/webs
mount -t nfs -o _netdev,noatime,vers=3 localhost:/webs /home/webs
# If we want to mount it automatically, we need to modify /etc/fstab
echo -e "localhost:/webs\t/home/webs\tnfs\t_netdev,noatime,vers=3\t0\t0" >> /etc/fstab
chkconfig netfs on
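
Appending to /etc/fstab with echo adds a duplicate entry every time the commands are re-run. A small guard makes the step repeatable; this is a sketch, and append_once is a hypothetical helper name:

```shell
# Hypothetical helper: append line $2 to file $1 only if that exact
# line is not already present.
append_once() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Usage as root (not run here); printf expands the \t separators:
#   append_once /etc/fstab \
#     "$(printf 'localhost:/webs\t/home/webs\tnfs\t_netdev,noatime,vers=3\t0\t0')"
```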

Now we can store content in /home/webs, and it will be automatically replicated to the other instance. We can force an update by running a simple ls -l on the folder to be updated, since stat() forces GlusterFS to check the health of the replica.
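
The same trick extends to a whole tree: walking the mount and calling stat on every entry asks GlusterFS to check each file in turn. A minimal sketch; stat_tree is a hypothetical helper name, and the marker file in the usage comment is illustrative:

```shell
# Hypothetical helper: stat every file and directory under $1,
# discarding the output. Only the stat() calls matter here.
stat_tree() {
    find "$1" -print0 | xargs -0 stat > /dev/null
}

# Usage (not run here):
#   on node 1:  echo hello > /home/webs/replication-test.txt
#   on node 2:  cat /home/webs/replication-test.txt
#   either:     stat_tree /home/webs
```

On a large volume this walk can take a while, so it is better run on the specific folder that needs refreshing.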