
by on June 30, 2014

Windows (CIFS) fileshares using GlusterFS and CTDB for Highly available data

This tutorial will walk through the setup and configuration of GlusterFS and CTDB to provide highly available file storage via CIFS. GlusterFS is used to replicate data between multiple servers. CTDB provides highly available CIFS/Samba functionality.


You will need 2 servers (virtual or physical) running RHEL 6 or a derivative (CentOS, Scientific Linux). When installing, create a root partition of around 16 GB and leave a large amount of disk space available for the shared data. You can add the data partition in the installer, but ensure the filesystem type is XFS and that the mountpoint is /gluster/bricks/data1. Once you have an installed system, ensure networking is configured and running. In this example the two servers will be:

server1 = storenode1 –

server2 = storenode2 –

Let’s add host entries (unless you have DNS available, in which case add an entry for both hosts there instead):

echo " storenode1" >> /etc/hosts

echo " storenode2" >> /etc/hosts
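If you script this step, it helps to guard against duplicate entries when the command is run more than once. The following is a minimal sketch: add_host_entry is a hypothetical helper, and the 192.0.2.x addresses in the usage comments are placeholders, not the addresses used in this tutorial.

```shell
# Append "IP hostname" to a hosts file only if the hostname is not already listed.
# add_host_entry is a hypothetical helper; the third argument defaults to /etc/hosts.
add_host_entry() {
    ip="$1"
    name="$2"
    file="${3:-/etc/hosts}"
    # -F: fixed string, -w: whole-word match, -q: no output
    if ! grep -qwF -- "$name" "$file" 2>/dev/null; then
        echo "$ip $name" >> "$file"
    fi
}

# Example with placeholder addresses (substitute your own):
# add_host_entry 192.0.2.11 storenode1
# add_host_entry 192.0.2.12 storenode2
```

Running the helper twice with the same hostname leaves a single line in the file.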

Next make sure both of your systems are completely up to date:

yum -y update

Reboot if there are any kernel updates.

Filesystem layout

Now that we have 2 fully updated, working installs, it’s time to start laying out the filesystem. In this instance we will have a partition dedicated to the underlying Gluster volume.

If you didn’t add a partition for /gluster/bricks/data1 during the install do this now:

Create a partition on the disk with fdisk (/dev/sda3 in this example, yours may differ), then format it as XFS:

fdisk /dev/sda

mkfs.xfs /dev/sda3

If mkfs.xfs isn’t installed, yum install xfsprogs will add it to your system. If you are running Red Hat you will need to subscribe to the Scalable File System channel to get this package.

Create the directory where this partition will be mounted, then mount it:

mkdir /gluster/bricks/data1 -p

mount /dev/sda3 /gluster/bricks/data1

If the mount command worked correctly, let’s add it to our fstab so it mounts at boot time.

echo "/dev/sda3 /gluster/bricks/data1 xfs defaults 0 0" >> /etc/fstab

You need to repeat the above steps to partition and mount the volume on server 2.

Introducing Gluster to the equation

Now that we have a couple of working filesystems we are ready to bring Gluster into the mix. We are going to use /gluster/bricks/data1 as the location to store the brick for our Gluster volume. A Gluster volume is made up of one or more bricks. Each brick is essentially a directory on a server, and the bricks are grouped together to provide a storage array similar to RAID.

In our configuration we will have 2 servers, each with a directory used as a brick, to create a replicated Gluster volume. For simplicity I have disabled both SELinux and iptables for this build. It’s fairly straightforward to get both working correctly with Gluster, and I may revisit this at some point to add that configuration, but for now I’m taking the stance that these servers are tucked away safely inside your network behind at least one firewall.

Let’s install Gluster. On both servers run the following:

cd /etc/yum.repos.d/


yum install glusterfs-server -y

chkconfig glusterd on

service glusterd start

Woohoo, we have Gluster up and running, oh wait it’s not doing anything…

Let’s get both servers talking to each other. On the first server run:

gluster peer probe storenode2


We now need a directory which we will use for the brick in our Gluster volume, run this command on both servers:

mkdir -p /gluster/bricks/data1/brick1

Everything should now be prepared for the volume to be created. Run the following command on storenode1:

gluster vol create data1 replica 2 storenode1:/gluster/bricks/data1/brick1 storenode2:/gluster/bricks/data1/brick1


This will create a Gluster volume named data1 that keeps 2 replicas of the data, one on each of the bricks listed at the end of the command.

If this command returns ok we should be good to start the volume:

gluster vol start data1


We can check the status of the volume:

gluster vol info data1


Looks good!
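If you would rather script the check than read the output by eye, something like the sketch below would do. It assumes the info output contains a line beginning with "Status: Started" once the volume is running; volume_is_started is a hypothetical helper name, not a Gluster command.

```shell
# Succeeds if the text on stdin reports the volume as started.
# volume_is_started is a hypothetical helper; it only inspects the "Status:" line.
volume_is_started() {
    grep -q '^Status: Started'
}

# Usage on a storage node:
# gluster vol info data1 | volume_is_started && echo "data1 is started"
```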


In order to start using the volume we have just created, it needs to be mounted on our systems. Let’s create a directory on both servers where we will mount the volume:

mkdir /data/data1 -p

We need to ensure the GlusterFS client tools are installed (they should have been installed during the initial Gluster install, but it’s worth checking):

yum -y install glusterfs-fuse

Now let’s mount the volume:

mount -t glusterfs storenode1:data1 /data/data1
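One way to confirm the mount before going any further is to look for it in /proc/mounts. This is a small sketch, assuming the FUSE client reports its filesystem type as fuse.glusterfs; is_gluster_mounted is a hypothetical helper, and the second argument exists only so the function can be pointed at a different mounts file.

```shell
# Succeeds if the given mount point is listed as a GlusterFS FUSE mount.
# is_gluster_mounted is a hypothetical helper; the mounts file defaults to /proc/mounts.
is_gluster_mounted() {
    mountpoint="$1"
    mounts_file="${2:-/proc/mounts}"
    # Field 2 is the mount point, field 3 is the filesystem type
    awk -v mp="$mountpoint" \
        '$2 == mp && $3 == "fuse.glusterfs" { found = 1 } END { exit !found }' \
        "$mounts_file"
}

# Usage:
# is_gluster_mounted /data/data1 && echo "Gluster volume is mounted"
```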

If that goes well we can add the mount statement to fstab:

echo "storenode1:data1 /data/data1 glusterfs defaults 0 0" >> /etc/fstab

Then repeat on storenode2:

mount -t glusterfs storenode2:data1 /data/data1

echo "storenode2:data1 /data/data1 glusterfs defaults 0 0" >> /etc/fstab

We now have a persistent mount for our Gluster volume, and each server mounts its own presentation of the volume. Notice that the mount source looks similar to NFS but is slightly different: the format is hostname:volumename.
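To make the hostname:volumename format concrete, here is a sketch that builds the fstab line from its parts. gluster_fstab_line is a hypothetical helper, and the _netdev option is my own assumption rather than part of the setup above; it is commonly added so that the mount waits for the network at boot.

```shell
# Print an fstab line for a GlusterFS FUSE mount in hostname:volumename form.
# gluster_fstab_line is a hypothetical helper; _netdev is an assumed extra option.
gluster_fstab_line() {
    host="$1"
    volume="$2"
    mountpoint="$3"
    printf '%s:%s %s glusterfs defaults,_netdev 0 0\n' "$host" "$volume" "$mountpoint"
}

# Usage on storenode1:
# gluster_fstab_line storenode1 data1 /data/data1 >> /etc/fstab
```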

We can test the Gluster side of things now by creating a file on one server and checking that it exists on the other:

[root@storenode1 ~]# echo "hello world" >> /data/data1/test

[root@storenode2 ~]# cat /data/data1/test


If you see the text “hello world” in the output then the Gluster setup is complete!

CTDB and Samba

All the above is good and well, but we need to present this storage to an end user don’t we?

The traditional way to present storage as a file share is using Samba. However, as we are using multiple servers, we want to make use of all of them. This method uses traditional Samba config files with an extra layer on top, CTDB. CTDB will present the storage via CIFS and also create a VIP (virtual IP) which “hovers” over the servers configured within it.

Let’s get the packages installed first:

yum -y install ctdb samba samba-common samba-winbind-clients

(A Resilient Storage subscription is needed on RHEL.)

On both nodes, back up the default config, just in case:

mv /etc/sysconfig/ctdb{,.old}

CTDB requires a shared area in which to create a lock, and we also need a directory to share.

On either node:

mkdir /data/data1/lock

mkdir /data/data1/share

In your favourite editor (in my case Vim), open /data/data1/lock/ctdb and add the following:

vi /data/data1/lock/ctdb

#CIFS only
#CIFS only
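As a rough guide, a CIFS-only CTDB config for this layout usually sets four things: a recovery lock on the shared storage, the public addresses file, Samba management, and the nodes file. The sketch below is an assumption based on the paths used in this tutorial, not the exact file from this setup; in particular the lock file name "lockfile" is a placeholder.

```shell
# Sketch of /data/data1/lock/ctdb (values are assumptions, adjust to your setup)
# Recovery lock must live on the shared Gluster mount; "lockfile" is a placeholder name
CTDB_RECOVERY_LOCK=/data/data1/lock/lockfile
#CIFS only
CTDB_PUBLIC_ADDRESSES=/etc/ctdb/public_addresses
# Let CTDB start and stop Samba itself
CTDB_MANAGES_SAMBA=yes
#CIFS only
CTDB_NODES=/etc/ctdb/nodes
```

The file is read as shell variable assignments, so there must be no spaces around the equals signs.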

The file we have just created replaces the config we backed up earlier, but it will exist as a symlink (this avoids keeping multiple identical copies of the config). On both hosts:

ln -s /data/data1/lock/ctdb /etc/sysconfig/ctdb

Next we need to ensure the Samba service won’t start on boot, and that the CTDB service will. On both nodes:

service smb stop

chkconfig smb off

chkconfig ctdb on

The /etc/ctdb/public_addresses file contains the list of IP addresses which will be used as VIPs. You can use as many as you like here; some configurations combine multiple VIPs with round-robin DNS for true load-balanced scenarios. For our simple config we will just use the next IP. Note that we are again creating the file on our shared storage, to ensure that we have the same config on both boxes; it will be linked into place later:

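vi /data/data1/lock/public_addresses

Each line of this file takes the form address/netmask followed by the interface that should carry it, eth0 in this example.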

Now we need to create /etc/ctdb/nodes, which contains the IP addresses of all servers which will present the storage. Again, this will be a shared file that is linked into place:

vi /data/data1/lock/nodes

Let’s link those two files. On both nodes:

ln -s /data/data1/lock/nodes /etc/ctdb/nodes

ln -s /data/data1/lock/public_addresses /etc/ctdb/public_addresses
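For reference, the two files have a very simple layout: nodes holds one server address per line, and public_addresses holds one VIP per line in address/prefix form followed by the interface. The sketch below writes both; write_ctdb_lists is a hypothetical helper and the 192.0.2.x addresses in the usage comment are placeholders, so substitute the real addresses of your servers and your chosen VIP.

```shell
# Write the CTDB nodes and public_addresses files into a directory.
# write_ctdb_lists is a hypothetical helper.
# Arguments: target dir, node 1 IP, node 2 IP, VIP as address/prefix, interface.
write_ctdb_lists() {
    dir="$1"
    node1="$2"
    node2="$3"
    vip="$4"
    iface="$5"
    # nodes: one address per line, in the same order on every node
    printf '%s\n%s\n' "$node1" "$node2" > "$dir/nodes"
    # public_addresses: address/prefix followed by the interface name
    printf '%s %s\n' "$vip" "$iface" > "$dir/public_addresses"
}

# Example with placeholder addresses:
# write_ctdb_lists /data/data1/lock 192.0.2.11 192.0.2.12 192.0.2.20/24 eth0
```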

The only thing we have left to do now is to modify the Samba config file. There are 2 sections we are interested in. The first is the general config section, where we need to enable clustering and point it to the lock directory. Samba (or CTDB in this case) has some strange side effects if smb.conf itself lives on shared storage; however, the shared storage can be used to edit the file before copying it into place:

On storage node 1:

cp /etc/samba/smb.conf /data/data1/lock/smb.conf

vi /data/data1/lock/smb.conf

And add in the general section near the top:

clustering = yes

idmap backend = tdb2

private dir = /data/data1/lock

The second component is to create the share itself:

[share]
comment = Gluster and CTDB based share
path = /data/data1/share
read only = no
guest ok = yes
valid users = jon

Once we are happy with the edit, the file can be copied to the correct location, on both hosts:

cp /data/data1/lock/smb.conf /etc/samba/

We need to ensure the user jon exists on both servers:

useradd jon

smbpasswd -a jon

and type a password.

Configuration is now done, all that is left to do is start the service, on both nodes:

service ctdb start

If the service starts successfully then, after a short while, the share becomes available. Monitor its status using:

ctdb status


Once both nodes get OK, we’re good to go. The share will now be accessible from a Windows PC (or anything that can access SMB/CIFS) using \\\share
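If you want to wait for that state in a script rather than watching the output, a sketch along these lines could work. It assumes the status output has one line per node of the form "pnn:<number> <address> <state>"; all_nodes_ok is a hypothetical helper, not a CTDB command.

```shell
# Succeeds only if every "pnn:" line read from stdin reports the node as OK.
# all_nodes_ok is a hypothetical helper; the line format is an assumption.
all_nodes_ok() {
    awk '/^pnn:/ { nodes++; if ($3 != "OK") bad++ }
         END { exit !(nodes > 0 && bad == 0) }'
}

# Usage on either node:
# ctdb status | all_nodes_ok && echo "all nodes OK"
```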


If either storage server becomes unavailable the share will still exist.


We now have a resilient, highly available CIFS file server.

The post Windows (CIFS) fileshares using GlusterFS and CTDB for Highly available data appeared first on Jon Archer.


by on March 21, 2013

GlusterFS in AWS

Amazon Web Services provides highly available hosting for our applications, but are they prepared to run on more than one server?

When you design a new application you can follow the AWS best-practice guides, but an inherited application either requires many modifications or needs POSIX shared storage that it can work with as if it were local.

That’s where GlusterFS enters the game. Besides adding flexibility to storage with horizontal growth opportunities in distributed mode, it has a replicated mode, which lets you replicate a volume (or a single folder in a file system) across multiple servers.


Preliminary considerations

Before building a proof of concept with two servers in different availability zones, replicating an EBS volume with an ext4 filesystem, we will list the cases where GlusterFS should not be used:

  • Sequential files written simultaneously from multiple servers, such as logs. The locking system can lead to serious problems if you store logs within GlusterFS. The ideal solution is to store them locally and then use S3 to archive them. If necessary we can consolidate multiple server logs before or after storing them in S3.
  • Continuously changing files, e.g. PHP session files or cache. For this kind of file performance matters, so we cannot burden the application with GlusterFS’ replication layer. If we want to unify sessions we must use a database (RDS, DynamoDB, SimpleDB) or memcached (ElastiCache). If we cannot modify the application to store sessions externally, we can use a local folder or shared memory (shm) and enable sticky sessions on ELB. Ideally, caching should be done using memcached or, in its absence, a local in-memory folder (tmpfs), so that it’s transparent to the application.
  • Complex PHP applications without a cache. It’s advisable to store your code in repositories, so that you have version control and can deploy across multiple servers easily. If it’s unavoidable to place code in GlusterFS, we need to use a cache like APC or XCache to avoid performing a stat() for each file include, which would slow down the application.


The Amazon Linux AMI includes GlusterFS packages in the main repository, so there’s no need to add external repositories. If yum complains about the GlusterFS packages, just enable the EPEL repo. We can install the packages and start the services on each of the nodes:

yum install fuse fuse-libs glusterfs-server glusterfs-fuse nfs-utils
chkconfig glusterd on
chkconfig glusterfsd on
chkconfig rpcbind on
service glusterd start
service rpcbind start

The fuse and NFS packages are needed to mount GlusterFS volumes; we recommend using NFS mode for compatibility.


We prepare an ext4 partition, though we could use any POSIX-compatible filesystem. In this case the partition is on an EBS volume; we could also use ephemeral storage, bearing in mind that we need to keep at least one instance running to keep the data consistent. These commands must be run on each node:

mkfs.ext4 -m 1 -L gluster /dev/sdg
echo -e "LABEL=gluster\t/export\text4\tnoatime\t0\t2" >> /etc/fstab
mkdir /export
mount /export

Now select one of the nodes to execute the commands that create the GlusterFS volume. The instances should have full access to each other, with no firewall or security group limitations:

gluster peer probe $SERVER2
gluster volume create webs replica 2 transport tcp $SERVER1:/export $SERVER2:/export
gluster volume start webs
gluster volume set webs auth.allow '*'
gluster volume set webs performance.cache-size 256MB

We must replace $SERVER1 and $SERVER2 with the instances’ DNS names, 1 being the local instance and 2 the remote one. We can use either the public or the internal DNS name, since Amazon returns the internal IP in either case. If we do not work with VPC then we don’t have fixed internal IPs, so we’ll have to use dynamic DNS or assign Elastic IPs to the instances.

Two non-default options were set. The first is auth.allow, which allows access from all IPs, since we will restrict access with Security Groups. The second is performance.cache-size, which lets us allocate memory to the cache to improve performance.

The volume is now created. Next we have to select a mount point (or create it if it doesn’t exist), mount the volume, and modify fstab if we want it mounted automatically on reboot. This must be done on both nodes:

mkdir -p /home/webs
mount -t nfs -o _netdev,noatime,vers=3 localhost:/webs /home/webs
# If we want to mount it automatically, we need to modify /etc/fstab
echo -e "localhost:/webs\t/home/webs\tnfs\t_netdev,noatime,vers=3\t0\t0" >> /etc/fstab
chkconfig netfs on

Now we can store content in /home/webs and it will be automatically replicated to the other instance. We can force an update by running a simple ls -l on the folder to be updated, since stat() forces GlusterFS to check the health of the replica.
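The same idea can be extended from a single folder to a whole tree by walking it and calling stat on every entry. This is only a sketch, assuming find and xargs with null-separated support are available; stat_all is a hypothetical helper name.

```shell
# Walk a directory tree and stat() every entry, discarding the output.
# stat_all is a hypothetical helper; the directory is passed as the first argument.
stat_all() {
    # -print0 / -0 keep file names with spaces or newlines intact
    find "$1" -print0 | xargs -0 stat > /dev/null
}

# Usage on either node:
# stat_all /home/webs
```

On a large tree this touches every file, so it is better run off-peak.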


by on January 6, 2013

Distributed Replicated Storage Across Four Storage Nodes With GlusterFS 3.2.x On CentOS 6.3


This tutorial shows how to combine four single storage servers (running CentOS 6.3) into distributed replicated storage with GlusterFS. Nodes 1 and 2 (replication1) as well as 3 and 4 (replication2) will mirror each other, and replication1 and replication2 will be combined into one larger storage server (distribution). Basically, this is RAID10 over the network.
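The pairing of nodes comes from the order of the bricks on the command line: with replica 2, each consecutive pair of bricks forms one mirror, and the pairs are then distributed across. The sketch below only prints the command so the ordering can be checked first; print_volume_create is a hypothetical helper, and the volume name, brick path and hostnames in the usage comment are placeholders.

```shell
# Print (do not run) a volume-create command for a distributed-replicated layout.
# With "replica 2", consecutive bricks form a mirror pair, so argument order matters.
# print_volume_create is a hypothetical helper.
# Arguments: volume name, brick path, then the hosts in pairing order.
print_volume_create() {
    volume="$1"
    brick_path="$2"
    shift 2
    bricks=""
    for host in "$@"; do
        bricks="$bricks $host:$brick_path"
    done
    echo "gluster volume create $volume replica 2 transport tcp$bricks"
}

# Example with placeholder names (hosts 1+2 mirror each other, as do 3+4):
# print_volume_create testvol /data server1 server2 server3 server4
```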

If you lose one server from replication1 and one from replication2,
the distributed volume continues to work. The client system (CentOS 6.3
as well) will be able to access the storage as if it was a local filesystem.

GlusterFS is a clustered file-system capable of scaling to several
petabytes. It aggregates various storage bricks over InfiniBand RDMA or
TCP/IP interconnect into one large parallel network file system.
Storage bricks can be made of any commodity hardware such as x86_64
servers with SATA-II RAID and InfiniBand HBA.