
Linux scale out NFSv4 using NFS-Ganesha and GlusterFS — one step at a time

Gluster
2015-10-12

NFS-Ganesha 2.3 is rapidly winding down to release and it has a bunch of new things in it that make it fairly compelling. A lot of people are also starting to use Red Hat Gluster Storage with the NFS-Ganesha NFS server that is part of that package. Setting up a highly available NFS-Ganesha system using GlusterFS is not exactly trivial. This blog post will “eat the elephant” one bite at a time.

Some people might wonder why use NFS-Ganesha — a user space NFS server — when kernel NFS (knfs) already supports NFSv4? The answer is simple really. NFSv4 in the kernel doesn’t scale. It doesn’t scale out, and it’s a single point of failure. This blog post will show how to set up a resilient, highly available system with no single point of failure.

Crawl

Let’s start small and simple. We’ll set up a single NFS-Ganesha server on CentOS 7, serving a single disk volume.

Start by setting up a CentOS 7 machine. You may want to create a separate volume for the NFS export; we’ll leave that as an exercise for the reader. Do not install or start the kernel NFS server.

1. Install EPEL, NFS-Ganesha, and GlusterFS. Use the yum repos on download.gluster.org; the repo files are nfs-ganesha.repo and glusterfs-epel.repo. Copy them to /etc/yum.repos.d.
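
For example, the repo files can be fetched straight into /etc/yum.repos.d with curl; substitute the actual URLs from download.gluster.org:

    % curl -o /etc/yum.repos.d/nfs-ganesha.repo <URL of nfs-ganesha.repo on download.gluster.org>
    % curl -o /etc/yum.repos.d/glusterfs-epel.repo <URL of glusterfs-epel.repo on download.gluster.org>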

    % yum -y install epel-release
    % yum -y install glusterfs-server glusterfs-fuse glusterfs-cli glusterfs-ganesha
    % yum -y install nfs-ganesha-xfs

2. Create a directory to mount the export volume, make a file system on the export volume, and finally mount it:

    % mkdir -p /bricks/demo
    % mkfs.xfs /dev/sdb
    % mount /dev/sdb /bricks/demo
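
If you want the brick file system mounted again after a reboot, an /etc/fstab entry along these lines (assuming the device really is /dev/sdb, as above) takes care of it:

    % echo '/dev/sdb /bricks/demo xfs defaults 0 0' >> /etc/fstab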

3. Gluster recommends not creating volumes on the root directory of the brick. If something goes wrong it’s easier to rm -rf the directory than it is to try to clean it or remake the file system. Create a couple of subdirectories on the brick:

    % mkdir /bricks/demo/vol
    % mkdir /bricks/demo/scratch

4. Edit the Ganesha config file at /etc/ganesha/ganesha.conf. Here’s what mine looks like:

EXPORT
{
	# Export Id (mandatory, each EXPORT must have a unique Export_Id)
	Export_Id = 1;

	# Exported path (mandatory)
	Path = /bricks/demo/scratch;

	# Pseudo Path (required for NFS v4)
	Pseudo = /scratch;

	# Required for access (default is None)
	# Could use CLIENT blocks instead
	Access_Type = RW;

	# Exporting FSAL
	FSAL {
		Name = XFS;
	}
}

5. Start ganesha:

    % systemctl start nfs-ganesha
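
You can confirm the daemon came up cleanly before mounting anything:

    % systemctl status nfs-ganesha
    % journalctl -u nfs-ganesha --no-pager | tail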

6. Wait one minute for NFS grace to end, then mount the volume:


    % mount localhost:/scratch /mnt
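
A quick smoke test: write a file over NFS and look for it in the underlying export directory:

    % touch /mnt/hello
    % ls /bricks/demo/scratch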

Walk

7. Now we’ll create a simple gluster volume and use NFS-Ganesha to serve it. We also need to disable gluster’s built-in NFS server (gnfs).


    % gluster volume create simple $(hostname):/bricks/demo/simple
    % gluster volume set simple nfs.disable on
    % gluster volume start simple
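
It’s worth confirming that the volume is started and that gnfs really is disabled before pointing Ganesha at it:

    % gluster volume info simple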

8. Edit the Ganesha config file at /etc/ganesha/ganesha.conf. Here’s what mine looks like:

EXPORT
{
	# Export Id (mandatory, each EXPORT must have a unique Export_Id)
	Export_Id = 1;

	# Exported path (mandatory)
	Path = /simple;

	# Pseudo Path (required for NFS v4)
	Pseudo = /simple;

	# Required for access (default is None)
	# Could use CLIENT blocks instead
	Access_Type = RW;

	# Exporting FSAL
	FSAL {
		Name = GLUSTER;
		Hostname = localhost;
		Volume = simple;
	}
}

9. Restart ganesha:


    % systemctl stop nfs-ganesha
    % systemctl start nfs-ganesha
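
Before mounting, you can check that Ganesha picked up the new export. showmount talks to the NFSv3 mount protocol, which Ganesha serves by default:

    % showmount -e localhost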

10. Wait one minute for NFS grace to end, then mount the volume:


    % mount localhost:/simple /mnt

Copy a file to the NFS volume. You’ll see it on the gluster brick in /bricks/demo/simple.
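
For example, copy a file in over NFS and then list the brick directory directly:

    % cp /etc/hosts /mnt/
    % ls /bricks/demo/simple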

Run

Now for the part you’ve been waiting for. For this we’ll start from scratch. This will be a four node cluster: node0, node1, node2, and node3.

1. Tear down anything left over from the above.

2. Ensure that all nodes are resolvable either in DNS or /etc/hosts:


    node0% cat /etc/hosts
    127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

    172.16.3.130 node0
    172.16.3.131 node1
    172.16.3.132 node2
    172.16.3.133 node3

    172.16.3.140 node0v
    172.16.3.141 node1v
    172.16.3.142 node2v
    172.16.3.143 node3v

3. Set up passwordless ssh among the four nodes. On node0 create a keypair and deploy it to all the nodes:


    node0% ssh-keygen -f /var/lib/glusterd/nfs/secret.pem
    node0% ssh-copy-id -i /var/lib/glusterd/nfs/secret.pem.pub root@node0
    node0% ssh-copy-id -i /var/lib/glusterd/nfs/secret.pem.pub root@node1
    node0% ssh-copy-id -i /var/lib/glusterd/nfs/secret.pem.pub root@node2
    node0% ssh-copy-id -i /var/lib/glusterd/nfs/secret.pem.pub root@node3
    node0% scp /var/lib/glusterd/nfs/secret.* node1:/var/lib/glusterd/nfs/
    node0% scp /var/lib/glusterd/nfs/secret.* node2:/var/lib/glusterd/nfs/
    node0% scp /var/lib/glusterd/nfs/secret.* node3:/var/lib/glusterd/nfs/

You can confirm that it works with:

    node0% ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/nfs/secret.pem root@node1

4. Start glusterd on all nodes:

    node0% systemctl enable glusterd && systemctl start glusterd
    node1% systemctl enable glusterd && systemctl start glusterd
    node2% systemctl enable glusterd && systemctl start glusterd
    node3% systemctl enable glusterd && systemctl start glusterd

5. From node0, peer probe the other nodes:

    node0% gluster peer probe node1
    peer probe: success
    node0% gluster peer probe node2
    peer probe: success
    node0% gluster peer probe node3
    peer probe: success

You can confirm their status with:

    node0% gluster peer status
    Number of Peers: 3

    Hostname: node1
    Uuid: ca8e1489-0f1b-4814-964d-563e67eded24
    State: Peer in Cluster (Connected)

    Hostname: node2
    Uuid: 37ea06ff-53c2-42eb-aff5-a1afb7a6bb59
    State: Peer in Cluster (Connected)

    Hostname: node3
    Uuid: e1fb733f-8e4e-40e4-8933-e215a183866f
    State: Peer in Cluster (Connected)

6. Create the /etc/ganesha/ganesha-ha.conf file on node0. Here’s what mine looks like:

# Name of the HA cluster created.
# must be unique within the subnet
HA_NAME="demo-cluster"
#
# The gluster server from which to mount the shared data volume.
HA_VOL_SERVER="node0"
#
# You may use short names or long names; you may not use IP addresses.
# Once you select one, stay with it, as it will be mildly unpleasant to
# clean up if you switch later on. Ensure that all names - short and/or
# long - are in DNS or /etc/hosts on all machines in the cluster.
#
# The subset of nodes of the Gluster Trusted Pool that forms the ganesha
# HA cluster. Hostnames are specified, not IP addresses.
HA_CLUSTER_NODES="node0,node1,node2,node3"
#
# Virtual IPs for each of the nodes specified above.
VIP_node0="172.16.3.140"
VIP_node1="172.16.3.141"
VIP_node2="172.16.3.142"
VIP_node3="172.16.3.143"

7. Enable the Gluster shared storage volume:

    node0% gluster volume set all cluster.enable-shared-storage enable

Wait a few moments for it to be mounted everywhere. You can check that it’s mounted at /run/gluster/shared_storage (or /var/run/gluster/shared_storage) on all the nodes.
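
Something like this on each node should show the mount:

    node0% grep shared_storage /proc/mounts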

8. Enable and start the Pacemaker pcsd on all nodes:

    node0% systemctl enable pcsd && systemctl start pcsd
    node1% systemctl enable pcsd && systemctl start pcsd
    node2% systemctl enable pcsd && systemctl start pcsd
    node3% systemctl enable pcsd && systemctl start pcsd

9. Set a password for the user ‘hacluster’ on all nodes. Use the same password for all nodes:

    node0% echo demopass | passwd --stdin hacluster
    node1% echo demopass | passwd --stdin hacluster
    node2% echo demopass | passwd --stdin hacluster
    node3% echo demopass | passwd --stdin hacluster

10. Perform cluster auth between the nodes. The username is ‘hacluster’ and the password is the one you set in step 9:

    node0% pcs cluster auth node0
    node0% pcs cluster auth node1
    node0% pcs cluster auth node2
    node0% pcs cluster auth node3
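
pcs prompts for the username and password on each of these commands. The pcs version shipped with CentOS 7 also lets you pass them on the command line and authenticate all four nodes in one go, which might look like this:

    node0% pcs cluster auth node0 node1 node2 node3 -u hacluster -p demopass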

11. Create the Gluster volume to export. We’ll create a 2×2 distribute-replicate volume. Start the volume:

    node0% gluster volume create cluster-demo replica 2 node0:/home/bricks/demo node1:/home/bricks/demo node2:/home/bricks/demo node3:/home/bricks/demo
    node0% gluster volume start cluster-demo
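
gluster volume info should report the volume as Distributed-Replicate with "Number of Bricks: 2 x 2 = 4":

    node0% gluster volume info cluster-demo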

12. Enable ganesha, i.e. start the ganesha.nfsd:

    node0% gluster nfs-ganesha enable

13. Export the volume:

    node0% gluster vol set cluster-demo ganesha.enable on

14. And finally mount the NFS volume from a client using one of the virtual IP addresses:

    nfs-client% mount node0v:/cluster-demo /mnt
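
At this point pcs status, run on any of the nodes, should list the resources that gluster nfs-ganesha enable created, including the four virtual IPs; if a node fails, its VIP moves to a surviving node and clients can keep working once the grace period ends:

    node0% pcs status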
