
Linux scale out NFSv4 using NFS-Ganesha and GlusterFS — one step at a time

Gluster
2015-10-12

NFS-Ganesha 2.3 is rapidly winding down to release and it has a bunch of new things in it that make it fairly compelling. A lot of people are also starting to use Red Hat Gluster Storage with the NFS-Ganesha NFS server that is part of that package. Setting up a highly available NFS-Ganesha system using GlusterFS is not exactly trivial. This blog post will “eat the elephant” one bite at a time.

Some people might wonder why use NFS-Ganesha — a user space NFS server — when kernel NFS (knfs) already supports NFSv4? The answer is simple really. NFSv4 in the kernel doesn’t scale. It doesn’t scale out, and it’s a single point of failure. This blog post will show how to set up a resilient, highly available system with no single point of failure.

Crawl

Let’s start small and simple. We’ll set up a single NFS-Ganesha server on CentOS 7, serving a single disk volume.

Start by setting up a CentOS 7 machine. You may want to create a separate volume for the NFS export; we’ll leave that as an exercise for the reader. Do not install or start the kernel NFS server.

1. Install EPEL, NFS-Ganesha, and GlusterFS. Use the yum repos on download.gluster.org; the repo files are nfs-ganesha.repo and glusterfs-epel.repo. Copy them to /etc/yum.repos.d.
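
For example, the repo files can be fetched straight into /etc/yum.repos.d with curl; substitute the actual URLs from download.gluster.org:

    % curl -o /etc/yum.repos.d/nfs-ganesha.repo <URL of nfs-ganesha.repo on download.gluster.org>
    % curl -o /etc/yum.repos.d/glusterfs-epel.repo <URL of glusterfs-epel.repo on download.gluster.org>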

    % yum -y install epel-release
    % yum -y install glusterfs-server glusterfs-fuse glusterfs-cli glusterfs-ganesha
    % yum -y install nfs-ganesha-xfs

2. Create a directory to mount the export volume, make a file system on the export volume, and finally mount it:

    % mkdir -p /bricks/demo
    % mkfs.xfs /dev/sdb
    % mount /dev/sdb /bricks/demo
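
If you want the brick file system mounted again after a reboot, an /etc/fstab entry along these lines (assuming the device really is /dev/sdb, as above) takes care of it:

    % echo '/dev/sdb /bricks/demo xfs defaults 0 0' >> /etc/fstab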

3. Gluster recommends not creating volumes on the root directory of the brick. If something goes wrong it’s easier to rm -rf the directory than it is to try to clean it or remake the file system. Create a couple of subdirectories on the brick:

    % mkdir /bricks/demo/vol
    % mkdir /bricks/demo/scratch

4. Edit the Ganesha config file at /etc/ganesha/ganesha.conf. Here’s what mine looks like:

EXPORT
{
	# Export Id (mandatory, each EXPORT must have a unique Export_Id)
	Export_Id = 1;

	# Exported path (mandatory)
	Path = /bricks/demo/scratch;

	# Pseudo Path (required for NFS v4)
	Pseudo = /scratch;

	# Required for access (default is None)
	# Could use CLIENT blocks instead
	Access_Type = RW;

	# Exporting FSAL
	FSAL {
		Name = XFS;
	}
}

5. Start ganesha:

    % systemctl start nfs-ganesha
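
You can confirm the daemon came up cleanly before mounting anything:

    % systemctl status nfs-ganesha
    % journalctl -u nfs-ganesha --no-pager | tail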

6. Wait one minute for NFS grace to end, then mount the volume:


    % mount localhost:/scratch /mnt
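
A quick smoke test: write a file over NFS and look for it in the underlying export directory:

    % touch /mnt/hello
    % ls /bricks/demo/scratch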

Walk

7. Now we’ll create a simple gluster volume and use NFS-Ganesha to serve it. We also need to disable gluster’s built-in NFS server (gnfs).


    % gluster volume create simple $(hostname):/bricks/demo/simple
    % gluster volume set simple nfs.disable on
    % gluster volume start simple
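
It’s worth confirming that the volume is started and that gnfs really is disabled before pointing Ganesha at it:

    % gluster volume info simple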

8. Edit the Ganesha config file at /etc/ganesha/ganesha.conf. Here’s what mine looks like:

EXPORT
{
	# Export Id (mandatory, each EXPORT must have a unique Export_Id)
	Export_Id = 1;

	# Exported path (mandatory)
	Path = /simple;

	# Pseudo Path (required for NFS v4)
	Pseudo = /simple;

	# Required for access (default is None)
	# Could use CLIENT blocks instead
	Access_Type = RW;

	# Exporting FSAL
	FSAL {
		Name = GLUSTER;
		Hostname = localhost;
		Volume = simple;
	}
}

9. Restart ganesha:


    % systemctl stop nfs-ganesha
    % systemctl start nfs-ganesha
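
Before mounting, you can check that Ganesha picked up the new export. showmount talks to the NFSv3 mount protocol, which Ganesha serves by default:

    % showmount -e localhost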

10. Wait one minute for NFS grace to end, then mount the volume:


    % mount localhost:/simple /mnt

Copy a file to the NFS volume. You’ll see it on the gluster brick in /bricks/demo/simple.
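
For example, copy a file in over NFS and then list the brick directory directly:

    % cp /etc/hosts /mnt/
    % ls /bricks/demo/simple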

Run

Now for the part you’ve been waiting for. For this we’ll start from scratch. This will be a four node cluster: node0, node1, node2, and node3.

1. Tear down anything left over from the above.

2. Ensure that all nodes are resolvable either in DNS or /etc/hosts:


    node0% cat /etc/hosts
    127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

    172.16.3.130 node0
    172.16.3.131 node1
    172.16.3.132 node2
    172.16.3.133 node3

    172.16.3.140 node0v
    172.16.3.141 node1v
    172.16.3.142 node2v
    172.16.3.143 node3v

3. Set up passwordless ssh among the four nodes. On node0 create a keypair and deploy it to all the nodes:


    node0% ssh-keygen -f /var/lib/glusterd/nfs/secret.pem
    node0% ssh-copy-id -i /var/lib/glusterd/nfs/secret.pem.pub root@node0
    node0% ssh-copy-id -i /var/lib/glusterd/nfs/secret.pem.pub root@node1
    node0% ssh-copy-id -i /var/lib/glusterd/nfs/secret.pem.pub root@node2
    node0% ssh-copy-id -i /var/lib/glusterd/nfs/secret.pem.pub root@node3
    node0% scp /var/lib/glusterd/nfs/secret.* node1:/var/lib/glusterd/nfs/
    node0% scp /var/lib/glusterd/nfs/secret.* node2:/var/lib/glusterd/nfs/
    node0% scp /var/lib/glusterd/nfs/secret.* node3:/var/lib/glusterd/nfs/

You can confirm that it works with:

    node0% ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/nfs/secret.pem root@node1

4. Start glusterd on all nodes:

    node0% systemctl enable glusterd && systemctl start glusterd
    node1% systemctl enable glusterd && systemctl start glusterd
    node2% systemctl enable glusterd && systemctl start glusterd
    node3% systemctl enable glusterd && systemctl start glusterd

5. From node0, peer probe the other nodes:

    node0% gluster peer probe node1
    peer probe: success
    node0% gluster peer probe node2
    peer probe: success
    node0% gluster peer probe node3
    peer probe: success

You can confirm their status with:

    node0% gluster peer status
    Number of Peers: 3

    Hostname: node1
    Uuid: ca8e1489-0f1b-4814-964d-563e67eded24
    State: Peer in Cluster (Connected)

    Hostname: node2
    Uuid: 37ea06ff-53c2-42eb-aff5-a1afb7a6bb59
    State: Peer in Cluster (Connected)

    Hostname: node3
    Uuid: e1fb733f-8e4e-40e4-8933-e215a183866f
    State: Peer in Cluster (Connected)

6. Create the /etc/ganesha/ganesha-ha.conf file on node0. Here’s what mine looks like:

# Name of the HA cluster created.
# must be unique within the subnet
HA_NAME="demo-cluster"
#
# The gluster server from which to mount the shared data volume.
HA_VOL_SERVER="node0"
#
# You may use short names or long names; you may not use IP addresses.
# Once you select one, stay with it, as it will be mildly unpleasant to
# clean up if you switch later on. Ensure that all names - short and/or
# long - are in DNS or /etc/hosts on all machines in the cluster.
#
# The subset of nodes of the Gluster Trusted Pool that forms the ganesha
# HA cluster. Hostnames are specified, not IP addresses.
HA_CLUSTER_NODES="node0,node1,node2,node3"
#
# Virtual IPs for each of the nodes specified above.
VIP_node0="172.16.3.140"
VIP_node1="172.16.3.141"
VIP_node2="172.16.3.142"
VIP_node3="172.16.3.143"

7. Enable the Gluster shared storage volume:

    node0% gluster volume set all cluster.enable-shared-storage enable

Wait a few moments for it to be mounted everywhere. You can check that it’s mounted at /run/gluster/shared_storage (or /var/run/gluster/shared_storage) on all the nodes.
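
Something like this on each node should show the mount:

    node0% grep shared_storage /proc/mounts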

8. Enable and start the Pacemaker pcsd on all nodes:

    node0% systemctl enable pcsd && systemctl start pcsd
    node1% systemctl enable pcsd && systemctl start pcsd
    node2% systemctl enable pcsd && systemctl start pcsd
    node3% systemctl enable pcsd && systemctl start pcsd

9. Set a password for the user ‘hacluster’ on all nodes. Use the same password for all nodes:

    node0% echo demopass | passwd --stdin hacluster
    node1% echo demopass | passwd --stdin hacluster
    node2% echo demopass | passwd --stdin hacluster
    node3% echo demopass | passwd --stdin hacluster

10. Perform cluster auth between the nodes. The username is ‘hacluster’ and the password is the one you set in step 9:

    node0% pcs cluster auth node0
    node0% pcs cluster auth node1
    node0% pcs cluster auth node2
    node0% pcs cluster auth node3
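
pcs prompts for the username and password on each of these commands. The pcs version shipped with CentOS 7 also lets you pass them on the command line and authenticate all four nodes in one go, which might look like this:

    node0% pcs cluster auth node0 node1 node2 node3 -u hacluster -p demopass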

11. Create the Gluster volume to export. We’ll create a 2×2 distribute-replicate volume. Start the volume:

    node0% gluster volume create cluster-demo replica 2 node0:/home/bricks/demo node1:/home/bricks/demo node2:/home/bricks/demo node3:/home/bricks/demo
    node0% gluster volume start cluster-demo
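
gluster volume info should report the volume as Distributed-Replicate with "Number of Bricks: 2 x 2 = 4":

    node0% gluster volume info cluster-demo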

12. Enable ganesha, i.e. start the ganesha.nfsd:

    node0% gluster nfs-ganesha enable

13. Export the volume:

    node0% gluster vol set cluster-demo ganesha.enable on

14. And finally mount the NFS volume from a client using one of the virtual IP addresses:

    nfs-client% mount node0v:/cluster-demo /mnt
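
At this point pcs status, run on any of the nodes, should list the resources that gluster nfs-ganesha enable created, including the four virtual IPs; if a node fails, its VIP moves to a surviving node and clients can keep working once the grace period ends:

    node0% pcs status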
