all posts tagged Linux


by on October 12, 2015

Linux scale out NFSv4 using NFS-Ganesha and GlusterFS — one step at a time

NFS-Ganesha 2.3 is rapidly winding down to release and it has a bunch of new things in it that make it fairly compelling. A lot of people are also starting to use Red Hat Gluster Storage with the NFS-Ganesha NFS server that is part of that package. Setting up a highly available NFS-Ganesha system using GlusterFS is not exactly trivial. This blog post will “eat the elephant” one bite at a time.

Some people might wonder why use NFS-Ganesha — a user space NFS server — when kernel NFS (knfs) already supports NFSv4? The answer is simple really. NFSv4 in the kernel doesn’t scale. It doesn’t scale out, and it’s a single point of failure. This blog post will show how to set up a resilient, highly available system with no single point of failure.

Crawl

Let’s start small and simple. We’ll set up a single NFS-Ganesha server on CentOS 7, serving a single disk volume.

Start by setting up a CentOS 7 machine. You may want to create a separate volume for the NFS export; we'll leave that as an exercise for the reader. Do not install any NFS packages.

1. Install EPEL, NFS-Ganesha and GlusterFS. Use the yum repos on download.gluster.org. Repo files are at
nfs-ganesha.repo and glusterfs-epel.repo. Copy them to /etc/yum.repos.d.

    % yum -y install epel-release
    % yum -y install glusterfs-server glusterfs-fuse glusterfs-cli glusterfs-ganesha
    % yum -y install nfs-ganesha-xfs

2. Create a directory to mount the export volume, make a file system on the export volume, and finally mount it:

    % mkdir -p /bricks/demo
    % mkfs.xfs /dev/sdb
    % mount /dev/sdb /bricks/demo

3. Gluster recommends not creating volumes on the root directory of the brick. If something goes wrong, it's easier to rm -rf the directory than it is to try to clean it or remake the file system. Create a couple of subdirectories on the brick:

    % mkdir /bricks/demo/vol
    % mkdir /bricks/demo/scratch

4. Edit the Ganesha config file at /etc/ganesha/ganesha.conf. Here’s what mine looks like:

EXPORT
{
	# Export Id (mandatory, each EXPORT must have a unique Export_Id)
	Export_Id = 1;

	# Exported path (mandatory)
	Path = /bricks/demo/scratch;

	# Pseudo Path (required for NFS v4)
	Pseudo = /bricks/demo/scratch;

	# Required for access (default is None)
	# Could use CLIENT blocks instead
	Access_Type = RW;

	# Exporting FSAL
	FSAL {
		Name = XFS;
	}
}

5. Start ganesha:

    % systemctl start nfs-ganesha

6. Wait one minute for NFS grace to end, then mount the volume:


    % mount localhost:/scratch /mnt
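Before moving on, it doesn't hurt to confirm that the client actually negotiated NFSv4. A quick sanity check might look like this (nfsstat comes from nfs-utils, which should already be present since it provides mount.nfs):

    % mount | grep /mnt
    % nfsstat -m

The first command should show the mount with type nfs4; nfsstat -m prints the negotiated mount options, including the vers= value.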

Walk

7. Now we’ll create a simple gluster volume and use NFS-Ganesha to serve it. We also need to disable gluster’s own NFS server (gnfs).


    % gluster volume create simple $hostname:/bricks/demo/simple
    % gluster volume set simple nfs.disable on
    % gluster volume start simple

8. Edit the Ganesha config file at /etc/ganesha/ganesha.conf. Here’s what mine looks like:

EXPORT
{
	# Export Id (mandatory, each EXPORT must have a unique Export_Id)
	Export_Id = 1;

	# Exported path (mandatory)
	Path = /simple;

	# Pseudo Path (required for NFS v4)
	Pseudo = /simple;

	# Required for access (default is None)
	# Could use CLIENT blocks instead
	Access_Type = RW;

	# Exporting FSAL
	FSAL {
		Name = GLUSTER;
		Hostname = localhost;
		Volume = simple;
	}
}

9. Restart ganesha:


    % systemctl stop nfs-ganesha
    % systemctl start nfs-ganesha

10. Wait one minute for NFS grace to end, then mount the volume:


    % mount localhost:/simple /mnt

Copy a file to the NFS volume. You’ll see it on the gluster brick in /bricks/demo/simple.
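For example, something like this should show the file appearing on the brick (the file copied here is arbitrary):

    % cp /etc/hostname /mnt/
    % ls /bricks/demo/simple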

Run

Now for the part you’ve been waiting for. For this we’ll start from scratch. This will be a four node cluster: node0, node1, node2, and node3.

1. Tear down anything left over from the above.

2. Ensure that all nodes are resolvable either in DNS or /etc/hosts:


    node0% cat /etc/hosts
    127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

    172.16.3.130 node0
    172.16.3.131 node1
    172.16.3.132 node2
    172.16.3.133 node3

    172.16.3.140 node0v
    172.16.3.141 node1v
    172.16.3.142 node2v
    172.16.3.143 node3v

3. Set up passwordless ssh among the four nodes. On node0 create a keypair and deploy it to all the nodes:


    node0% ssh-keygen -f /var/lib/glusterd/nfs/secret.pem
    node0% ssh-copy-id -i /var/lib/glusterd/nfs/secret.pem.pub root@node0
    node0% ssh-copy-id -i /var/lib/glusterd/nfs/secret.pem.pub root@node1
    node0% ssh-copy-id -i /var/lib/glusterd/nfs/secret.pem.pub root@node2
    node0% ssh-copy-id -i /var/lib/glusterd/nfs/secret.pem.pub root@node3
    node0% scp /var/lib/glusterd/nfs/secret.* node1:/var/lib/glusterd/nfs/
    node0% scp /var/lib/glusterd/nfs/secret.* node2:/var/lib/glusterd/nfs/
    node0% scp /var/lib/glusterd/nfs/secret.* node3:/var/lib/glusterd/nfs/

You can confirm that it works with:

    node0% ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/nfs/secret.pem root@node1

4. Start glusterd on all nodes:

    node0% systemctl enable glusterd && systemctl start glusterd
    node1% systemctl enable glusterd && systemctl start glusterd
    node2% systemctl enable glusterd && systemctl start glusterd
    node3% systemctl enable glusterd && systemctl start glusterd

5. From node0, peer probe the other nodes:

    node0% gluster peer probe node1
    peer probe: success
    node0% gluster peer probe node2
    peer probe: success
    node0% gluster peer probe node3
    peer probe: success

You can confirm their status with:

    node0% gluster peer status
    Number of Peers: 3

    Hostname: node1
    Uuid: ca8e1489-0f1b-4814-964d-563e67eded24
    State: Peer in Cluster (Connected)

    Hostname: node2
    Uuid: 37ea06ff-53c2-42eb-aff5-a1afb7a6bb59
    State: Peer in Cluster (Connected)

    Hostname: node3
    Uuid: e1fb733f-8e4e-40e4-8933-e215a183866f
    State: Peer in Cluster (Connected)

6. Create the /etc/ganesha/ganesha-ha.conf file on node0. Here’s what mine looks like:

# Name of the HA cluster created.
# must be unique within the subnet
HA_NAME="demo-cluster"
#
# The gluster server from which to mount the shared data volume.
HA_VOL_SERVER="node0"
#
# You may use short names or long names; you may not use IP addresses.
# Once you select one, stay with it as it will be mildly unpleasant to clean
# up if you switch later on. Ensure that all names - short and/or long - are
# in DNS or /etc/hosts on all machines in the cluster.
#
# The subset of nodes of the Gluster Trusted Pool that form the ganesha HA cluster. Hostname is specified.
HA_CLUSTER_NODES="node0,node1,node2,node3"
#
# Virtual IPs for each of the nodes specified above.
VIP_node0="172.16.3.140"
VIP_node1="172.16.3.141"
VIP_node2="172.16.3.142"
VIP_node3="172.16.3.143"

7. Enable the Gluster shared state volume:

    node0% gluster volume set all cluster.enable-shared-storage enable

Wait a few moments for it to be mounted everywhere. You can check that it’s mounted at /run/gluster/shared_storage (or /var/run/gluster/shared_storage) on all the nodes.
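One quick way to check this on each node (adjust the path if your distribution mounts it under /var/run):

    node0% mount | grep shared_storage
    node0% ls /run/gluster/shared_storage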

8. Enable and start the Pacemaker pcsd on all nodes:

    node0% systemctl enable pcsd && systemctl start pcsd
    node1% systemctl enable pcsd && systemctl start pcsd
    node2% systemctl enable pcsd && systemctl start pcsd
    node3% systemctl enable pcsd && systemctl start pcsd

9. Set a password for the user ‘hacluster’ on all nodes. Use the same password for all nodes:

    node0% echo demopass | passwd --stdin hacluster
    node1% echo demopass | passwd --stdin hacluster
    node2% echo demopass | passwd --stdin hacluster
    node3% echo demopass | passwd --stdin hacluster

10. Perform cluster auth between the nodes. Username is ‘hacluster’, Password is the one you used in step 9:

    node0% pcs cluster auth node0
    node0% pcs cluster auth node1
    node0% pcs cluster auth node2
    node0% pcs cluster auth node3

11. Create the Gluster volume to export. We’ll create a 2×2 distribute-replicate volume. Start the volume:

    node0% gluster volume create cluster-demo replica 2 node0:/home/bricks/demo node1:/home/bricks/demo node2:/home/bricks/demo node3:/home/bricks/demo
    node0% gluster volume start cluster-demo

12. Enable ganesha, i.e. start the ganesha.nfsd:

    node0% gluster nfs-ganesha enable
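Behind the scenes this builds the Pacemaker/Corosync cluster, so give it a little time. You can get a rough idea of whether it came up by looking at the cluster status; you should see all four nodes online along with the ganesha and virtual IP resources (the exact resource names will vary):

    node0% pcs status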

13. Export the volume:

    node0% gluster vol set cluster-demo ganesha.enable on

14. And finally mount the NFS volume from a client using one of the virtual IP addresses:

    nfs-client% mount node0v:/cluster-demo /mnt
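If you want to convince yourself that the HA part actually works, a rough failover test looks something like the sketch below. This is only a sketch: the resource names shown by pcs status will differ, and you should only try it on a test cluster.

    # on node0: simulate a failure of the NFS server
    node0% systemctl stop nfs-ganesha

    # on the surviving nodes: see where node0's virtual IP (172.16.3.140) landed
    node1% pcs status
    node1% ip addr | grep 172.16.3.140

The client mount on /mnt should keep working once the grace period ends.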

by on September 22, 2015

Docker Global Hack Day #3, Bangalore Edition

We organized Docker Global Hack Day at the Red Hat office on 19th Sep ’15. Though there were lots of RSVPs, the turnout for the event was less than expected. We started the day by showing the recording of the kick-off event.

The teams here worked on four different ideas, two of which were submitted to the Global Hack Day GitHub page. The four ideas the teams worked on were:

Alan and Fayiz worked on a PaaS idea, which can be used for setting up dev and QA environments.

Archit was the winner of Docker Global Hack Day #2 with the same project, which he updated in this hackathon. His project is about crowdsourced analysis using distributed computing through Docker.

  • Visualizing Docker Networking – Himanshu Roy

Himanshu was exploring the idea of visualizing multi-host Docker networking.

  • Spreading and collating containers on GlusterFS with runC – Mohamed Ashiq Liyazudeen, Hari Gowtham

After looking at the runC demo in the kick-off video, we thought it would be good if we could run containers on GlusterFS and use it to move containers around by saving and restoring them on a shared volume.

I did not work on a specific idea, but I helped the teams and other attendees with their questions. I also worked on my upcoming LinuxCon Europe tutorial on data and networking management with containers.

Maybe because of the long weekend and other events we got less participation than hoped. Hopefully we'll do better next time.

by on August 4, 2015

Gluster Community Packages

The Gluster Community currently provides GlusterFS packages for the following distributions:

                            3.5 3.6 3.7
Fedora 21                    ¹   ×   ×
Fedora 22                    ×   ¹   ×
Fedora 23                    ×   ×   ¹
Fedora 24                    ×   ×   ¹
RHEL/CentOS 5                ×   ×
RHEL/CentOS 6                ×   ×   ×
RHEL/CentOS 7                ×   ×   ×
Ubuntu 12.04 LTS (precise)   ×   ×
Ubuntu 14.04 LTS (trusty)    ×   ×   ×
Ubuntu 15.04 (vivid)             ×   ×
Ubuntu 15.10 (wily)
Debian 7 (wheezy)            ×   ×
Debian 8 (jessie)            ×   ×   ×
Debian 9 (stretch)           ×   ×   ×
SLES 11                      ×   ×
SLES 12                          ×   ×
OpenSuSE 13                  ×   ×   ×
RHELSA 7                             ×

(Packages are also available in NetBSD and maybe FreeBSD.)

Most packages are available from download.gluster.org

Ubuntu packages are available from Launchpad

As can be seen, the older distributions don’t have packages of the latest GlusterFS, usually due to dependencies that are too old or missing. Similarly, newer distributions don’t have packages of the older versions, for the same reason.

¹ In Fedora, Fedora Updates, or Fedora Updates-Testing for Primary architectures. Secondary architectures seem to be slow to sync with Primary; RPMs for aarch64 are often available from download.gluster.org.

by on April 8, 2015

GlusterFS-3.4.7 Released

The Gluster community is pleased to announce the release of GlusterFS-3.4.7.

The GlusterFS 3.4.7 release is focused on bug fixes:

  • 33608f5 cluster/dht: Changed log level to DEBUG
  • 076143f protocol: Log ENODATA & ENOATTR at DEBUG in removexattr_cbk
  • a0aa6fb build: argp-standalone, conditional build and build with gcc-5
  • 35fdb73 api: versioned symbols in libgfapi.so for compatibility
  • 8bc612d cluster/dht: set op_errno correctly during migration.
  • 8635805 cluster/dht: Fixed double UNWIND in lookup everywhere code

GlusterFS 3.4.7 packages for a variety of popular distributions are available from http://download.gluster.org/pub/gluster/glusterfs/3.4/3.4.7/

by on March 27, 2015

GlusterFS 3.4.7beta4 is now available for testing

The 4th beta for GlusterFS 3.4.7 is now available for testing. A handful of bugs have been fixed since the 3.4.6 release; check the references below for details.

Bug reporters are encouraged to verify the fixes, and we invite others to test this beta to check for regressions. The ETA for 3.4.7 GA is tentatively set for April 6 so time is short for testing. Please note that the 3.4 release will EOL when 3.7 is released.

Packages for different distributions can be found on the main download site.

Release Notes for GlusterFS 3.4.7

GlusterFS 3.4.7 consists entirely of bug fixes. The Release Notes for 3.4.0, 3.4.1, 3.4.2, 3.4.3, 3.4.4, 3.4.5, and 3.4.6 contain a listing of all the new features that were added and bugs fixed in the GlusterFS 3.4 stable release.

The following changes are included in 3.4.7:

  • 33608f5 cluster/dht: Changed log level to DEBUG
  • 076143f protocol: Log ENODATA & ENOATTR at DEBUG in removexattr_cbk
  • a0aa6fb build: argp-standalone, conditional build and build with gcc-5
  • 35fdb73 api: versioned symbols in libgfapi.so for compatibility
  • 8bc612d cluster/dht: set op_errno correctly during migration.
  • 8635805 cluster/dht: Fixed double UNWIND in lookup everywhere code

Known Issues:

  • memory leak in glusterfs fuse bridge
  • File replicas differ in content even as heal info lists 0 entries in replica 2 setup
by on March 17, 2015

GlusterFS 3.4.7beta2 is now available for testing

The 2nd beta for GlusterFS 3.4.7 is now available for testing. A handful of bugs have been fixed since the 3.4.6 release; check the references below for details.

Bug reporters are encouraged to verify the fixes, and we invite others to test this beta to check for regressions. The ETA for 3.4.7 GA is not set, but will likely be before the end of March. Please note that the 3.4 release will EOL when 3.7 is released.

Packages for different distributions can be found on the main download site.

Release Notes for GlusterFS 3.4.7

GlusterFS 3.4.7 consists entirely of bug fixes. The Release Notes for 3.4.0, 3.4.1, 3.4.2, 3.4.3, 3.4.4, 3.4.5, and 3.4.6 contain a listing of all the new features that were added and bugs fixed in the GlusterFS 3.4 stable release.

The following changes are included in 3.4.7:

  • a0aa6fb build: argp-standalone, conditional build and build with gcc-5
  • 35fdb73 api: versioned symbols in libgfapi.so for compatibility
  • 8bc612d cluster/dht: set op_errno correctly during migration.
  • 8635805 cluster/dht: Fixed double UNWIND in lookup everywhere code

Known Issues:

  • memory leak in glusterfs fuse bridge
  • File replicas differ in content even as heal info lists 0 entries in replica 2 setup

And if you’re wondering what happened to the first beta, it was made, but did not build with gcc-5.

by on January 27, 2015

GlusterFS 3.6.2 GA released

The release source tar file and packages for Fedora {20,21,rawhide}, RHEL/CentOS {5,6,7}, Debian {wheezy,jessie}, Pidora2014, and Raspbian wheezy are available at http://download.gluster.org/pub/gluster/glusterfs/3.6/3.6.2/

(Ubuntu packages will be available soon.)

This release fixes the following bugs. Thanks to all who submitted bugs and patches and reviewed the changes.

  • 1184191 – Cluster/DHT : Fixed crash due to null deref
  • 1180404 – nfs server restarts when a snapshot is deactivated
  • 1180411 – CIFS:[USS]: glusterfsd OOM killed when 255 snapshots were browsed at CIFS mount and Control+C is issued
  • 1180070 – [AFR] getfattr on fuse mount gives error : Software caused connection abort
  • 1175753 – [readdir-ahead]: indicate EOF for readdirp
  • 1175752 – [USS]: On a successful lookup, snapd logs are filled with Warnings “dict OR key (entry-point) is NULL”
  • 1175749 – glusterfs client crashed while migrating the fds
  • 1179658 – Add brick fails if parent dir of new brick and existing brick is same and volume was accessed using libgfapi and smb.
  • 1146524 – glusterfs.spec.in – synch minor diffs with fedora dist-git glusterfs.spec
  • 1175744 – [USS]: Unable to access .snaps after snapshot restore after directories were deleted and recreated
  • 1175742 – [USS]: browsing .snaps directory with CIFS fails with “Invalid argument”
  • 1175739 – [USS]: Non root user who has no access to a directory, from NFS mount, is able to access the files under .snaps under that directory
  • 1175758 – [USS] : Rebalance process tries to connect to snapd and in case when snapd crashes it might affect rebalance process
  • 1175765 – [USS]: When snapd is crashed gluster volume stop/delete operation fails making the cluster in inconsistent state
  • 1173528 – Change in volume heal info command output
  • 1166515 – [Tracker] RDMA support in glusterfs
  • 1166505 – mount fails for nfs protocol in rdma volumes
  • 1138385 – [DHT:REBALANCE]: Rebalance failures are seen with error message “remote operation failed: File exists”
  • 1177418 – entry self-heal in 3.5 and 3.6 are not compatible
  • 1170954 – Fix mutex problems reported by coverity scan
  • 1177899 – nfs: ls shows “Permission denied” with root-squash
  • 1175738 – [USS]: data unavailability for a period of time when USS is enabled/disabled
  • 1175736 – [USS]: After deactivating a snapshot trying to access the remaining activated snapshots from NFS mount gives ‘Invalid argument’ error
  • 1175735 – [USS]: snapd process is not killed once the glusterd comes back
  • 1175733 – [USS]: If the snap name is same as snap-directory than cd to virtual snap directory fails
  • 1175756 – [USS] : Snapd crashed while trying to access the snapshots under .snaps directory
  • 1175755 – SNAPSHOT[USS]: gluster volume set for uss doesnot check any boundaries
  • 1175732 – [SNAPSHOT]: nouuid is appended for every snapshoted brick which causes duplication if the original brick has already nouuid
  • 1175730 – [USS]: creating file/directories under .snaps shows wrong error message
  • 1175754 – [SNAPSHOT]: before the snap is marked to be deleted if the node goes down than the snaps are propagated on other nodes and glusterd hungs
  • 1159484 – ls -alR can not heal the disperse volume
  • 1138897 – NetBSD port
  • 1175728 – [USS]: All uss related logs are reported under /var/log/glusterfs, it makes sense to move it into subfolder
  • 1170548 – [USS] : don’t display the snapshots which are not activated
  • 1170921 – [SNAPSHOT]: snapshot should be deactivated by default when created
  • 1175694 – [SNAPSHOT]: snapshoted volume is read only but it shows rw attributes in mount
  • 1161885 – Possible file corruption on dispersed volumes
  • 1170959 – EC_MAX_NODES is defined incorrectly
  • 1175645 – [USS]: Typo error in the description for USS under “gluster volume set help”
  • 1171259 – mount.glusterfs does not understand -n option

Regards,
Kaleb, on behalf of Raghavendra Bhat, who did all the work.

by on November 5, 2014

Installing GlusterFS 3.4.x, 3.5.x or 3.6.0 on RHEL or CentOS 6.6

With the release of RHEL-6.6 and CentOS-6.6, there are now glusterfs packages in the standard channels/repositories. Unfortunately, these are only the client-side packages (like glusterfs-fuse and glusterfs-api). Users that want to run a Gluster Server on a current RHEL or CentOS now have difficulties installing any of today's current versions of the Gluster Community packages.

The most prominent issue is that the glusterfs package from RHEL has a version of 3.6.0.28, and that is higher than the 3.6.0 version released last week. RHEL is shipping a pre-release that was created while the Gluster Community was still developing 3.6. An unfortunate packaging decision added a .28 to the version, where most other pre-releases would fall back to an (rpm-)version like 3.6.0-0.1.something.bla.el6. The difference might look minor, but the result is a major disruption in the much anticipated 3.6 community release.

To fix this in the easiest way for our community users, we have decided to release version 3.6.1 later this week (maybe on Thursday, November 6). This version is higher than the version in RHEL/CentOS, and therefore yum will prefer the package from the community repository over the one available in RHEL/CentOS. This is also the main reason why no 3.6.0 packages have been provided on the download server.

Installing an older stable release (like 3.4 or 3.5) on RHEL/CentOS 6.6 requires a different approach. At the moment we can offer two solutions that can be used. We are still working on making this easier; until that is finalized, some manual actions are required.

Let's assume you want to verify whether today's announced glusterfs-3.5.3beta2 packages indeed fix that bug you reported. (These steps apply to the other versions as well; this just happens to be what I have been testing.)

Option A: use exclude in the yum repository files for RHEL/CentOS

  1. download the glusterfs-353beta2-epel.repo file and save it under /etc/yum.repos.d/

  2. edit /etc/yum.repos.d/redhat.repo or /etc/yum.repos.d/CentOS-Base.repo, and under each repository that you find, add the following line:

    exclude=glusterfs*

This prevents yum from installing the glusterfs* packages from the standard RHEL/CentOS repositories, but allows those packages from others. The Red Hat Customer Portal has an article about this configuration too.
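As a rough sketch, a repository section in /etc/yum.repos.d/CentOS-Base.repo would then look something like this (only the exclude line is new; keep the existing mirrorlist/gpgcheck lines exactly as they are):

    [base]
    name=CentOS-$releasever - Base
    # ...existing mirrorlist/gpgcheck lines stay untouched...
    exclude=glusterfs*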

Option B: install and configure yum-plugin-priorities

Using yum-plugin-priorities is probably a more stable solution. This does not require changes to the standard RHEL/CentOS repositories. However, an additional package needs to get installed.

  1. enable the optional repository when on RHEL (CentOS users can skip this step):

    # subscription-manager repos --list | grep optional-rpms
    # subscription-manager repos --enable=*optional-rpms

  2. install the yum-plugin-priorities package:

    # yum install yum-plugin-priorities

  3. download the glusterfs-353beta2-epel.repo file and save it under /etc/yum.repos.d/

  4. edit the /etc/yum.repos.d/glusterfs-353beta2-epel.repo file and add the following option to each repository definition:

    priority=50

The default priority for repositories is 99. The repositories with the lowest number have the highest priority. As long as the RHEL/CentOS repositories do not have the priority option set, the packages from the glusterfs-353beta2-epel.repo will get preferred by yum.
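For example, after editing, a repository definition in the downloaded glusterfs-353beta2-epel.repo would end up looking roughly like this (the section name shown here is illustrative; keep whatever the file already contains and just append the priority line):

    [glusterfs-epel]
    # ...existing name/baseurl/gpgcheck lines from the downloaded file...
    enabled=1
    priority=50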

When using the yum-plugin-priorities approach, we highly recommend that you check if all your repositories have a suitable (or missing) priority option. In case some repositories have the option set, but yum-plugin-priorities was not installed yet, the order of the repositories might have changed. Because of this, we do not want to force using yum-plugin-priorities on all the Gluster Community users that run on RHEL/CentOS.

In case users still have issues installing the Gluster Community packages on RHEL or CentOS, we recommend getting in touch with us on the Gluster Users mailing list (archive) or in the #gluster IRC channel on Freenode.

by on September 30, 2014

Gluster, CIFS, ZFS – kind of part 2

A while ago I put together a post detailing the installation and configuration of 2 hosts running glusterfs, which was then presented as CIFS based storage.

http://jonarcher.info/2014/06/windows-cifs-fileshares-using-glusterfs-ctdb-highly-available-data/

This post gained a bit of interest through the comments and social networks; one of the comments I got was from John Mark Walker, suggesting I look at the samba-gluster VFS method instead of mounting the filesystem using FUSE (samba accesses the volume directly, instead of it being mounted and then presented). On top of this I've also been looking quite a bit at ZFS, whereas previously I had a Linux RAID as the base filesystem. So here is a slightly different approach to my previous post.

Getting prepared

As before, we’re looking at 2 hosts, virtual in the case of this build but more than likely physical in a real-world scenario; either way it’s irrelevant. Both of these hosts are running CentOS 6 minimal installs (I’ll update to 7 at a later date), with static IP addresses assigned and DNS entries created. I’ll also be running everything in a root session; if you don’t do the same, just prefix the commands with sudo. For the purposes of this I have also disabled SELinux and removed all firewall rules. I will one day leave SELinux enabled in this configuration, but for now let's leave it out of the equation.

In my case these names and addresses are as follows:

arcstor01 – 192.168.1.210

arcstor02 – 192.168.1.211

First off, let's get the relevant repositories installed (EPEL, ZFS and Gluster):

yum localinstall --nogpgcheck http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm
yum localinstall --nogpgcheck http://archive.zfsonlinux.org/epel/zfs-release.el6.noarch.rpm
curl -o /etc/yum.repos.d/gluster.repo http://download.gluster.org/pub/gluster/glusterfs/LATEST/EPEL.repo/glusterfs-epel.repo
curl -o /etc/yum.repos.d/glusterfs-samba-epel.repo http://download.gluster.org/pub/gluster/glusterfs/samba/EPEL.repo/glusterfs-samba-epel.repo

Local filesystem

As previously mentioned, this configuration will be hosted from 2 virtual machines, each will have 3 disks. 1 for the OS, and the other 2 to be used in a ZFS pool.

First off we need to install ZFS itself, once you have the above zfs-release repo installed this can be done with the following command:

yum install kernel-devel zfs

Perform this on both hosts.

We can now create a zfs pool. In my case the disk device names are vdX, but they could be sdX.

fdisk -l

can help you identify the device names; whatever they are, just replace them in the following commands.

Create a ZFS pool

zpool create -f  -m /gluster gluster mirror /dev/vdb /dev/vdc

This command will create a zfs pool mounted at /gluster. Without -m /gluster it would mount at /{poolname}; in this case that's the same path, I just added the option for clarity. The pool name is gluster, and the redundancy level is mirrored, which is similar to RAID1. There are a number of raid levels available in ZFS, all best explained here: http://www.zfsbuild.com/2010/05/26/zfs-raid-levels/. The final element of the command is which devices host the pool, in our case /dev/vdb and /dev/vdc. The -f option forces creation of the pool, which removes the need to create partitions prior to creating the pool.

Running the command

zpool status

Will return the status of the created pool, which if successful should look something similar to:

[root@arcstor01 ~]# zpool status
  pool: gluster
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        gluster     ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            vdb1    ONLINE       0     0     0
            vdc1    ONLINE       0     0     0

errors: No known data errors

A quick ls and df will also show us that the /gluster mountpoint is present and the pool is mounted; df should show the size as roughly half the sum of both drives in the pool:

[root@arcstor01 ~]# ls /
 bin boot cgroup dev etc gluster home lib lib64 lost+found media mnt opt proc root sbin selinux srv sys tmp usr var
 [root@arcstor01 ~]# df -h
 Filesystem Size Used Avail Use% Mounted on
 /dev/vda1 15G 1.2G 13G 9% /
 tmpfs 498M 0 498M 0% /dev/shm
 gluster 20G 0 20G 0% /gluster

If this is the case, rinse and repeat on host 2. If that is also successful, we now have a resilient base filesystem on which to host our gluster volumes. There is a bucketload more to ZFS and its capabilities, but that's way outside the confines of this configuration; it's well worth looking into, though.

Glusterising our pool

So now we have a filesystem; let's make it better. Next up: installing glusterfs, enabling it, then preparing the directories. This part is pretty much identical to the previous post:

yum install glusterfs-server -y

chkconfig glusterd on

service glusterd start

mkdir  -p /gluster/bricks/share/brick1

This needs to be done on both hosts.

Now only on host1 lets make the two nodes friends, create and then start the gluster volume:

# gluster peer probe arcstor02
peer probe: success.

# gluster vol create share replica 2 arcstor01:/gluster/bricks/share/brick1 arcstor02:/gluster/bricks/share/brick1
volume create: share: success: please start the volume to access data

# gluster vol start share
volume start: share: success

[root@arcstor01 ~]# gluster vol info share

Volume Name: data1
Type: Replicate
Volume ID: 73df25d6-1689-430d-9da8-bff8b43d0e8b
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: arcstor01:/gluster/bricks/share1/brick1
Brick2: arcstor02:/gluster/bricks/share1/brick1

If all goes well above, we should have a gluster volume ready to go; this volume will be presented via samba directly. For this configuration a locally available shared area is also required, so we will create another gluster volume to mount locally, in which to store lockfiles and shared config files:

mkdir  -p /gluster/bricks/config/brick1
gluster vol create config replica 2 arcstor01:/gluster/bricks/config/brick1 arcstor02:/gluster/bricks/config/brick1
gluster vol start config
mkdir  /opt/samba-config
mount -t glusterfs localhost:config /opt/samba-config

The share volume could probably be used by pointing samba at a different path within it, but for simplicity we’ll keep them separate for now.
The mountpoint for /opt/samba-config will need to be added to fstab to ensure it mounts at boot time.

echo "localhost:config /opt/samba-config glusterfs defaults,_netdev 0 0" >>/etc/fstab

Should take care of that, remember that needs to be on both hosts.

Samba and CTDB

We now have a highly resilient datastore which can withstand both disk and host downtime, but we need to make it available for consumption, and highly available in the process. For this we will use CTDB, as in the previous post. CTDB is a clustered version of the TDB database which sits under Samba. The majority of this section is the same as the previous post, except for the extra packages and a slightly different samba config. Let's install the required packages:

yum -y install ctdb samba samba-common samba-winbind-clients samba-client samba-vfs-glusterfs

For the majority of the config files we will create them in our shared config volume and symlink them to their expected locations. The first file we need to create is /etc/sysconfig/ctdb, but we will create it as /opt/samba-config/ctdb and link it afterwards.

Note: The files which are created in the shared area should be done only on one host, but the linking needs to be done on both.

vi /opt/samba-config/ctdb
CTDB_RECOVERY_LOCK=/opt/samba-config/lockfile
 #CIFS only
 CTDB_PUBLIC_ADDRESSES=/etc/ctdb/public_addresses
 CTDB_MANAGES_SAMBA=yes
 #CIFS only
 CTDB_NODES=/etc/ctdb/nodes

We’ll need to remove the existing file in /etc/sysconfig, then we can create the symlink:

rm /etc/sysconfig/ctdb
ln -s /opt/samba-config/ctdb /etc/sysconfig/ctdb

Although we are using Samba, the service we will be running is CTDB, which adds the extra clustering components. We need to stop and disable the samba services and enable the ctdb ones:

service smb stop
chkconfig smb off
chkconfig ctdb on

With this configuration being a cluster with essentially a single datapoint, we should really use a single entry point. For this a third “floating” or virtual IP address is employed; more than one could be used, but let's keep this simple: 192.168.1.212. We also need to create a ctdb config file which contains a list of all the nodes in the cluster. Both these files need to be created in the shared location:

vi /opt/samba-config/public_addresses
192.168.1.212/24 eth0
vi /opt/samba-config/nodes
192.168.1.210
192.168.1.211

They both then need to be linked to their expected locations – neither of these exist so don’t need to be removed.

ln -s /opt/samba-config/nodes /etc/ctdb/nodes
ln -s /opt/samba-config/public_addresses /etc/ctdb/public_addresses

The last step is to modify the samba configuration to present the volume via CIFS. I seemed to have issues using a linked file for samba, so I will only use the shared area for storing a copy of the config, which can then be copied to both nodes to keep them identical.

cp /etc/samba/smb.conf /opt/samba-config/

Lets edit that file:

vi /opt/samba-config/smb.conf

Near the top add the following options

clustering = yes
idmap backend = tdb2
private dir = /opt/samba-config/

These turn the clustering (CTDB) features on and specify the shared directory where samba will create lockfiles. You can test starting ctdb at this point to ensure all is working, on both hosts:

cp /opt/samba-config/smb.conf /etc/samba/
service ctdb start

It should start OK; the health status of the cluster can then be checked with

ctdb status

At this point I was finding that CTDB was not starting correctly. After a little bit of log watching I found an error in the samba logs suggesting:

Failed to create pipe directory /run/samba/ncalrpc - No such file or directory

Also, to be search engine friendly the CTDB logfile was outputting

50.samba OUTPUT:ERROR: Samba tcp port 445 is not responding

This was a red herring: the port wasn’t responding because the samba part of CTDB wasn’t starting. 50.samba is a script in /etc/ctdb/events/ which actually starts the smb process.

So I created the directory /run/samba and restarted ctdb and the issue seems to have disappeared.
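In command form the workaround was simply the following (on both nodes; note that /run is typically a tmpfs, so if the directory disappears after a reboot you may need something to recreate it at boot time, which I haven't covered here):

mkdir -p /run/samba
service ctdb restart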

Now that we have a running service, we can go ahead and add the configuration for the share. A regular samba share would look something like:

[share]
 comment = just a share
 path = /share
 read only = no
 guest ok = yes
 valid users = jon

In the previous post this would have been ideal, with our gluster volume mounted at /share, but here we are removing a layer and want samba to talk directly to gluster rather than going via the fuse layer. This is achieved using a VFS object; we installed the samba-vfs-glusterfs package earlier. The configuration within the smb.conf file is slightly different too. Adding the following to our file should enable access to the share volume we created:

[share]
 comment = gluster vfs share
 path = /
 read only = No
 guest ok = Yes
 kernel share modes = No
 vfs objects = glusterfs
 glusterfs:loglevel = 7
 glusterfs:logfile = /var/log/samba/glusterfs-testvol.log
 glusterfs:volume = share

Notice the glusterfs: options near the bottom; these are specific to the glusterfs vfs object which is called further up (vfs objects = glusterfs). Another point to note is that the path is /. This is relative to the volume rather than the filesystem, so a path of /test would be a test directory inside the gluster volume.
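To make that concrete, a share that only exposes a subdirectory of the gluster volume might look something like this (the share name and directory here are made up for illustration):

[projects]
 comment = subdirectory of the gluster volume
 path = /test
 read only = No
 guest ok = Yes
 kernel share modes = No
 vfs objects = glusterfs
 glusterfs:volume = share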

We can now reload the samba config; let's restart ctdb for completeness (on both nodes):

service ctdb restart

From a cifs client you should now be able to browse to \\192.168.1.212\share (or whatever IP you specified as the floating IP).
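A Linux client with cifs-utils installed can mount the same share too, for example (guest access assumed, since the share above has guest ok = Yes; the mount point is arbitrary):

mount -t cifs //192.168.1.212/share /mnt/share -o guest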

All done!

To conclude: here we have created a highly resilient, highly available, very scalable storage solution using some fantastic technologies. We have created a single access method (CIFS on a floating IP) to a datastore which is stored on multiple hosts, which in turn store it on multiple disks. Talk about redundancy!

Useful links:

http://www.centos.org

http://zfsonlinux.org/

http://www.gluster.org/

http://ctdb.samba.org/

 

The post Gluster, CIFS, ZFS – kind of part 2 appeared first on Jon Archer.
