all posts tagged gluster

by on February 27, 2017

Announcing Gluster 3.10

Release notes for Gluster 3.10.0

The Gluster community is pleased to announce the release of Gluster 3.10.

This is a major Gluster release that includes some substantial changes. The features revolve around better support in container environments, scaling to a larger number of bricks per node, and a few usability and performance improvements, among other bug fixes. This release marks the completion of maintenance releases for Gluster 3.7 and 3.9. Moving forward, Gluster versions 3.10 and 3.8 are actively maintained.

The most notable features and changes are documented here as well as in our full release notes on GitHub. A full list of bugs that have been addressed is included on that page as well.

Major changes and features

Brick multiplexing

Multiplexing reduces both port and memory usage. It does not improve performance vs. non-multiplexing except when memory is the limiting factor, though there are other related changes that improve performance overall (e.g. compared to 3.9).

Multiplexing is off by default. It can be enabled with

# gluster volume set all cluster.brick-multiplex on
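As a sketch (the volume name myvol is a placeholder), bricks of compatible volumes that are started after enabling the option should share a single process and port per node, which can be observed in the status output:

```shell
# Enable multiplexing for all volumes in the pool
gluster volume set all cluster.brick-multiplex on

# Bricks (re)started after this should multiplex; compare the
# Port/PID columns across bricks of the same node:
gluster volume status myvol
```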

Support to display op-version information from clients

To get information on which op-versions are supported by the clients, users can invoke the gluster volume status command for clients. Along with information on hostname, port, bytes read, bytes written and number of clients connected per brick, we now also get the op-version on which the respective clients operate. Example usage:

# gluster volume status <VOLNAME|all> clients

Support to get maximum op-version in a heterogeneous cluster

A heterogeneous cluster operates on a common op-version that can be supported across all the nodes in the trusted storage pool. Upon upgrade of the nodes in the cluster, the cluster might support a higher op-version. Users can retrieve the maximum op-version to which the cluster could be bumped up by invoking the gluster volume get command on the newly introduced global option, cluster.max-op-version. The usage is as follows:

# gluster volume get all cluster.max-op-version
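For illustration, the retrieved value can be compared against the cluster's current op-version and, if it is higher, the cluster can be bumped with the existing cluster.op-version option (31000 is only an example value for a 3.10 pool):

```shell
# Highest op-version every node in the pool can support
gluster volume get all cluster.max-op-version

# Current operating version of the cluster
gluster volume get all cluster.op-version

# Bump the cluster, if desired (example value)
gluster volume set all cluster.op-version 31000
```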

Support for rebalance time to completion estimation

Users can now see approximately how much time the rebalance operation will take to complete across all nodes.

The estimated time left for rebalance to complete is displayed as part of the rebalance status. Use the command:

# gluster volume rebalance <VOLNAME> status

Separation of tier as its own service

This change moves the management of the tier daemon into the gluster service framework, thereby improving its stability and manageability.

There is no change to any of the tier commands or user-facing interfaces and operations.


Statedump support for gfapi based applications

gfapi based applications can now dump state information for better troubleshooting of issues. A statedump can be triggered in two ways:

  1. by executing the following on one of the Gluster servers:
     # gluster volume statedump <VOLNAME> client <HOST>:<PID>
    • <VOLNAME> should be replaced by the name of the volume
    • <HOST> should be replaced by the hostname of the system running the gfapi application
    • <PID> should be replaced by the PID of the gfapi application
  2. through calling glfs_sysrq(<FS>, GLFS_SYSRQ_STATEDUMP) within the application
    • <FS> should be replaced by a pointer to a glfs_t structure

All statedumps (*.dump.* files) will be located at the usual location, on most distributions this would be /var/run/gluster/.
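A minimal sketch of the server-side trigger; the volume name myvol, the hostname and the application are all hypothetical:

```shell
# On the host running the gfapi application, find its PID
# (qemu is only an example of a gfapi-based application):
pidof qemu-system-x86_64

# On one of the Gluster servers, trigger the dump for that client:
gluster volume statedump myvol client app-host.example.com:12345

# On the client host, the dump appears in the usual location:
ls /var/run/gluster/*.dump.*
```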

Disabled creation of trash directory by default

From now on the trash directory, namely .trashcan, will not be created by default when a new volume is created. It is only created once the feature is turned on, and the associated restrictions apply as long as features.trash is set for a particular volume.
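To turn the feature on for a volume (a sketch; myvol is a placeholder), after which the .trashcan directory is created at the root of the volume:

```shell
gluster volume set myvol features.trash on
```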

Implemented parallel readdirp with distribute xlator

Currently, directory listing gets slower as the number of bricks/nodes in a volume increases, even though the number of files/directories remains unchanged. With this feature, the performance of directory listing is made largely independent of the number of nodes/bricks in the volume, so scaling out no longer degrades directory listing performance. (On 2, 5, 10 and 25 brick setups we saw ~5%, 100%, 400% and 450% improvement respectively.)

To enable this feature:

# gluster volume set <VOLNAME> performance.readdir-ahead on
# gluster volume set <VOLNAME> performance.parallel-readdir on

To disable this feature:

# gluster volume set <VOLNAME> performance.parallel-readdir off

If there are more than 50 bricks in the volume it is good to increase the cache size beyond its default value of 10MB:

# gluster volume set <VOLNAME> performance.rda-cache-limit <CACHE SIZE>

md-cache can optionally negatively cache the security.ima xattr

With kernel version 3.x or greater, creation of a file results in a removexattr call on the security.ima xattr. This xattr is not set on the file unless the IMA feature is active. With this patch, the removexattr call returns ENODATA if the xattr is not found in the cache.

The end benefit is faster create operations where IMA is not enabled.

To cache this xattr use,

# gluster volume set <VOLNAME> performance.cache-ima-xattrs on

The above option is on by default.

Added support for CPU extensions in disperse computations

To improve disperse computations, a new way of generating dynamic code targeting specific CPU extensions, like SSE and AVX on Intel processors, has been implemented. The available extensions are detected at run time. This can roughly double encoding and decoding speeds (or halve CPU usage).

This change is 100% compatible with the old method. No change is needed if an existing volume is upgraded.

You can control which extensions to use or disable them with the following command:

# gluster volume set <VOLNAME> disperse.cpu-extensions <type>

Valid values are:

  • none: Completely disable dynamic code generation
  • auto: Automatically detect available extensions and use the best one
  • x64: Use dynamic code generation using standard 64 bits instructions
  • sse: Use dynamic code generation using SSE extensions (128 bits)
  • avx: Use dynamic code generation using AVX extensions (256 bits)

The default value is ‘auto’. If a value is specified that is not detected at run time, it will automatically fall back to the next available option.
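A quick sketch of pinning and then verifying the setting (myvol is a placeholder):

```shell
# Force AVX code generation; falls back if AVX is unavailable
gluster volume set myvol disperse.cpu-extensions avx

# Confirm the configured value
gluster volume get myvol disperse.cpu-extensions
```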

Bugs addressed

Bugs addressed since release-3.9 are listed in our full release notes.

by on February 21, 2017

Gluster Monthly Newsletter, January/February 2017

3.10 is at RC1 and is tracking towards a February GA release! Read more about RC1 release —


Find us at Vault next month!  


Our weekly community meeting has changed: we’ll be meeting every other week instead of weekly, moving the time to 15:00 UTC, and our agenda is at:

We hope this means that more people can join us. Kaushal outlines the changes on the mailing list:


Previous Gluster talks from January/February, now with more recordings!



Software Defined Storage DevRoom:  



The next generation of GlusterFS management – Kaushal Madappa


SELinux Support over GlusterFS  – Jiffin Tony Thottan  


Hyper-converged, persistent storage for containers with GlusterFS – Jose Rivera, Mohamed Ashiq


Upcoming talks:



Challenges in Management Services for Distributed Storage – Mrugesh Karnik  


Improving Performance of Directory Operations in Gluster – Manoj Pillai  


Persistent Storage for Containers with Gluster in Containers – Michael Adam –


Provisioning NFSv4 Storage Using NFS-Ganesha, Gluster, and Pacemaker HA – Kaleb S. Keithley


Next Generation File Replication System In GlusterFS – Rafi Kavungal Chundattu Parambil, Red Hat


Noteworthy threads:


Gustave Dahl asks for guidance on converting to shards:

Ziemowit Pierzycki wants to know about high-availability with KVM

Alessandro Briosi asks about gluster and multipath

Kaushal announces Gluster D2 v4.0dev-5

Niels de Vos announces 3.8.9  

Olivier Lambert asks about removing an artificial limitation of disperse volume

Daniele Antolini has questions about heterogeneous bricks




Jeff Darcy provides an update on multiplexing status

Dan Lambright requests a new maintainer for Gluster tiering

Xavier Hernandez asks about creating new options for multiple gluster versions

Avra Sengupta posts a Leader Election Xlator Design Document

Jeff Darcy posts Acknowledgements for brick multiplexing

Menaka Mohan provides an Outreachy intern update

Jeff Darcy starts a discussion around logging in a multi-brick daemon

Xavier Hernandez requests reviews on a number of patches

Niels de Vos asks Should glusterfs-3.10 become the new default with its first release?

Michael Scherer asks about C99 requirement in Gluster




From gluster-users, Michael Scherer corrects an erroneous mass unsubscription on gluster-users list

From gluster-devel, Nigel Babu notes an upcoming outage in March:  

Nigel Babu posts 2017 Infrastructure Plans  

Shyam starts a discussion (and bug) around changing from bugzilla to github:  


Gluster Top 5 Contributors in the last 30 days:

Jeff Darcy, Poornima Gurusiddaiah, Atin Mukherjee, Kaleb S. Keithley, Xavier Hernandez



Upcoming CFPs:

Open Source Summit Japan –  – March 4

LinuxCon Beijing –  – March 18

OpenSource Summit Los Angeles –  – May 6


by on February 16, 2017

GlusterFS 3.8.9 is another Long-Term-Maintenance update

We are proud to announce the General Availability of the next update to the Long-Term-Stable releases for GlusterFS 3.8. Packages are being prepared and are expected to hit the repositories of distributions and the Gluster download server over the next few days. Details on which versions are part of which distributions can be found on the Community Packages page in the documentation. The release notes are part of the git repository and the downloadable tarball, and are included in this post for easy access.

Release notes for Gluster 3.8.9

This is a bugfix release. The Release Notes for 3.8.0, 3.8.1, 3.8.2, 3.8.3, 3.8.4, 3.8.5, 3.8.6, 3.8.7 and 3.8.8 contain a listing of all the new features that were added and bugs fixed in the GlusterFS 3.8 stable release.

Bugs addressed

A total of 16 patches have been merged, addressing 14 bugs:
  • #1410852: glusterfs-server should depend on firewalld-filesystem
  • #1411899: DHT doesn't evenly balance files on FreeBSD with ZFS
  • #1412119: ganesha service crashed on all nodes of ganesha cluster on disperse volume when doing lookup while copying files remotely using scp
  • #1412888: Extra lookup/fstats are sent over the network when a brick is down.
  • #1412913: [ganesha + EC]posix compliance rename tests failed on EC volume with nfs-ganesha mount.
  • #1412915: Spurious split-brain error messages are seen in rebalance logs
  • #1412916: [ganesha+ec]: Contents of original file are not seen when hardlink is created
  • #1412922: ls and move hung on disperse volume
  • #1412941: Regression caused by enabling client-io-threads by default
  • #1414655: Upcall: Possible memleak if inode_ctx_set fails
  • #1415053: geo-rep session faulty with ChangelogException "No such file or directory"
  • #1415132: Improve output of "gluster volume status detail"
  • #1417802: debug/trace: Print iatts of individual entries in readdirp callback for better debugging experience
  • #1420184: [Remove-brick] Hardlink migration fails with "lookup failed (No such file or directory)" error messages in rebalance logs
by on January 19, 2017

Gluster Community Newsletter, December 2016

Important happenings in Gluster:

Come see us at DevConf and FOSDEM!

Gluster has a big presence at both DevConf.CZ ( as well as FOSDEM! We’ll be exhibiting at FOSDEM with a Gluster stand, and we’ve got an Open Source Software DevRoom. Our schedule for FOSDEM:  


Our weekly community meeting has changed: we’ll be meeting every other week instead of weekly, moving the time to 15:00 UTC, and our agenda is at:

We hope this means that more people can join us. Kaushal outlines the changes on the mailing list:  

Our annual community survey has closed, thanks to everyone who participated!

We’ll be posting the results as part of the official January newsletter, along with recordings of the talks at DevConf and FOSDEM.


Upcoming talks:

DevConf: Hyper-converged, persistent storage for containers with GlusterFS

FOSDEM: SELinux Support over GlusterFS ( )

Hyper-converged, persistent storage for containers with GlusterFS (


Noteworthy threads:


A lovely holiday gift from Lindsay Mathieson about stress testing Gluster

Vladimir asks about GlusterFS best practices  

Aravinda VK shares glustercli-python project updates

Alexandr Porunov asks how to properly set ACLs in GlusterFS

Atin Mukherjee responds to an issue of replica brick not working

Shyam announces 3.10: Feature list frozen

Yonex has questions on file operation failure on simple distributed volume

Shyam has our 3.10 Features Review



Kaushal comments that etherpads and archiving will be going away as of Feb 2017

Hari Gowtham has a  3.10 feature proposal : Volume expansion on tiered volumes   

Samikshan Bairagya has a feature proposal for 3.10 release: Support to retrieve maximum supported op-version  

Prasanna Kalever has a 3.10 feature proposal : Gluster Block Storage CLI Integration

Kaleb Keithley has a 3.10 feature proposal, switch to storhaug for ganesha and samba HA setup

Poornima Gurusiddaiah has a 3.10 feature proposal : Parallel readdirp



Michael Scherer announces that salt is no longer used in infra:   

Gluster Top 5 Contributors in December: 

Niels de Vos, Mohammed Rafi KC,  Kaleb Keithley, Soumya Koduri, Sakshi Bansal


Upcoming CFPs:

Open Source Summit Japan (Mar 4)

LinuxCon + ContainerCon + CloudOpen China (Mar 18)  

Open Source Summit North America (LinuxCon + ContainerCon + CloudOpen + Community Leadership Conference) (May 6)


by on January 15, 2017

Another Gluster 3.8 Long-Term-Maintenance update with the 3.8.8 release

The Gluster team has been busy over the end-of-year holidays, and this latest update to the 3.8 Long-Term-Maintenance release fixes quite a number of bugs. Packages have been built for many different distributions and are available from the download server. The release notes for 3.8.8 are included below for ease of reference. All users on the 3.8 version are recommended to update to this current release.

Release notes for Gluster 3.8.8

This is a bugfix release. The Release Notes for 3.8.0, 3.8.1, 3.8.2, 3.8.3, 3.8.4, 3.8.5, 3.8.6 and 3.8.7 contain a listing of all the new features that were added and bugs fixed in the GlusterFS 3.8 stable release.

Bugs addressed

A total of 38 patches have been merged, addressing 35 bugs:
  • #1375849: [RFE] enable sharding with virt profile - /var/lib/glusterd/groups/virt
  • #1378384: log level set in glfs_set_logging() does not work
  • #1378547: Asynchronous Unsplit-brain still causes Input/Output Error on system calls
  • #1389781: build: python on Debian-based dists use .../lib/python2.7/dist-packages instead of .../site-packages
  • #1394635: errors appear in brick and nfs logs and getting stale files on NFS clients
  • #1395510: Seeing error messages [snapview-client.c:283:gf_svc_lookup_cbk] and [dht-helper.c:1666:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/
  • #1399423: GlusterFS client crashes during remove-brick operation
  • #1399432: A hard link is lost during rebalance+lookup
  • #1399468: Wrong value in Last Synced column during Hybrid Crawl
  • #1399915: [SAMBA-CIFS] : IO hungs in cifs mount while graph switch on & off
  • #1401029: OOM kill of nfs-ganesha on one node while fs-sanity test suite is executed.
  • #1401534: fuse mount point not accessible
  • #1402697: glusterfsd crashed while taking snapshot using scheduler
  • #1402728: Worker restarts on log-rsync-performance config update
  • #1403109: Crash of glusterd when using long username with geo-replication
  • #1404105: Incorrect incrementation of volinfo refcnt during volume start
  • #1404583: Upcall: Possible use after free when log level set to TRACE
  • #1405004: [Perf] : pcs cluster resources went into stopped state during Multithreaded perf tests on RHGS layered over RHEL 6
  • #1405130: `gluster volume heal split-brain' does not heal if data/metadata/entry self-heal options are turned off
  • #1405450: tests/bugs/snapshot/bug-1316437.t test is causing spurious failure
  • #1405577: [GANESHA] failed to create directory of hostname of new node in var/lib/nfs/ganesha/ in already existing cluster nodes
  • #1405886: Fix potential leaks in INODELK cbk in protocol/client
  • #1405890: Fix spurious failure in bug-1402841.t-mt-dir-scan-race.t
  • #1405951: NFS-Ganesha:Volume reset for any option causes reset of ganesha enable option and bring down the ganesha services
  • #1406740: Fix spurious failure in tests/bugs/replicate/bug-1402730.t
  • #1408414: Remove-brick rebalance failed while rm -rf is in progress
  • #1408772: [Arbiter] After Killing a brick writes drastically slow down
  • #1408786: with granular-entry-self-heal enabled i see that there is a gfid mismatch and vm goes to paused state after migrating to another host
  • #1410073: Fix failure of split-brain-favorite-child-policy.t in CentOS7
  • #1410369: Dict_t leak in dht_migration_complete_check_task and dht_rebalance_inprogress_task
  • #1410699: [geo-rep]: Config commands fail when the status is 'Created'
  • #1410708: glusterd/geo-rep: geo-rep config command leaks fd
  • #1410764: Remove-brick rebalance failed while rm -rf is in progress
  • #1411011: atime becomes zero when truncating file via ganesha (or gluster-NFS)
  • #1411613: Fix the place where graph switch event is logged
by on November 30, 2016

Gluster Community Newsletter, November 2016

Important happenings for Gluster this month:

Gluster 3.9 is out!


Gluster’s Annual Community Survey is open until December 9th!

Results to come out in the December newsletter.


Gluster is supporting the Software Defined Storage DevRoom at FOSDEM, speakers to be announced by December 11th, 2016.


We also have a stand at FOSDEM, with a call for volunteers:



Lindsay Mathieson asks for suggestions around Improving IOPS

Thomas Wakefield requests assistance on implementing Gluster in a university setting

ML Wong asks for details around

Andrew Boag promotes a Recent Gluster-related talk at OpenStack summit

Dan-Joe Lopez posts about Automation of single server addition to replica

songxin has a question about info and info.tmp

Saravanakumar Arumugam asks for help with FSFE pads to github wiki / alternative etherpad – info. required

ML Wong encounters an issue with 3.7.16 with sharding corrupting VMDK files when adding and removing bricks

Olivier Lambert has a question around corruption using gluster and iSCSI with LIO

Alexandr Porunov is interested in enabling shared_storage

Abhishek Paliwal has an issue with duplicate UUID entries in “gluster peer status” command

Shyam announces release 3.10 schedule



Nokia et al posts about an issue where the size of fstat is less than the size of the syslog file –

Jonathan Holloway has an update on Glusto-tests and libraries  –

Nigel asks for assistance with NFS Debugging for Glusto tests and Glusto help in general

Raghavendra Gowdappa requests Feedback on DHT option “cluster.readdir-optimize”

Abhishek Paliwal has a question around getting “Transport endpoint is not connected” in glusterfs mount log file

Sander Eikelenboom asks: Is it possible to turn an existing filesystem (with data) into a GlusterFS brick ?

Humble Devassy Chirammal announces a Container Repo Change + > 50K downloads of Gluster Container images

Lindsay Mathieson has a Feature Request: Lock Volume Settings

Kaushal M announces a new dev release for GlusterD2 – v4.0dev-3

Ankireddypalle Reddy requests Hole punch support




Shyam notes a FB: Branch creation


Top 5 contributors:


Nigel Babu, Krutika Dhananjay, Pranith Kumar K, Poornima Gurusiddaiah, Rajesh Joseph


Upcoming CFPs:


KubeCon Europe – December 16th –

Vault  – January 14th –

Incontro DevOps – January 8th –

Red Hat Summit  – December 16th –


by on November 23, 2016

Announcing Gluster 3.9

The Gluster community is pleased to announce the release of Gluster 3.9.

This is a major release that includes a number of changes. Many improvements contribute to better support of Gluster with containers and running your storage on the same server as your hypervisors. Additionally, we’ve focused on integrating with other projects in the open source ecosystem. This release marks the completion of maintenance releases for Gluster 3.6. Moving forward, Gluster versions 3.9, 3.8 and 3.7 are all actively maintained.

Our release notes are included below, including a full list of bugs fixed and a link to our upgrade guide.

Major changes and features

Introducing reset-brick command

Notes for users:
The reset-brick command provides support to reformat/replace the disk(s)
represented by a brick within a volume. This is helpful when a disk goes bad, etc.

Start reset process –

gluster volume reset-brick VOLNAME HOSTNAME:BRICKPATH start

The above command kills the respective brick process. Now the brick can be reformatted.

To restart the brick after modifying the configuration: if the brick was killed in order to replace it with the same brick path, restart it with the following command –

gluster volume reset-brick VOLNAME HOSTNAME:BRICKPATH HOSTNAME:BRICKPATH commit force


  1. Resetting a brick kills the brick process concerned. During this
    period the brick will not be available for I/O.
  2. Replacing a brick with this command will work only if both brick paths
    are the same and belong to the same volume.
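The whole flow can be sketched as follows; the volume, host, brick path, device and filesystem are all hypothetical:

```shell
# 1. Kill the brick process so the disk can be serviced
gluster volume reset-brick myvol server1:/bricks/brick1 start

# 2. Reformat or replace the bad disk (site-specific; example only)
mkfs.xfs -f /dev/sdb
mount /dev/sdb /bricks/brick1

# 3. Restart the brick on the same path
gluster volume reset-brick myvol server1:/bricks/brick1 server1:/bricks/brick1 commit force
```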

Get node level status of a cluster

Notes for users:
The get-state command provides node level status of a trusted storage pool from
the point of view of glusterd in a parseable format. Using get-state command,
external applications can invoke the command on all nodes of the cluster, and
parse and collate the data obtained from all these nodes to get a complete
picture of the state of the cluster.

# gluster get-state <glusterd> [odir <path/to/output/dir] [file <filename>]

This would dump data points that reflect the local state representation of the
cluster as maintained in glusterd (no other daemons are supported as of now)
to a file inside the specified output directory. The default output directory
and filename is /var/run/gluster and glusterd_state_<timestamp> respectively.

Following are the sections in the output:

  1. Global: UUID and op-version of glusterd
  2. Global options: Displays cluster specific options that have been set
    explicitly through the volume set command.
  3. Peers: Displays the peer node information including its hostname and
    connection status
  4. Volumes: Displays the list of volumes created on this node along with
    detailed information on each volume.
  5. Services: Displays the list of the services configured on this node along
    with their corresponding statuses.


  1. This only supports glusterd.
  2. It does not provide the complete cluster state; data needs to be collated from all
    nodes by an external application to get the complete cluster state.
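An example invocation with an explicit output directory and filename (both arguments are optional; the paths are illustrative):

```shell
gluster get-state glusterd odir /tmp/ file node-state

# Inspect the dumped local state representation
cat /tmp/node-state
```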

Multi threaded self-heal for Disperse volumes

Notes for users:
Users now have the ability to configure multi-threaded self-heal in disperse volumes using the following commands:

Option below can be used to control number of parallel heals in SHD
# gluster volume set <volname> disperse.shd-max-threads [1-64] # default is 1
Option below can be used to control number of heals that can wait in SHD
# gluster volume set <volname> disperse.shd-wait-qlength [1-65536] # default is 1024

Hardware extension acceleration in Disperse volumes

Notes for users:
If the user has hardware with special instructions which can be used in erasure code calculations on the client, it will be used automatically. At the moment this support is added for the cpu-extensions: x64, sse, avx

Lock revocation feature

Notes for users:

  1. Motivation: Prevents cluster instability caused by misbehaving clients that cause bricks to OOM due to inode/entry lock pile-ups.
  2. Adds option to strip clients of entry/inode locks after N seconds
  3. Adds option to clear ALL locks should the revocation threshold get hit
  4. Adds option to clear all or granted locks should the max-blocked threshold get hit (can be used in combination w/ revocation-clear-all).
  5. Adds logging to indicate revocation event & reason
  6. Options are:
# gluster volume set <volname> features.locks-revocation-secs <integer; 0 to disable>
# gluster volume set <volname> features.locks-revocation-clear-all [on/off]
# gluster volume set <volname> features.locks-revocation-max-blocked <integer>
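For example, to revoke locks held longer than 60 seconds and clear all of the offending client’s locks when the threshold is hit (myvol is a placeholder and the values are illustrative):

```shell
gluster volume set myvol features.locks-revocation-secs 60
gluster volume set myvol features.locks-revocation-clear-all on
```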

On demand scrubbing for Bitrot Detection:

Notes for users: With the ‘ondemand’ scrub option, you don’t need to wait for the scrub-frequency
to expire. As the option name itself says, the scrubber can be initiated on demand to detect
corruption. If the scrubber is already running, this option is a no-op.

# gluster volume bitrot <volume-name> scrub ondemand

Improvements in Gluster NFS-Ganesha integration

Notes for users:
With this release the major change is to store all the ganesha related configuration files in the shared storage volume mount point instead of keeping a separate local copy in the ‘/etc/ganesha’ folder on each node.

For new users, before enabling nfs-ganesha

  1. create a directory named nfs-ganesha in the shared storage mount point (/var/run/gluster/shared_storage/)
  2. Create ganesha.conf & ganesha-ha.conf in that directory with the required details filled in.

For existing users, before starting the nfs-ganesha service do the following:

  1. Copy all the contents of the /etc/ganesha directory (including the .export_added file) to /var/run/gluster/shared_storage/nfs-ganesha from any of the ganesha nodes
  2. Create a symlink at /etc/ganesha/ganesha.conf on each node in the ganesha cluster pointing to /var/run/gluster/shared_storage/nfs-ganesha/ganesha.conf
  3. Change the path for each export entry in the ganesha.conf file
Example: if a volume "test" was exported, then ganesha.conf shall have below export entry -
 %include "/etc/ganesha/exports/export.test.conf" export entry.
Change that line to
 %include "/var/run/gluster/shared_storage/nfs-ganesha/exports/export.test.conf"
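The migration steps above could be sketched like this; the paths follow the defaults mentioned above and the exported volume name "test" is only an example:

```shell
# On one existing ganesha node: copy the configuration, including hidden files
cp -a /etc/ganesha/. /var/run/gluster/shared_storage/nfs-ganesha/

# On each node of the ganesha cluster: symlink the shared ganesha.conf
mv /etc/ganesha/ganesha.conf /etc/ganesha/ganesha.conf.bak
ln -s /var/run/gluster/shared_storage/nfs-ganesha/ganesha.conf /etc/ganesha/ganesha.conf

# Rewrite the export include paths in the shared ganesha.conf
sed -i 's|/etc/ganesha/exports|/var/run/gluster/shared_storage/nfs-ganesha/exports|' \
    /var/run/gluster/shared_storage/nfs-ganesha/ganesha.conf
```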

In addition, the following changes have been made –

  • The entity “HA_VOL_SERVER= ” in ganesha-ha.conf is no longer required.
  • A new resource-agent called portblock (available in >= resource-agents-3.9.5 package) is added to the cluster configuration to speed up the nfs-client connections post IP failover or failback. This may be noticed while looking at the cluster configuration status using the command pcs status.

Availability of python bindings to libgfapi

The official python bindings for the GlusterFS libgfapi C library interface are
mostly API complete. The complete API reference and documentation can be
found at

The python bindings have been packaged and made available over

Small file improvements in Gluster with md-cache (Experimental)

Notes for users:
With this release, the metadata cache on the client side is integrated with the
cache-invalidation feature so that clients can cache longer without
compromising consistency. By enabling the metadata cache and cache
invalidation feature and extending the cache timeout to 600s, we have seen
performance improvements in metadata operations like creates, ls/stat, chmod,
rename and delete. The performance improvements are significant in SMB access of gluster
volumes, but as a cascading effect the improvements are also seen in FUSE/native
access and NFS access.

Use the below options in the order mentioned, to enable the features:

  # gluster volume set <volname> features.cache-invalidation on
  # gluster volume set <volname> features.cache-invalidation-timeout 600
  # gluster volume set <volname> performance.stat-prefetch on
  # gluster volume set <volname> performance.cache-invalidation on
  # gluster volume set <volname> performance.cache-samba-metadata on     # Only for SMB access
  # gluster volume set <volname> performance.md-cache-timeout 600

Real time Cluster notifications using Events APIs

Let us imagine we have a Gluster monitoring system which displays a
list of volumes and their state. To show realtime status, the monitoring
app needs to query Gluster at a regular interval to check volume
status, new volumes, etc. Assume the polling interval is 5 seconds;
then the monitoring app has to run the gluster volume info command ~17000
times a day!

With the Gluster 3.9 release, Gluster provides close to realtime
notifications and alerts for Gluster cluster state changes. Webhooks
can be registered to listen to events emitted by Gluster. More details
about this new feature are available in the Events APIs Guide.
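As a sketch, a webhook can be registered with the gluster-eventsapi tool (the listener URL is hypothetical):

```shell
# Register a webhook that will receive JSON event notifications
gluster-eventsapi webhook-add http://monitor.example.com/listener

# Check that the events service and the webhook are active on all nodes
gluster-eventsapi status
```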

Geo-replication improvements

Documentation improvements:

Upstream documentation has been rewritten to reflect the latest version of
Geo-replication, and stale/duplicate documentation has been removed. We are
still working on adding Troubleshooting and cluster expand/shrink notes
to it. The latest version of the documentation is available in the Geo Replication Guide.

Geo-replication Events are available for Events API consumers:

Events APIs is a new Gluster feature available with the 3.9 release;
most of the events from Geo-replication have been added to eventsapi.

Read more about the Events APIs and Geo-replication events in the Events APIs Guide.

New simplified command to setup Non root Geo-replication

Non root Geo-replication setup was not easy, with multiple manual
steps. The Non root Geo-replication steps are now simplified. Read more about
the new steps in the Admin Guide (Geo Replication/#slave-user-setup).

New command to generate SSH keys(Alternative command to gsec_create)

The gluster system:: execute gsec_create command generates ssh keys on
every Master cluster node and copies them to the initiating node. This
command silently ignores errors if any node is down in the cluster; it
will not collect SSH keys from that node. When the Geo-rep create
push-pem command is issued, it copies public keys only from those nodes
which were up during gsec_create. This causes Geo-rep to go Faulty when
such a master node tries to make the connection to slave nodes. With the
new command, the output shows if any Master node was down while
generating ssh keys. Read more about gluster-georep-sshkey in the Geo
Replication Guide (#setting-up-the-environment-for-geo-replication).
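A sketch of the new workflow, run from any master node:

```shell
# Generates SSH keys on all master nodes and reports any node that is down
gluster-georep-sshkey generate
```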

Logging improvements

New logs have been added; from the logs we can now clearly understand what is
going on. Note: This feature may change the logging format of existing log
messages. Please update your parsers if they are used to parse Geo-rep logs.


New Configuration options available: changelog-log-level

All the changelog related log messages are logged in
/var/log/glusterfs/geo-replication/<SESSION>/*.changes.log on Master
nodes. The log level was hard coded as TRACE for Changelog logs. A new
configuration option is provided to modify the changelog log level; it
defaults to INFO.
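To change it for a session, the usual geo-replication config interface applies (the master volume and slave names are placeholders):

```shell
gluster volume geo-replication mastervol slavehost::slavevol config changelog-log-level DEBUG
```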

Behavior changes

  • #1221623: Earlier the ports GlusterD
    used to allocate for daemons like brick processes, quotad, shd et al.
    were persistent through the volume’s life cycle, so every restart of the
    process(es) or a node reboot would try to use the same ports which were
    allocated the first time. From release-3.9 onwards, GlusterD will try to
    allocate a fresh port once a daemon is restarted or the node is rebooted.
  • #1348944: with 3.9 release the default
    log file for glusterd has been renamed to glusterd.log from

Known Issues

  • #1387878: add-brick on a vm-store
    configuration which has sharding enabled leads to vm corruption. To work
    around this issue, one can scale up by creating more volumes until this issue
    is fixed.

Bugs addressed

A total of 571 patches have been sent, addressing 422 bugs:

  • #762184: Support mandatory locking in glusterfs
  • #789278: Issues reported by Coverity static analysis tool
  • #1005257: [PATCH] Small typo fixes
  • #1175711: posix: Set correct d_type for readdirp() calls
  • #1193929: GlusterFS can be improved
  • #1198849: Minor improvements and cleanup for the build system
  • #1200914: pathinfo is wrong for striped replicated volumes
  • #1202274: Minor improvements and code cleanup for libgfapi
  • #1207604: [rfe] glusterfs snapshot cli commands should provide xml output.
  • #1211863: RFE: Support in md-cache to use upcall notifications to invalidate its cache
  • #1221623: glusterd: add brick command should re-use the port for listening which is freed by remove-brick.
  • #1222915: usage text is wrong for use-readdirp mount default
  • #1223937: Outdated autotools helper config.* files
  • #1225718: [FEAT] DHT – rebalance – rebalance status o/p should be different for ‘fix-layout’ option, it should not show ‘Rebalanced-files’ , ‘Size’, ‘Scanned’ etc as it is not migrating any files.
  • #1227667: Minor improvements and code cleanup for protocol server/client
  • #1228142: clang-analyzer: adding clang static analysis support
  • #1231224: Misleading error messages on brick logs while creating directory (mkdir) on fuse mount
  • #1236009: do an explicit lookup on the inodes linked in readdirp
  • #1254067: remove unused variables
  • #1266876: cluster/afr: AFR2 returns empty readdir results to clients if brick is added back into cluster after re-imaging/formatting
  • #1278325: DHT: Once remove brick start failed in between Remove brick commit should not be allowed
  • #1285152: store afr pending xattrs as a volume option
  • #1292020: quota: client gets IO error instead of disk quota exceed when the limit is exceeded
  • #1294813: [geo-rep]: Multiple geo-rep session to the same slave is allowed for different users
  • #1296043: Wrong usage of dict functions
  • #1302277: Wrong XML output for Volume Options
  • #1302948: tar complains: <fileName>: file changed as we read it
  • #1303668: packaging: rpmlint warning and errors – Documentation URL 404
  • #1305031: AFR winds a few reads of a file in metadata split-brain.
  • #1306398: Tiering and AFR may result in data loss
  • #1311002: NFS+attach tier:IOs hang while attach tier is issued
  • #1311926: [georep]: If a georep session is recreated the existing files which are deleted from slave doesn’t get sync again from master
  • #1315666: Data Tiering:tier volume status shows as in-progress on all nodes of a cluster even if the node is not part of volume
  • #1316178: changelog/rpc: Memory leak- rpc_clnt_t object is never freed
  • #1316389: georep: tests for logrotate, create+rename and hard-link rename
  • #1318204: Input / Output when chmoding files on NFS mount point
  • #1318289: [RFE] Add arbiter brick hotplug
  • #1318591: Glusterd not operational due to snapshot conflicting with nfs-ganesha export file in “/var/lib/glusterd/snaps”
  • #1319992: RFE: Lease support for gluster
  • #1320388: [GSS]-gluster v heal volname info does not work with enabled ssl/tls
  • #1321836: gluster volume info –xml returns 0 for nonexistent volume
  • #1322214: [HC] Add disk in a Hyper-converged environment fails when glusterfs is running in directIO mode
  • #1322805: [scale] Brick process does not start after node reboot
  • #1322825: IO-stats, client profile is overwritten when it is on the same node as bricks
  • #1324439: SAMBA+TIER : Wrong message display.On detach tier success the message reflects Tier command failed.
  • #1325831: gluster snap status xml output shows incorrect details when the snapshots are in deactivated state
  • #1326410: /var/lib/glusterd/$few-directories not owned by any package, causing it to remain after glusterfs-server is uninstalled
  • #1327171: Disperse: Provide description of disperse.eager-lock option.
  • #1328224: RFE : Feature: Automagic unsplit-brain policies for AFR
  • #1329211: values for Number of Scrubbed files, Number of Unsigned files, Last completed scrub time and Duration of last scrub are shown as zeros in bit rot scrub status
  • #1330032: rm -rf to a dir gives directory not empty(ENOTEMPTY) error
  • #1330097: ganesha exported volumes doesn’t get synced up on shutdown node when it comes up.
  • #1330583: glusterfs-libs postun ldconfig: relative path `1’ used to build cache
  • #1331254: Disperse volume fails on high load and logs show some assertion failures
  • #1331287: No xml output on gluster volume heal info command with –xml
  • #1331323: [Granular entry sh] – Implement renaming of indices in index translator
  • #1331423: distaf: Add io_libs to namespace package list
  • #1331720: implement meta-lock/unlock for lock migration
  • #1331721: distaf: Add README and HOWTO to distaflibs as well
  • #1331860: Wrong constant used in length based comparison for XATTR_SECURITY_PREFIX
  • #1331969: Ganesha+Tiering: Continuous “0-glfs_h_poll_cache_invalidation: invalid argument” messages getting logged in ganesha-gfapi logs.
  • #1332020: multiple regression failures for tests/basic/quota-ancestry-building.t
  • #1332021: multiple failures for testcase: tests/basic/inode-quota-enforcing.t
  • #1332054: multiple failures of tests/bugs/disperse/bug-1236065.t
  • #1332073: EINVAL errors while aggregating the directory size by quotad
  • #1332134: bitrot: Build generates Compilation Warning.
  • #1332136: Detach tier fire before the background fixlayout is complete may result in failure
  • #1332156: SMB:while running I/O on cifs mount and doing graph switch causes cifs mount to hang.
  • #1332219: tier: avoid pthread_join if pthread_create fails
  • #1332413: Wrong op-version for mandatory-locks volume set option
  • #1332419: geo-rep: address potential leak of memory
  • #1332460: [features/worm] – when disabled, worm xl should simply pass requested fops to its child xl
  • #1332465: glusterd + bitrot : Creating clone of snapshot. error “xlator.c:148:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.9/xlator/features/ cannot open shared object file:
  • #1332473: tests: ‘tests/bitrot/br-state-check.t’ fails in netbsd
  • #1332501: Mandatory locks are not migrated during lock migration
  • #1332566: [granular entry sh] – Add more tests
  • #1332798: [AFR]: “volume heal info” command is failing during in-service upgrade to latest.
  • #1332822: distaf: Add library functions for gluster snapshot operations
  • #1332885: distaf: Add library functions for gluster bitrot operations and generic library utility functions generic to all components
  • #1332952: distaf: Add library functions for gluster quota operations
  • #1332994: Self Heal fails on a replica3 volume with ‘disk quota exceeded’
  • #1333023: readdir-ahead does not fetch xattrs that md-cache needs in it’s internal calls
  • #1333043: Fix excessive logging due to NULL dict in dht
  • #1333263: [features/worm] Unwind FOPs with op_errno and add gf_worm prefix to functions
  • #1333317: rpc_clnt will sometimes not reconnect when using encryption
  • #1333319: Unexporting a volume sometimes fails with “Dynamic export addition/deletion failed”.
  • #1333370: [FEAT] jbr-server handle lock/unlock fops
  • #1333738: distaf: Add GlusterBaseClass ( to distaflibs-gluster.
  • #1333912: client ID should logged when SSL connection fails
  • #1333925: libglusterfs: race conditions and illegal mem access in timer
  • #1334044: [RFE] Eventing for Gluster
  • #1334164: Worker dies with [Errno 5] Input/output error upon creation of entries at slave
  • #1334208: distaf: Add library functions for gluster rebalance operations
  • #1334269: GlusterFS 3.8 fails to build in the CentOS Community Build System
  • #1334270: glusterd: glusterd provides stale port information when a volume is recreated with same brick path
  • #1334285: Under high read load, sometimes the message “XDR decoding failed” appears in the logs and read fails
  • #1334314: changelog: changelog_rollover breaks when number of fds opened is more than 1024
  • #1334444: SAMBA-VSS : Permission denied issue while restoring the directory from windows client 1 when files are deleted from windows client 2
  • #1334620: stop all gluster processes should also include glusterfs mount process
  • #1334621: set errno in case of inode_link failures
  • #1334721: distaf: Add library functions for gluster tiering operations
  • #1334839: [Tiering]: Files remain in hot tier even after detach tier completes
  • #1335019: Add graph for decompounder xlator
  • #1335091: mount/fuse: Logging improvements
  • #1335231: features/locks: clang compile warning in posix.c
  • #1335232: features/index: clang compile warnings in index.c
  • #1335429: Self heal shows different information for the same volume from each node
  • #1335494: Modifying peer ops library
  • #1335531: Modified volume options are not syncing once glusterd comes up.
  • #1335652: Heal info shows split-brain for .shard directory though only one brick was down
  • #1335717: PREFIX is not honoured during build and install
  • #1335776: rpc: change client insecure port ceiling from 65535 to 49151
  • #1335818: Revert “features/shard: Make o-direct writes work with sharding:
  • #1335858: Files present in the .shard folder even after deleting all the vms from the UI
  • #1335973: [Tiering]: The message ‘Max cycle time reached…exiting migration’ incorrectly displayed as an ‘error’ in the logs
  • #1336197: failover is not working with latest builds.
  • #1336328: [FEAT] jbr: Improve code modularity
  • #1336354: Provide a way to configure gluster source location in devel-vagrant
  • #1336373: Distaf: Add gluster specific config file
  • #1336381: ENOTCONN error during parallel rmdir
  • #1336508: rpc-transport: compiler warning format string
  • #1336612: one of vm goes to paused state when network goes down and comes up back
  • #1336630: ERROR and Warning message on writing a file from mount point “null gfid for path (null)” repeated 3 times between”
  • #1336642: [RFE] git-branch-diff: wrapper script for git to visualize backports
  • #1336698: DHT : few Files are not accessible and not listed on mount + more than one Directory have same gfid + (sometimes) attributes has ?? in ls output after renaming Directories from multiple client at same time
  • #1336793: assorted typos and spelling mistakes from Debian lintian
  • #1336818: Add ability to set oom_score_adj for glusterfs process
  • #1336853: scripts: bash-isms in scripts
  • #1336945: [NFS-Ganesha] : stonith-enabled option not set with new versions of cman,pacemaker,corosync and pcs
  • #1337160: distaf: Added libraries to setup nfs-ganesha in gluster through distaf
  • #1337227: [tiering]: error message shown during the failure of detach tier commit isn’t intuitive
  • #1337405: Some of VMs go to paused state when there is concurrent I/O on vms
  • #1337473: upgrade path when slave volume uuid used in geo-rep session
  • #1337597: Mounting a volume over NFS with a subdir followed by a / returns “Invalid argument”
  • #1337650: log flooded with Could not map name=xxxx to a UUID when config’d with long hostnames
  • #1337777: tests/bugs/write-behind/1279730.t fails spuriously
  • #1337791: tests/basic/afr/tarissue.t fails regression
  • #1337899: Misleading error message on rebalance start when one of the glusterd instance is down
  • #1338544: fuse: In fuse_first_lookup(), dict is not un-referenced in case create_frame returns an empty pointer.
  • #1338634: AFR : fuse,nfs mount hangs when directories with same names are created and deleted continuously
  • #1338733: __inode_ctx_put: fix mem leak on failure
  • #1338967: common-ha: ganesha.nfsd not put into NFS-GRACE after fail-back
  • #1338991: DHT2: Tracker bug
  • #1339071: dht/rebalance: mark hardlink migration failure as skipped for rebalance process
  • #1339149: Error and warning messages related to xlator/features/ adding up to the client log on performing IO operations
  • #1339166: distaf: Added timeout value to wait for rebalance to complete and removed older rebalance library file
  • #1339181: Full heal of a sub-directory does not clean up name-indices when granular-entry-heal is enabled.
  • #1339214: gfapi: set mem_acct for the variables created for upcall
  • #1339471: [geo-rep]: Worker died with [Errno 2] No such file or directory
  • #1339472: [geo-rep]: Monitor crashed with [Errno 3] No such process
  • #1339541: Added libraries to setup CTDB in gluster through distaf
  • #1339553: gfapi: in case of handle based APIs, close glfd after successful create
  • #1339689: RFE – capacity info (df -h on a mount) is incorrect for a tiered volume
  • #1340488: does not have a correct shebang
  • #1340623: Directory creation(mkdir) fails when the remove brick is initiated for replicated volumes accessing via nfs-ganesha
  • #1340853: [geo-rep]: If the session is renamed, geo-rep configuration are not retained
  • #1340936: Automount fails because /sbin/mount.glusterfs does not accept the -s option
  • #1341007: gfapi : throwing warning message for unused variable in glfs_h_find_handle()
  • #1341009: Log parameters such as the gfid, fd address, offset and length of the reads upon failure for easier debugging
  • #1341294: build: RHEL7 unpackaged files /var/lib/glusterd/hooks/…/S57glusterfind-delete-post.{pyc,pyo}
  • #1341474: [geo-rep]: Snapshot creation having geo-rep session is broken
  • #1341650: conservative merge happening on a x3 volume for a deleted file
  • #1341768: After setting up ganesha on RHEL 6, nodes remains in stopped state and grace related failures observed in pcs status
  • #1341796: [quota+snapshot]: Directories are inaccessible from activated snapshot, when the snapshot was created during directory creation
  • #1342171: O_DIRECT support for sharding
  • #1342259: [features/worm] – write FOP should pass for the normal files
  • #1342298: reading file with size less than 512 fails with odirect read
  • #1342356: [RFE] Python library for creating Cluster aware CLI tools for Gluster
  • #1342420: [georep]: Stopping volume fails if it has geo-rep session (Even in stopped state)
  • #1342796: self heal deamon killed due to oom kills on a dist-disperse volume using nfs ganesha
  • #1342979: [geo-rep]: Add-Brick use case: create push-pem force on existing geo-rep fails
  • #1343038: IO ERROR when multiple graph switches
  • #1343286: enabling glusternfs with nfs.rpc-auth-allow to many hosts failed
  • #1343333: [RFE] Simplify Non Root Geo-replication Setup
  • #1343374: Gluster fuse client crashed generating core dump
  • #1343838: Implement API to get page aligned iobufs in iobuf.c
  • #1343906: [Stress/Scale] : I/O errors out from gNFS mount points during high load on an erasure coded volume,Logs flooded with Error messages.
  • #1343943: Old documentation link in log during Geo-rep MISCONFIGURATION
  • #1344277: [disperse] mkdir after re balance give Input/Output Error
  • #1344340: Unsafe access to inode->fd_list
  • #1344396: fd leak in disperse
  • #1344407: fail delete volume operation if one of the glusterd instance is down in cluster
  • #1344686: tiering : Multiple brick processes crashed on tiered volume while taking snapshots
  • #1344714: removal of file from nfs mount crashs ganesha server
  • #1344836: [Disperse volume]: IO hang seen on mount with file ops
  • #1344885: inode leak in brick process
  • #1345727: Bricks are starting when server quorum not met.
  • #1345744: [geo-rep]: Worker crashed with “KeyError: “
  • #1345748: SAMBA-DHT : Crash seen while rename operations in cifs mount and windows access of share mount
  • #1345846: quota : rectify quota-deem-statfs default value in gluster v set help command
  • #1345855: Possible crash due to a timer cancellation race
  • #1346138: [RFE] Non root Geo-replication Error logs improvements
  • #1346211: cleanup glusterd-georep code
  • #1346551: wrong understanding of function’s parameter
  • #1346719: [Disperse] dd + rm + ls lead to IO hang
  • #1346821: cli core dumped while providing/not wrong values during arbiter replica volume
  • #1347249: libgfapi : variables allocated by glfs_set_volfile_server is not freed
  • #1347354: glusterd: SuSE build system error for incorrect strcat, strncat usage
  • #1347686: IO error seen with Rolling or non-disruptive upgrade of an distribute-disperse(EC) volume from 3.7.5 to 3.7.9
  • #1348897: Add relative path validation for gluster copy file utility
  • #1348904: [geo-rep]: If the data is copied from .snaps directory to the master, it doesn’t get sync to slave [First Copy]
  • #1348944: Change the glusterd log file name to glusterd.log
  • #1349270: ganesha.enable remains on in volume info file even after we disable nfs-ganesha on the cluster.
  • #1349273: Geo-rep silently ignores config parser errors
  • #1349276: Buffer overflow when attempting to create filesystem using libgfapi as driver on OpenStack
  • #1349284: [tiering]: Files of size greater than that of high watermark level should not be promoted
  • #1349398: nfs-ganesha disable doesn’t delete nfs-ganesha folder from /var/run/gluster/shared_storage
  • #1349657: process glusterd set TCP_USER_TIMEOUT failed
  • #1349709: Polling failure errors getting when volume is started&stopped with SSL enabled setup.
  • #1349723: Added libraries to get server_brick dictionaries
  • #1350017: Change distaf glusterbase class and mount according to the config file changes
  • #1350168: distaf: made changes to create_volume function
  • #1350173: distaf: Adding samba_ops library
  • #1350188: distaf: minor import changes in
  • #1350191: race condition when set ctx->timer in function gf_timer_registry_init
  • #1350237: Gluster/NFS does not accept dashes in hostnames in exports/netgroups files
  • #1350245: distaf: Add library functions for gluster volume operations
  • #1350248: distaf: Modified get_pathinfo function in
  • #1350256: Distaf: Modifying the ctdb_libs to get server host from the server dict
  • #1350258: Distaf: add a sample test case to the framework
  • #1350327: Protocol client not mounting volumes running on older versions.
  • #1350371: ganesha/glusterd : remove ‘HA_VOL_SERVER’ from ganesha-ha.conf
  • #1350383: distaf: Modified distaf gluster config file
  • #1350427: distaf: Modified tier_attach() to get bricks path for attaching tier from the available bricks in server
  • #1350744: GlusterFS 3.9.0 tracker
  • #1350793: build: remove absolute paths from glusterfs spec file
  • #1350867: RFE: FEATURE: Lock revocation for features/locks xlator
  • #1351021: [DHT]: Rebalance info for remove brick operation is not showing after glusterd restart
  • #1351071: [geo-rep] Stopped geo-rep session gets started automatically once all the master nodes are upgraded
  • #1351134: [SSL] : gluster v set help does not show ssl options
  • #1351537: [Bitrot] Need a way to set scrub interval to a minute, for ease of testing
  • #1351880: gluster volume status <volume> client” isn’t showing any information when one of the nodes in a 3-way Distributed-Replicate volume is shut down
  • #1352019: RFE: Move throttling code to libglusterfs from bitrot
  • #1352277: a two node glusterfs seems not possible anymore?!
  • #1352279: [scale]: Bricks not started after node reboot.
  • #1352423: should find_library(“c”) be used instead of find_library(“libc”) in geo-replication/syncdaemon/
  • #1352634: qemu libgfapi clients hang when doing I/O
  • #1352671: RFE: As a part of xattr invalidation, send the stat info as well
  • #1352854: GlusterFS – Memory Leak – High Memory Utilization
  • #1352871: [Bitrot]: Scrub status- Certain fields continue to show previous run’s details, even if the current run is in progress
  • #1353156: [RFE] CLI to get local state representation for a cluster
  • #1354141: several problems found in failure handle logic
  • #1354221: noisy compilation warnning with Wstrict-prototypes
  • #1354372: Fix timing issue in tests/bugs/glusterd/bug-963541.t
  • #1354439: nfs client I/O stuck post IP failover
  • #1354489: service file is executable
  • #1355604: afr coverity fixes
  • #1355628: Upgrade from 3.7.8 to 3.8.1 doesn’t regenerate the volfiles
  • #1355706: [Bitrot]: Sticky bit files considered and skipped by the scrubber, instead of getting ignored.
  • #1355956: RFE : move ganesha related configuration into shared storage
  • #1356032: quota: correct spelling mistakes in quota src files
  • #1356068: observing ” Too many levels of symbolic links” after adding bricks and then issuing a replace brick
  • #1356504: Move gf_log->gf_msg in index feature
  • #1356508: [RFE] Handle errors during SSH key generation(gsec_create)
  • #1356528: memory leak in glusterd-georeplication
  • #1356851: [Bitrot+Sharding] Scrub status shows incorrect values for ‘files scrubbed’ and ‘files skipped’
  • #1356868: File not found errors during rpmbuild: /var/lib/glusterd/hooks/1/delete/post/{c,o}
  • #1356888: Correct code in socket.c to avoid fd leak
  • #1356998: syscalls: readdir_r() is deprecated in newer glibc
  • #1357210: add several fops support in io-threads
  • #1357226: add a basis function to reduce verbose code
  • #1357397: Trash translator fails to create ‘internal_op’ directory under already existing trash directory
  • #1357463: Error: quota context not set inode (gfid:nnn) [Invalid argument]
  • #1357490: libglusterfs : update correct memory segments in glfs-message-id
  • #1357821: Make install fails second time without uninstall
  • #1358114: tests: ./tests/bitrot/br-stub.t fails intermittently
  • #1358195: Fix spurious failure of tests/bugs/glusterd/bug-1111041.t
  • #1358196: Tiering related core observed with “uuid_is_null () message”.
  • #1358244: [SNAPSHOT]: The PID for snapd is displayed even after snapd process is killed.
  • #1358594: Enable gfapi test cases in Gluster upstream regression
  • #1358608: Memory leak observed with upcall polling
  • #1358671: Add Events for Volume Set and Reset
  • #1358922: missunderstanding about GF_PROTOCOL_DICT_SERIALIZE
  • #1358936: coverity: iobuf_get_page_aligned calling iobuf_get2 should check the return pointer
  • #1358944: jbr resource leak, forget free “path”
  • #1358976: Fix spurious failures in split-brain-favorite-child-policy.t
  • #1359001: Fix spurious failures in ec.t
  • #1359190: Glusterd crashes upon receiving SIGUSR1
  • #1359370: glfs: fix glfs_set_volfile_server doc
  • #1359711: [GSS] Rebalance crashed
  • #1359717: Fix failure of ./tests/bugs/snapshot/bug-1316437.t
  • #1360169: Fix bugs in compound fops framework
  • #1360401: RFE: support multiple bricks within one process
  • #1360402: Clients can starve under heavy load
  • #1360647: gfapi: deprecate the rdma support for management connections
  • #1360670: Add output option --xml to man page of gluster
  • #1360679: Bricks doesn’t come online after reboot [ Brick Full ]
  • #1360682: tests: ./tests/bitrot/bug-1244613.t fails intermittently
  • #1360693: [RFE] Add a count of snapshots associated with a volume to the output of the vol info command
  • #1360809: [RFE] Capture events in GlusterD
  • #1361094: Auto generate header files during Make
  • #1361249: posix: leverage FALLOC_FL_ZERO_RANGE in zerofill fop
  • #1361300: Direct io to sharded files fails when on zfs backend
  • #1361678: thread CPU saturation limiting throughput on write workloads
  • #1361983: Move USE_EVENTS in gf_events API
  • #1361999: Remove ganesha xlator code from gluster code base
  • #1362144: Python library to send Events
  • #1362151: [libgfchangelog]: If changelogs are not available for the requested time range, no proper error message
  • #1362397: Mem leak in meta_default_readv in meta xlators
  • #1362520: Per xlator logging not working
  • #1362602: [Open SSL] : Unable to mount an SSL enabled volume via SMB v3/Ganesha v4
  • #1363591: Geo-replication user driven Events
  • #1363721: [HC]: After bringing down and up of the bricks VM’s are getting paused
  • #1363948: Spurious failure in tests/bugs/glusterd/bug-1089668.t
  • #1364026: glfs_fini() crashes with SIGSEGV
  • #1364420: [RFE] History Crawl performance improvement
  • #1364449: posix: honour fsync flags in posix_do_zerofill
  • #1364529: api: revert glfs_ipc_xd intended for 4.0
  • #1365455: [AFR]: Files not available in the mount point after converting Distributed volume type to Replicated one.
  • #1365489: glfs_truncate missing
  • #1365506: gfapi: use const qualifier for glfs_*timens()
  • #1366195: [Bitrot – RFE]: On demand scrubbing option to scrub
  • #1366222: “heal info –xml” not showing the brick name of offline bricks.
  • #1366226: Move alloca0 definition to common-utils
  • #1366284: fix bug in protocol/client lookup callback
  • #1367258: Log EEXIST errors at DEBUG level
  • #1367478: Second gluster volume is offline after daemon restart or server reboot
  • #1367527: core: use <sys/sysmacros.h> for makedev(3), major(3), minor(3)
  • #1367665: rotated FUSE mount log is using to populate the information after log rotate.
  • #1367771: Introduce graceful mode in
  • #1367774: Support for Client side Events
  • #1367815: [Bitrot – RFE]: Bitrot Events
  • #1368042: make fails if Events APIs are disabled
  • #1368349: tests/bugs/cli/bug-1320388.t: Infrequent failures
  • #1368451: [RFE] Implement multi threaded self-heal for ec volumes
  • #1368842: Applications not calling glfs_h_poll_upcall() have upcall events cached for no use
  • #1368882: log level set in glfs_set_logging() does not work
  • #1368931: [ RFE] Quota Events
  • #1368953: spurious netbsd run failures in tests/basic/glusterd/volfile_server_switch.t
  • #1369124: fix unused variable warnings from out-of-tree builds generate XDR headers and source files i…
  • #1369331: Memory leak with a replica 3 arbiter 1 configuration
  • #1369401: NetBSD hangs at /tests/features/lock_revocation.t
  • #1369430: Track the client that performed readdirp
  • #1369432: IATT cache invalidation should be sent when permission changes on file
  • #1369524: segment fault while join thread reaper_thr in fini()
  • #1369530: protocol/server: readlink rsp xdr failed while readlink got an error
  • #1369638: DHT stale layout issue will be seen often with md-cache prolonged cache of lookups
  • #1369721: EventApis will not work if compiled using ./configure –disable-glupy
  • #1370053: fix EXPECT_WITHIN
  • #1370074: Fix mistakes in self-heald.t
  • #1370406: build: eventtypes.h is missing
  • #1370445: Geo-replication server side events
  • #1370862: dht: fix the broken build
  • #1371541: Spurious regressions in ./tests/bugs/gfapi/bug-1093594.t
  • #1371543: Add cache invalidation stat in profile info
  • #1371775: gluster system:: uuid get hangs
  • #1372278: [RFE] Provide snapshot events for the new eventing framework
  • #1372586: Fix the test case
  • #1372686: [RFE]Reducing number of network round trips
  • #1373529: Node remains in stopped state in pcs status with “/usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 137: [: too many arguments ]” messages in logs.
  • #1373735: Event pushed even if Answer is No in the Volume Stop and Delete prompt
  • #1373740: [RFE]: events from protocol server
  • #1373743: [RFE]: AFR events
  • #1374153: [RFE] History Crawl performance improvement
  • #1374167: disperse: Integrate important events with events framework
  • #1374278: rpc/xdr: generated files are filtered with a sed extended regex
  • #1374298: “gluster vol status all clients –xml” doesn’t generate xml if there is a failure in between
  • #1374324: [RFE] Tier Events
  • #1374567: [Bitrot]: Recovery fails of a corrupted hardlink (and the corresponding parent file) in a disperse volume
  • #1374581: Geo-rep worker Faulty with OSError: [Errno 21] Is a directory
  • #1374597: [geo-rep]: AttributeError: ‘Popen’ object has no attribute ‘elines’
  • #1374608: geo-replication *changes.log does not respect the log-level configured
  • #1374626: Worker crashes with EINVAL errors
  • #1374630: [geo-replication]: geo-rep Status is not showing bricks from one of the nodes
  • #1374639: glusterfs: create a directory with 0464 mode return EIO error
  • #1374649: Support for rc.d and init for Service management
  • #1374841: Implement SIMD support on EC
  • #1375042: bug-963541.t spurious failure
  • #1375537: gf_event python fails with ImportError
  • #1375543: [geo-rep]: defunct tar process while using tar+ssh sync
  • #1375570: Detach tier commit is allowed when detach tier start goes into failed state
  • #1375914: posix: Integrate important events with events framework
  • #1376331: Rpm installation fails with conflicts error for eventsconfig.json file
  • #1376396: /var/tmp/rpm-tmp.KPCugR: line 2: /bin/systemctl: No such file or directory
  • #1376477: [RFE] DHT Events
  • #1376874: RFE : move ganesha related configuration into shared storage
  • #1377288: The GlusterFS Callback RPC-calls always use RPC/XID 42
  • #1377386: glusterd experiencing repeated connect/disconnect messages when shd is down
  • #1377570: EC: Set/unset dirty flag for all the update operations
  • #1378814: Files not being opened with o_direct flag during random read operation (Glusterfs 3.8.2)
  • #1378948: removal of file from nfs mount crashes ganesha server
  • #1379028: Modifications to AFR Events
  • #1379287: warning messages seen in glusterd logs for each ‘gluster volume status’ command
  • #1379528: Poor smallfile read performance on Arbiter volume compared to Replica 3 volume
  • #1379707: gfapi: Fix fd ref leaks
  • #1379996: Volume restart couldn’t re-export the volume exported via ganesha.
  • #1380252: glusterd fails to start without installing glusterfs-events package
  • #1383591: glfs_realpath() should not return malloc()’d allocated memory
  • #1383692: GlusterFS fails to build on old Linux distros with linux/oom.h missing
  • #1383913: spurious heal info as pending heal entries never end on an EC volume while IOs are going on
  • #1385224: arbiter volume write performance is bad with sharding
  • #1385236: invalid argument warning messages seen in fuse client logs 2016-09-30 06:34:58.938667] W [dict.c:418:dict_set] (–>/usr/lib64/glusterfs/3.8.4/xlator/cluster/ 0-dict: !this || !value for key=link-count [Invalid argument]
  • #1385451: “nfs.disable: on” is not showing in Vol info by default for the 3.7.x volumes after updating to 3.9.0
  • #1386072: Spurious permission denied problems observed
  • #1386178: eventsapi/georep: Events are not available for Checkpoint and Status Change
  • #1386338: pmap_signin event fails to update brickinfo->signed_in flag
  • #1387099: Boolean attributes are published as string
  • #1387492: Error and warning message getting while removing glusterfs-events package
  • #1387502: Incorrect volume type in the “glusterd_state” file generated using CLI “gluster get-state”
  • #1387564: [Eventing]: UUID is showing zeros in the event message for the peer probe operation.
  • #1387894: Regression caused by enabling client-io-threads by default
  • #1387960: Sequential volume start&stop is failing with SSL enabled setup.
  • #1387964: [Eventing]: ‘gluster vol bitrot <volname> scrub ondemand’ does not produce an event
  • #1387975: Continuous warning messages getting when one of the cluster node is down on SSL setup.
  • #1387981: [Eventing]: ‘gluster volume tier <volname> start force’ does not generate a TIER_START event
  • #1387984: Add a test script for compound fops changes in AFR
  • #1387990: [RFE] Geo-replication Logging Improvements
  • #1388150: geo-replica slave node goes faulty for non-root user session due to fail to locate gluster binary
  • #1388323: fuse mount point not accessible
  • #1388350: Memory Leaks in snapshot code path
  • #1388470: throw warning to show that older tier commands are depricated and will be removed.
  • #1388563: [Eventing]: ‘VOLUME_REBALANCE’ event messages have an incorrect volume name
  • #1388579: crypt: changes needed for openssl-1.1 (coming in Fedora 26)
  • #1388731: [GSS]glusterfind pre session hangs indefinitely in RHGS 3.1.3
  • #1388912: glusterfs can’t self heal character dev file for invalid dev_t parameters
  • #1389675: Experimental translators and 4.0 features need to be disabled for release-3.9
  • #1389742: build: incorrect Requires: for portblock resource agent
  • #1390837: write-behind: flush stuck by former failed write
  • #1391448: md-cache: Invalidate cache entry in case of OPEN with O_TRUNC
  • #1392286: gfapi clients crash while using async calls due to double fd_unref
  • #1392718: Quota version not changing in the quota.conf after upgrading to 3.7.1 from 3.6.1
  • #1392844: Hosted Engine VM paused post replace-brick operation
  • #1392869: The FUSE client log is filling up with posix_acl_default and posix_acl_access messages

Upgrade Guide

A guide to upgrading from 3.7 and 3.8 can be found on our documentation site.

by on October 31, 2016

Gluster Community Newsletter, October 2016

Important happenings for Gluster this month:

We had a great Gluster Developer Summit this month; thanks to all who participated.

Find all of our recorded talks with slides at:


Changes to the Community Meeting

We’re trying out something new for a few weeks: we’ve removed our status updates from the Community Meeting and made it an open floor instead.

Have something on your mind? Details for joining the Community Meeting are available if you’ve never come by. We’d love to have you!


Our annual user survey will come out in November; we’ll give it until mid-December for responses and then post the results. What else would you want to see on the user survey?


From the mailing lists:


Lindsay Mathieson asks about healing delays:

Pranith Kumar Karampuri asks: what application workloads are too slow for you on Gluster?


Hari Gowtham provides new commands for supporting add/remove brick and rebalance on tiered volume

Jeff Darcy posts on Memory-management ideas

New style community meetings – No more status updates from Kaushal M

Notes from Gluster Developer Summit from Amye Scavarda


Move of the formicary server to the new space, from Michael Scherer

Top Five: Niels de Vos, Pranith Kumar K,  Kaushal M,  Aravinda VK,  Atin Mukherjee

Calls for Papers:

February 4-5, 2017 – rotating deadlines; November 8th for the Storage DevRoom

DevConf – Jan 27-29 – CFP closes November 11th, 2016

Vault – CFP closes December 17th

by on

Gluster tiering and small file performance

Gluster can have trouble delivering good performance for small file workloads. This problem is acute for features such as tiering and RDMA, which employ expensive hardware such as SSDs or InfiniBand. In such workloads the hardware’s benefits go unrealized, so there is little return on the investment.

A major contributing factor to this problem has been excessive network overhead in fetching file and directory metadata; this fetch is called a LOOKUP. The aggregated cost of these LOOKUPs exceeds the benefit of the hardware’s accelerated data transfers. Note that the picture changes for larger file sizes: there, the improved transfer times exceed the LOOKUP costs, so RDMA and tiering work well in those cases.

The chart below depicts the problem with RDMA. Large read-file workloads perform well, small read-file workloads perform poorly.


The following examples use the “smallfile” [1] utility as a workload generator, run against a large 28-brick tiered volume “vol1”. The configuration’s hot tier is a 2 × 2 RAM disk, and the cold tier is a 2 × (8 + 4) set of HDDs. I run from a single client, mounted over FUSE. The entire working set of files resides on the hot tier. The tiering experiments can also be found in the SNIA SDC presentation [3].

Running Gluster’s profile against a tiered volume generates a count of the number of LOOKUPs and depicts the problem.

$ ./  --top /mnt/p66.b --host-set gprfc066 --threads 8 \
  --files 5000 --file-size 64 --record-size 64 --fsync N --operation read
$ gluster volume profile vol1 info cumulative|grep -E 'Brick|LOOKUP'..
Brick: gprfs018:/t4
     93.29     386.48 us     100.00 us    2622.00 us          20997      LOOKUP

.. 20K LOOKUPs are sent to each brick, on the first run.

The purpose behind most LOOKUPs is to confirm the existence and permissions of a given directory and file. The client sends such LOOKUPs for each level of the path. This phenomenon has been dubbed the “path traversal problem.” It is a well-known issue with distributed storage systems [2]. The round-trip time for each LOOKUP is not small, and the cumulative effect is big. Alas, Gluster has suffered from it for years.

The utility opens a file, does an IO, and then closes it. The path is 4 levels deep (p66/file_srcdir/gprfc066/thrd_00/<file>).

The 20K figure can be derived: there are 5000 files and 4 levels of directories, and 5000 × 4 = 20,000.

The DHT and tier translators must determine which brick a file resides on. To do this, the first LOOKUP for a file is sent to all subvolumes. The brick that has the file is called the “cached subvolume”. Normally it is predicted by the distributed hash algorithm, unless the set of bricks has recently changed. Subsequent LOOKUPs are sent only to the cached subvolume.

Regardless of this phenomenon, the cached subvolume still receives as many LOOKUPs as the path length, due to the path traversal problem. So when the test is run a second time, gluster profile still shows 20K LOOKUPs, but only on bricks on the hot tier (the tier translator’s cached subvolume), and nearly none on the cold tier. The round trips are still there, and the overall problem persists.

To cope with this “lookup amplification”, a project has been underway to improve Gluster’s meta-data cache translator (md-cache), so that the stat information LOOKUP requests fetch can be cached indefinitely on the client. This solution requires client-side cache entries to be invalidated if another client modifies a file or directory. The invalidation mechanism is called an “upcall.” It is complex and has taken time to write. But as of October 2016 this new functionality is largely code complete and available in Gluster upstream.

Enabling upcall in md-cache:

$ gluster volume set <volname> features.cache-invalidation on
$ gluster volume set <volname> features.cache-invalidation-timeout 600
$ gluster volume set <volname> performance.stat-prefetch on
$ gluster volume set <volname> performance.cache-samba-metadata on
$ gluster volume set <volname> performance.cache-invalidation on
$ gluster volume set <volname> performance.md-cache-timeout 600
$ gluster volume set <volname> network.inode-lru-limit <big number here>

In the example, I used 90000 for the inode-lru-limit.

At the time of this writing, a cache entry will expire after 5 minutes. The code will eventually be changed to allow an entry to never expire. That functionality will come once more confidence is gained in the upcall feature.

With this enabled, gluster profile shows the number of LOOKUPs drops to a negligible number on all subvolumes. As reported by the benchmark, this translates directly to better throughput for small file workloads. YMMV, but in my experiments, I saw tremendous improvements and the SSD benefits were finally enjoyed. 


Tuning notes

  • The number of UPCALLs and FORGETs is now visible using Gluster’s profiler.
  • The md-cache hit/miss statistics are visible this way:
$ kill -USR1 `pgrep gluster`

# wait a few seconds for the dump file to be created

$ find /var/run/gluster -name \*dump\* -exec grep -E 'stat_miss|stat_hit' {} \;

Some caveats

  • The md-cache solution requires client side memory, something not all users can dedicate.
  • The “automated” part of gluster tiering is slow. Files are moved between tiers by a single-threaded engine, and the SQL query runs in time linear in the number of files. So the set of files residing on the hot tier must be stable.

[1] Smallfile utility


[3] SNIA SDC 2016 “Challenges with persistent memory in distributed storage systems”.


by on October 7, 2016

Compiling GlusterFS with a gcc plug-in – an experiment

Back in February of this year Martin Kletzander gave a talk at DevConf on GCC plug-ins. It would seem that gcc plug-ins are a feature that has gone largely overlooked for many years.

I came back from DevConf inspired to try it out. A quick search showed me I was not alone – a colleague here at Red Hat, Richard W.M. Jones, had also seen Martin’s talk, and he wrote about his experiment.

I had had something similar in mind to what Richard had done. I wanted to check all the struct definitions, i.e. all instances of all the variables of any particular type, and make sure the defined sizes were consistent throughout the GlusterFS sources.

Using Richard’s plug-in I found that while things were generally good, there were a couple of structs that appeared to have mismatched sizes. The only thing was, Richard’s plug-in didn’t tell me where those structs were defined. And unfortunately GlusterFS has a lot of cut-and-pasted code, so finding them wasn’t a matter of a simple grep.

As both Martin and Richard note, the GCC plug-in framework is not well documented. It was not obvious how to do something that seems – on the surface – like it should be trivial. But, with a bit of detective work, I was able to solve it. (And after the fact the change was, in fact, quite simple; finding it however took some time.)

/* structsizes plugin: public domain example code written by
 * Richard W.M. Jones, with modifications by Kaleb S. KEITHLEY
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <gcc-plugin.h>
#include <tree.h>
#include <print-tree.h>

int plugin_is_GPL_compatible;

static FILE *log_fp;

static void
plugin_finish_type (void *event_data, void *user_data)
{
  tree type = (tree) event_data;

  if (user_data) {
    char *c = (char *) user_data;
    fprintf (log_fp, "user_data %x %x %x %x\n", c[0], c[1], c[2], c[3]);
  }

  /* We only care about structs, not any other type definition. */
  if (TREE_CODE (type) == RECORD_TYPE) {
    /* This is useful for working out how to navigate the tree below. */
    /* debug_tree (type); */

    /* If the type is not complete, we can't do anything. */
    if (!COMPLETE_TYPE_P (type)) {
      /* fprintf (log_fp, "struct has incomplete type\n"); */
      return;
    }

    /* Struct name? */
    tree name_tree = TYPE_NAME (type);

    /* Ignore unnamed structs. */
    if (!name_tree) {
      /* fprintf (log_fp, "ignoring unnamed struct\n"); */
      return;
    }

    const char *name;
    if (TREE_CODE (name_tree) == IDENTIFIER_NODE)
      name = IDENTIFIER_POINTER (name_tree);
    else if (TREE_CODE (name_tree) == TYPE_DECL && DECL_NAME (name_tree))
      name = IDENTIFIER_POINTER (DECL_NAME (name_tree));
    else
      name = "unknown struct name"; /* should never happen? */

    /* Walk to a DECL node so we can report where the struct is defined. */
    tree decl = type;
    for (; decl; decl = TREE_CHAIN (decl)) {
      if (DECL_P (decl))
        break;
    }

    const char *filename = decl ? DECL_SOURCE_FILE (decl) : "<unknown>";
    int lineno = decl ? DECL_SOURCE_LINE (decl) : 0;

    /* Get the size of the struct that has been defined. */
    tree size_tree = TYPE_SIZE (type);
    if (TREE_CODE (size_tree) == INTEGER_CST &&
        !TYPE_P (size_tree) && TREE_CONSTANT (size_tree)) {
      size_t size = TREE_INT_CST_LOW (size_tree);
      fprintf (log_fp, "struct '%s' has size %zu [bits] in %s, line %d\n",
               name, size, filename, lineno);
    } else {
      fprintf (log_fp, "struct '%s' has non-constant size\n", name);
    }
  }

  fflush (log_fp);
}

int
plugin_init (struct plugin_name_args *plugin_info,
             struct plugin_gcc_version *version)
{
  const char *logfile = NULL;
  int i;

  /* Find the log file name among the plug-in arguments. */
  for (i = 0; i < plugin_info->argc; ++i) {
    if (strcmp (plugin_info->argv[i].key, "log") == 0) {
      logfile = plugin_info->argv[i].value;
    }
  }

  if (!logfile) {
    fprintf (stderr, "structsizes plugin: missing parameter: -fplugin-arg-structsizes-log=<logfile>\n");
    exit (EXIT_FAILURE);
  }

  log_fp = fopen (logfile, "a");
  if (log_fp == NULL) {
    perror (logfile);
    exit (EXIT_FAILURE);
  }

  fprintf (log_fp, "Loaded structsizes plugin (GCC %s.%s.%s)\n",
           version->basever, version->devphase, version->revision);

  register_callback (plugin_info->base_name, PLUGIN_FINISH_TYPE,
                     plugin_finish_type, NULL);

  return 0;
}
Compile the plug-in using

    gcc -g -I`gcc -print-file-name=plugin`/include \
        -fpic -shared -o structsizes.so structsizes.c

and when you compile your source, to use the plug-in you must add

    ... -fplugin=./structsizes.so \
        -fplugin-arg-structsizes-log=<logfile> ...

to the compiler command line options.

For the purposes of compiling GlusterFS with this plug-in, I changed GlusterFS’s build configuration like this:

    CFLAGS="${CFLAGS} -g -O2 -fplugin=/path/to/structsizes.so \
        -fplugin-arg-structsizes-log=/tmp/ss2.out"

and then ran ./autogen.sh && ./configure && make

to build GlusterFS.

Afterwards you can

    sort -u < /tmp/ss2.out > outfile

to reduce the output to something digestible.

And in the end, the seemingly mismatched struct sizes weren’t really a problem. They were two different types – that happened to have the same type name – in two different translators.