all posts tagged libgfapi
If you saw our Gluster Spotlight (“Integration Nation”) last week, you’ll recall that Javi and Jaime from the OpenNebula project were discussing their recent advances with GlusterFS and libgfapi access. Here’s a post where they go into some detail about it:
The good news is that for some time now qemu and libvirt have had native support for GlusterFS. This makes it possible for VMs running from images stored in Gluster to talk directly with its servers, making the IO much faster.
In this case, they use GFAPI for direct virtual machine access in addition to the FUSE-based GlusterFS client mount for image registration as an example of using the best tool for a particular job. As they explain, OpenNebula administrators expect a mounted, POSIX filesystem for many operations, so the FUSE-based mount fits best with their workflow while GFAPI works when lower latency and better performance are called for.
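To make the two access paths concrete, here is a minimal sketch of how the same image might be addressed each way. The `gluster[+transport]://server[:port]/volume/image` form is the URI syntax QEMU's GlusterFS block driver accepts; the helper names, host name, volume name, and mount point are illustrative assumptions and are not taken from the OpenNebula code.

```python
import posixpath


def qemu_gluster_uri(server, volume, image, port=None, transport=None):
    """Build a QEMU GlusterFS URI: gluster[+transport]://server[:port]/volume/image."""
    scheme = "gluster" if transport is None else f"gluster+{transport}"
    host = server if port is None else f"{server}:{port}"
    return f"{scheme}://{host}/{volume}/{image}"


def fuse_image_path(mount_point, image):
    """Path to the same image through a FUSE mount (ordinary POSIX path)."""
    return posixpath.join(mount_point, image)


# Hypothetical names: server "gluster1", volume "vmstore", image "disk0.qcow2".
# FUSE path: used where a mounted POSIX filesystem is expected (image registration).
print(fuse_image_path("/var/lib/one/datastores/100", "disk0.qcow2"))
# gfapi URI: handed to QEMU so the VM talks to the Gluster servers directly.
print(qemu_gluster_uri("gluster1", "vmstore", "disk0.qcow2"))
```

The point of the split is that only the running VM needs the low-latency path, while management operations can keep treating the datastore as an ordinary directory.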
Read the full post here.
The GFAPI integration is slated for the 4.6 release of OpenNebula. To get an early look at the code, check out their Git repository. Documentation is available here.
This post describes modifications to the Linux Target driver to work with Gluster’s “gfapi”. It is a follow-up to an earlier post on Gluster’s block IO performance over iSCSI. Those tests used FUSE, which incurred data copies and context switches.
That “FUSE penalty” can be avoided using libgfapi. The libgfapi library can be inserted into the Linux target driver rather easily, as it has a nice extensible framework. In this manner IO commands may be forwarded directly to the storage server.
The diagram below depicts gluster’s datapath when FUSE is present and absent. Using gfapi, the red boxes are removed (a kernel trip and extra hop).
To make that happen, I’ve modified the Linux target driver to have a “gluster” module. The implementation was mostly a one-to-one replacement of POSIX APIs with their libgfapi counterparts. For example, reads and writes were translated into glfs_read and glfs_write calls. Some performance results are at the end of this post.
Feel free to download it off the forge. I expect it will be open sourced after some cleanup, additional testing, etc.
Interestingly, Ceph has already done this. Their plug-in to the Linux target driver uses librados to access the Ceph OSD. It is not enabled by default – you need to recompile the target driver to use it.
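The one-to-one replacement described above works because a backing store only touches storage through a handful of calls, each of which has a libgfapi counterpart. The sketch below is an assumption-laden illustration, not the module's code: `PosixStore` and its method names are hypothetical, and only the `glfs_*` names in the mapping are real libgfapi functions.

```python
import os

# POSIX call -> libgfapi counterpart (function names as declared in glfs.h).
POSIX_TO_GFAPI = {
    "open": "glfs_open",
    "pread": "glfs_pread",
    "pwrite": "glfs_pwrite",
    "fsync": "glfs_fsync",
    "close": "glfs_close",
}


class PosixStore:
    """Hypothetical backing store that serves block IO with plain POSIX calls.

    A gluster-backed store would expose the same four methods and issue the
    corresponding glfs_* call in each one instead.
    """

    def __init__(self, path):
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)

    def read(self, offset, length):
        # Positional read of `length` bytes starting at `offset`.
        return os.pread(self.fd, length, offset)

    def write(self, offset, data):
        # Positional write; returns the number of bytes written.
        return os.pwrite(self.fd, data, offset)

    def flush(self):
        os.fsync(self.fd)

    def close(self):
        os.close(self.fd)
```

Because the rest of the target only sees `read`, `write`, `flush`, and `close`, swapping the implementation behind them does not disturb the iSCSI command handling.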
- The Linux target driver has been replaced by LIO in RHEL7. LIO is an in-kernel target, similar to SCST, that was merged into Linux kernel 2.6.38 (March 2011). The performance implications of using it are unclear; I’ll take a look at that in January.
- Performance seemed to get worse as the number of streams went up. This may be due to the synchronous gfapi calls used. A future post will look into some of the options to increase parallelism, such as:
  - The target API has an option to set the number of worker threads.
  - gfapi has asynchronous APIs.
- Same 2 node configuration as in this earlier post.
- The performance improvement of gfapi in lieu of FUSE varied. But generally, libgfapi bought a 10-20% speedup.
- I’ll update this post with random read performance.
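The first parallelism option in the notes above, more worker threads around synchronous calls, can be sketched as follows. This is only a stand-in under stated assumptions: it uses blocking `os.pread` on a local file in place of a synchronous gfapi read, and `read_blocks`, `BLOCK`, and the worker count are illustrative names and values, not part of the target driver.

```python
import os
from concurrent.futures import ThreadPoolExecutor

BLOCK = 4096  # illustrative block size in bytes


def read_blocks(path, nblocks, workers):
    """Read `nblocks` consecutive blocks using `workers` threads.

    Each worker issues a blocking positional read, so up to `workers`
    requests can be outstanding at once even though every individual
    call is synchronous.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        def read_one(index):
            # Stands in for one synchronous read call against the store.
            return os.pread(fd, BLOCK, index * BLOCK)

        with ThreadPoolExecutor(max_workers=workers) as pool:
            # map() returns results in request order, so the blocks
            # can be joined directly.
            return b"".join(pool.map(read_one, range(nblocks)))
    finally:
        os.close(fd)
```

The asynchronous gfapi calls would take the other route: a single thread submits many requests and handles completions in callbacks, avoiding a thread per outstanding IO.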
The Gluster Community would like to congratulate the OpenStack Foundation and developers on the Havana release. With performance-boosting enhancements for OpenStack Block Storage (Cinder), Compute (Nova) and Image Service (Glance), as well as a native template language for OpenStack Orchestration (Heat), the OpenStack Havana release points the way to continued momentum for the OpenStack community. The many storage-related features in the Havana release coupled with the growing scope of typical OpenStack deployments demonstrate the need for scale-out, open software-defined storage solutions. The fusion of GlusterFS open software-defined storage with OpenStack software is a match made in cloud heaven.
Naturally, the Gluster Community would like to focus on OpenStack enhancements that pertain directly to our universe:
- OpenStack Image Service (Glance)
  - OpenStack Cinder can now be used as a block-storage back-end for the Image Service. For Gluster users, this means that Glance can point to the same image as Cinder, so it is not necessary to copy the entire image before deploying, which saves valuable time.
- OpenStack Compute (Nova)
  - Nova can now use the QEMU/libgfapi integration with GlusterFS, which reduces context switching between kernel space and user space and significantly boosts performance.
  - When connecting to NFS- or GlusterFS-backed volumes, Nova now uses the mount options set in the Cinder configuration. Previously, the mount options had to be set on each Compute node that would access the volumes. This allows operators to more easily automate the scaling of their storage platforms.
  - QEMU-assisted snapshotting is now used to create Cinder volume snapshots, including snapshots of GlusterFS-backed volumes.
- OpenStack Orchestration (Heat)
  - Initial support for native template language (HOT). For OpenStack operators, this presents an easier way to orchestrate services in application stacks.
- OpenStack Object Storage (Swift)
  - There is nothing in the OpenStack Havana release notes pertaining to GlusterFS and Swift integration, but we always like to talk about the fruits of our collaboration with Swift developers. We are dedicated to using the upstream Swift project API/proxy layer in our integration, and the Swift team has been a pleasure to work with, so kudos to them.
- OpenStack Data processing (Savanna)
  - This incubating project enables users to easily provision and manage Apache Hadoop clusters on OpenStack. It’s a joint project between Red Hat, Mirantis and Hortonworks and points the way towards “Analytics as a Service”. It’s not an official part of OpenStack releases yet, but it’s come very far very quickly, and we’re excited about the data processing power it will spur.
To give an idea of the performance improvements in the GlusterFS-QEMU integration that Nova now takes advantage of, consider the early benchmarks below published by Bharata Rao, a developer at IBM’s Linux Technology Center.
FIO READ numbers
|QEMU GlusterFS block driver (FUSE bypass)
FIO WRITE numbers
|QEMU GlusterFS block driver (FUSE bypass)
“Base” refers to an operation directly on a disk filesystem.
Havana vs. Pre-Havana
This is a snapshot to show the difference between the Havana and Grizzly releases with GlusterFS.
| Component | Grizzly (pre-Havana) | Havana |
|---|---|---|
| Glance | Could point to the filesystem images mounted with GlusterFS, but had to copy VM image to deploy it | Can now point to Cinder interface, removing the need to copy image |
| Cinder | Integrated with GlusterFS, but only with Fuse mounted volumes | Can now use libgfapi-QEMU integration for KVM hypervisors |
| Nova | No integration with GlusterFS | Can now use the libgfapi-QEMU integration |
| Swift | GlusterFS maintained a separate repository of changes to Swift proxy layer | Swift patches now merged upstream, providing a cleaner break between API and implementation |
The Orchestration feature we are excited about is not Gluster-specific, but has several touch points with GlusterFS, especially in light of the newly-introduced Manila FaaS project for OpenStack (https://launchpad.net/manila). Imagine being able to orchestrate all of your storage services with Heat, building the ultimate in scale-out cloud applications with open software-defined storage that scales with your application as needed.
We’re very excited about the Havana release and we look forward to working with the global OpenStack community on this and future releases. Download the latest GlusterFS version, GlusterFS 3.4, from the Gluster Community at gluster.org, and check out the performance with a GlusterFS 3.4-backed OpenStack cloud.