Louis Zuckerman, CTO of Picture Marketing, is working on not one, but two interesting projects for Gluster. Zuckerman is working on a Java filesystem backed by GlusterFS and Java Native Interface (JNI) bindings for GlusterFS’s native library (libgfapi).
Zuckerman says he’s using GlusterFS to store media for Picture Marketing. “Brand ambassadors use our mobile apps to take pictures and videos at events and upload them to our online platform. After processing the uploads our system stores the media in a GlusterFS cluster. From there it is served to event attendees through custom web sites made specifically for the events.”
According to Zuckerman, GlusterFS “is ideal for our use case.”
“Over the last two years we’ve enjoyed excellent reliability and superb performance from our cluster in EC2,” says Zuckerman. “Thanks to GlusterFS’ scale-out architecture we can grow our processing and web app clusters to accommodate increased demand for our online services. This is critical for our business since our system has been used by over half the top 100 brands in the US, at major sports venues, retail stores, and all kinds of events where brand ambassadors interact with customers.”
Scratching The Itch
While GlusterFS provided the features and stability that Picture Marketing needs, Zuckerman had to roll up his sleeves a bit to ensure he could run it on his system of choice.
Zuckerman began working with GlusterFS in late 2010 on EC2, and took on packaging Gluster for 32-bit systems himself. “At that time Gluster only provided 64-bit packages, and the downstream packages provided by Debian (and thus Ubuntu) were stuck at a version a year older due to bugs. I fixed the bugs in Debian and became co-maintainer of the Debian project’s GlusterFS packages (helping out lead maintainer Patrick Matthaei whenever I can). I’ve also been providing my own packages specially tailored for Ubuntu since that time.”
That work led to Zuckerman being tapped as the official Debian and Ubuntu packager for GlusterFS, and to a seat on Gluster’s community advisory board. Not that he wants to keep all the fun and glory to himself. “I’d like to see more people get involved with the packaging process. I’m grateful for those who take the time to report bugs in the packages, and try to help anyone interested in rolling their own based on my or Debian’s sources.”
After tackling the packaging problem, Zuckerman started working on a few projects of interest around Java and GlusterFS.
Building a Filesystem Service Provider for Java 7
Currently, Zuckerman says that the projects are for fun. “Java is one of the languages I know fairly well and I thought that implementing an NIO.2 filesystem provider would be a fun challenge. (It sure is!) The project is actually a pair of related software packages: a Java JNI wrapper around the libgfapi C library (libgfapi-jni), and an implementation of the NIO.2 filesystem service provider API (glusterfs-java-filesystem) that uses the JNI library.”
He notes that Hiram Chirino was “instrumental” in getting libgfapi-jni off the ground, and says he “probably would not have been able to make a JNI wrapper for the libgfapi C library without his support and the JNI code generator, HawtJNI,” which was written by Chirino.
He also says he’d like to find a few co-contributors for the projects. “The Java projects are still in infancy and I have lots of plans for new features. Unfortunately I don’t have as much free time to put into coding as I would like so things are progressing slowly.”
Overall, Zuckerman says that he’s had a good experience working with the Gluster community. “I have enjoyed a good rapport with the GlusterFS developers, and other community members, since I first began using GlusterFS back in late 2010,” says Zuckerman.
“I’ve asked lots of questions over the years and the developers are extremely knowledgeable, helpful, and kind in their support of users. That was a big motivation for me to get involved, and stay involved, with the project. I like the software and get along well with the people who make it.”
Have questions about Zuckerman’s projects? You can find him on Freenode as semiosis and on Twitter as @pragmaticism. Questions about Gluster development in general? Check out the #gluster channel on irc.gnu.org, or join the mailing lists to get help from the Gluster community.
Cutting Edge, a visual effects company that’s worked on films such as The Great Gatsby and I, Frankenstein, had outgrown its NAS storage system and was in search of a way to boost its storage capacity and performance in the face of several large upcoming projects. The Australia-based firm turned to GlusterFS as an alternative to making a massive investment in an enterprise SAN.
I spoke to Dan Mons, R&D SysAdmin at Cutting Edge and architect of the company’s GlusterFS deployment, about how he tapped Gluster to meet Cutting Edge’s growing storage needs.
“We’ve had three feature films roll through our Gluster storage since it went in, and to be 100% honest we couldn’t have done them without Gluster,” Mons said. “The flexibility it offers us for storage is amazing.”
The GlusterFS storage solution that Mons assembled consists of 24 total GlusterFS 3.4.1 nodes, each running CentOS 6.4 and outfitted with 34TB of RAID6 storage. These nodes are assembled into four six-node clusters, which provide the company’s Brisbane and Sydney offices each with its own production and backup cluster pair.
Each cluster hosts a distributed-replicated GlusterFS volume, which keeps data accessible in the event of node failure. Nightly rsync operations between the production and backup clusters at each location provide an additional layer of data protection.
Users in Cutting Edge’s Sydney and Brisbane offices have access to 107TB of production storage, and read-only access to another 107TB on each location’s backup cluster.
Mons explained that given data volume, time and bandwidth constraints, it isn’t feasible to fully synchronize the data generated at the two offices, but that the company’s artists have access to scripts that sync particular folders between the locations when they need to collaborate with co-workers in the other office.
With a client pool that runs the gamut from Linux-powered render machines and individual workstations to machines running OS X, Windows, and a handful of specialty OSes, ensuring access to their data across multiple platforms and protocols has been one of the trickier parts of the Cutting Edge deployment.
The Linux machines that make up the majority of the company’s client mix access the cluster via the GlusterFS FUSE client, which provides access to all six nodes in the production cluster directly, for maximum bandwidth distribution. Older Linux machines and machines running specialty OSes tap the cluster via Gluster’s NFS support, with DNS round robin for distributing the load.
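As a rough illustration of what DNS round robin does for the NFS clients, the sketch below hands each client the next node in a fixed rotation. The node addresses and client names are hypothetical (the article does not list the real ones), and real round-robin DNS also depends on resolver behavior and caching, so treat this as a simplified model rather than a description of Cutting Edge’s setup.

```python
from collections import Counter
from itertools import cycle

# Hypothetical addresses for a six-node production cluster.
NODES = [f"10.0.0.{n}" for n in range(1, 7)]


def round_robin_assign(clients, nodes):
    """Give each client the next node in rotation, the way a
    round-robin DNS record rotates the order of its answers."""
    rotation = cycle(nodes)
    return {client: next(rotation) for client in clients}


# Hypothetical NFS clients: twelve legacy render machines.
clients = [f"render{i:02d}" for i in range(12)]
assignment = round_robin_assign(clients, NODES)

# Count how many clients landed on each node.
load = Counter(assignment.values())
for node in NODES:
    print(node, load[node])
```

With twelve clients and six nodes, the rotation places two clients on each node. Unlike the FUSE client, each NFS client here talks to a single node, which is why spreading the mounts matters.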
Mons explained that while the OS X-based machines in his company’s environment are able to access the GlusterFS cluster normally via NFS or CIFS mounts using command line tools, he’s run into various issues with the OS X Finder application and with Carbon or Cocoa-based OS X applications.
To work around these issues, the team at Cutting Edge set up a separate Linux server that mounts the GlusterFS volume with the FUSE client, and then re-exports that as AFP via Netatalk3. This method works, but at the cost of performance and of compatibility with some of the firm’s pipeline processes. Ideally, Mons would like to see a FUSE client become available for OS X.
The company’s Windows-based machines access the cluster via Samba, installed on each node in the cluster, with DNS round robin for distributing the load and Active Directory for authentication. Mons said that his team encountered file locking issues with certain applications, most of which they were able to resolve, although they’ve continued to experience issues with Photoshop and Microsoft Office on Windows.
Since their March 2013 deployment, the Cutting Edge storage solution has undergone updates from GlusterFS 3.3.1 to 3.4.0, and most recently, to 3.4.1, all of which have gone smoothly. Mons noted that the latest GlusterFS updates have brought noticeable speed and NFS stability improvements, benefiting legacy and turnkey systems for which the FUSE client is not an option.
Looking ahead, Cutting Edge plans to add new node pairs to their production and backup clusters in early 2014, as their production clusters are nearing 90% capacity, with more project data on the way.
Mons told me that he’s begun testing Samba with Gluster’s recent libgfapi enhancements, which appear to boost file browsing performance in his environment. Along similar lines, Mons is looking forward to seeing support for storing directory and file information in extended attributes make its way into GlusterFS, which promises to speed up directory listing and disk usage operations.
Theron Conrey writes about using:
BitTorrent Sync as Geo-Replication for Storage
We got a chance to talk about this idea at Linuxcon. I’m not entirely convinced there aren’t some problem edge cases with this solution, but I think it will be hard to tell as long as the BitTorrent sync library is proprietary. I did come up with a special case of Theron’s idea that I believe could work well.
The special case uses the optimization that the synchronization (or file transferring) is unidirectional. This avoids any coherency complications involved if both sides were to write to the same file. Combined with the BitTorrent protocol, this does what normal torrent usage does, except with BitTorrent sync, we’re looking at a folder full of files.
What kind of synchronization would benefit from this model? Repository mirroring! This is exactly a folder full of files, but going in only one direction. Instead of yum or deb mirrors each running rsync, they could use BitTorrent sync. Because these mirrors usually have plenty of upload bandwidth available, “seeding” wouldn’t be a problem, and the worldwide pool would synchronize faster.
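To make the one-directional part concrete, here is a minimal sketch in plain Python. It does not use BitTorrent at all: it only shows the property the special case relies on, namely that the source is read and never written, so there is no conflict to resolve. The function names are my own, and comparing SHA-256 digests is just one assumed way of detecting changed files.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(root):
    """Map each file's path (relative to root) to its digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in root.rglob("*")
        if p.is_file()
    }


def sync_one_way(source, mirror):
    """Make mirror match source. Only the mirror is ever modified."""
    source, mirror = Path(source), Path(mirror)
    mirror.mkdir(parents=True, exist_ok=True)
    src, dst = snapshot(source), snapshot(mirror)

    # New or changed files flow from source to mirror.
    copied = sorted(rel for rel, digest in src.items() if dst.get(rel) != digest)
    # Files that no longer exist upstream are dropped from the mirror.
    removed = sorted(rel for rel in dst if rel not in src)

    for rel in copied:
        target = mirror / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source / rel, target)
    for rel in removed:
        (mirror / rel).unlink()
    return copied, removed
```

Calling `sync_one_way("/srv/upstream", "/srv/mirror")` (both paths hypothetical) returns the lists of copied and removed relative paths. A second run with no upstream changes returns two empty lists. In the BitTorrent version, the copy step would be replaced by fetching pieces from whichever peers already have them.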
Can we apply this to user mirroring, net installers, and machine updating? Absolutely. I believe someone has already looked into the updates scenario, but it didn’t progress for some reason. The more convincing case is still the server geo-replication of course.
Obviously, using glusterfs with puppet-gluster to host the mirrors could be a good fit. You might not even need to use any gluster replication when you have built-in geo-replication via other mirrors.
If someone works up the open source BitTorrent parts, I’m happy to hack together the puppet parts to turn this into a turn-key solution for mirror hosts.
Hope you liked this idea.
Rock the Vote needed a way to manage the fast growth of the data handled by its Web-based voter registration application. The organization turned to GlusterFS replicated volumes to allow for filesystem size upgrades on its virtualized hosting infrastructure without incurring downtime.
Over its twenty-one-year history, Rock the Vote has registered more than five million young people to vote, and has become a trusted source of information about registering to vote and casting a ballot.
Since 2009, Rock the Vote has run a Web-based voter registration application, powered by an open source Rails application stack called Rocky.
I talked to Lance Albertson, Associate Director of Operations at the Oregon State University Open Source Lab and primary technical systems operation lead for the service, about how they’re using Gluster to provide for the service’s growing storage requirements.
“During a non-election season,” Albertson explained, “the filesystem use and growth is minimal; however, during a presidential election season, the growth of the filesystem can be exponential. So with Gluster we’re trying to solve the sudden growth problem we have.”
Rock the Vote’s voter registration application is served from a virtual machine instance running Gentoo Hardened, with a pair of physical servers running CentOS 6 with Gluster 3.3.0 to host voter registration form data. The storage nodes host a replicated GlusterFS volume, which the registration front end accesses via Gluster’s NFS mount support.
The Gluster-backed iteration of the voter registration application started out in September with a 100GB volume, which the team stepped up incrementally to 350GB as usage grew in the period leading up to the election.
Before implementing Gluster for their storage needs, Rock the Vote’s application hosting team was using local storage within their virtual machines to store the voter form data, which made it difficult to expand storage without bringing their VMs down to do so.
The hosting team shifted storage to an HA NFS cluster, but found the implementation fragile and prone to breakage when adding/removing NFS volumes and shares.
“Gluster allowed us more flexibility in how we manage that storage without downtime,” Albertson continued. “Gluster made it easy to add a volume and grow it as we needed.”
Looking ahead to future election seasons, and forthcoming GlusterFS releases, Albertson told me that the Gluster attribute he’s most interested in is limited-downtime upgrades between version 3.3.0 and future Gluster releases. Albertson is also looking forward to the addition of multi-master support in Gluster’s geo-replication capability, an enhancement planned for the upcoming 3.4 version.