Posted on September 10, 2014

Ten Stages of Technology Familiarity

Without further ado…

Never heard of it.

Yeah, I hear all the hipsters yammering about it.

I checked out the docs and examples once.

I used it for a side project.

We’re using it for some new projects at work.

We’re using it in production.


Posted on September 5, 2014

Clear enormous GlusterFS mount logs

Today Munin was complaining that a partition is nearly full on one of my servers. Looking at the disk usage graph, it kinda seems like a Slowloris DoS attack… Sure enough, something has gone and filled up the /var/log partition:

    $ df -h /var/log/
    Filesystem               Size  Used Avail Use% Mounted on
    /dev/mapper/vg_root-log  9.9G  9.0G […]


Posted on September 2, 2014

GlusterFS and NFS-Ganesha integration

Over the past few years, there has been an enormous increase in the number of user-space filesystems being developed and deployed. But one common challenge that all those filesystems’ users had to face was a huge performance hit when their filesystems were exported via kernel-NFS (a well-known and widely used network protocol). To […]


Posted on August 26, 2014

Why I still use Vagrant on Linux

Three events last week got me thinking: I really need to clarify to the world why I use Vagrant. Michael Gorsuch RJ : Nowling Vagrant isn’t helpful when you’re using Docker. Why not just use docker run? And finally, the Roman and Cos, our …


Posted on August 21, 2014

User Story: Chitika Boosts Big Data with GlusterFS

Chitika Inc., an online advertising network based in Westborough, MA, sought to provide its data scientists with faster and simpler access to its massive store of ad impression data. The company managed to boost availability and broaden access to its data by swapping out HDFS for GlusterFS as the filesystem backend for its Hadoop deployment. […]


Posted on August 15, 2014

Running CDH5 on GlusterFS

I recently spent some time getting Cloudera’s CDH 5 distribution of Apache Hadoop to work on GlusterFS using Distributed Replicated 2 Volumes. This is possible because Apache Hadoop has a pluggable filesystem architecture that allows the computational components within the CDH 5 distribution to be configured to use filesystems other than HDFS. In this case, one can configure CDH 5 to use the Hadoop FileSystem plugin for GlusterFS (glusterfs-hadoop), which allows it to run on Gluster. I’ve provided a diagram below that illustrates the CDH 5 core processes and how they interact with GlusterFS.

Running a Single CDH 5 Deployment on One or More GlusterFS Volumes
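
To make the pluggable filesystem idea a bit more concrete, here is a minimal Python sketch of the lookup pattern, loosely modeled on Hadoop’s convention of resolving a path’s URI scheme through an `fs.<scheme>.impl` configuration key. It is only an illustration, not how Hadoop or glusterfs-hadoop is actually implemented (both are Java): the class names `LocalFS` and `GlusterLikeFS`, the `REGISTRY` dictionary, and the `resolve_filesystem` helper are all hypothetical, and the real configuration values are Java class names documented on the glusterfs-hadoop project wiki.

```python
from urllib.parse import urlparse


class LocalFS:
    """Hypothetical stand-in for a local-disk filesystem implementation."""

    def __init__(self, uri):
        self.uri = uri


class GlusterLikeFS:
    """Hypothetical stand-in for a plugin such as glusterfs-hadoop."""

    def __init__(self, uri):
        self.uri = uri


# Hypothetical registry mapping implementation names to classes.
# In Hadoop, the configured value is a Java class name loaded from the classpath.
REGISTRY = {"LocalFS": LocalFS, "GlusterLikeFS": GlusterLikeFS}


def resolve_filesystem(uri, conf, registry=REGISTRY):
    """Pick a filesystem implementation from the URI scheme and the config."""
    scheme = urlparse(uri).scheme
    key = f"fs.{scheme}.impl"  # same key shape Hadoop uses per scheme
    impl_name = conf.get(key)
    if impl_name is None:
        raise KeyError(f"no filesystem configured for scheme {scheme!r} ({key})")
    return registry[impl_name](uri)


# Illustrative configuration: swapping the backend is a config change only.
conf = {"fs.file.impl": "LocalFS", "fs.glusterfs.impl": "GlusterLikeFS"}
fs = resolve_filesystem("glusterfs:///user/data", conf)
print(type(fs).__name__)
```

The point of the pattern is that the compute components only ever ask for "the filesystem for this URI", so pointing them at a different backend does not require changing their code.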

Given that the CDH 5 distribution comprises other components besides YARN and MapReduce, I used the Apache Bigtop System Testing Framework to explicitly validate that Apache Sqoop, Apache Flume, Apache Pig, Apache Hive, Apache Oozie, Apache Mahout, Apache ZooKeeper, Apache Solr and Apache HBase also ran successfully. Work is still in progress to enable the use of Impala.

If you would like to participate in accelerating the work on Impala, please reach out to us on the Gluster mailing list.

Implementation details for this solution and the specific setup required for all the components are available on the glusterfs-hadoop project wiki. If you have additional questions, feel free to reach out to me on FreeNode (IRC handle jayunit100), @jayunit100 on Twitter, or via the Gluster mailing list.


Posted on August 14, 2014

Vagrant: More than just VMs

Part 1: Vagrant in the Container. If you use Vagrant to maintain your dev recipes, then your natural predilection might be to now move to supporting Docker. Using Vagrant to wrap Docker means you can run Docker apps from anywhere, and maintain the…
