All posts tagged "tools"


December 3, 2014

Introducing gdash – GlusterFS Dashboard

UPDATE: Added a --gluster option to specify the path to gluster. By default it looks for /usr/sbin/gluster; if you installed GlusterFS from source, use sudo gdash --gluster /usr/local/sbin/gluster. (Those who already installed gdash can run sudo pip install -U gdash to upgrade.)

gdash is a super-young project that shows GlusterFS volume information for local and remote clusters. This app is based on GlusterFS's ability to execute the gluster volume info and gluster volume status commands against a remote server using the --remote-host option.

If you can run gluster volume info --remote-host=<HOST_NAME>, then you can monitor that cluster using gdash. Make sure the glusterd port (24007) is accessible from the machine where you will run gdash.
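
As a quick sanity check before pointing gdash at a remote cluster, the same two preconditions (a reachable port 24007 and remote CLI support) can be verified with a few lines of Python. This is only a minimal sketch; the host name is a placeholder.

# Pre-flight check sketch: can we reach the remote glusterd the same way gdash will?
import socket
import subprocess

def can_monitor(host):
    # gdash relies on glusterd's remote CLI support on port 24007
    sock = socket.socket()
    sock.settimeout(3)
    try:
        sock.connect((host, 24007))
    except socket.error:
        return False
    finally:
        sock.close()
    cmd = ["gluster", "volume", "info", "--remote-host=%s" % host]
    try:
        subprocess.check_output(cmd, stderr=subprocess.STDOUT)
        return True
    except (OSError, subprocess.CalledProcessError):
        return False

print(can_monitor("192.168.1.6"))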

To install,

sudo pip install gdash

or

sudo easy_install gdash

gdash is built using Python Flask and Ember.js (I used ember-cli).

gdash home screen

gdash volume details page

Usage

Use case 1 - Local Volumes

Just run sudo gdash; gdash starts listening on port 8080. Visit http://localhost:8080 to view the GlusterFS volumes of the local machine.

Use case 2 - Remote Volumes

Run sudo gdash --host 192.168.1.6 and visit http://localhost:8080 to view GlusterFS volume information from the remote host. The dashboard shows all the volumes that the remote host is part of.

Use case 3 - Multiple clusters

Create a clusters.conf file as in the example shown below, specifying at least one host from each cluster.

[clusters]
cluster1 = host1, host2, host3
cluster2 = host4, host5, host6

Run gdash using,

sudo gdash --clusters ~/clusters.conf
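
For illustration, the [clusters] section above parses naturally with Python's configparser. The sketch below shows one way such a file could be read and limited to a single cluster; this is an assumption about the approach, not gdash's actual code.

# Illustration only: reading a clusters.conf like the one above and
# applying a --limit-cluster style restriction. Not gdash's actual code.
try:
    import configparser                      # Python 3
except ImportError:
    import ConfigParser as configparser      # Python 2

def read_clusters(path, limit_cluster=None):
    conf = configparser.ConfigParser()
    conf.read(path)
    clusters = dict((name, [h.strip() for h in hosts.split(",")])
                    for name, hosts in conf.items("clusters"))
    if limit_cluster:
        # Keep only the named cluster, as --limit-cluster does
        clusters = {limit_cluster: clusters[limit_cluster]}
    return clusters

print(read_clusters("clusters.conf", limit_cluster="cluster1"))
# {'cluster1': ['host1', 'host2', 'host3']}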

Use case 4 - Multiple teams

If two teams monitor two clusters and you don't want each team to see the other cluster's details, just run the commands below in two terminals and give the respective URL to each team. Another solution is to create two separate config files and run gdash separately on different ports.

# Team 1, who monitors cluster1 http://localhost:8001
sudo gdash -p 8001 --clusters ~/clusters.conf --limit-cluster cluster1

# Team 2, who monitors cluster2 http://localhost:8002
sudo gdash -p 8002 --clusters ~/clusters.conf --limit-cluster cluster2

Available Options

usage: gdash [-h] [--port PORT] [--cache CACHE] [--debug] [--host HOST]
             [--clusters CLUSTERS] [--limit-cluster LIMIT_CLUSTER]

GlusterFS dashboard
-------------------

This tool is based on remote execution support provided by
GlusterFS cli for `volume info` and `volume status` commands

optional arguments:
  -h, --help            show this help message and exit
  --port PORT, -p PORT  Port
  --cache CACHE, -c CACHE
                        Cache output in seconds
  --debug               DEBUG
  --host HOST           Remote host which is part of cluster
  --clusters CLUSTERS   Clusters CONF file
  --limit-cluster LIMIT_CLUSTER
                        Limit dashboard only for specified cluster

Code is hosted on GitHub under aravindavk and licensed under the MIT license.

May 12, 2014

gvolinfojson – A utility to convert xml output of gluster volume info to json

Today I wrote a small utility in Go to convert the XML output of the gluster volume info command to JSON.

Download the binary from here and copy it to the /usr/local/bin directory (or any other directory that is available in PATH).

wget https://github.com/aravindavk/gvolinfojson/releases/download/1.0/gvolinfojson
sudo cp gvolinfojson /usr/local/bin/
sudo chmod +x /usr/local/bin/gvolinfojson

Or

If you have Go installed (make sure $GOPATH/bin is available in PATH), then

go get github.com/aravindavk/gvolinfojson

To use it with gluster volume info command,

sudo gluster volume info --xml | gvolinfojson

That's it; you will get the JSON output of the volume info command. If you need pretty-printed JSON output, then

sudo gluster volume info --xml | gvolinfojson --pretty
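
gvolinfojson itself is written in Go, but the idea behind it is easy to sketch in a few lines of Python. The snippet below is a simplified illustration that extracts only a few fields; it is not a substitute for the tool, and the script name in the usage line is just a placeholder.

# Simplified illustration of the XML-to-JSON idea behind gvolinfojson
# (the real tool is written in Go and covers the full volume info structure).
import sys
import json
import xml.etree.ElementTree as ET

tree = ET.fromstring(sys.stdin.read())
volumes = []
for vol in tree.iter("volume"):
    volumes.append({
        "name": vol.findtext("name"),
        "status": vol.findtext("statusStr"),
        # Brick naming differs slightly between gluster releases,
        # so fall back to the element text if there is no <name> child
        "bricks": [(b.findtext("name") or b.text or "").strip()
                   for b in vol.iter("brick")],
    })

print(json.dumps(volumes, indent=4))

It can be piped into the same way, for example sudo gluster volume info --xml | python volinfo_to_json.py.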

Source code is available here.

Comments & Suggestions welcome.


May 9, 2014

Gluster scale-out tests: an 84-node volume

This post describes recent tests done by Red Hat on an 84-node gluster volume. Our experiments measured performance characteristics and management behavior. To our knowledge, this is the largest performance test ever done under controlled conditions within the organization (we have heard of larger clusters in the community but do not know any details about them).

Red Hat officially supports up to 64 gluster servers in a cluster and our tests exceed that. But the problems we encounter are not theoretical. The scalability issues appeared to be related to the number of bricks, not the number of servers. If a customer were to use just 16 servers, but with 60 drives on each, they would have 960 bricks and would likely see similar issues to what we found.

Summary: With one important exception, our tests show gluster scales linearly on common I/O patterns. The exception is on file create operations. On creates, we observe network overhead increase as the cluster grows. This issue appears to have a solution and a fix is forthcoming.

We also observed that gluster management operations become slower as the number of nodes increases; bz 1044693 has been opened for this. However, we were using the shared local disk on the hypervisor, rather than the disk dedicated to the VM. When this was changed, the management commands ran much faster (completing in about 8 seconds, for example).

Configuration

Configuring an 84-node volume is easier said than done. Our intention was to build a methodology (tools and procedures) to spin up and tear down a large cluster of gluster servers at will.

We do not have 84 physical machines available. But our lab does have very powerful servers (described below). They can run multiple gluster servers at a time in virtual machines. We ran 12 such VMs on each physical machine. Each virtual machine was bound to its own disk and CPU. Using this technique, we were able to use 7 physical servers to test 84 nodes.

Tools to set up and manage clusters of this many virtual machines are nascent. Much configuration work must be done by hand. The general technique is to create a “golden copy” VM and “clone” it many times. Care must be taken to keep track of IP addresses, host names, and the like. If a single VM is misconfigured, it can be difficult to locate the problem within a large cluster.

Puppet and Chef are good candidates to simplify some of the work, and vagrant can create virtual machines and do underlying resource management, but everything still must be tied together and programmed. Our first implementation did not use the modern tools. Instead, crude but effective bash, expect, and kickstart scripts were written. We hope to utilize puppet in the near term with help from gluster configuration management guru James Shubin. If you like ugly scripts, they may be found here.

One of the biggest problem areas in this setup was networking. When KVM creates a Linux VM, a hardware address and virtual serial port console exist, and an IP address can be obtained using DHCP. But we have a limited pool of IP addresses on our public subnet, and our lab's system administrator frowns upon 84 new IP addresses being allocated out of the blue. Worse, the public network is 1GbE Ethernet, which is too slow for performance testing.

To work around those problems, we utilized static IP addresses on a private 10GbE Ethernet network. This network has its own subnet and is free from lab restrictions. It does not have a DHCP server. To set the static IP address, we wrote an "expect" script which logs into the VM over the serial line and modifies the network configuration files.
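
The snippet below is a simplified sketch of that approach, using the pexpect Python module rather than classic expect. The VM name, credentials, device name, and console prompts are placeholders, not the values from our lab scripts.

# Simplified sketch of setting a static IP over a VM's serial console.
# Our actual scripts used classic expect; prompts and credentials here are placeholders.
import pexpect

def set_static_ip(vm_name, ip, prefix="24", gateway="10.0.0.1"):
    child = pexpect.spawn("virsh console %s" % vm_name)
    child.sendline("")                      # wake up the console
    child.expect("login: ")
    child.sendline("root")
    child.expect("Password: ")
    child.sendline("password")              # placeholder credential
    child.expect("# ")
    # Write a static config for eth0 (RHEL/CentOS style ifcfg file)
    child.sendline(
        "printf 'DEVICE=eth0\\nBOOTPROTO=static\\nONBOOT=yes\\n"
        "IPADDR=%s\\nPREFIX=%s\\nGATEWAY=%s\\n' "
        "> /etc/sysconfig/network-scripts/ifcfg-eth0" % (ip, prefix, gateway))
    child.expect("# ")
    child.sendline("ifdown eth0; ifup eth0")
    child.expect("# ")
    child.close()

set_static_ip("gluster-node-01", "10.0.0.11")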

At the hypervisor level, we manually set up the virtual bridge, the disk configurations, and set the virtual-host “tuned” profile.

Once the system was built, it quickly became apparent that another set of tools would be needed to manage the running VMs. For example, it is sometimes necessary to run the same command across all 84 machines. Bash scripts were written to that end, though other tools (pdsh) could have been used.
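
A minimal sketch of such a helper in Python is shown below; it assumes passwordless SSH to the nodes, the host names are placeholders, and pdsh already provides this functionality out of the box.

# Minimal "run the same command on every node" helper, assuming passwordless SSH.
# Our actual tooling was plain bash; this is only an illustrative sketch.
import subprocess
from concurrent.futures import ThreadPoolExecutor

NODES = ["gluster-node%02d" % i for i in range(1, 85)]  # 84 placeholder host names

def run(node, command):
    proc = subprocess.Popen(["ssh", node, command],
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    out, _ = proc.communicate()
    return node, proc.returncode, out.decode(errors="replace").strip()

def run_everywhere(command):
    with ThreadPoolExecutor(max_workers=20) as pool:
        for node, rc, out in pool.map(lambda n: run(n, command), NODES):
            print("%s (rc=%d): %s" % (node, rc, out))

run_everywhere("uptime")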

Test results

With that done, we were ready to do some tests. Our goals were:

  1. To confirm gluster "scales linearly" for large and small files: as new nodes are added, performance increases accordingly
  2. To examine behavior on large systems. Do all the management commands work?

Large file tests: gluster scales nicely.

scaling

Small file tests: gluster scales on reads, but not write-new-file.

smf-scaling

Oops. Small file writes are not scaling linearly. What's going on here?

Looking at wireshark traces, we observed many LOOKUP calls sent to each of the nodes for every file create operation. As the number of nodes increased, so did the number of LOOKUPs. It turns out that the gluster client was doing a multicast lookup on every node on creates. It does this to confirm the file does not already exist.
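
A toy model makes the difference in message counts concrete. This is only an illustration; gluster's real DHT assigns hash ranges per directory and uses its own hash function, not a simple modulo.

# Toy model: why a hashed lookup scales better than asking every brick.
# Not gluster's real DHT; shown only to illustrate the message counts.
import zlib

bricks = ["brick%d" % i for i in range(84)]

def hashed_lookup(filename):
    # Trusting the layout: only the brick the name hashes to is contacted
    return [bricks[zlib.crc32(filename.encode()) % len(bricks)]]

def broadcast_lookup(filename):
    # Not trusting the layout: every brick must be asked
    return list(bricks)

print(len(hashed_lookup("file-0001")))     # 1 request per create
print(len(broadcast_lookup("file-0001")))  # 84 requests per create, grows with cluster size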

The gluster parameter “lookup-unhashed” forces DHT hashing to be used. This will send the LOOKUP to the node where the new file should reside, rather than doing a multicast to all nodes. Below are the results when this setting is enabled.

write-new-file test results with the parameter set (red line). Much better!

lookup-unhashed

This parameter is dangerous. If the cluster's brick topology has changed and the rebalancing was aborted, gluster may find itself believing a file does not exist when it really does. In other words, the LOOKUP existence test would generate a false negative, because DHT would have the client look at the wrong nodes. This could result in two GFIDs being accessible by the same path.

A fix is being written. It will assign generation counts to bricks. By default DHT will be used on lookups. But if the generation counts indicate that a topology change has taken place on the target bricks, the client will revert to the slower broadcast mode of operation.

We observed that any management command dealing with the volume took as long as a minute. For example, the "gluster import" command on the oVirt UI takes more than 30 seconds to complete. Bug 1044693 was opened for this. In all cases the management commands worked, but they were very slow. See the note above about the hypervisor disk.

Future

Some additional tests we could do were suggested by gluster engineers. This would be future work:

  1. object enumeration – how well does “ls” scale for large volumes?
  2. What is the largest number of small objects (files) that a machine can handle before it makes sense to add a new node?
  3. Snapshot testing for scale-out volumes
  4. Openstack behavior – what happens when the number of VMs goes up? We would look at variance and latency for the worst case.

Proposals to do larger scale-out tests:

  • We could present partitions of disks to gluster volumes. For example, a single 1TB drive could be divided into 10 100GB drives.  This could boost the cluster size by an order of magnitude. Given the disk head would be shared by multiple servers, this technique would only make sense for random I/O tests (where the head is already under stress).
  • Experiment with running gluster servers within containers.

Hardware

Gluster volumes are constructed out of a varying number of bricks embedded within separate virtual machines. Each virtual machine has:

  • a dedicated 7200-RPM SAS disk for the Gluster brick
  • a file on hypervisor system disk for the operating system image
  • 2 Westmere or Sandy Bridge cores
  • 4 GB RAM

The KVM hosts are 7 standard Dell R510/R720 servers with these attributes:

  • 2-socket Westmere/Sandy Bridge Intel x86_64
  • 48/64 GB RAM
  • 1 10-GbE interface with jumbo frames (MTU=9000)
  • 12 7200-RPM SAS disks configured in JBOD mode from a Dell PERC H710 (LSI MegaRAID)

For sequential workloads, we use only 8 out of 12 guests in each host so that aggregate disk bandwidth never exceeds network bandwidth.
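
As a rough back-of-the-envelope check (assuming on the order of 150 MB/s of sequential throughput per 7200-RPM SAS disk, an assumption rather than a measured figure): 12 guests per host would present roughly 1.8 GB/s of aggregate disk bandwidth, more than the ~1.25 GB/s a single 10GbE link can carry, while 8 guests stay around 1.2 GB/s, keeping the network from becoming the bottleneck.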

Clients are 8 standard Dell R610/R620 servers with:

  • 2-socket Westmere/Sandy Bridge Intel x86_64
  • 64 GB RAM
  • 1 10-GbE Intel NIC interface with jumbo frames (MTU=9000)

September 23, 2013

glusterdf – df for gluster volumes

A CLI utility to check the disk usage of glusterfs volumes. Using the df command we can view the disk usage of only the mounted glusterfs volumes; this utility covers the gluster volumes available on the machine where the command is executed, even if they are not mounted. glusterdf uses the libgfapi library provided by glusterfs to fetch the statvfs information.

Installation is very simple,

git clone https://github.com/aravindavk/glusterfs-tools.git
cd glusterfs-tools
sudo python setup.py install

You can also clone this project from forge.gluster.org/glusterfs-tools

Once installed, two tools will be available: glustervolumes and glusterdf.

Run sudo glusterdf --help to learn more about the available options (likewise for glustervolumes: sudo glustervolumes --help).

Usage examples:

sudo glusterdf -h (Disk usage in human readable format)

sudo glusterdf -i (View inodes usage information)

sudo glusterdf --status up --type repl -h (View all running replicated volumes)

sudo glusterdf -h --volumewithbrick "/b[12]"

sudo glusterdf --status up --type repli -h --json | python -m json.tool

sudo glusterdf --help

In my previous blog post (this one) I wrote about gfvolumes (now renamed glustervolumes). glusterfs-tools has been rewritten as a Python library which can be used in your Python programs.

For example

from glusterfstools import volumes, gfapi
# Get all volumes
vols = volumes.get()
# Get a specific volume information
vol = volumes.get(name="gv1")
# Search volumes by status
down_volumes = volumes.search({"status": "down"})
# Search volumes by type
distribute_volumes = volumes.search({"type": "distribute"})
# Statvfs information
vol_statvfs = gfapi.statvfs("gv1")
# To view information about gluster volumes which are down
# and having bricks like "/gfs"
vols = volumes.search({"status": "down", "volumewithbricks": "/gfs"})
# To view filters available
print (volumes.filters())

volumes.search accepts filters as a parameter, and extending volume filters is very simple. For example, the name filter looks like this (src/glusterfstools/volumefilters.py):

@filter("name")
def name_filter(vols, value):
    def is_match(vol, value):
        # 're' is imported at the top of volumefilters.py
        if value in ['', 'all'] or \
                vol["name"].lower() == value.lower().strip() or \
                re.search(value, vol["name"]):
            return True
        else:
            return False

    return [v for v in vols if is_match(v, value)]

The filter can be used as below

from glusterfstools import volumes

# Filters the volumes with name either gv1 or gv2
filters = {"name": "gv[12]"}
print(volumes.search(filters))
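
Following the same pattern, a new filter can be registered with the @filter decorator. The example below is hypothetical (it is not part of glusterfs-tools) and assumes the decorator shown above can be imported from the volumefilters module.

# Hypothetical "exclude" filter: drop volumes whose name matches a pattern.
# Not part of glusterfs-tools; assumes the @filter decorator is importable
# from the volumefilters module shown above.
import re
from glusterfstools.volumefilters import filter

@filter("exclude")
def exclude_filter(vols, value):
    return [v for v in vols if not re.search(value, v["name"])]

# Usage: volumes.search({"status": "down", "exclude": "^test"})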

June 17, 2013

GlusterFS Tools

UPDATE:
Installation and usage are simplified with the new release of glusterfs-tools; refer to this blog post for more details.

From GlusterFS website

GlusterFS is an open source, distributed file system capable of scaling to several petabytes (actually, 72 brontobytes!) and handling thousands of clients. GlusterFS clusters together storage building blocks over Infiniband RDMA or TCP/IP interconnect, aggregating disk and memory resources and managing data in a single global namespace. GlusterFS is based on a stackable user space design and can deliver exceptional performance for diverse workloads.

The Gluster CLI has limited features for viewing and filtering volume info. I started a small project to enhance the Gluster CLI for personal use. As of now it consists of a tool to list Gluster volumes in tabular format. Other interesting features include filtering the output based on name, type, status, bricks, etc.

Clone the project (I cloned it to /home/aravinda/sandbox/):

cd /home/aravinda/sandbox
git clone https://github.com/aravindavk/glusterfs-tools.git

Create a shell script /usr/local/bin/gfvolumes to call gftools:

#!/bin/bash
python /home/aravinda/sandbox/glusterfs-tools/gftools/volumes.py "$@"

Make gfvolumes executable

chmod +x /usr/local/bin/gfvolumes

Now we can run sudo gfvolumes to see the list of glusterfs volumes. Type gfvolumes --help for help.

All Volumes

Name Filter

Status Filter

Type Filter

Show Bricks

Additionally it can output filtered details in JSON format.

JSON Format

We can easily import this in our Python scripts.

#!/usr/bin/python
from gftools import volumes
gfvols = volumes.GfVolumes()
ok, vols = gfvols.get(name='^gv[0-9]$', status='down')  # Various filters available
if ok:
    pass  # Do action with vols here
Note: root permission is required to run the gluster command, so run gfvolumes as root (sudo gfvolumes).

Future plans:

  1. Adding more filters
  2. Adding more admin tools
  3. Creating RPM/DEB packages

Comments & Suggestions welcome.
