September 13, 2016

Making gluster play nicely with others

These days hyperconverged strategies are everywhere. But when you think about it, sharing the finite resources within a physical host requires an effective means of prioritisation and enforcement. Luckily, the Linux kernel already provides an infrastructure for this in the shape of cgroups, and the interface to these controls is now simplified with systemd integration.
So let's look at how you could use these capabilities to make Gluster a better neighbour in a collocated or hyperconverged model.
First, some common systemd terms we should be familiar with:
slice: a slice is a concept that systemd uses to group resources into a hierarchy. Resource constraints can then be applied to the slice, defining:
  • how different slices may compete with each other for resources (e.g. weighting)
  • how resources within a slice are controlled (e.g. cpu capping)
unit: a systemd unit is a resource definition for controlling a specific system service.
NB. More information about control groups with systemd can be found here

In this article, I'm keeping things simple by implementing a cpu cap on glusterfs processes. Hopefully, the two terms above are big clues, but conceptually it breaks down into two main steps:
  1. define a slice which implements a CPU limit
  2. ensure gluster’s systemd unit(s) start within the correct slice.

So let’s look at how this is done.

Defining a slice

Slice definitions can be found under /lib/systemd/system, but systemd provides a neat feature where /etc/systemd/system can be used to provide local "tweaks". This override directory is where we'll place a slice definition. Create a file called glusterfs.slice, containing:

[Slice]
CPUQuota=200%

CPUQuota is our means of applying a cpu limit to all resources running within the slice. A value of 200% limits the slice to the equivalent of 2 cores/execution threads.
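
If you want to double-check the limit the kernel will actually enforce, the quota ends up in the cgroup cpu controller as a CFS quota/period pair. Here is a minimal Python sketch that reads those values for the slice; the path assumes the cgroup v1 layout used by RHEL 7/CentOS 7 and that the slice is already active:

# Print the effective CPU limit applied to glusterfs.slice by reading the
# cgroup v1 cpu controller. Adjust CGROUP_DIR if your layout differs.
import os

CGROUP_DIR = "/sys/fs/cgroup/cpu/glusterfs.slice"

def read_int(name):
    with open(os.path.join(CGROUP_DIR, name)) as f:
        return int(f.read().strip())

quota = read_int("cpu.cfs_quota_us")      # -1 means no quota is set
period = read_int("cpu.cfs_period_us")

if quota < 0:
    print("glusterfs.slice has no CPU quota set")
else:
    cores = quota / float(period)
    print("effective limit: %.1f cores (CPUQuota=%d%%)" % (cores, cores * 100))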

Updating glusterd

The next step is to give gluster a nudge so that it shows up in the right slice. If you're using RHEL 7 or CentOS 7, cpu accounting may be off by default (you can check in /etc/systemd/system.conf). This is OK, it just means we have an extra parameter to define. Follow these steps to change the way glusterd is managed by systemd:

# cd /etc/systemd/system
# mkdir glusterd.service.d
# echo -e "[Service]\nCPUAccounting=true\nSlice=glusterfs.slice" > glusterd.service.d/override.conf

glusterd is responsible for starting the brick and self-heal processes, so by ensuring glusterd starts in our cpu-limited slice, we capture all of glusterd's child processes too. Now the potentially bad news… this 'nudge' requires a stop/start of gluster services. If you're doing this on a live system you'll need to consider quorum, self-heal and so on. However, with the settings above in place, you can get gluster into the right slice by:

# systemctl daemon-reload
# systemctl stop glusterd
# killall glusterfsd && killall glusterfs
# systemctl daemon-reload
# systemctl start glusterd
You can see where gluster sits within the control group hierarchy by looking at its runtime settings:

# systemctl show glusterd | grep slice
Slice=glusterfs.slice
ControlGroup=/glusterfs.slice/glusterd.service
Wants=glusterfs.slice
After=rpcbind.service glusterfs.slice systemd-journald.socket network.target basic.target

or use the systemd-cgls command to see the whole control group hierarchy
├─1 /usr/lib/systemd/systemd --switched-root --system --deserialize 19
├─glusterfs.slice
│ └─glusterd.service
│   ├─ 867 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
│   ├─1231 /usr/sbin/glusterfsd -s server-1 --volfile-id repl.server-1.bricks-brick-repl -p /var/lib/glusterd/vols/repl/run/server-1-bricks-brick-repl.pid
│   └─1305 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log
├─user.slice
│ └─user-0.slice
│   └─session-1.scope
│     ├─2075 sshd: root@pts/0  
│     ├─2078 -bash
│     ├─2146 systemd-cgls
│     └─2147 less
└─system.slice

At this point gluster is exactly where we want it! 
Time for some more systemd coolness 😉 The resource constraints applied by the slice are dynamic, so if you need more cpu, you're one command away from getting it:

# systemctl set-property glusterfs.slice CPUQuota=350%

Try the ‘systemd-cgtop’ command to show the cpu usage across the complete control group hierarchy.
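
If you would rather script these adjustments, for example to relax the cap during a heal or rebalance window and tighten it again afterwards, the same set-property call is easy to drive from Python. This is only a sketch; the slice name and percentages are illustrative:

# Wrapper around 'systemctl set-property' to change a slice's CPU cap at
# runtime. --runtime keeps the change transient, so a reboot falls back to
# the on-disk glusterfs.slice definition.
import subprocess

def set_cpu_quota(slice_name, percent, transient=True):
    cmd = ["systemctl", "set-property"]
    if transient:
        cmd.append("--runtime")
    cmd += [slice_name, "CPUQuota=%d%%" % percent]
    subprocess.check_call(cmd)

# e.g. open the cap up for a heal window, then bring it back down later
set_cpu_quota("glusterfs.slice", 350)
set_cpu_quota("glusterfs.slice", 200)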

Now, if jumping straight into applying resource constraints to gluster is a little daunting, why not test this approach with a tool like 'stress'. Stress is designed to simply consume components of the system – cpu, memory, disk. Here's an example .service file which uses stress to consume 4 cores:

[Unit]
Description=CPU soak task

[Service]
Type=simple
CPUAccounting=true
ExecStart=/usr/bin/stress -c 4
Slice=glusterfs.slice

[Install]
WantedBy=multi-user.target

Now you can tweak the service and the slice with different thresholds before you move on to bigger things! Use stress to avoid stress :)

And now the obligatory warning. Introducing any form of resource constraint may result in unexpected outcomes, especially in hyperconverged/collocated systems – so adequate testing is key.

With that said…happy hacking :)

Read More

September 10, 2016

10-minute introduction to Gluster Eventing Feature

A demo video is included at the end, or you can watch it directly on YouTube.

Gluster Eventing is a new feature that is part of the Gluster.Next
initiatives. It provides close to real-time notifications and alerts for
Gluster cluster state changes.

WebSocket APIs to consume events will be added later. For now we emit
events via another popular mechanism called "Webhooks". (Many popular
products provide notifications via Webhooks: GitHub, Atlassian,
Dropbox, and many more.)

Webhooks are similar to callbacks (over HTTP): on an event, Gluster will
call the configured Webhook URL (via POST). A Webhook is a web server
which listens on a URL; it can be deployed outside of the
cluster. Gluster nodes should be able to access this Webhook server on
the configured port. We will discuss adding and testing a webhook
later.

An example Webhook written in Python:

from flask import Flask, request

app = Flask(__name__)

@app.route("/listen", methods=["POST"])
def events_listener():
    gluster_event = request.json
    if gluster_event is None:
        # No event to process, may be test call
        return "OK"

    # Process gluster_event
    # {
    #  "nodeid": NODEID,
    #  "ts": EVENT_TIMESTAMP,
    #  "event": EVENT_TYPE,
    #  "message": EVENT_DATA
    # }
    return "OK"

app.run(host="0.0.0.0", port=9000)
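
If you want to exercise the listener before pointing a real cluster at it, you can POST a JSON body shaped like the payload documented above. The snippet below uses the requests library and assumes the Flask app is running locally on port 9000; the field values are invented purely for testing:

# Send a fake event to the webhook to verify it is reachable and parses JSON.
# The payload values below are made up for testing only.
import time
import requests

fake_event = {
    "nodeid": "00000000-0000-0000-0000-000000000000",
    "ts": int(time.time()),
    "event": "TEST_EVENT",
    "message": {"name": "gv1"},
}

resp = requests.post("http://localhost:9000/listen", json=fake_event)
print(resp.status_code, resp.text)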

The Eventing feature is not yet available in any of the releases; the patch is
under review in upstream master (http://review.gluster.org/14248). Anybody interested in trying it
out can cherry-pick the patch from review.gluster.org:

git clone http://review.gluster.org/glusterfs
cd glusterfs
git fetch http://review.gluster.org/glusterfs refs/changes/48/14248/5
git checkout FETCH_HEAD
git checkout -b <YOUR_BRANCH_NAME>
./autogen.sh
./configure
make
make install

Start Eventing using:

gluster-eventing start

Other available commands are stop, restart, reload and
status. Run gluster-eventing --help for more details.

Now Gluster can send out notifications via Webhooks. Set up a web
server listening for POST requests and register that URL with Gluster
Eventing. That's all.

gluster-eventing webhook-add <MY_WEB_SERVER_URL>

For example, if my webserver is running at http://192.168.122.188:9000/listen
then register it using:

gluster-eventing webhook-add http://192.168.122.188:9000/listen

We can also test whether the web server is accessible from all Gluster nodes
using the webhook-test subcommand:

gluster-eventing webhook-test http://192.168.122.188:9000/listen

With the initial patch only basic events are covered; I will add more
events once this patch gets merged. The following events are available
now:

Volume Create
Volume Delete
Volume Start
Volume Stop
Peer Attach
Peer Detach

I created a small demo to show this eventing feature. It uses the web server
which is included with the patch for testing (my laptop hostname is sonne):

/usr/share/glusterfs/scripts/eventsdash.py --port 8080

Log in to a Gluster node and start the eventing:

gluster-eventing start
gluster-eventing webhook-add http://sonne:8080/listen

Then log in to the VM and run Gluster commands to probe/detach peers,
create volumes, start them, etc., and observe the real-time notifications
where eventsdash is running.

Example:

ssh root@fvm1
gluster peer probe fvm2
gluster volume create gv1 fvm1:/bricks/b1 fvm2:/bricks/b2 force
gluster volume start gv1
gluster volume stop gv1
gluster volume delete gv1
gluster peer detach fvm2

The demo also includes a Web UI which refreshes automatically when
something changes in the cluster. (I am still fine-tuning this UI; it is not
yet available with the patch, but will soon be available as a separate repo
on my GitHub.)

FAQ:

  • Will this feature be available in the 3.8 release?

    Sadly no. I couldn't get this merged before the 3.8 feature freeze :(

  • Is it possible to create a simple Gluster dashboard outside the
    cluster?

    It is possible; along with the events we also need REST APIs to get
    more information from the cluster or to perform any action in the
    cluster. (WIP REST APIs are available here.)

  • Is it possible to filter only alerts or critical notifications?

    Thanks to Kotresh for the suggestion. Yes, it is possible to add
    event_type and event_group information to the dict so that it can be
    filtered easily. (Not available yet, but I will add this feature once
    this patch gets merged in master. In the meantime a webhook can filter
    on the existing "event" field itself; see the sketch after this list.)

  • Is documentation available to learn more about the eventing design
    and internals?

    The design spec is available here (it also discusses WebSockets, which
    we don't currently support). Usage documentation is available in the
    commit message of the patch (http://review.gluster.org/14248).

Comments and Suggestions Welcome.

Read More

September 2, 2016

Compacting SQLite Databases in GlusterFS

Tiering is a powerful feature in Gluster. It divides the available storage into two parts: the hot tier populated by small fast storage devices like SSDs or a RAMDisk, and the cold tier populated by large slow devices like mechanical HDDs. By placing the most recently accessed files in the hot tier, Gluster can quickly process […]

Read More

August 28, 2016

Gluster Community Newsletter, August 2016

Important happenings for Gluster this month:
  • 3.7.14 released
  • 3.8.3 released
  • CFP for Gluster Developer Summit open until August 31st
  • gluster-users:
    [Gluster-users] release-3.6 end of life http://www.gluster.org/pipermail/gluster-users/2016-August/028078.html – Joe requests a review of the 3.6 EOL proposal
    [Gluster-users] The out-of-order GlusterFS 3.8.3 release addresses a usability regression http://www.gluster.org/pipermail/gluster-users/2016-August/028155.html Niels de Vos announces 3.8.3
    [Gluster-users] GlusterFS-3.7.14 released […]

Read More