all posts tagged glusterfs


by on October 19, 2016

FOSDEM 2017: Software Defined Storage Devroom

Gluster and Ceph are delighted to be hosting a Software Defined Storage devroom at FOSDEM 2017.

Important dates:

  • Nov 16: Deadline for submissions
  • Dec  1: Speakers notified of acceptance
  • Dec  5: Schedule published

This year, we’re looking for conversations about open source software defined storage, use cases in the real world, and where the future lies. We’re inviting proposals covering any Free/Libre/Open Source Software for software defined storage.
Please include the following information when submitting a proposal:

  • Your name
  • The title of your talk (please be descriptive, as titles will be listed alongside around 250 others from other projects)
  • Short abstract of one or two paragraphs
  • Short bio (with photo)

The deadline for submissions is November 16th 2016. FOSDEM will be held
on the weekend of February 4-5, 2017 and the Software Defined Storage DevRoom will take
place on Sunday, February 5, 2017. Please use the following website to
submit your proposals:

https://penta.fosdem.org/submission/FOSDEM17

 

by on September 13, 2016

Making gluster play nicely with others

These days hyperconverged strategies are everywhere. But when you think about it, sharing the finite resources within a physical host requires an effective means of prioritisation and enforcement. Luckily, the Linux kernel already provides an infrastructure for this in the shape of cgroups, and the interface to these controls is now simplified with systemd integration.

So let's look at how you could use these capabilities to make Gluster a better neighbour in a co-located or hyperconverged model.

First, some common systemd terms we should be familiar with:
slice : a slice is a concept that systemd uses to group together resources into a hierarchy. Resource constraints can then be applied to the slice, which defines 
  • how different slices may compete with each other for resources (e.g. weighting)
  • how resources within a slice are controlled (e.g. cpu capping)
unit : a systemd unit is a resource definition for controlling a specific system service
NB. More information about control groups with systemd can be found in the systemd documentation (e.g. the systemd.resource-control man page).

In this article, I'm keeping things simple by implementing a cpu cap on glusterfs processes. Hopefully, the two terms above are big clues, but conceptually it breaks down into two main steps;
  1. define a slice which implements a CPU limit
  2. ensure gluster's systemd unit(s) start within the correct slice.
So let's look at how this is done.

Defining a slice

Slice definitions can be found under /lib/systemd/system, but systemd provides a neat feature where /etc/systemd/system can be used to provide local "tweaks". This override directory is where we'll place a slice definition. Create a file there called glusterfs.slice, containing;

[Slice]
CPUQuota=200%

CPUQuota is our means of applying a CPU limit to all resources running within the slice. A value of 200% caps the slice at the equivalent of 2 cores/execution threads.
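Once the file is saved (I'm assuming the path /etc/systemd/system/glusterfs.slice here), you can ask systemd to reload its configuration and confirm the slice unit parsed cleanly:

# systemctl daemon-reload
# systemctl cat glusterfs.slice
# /etc/systemd/system/glusterfs.slice
[Slice]
CPUQuota=200%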

Updating glusterd


The next step is to give gluster a nudge so that it shows up in the right slice. If you're using RHEL 7 or CentOS 7, CPU accounting may be off by default (you can check in /etc/systemd/system.conf). This is OK; it just means we have an extra parameter to define. Follow these steps to change the way glusterd is managed by systemd:

# cd /etc/systemd/system
# mkdir glusterd.service.d
# echo -e "[Service]\nCPUAccounting=true\nSlice=glusterfs.slice" > glusterd.service.d/override.conf
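For clarity, the override file written by that echo should end up looking like this:

# cat glusterd.service.d/override.conf
[Service]
CPUAccounting=true
Slice=glusterfs.slice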

glusterd is responsible for starting the brick and self heal processes, so by ensuring glusterd starts in our CPU-limited slice, we capture all of glusterd's child processes too. Now the potentially bad news... this 'nudge' requires a stop/start of gluster services. If you're doing this on a live system you'll need to consider quorum, self heal, etc. However, with the settings above in place, you can get gluster into the right slice by;

# systemctl daemon-reload
# systemctl stop glusterd
# killall glusterfsd && killall glusterfs
# systemctl daemon-reload
# systemctl start glusterd


You can see where gluster is within the control group hierarchy by looking at its runtime settings

# systemctl show glusterd | grep slice
Slice=glusterfs.slice
ControlGroup=/glusterfs.slice/glusterd.service
Wants=glusterfs.slice
After=rpcbind.service glusterfs.slice systemd-journald.socket network.target basic.target

or use the systemd-cgls command to see the whole control group hierarchy

├─1 /usr/lib/systemd/systemd --switched-root --system --deserialize 19
├─glusterfs.slice
│ └─glusterd.service
│   ├─ 867 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
│   ├─1231 /usr/sbin/glusterfsd -s server-1 --volfile-id repl.server-1.bricks-brick-repl -p /var/lib/glusterd/vols/repl/run/server-1-bricks-brick-repl.pid
│   └─1305 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log
├─user.slice
│ └─user-0.slice
│   └─session-1.scope
│     ├─2075 sshd: root@pts/0  
│     ├─2078 -bash
│     ├─2146 systemd-cgls
│     └─2147 less
└─system.slice

At this point gluster is exactly where we want it! 

Time for some more systemd coolness ;) The resource constraints that are applied by the slice are dynamic, so if you need more cpu, you're one command away from getting it;

# systemctl set-property glusterfs.slice CPUQuota=350%
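If you want to double check that the new ceiling took effect, you can query the slice's properties (the exact property name may vary a little between systemd versions; on my systems 350% shows up as 3.5 CPU-seconds per second):

# systemctl show glusterfs.slice | grep -i cpuquota
CPUQuotaPerSecUSec=3.5s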

Try the 'systemd-cgtop' command to show the cpu usage across the complete control group hierarchy.

Now if jumping straight into applying resource constraints to gluster is a little daunting, why not test this approach with a tool like 'stress'? Stress is designed simply to consume system resources - CPU, memory, disk. Here's an example .service file which uses stress to consume 4 cores:

[Unit]
Description=CPU soak task

[Service]
Type=simple
CPUAccounting=true
ExecStart=/usr/bin/stress -c 4
Slice=glusterfs.slice

[Install]
WantedBy=multi-user.target
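Assuming you saved the unit above as /etc/systemd/system/cpu-soak.service (the name is just an example), you can start it and watch the cap being enforced:

# systemctl daemon-reload
# systemctl start cpu-soak.service
# systemd-cgtop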

Now you can tweak the service, and the slice with different thresholds before you move on to bigger things! Use stress to avoid stress :)

And now the obligatory warning. Introducing any form of resource constraint may result in unexpected outcomes, especially in hyperconverged/collocated systems - so adequate testing is key.

With that said...happy hacking :)




by on August 30, 2016

Run Gluster systemd containers [without privileged mode] in Fedora/CentOS

Today we will discuss how to run Gluster systemd containers without 'privileged' mode!! Awesome, isn't it?

I owe this blog to a few people, the latest being twitter.com/dglushenok/status/740265552258682882.
Here are some details about my docker host setup:
[root@dhcp35-111 ~]# cat /etc/redhat-release
Fedora release 24 (Twenty Four)
[root@dhcp35-111 ~]# docker version
Client:
Version: 1.10.3
API version: 1.22
Package version: docker-1.10.3-21.git19b5791.fc24.x86_64
Go version: go1.6.2
Git commit: 19b5791/1.10.3
Built:
OS/Arch: linux/amd64
Server:
Version: 1.10.3
API version: 1.22
Package version: docker-1.10.3-21.git19b5791.fc24.x86_64
Go version: go1.6.2
Git commit: 19b5791/1.10.3
Built:
OS/Arch: linux/amd64
[root@dhcp35-111 ~]#

I have pulled the gluster/gluster-centos image from Docker Hub and kept it in my local docker image store.

[root@dhcp35-111 ~]# docker images |grep gluster
docker.io/gluster/gluster-centos latest 759691b0beca 4 days ago 406.1 MB
gluster/gluster-centos experiment fd8cd51f47fb 2 weeks ago 351.2 MB
gluster/gluster-centos latest 9b46174d3366 3 weeks ago 351.1 MB
gluster/gluster-centos gluster_3_7_centos_7 5809addca906 4 weeks ago 351.1 MB

The beauty is that we don’t need any extra steps to be performed on our host system.

NOTE: We haven't passed the 'privileged' flag/option to the 'docker run' command below. The volume mounts such as '/etc/glusterfs', '/var/lib/glusterd' and '/var/log/glusterfs' are there so that glusterfs metadata and logs stay persistent across container respawns.
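If those host directories don't already exist, it doesn't hurt to create them up front (docker will create missing bind-mount sources itself, so treat this as an optional tidy-up step on my part, not something the original setup requires):

# mkdir -p /etc/glusterfs /var/lib/glusterd /var/log/glusterfs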


[root@dhcp35-111 docker-host]# docker run -d --name gluster3 -v /etc/glusterfs:/etc/glusterfs:z -v /var/lib/glusterd:/var/lib/glusterd:z -v /var/log/glusterfs:/var/log/glusterfs:z -v /sys/fs/cgroup:/sys/fs/cgroup:ro gluster/gluster-centos
8b1dd6f0aa88197bdcd022802f7c0c16d642630a21b2b43accfa5ed8023c197a
[root@dhcp35-111 docker-host]#

As we now have the container id ( 8b1dd6f0aa88197bdcd022802f7c0c16d642630a21b2b43accfa5ed8023c197a), let’s get inside the container and examine the service and its behavior.

[root@dhcp35-111 docker-host]# docker exec -ti 8b1dd6f0aa88197bdcd022802f7c0c16d642630a21b2b43accfa5ed8023c197a /bin/bash
[root@8b1dd6f0aa88 /]# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 122764 4688 ? Ss 13:34 0:00 /usr/sbin/init
root 22 0.0 0.0 36832 6348 ? Ss 13:34 0:00 /usr/lib/systemd/systemd-journald
root 23 0.0 0.0 118492 2744 ? Ss 13:34 0:00 /usr/sbin/lvmetad -f
root 29 0.0 0.0 24336 2884 ? Ss 13:34 0:00 /usr/sbin/crond -n
rpc 42 0.0 0.0 64920 3244 ? Ss 13:34 0:00 /sbin/rpcbind -w
root 44 0.0 0.2 430272 17300 ? Ssl 13:34 0:00 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
root 68 0.0 0.0 82572 6212 ? Ss 13:34 0:00 /usr/sbin/sshd -D
root 197 0.0 0.0 11788 2952 ? Ss 13:35 0:00 /bin/bash
root 219 0.0 0.0 47436 3360 ? R+ 13:44 0:00 ps aux
[root@8b1dd6f0aa88 /]#
[root@8b1dd6f0aa88 /]# systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
Active: active (running) since Tue 2016-06-28 13:34:53 UTC; 27s ago
Process: 43 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 44 (glusterd)
CGroup: /system.slice/docker-8b1dd6f0aa88197bdcd022802f7c0c16d642630a21b2b43accfa5ed8023c197a.scope/system.slice/glusterd.service
└─44 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
Jun 28 13:34:51 8b1dd6f0aa88 systemd[1]: Starting GlusterFS, a clustered file-system server...
Jun 28 13:34:53 8b1dd6f0aa88 systemd[1]: Started GlusterFS, a clustered file-system server.
Jun 28 13:35:15 8b1dd6f0aa88 systemd[1]: Started GlusterFS, a clustered file-system server.
[root@8b1dd6f0aa88 /]#
[root@8b1dd6f0aa88 /]# glusterd --version
glusterfs 3.7.11 built on Apr 18 2016 13:20:46
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2013 Red Hat, Inc. <http://www.redhat.com/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.
[root@8b1dd6f0aa88 /]# cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)
[root@8b1dd6f0aa88 /]# rpm -qa |grep glusterfs
glusterfs-3.7.11-1.el7.x86_64
glusterfs-fuse-3.7.11-1.el7.x86_64
glusterfs-cli-3.7.11-1.el7.x86_64
glusterfs-libs-3.7.11-1.el7.x86_64
glusterfs-client-xlators-3.7.11-1.el7.x86_64
glusterfs-api-3.7.11-1.el7.x86_64
glusterfs-server-3.7.11-1.el7.x86_64
glusterfs-geo-replication-3.7.11-1.el7.x86_64
[root@8b1dd6f0aa88 /]#

Let's examine this container from the docker host and verify it is running without privileged mode.

[root@dhcp35-111 docker-host]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
8b1dd6f0aa88 gluster/gluster-centos "/usr/sbin/init" 6 minutes ago Up 6 minutes 111/tcp, 245/tcp, 443/tcp, 2049/tcp, 2222/tcp, 6010-6012/tcp, 8080/tcp, 24007/tcp, 38465-38466/tcp, 38468-38469/tcp, 49152-49154/tcp, 49156-49162/tcp gluster3
[root@dhcp35-111 docker-host]# docker inspect 8b1dd6f0aa88|grep -i privil
"Privileged": false,
[root@dhcp35-111 docker-host]#

All is well, but what will be missing if you run these containers without 'privileged' mode? Not much! However, if you want to create gluster snapshots from the container, we may need to export '/dev/' to the container, and operations that create devices from the container do need privileged mode.
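For that snapshot use case, a hypothetical invocation might look like the run command above with the host's /dev bind-mounted and privileged mode enabled - treat this as a sketch rather than something covered in this post:

# docker run -d --name gluster-snap --privileged -v /dev:/dev -v /etc/glusterfs:/etc/glusterfs:z -v /var/lib/glusterd:/var/lib/glusterd:z -v /var/log/glusterfs:/var/log/glusterfs:z -v /sys/fs/cgroup:/sys/fs/cgroup:ro gluster/gluster-centos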

by on August 26, 2016

Possible configurations of GlusterFS in Kubernetes/OpenShift setup

In previous blog posts we discussed how to use GlusterFS as persistent storage in Kubernetes and OpenShift. In a nutshell, GlusterFS can be deployed/used in a Kubernetes/OpenShift environment as:

  • Containerized GlusterFS (Pod)
  • GlusterFS as an OpenShift Service and Endpoint
  • GlusterFS volume as a Persistent Volume (PV), using the GlusterFS volume plugin to bind this PV to a Persistent Volume Claim (PVC)
  • GlusterFS template to deploy GlusterFS pods in an OpenShift environment

All the configuration files that can be used to deploy GlusterFS can be found at github.com/humblec/glusterfs-kubernetes-openshift/ or github.com/gluster/glusterfs-kubernetes-openshift. Let's see how to use these files to deploy GlusterFS in Kubernetes and OpenShift. We will start with deploying GlusterFS pods in an OpenShift/Kubernetes environment.

Deploying GlusterFS Pod:
[Update] The pod file is renamed to gluster-pod.yaml in the mentioned repo. More details about Gluster Containers can be found @http://www.slideshare.net/HumbleChirammal/gluster-containers
GlusterFS pods can be deployed in Kubernetes/OpenShift, so that Gluster nodes run in containers and can provide persistent storage for the OpenShift/Kubernetes setup. The example files in this repo are used for this demo.

Step 1: Create the GlusterFS pod.

[root@atomic-node2 gluster_pod]# oc create -f gluster-1.yaml

Step 2: Get details about the GlusterFS pod.

[root@atomic-node2 gluster_pod]# oc describe pod gluster-1
Name: gluster-1
Namespace: default
Image(s): gluster/gluster-centos
Node: atomic-node1/10.70.43.174
Start Time: Tue, 17 May 2016 10:19:17 +0530
Labels: name=gluster-1
Status: Running
Reason:
Message:
IP: 10.70.43.174
Replication Controllers:
Containers:
  glusterfs:
    Container ID: docker://ff8f4af700d725dfe0e08939ec011c34ddf9dedc7204e0ced1cc355a56150742
    Image: gluster/gluster-centos
    Image ID: docker://033de9c44a8aabde55ce8a2b751ccf5bc345fdb534ea30e79a8fa70b82dc7761
    QoS Tier:
      cpu: BestEffort
      memory: BestEffort
    State: Running
      Started: Tue, 17 May 2016 10:20:35 +0530
    Ready: True
    Restart Count: 0
    Environment Variables:
Conditions:
  Type Status
  Ready True
Volumes:
  brickpath:
    Type: HostPath (bare host directory volume)
    Path: /mnt/brick1
  default-token-72d89:
    Type: Secret (a secret that should populate this volume)
    SecretName: default-token-72d89
Events:
  FirstSeen LastSeen Count From SubobjectPath Reason Message
  ───────── ──────── ───── ──── ───────────── ────── ───────
  1m 1m 1 {scheduler } Scheduled Successfully assigned gluster-1 to atomic-node1
  1m 1m 1 {kubelet atomic-node1} implicitly required container POD Pulled Container image "openshift3/ose-pod:v3.1.1.6" already present on machine
  1m 1m 1 {kubelet atomic-node1} implicitly required container POD Created Created with docker id f55ce55e6ea3
  1m 1m 1 {kubelet atomic-node1} implicitly required container POD Started Started with docker id f55ce55e6ea3
  1m 1m 1 {kubelet atomic-node1} spec.containers{glusterfs} Pulling pulling image "gluster/gluster-centos"
  8s 8s 1 {kubelet atomic-node1} spec.containers{glusterfs} Pulled Successfully pulled image "gluster/gluster-centos"
  8s 8s 1 {kubelet atomic-node1} spec.containers{glusterfs} Created Created with docker id ff8f4af700d7
  8s 8s 1 {kubelet atomic-node1} spec.containers{glusterfs} Started Started with docker id ff8f4af700d7

From the above events, you can see it pulled the `gluster/gluster-centos` container image and deployed containers from it.

[root@atomic-node2 gluster_pod]# oc get pods
NAME READY STATUS RESTARTS AGE
gluster-1 1/1 Running 0 1m

Examine the container and make sure it has a running GlusterFS daemon.

[root@atomic-node2 gluster_pod]# oc exec -ti gluster-1 /bin/bash

Examine the processes running in this container and the `glusterd` service information.

[root@atomic-node1 /]# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.4 0.0 40780 2920 ? Ss 04:50 0:00 /usr/sbin/init
root 20 0.3 0.0 36816 4272 ? Ss 04:50 0:00 /usr/lib/syste
root 21 0.0 0.0 118476 1332 ? Ss 04:50 0:00 /usr/sbin/lvme
root 37 0.0 0.0 101344 1228 ? Ssl 04:50 0:00 /usr/sbin/gssp
rpc 44 0.1 0.0 64904 1052 ? Ss 04:50 0:00 /sbin/rpcbind
root 209 0.1 0.1 364716 13444 ? Ssl 04:50 0:00 /usr/sbin/glus
root 341 1.1 0.0 13368 1964 ? Ss 04:51 0:00 /bin/bash
root 354 0.0 0.0 49020 1820 ? R+ 04:51 0:00 ps aux

[root@atomic-node1 /]# service glusterd status
Redirecting to /bin/systemctl status glusterd.service
● glusterd.service - GlusterFS, a clustered file-system server
Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
Active: active (running) since Tue 2016-05-17 04:50:41 UTC; 35s ago
Process: 208 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 209 (glusterd)
CGroup: /system.slice/docker-ff8f4af700d725dfe0e08939ec011c34ddf9dedc7204e0ced1cc355a56150742.scope/system.slice/glusterd.service
└─209 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO...
‣ 209 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO...
May 17 04:50:36 atomic-node1 systemd[1]: Starting Gluste...
May 17 04:50:41 atomic-node1 systemd[1]: Started Gluster...
Hint: Some lines were ellipsized, use -l to show in full.

Let's fetch some more details about GlusterFS in this container.

[root@atomic-node1 /]# gluster --version
glusterfs 3.7.9 built on Mar 20 2016 03:19:49
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.
[root@atomic-node1 /]#
[root@atomic-node1 /]# mount |grep mnt
/dev/mapper/atomic-node1-root on /mnt/brick1 type xfs (rw,relatime,seclabel,attr2,inode64,noquota)

This container is built on top of the CentOS base image, as shown below.

[root@atomic-node1 /]# cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)
[root@atomic-node1 /]#

In this article we discussed how to run GlusterFS as a pod in a Kubernetes or OpenShift setup. [Part 2] covers `how to use GlusterFS as a service, Persistent Volume for a Persistent Volume Claim`. [Part 3] covers `how to use GlusterFS template to deploy GlusterFS pods in an Openshift/kubernetes setup`.
by on August 24, 2016

[Coming Soon] Dynamic Provisioning of GlusterFS volumes in Kubernetes/Openshift!!

In this context I am talking about the dynamic provisioning capability of the 'glusterfs' plugin in Kubernetes/OpenShift. I have submitted a pull request to Kubernetes to add this functionality for GlusterFS. At present, there are no network storage provisioners in Kubernetes, even though there are cloud providers. The idea here is to make the glusterfs plugin capable of provisioning volumes on demand from Kubernetes/OpenShift. Cool, isn't it? Indeed this is a nice feature to have. That said, an OSE user requests some space, for example 20G, and the glusterfs plugin takes this request, creates a 20G volume and binds it to the claim. The plugin can use any REST service, but the example patch is based on 'heketi'. Here is the workflow. Start your kubernetes controller manager with the highlighted options:

 ...kube controller-manager --v=3 \
 --service-account-private-key-file=/tmp/kube-serviceaccount.key \
 --root-ca-file=/var/run/kubernetes/apiserver.crt --enable-hostpath-provisioner=false \
 --enable-network-storage-provisioner=true --storage-config=/tmp --net-provider=glusterfs \
 --pvclaimbinder-sync-period=15s --cloud-provider= --master=127.0.0.1:8080

 
Create a file called `gluster.json` in the `/tmp` directory. The important fields in this config file are ‘endpoint’ and ‘resturl’. The endpoint has to be defined and match the setup. The `resturl` has been filled with the REST service which can take the input and create a gluster volume in the backend. As mentioned earlier, I am using `heketi` for the same.

 [hchiramm@dhcp35-111 tmp]$ cat gluster.json
 {
 "endpoint": "glusterfs-cluster",
 "resturl": "http://127.0.0.1:8081",
 "restauthenabled":false,
 "restuser":"",
 "restuserkey":""
 }
 [hchiramm@dhcp35-111 tmp]$
 
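Before wiring this into the controller manager, it's worth checking that whatever REST service you point `resturl` at is actually reachable. With heketi I'd expect a probe against its hello endpoint to answer something like the below (the exact greeting text may differ; this is just a sanity check and not part of the original workflow):

 [hchiramm@dhcp35-111 tmp]$ curl http://127.0.0.1:8081/hello
 Hello from Heketi
 [hchiramm@dhcp35-111 tmp]$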

We have to define an ENDPOINT and SERVICE. Below are the example configuration files. ENDPOINT : “ip” has to be filled with your gluster trusted pool IP.


[hchiramm@dhcp35-111 ]$ cat glusterfs-endpoint.json
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "glusterfs-cluster"
  },
  "subsets": [
    {
      "addresses": [
        {
          "ip": "10.36.4.112"
        }
      ],
      "ports": [
        {
          "port": 1
        }
      ]
    },
    {
      "addresses": [
        {
          "ip": "10.36.4.112"
        }
      ],
      "ports": [
        {
          "port": 1
        }
      ]
    }
  ]
}

SERVICE: Please note that the Service name matches the ENDPOINT name.


[hchiramm@dhcp35-111 ]$ cat gluster-service.json
{
  "kind": "Service",
  "apiVersion": "v1",
  "metadata": {
    "name": "glusterfs-cluster"
  },
  "spec": {
    "ports": [
      {"port": 1}
    ]
  }
}
[hchiramm@dhcp35-111 ]$

Finally, we have a Persistent Volume Claim file as shown below. NOTE: the size of the volume is mentioned as '20G':


[hchiramm@dhcp35-111 ]$ cat gluster-pvc.json
{
  "kind": "PersistentVolumeClaim",
  "apiVersion": "v1",
  "metadata": {
    "name": "glusterc",
    "annotations": {
      "volume.alpha.kubernetes.io/storage-class": "glusterfs"
    }
  },
  "spec": {
    "accessModes": [
      "ReadOnlyMany"
    ],
    "resources": {
      "requests": {
        "storage": "20Gi"
      }
    }
  }
}
[hchiramm@dhcp35-111 ]$

Let's start defining the endpoint, service and PVC.


[hchiramm@dhcp35-111 ]$ ./kubectl create -f glusterfs-endpoint.json
endpoints "glusterfs-cluster" created
[hchiramm@dhcp35-111 ]$ ./kubectl create -f gluster-service.json
service "glusterfs-cluster" created
[hchiramm@dhcp35-111 ]$ ./kubectl get ep,service
NAME ENDPOINTS AGE
ep/glusterfs-cluster 10.36.6.105:1 14s
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
svc/glusterfs-cluster 10.0.0.10 1/TCP 9s
svc/kubernetes 10.0.0.1 443/TCP 13m
[hchiramm@dhcp35-111 ]$ ./kubectl get pv,pvc
[hchiramm@dhcp35-111 ]$

Now, let's request a claim!

[hchiramm@dhcp35-111 ]$ ./kubectl create -f glusterfs-pvc.json
persistentvolumeclaim "glusterc" created
[hchiramm@dhcp35-111 ]$ ./kubectl get pv,pvc
NAME CAPACITY ACCESSMODES STATUS CLAIM REASON AGE
pv/pvc-39ebcdc5-442b-11e6-8dfa-54ee7551fd0c  20Gi ROX  Bound  default/glusterc 2s
NAME STATUS VOLUME CAPACITY ACCESSMODES AGE
pvc/glusterc Bound pvc-39ebcdc5-442b-11e6-8dfa-54ee7551fd0c 0 3s
[hchiramm@dhcp35-111 ]$

Awesome! Based on the request, it created a PV and bound it to the PV Claim!!


[hchiramm@dhcp35-111 ]$ ./kubectl describe pv pvc-39ebcdc5-442b-11e6-8dfa-54ee7551fd0c
Name: pvc-39ebcdc5-442b-11e6-8dfa-54ee7551fd0c
Labels:
Status: Bound
Claim: default/glusterc
Reclaim Policy: Delete
Access Modes: ROX
Capacity: 20Gi
Message:
Source:
Type: Glusterfs (a Glusterfs mount on the host that shares a pod's lifetime)
EndpointsName: glusterfs-cluster
 Path: vol_038b56756f4e3ab4b07a87494097941c
ReadOnly: false
No events.
[hchiramm@dhcp35-111 ]$
 

Verify the volume exists in the backend:

 [root@ ~]# heketi-cli volume list |grep 038b56756f4e3ab4b07a87494097941c
 038b56756f4e3ab4b07a87494097941c
 [root@ ~]#

Let's delete the PV claim --


[hchiramm@dhcp35-111 ]$ ./kubectl delete pvc glusterc
persistentvolumeclaim "glusterc" deleted
[hchiramm@dhcp35-111 ]$ ./kubectl get pv,pvc
[hchiramm@dhcp35-111 ]$

It got deleted! Verify it from the backend:


 [root@ ~]# heketi-cli volume list |grep 038b56756f4e3ab4b07a87494097941c
 [root@ ~]# 

We can use the volume in app pods by referring to the claim name. Hope you agree this is a nice feature to have!
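For example, a pod spec could reference the claim with a volumes section along these lines (a minimal sketch; the volume name and the rest of the pod wiring are up to you):

"volumes": [
  {
    "name": "glustervol",
    "persistentVolumeClaim": {
      "claimName": "glusterc"
    }
  }
]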

Please let me know if you have any comments/suggestions.

Also, as mentioned earlier, the patch - https://github.com/kubernetes/kubernetes/pull/30888 - is undergoing review upstream, and hopefully it will make it into a Kubernetes release soon. I will provide an update here as soon as it's available upstream.

by on April 26, 2016

Using LIO with Gluster

In the past, gluster users have been able to open up their gluster volumes to iSCSI using the tgt daemon. This has been covered before on other blogs and is also documented on gluster.org.
But tgt has been superseded in more recent distros by LIO. LIO provides a number of different local storage options to be utilised as SCSI targets, including FILEIO, BLOCK, PSCSI and RAMDISK. These SCSI targets are implemented as modules in kernel space, but what isn't immediately obvious is that LIO also provides a userspace framework called TCMU. TCMU enables userspace files to become iSCSI targets.
With LIO, the easiest way to exploit gluster as an iSCSI target was through the FILEIO 'storage engine' over FUSE. However, the high number of context switches incurred within FUSE is likely to reduce the performance potential to your 'client' -  especially for random I/O access patterns.
Until now, FUSE was your only option. But Andy Grover at Red Hat has just changed things. Andy has developed tcmu-runner which utilises the TCMU framework, allowing a glusterfs target to be used over gluster's libgfapi interface. Typically, with libgfapi you can expect less context switching, and improved performance.
For those like me, with short attention spans, here's what the improvement looked like when I compared LIO/FUSE with LIO/gfapi using a couple of fio based workloads.
[Chart: Read Improvement]
[Chart: Mixed Workload Improvement]
In both charts, IOPS and latency significantly improves using LIO/GFAPI, and further still by adopting the arbiter volume. As you can see, for a young project, these results are really encouraging. The bad news is that to try tcmu-runner you'll need to either build systems based on Fedora F24/rawhide or compile it yourself from the github repo. Let's face it, there's always a price to pay for new shiny stuff :) For the remainder of this article, I'll walk through the configuration of LIO and the iSCSI client that I used during my comparisons.

Preparing Your Environment

In the interests of brevity, I'm assuming that you know how to build servers,  create a gluster trusted pool and define volumes. Here's a checklist of the tasks you should do in order to prepare a test environment;
  1. build 3 Fedora24 nodes and install gluster (3.7.11) on each peer/node
  2. on each node, ensure /etc/glusterfs/glusterd.vol contains the following setting - option rpc-auth-allow-insecure on. This is needed for gfapi access. Once added, you'll need to restart glusterd.
  3. install targetcli (targetcli-2.1.fb43-1) and tcmu-runner (tcmu-runner-1.0.4-1) on each of your gluster nodes
  4. form a gluster trusted pool, and create a replica 3 volume or replica with arbiter volume (or both!) - see the example command just after this checklist
  5. issue "gluster vol set <vol_name> server.allow-insecure on" to enable libgfapi access to the volume
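For step 4, a volume create command might look something like this (the brick paths and the 'iscsi-2' host name are just placeholders for your own environment):

# gluster volume create iscsi-pool replica 3 arbiter 1 iscsi-1:/bricks/brick1 iscsi-2:/bricks/brick1 iscsi-3:/bricks/brick1
# gluster volume start iscsi-pool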
There are several ways to configure the iSCSI environment, but for my tests I adopted the following approach;
  • two of my three gluster nodes will be iSCSI gateways (LIO targets)
  • each gateway will have its own IQN (iSCSI Qualified Name)
  • each gateway will only access the gluster volume from itself, so if gluster is down on this node so is the path for any attached client (makes things simple)
  • high availability for the LUN is provided by client side multipathing
Before moving on, you can confirm that targetcli/tcmu-runner are providing the gluster integration by simply running 'ls' from the targetcli.
# targetcli ls
o- / ...............
  o- backstores ....
  | o- block .......
  | o- fileio ......
  | o- pscsi .......
  | o- ramdisk .....
  | o- user:glfs ...    <--- gluster gfapi available through tcmu
  | o- user:qcow ...
  o- iscsi .........
  o- loopback ......
  o- vhost ......
With the preparation complete, let's configure the LIO gateways.

Configuring LIO - Node 1

The following steps provide an example configuration. You'll need to make changes to naming etc. specific to your test environment.
  1. Mount the volume (called iscsi-pool), and allocate the file that will become the LUN image
# fallocate -l 100G mytest.img
  2. Enter the targetcli shell. The remaining steps all take place within this shell.
  3. Create the backing store connection to the glusterfs file
/backstores/user:glfs create myLUN 100G iscsi-pool@iscsi-3/mytest.img
  4. Create the node's target portal (this is the name the client will connect to). In this example 'iscsi-3' is the node name
/iscsi/ create iqn.2016-04.org.gluster:iscsi-3
NB. this will create the target IQN and the iscsi portal will be enabled and listening on port 3260
  5. On the client, 'grab' its iqn from /etc/iscsi/initiatorname.iscsi, then add it to the gateway
/iscsi/iqn.2016-04.org.gluster:iscsi-3/tpg1/acls/ create iqn.1994-05.com.redhat:14a2b41fe9e4
  6. Add the LUN, "myLUN", to the target and automatically map it to the client(s)
/iscsi/iqn.2016-04.org.gluster:iscsi-3/tpg1/luns create /backstores/user:glfs/myLUN 0
  7. Issue saveconfig to commit the configuration (config is stored in /etc/target/saveconfig.json)

Configuring LIO - Node 2

When a LUN is defined by targetcli, a wwn is automatically generated for it. This is neat, but to ensure multipathing works we need the LUN exported by the gateways to share the same wwn - if they don't match, the client will see two devices, not two paths to the same device. So for subsequent nodes, the steps are slightly different.
  1. On the first node, look at /etc/target/saveconfig.json. You'll see a storage object item for the gluster file you've just created, together with the wwn that was assigned (highlighted).
"storage_objects": [
  {
    "config": "glfs/iscsi-pool@iscsi-3/mytest.img",
    "name": "myLUN",
    "plugin": "user",
    "size": 107374182400,
    "wwn": "653e4072-8aad-4e9d-900e-4059f0e19e7e"
  }
  2. Open the targetcli shell on node 2, and define a LUN pointing to the same backing file as node 1, but this time explicitly specifying the wwn (from step 1)
/backstores/user:glfs create myLUN 100G iscsi-pool@iscsi-1/mytest.img 653e4072-8aad-4e9d-900e-4059f0e19e7e
(if you cd to /backstores/user:glfs and use "help create" you'll see a summary of the options available when creating the LUN)
  3. With the LUN in place, you can follow steps 4-7 above to create the iqn, portal and LUN masking for this node.
At this point you have;
  • 3 gluster nodes
  • a gluster volume with a file defined, serving as an iscsi target
  • 2 gluster nodes defined as iscsi gateways
  • each gateway exports the same LUN to a client (supporting multipathing)
Next up...configuring the client.

Client Configuration

To get the client to connect to your 'exported' LUN(s), you first need to ensure that the following rpms are installed on the client: device-mapper-multipath, iscsi-initiator-utils and preferably sg3_utils. With these packages in place you can move on to configure multipathing and connect to your LUN(s).
  • Multipathing : the example below shows a devices section from /etc/multipath.conf that I used to ensure my exported LUNs are seen as multipath devices. With this in place, you can take a node down for maintenance and your LUN remains accessible (as long as your volume has quorum!)
#
# LIO iSCSI
devices {
        device {
                vendor "LIO-ORG"
                path_grouping_policy "multibus"
                # I tested with a path_selector of "round-robin" and "queue-length"
                path_selector "queue-length 0"
                path_checker "directio"
                prio "const"
                rr_weight "uniform"
        }
}
  • iscsi discovery/login : to log in to the gluster iscsi gateways, just use the iscsiadm command (from the iscsi-initiator-utils rpm)
# iscsiadm -m discovery -t st -p <your_gluster_node_1> -l
# iscsiadm -m discovery -t st -p <your_gluster_node_2> -l
#
# check your paths are working as expected with the multipath command
# multipath -ll
mpathd (36001405891b9858f4b0440285cacbcca) dm-2 LIO-ORG ,TCMU device
size=8.0G features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=1 status=active
  |- 33:0:0:1 sdc 8:32 active ready running
  `- 34:0:0:1 sde 8:64 active ready running
mpathb (3600140596a3a65692104740a88516aba) dm-3 LIO-ORG ,TCMU device
size=8.0G features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=1 status=active
  |- 33:0:0:0 sdb 8:16 active ready running
  `- 34:0:0:0 sdd 8:48 active ready running
mpathf (36001405653e40728aad4e9d900e4059f) dm-6 LIO-ORG ,TCMU device
size=1.0G features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=1 status=active
  |- 35:0:0:0 sdf 8:80 active ready running
  `- 33:0:0:2 sdg 8:96 active ready running
You can see in this example that I have three LUNs exported, and each one has two active paths (one to each gluster node). By default, the iscsi node definition (in /var/lib/iscsi/nodes) uses a setting of node.startup=automatic, which means LUN(s) will automagically reappear on the client following a reboot. But from the client's perspective, how do you know which LUN is from which glusterfs volume/file? For this, sg_inq is your friend...
# sg_inq -i /dev/dm-6
VPD INQUIRY: Device Identification page
  Designation descriptor number 1, descriptor length: 49
    designator_type: T10 vendor identification,  code_set: ASCII
    associated with the addressed logical unit
      vendor id: LIO-ORG
      vendor specific: 653e4072-8aad-4e9d-900e-4059f0e19e7e
  Designation descriptor number 2, descriptor length: 20
    designator_type: NAA,  code_set: Binary
    associated with the addressed logical unit
      NAA 6, IEEE Company_id: 0x1405
      Vendor Specific Identifier: 0x653e40728
      Vendor Specific Identifier Extension: 0xaad4e9d900e4059f
      [0x6001405653e40728aad4e9d900e4059f]
  Designation descriptor number 3, descriptor length: 39
    designator_type: vendor specific [0x0],  code_set: ASCII
    associated with the addressed logical unit
      vendor specific: glfs/iscsi-pool@iscsi-3/mytest.img
The highlighted text shows the configuration string you specified when you created the LUN in targetcli. If you run the same command against the devices themselves (/dev/sdf or /dev/sdg) you'd see the connection string from each of the respective gateways. Nice and easy!
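As an aside, if you ever want to stop a LUN from automatically reappearing at boot, the node.startup default mentioned above can be flipped per node record with iscsiadm (shown here against the example target name; adjust the IQN and portal to your own setup):

# iscsiadm -m node -T iqn.2016-04.org.gluster:iscsi-3 -p <your_gluster_node_1> -o update -n node.startup -v manual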

And Finally...

Remember, this is all shiny and new - so if you try it, expect some rough edges! However, I have to say that it looks promising, and during my tests I didn't lose any data...but YMMV :) Happy testing!
by on March 29, 2016

Persistent Volume and Claim in OpenShift and Kubernetes using GlusterFS Volume Plugin

OpenShift is a platform as a service product from Red Hat. The software that runs the service is open-sourced under the name OpenShift Origin, and is available on GitHub.

OpenShift v3 is a layered system designed to expose underlying Docker and Kubernetes concepts as accurately as possible, with a focus on easy composition of applications by a developer. For example, install Ruby, push code, and add MySQL.

Docker is an open platform for developing, shipping, and running applications. With Docker you can separate your applications from your infrastructure and treat your infrastructure like a managed application. Docker does this by combining kernel containerization features with workflows and tooling that help you manage and deploy your applications. Docker containers wrap up a piece of software in a complete filesystem that contains everything it needs to run: code, runtime, system tools, system libraries – anything you can install on a server. Available on GitHub.

Kubernetes is an open-source system for automating deployment, operations, and scaling of containerized applications. It groups containers that make up an application into logical units for easy management and discovery. Kubernetes builds upon a decade and a half of experience of running production workloads at Google, combined with best-of-breed ideas and practices from the community. Available on GitHub.

GlusterFS is a scalable network filesystem. Using common off-the-shelf hardware, you can create large, distributed storage solutions for media streaming, data analysis, and other data- and bandwidth-intensive tasks. GlusterFS is free and open source software. Available on GitHub.

Hopefully you know a little bit about all of the above technologies; now we jump right into our topic, which is Persistent Volumes and Persistent Volume Claims in Kubernetes and OpenShift v3 using GlusterFS volumes. So what is a Persistent Volume? Why do we need it? How does it work using the GlusterFS volume plugin?

In Kubernetes, managing storage is a distinct problem from managing compute. The PersistentVolume subsystem provides an API for users and administrators that abstracts details of how storage is provided from how it is consumed. To do this, Kubernetes introduces two API resources: PersistentVolume and PersistentVolumeClaim.

A PersistentVolume (PV) is a piece of networked storage in the cluster that has been provisioned by an administrator. It is a resource in the cluster just like a node is a cluster resource. PVs are volume plugins like Volumes, but have a lifecycle independent of any individual pod that uses the PV. This API object captures the details of the implementation of the storage, be that NFS, iSCSI, or a cloud-provider-specific storage system.

A PersistentVolumeClaim (PVC) is a request for storage by a user. It is similar to a pod. Pods consume node resources and PVCs consume PV resources. Pods can request specific levels of resources (CPU and Memory). Claims can request specific size and access modes (e.g, can be mounted once read/write or many times read-only).

In simple words: containers in a Kubernetes cluster need storage which stays persistent even if the container goes down or is no longer needed. So the Kubernetes administrator creates the storage (GlusterFS storage, in this case) and creates a PV for that storage. When a developer (Kubernetes cluster user) needs a Persistent Volume in a container, they create a Persistent Volume Claim. The Persistent Volume Claim contains the options which the developer needs for the pods. From the list of Persistent Volumes, the best match is selected for the claim and bound to it. Now the developer can use the claim in the pods.


Prerequisites:

1) You need a Kubernetes or OpenShift cluster. My setup is one master and three nodes.

Note: you can use kubectl in place of oc; oc is the OpenShift client, which is a wrapper around kubectl. (I am not sure about all the differences.)


#oc get nodes
NAME LABELS STATUS AGE
dhcp42-144.example.com kubernetes.io/hostname=dhcp42-144.example.com,name=node3 Ready 15d
dhcp42-235.example.com kubernetes.io/hostname=dhcp42-235.example.com,name=node1 Ready 15d
dhcp43-174.example.com kubernetes.io/hostname=dhcp43-174.example.com,name=node2 Ready 15d
dhcp43-183.example.com kubernetes.io/hostname=dhcp43-183.example.com,name=master Ready,SchedulingDisabled 15d

2) Have a GlusterFS cluster set up; create a GlusterFS volume and start it.

# gluster v status
Status of volume: gluster_vol
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 170.22.42.84:/gluster_brick 49152 0 Y 8771
Brick 170.22.43.77:/gluster_brick 49152 0 Y 7443
NFS Server on localhost 2049 0 Y 7463
NFS Server on 170.22.42.84 2049 0 Y 8792
Task Status of Volume gluster_vol
------------------------------------------------------------------------------
There are no active volume tasks

3) All nodes in the Kubernetes cluster must have the GlusterFS client package installed.
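Package names differ between distributions; on my Fedora/CentOS based nodes the FUSE client bits typically come from the packages below (on Debian/Ubuntu the package is usually called glusterfs-client):

# yum install -y glusterfs glusterfs-fuse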

Now we have the prerequisites o/ …

On the Kubernetes master, the administrator has to write the required YAML files, which will be given as input to the cluster.

There are three files to be written by the administrator and one by the developer.

Service
The Service keeps the endpoint persistent and active.
Endpoint
The Endpoint is the file which points to the GlusterFS cluster location.
PV
The PV is the Persistent Volume, where the administrator defines the gluster volume name, the capacity of the volume and the access mode.
PVC
The PVC is the Persistent Volume Claim, where the developer defines the type of storage needed.

STEP 1: Create a service for the gluster volume.


# cat gluster_pod/gluster-service.yaml
apiVersion: "v1"
kind: "Service"
metadata:
  name: "glusterfs-cluster"
spec:
  ports:
  - port: 1
# oc create -f gluster_pod/gluster-service.yaml
service "glusterfs-cluster" created

Verify:

# oc get service
NAME CLUSTER_IP EXTERNAL_IP PORT(S) SELECTOR AGE
glusterfs-cluster 172.30.251.13 1/TCP 9m
kubernetes 172.30.0.1 443/TCP,53/UDP,53/TCP 16d

STEP 2: Create an Endpoint for the gluster service

# cat gluster_pod/gluster-endpoints.yaml
apiVersion: v1
kind: Endpoints
metadata:
  name: glusterfs-cluster
subsets:
  - addresses:
      - ip: 170.22.43.77
    ports:
      - port: 1

The IP here is the GlusterFS cluster IP.


# oc create -f gluster_pod/gluster-endpoints.yaml
endpoints "glusterfs-cluster" created
# oc get endpoints
NAME ENDPOINTS AGE
glusterfs-cluster 170.22.43.77:1 3m
kubernetes 170.22.43.183:8053,170.22.43.183:8443,170.22.43.183:8053 16d

STEP 3: Create a PV for the gluster volume.

# cat gluster_pod/gluster-pv.yaml
apiVersion: "v1"
kind: "PersistentVolume"
metadata:
  name: "gluster-default-volume"
spec:
  capacity:
    storage: "8Gi"
  accessModes:
    - "ReadWriteMany"
  glusterfs:
    endpoints: "glusterfs-cluster"
    path: "gluster_vol"
    readOnly: false
  persistentVolumeReclaimPolicy: "Recycle"

Note: path here is the gluster volume name. The access mode specifies the way to access the volume. Capacity is the storage size of the GlusterFS volume.


# oc create -f gluster_pod/gluster-pv.yaml
persistentvolume "gluster-default-volume" created
# oc get pv
NAME LABELS CAPACITY ACCESSMODES STATUS CLAIM REASON AGE
gluster-default-volume 8Gi RWX Available 36s

STEP 4: Create a PVC for the gluster PV.


# cat gluster_pod/gluster-pvc.yaml
apiVersion: "v1"
kind: "PersistentVolumeClaim"
metadata:
  name: "glusterfs-claim"
spec:
  accessModes:
    - "ReadWriteMany"
  resources:
    requests:
      storage: "8Gi"

Note: the developer requests 8 GiB of storage with access mode RWX.


# oc create -f gluster_pod/gluster-pvc.yaml
persistentvolumeclaim "glusterfs-claim" created
# oc get pvc
NAME LABELS STATUS VOLUME CAPACITY ACCESSMODES AGE
glusterfs-claim Bound gluster-default-volume 8Gi RWX 14s

Here the PVC is bound as soon as it is created, because it found a PV that satisfies the requirement. Now let's go and check the PV status.


# oc get pv
NAME LABELS CAPACITY ACCESSMODES STATUS CLAIM REASON AGE
gluster-default-volume 8Gi RWX Bound default/glusterfs-claim 5m

See, now the PV has been bound to “default/glusterfs-claim”. At this point the developer has the Persistent Volume Claim bound successfully, and can use the PV claim as shown below.

STEP 5: Use the persistent Volume Claim in a Pod defined by the Developer.


# cat gluster_pod/gluster_pod.yaml
kind: Pod
apiVersion: v1
metadata:
  name: mypod
spec:
  containers:
    - name: mygluster
      image: ashiq/gluster-client
      command: ["/usr/sbin/init"]
      volumeMounts:
        - mountPath: "/home"
          name: gluster-default-volume
  volumes:
    - name: gluster-default-volume
      persistentVolumeClaim:
        claimName: glusterfs-claim

The above pod definition will pull the ashiq/gluster-client image (a private image) and start the init script. The gluster volume will be mounted on the host machine by the GlusterFS volume plugin available in Kubernetes and then bind mounted to the container’s /home. This is why all the Kubernetes cluster nodes must have the glusterfs client packages installed.

Let's try running it.


# oc create -f gluster_pod/fedora_pod.yaml
pod "mypod" created
# oc get pods
NAME READY STATUS RESTARTS AGE
mypod 1/1 Running 0 1m

Wow, it's running… let's go and check where it is running.

# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ec57d62e3837 ashiq/gluster-client "/usr/sbin/init" 4 minutes ago Up 4 minutes k8s_myfedora.dc1f7d7a_mypod_default_5d301443-ec20-11e5-9076-5254002e937b_ed2eb8e5
1439dd72fb1d openshift3/ose-pod:v3.1.1.6 "/pod" 4 minutes ago Up 4 minutes k8s_POD.e071dbf6_mypod_default_5d301443-ec20-11e5-9076-5254002e937b_4d6a7afb

Found the Pod running successfully on one of the Kubernetes node.

On the host:


# df -h | grep gluster_vol
170.22.43.77:gluster_vol 35G 4.0G 31G 12% /var/lib/origin/openshift.local.volumes/pods/5d301443-ec20-11e5-9076-5254002e937b/volumes/kubernetes.io~glusterfs/gluster-default-volume

I can see the gluster volume being mounted on the host o/. Let's check inside the container. Note the random-looking string is the container ID from the docker ps command.


# docker exec -it ec57d62e3837 /bin/bash
[root@mypod /]# df -h | grep gluster_vol
170.22.43.77:gluster_vol 35G 4.0G 31G 12% /home

Yippee, the GlusterFS volume has been mounted inside the container on /home as mentioned in the pod definition. Let's try writing something to it:


[root@mypod /]# mkdir /home/ashiq
[root@mypod /]# ls /home/
ashiq

Since the access mode is RWX, I am able to write to the mount point.
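As a final check, the new directory should also be visible on the underlying brick of the gluster volume (run this on one of the brick hosts from the earlier `gluster v status` output; the brick path below comes from that output):

# ls /gluster_brick/
ashiq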

That’s all Folks.

Author: Mohamed Ashiq

by on February 25, 2016

Gluster at FAST

We hosted a small meetup/birds of a feather session at USENIX’s FAST conference. FAST is a conference that focuses on File And Storage Technologies in Santa Clara, California.

Vijay Bellur, Gluster Project Lead, did a short talk on Gluster.Next, our ongoing architectural evolution in Gluster to improve scaling and enable new use cases like storage as a service, storage for containers and hyperconvergence.

  • what is Gluster.Next?
  • how are we building Gluster.Next –
    • DHTv2, NSR, Glusterd 2.0, Heketi, Brick multiplexing, Quality of Service
  • why are we building Gluster.Next
  • when is this planned for release
    • 3.8 slated for May/June, 4.0 for December

Integrations with other projects like OpenShift, OpenStack and oVirt were also highlighted.

Slides are included: Gluster.Next Feb 2016

 

by on February 22, 2016

Gluster Takes Its Show on the Road

The last week of January and the first week of February were packed with events and meetings. This blog contains my observations, opinions, and ideas in the hope that they will be useful or at least interesting for some.

CentOS Dojo in Brussels

The day before FOSDEM starts, the CentOS project organizes a community meetup in the form of their Dojos, at an IBM office in Brussels. Because Gluster is participating in the CentOS Storage SIG (special interest group), I was asked to present something. My talk had good participation, with questions about different aspects of the Storage SIG's goals. Many people are interested in the Storage SIG, mainly other SIGs that would like to consume the packages being produced. There is also increasing interest in getting Gluster running on upcoming hardware architectures (AArch64 and ppc64le). The CentOS team is working on getting that hardware into the build infrastructure and testing environment, and the Gluster packages will be among the first SIG projects to use it.