all posts tagged cloud


by on August 24, 2016

[Coming Soon] Dynamic Provisioning of GlusterFS volumes in Kubernetes/Openshift!!

In this post I am talking about the dynamic provisioning capability of the ‘glusterfs’ plugin in Kubernetes/OpenShift. I have submitted a Pull Request to Kubernetes to add this functionality for GlusterFS. At present there are no network storage provisioners in Kubernetes, even though provisioners exist for the cloud providers. The idea here is to make the glusterfs plugin capable of provisioning volumes on demand from Kubernetes/OpenShift. Cool, isn't it? Indeed, this is a nice feature to have. In short: an OpenShift user requests some space, for example 20G, and the glusterfs plugin takes this request, creates a 20G volume, and binds it to the claim. The plugin can use any REST service, but the example patch is based on ‘heketi’.

Here is the workflow. Start your kube-controller-manager with the options shown below; the important ones are --enable-network-storage-provisioner, --storage-config and --net-provider:

 ...kube-controller-manager --v=3 \
 --service-account-private-key-file=/tmp/kube-serviceaccount.key \
 --root-ca-file=/var/run/kubernetes/apiserver.crt \
 --enable-hostpath-provisioner=false \
 --enable-network-storage-provisioner=true \
 --storage-config=/tmp \
 --net-provider=glusterfs \
 --pvclaimbinder-sync-period=15s \
 --cloud-provider= \
 --master=127.0.0.1:8080

 
Create a file called `gluster.json` in the `/tmp` directory (the directory passed to `--storage-config`). The important fields in this config file are ‘endpoint’ and ‘resturl’. The endpoint has to be defined and must match your setup (we create it below). The `resturl` points to the REST service which takes the request and creates a gluster volume in the backend; as mentioned earlier, I am using `heketi` for this.

 [hchiramm@dhcp35-111 tmp]$ cat gluster.json
 {
   "endpoint": "glusterfs-cluster",
   "resturl": "http://127.0.0.1:8081",
   "restauthenabled": false,
   "restuser": "",
   "restuserkey": ""
 }
 [hchiramm@dhcp35-111 tmp]$
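Before going further, you may want to check that the heketi REST service configured in `resturl` is reachable. A minimal sketch, assuming heketi is listening on the address above (`/hello` is heketi's health endpoint):

 $ curl http://127.0.0.1:8081/hello   # a short greeting in the response means heketi is up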
 

We have to define an ENDPOINT and a SERVICE. Below are the example configuration files. ENDPOINT: the “ip” fields have to be filled with the IPs of your gluster trusted pool.


[hchiramm@dhcp35-111 ]$ cat glusterfs-endpoint.json
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "glusterfs-cluster"
  },
  "subsets": [
    {
      "addresses": [
        {
          "ip": "10.36.4.112"
        }
      ],
      "ports": [
        {
          "port": 1
        }
      ]
    },
    {
      "addresses": [
        {
          "ip": "10.36.4.112"
        }
      ],
      "ports": [
        {
          "port": 1
        }
      ]
    }
  ]
}

SERVICE: Please note that the Service name matches the ENDPOINT name.


[hchiramm@dhcp35-111 ]$ cat gluster-service.json
{
  "kind": "Service",
  "apiVersion": "v1",
  "metadata": {
    "name": "glusterfs-cluster"
  },
  "spec": {
    "ports": [
      { "port": 1 }
    ]
  }
}
[hchiramm@dhcp35-111 ]$

Finally, we have a Persistent Volume Claim file as shown below. NOTE: The requested size of the volume is 20Gi:


[hchiramm@dhcp35-111 ]$ cat gluster-pvc.json
{
  "kind": "PersistentVolumeClaim",
  "apiVersion": "v1",
  "metadata": {
    "name": "glusterc",
    "annotations": {
      "volume.alpha.kubernetes.io/storage-class": "glusterfs"
    }
  },
  "spec": {
    "accessModes": [
      "ReadOnlyMany"
    ],
    "resources": {
      "requests": {
        "storage": "20Gi"
      }
    }
  }
}
[hchiramm@dhcp35-111 ]$

Let's start defining the endpoint, service and PVC.


[hchiramm@dhcp35-111 ]$ ./kubectl create -f glusterfs-endpoint.json
endpoints "glusterfs-cluster" created
[hchiramm@dhcp35-111 ]$ ./kubectl create -f gluster-service.json
service "glusterfs-cluster" created
[hchiramm@dhcp35-111 ]$ ./kubectl get ep,service
NAME ENDPOINTS AGE
ep/glusterfs-cluster 10.36.6.105:1 14s
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
svc/glusterfs-cluster 10.0.0.10 1/TCP 9s
svc/kubernetes 10.0.0.1 443/TCP 13m
[hchiramm@dhcp35-111 ]$ ./kubectl get pv,pvc
[hchiramm@dhcp35-111 ]$

Now, let's request a claim!

[hchiramm@dhcp35-111 ]$ ./kubectl create -f gluster-pvc.json
persistentvolumeclaim "glusterc" created
[hchiramm@dhcp35-111 ]$ ./kubectl get pv,pvc
NAME CAPACITY ACCESSMODES STATUS CLAIM REASON AGE
pv/pvc-39ebcdc5-442b-11e6-8dfa-54ee7551fd0c  20Gi ROX  Bound  default/glusterc 2s
NAME STATUS VOLUME CAPACITY ACCESSMODES AGE
pvc/glusterc Bound pvc-39ebcdc5-442b-11e6-8dfa-54ee7551fd0c 0 3s
[hchiramm@dhcp35-111 ]$

Awesome! Based on the request, it created a PV and bound it to the PVC!


[hchiramm@dhcp35-111 ]$ ./kubectl describe pv pvc-39ebcdc5-442b-11e6-8dfa-54ee7551fd0c
Name: pvc-39ebcdc5-442b-11e6-8dfa-54ee7551fd0c
Labels:
Status: Bound
Claim: default/glusterc
Reclaim Policy: Delete
Access Modes: ROX
Capacity: 20Gi
Message:
Source:
Type: Glusterfs (a Glusterfs mount on the host that shares a pod's lifetime)
EndpointsName: glusterfs-cluster
 Path: vol_038b56756f4e3ab4b07a87494097941c
ReadOnly: false
No events.
[hchiramm@dhcp35-111 ]$
 

Verify the volume exists in the backend:

 [root@ ~]# heketi-cli volume list |grep 038b56756f4e3ab4b07a87494097941c
 038b56756f4e3ab4b07a87494097941c
 [root@ ~]#

Let's delete the PV claim:


[hchiramm@dhcp35-111 ]$ ./kubectl delete pvc glusterc
persistentvolumeclaim "glusterc" deleted
[hchiramm@dhcp35-111 ]$ ./kubectl get pv,pvc
[hchiramm@dhcp35-111 ]$

It got deleted! Verify from the backend:


 [root@ ~]# heketi-cli volume list |grep 038b56756f4e3ab4b07a87494097941c
 [root@ ~]# 

We can use the volume in application pods by referring to the claim name. I hope you find this a nice feature to have!
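For illustration, here is a minimal sketch of such a pod definition. The pod name, container name and image below are hypothetical; the relevant part is the persistentVolumeClaim entry referring to the claim ‘glusterc’ created above:

{
  "kind": "Pod",
  "apiVersion": "v1",
  "metadata": {
    "name": "gluster-app-pod"
  },
  "spec": {
    "containers": [
      {
        "name": "app",
        "image": "nginx",
        "volumeMounts": [
          {
            "name": "glustervol",
            "mountPath": "/usr/share/nginx/html"
          }
        ]
      }
    ],
    "volumes": [
      {
        "name": "glustervol",
        "persistentVolumeClaim": {
          "claimName": "glusterc"
        }
      }
    ]
  }
}

Creating this with `kubectl create -f` should schedule a pod with the dynamically provisioned gluster volume mounted at the given mount path.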

Please let me know if you have any comments/suggestions.

Also, the patch (https://github.com/kubernetes/kubernetes/pull/30888) is undergoing review upstream, as mentioned earlier, and hopefully it will make it into a Kubernetes release soon. I will provide an update here as soon as it's available upstream.

by on March 29, 2016

Persistent Volume and Claim in OpenShift and Kubernetes using GlusterFS Volume Plugin

OpenShift is a platform as a service product from Red Hat. The software that runs the service is open-sourced under the name OpenShift Origin, and is available on GitHub.

OpenShift v3 is a layered system designed to expose underlying Docker and Kubernetes concepts as accurately as possible, with a focus on easy composition of applications by a developer. For example, install Ruby, push code, and add MySQL.

Docker is an open platform for developing, shipping, and running applications. With Docker you can separate your applications from your infrastructure and treat your infrastructure like a managed application. Docker does this by combining kernel containerization features with workflows and tooling that help you manage and deploy your applications. Docker containers wrap up a piece of software in a complete filesystem that contains everything it needs to run: code, runtime, system tools, system libraries – anything you can install on a server. Available on GitHub.

Kubernetes is an open-source system for automating deployment, operations, and scaling of containerized applications. It groups containers that make up an application into logical units for easy management and discovery. Kubernetes builds upon a decade and a half of experience of running production workloads at Google, combined with best-of-breed ideas and practices from the community. Available on GitHub.

GlusterFS is a scalable network filesystem. Using common off-the-shelf hardware, you can create large, distributed storage solutions for media streaming, data analysis, and other data- and bandwidth-intensive tasks. GlusterFS is free and open source software. Available on GitHub.

Hopefully you know a little bit about all of the above technologies; now we jump right into our topic, which is Persistent Volumes and Persistent Volume Claims in Kubernetes and OpenShift v3 using GlusterFS volumes. So what is a Persistent Volume? Why do we need it? How does it work with the GlusterFS volume plugin?

In Kubernetes, managing storage is a distinct problem from managing compute. The PersistentVolume subsystem provides an API for users and administrators that abstracts the details of how storage is provided from how it is consumed. To do this, Kubernetes introduces two API resources: PersistentVolume and PersistentVolumeClaim.

A PersistentVolume (PV) is a piece of networked storage in the cluster that has been provisioned by an administrator. It is a resource in the cluster just like a node is a cluster resource. PVs are volume plugins like Volumes, but have a lifecycle independent of any individual pod that uses the PV. This API object captures the details of the implementation of the storage, be that NFS, iSCSI, or a cloud-provider-specific storage system.

A PersistentVolumeClaim (PVC) is a request for storage by a user. It is similar to a pod. Pods consume node resources and PVCs consume PV resources. Pods can request specific levels of resources (CPU and memory). Claims can request a specific size and access modes (e.g., can be mounted once read/write or many times read-only).

In simple words, containers in a Kubernetes cluster need storage that stays persistent even if the container goes down or is no longer needed. So the Kubernetes administrator creates the storage (GlusterFS storage, in this case) and creates a PV for that storage. When a developer (Kubernetes cluster user) needs a persistent volume in a container, they create a Persistent Volume Claim. The Persistent Volume Claim contains the options the developer needs for the pods. From the list of Persistent Volumes, the best match is selected and bound to the claim. Now the developer can use the claim in the pods.


Prerequisites:

1) Have a Kubernetes or OpenShift cluster. My setup is one master and three nodes.

Note: you can use kubectl in place of oc; oc is the OpenShift client, which wraps kubectl and adds OpenShift-specific commands.


#oc get nodes
NAME LABELS STATUS AGE
dhcp42-144.example.com kubernetes.io/hostname=dhcp42-144.example.com,name=node3 Ready 15d
dhcp42-235.example.com kubernetes.io/hostname=dhcp42-235.example.com,name=node1 Ready 15d
dhcp43-174.example.com kubernetes.io/hostname=dhcp43-174.example.com,name=node2 Ready 15d
dhcp43-183.example.com kubernetes.io/hostname=dhcp43-183.example.com,name=master Ready,SchedulingDisabled 15d

2) Have a GlusterFS cluster set up; create a GlusterFS volume and start it.

# gluster v status
Status of volume: gluster_vol
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 170.22.42.84:/gluster_brick 49152 0 Y 8771
Brick 170.22.43.77:/gluster_brick 49152 0 Y 7443
NFS Server on localhost 2049 0 Y 7463
NFS Server on 170.22.42.84 2049 0 Y 8792
Task Status of Volume gluster_vol
------------------------------------------------------------------------------
There are no active volume tasks

3) All nodes in the Kubernetes cluster must have the GlusterFS client package installed.
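For example, on the nodes (a sketch; package names can vary slightly by distribution and version):

# yum install -y glusterfs glusterfs-fuse    # RHEL / CentOS / Fedora
# apt-get install -y glusterfs-client        # Debian / Ubuntu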

Now we have the prerequisites o/ …

On the Kube master, the administrator has to write the required YAML files, which will be given as input to the kube cluster.

There are three files to be written by the administrator and one by the developer.

Service
The Service keeps the endpoint persistent and reachable.
Endpoint
The Endpoint is the file which points to the GlusterFS cluster location.
PV
The PV (Persistent Volume) is where the administrator defines the gluster volume name, the capacity of the volume, and the access mode.
PVC
The PVC (Persistent Volume Claim) is where the developer defines the type and size of storage needed.

STEP 1: Create a service for the gluster volume.


# cat gluster_pod/gluster-service.yaml
apiVersion: "v1"
kind: "Service"
metadata:
  name: "glusterfs-cluster"
spec:
  ports:
  - port: 1
# oc create -f gluster_pod/gluster-service.yaml
service "glusterfs-cluster" created

Verify:

# oc get service
NAME CLUSTER_IP EXTERNAL_IP PORT(S) SELECTOR AGE
glusterfs-cluster 172.30.251.13 1/TCP 9m
kubernetes 172.30.0.1 443/TCP,53/UDP,53/TCP 16d

STEP 2: Create an Endpoint for the gluster service

# cat gluster_pod/gluster-endpoints.yaml
apiVersion: v1
kind: Endpoints
metadata:
  name: glusterfs-cluster
subsets:
- addresses:
  - ip: 170.22.43.77
  ports:
  - port: 1

The ip here is the GlusterFS cluster IP (a node in the trusted pool).


# oc create -f gluster_pod/gluster-endpoints.yaml
endpoints "glusterfs-cluster" created
# oc get endpoints
NAME ENDPOINTS AGE
glusterfs-cluster 170.22.43.77:1 3m
kubernetes 170.22.43.183:8053,170.22.43.183:8443,170.22.43.183:8053 16d

STEP 3: Create a PV for the gluster volume.

# cat gluster_pod/gluster-pv.yaml
apiVersion: "v1"
kind: "PersistentVolume"
metadata:
  name: "gluster-default-volume"
spec:
  capacity:
    storage: "8Gi"
  accessModes:
    - "ReadWriteMany"
  glusterfs:
    endpoints: "glusterfs-cluster"
    path: "gluster_vol"
    readOnly: false
  persistentVolumeReclaimPolicy: "Recycle"

Note: path here is the gluster volume name, accessModes specifies how the volume can be accessed, and capacity is the storage size of the GlusterFS volume.


# oc create -f gluster_pod/gluster-pv.yaml
persistentvolume "gluster-default-volume" created
# oc get pv
NAME LABELS CAPACITY ACCESSMODES STATUS CLAIM REASON AGE
gluster-default-volume 8Gi RWX Available 36s

STEP 4: Create a PVC for the gluster PV.


# cat gluster_pod/gluster-pvc.yaml
apiVersion: "v1"
kind: "PersistentVolumeClaim"
metadata:
  name: "glusterfs-claim"
spec:
  accessModes:
    - "ReadWriteMany"
  resources:
    requests:
      storage: "8Gi"

Note: the developer requests 8 Gi of storage with access mode RWX (ReadWriteMany).


# oc create -f gluster_pod/gluster-pvc.yaml
persistentvolumeclaim "glusterfs-claim" created
# oc get pvc
NAME LABELS STATUS VOLUME CAPACITY ACCESSMODES AGE
glusterfs-claim Bound gluster-default-volume 8Gi RWX 14s

Here the PVC is bound as soon as it is created, because it found a PV that satisfies the requirement. Now let's go and check the PV status:


# oc get pv
NAME LABELS CAPACITY ACCESSMODES STATUS CLAIM REASON AGE
gluster-default-volume 8Gi RWX Bound default/glusterfs-claim 5m

Now the PV has been bound to “default/glusterfs-claim”. With the Persistent Volume Claim bound successfully, the developer can use the claim in a pod, as shown below.

STEP 5: Use the persistent Volume Claim in a Pod defined by the Developer.


# cat gluster_pod/gluster_pod.yaml
kind: Pod
apiVersion: v1
metadata:
  name: mypod
spec:
  containers:
    - name: mygluster
      image: ashiq/gluster-client
      command: ["/usr/sbin/init"]
      volumeMounts:
        - mountPath: "/home"
          name: gluster-default-volume
  volumes:
    - name: gluster-default-volume
      persistentVolumeClaim:
        claimName: glusterfs-claim

The above pod definition will pull the ashiq/gluster-client image (a private image) and start its init script. The gluster volume will be mounted on the host machine by the GlusterFS volume plugin available in Kubernetes and then bind-mounted to the container's /home. This is why all the Kubernetes cluster nodes must have the glusterfs-client packages installed.

Let's try running it.


# oc create -f gluster_pod/gluster_pod.yaml
pod "mypod" created
# oc get pods
NAME READY STATUS RESTARTS AGE
mypod 1/1 Running 0 1m

Wow, it's running… let's go and check where it is running.

# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ec57d62e3837 ashiq/gluster-client "/usr/sbin/init" 4 minutes ago Up 4 minutes k8s_myfedora.dc1f7d7a_mypod_default_5d301443-ec20-11e5-9076-5254002e937b_ed2eb8e5
1439dd72fb1d openshift3/ose-pod:v3.1.1.6 "/pod" 4 minutes ago Up 4 minutes k8s_POD.e071dbf6_mypod_default_5d301443-ec20-11e5-9076-5254002e937b_4d6a7afb

Found the pod running successfully on one of the Kubernetes nodes.

On the host:


# df -h | grep gluster_vol
170.22.43.77:gluster_vol 35G 4.0G 31G 12% /var/lib/origin/openshift.local.volumes/pods/5d301443-ec20-11e5-9076-5254002e937b/volumes/kubernetes.io~glusterfs/gluster-default-volume

I can see the gluster volume being mounted on the host o/. Let's check inside the container. Note that ec57d62e3837 is the container ID from the docker ps command.


# docker exec -it ec57d62e3837 /bin/bash
[root@mypod /]# df -h | grep gluster_vol
170.22.43.77:gluster_vol 35G 4.0G 31G 12% /home

Yippee, the GlusterFS volume has been mounted inside the container on /home as specified in the pod definition. Let's try writing something to it:


[root@mypod /]# mkdir /home/ashiq
[root@mypod /]# ls /home/
ashiq

Since the access mode is RWX, I am able to write to the mount point.
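If you want to double-check from the storage side as well, a quick sketch (assuming any machine with the gluster client and a spare mount point, here /mnt/check):

# mkdir -p /mnt/check
# mount -t glusterfs 170.22.43.77:/gluster_vol /mnt/check
# ls /mnt/check    # the 'ashiq' directory created from the pod should appear here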

That’s all Folks.

Author: Mohamed Ashiq

by on April 18, 2014

OpenStack Icehouse brings new features for the enterprise

OpenStack Icehouse

Deploying an open source enterprise cloud just got a little bit easier yesterday with the release of the newest version of the OpenStack platform: Icehouse. To quote an email from OpenStack release manager Thierry Carrez announcing the release, “During this cycle we added a new integrated component (Trove), completed more than 350 feature blueprints, and fixed almost 3000 reported bugs in integrated projects alone!”

read more

by on April 10, 2014

How to govern a project on the scale of OpenStack

Managing collaborative open source projects

How an open source project is governed can matter just as much as the features it supports, the speed at which it runs, or the code that underlies it. Some open source projects have what we might call a “benevolent dictator for life.” Others are outgrowths of corporate projects that, while open, still have their goals and code led by the company that manages it. And of course, there are thousands of projects out there that are written and managed by a single person or a small group of people for whom governance is less of an issue than ensuring project sustainability.

read more

by on March 10, 2014

OpenNebula: Native GlusterFS Image Access for KVM Drivers

If you saw our Gluster Spotlight (“Integration Nation”) last week, you’ll recall that Javi and Jaime from the OpenNebula project were discussing their recent advances with GlusterFS and libgfapi access. Here’s a post where they go into some detail about it:

The good news is that for some time now qemu and libvirt have had native support for GlusterFS. This makes it possible for VMs running from images stored in Gluster to talk directly with its servers, making the I/O much faster.

In this case, they use GFAPI for direct virtual machine access in addition to the FUSE-based GlusterFS client mount for image registration as an example of using the best tool for a particular job. As they explain, OpenNebula administrators expect a mounted, POSIX filesystem for many operations, so the FUSE-based mount fits best with their workflow while GFAPI works when lower latency and better performance are called for.
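As a rough sketch of what the native path looks like from the qemu side (GLUSTER_HOST and VOLUME are placeholders; exact options depend on your setup), an image on a Gluster volume can be addressed directly with a gluster:// URI instead of a FUSE-mounted path:

$ qemu-img create -f qcow2 gluster://GLUSTER_HOST/VOLUME/images/vm1.qcow2 10G
$ qemu-system-x86_64 -m 1024 -drive file=gluster://GLUSTER_HOST/VOLUME/images/vm1.qcow2,if=virtio,format=qcow2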

Read the full post here.

The GFAPI integration is slated for the 4.6 release of OpenNebula. To get an early look at the code, check out their Git repository. Documentation is available here.

by on March 6, 2014

Deploying Pydio in AWS with GlusterFS

(This was originally published at the pyd.io web site)

Introduction

Deploying Pydio in a highly-demanding environment (lots of users, tons of documents) to achieve a dropbox-like server at scale requires a solid and elastic architecture.

As a distributed file-system and software-defined storage, GlusterFS is a low-cost way of providing a robust storage architecture on standard hardware. For its part, having kept the FileSystem driver at its core since the beginnings of the project, Pydio is a perfect match to be deployed on top of Gluster, providing user-friendly features and enterprise-grade security.

Architecture

The principle here is to provide High Availability and Scalability by combining GlusterFS (for the storage part) and Pydio (for the access part) through a load-balanced cluster of nodes.

We choose here to install Pydio (= compute) and the Gluster bricks (= storage) on the same instances, but any configuration can be imagined: N dedicated nodes for storage with a subset of them running Pydio, or none of them running Pydio and K dedicated compute nodes, etc.

Also, we choose to set up two Gluster volumes (each of them assembling two bricks) for easier maintenance: one will contain the shared Pydio configurations, allowing the startup of a new Pydio node without hassle, and one will contain the actual user data (files). On EC2, we will use EBS volumes as primary bricks for the data gluster volume, and the instances' available disk space for the configs gluster bricks. Finally, a DB must be set up to receive all the auxiliary Pydio data (namely users and ACLs, event logs, etc.). This DB can run on another instance, or alternatively be installed on one of the nodes. It should be replicated and backed up for a better failover scenario.

The following schema shows an overview of the targeted architecture.

ArchitectureSchema.002-Gluster

Launch Instances

Create two (or four) EC2 instances, attaching to each an EBS volume of X GB, depending on the size you require. We chose Ubuntu 12.04 as the OS. Make sure to use a fairly open security group; we'll restrict permissions later. Instances will start with both PRIVATE and PUBLIC IPs/DNS names. Update the apt package lists with sudo apt-get update

GlusterFS Setup

Prepare Gluster bricks

We’ll use one for the actual data, and one for Pydio configurations data

$ sudo apt-get install glusterfs-server xfsprogs

$ sudo mkfs.xfs /dev/xvdb

$ sudo mkdir /mnt/ebs

$ sudo mount /dev/xvdb /mnt/ebs

And add the line to /etc/fstab to automount at startup

/dev/xvdb       /mnt/ebs        xfs defaults    0 0

Let’s also create a dedicated folder for the configs volume, on both nodes

$ sudo mkdir /var/confbrick

Create and start the volumes

Make the nodes recognize each other

On node 1

$ sudo gluster peer probe PRIVATE2

On node 2

$ sudo gluster peer probe PRIVATE1

$ sudo gluster volume create pydio-data replica 2 transport tcp PRIVATE1:/mnt/ebs PRIVATE2:/mnt/ebs

$ sudo gluster volume create pydio-config replica 2 transport tcp PRIVATE1:/var/confbrick PRIVATE2:/var/confbrick

$ sudo gluster volume start pydio-data

$ sudo gluster volume start pydio-config

Mount the volumes on both nodes

If not already installed,

$ sudo apt-get install glusterfs-client

Create folders /mnt/pydio-config and /mnt/pydio-data

Edit /etc/fstab again and add the following lines on each node

PRIVATE1:/pydio-data /mnt/pydio-data glusterfs defaults,_netdev 0 0

PRIVATE1:/pydio-config /mnt/pydio-config glusterfs defaults,_netdev 0 0

Then remount everything: $ sudo mount -a

Verify everything is mounted: $ df -h

ubuntu@ip-10-62-94-160:/mnt/ebs$ df -h
Filesystem                                                Size  Used Avail Use% Mounted on
/dev/xvda1                                                7.9G  939M  6.6G  13% /
udev                                                      1.9G   12K  1.9G   1% /dev
tmpfs                                                     751M  168K  750M   1% /run
none                                                      5.0M     0  5.0M   0% /run/lock
none                                                      1.9G     0  1.9G   0% /run/shm
/dev/xvdb                                                 10G   33M   10G   1% /mnt/ebs
PRIVATE1:/pydio-data                                      10G   33M   10G   1% /mnt/pydio-data
PRIVATE1:/pydio-config                                    7.9G  939M  6.6G  13% /mnt/pydio-config

Make sure the webserver will be able to use these two folders

$ sudo chown -R www-data: /mnt/pydio-data

$ sudo chown -R www-data: /mnt/pydio-config

Now touch a file on one node and verify it’s on the other side.
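For example (a trivial check; run the first command on node 1 and the second on node 2):

$ sudo -u www-data touch /mnt/pydio-data/gluster-test    # on node 1
$ ls /mnt/pydio-data/                                    # on node 2: gluster-test should appear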

Set up DB

For example on Node 1

sudo apt-get install mysql-server

Set up a root password, and allow MySQL to listen for external connections: comment out the following line in /etc/mysql/my.cnf

#bind-address           = 127.0.0.1

We will use the EC2 PUBLIC address of this node in the Pydio configuration.

Create a database
mysql> create database pydio;
mysql> grant all privileges on pydio.* to 'pydio'@'%' with grant option;

(Make sure to add a password, or update the password at the end; otherwise this creates the user with an empty password.)
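For example, a sketch of creating the user with a password in one go (YOUR_PASSWORD is a placeholder; this is the MySQL 5.5 syntax shipped with Ubuntu 12.04):

mysql> grant all privileges on pydio.* to 'pydio'@'%' identified by 'YOUR_PASSWORD' with grant option;
mysql> flush privileges;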

Deploy pydio

First Node

Get the script from https://raw.github.com/ajaxplorer/ajaxplorer-core/master/dist/scripts/glusterfs/pydio-gluster.sh and run it as root.

$ wget https://raw.github.com/ajaxplorer/ajaxplorer-core/master/dist/scripts/glusterfs/pydio-gluster.sh
$ chmod u+x pydio-gluster.sh
$ ./pydio-gluster.sh

Once finished, start or restart apache
$ apachectl start
Go to the public IP of the node in a web browser (http://PUBLIC_IP1/pydio/) and follow the standard installation process. Set up the admin login and global options, and for the Configurations Storage, choose Database > MySQL, using the public IP of the DB node as the server host.

Installation Process

Then save and connect as admin, switch to the « Settings » workspace, and customize the configuration as you like. You can activate some additional plugins, customize the logo and application title, etc. The interesting part of doing this now is that these changes will automatically propagate to the other nodes you bring up.

Settings Panel Sample

 

Second Node

As they will share their base configuration through the gluster pydio-config volume, the next nodes will directly inherit the configs from the first node. So to fire up a new node, all you have to do is run the script part:

$ wget https://raw.github.com/ajaxplorer/ajaxplorer-core/master/dist/scripts/glusterfs/pydio-gluster.sh
$ chmod u+x pydio-gluster.sh
$ ./pydio-gluster.sh

Then verify that Pydio is up and running and that you can log in with the same credentials at http://PUBLIC_IP2/pydio/

Load Balancer

AWS LoadBalancer

We could use a custom compute node equipped with HAProxy or similar software, but as our tutorial is running on AWS, we will use the service available for that: the LoadBalancer. In your AWS console, create a LoadBalancer, forwarding port 80 to the instances' port 80.

Creating a LoadBalancer

To configure how the health check will be performed (how the LB checks that instances are alive), make sure to change the name of the checked file to check.txt. This is important because, thanks to our install scripts, the nodes' Apache servers are configured to skip logging calls to this file, to avoid filling the logs with useless data (the check happens every 5s).

NOTE: If you have an SSL certificate, which is definitely a good security practice, you will install it on this LoadBalancer and redirect port 443 to 80: internal communications do not need to be encrypted.

Session Stickiness

Once the LoadBalancer is created, edit the « Stickiness » parameter of the redirection rules and choose « Enable Application Generated Cookie Stickiness », using « AjaXplorer » as the cookie name. This is important: although clients will be randomly redirected to instances on their first connection, once a session is established, it will always stay on a given instance.

Session Stickiness

 

NOTE: Session stickiness saves us from having to set up a session-sharing mechanism between nodes, but that could be done, for example, by adding a memcached server.

Outside world address

Now that our various nodes will be accessed through a proxy and not through their « natural » public IP, we need to inform Pydio of that. This is necessary to generate correct sharing URLs and to send emails pointing to the correct URL. Without that, Pydio would try to auto-detect the IP and would probably end up displaying the PRIVATE IP of the current working node.

Log in to Pydio as admin and go to Settings > Global Configurations > Pydio Main Options. Here, update the Server URL and Download URL fields with the real addresses, and save. Go to a file workspace, try to share a file or a folder, and verify that the link is correct and working.

Pydio Main Options, updated with Load Balancer address

Conclusion: adding new nodes on-demand

Well, that’s pretty much. We could refine this architecture on many points, but basically you’re good to go.

So what do you do to add a new node? Basically you'll have to:

[if you need more storage]

 

  1. Fire up a new instance with the Ubuntu OS
  2. Configure Gluster to add its storage as a new brick to the volume (sketch below)
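For the Gluster side of the storage case, a minimal sketch (NEW1 and NEW2 are placeholder private addresses; since our volumes were created with replica 2, bricks have to be added in multiples of two):

$ sudo gluster peer probe NEW1
$ sudo gluster peer probe NEW2
$ sudo gluster volume add-brick pydio-data NEW1:/mnt/ebs NEW2:/mnt/ebs
$ sudo gluster volume rebalance pydio-data start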

[if you need more compute]

  1. Fire up a new instance with the Ubuntu OS
  2. Configure the gluster client to mount the volumes,
  3. Run the Pydio script to deploy and load configs
  4. Add this node to the LoadBalancer instances list.

Wishing you happy scaling!

by on January 9, 2014

Configuring OpenStack Havana Cinder, Nova and Glance to run on GlusterFS

Configuring Glance, Cinder and Nova for OpenStack Havana to run on GlusterFS is actually quite simple, assuming that you've already got GlusterFS up and running.

So let's first look at my Gluster configuration. As you can see below, I have a Gluster volume defined for Cinder, Glance and Nova.… Read the rest

The post Configuring OpenStack Havana Cinder, Nova and Glance to run on GlusterFS appeared first on vmware admins.

by on December 19, 2013

Installing GlusterFS on RHEL 6.4 for OpenStack Havana (RDO)

The OpenCompute systems are the ideal hardware platform for distributed filesystems. Period. Why? Cheap servers with 10GbE NICs and a boatload of locally attached cheap storage!

In preparation for deploying RedHat RDO on RHEL, the distributed filesystem I chose was GlusterFS.… Read the rest

The post Installing GlusterFS on RHEL 6.4 for OpenStack Havana (RDO) appeared first on vmware admins.

by on December 12, 2013

The Tyranny of the Clouds

Or “How I learned to start worrying and never trust the cloud.”

The Clouderati have been derping for some time now about how we’re all going towards the public cloud and “private cloud” will soon become a distant, painful memory, much like electric generators filled the gap before power grids became the norm. They seem far too glib about that prospect, and frankly, they should know better. When the Clouderati see the inevitability of the public cloud, their minds lead to unicorns and rainbows that are sure to follow. When I think of the inevitability of the public cloud, my mind strays to “The Empire Strikes Back” and who’s going to end up as Han Solo. When the Clouderati extol the virtues of public cloud providers, they prove to be very useful idiots advancing service providers’ aims, sort of the Lando Calrissians of the cloud wars. I, on the other hand, see an empire striking back at end users and developers, taking away our hard-fought gains made from the proliferation of free/open source software. That “the empire” is doing this *with* free/open source software just makes it all the more painful an irony to bear.

I wrote previously that It Was Never About Innovation, and that article was set up to lead to this one, which is all about the cloud. I can still recall talking to Nicholas Carr about his new book at the time, “The Big Switch”, all about how we were heading towards a future of utility computing, and what that would portend. Nicholas saw the same trends the Clouderati did, except a few years earlier, and came away with a much different impression. Where the Clouderati are bowled over by Technology! and Innovation!, Nicholas saw a harbinger of potential harm and warned of a potential economic calamity as a result. While I also see a potential calamity, it has less to do with economic stagnation and more to do with the loss of both freedom and equality.

The virtuous cycle I mentioned in the previous article does not exist when it comes to abstracting software over a network, into the cloud, and away from the end user and developer. In the world of cloud computing, there is no level playing field – at least, not at the moment. Customers are at the mercy of service providers and operators, and there are no “four freedoms” to fall back on.

When several of us co-founded the Open Cloud Initiative (OCI), it was with the intent, as Simon Phipps so eloquently put it, of projecting the four freedoms onto the cloud. There have been attempts to mandate additional terms in licensing that would force service providers to participate in a level playing field. See, for example, the great debates over “closing the web services loophole” as we called it then, during the process to create the successor to the GNU General Public License version 2. Unfortunately, while we didn’t yet realize it, we didn’t have the same leverage as we had when software was something that you installed and maintained on a local machine.

The Way to the Open Cloud

Many “open cloud” efforts have come and gone over the years, none of them leading to anything of substance or gaining traction where it matters. Bradley Kuhn helped drive the creation of the Affero GPL version 3, which set out to define what software distribution and conveyance mean in a web-driven world, but the rest of the world has been slow to adopt because, again, service providers have no economic incentive to do so. Where we find ourselves today is a world without a level playing field, which will, in my opinion, stifle creativity and, yes, innovation. It is this desire for “innovation” that drives the service providers to behave as they do, although as you might surmise, I do not think that word means what they think it means. As in many things, service providers want to be the arbiters of said innovation without letting those dreaded freeloaders have much of a say. Worse yet, they create services that push freeloaders into becoming part of the product – not a participant in the process that drives product direction. (I know, I know: yes, users can get together and complain or file bugs, but they cannot mandate anything over the providers)

Most surprising is that the closed cloud is aided and abetted by well-intentioned, but ultimately harmful actors. If you listen to the Clouderati, public cloud providers are the wonderful innovators in the space, along with heaping helpings of concern trolling over OpenStack’s future prospects. And when customers lose because a cloud company shuts its doors, the clouderati can’t be bothered to bring themselves to care: c’est la vie and let them eat cake. The problem is that too many of the clouderati think that Innovation! is a means to its own ends without thinking of ground rules or a “bill of rights” for the cloud. Innovation! and Technology! must rule all, and therefore the most innovative take all, and anything else is counter-productive or hindering the “free market”. This is what happens when the libertarian-minded carry prejudiced notions of what enabled open source success without understanding what made it possible: the establishment and codification of rights and freedoms. None of the Clouderati are evil, freedom-stealing, or greedy, per se, but their actions serve to enable those who are. Because they think solely in terms of Innovation! and Technology!, they set the stage for some companies to dominate the cloud space without any regard for establishing a level playing field.

Let us enumerate the essential items for open innovation:

  1. Set of ground rules by which everyone must abide, eg. the four freedoms
  2. Level playing field where every participant is a stakeholder in a collaborative effort
  3. Economic incentives for participation

These will be vigorously opposed by those who argue that establishing such a list is too restrictive for innovation to happen, because… free market! The irony is that establishing such rules enabled Open Source communities to become the engine that runs the world’s economy. Let us take each and discuss its role in creating the open cloud.

Ground Rules

We have already established the irony that the four freedoms led to the creation of software that was used as the infrastructure for creating proprietary cloud services. What if the four freedoms were tweaked for cloud services? As a reminder, here are the four freedoms:

  • The freedom to run the program, for any purpose (freedom 0).
  • The freedom to study how the program works, and change it so it does your computing as you wish (freedom 1).
  • The freedom to redistribute copies so you can help your neighbor (freedom 2).
  • The freedom to distribute copies of your modified versions to others (freedom 3).

If we rewrote this to apply to cloud services, how much would need to change? I made an attempt at this, and it turns out that only a couple of words need to change:

  • The freedom to run the program or service, for any purpose (freedom 0).
  • The freedom to study how the service works, and change it so it does your computing as you wish (freedom 1).
  • The freedom to implement and redistribute copies so you can help your neighbor (freedom 2).
  • The freedom to implement your modified versions for others (freedom 3).

Freedom 0 adds “or service” to denote that we’re not just talking about a single program, but a set of programs that act in concert to deliver a service.

Freedom 1 allows end users and developers to peek under the hood.

Freedom 2 adds “implement and” to remind us that the software alone is not much use – the data forms a crucial part of any service.

Freedom 3 also changes “distribute copies of” to “implement” because of the fundamental role that data plays in any service. Distributing copies of software in this case doesn’t help anyone without also adding the capability of implementing the modified service, data and all.

Establishing these rules will be met, of course, with howls of rancor from the established players in the market, as it should be.

Level Playing Field

With the establishment of the service-oriented freedoms, above, we have the foundation for a level playing field with actors from all sides having a stake in each other’s success. Each of the enumerated freedoms serves to establish a managed ecosystem, rather than a winner-take-all pillage and plunder system. This will be countered by the argument that if we hinder the development of innovative companies won’t we a.) hinder economic growth in general and b.) socialism!

In the first case, there is a very real threat from a winner-take-all system. In its formative stages, when everyone has the economic incentive to innovate (there’s that word again!), everyone wins. Companies create and disrupt each other, and everyone else wins by utilizing the creations of those companies. But there’s a well known consequence of this activity: each actor will try to build in the ability to retain customers at all costs. We have seen this happen in many markets, such as the creation of proprietary, undocumented data formats in the office productivity market. And we have seen it in the cloud, with the creation of proprietary APIs that lock in customers to a particular service offering. This, too, chokes off economic development and, eventually, innovation. At first, this lock in happens via the creation of new products and services which usually offer new features that enable customers to be more productive and agile. Over time, however, once the lock-in is established, customers find that their long-term margins are not in their favor, and moving to another platform proves too costly and time-consuming. If all vendors are equal, this may not be so bad, because vendors have an incentive to lure customers away from their existing providers, and the market becomes populated by vendors competing for customers, acting in their interest. Allow one vendor to establish a larger share than others, and this model breaks down. In a monopoly situation, the incumbent vendor has many levers to lock in their customers, making the transition cost too high to switch to another provider. In cloud computing, this winner-take-all effect is magnified by the massive economies of scale enjoyed by the incumbent providers. Thus, the customer is unable to be as innovative as they could be due to their vendor’s lock-in schemes. If you believe in unfettered Innovation! at all costs, then you must also understand the very real economic consequences of vendor lock-in. By creating a level playing field through the establishment of ground rules that ensure freedom, a sustainable and innovative market is at least feasible. Without that, an unfettered winner-take-all approach will invariably result in the loss of freedom and, consequently, agility and innovation.

Economic Incentives

This is the hard one. We have already established that open source ecosystems work because all actors have an incentive to participate, but we have not established whether the same incentives apply here. In the open source software world, developers participate because they had to, because the price of software is always dropping, and customers enjoy open source software too much to give it up for anything else. One thing that may be in our favor is the distinct lack of profits in the cloud computing space, although that changes once you include services built on cloud computing architectures.

If we focus on infrastructure as a service (IaaS) and platform as a service (PaaS), the primary gateways to creating cloud-based services, then the margins and profits are quite low. This market is, by its nature, open to competition because the race is on to lure as many developers and customers as possible to the respective platform offerings. However, the danger is that one particular service provider becomes able to offer proprietary services that give it leverage over the others, establishing the lock-in levers needed to pound the competition into oblivion.

In contrast to basic infrastructure, the profit margins of proprietary products built on top of cloud infrastructure have been growing for some time, which incentivizes the IaaS and PaaS vendors to keep stacking proprietary services on top of their basic infrastructure. This results in a situation where increasing numbers of people and businesses have happily donated their most important business processes and workflows to these service providers. If any of them grow unhappy with the service, they cannot easily switch, because no competitor would have access to the same data or implementation of that service. In this case, not only is there a high cost associated with moving to another service, there is the distinct loss of utility (and revenue) that the customer would experience. There is a cost that comes from entrusting so much of your business to single points of failure with no known mechanism for migrating to a competitor.

In this model, there is no incentive for service providers to voluntarily open up their data or services to other service providers. There is, however, an incentive for competing service providers to be more open with their products. One possible solution could be to create an Open Cloud certification that would allow services that abide by the four freedoms in the cloud to differentiate themselves from the rest of the pack. If enough service providers signed on, it would lead to a network effect adding pressure to those providers who don’t abide by the four freedoms. This is similar to the model established by the Free Software Foundation and, although the GNU people would be loathe to admit it, the Open Source Initiative. The OCI’s goal was to ultimately create this, but we have not yet been able to follow through on those efforts.

Conclusion

We have a pretty good idea why open source succeeded, but we don’t know if the open cloud will follow the same path. At the moment, end users and developers have little leverage in this game. One possibility would be if end users chose, at massive scale, to use services that adhered to open cloud principles, but we are a long way away from this reality. Ultimately, in order for the open cloud to succeed, there must be economic incentives for all parties involved. Perhaps pricing demands will drive some of the lower rung service providers to adopt more open policies. Perhaps end users will flock to those service providers, starting a new virtuous cycle. We don’t yet know. What we do know is that attempts to create Innovation! will undoubtedly lead to a stacked deck and a lack of leverage for those who rely on these services.

If we are to resolve this problem, it can’t be about innovation for innovation’s sake – it must be, once again, about freedom.