all posts tagged KVM


May 9, 2014

Gluster scale-out tests: an 84 node volume

This post describes recent tests done by Red Hat on an 84 node gluster volume.  Our experiments measured performance characteristics and management behavior. To our knowledge, this is the largest performance test ever done under controlled conditions within the organization (we have heard of larger clusters in the community but do not know any details about them).

Red Hat officially supports up to 64 gluster servers in a cluster, and our tests exceed that. But the problems we encountered are not theoretical: the scalability issues appeared to be related to the number of bricks, not the number of servers. If a customer were to use just 16 servers with 60 drives on each, they would have 960 bricks and would likely see issues similar to what we found.

Summary: With one important exception, our tests show gluster scales linearly on common I/O patterns. The exception is on file create operations. On creates, we observe network overhead increase as the cluster grows. This issue appears to have a solution and a fix is forthcoming.

We also observed that gluster management operations become slower as the number of nodes increases; bz 1044693 has been opened for this. However, we were using the shared local disk on the hypervisor rather than the disk dedicated to each VM. When this was changed, the management commands sped up considerably, e.g. completing in about 8 seconds.

Configuration

Configuring an 84 node volume is easier said than done. Our intention was to build a methodology (tools and procedures) to spin up and tear down a large cluster of gluster servers at will.

We do not have 84 physical machines available. But our lab does have very powerful servers (described below). They can run multiple gluster servers at a time in virtual machines.  We ran 12 such VMs on each physical machine. Each virtual machine was bound to its own disk and CPU. Using this technique, we are able to use 7 physical servers to test 84 nodes.
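For illustration only (the VM name, host cores, and device path below are placeholders, not our exact configuration), pinning a guest’s vCPUs and handing it a dedicated disk with libvirt can be done along these lines:

# pin the guest's two vCPUs to dedicated host cores
virsh vcpupin gluster-node-01 0 2
virsh vcpupin gluster-node-01 1 3
# give the guest its own physical disk as a raw block device
virsh attach-disk gluster-node-01 /dev/sdc vdb --persistent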

Tools to set up and manage clusters of this many virtual machines are nascent. Much configuration work must be done by hand. The general technique is to create a “golden copy” VM and “clone” it many times. Care must be taken to keep track of IP addresses, host names, and the like. If a single VM is misconfigured, it can be difficult to locate the problem within a large cluster.

Puppet and Chef are good candidates to simplify some of the work, and vagrant can create virtual machines and do the underlying resource management, but everything still must be tied together and programmed. Our first implementation did not use these modern tools. Instead, crude but effective bash, expect, and kickstart scripts were written. We hope to utilize puppet in the near term with help from gluster configuration management guru James Shubin. If you like ugly scripts, they may be found here.

One of the biggest problem areas in this setup was networking. When KVM creates a Linux VM, a hardware address and virtual serial port console exist, and an IP address can be obtained using DHCP. But we have a limited pool of IP addresses on our public subnet, and our lab’s system administrator frowns upon 84 new IP addresses being allocated out of the blue. Worse, the public network is 1GbE Ethernet – too slow for performance testing.

To work around those problems, we utilized static IP addresses on a private 10GbE Ethernet network. This network has its own subnet and is free from lab restrictions. It does not have a DHCP server. To set the static IP address, we wrote an “expect” script which logs into the VM over the serial line and modifies the network configuration files.

At the hypervisor level, we manually set up the virtual bridge and disk configurations, and applied the virtual-host “tuned” profile.
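As an illustrative sketch only (the bridge and interface names are placeholders, and our actual lab steps may have differed), preparing a KVM host along these lines might look like:

# create a bridge on the 10GbE interface and apply the KVM host tuning profile
brctl addbr br10g
brctl addif br10g eth2
ip link set br10g up
tuned-adm profile virtual-host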

Once the system was built, it quickly became apparent that another set of tools would be needed to manage the running VMs. For example, it is sometimes necessary to run the same command across all 84 machines. Bash scripts were written to that end, though other tools (such as pdsh) could have been used.
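A minimal sketch of the idea (the host list file name and ssh options are illustrative, not taken from our scripts):

#!/bin/bash
# run-all.sh: run the supplied command on every node listed in hosts.txt
while read -r host; do
    ssh -o BatchMode=yes "$host" "$@" &
done < hosts.txt
wait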

Test results

With that done, we were ready to do some tests. Our goals were:

  1. To confirm gluster “scales linearly” for large and small files – as new nodes are added, performance increases accordingly
  2. To examine behavior on large systems. Do all the management commands work?

Large file tests: gluster scales nicely.

[figure: scaling]

Small file tests: gluster scales on reads, but not write-new-file.

[figure: smf-scaling]

Oops. Small file writes are not scaling linearly. What’s going on here?

Looking at wireshark traces, we observed many LOOKUP calls sent to each of the nodes for every file create operation. As the number of nodes increased, so did the number of LOOKUPs. It turns out that the gluster client was broadcasting a LOOKUP to every node on each create, to confirm the file does not already exist.
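As a rough illustration only (not our exact workflow), given a capture file trace.pcap and a Wireshark build that includes the GlusterFS dissector, the LOOKUP fan-out can be spotted with something like the following; the filter and field names may vary between Wireshark versions:

# count GlusterFS LOOKUP procedures seen in the capture
tshark -r trace.pcap -Y glusterfs 2>/dev/null | grep -c LOOKUP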

The gluster parameter “lookup-unhashed” forces DHT hashing to be used. This sends the LOOKUP only to the node where the new file should reside, rather than broadcasting it to all nodes. Below are the results when this setting is applied.

write-new-file test results with the parameter set (red line). Much better!

[figure: lookup-unhashed]
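For reference, volume options like this are normally toggled with the gluster CLI; a hedged sketch follows (the volume name is a placeholder, and the exact option name and default should be checked against your GlusterFS version):

# send creates only to the hashed brick instead of broadcasting a LOOKUP everywhere
gluster volume set testvol cluster.lookup-unhashed off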

 

This parameter is dangerous. If the cluster’s brick topology has changed and a rebalance was aborted, gluster may believe a file does not exist when it really does. In other words, the LOOKUP existence test would generate a false negative because DHT would have the client look at the wrong nodes. This could result in two GFIDs being accessible by the same path.

A fix is being written. It will assign generation counts to bricks. By default DHT will be used on lookups, but if the generation counts indicate a topology change has taken place on the target bricks, the client will revert to the slower broadcast mode of operation.

We observed that any management command dealing with the volume took as long as a minute. For example, the “gluster import” command on the oVirt UI took more than 30 seconds to complete. Bug 1044693 was opened for this. In all cases the management commands worked, but were very slow. See the note above about moving off the hypervisor’s shared local disk.

Future

Gluster engineers suggested some additional tests that would be future work:

  1. object enumeration – how well does “ls” scale for large volumes?
  2. What is the largest number of small objects (files) that a machine can handle before it makes sense to add a new node?
  3. Snapshot testing for scale-out volumes
  4. OpenStack behavior – what happens when the number of VMs goes up? We would look at variance and latency for the worst case.

Proposals to do larger scale-out tests:

  • We could present partitions of disks to gluster as bricks. For example, a single 1TB drive could be divided into ten 100GB devices (see the sketch after this list). This could boost the cluster size by an order of magnitude. Because the disk head would be shared by multiple servers, this technique would only make sense for random I/O tests (where the head is already under stress).
  • Experiment with running gluster servers within containers.
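A sketch of the first proposal using LVM (the device and volume names are illustrative):

# carve one drive into ten equal logical volumes, each usable as a separate brick
pvcreate /dev/sdb
vgcreate brickvg /dev/sdb
for i in $(seq 1 10); do
    lvcreate -l 10%VG -n brick$i brickvg
done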

Hardware

Gluster volumes are constructed out of a varying number of bricks embedded within separate virtual machines. Each virtual machine has:

  • a dedicated 7200-RPM SAS disk for the Gluster brick
  • a file on hypervisor system disk for the operating system image
  • 2 Westmere or Sandy Bridge cores
  • 4 GB RAM

The KVM hosts are 7 standard Dell R510/R720 servers with these attributes:

  • 2-socket Westmere/Sandy Bridge Intel x86_64
  • 48/64 GB RAM
  • 1 10-GbE interface with jumbo frames (MTU=9000)
  • 12 7200-RPM SAS disks configured in JBOD mode from a Dell PERC H710 (LSI MegaRAID)

For sequential workloads, we use only 8 out of 12 guests in each host so that aggregate disk bandwidth never exceeds network bandwidth.

Clients are 8 standard Dell R610/R620 servers with:

  • 2-socket Westmere/Sandy Bridge Intel x86_64
  • 64 GB RAM
  • 1 10-GbE Intel NIC interface with jumbo frames (MTU=9000)
February 23, 2014

Setting up a test-environment for Apache CloudStack and Gluster

This is an example of how to configure an environment where you can test CloudStack and Gluster. It uses two machines on the same LAN; one acts as a KVM hypervisor and the other as the storage and management server. Because the (virtual) networking on the hypervisor is a little more complex than the networking on the management server, the hypervisor will be set up with an OpenVPN connection so that the local LAN is not affected by 'foreign' network traffic.

I am not a CloudStack specialist, so this configuration may not be optimal for real world usage. The intention is to be able to test CloudStack and its Gluster integration in existing networks. The CloudStack installation and configuration described here is suitable for testing and development systems; for production environments it is highly recommended to follow the CloudStack documentation instead.


.----------------.                       .-------------------.
|                |                       |                   |
| KVM Hypervisor | <------- LAN -------> | Management Server |
|                |    ^-- OpenVPN --^    |                   |
'----------------'                       '-------------------'
agent.cloudstack.tld                     storage.cloudstack.tld

Both systems have one network interface with a static IP-address. No other IP-addresses can be used on the LAN. This makes it difficult to access the virtual machines, but that does not matter too much for this testing.

Both systems need a basic installation:

  • Red Hat Enterprise Linux 6.5 (CentOS 6.5 should work too)
  • Fedora EPEL enabled (howto install epel-release)
  • enable ssh access
  • SELinux in permissive mode (or disabled)
  • firewall enabled, but not restricting anything
  • Java 1.7 from the standard java-1.7.0-openjdk packages (not Java 1.6)

On the hypervisor, an additional (internal only) bridge needs to be set up. This bridge will be used for providing IP-addresses to the virtual machines. Each virtual machine seems to need at least 3 IP-addresses; this is a default in CloudStack. This example uses virtual networks 192.168.N.0/24, where N is 0 to 4.

Configuration for the main cloudbr0 device:


#file: /etc/sysconfig/network-scripts/ifcfg-cloudbr0
DEVICE=cloudbr0
TYPE=Bridge
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.0.1
NETMASK=255.255.255.0
NM_CONTROLLED=no

And the additional IP-addresses on the cloudbr0 bridge (create 4 files, replace N by 1, 2, 3 and 4):


#file: /etc/sysconfig/network-scripts/ifcfg-cloudbr0:N
DEVICE=cloudbr0:N
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.N.1
NETMASK=255.255.255.0
NM_CONTROLLED=no

Enable the new cloudbr0 bridge with all its IP-addresses:


# ifup cloudbr0

Any of the VMs that have a 192.168.*.* address should be able to get to the real LAN, and ultimately also the internet. Enabling NAT for the internal virtual networks is the easiest approach:

# iptables -t nat -A POSTROUTING -o eth0 -s 192.168.0.0/24 -j MASQUERADE
# iptables -t nat -A POSTROUTING -o eth0 -s 192.168.1.0/24 -j MASQUERADE
# iptables -t nat -A POSTROUTING -o eth0 -s 192.168.2.0/24 -j MASQUERADE
# iptables -t nat -A POSTROUTING -o eth0 -s 192.168.3.0/24 -j MASQUERADE
# iptables -t nat -A POSTROUTING -o eth0 -s 192.168.4.0/24 -j MASQUERADE
# service iptables save

The hypervisor will need to be set up to act as a gateway for the virtual machines on the cloudbr0 bridge. In order to do so, a very basic OpenVPN service does the trick:


# yum install openvpn
# openvpn --genkey --secret /etc/openvpn/static.key
# cat << EOF > /etc/openvpn/server.conf
dev tun
ifconfig 192.168.200.1 192.168.200.2
secret static.key
EOF
# chkconfig openvpn on
# service openvpn start

On the management server, OpenVPN needs to be configured as a client so that routing to the virtual networks is possible:


# yum install openvpn
# cat << EOF > /etc/openvpn/client.conf
remote real-hostname-of-hypervisor.example.net
dev tun
ifconfig 192.168.200.2 192.168.200.1
secret static.key
EOF
# scp real-hostname-of-hypervisor.example.net:/etc/openvpn/static.key /etc/openvpn
# chkconfig openvpn on
# service openvpn start

In /etc/hosts (on both the hypervisor and management server) the internal hostnames for the environment should be added:


#file: /etc/hosts
192.168.200.1 agent.cloudstack.tld
192.168.200.2 storage.cloudstack.tld

The hypervisor will also function as a DNS-server for the virtual machines. The easiest option is to use dnsmasq, which uses /etc/hosts and /etc/resolv.conf for resolving:


# yum install dnsmasq
# chkconfig dnsmasq on
# service dnsmasq start

The management server is also used as a Gluster Storage Server. Therefore it needs to have some Gluster packages:


# wget -O /etc/yum.repos.d/glusterfs-epel.repo \
http://download.gluster.org/pub/gluster/glusterfs/3.4/LATEST/RHEL/glusterfs-epel.repo
# yum install glusterfs-server
# vim /etc/glusterfs/glusterd.vol

# service glusterd restart

Create two volumes where CloudStack will store disk images. Before starting the volumes, apply the required settings too. Note that the hostname that holds the bricks should be resolvable by the hypervisor and the Secondary Storage VMs. This example does not show how to create volumes for production usage; do not create volumes like this for anything other than testing and scratch data.


# mkdir -p /bricks/primary/data
# mkdir -p /bricks/secondary/data
# gluster volume create primary storage.cloudstack.tld:/bricks/primary/data
# gluster volume set primary storage.owner-uid 36
# gluster volume set primary storage.owner-gid 36
# gluster volume set primary server.allow-insecure on
# gluster volume set primary nfs.disable true
# gluster volume start primary
# gluster volume create secondary storage.cloudstack.tld:/bricks/secondary/data
# gluster volume set secondary storage.owner-uid 36
# gluster volume set secondary storage.owner-gid 36
# gluster volume start secondary

When the preparation is all done, it is time to install Apache CloudStack. It is planned to have support for Gluster in CloudStack 4.4. At the moment not all required changes are included in the CloudStack git repository. Therefore, the RPM packages need to be built from the Gluster Forge repository where the development is happening. On a system running RHEL-6.5, check out the sources and build the packages (this needs a standard CloudStack development environment, including java-1.7.0-openjdk-devel, Apache Maven and others):


$ git clone git://forge.gluster.org/cloudstack-gluster/cloudstack.git
$ cd cloudstack
$ git checkout -t -b wip/master/gluster
$ cd packaging/centos63
$ ./package.sh

In the end, these packages should have been built:

  • cloudstack-management-4.4.0-SNAPSHOT.el6.x86_64.rpm
  • cloudstack-common-4.4.0-SNAPSHOT.el6.x86_64.rpm
  • cloudstack-agent-4.4.0-SNAPSHOT.el6.x86_64.rpm
  • cloudstack-usage-4.4.0-SNAPSHOT.el6.x86_64.rpm
  • cloudstack-cli-4.4.0-SNAPSHOT.el6.x86_64.rpm
  • cloudstack-awsapi-4.4.0-SNAPSHOT.el6.x86_64.rpm

On the management server, install the following packages:


# yum localinstall cloudstack-management-4.4.0-SNAPSHOT.el6.x86_64.rpm \
cloudstack-common-4.4.0-SNAPSHOT.el6.x86_64.rpm \
cloudstack-awsapi-4.4.0-SNAPSHOT.el6.x86_64.rpm

Install and configure the database:


# yum install mysql-server
# chkconfig mysqld on
# service mysqld start
# vim /etc/cloudstack/management/classpath.conf

# cloudstack-setup-databases cloud:secret --deploy-as=root:

Install the systemvm templates:


# mount -t nfs storage.cloudstack.tld:/secondary /mnt
# /usr/share/cloudstack-common/scripts/storage/secondary/cloud-install-sys-tmplt \
-m /mnt \
-h kvm \
-u http://jenkins.buildacloud.org/view/master/job/build-systemvm-master/lastSuccessfulBuild/artifact/tools/appliance/dist/systemvmtemplate-master-kvm.qcow2.bz2
# umount /mnt

The management server is now prepared, and the web UI can be configured:


# cloudstack-setup-management

On the hypervisor, install the following additional packages:


# yum install qemu-kvm libvirt glusterfs-fuse
# yum localinstall cloudstack-common-4.4.0-SNAPSHOT.el6.x86_64.rpm \
cloudstack-agent-4.4.0-SNAPSHOT.el6.x86_64.rpm
# cloudstack-setup-agent

Make sure that in /etc/cloudstack/agent/agent.properties the right NICs are being used:


guest.network.device=cloudbr0
private.bridge.name=cloudbr0
private.network.device=cloudbr0
network.direct.device=cloudbr0
public.network.device=cloudbr0

Go to the CloudStack web interface, which should be running on the management server: http://real-hostname-of-mgmt.example.net:8080/client. The default username/password is admin / password.

It is easiest to skip the configuration wizard (not sure if that supports Gluster already). When the normal interface is shown, a new 'Zone' can be added under 'Infrastructure'. The Zone wizard will need the following input:

  • DNS 1: 192.168.0.1
  • Internal DNS 1: 192.168.0.1
  • Hypervisor: KVM

Under POD, use these options:

  • Reserved system gateway: 192.168.0.1
  • Reserved system netmask: 255.255.255.0
  • Start reserved system IP: 192.168.0.10
  • End reserved system IP: 192.168.0.250

Next the network config for the virtual machines:

  • Guest gateway: 192.168.1.1
  • Guest system netmask: 255.255.255.0
  • Guest start IP: 192.168.1.10
  • Guest end IP: 192.168.1.250

Primary storage:

  • Type: Gluster
  • Server: storage.cloudstack.tld
  • Volume: primary

Secondary Storage:

  • Type: nfs
  • Server: storage.cloudstack.tld
  • path: /secondary

Hypervisor agent:

  • hostname: agent.cloudstack.tld
  • username: root
  • password: password

If this all succeeded, the newly created Zone can be enabled. After a while, there should be two system VMs listed under Infrastructure. It is possible to log in to these system VMs and check that everything is working. To do so, log in over SSH to the hypervisor and connect to the VMs through libvirt:


# virsh list
 Id    Name                 State
----------------------------------------------------
 1     s-1-VM               running
 2     v-2-VM               running

# virsh console 1
Connected to domain s-1-VM
Escape character is ^]

Debian GNU/Linux 7 s-1-VM ttyS0

s-1-VM login: root
Password: password
...
root@s-1-VM:~#

Log out from the shell, and press CTRL+] to disconnect from the console.

To verify that this VM indeed runs with the QEMU+libgfapi integration, check the log file that libvirt writes and confirm that there is a -drive with a gluster+tcp:// URL in /var/log/libvirt/qemu/s-1-VM.log:


... /usr/libexec/qemu-kvm -name s-1-VM ... -drive file=gluster+tcp://storage.cloudstack.tld:24007/primary/d691ac19-4ec1-47c1-b765-55f804b78bec,...
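For example, assuming the log path above, a quick check could be:

# grep -- 'file=gluster' /var/log/libvirt/qemu/s-1-VM.log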
November 26, 2013

GlusterFS Block Device Translator

Block device translator

Block device translator (BD xlator) is a new translator added to GlusterFS recently which provides a block backend for GlusterFS. It replaces the existing bd_map translator in GlusterFS that provided similar but very limited functionality. GlusterFS expects the underlying brick to be formatted with a POSIX-compatible file system. BD xlator changes that and allows for bricks that are raw block devices like LVM, which needn’t have any file systems on them. Hence with BD xlator, it becomes possible to build a GlusterFS volume comprising bricks that are logical volumes (LV).

[figure: bd]

BD xlator maps underlying LVs to files, and hence the LVs appear as files to GlusterFS clients. Though a BD volume externally appears very similar to the usual Posix volume, not all operations are supported or possible for the files on a BD volume. Only those operations that make sense for a block device are supported, and the exact semantics are described in subsequent sections.

While a Posix volume takes a file system directory as a brick, a BD volume needs a volume group (VG) as its brick. In the usual use case of a BD volume, a file created on the BD volume will result in an LV being created in the brick VG. In addition to a VG, a BD volume also needs a file system directory that should be specified at volume creation time. This directory is necessary for supporting the notion of directories and directory hierarchy for the BD volume. Metadata about LVs (size, mapping info) is stored in this directory.

BD xlator was mainly developed to use block devices directly as VM images when GlusterFS is used as storage for KVM virtualization. Some of the salient points of BD xlator are:

  • Since BD supports file level snapshots and clones by leveraging the snapshot and clone capabilities of LVM, it can be used to fully off-load snapshot and cloning operations from QEMU to the storage (GlusterFS) itself.
  • BD understands dm-thin LVs and hence can support files that are backed by thinly provisioned LVs. This capability of BD xlator translates to having thinly provisioned raw VM images.
  • BD enables thin LVs from a thin pool to be used from multiple nodes that have visibility to GlusterFS BD volume. Thus thin pool can be used as a VM image repository allowing access/visibility to it from multiple nodes.
  • BD supports true zerofill by using the BLKZEROOUT ioctl on underlying block devices. Thus BD allows SCSI WRITESAME to be used on the underlying block device if the device supports it.

Though BD xlator is primarily intended to be used with block devices, it does provide full Posix xlator compatibility for files that are created on a BD volume but are not backed by or mapped to a block device. Such files, which don’t have a block device mapping, live in the Posix directory that is specified during BD volume creation.

Availability

BD xlator, developed by M. Mohan Kumar, was committed into GlusterFS git in November 2013 and is expected to be part of the upcoming GlusterFS-3.5 release.

Compiling BD translator

BD xlator needs the lvm2 development library. The --enable-bd-xlator option can be used with the ./configure script to explicitly enable the BD translator. The following snippet from the output of the configure script shows that BD xlator is enabled for compilation.
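A minimal sketch of enabling it when building from a GlusterFS source tree (assuming the lvm2 development packages are installed; the exact steps may differ per release):

$ ./autogen.sh
$ ./configure --enable-bd-xlator
$ make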

GlusterFS configure summary
===================

Block Device xlator  : yes

Creating a BD volume

BD supports hosting both linear LVs and thin LVs within the same volume. However, I will be showing them separately in the following instructions. As noted above, the prerequisite for a BD volume is a VG, which I am creating here from a loop device, but it can be any other device too.

1. Creating BD volume with linear LV backend

- Create a loop device

[root@bharata ~]# dd if=/dev/zero of=bd-loop count=1024 bs=1M
[root@bharata ~]# losetup /dev/loop0 bd-loop

- Prepare a brick by creating a VG

[root@bharata ~]# pvcreate /dev/loop0
[root@bharata ~]# vgcreate bd-vg /dev/loop0

- Create the BD volume

Create a POSIX directory first
[root@bharata ~]# mkdir /bd-meta

It is recommended that this directory is created on an LV in the brick VG itself so that both data and metadata live together on the same device.

Create and mount the volume
[root@bharata ~]# gluster volume create bd bharata:/bd-meta?bd-vg force

The general syntax for specifying the brick is host:/posix-dir?volume-group-name where “?” is the separator.

[root@bharata ~]# gluster volume start bd
[root@bharata ~]# gluster volume info bd

Volume Name: bd
Type: Distribute
Volume ID: cb042d2a-f435-4669-b886-55f5927a4d7f
Status: Started
Xlator 1: BD
Capability 1: offload_copy
Capability 2: offload_snapshot
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: bharata:/bd-meta
Brick1 VG: bd-vg

[root@bharata ~]# mount -t glusterfs bharata:/bd /mnt

- Create a file that is backed by an LV

[root@bharata ~]# ls /mnt
[root@bharata ~]#

Since the volume is empty now, so is the underlying VG.
[root@bharata ~]# lvdisplay bd-vg
[root@bharata ~]#

Creating a file that is mapped to an LV is a 2-step operation. First the file should be created on the mount point, and then a specific extended attribute should be set to map the file to an LV.

[root@bharata ~]# touch /mnt/lv
[root@bharata ~]# setfattr -n "user.glusterfs.bd" -v "lv" /mnt/lv

Now an LV has been created in the brick VG, and the file /mnt/lv maps to this LV. Any read/write to this file ends up as a read/write to the underlying LV.
[root@bharata ~]# lvdisplay bd-vg
--- Logical volume ---
LV Path                /dev/bd-vg/6ff0f25f-2776-4d19-adfb-df1a3cab8287
LV Name                6ff0f25f-2776-4d19-adfb-df1a3cab8287
VG Name                bd-vg
LV UUID                PjMPcc-RkD5-RADz-6ixG-UYsk-oclz-vL0nv6
LV Write Access        read/write
LV Creation host, time bharata, 2013-11-26 16:15:45 +0530
LV Status              available
# open                 0
LV Size                4.00 MiB
Current LE             1
Segments               1
Allocation             inherit
Read ahead sectors     0
Block device           253:6

The file gets created with the default LV size, which is 1 LE (4MB in this case).
[root@bharata ~]# ls -lh /mnt/lv
-rw-r--r--. 1 root root 4.0M Nov 26 16:15 /mnt/lv

truncate can be used to set the required file size.
[root@bharata ~]# truncate /mnt/lv -s 256M
[root@bharata ~]# lvdisplay bd-vg
--- Logical volume ---
LV Path                /dev/bd-vg/6ff0f25f-2776-4d19-adfb-df1a3cab8287
LV Name                6ff0f25f-2776-4d19-adfb-df1a3cab8287
VG Name                bd-vg
LV UUID                PjMPcc-RkD5-RADz-6ixG-UYsk-oclz-vL0nv6
LV Write Access        read/write
LV Creation host, time bharata, 2013-11-26 16:15:45 +0530
LV Status              available
# open                 0
LV Size                256.00 MiB
Current LE             64
Segments               1
Allocation             inherit
Read ahead sectors     0
Block device           253:6

[root@bharata ~]# ls -lh /mnt/lv
-rw-r--r--. 1 root root 256M Nov 26 16:15 /mnt/lv

The size of the file/LV can also be specified at creation/mapping time itself:
setfattr -n "user.glusterfs.bd" -v "lv:256MB" /mnt/lv

2. Creating BD volume with thin LV backend

- Create a loop device

[root@bharata ~]# dd if=/dev/zero of=bd-loop-thin count=1024 bs=1M
[root@bharata ~]# losetup /dev/loop0 bd-loop-thin

- Prepare a brick by creating a VG and thin pool

[root@bharata ~]# pvcreate /dev/loop0
[root@bharata ~]# vgcreate bd-vg-thin /dev/loop0

Create a thin pool
[root@bharata ~]# lvcreate --thin bd-vg-thin -L 1000M
Rounding up size to full physical extent 4.00 MiB
Logical volume "lvol0" created

lvdisplay shows the thin pool
[root@bharata ~]# lvdisplay bd-vg-thin
  --- Logical volume ---
  LV Name                lvol0
  VG Name                bd-vg-thin
  LV UUID                HVa3EM-IVMS-QG2g-oqU6-1UxC-RgqS-g8zhVn
  LV Write Access        read/write
  LV Creation host, time bharata, 2013-11-26 16:39:06 +0530
  LV Pool transaction ID 0
  LV Pool metadata       lvol0_tmeta
  LV Pool data           lvol0_tdata
  LV Pool chunk size     64.00 KiB
  LV Zero new blocks     yes
  LV Status              available
  # open                 0
  LV Size                1000.00 MiB
  Allocated pool data    0.00%
  Allocated metadata     0.88%
  Current LE             250
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:9

- Create the BD volume

Create a POSIX directory first
[root@bharata ~]# mkdir /bd-meta-thin

Create and mount the volume
[root@bharata ~]# gluster volume create bd-thin bharata:/bd-meta-thin?bd-vg-thin force
[root@bharata ~]# gluster volume start bd-thin
[root@bharata ~]# gluster volume info bd-thin

Volume Name: bd-thin
Type: Distribute
Volume ID: 27aa7eb0-4ffa-497e-b639-7cbda0128793
Status: Started
Xlator 1: BD
Capability 1: thin
Capability 2: offload_copy
Capability 3: offload_snapshot
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: bharata:/bd-meta-thin
Brick1 VG: bd-vg-thin
[root@bharata ~]# mount -t glusterfs bharata:/bd-thin /mnt

- Create a file that is backed by a thin LV

[root@bharata ~]# ls /mnt
[root@bharata ~]#

Creating a file that is mapped to a thin LV is a two-step operation: first create the file on the mount point, then set a specific extended attribute to map the file to a thin LV.

[root@bharata ~]# touch /mnt/thin-lv
[root@bharata ~]# setfattr -n "user.glusterfs.bd" -v "thin:256MB" /mnt/thin-lv

Now /mnt/thin-lv is a thin provisioned file that is backed by a thin LV.
[root@bharata ~]# lvdisplay bd-vg-thin
  --- Logical volume ---
  LV Name                lvol0
  VG Name                bd-vg-thin
  LV UUID                HVa3EM-IVMS-QG2g-oqU6-1UxC-RgqS-g8zhVn
  LV Write Access        read/write
  LV Creation host, time bharata, 2013-11-26 16:39:06 +0530
  LV Pool transaction ID 1
  LV Pool metadata       lvol0_tmeta
  LV Pool data           lvol0_tdata
  LV Pool chunk size     64.00 KiB
  LV Zero new blocks     yes
  LV Status              available
  # open                 0
  LV Size                1000.00 MiB
  Allocated pool data    0.00%
  Allocated metadata     0.98%
  Current LE             250
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:9

  --- Logical volume ---
  LV Path                /dev/bd-vg-thin/081b01d1-1436-4306-9baf-41c7bf5a2c73
  LV Name                081b01d1-1436-4306-9baf-41c7bf5a2c73
  VG Name                bd-vg-thin
  LV UUID                coxpTY-2UZl-9293-8H2X-eAZn-wSp6-csZIeB
  LV Write Access        read/write
  LV Creation host, time bharata, 2013-11-26 16:43:19 +0530
  LV Pool name           lvol0
  LV Status              available
  # open                 0
  LV Size                256.00 MiB
  Mapped size            0.00%
  Current LE             64
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:10

As can be seen from above, creation of a file resulted in creation of a thin LV in the brick.

Snapshots and clones

The BD xlator uses LVM snapshot and clone capabilities to provide file-level snapshots and clones for files on a GlusterFS volume. Snapshots and clones work only for files that have already been mapped to an LV; in other words, they are not available for POSIX-only files that exist on a BD volume.

Creating a snapshot

Say we are interested in taking a snapshot of a file /mnt/file that already exists and has been mapped to an LV.

[root@bharata ~]# ls -l /mnt/file
-rw-r--r--. 1 root root 268435456 Nov 27 10:16 /mnt/file

[root@bharata ~]# lvdisplay bd-vg
  --- Logical volume ---
  LV Path                /dev/bd-vg/abf93bbd-2c78-4612-8822-c4e0a40c4626
  LV Name                abf93bbd-2c78-4612-8822-c4e0a40c4626
  VG Name                bd-vg
  LV UUID                HwSRTL-UdPH-MMz7-rg7U-pU4a-yS4O-59bDfY
  LV Write Access        read/write
  LV Creation host, time bharata, 2013-11-27 10:16:54 +0530
  LV Status              available
  # open                 0
  LV Size                256.00 MiB
  Current LE             64
  Segments               1
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:6

Snapshot creation is a two step process.

- Create a snapshot destination file first
[root@bharata ~]# touch /mnt/file-snap

- Then take the actual snapshot
In order to create the actual snapshot, we need to know the GFID of the snapshot file.

[root@bharata ~]# getfattr -n glusterfs.gfid.string  /mnt/file-snap
getfattr: Removing leading '/' from absolute path names
# file: mnt/file-snap
glusterfs.gfid.string="bdf74e38-dc96-4b26-94e2-065fe3b8bcc3"

Use this GFID string to create the actual snapshot
[root@bharata ~]# setfattr -n snapshot -v bdf74e38-dc96-4b26-94e2-065fe3b8bcc3 /mnt/file
[root@bharata ~]# lvdisplay bd-vg
  --- Logical volume ---
  LV Path                /dev/bd-vg/abf93bbd-2c78-4612-8822-c4e0a40c4626
  LV Name                abf93bbd-2c78-4612-8822-c4e0a40c4626
  VG Name                bd-vg
  LV UUID                HwSRTL-UdPH-MMz7-rg7U-pU4a-yS4O-59bDfY
  LV Write Access        read/write
  LV Creation host, time bharata, 2013-11-27 10:16:54 +0530
  LV snapshot status     source of bdf74e38-dc96-4b26-94e2-065fe3b8bcc3 [active]
  LV Status              available
  # open                 0
  LV Size                256.00 MiB
  Current LE             64
  Segments               1
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:6

  --- Logical volume ---
  LV Path                /dev/bd-vg/bdf74e38-dc96-4b26-94e2-065fe3b8bcc3
  LV Name                bdf74e38-dc96-4b26-94e2-065fe3b8bcc3
  VG Name                bd-vg
  LV UUID                9XH6xX-Sl64-uNhk-7OiH-f91m-DaMo-6AWiBD
  LV Write Access        read/write
  LV Creation host, time bharata, 2013-11-27 10:20:35 +0530
  LV snapshot status     active destination for abf93bbd-2c78-4612-8822-c4e0a40c4626
  LV Status              available
  # open                 0
  LV Size                256.00 MiB
  Current LE             64
  COW-table size         4.00 MiB
  COW-table LE           1
  Allocated to snapshot  0.00%
  Snapshot chunk size    4.00 KiB
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:7

As can be seen from the lvdisplay output, /mnt/file-snap now is the snapshot of /mnt/file.

Creating a clone

Creating a clone is similar to creating a snapshot, except that the "clone" attribute name should be used instead of "snapshot".

setfattr -n clone -v <gfid-of-clone-file> <path-to-source-file>

A clone in a BD volume is essentially a server-offloaded full copy of the file.
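
The workflow mirrors the snapshot example above: create the clone destination file, read its GFID, and set the clone attribute on the source. A sketch, using a hypothetical destination /mnt/file-clone:

[root@bharata ~]# touch /mnt/file-clone
[root@bharata ~]# getfattr -n glusterfs.gfid.string /mnt/file-clone
[root@bharata ~]# setfattr -n clone -v <gfid-of-file-clone> /mnt/file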

Concerns

As you have seen, creating block-device-backed files on a BD volume and creating snapshots and clones involve non-standard steps, including setting extended attributes. These steps can be cumbersome for an end user, and there are plans to encapsulate them in APIs that users can call more easily.


by on October 17, 2013

Red Hat Related Talks at LinuxCon + CloudOpen Europe

LinuxCon and CloudOpen Europe are just a few days away, and the line-up for talks looks really good. If you’re putting together your schedule, we have a couple of suggestions for talks that you’d probably find interesting.

The full schedule is on Sched.org, which makes it really easy to keep track of the talks you don’t want to miss. Also, don’t miss the Gluster Workshop on Thursday.

Monday, October 21

Tuesday, October 22

Wednesday, October 23

by on September 16, 2013

oVirt 3.3, Glusterized

The All-in-One install I detailed in Up and Running with oVirt 3.3 includes everything you need to run virtual machines and get a feel for what oVirt can do, but the downside of the local storage domain type is that it limits you to that single All in One (AIO) node.

You can shift your AIO install to a shared storage configuration to invite additional nodes to the party, and oVirt has supported the usual shared storage suspects such as NFS and iSCSI since the beginning.

New in oVirt 3.3, however, is a storage domain type for GlusterFS that takes advantage of Gluster’s new libgfapi feature to boost performance compared to FUSE or NFS-based methods of accessing Gluster storage with oVirt.

With a GlusterFS data center in oVirt, you can distribute your storage resources right alongside your compute resources. As a new feature, GlusterFS domain support is rougher around the edges than more established parts of oVirt, but once you get it up and running, it’s worth the trouble.

In oVirt, each host can be part of only one data center at a time. Before we decommission our local storage domain, we have to shut down any VMs running on our host, and, if we’re interested in moving them to our new Gluster storage domain, we need to ferry those machines over to our export domain.

GlusterFS Domain & RHEL/CentOS:

The new, libgfapi-based GlusterFS storage type has a couple of software prerequisites that aren’t currently available for RHEL/CentOS — the feature requires qemu 1.3 or better and libvirt 1.0.1 or better. Earlier versions of those components don’t know about the GlusterFS block device support, so while you’ll be able to configure a GlusterFS domain on one of those distros today, any attempts to launch VMs will fail.

Versions of qemu and libvirt with the needed functionality backported are in the works, and should be available soon, but for now, you’ll need Fedora 19 to use the GlusterFS domain type. For RHEL or CentOS hosts, you can still use Gluster-based storage, but you’ll need to do so with the POSIXFS storage type.
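
If you’re not sure what a host is running, a quick check of the installed components will tell you whether it crosses those version thresholds (a sketch; exact package names can differ between distributions):

rpm -q qemu-kvm libvirt
qemu-system-x86_64 --version
libvirtd --version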

The setup procedures are very similar, so I’ll include the POSIXFS instructions below as well in case you want to pursue that route in the meantime. Once the updated packages become available, I’ll modify this howto accordingly.

SELinux, Permissive

Currently, the GlusterFS storage scenario described in this howto requires that SELinux be put in permissive mode, which you can do with the command:

sudo setenforce 0

To make the shift to permissive mode persist between reboots, edit “/etc/sysconfig/selinux” and change SELINUX=enforcing to SELINUX=permissive.
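
If you’d rather script that edit, a one-liner like the following works (a sketch, assuming the file currently reads SELINUX=enforcing; --follow-symlinks keeps /etc/sysconfig/selinux pointing at the real config file):

sudo sed -i --follow-symlinks 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/sysconfig/selinux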

Glusterizing Your AIO Install in n Easy Steps

  1. Evacuate Your VMs

    Visit the “Virtual Machines” tab in your Administrator Portal, shut down any running VMs, and click “Export,” “OK” to copy them over to your export domain.

    While an export is in progress, there’ll be an hourglass icon next to your VM name. Once any VMs you wish to save have moved over, you can reclaim some space by right-clicking the VMs and hitting “Remove,” and then “OK.”

  2. Detach Your Domains

    Next, detach your ISO_DOMAIN from the local storage data center by visiting the “Storage” tab, clicking on the ISO_DOMAIN, visiting the “Data Center” tab in the bottom pane, clicking “local_datacenter,” then “Maintenance,” then “Detach,” and “OK” in the following dialog. Follow these same steps to detach your EXPORT_DOMAIN as well.

  3. Modify Your Data Center, Cluster & Host

    Now, click the “Data Centers” tab, select the “Default” data center, and click “Edit.” In the resulting dialog box, choose “GlusterFS” in the “Type” drop down menu and click “OK.”

    If you’re using RHEL/CentOS and taking the Gluster via POSIXFS storage route I referenced above, choose “POSIXFS” in the “Type” drop down menu instead.

    Next, click the “Clusters” tab, select the “Default” cluster, and click “Edit.” In the resulting dialog box, click the check box next to “Enable Gluster Service” and click “OK.”

    Then, visit the “Hosts” tab, select your “local_host” host, and click “Maintenance.” When the host is in maintenance mode, click “Edit,” select “Default” from the “Data Center” drop down menu, hit “OK,” and then “OK” again in the following dialog.

  4. Next, hit the command-line for a few tweaks that ought to be handled automatically, but aren’t (yet).

    We need to install the vdsm-gluster package, tweak the gluster configuration, and then start gluster and restart vdsm. First, install the package:

    sudo yum install vdsm-gluster

    Now, edit the file “/etc/glusterfs/glusterd.vol” [bz#] to add “option rpc-auth-allow-insecure on” to the list of options under “volume management” (a sketch of the edited block appears after this list).

    As part of the virt store optimizations that oVirt applies to Gluster volumes, there’s a Gluster virt group in which oVirt places optimized volumes. The file that describes this group isn’t currently provided in a package, so we have to fetch it from Gluster’s source repository:

    sudo curl https://raw.github.com/gluster/glusterfs/master/extras/group-virt.example -o /var/lib/glusterd/groups/virt [bz#]

    Now, we’ll start the Gluster service and restart the vdsm service:

    sudo service glusterd start
    sudo service vdsmd restart
  5. Next, we’ll create a mount point for our Gluster brick and set its permissions appropriately. To keep this howto short, I’m going to use a regular directory on our test machine’s file system for the Gluster brick. In a production setup, you’d want your Gluster brick to live on a separate XFS partition.
    sudo mkdir /var/lib/exports/data
    sudo chown 36:36 /var/lib/exports/data [bz#]
  6. Now, we’re ready to re-activate our host, and use it to create the Gluster volume we’ll be using for VM storage. Return to the Administrator Portal, visit the “Hosts” tab, and click “Activate.”

    Then, visit the “Volumes” tab, click “Create Volume,” and give your new volume a name. I’m calling mine “data.” Check the “Optimize for Virt Store” check box, and click the “Add Bricks” button.

    In the resulting dialog box, populate “Brick Directory” with the path we created earlier, “/var/lib/exports/data” and click “Add” to add it to the bricks list. Then, click “OK” to exit the dialog, and “OK” again to return to the “Volumes” tab.

  7. Before we start up our new volume, we need to head back to the command line to apply the “server.allow-insecure” option we added earlier to our volume:
    sudo gluster volume set data server.allow-insecure on
  8. Now, back to the Administrator Portal to start our volume and create a new data domain. Visit the “Volumes” tab, select your newly-created volume, and click “Start.”

    Then, visit the “Storage” tab, hit “New Domain,” give your domain a name, and populate the “Path” field with your machine’s hostname colon volume name:

    mylittlepony.lab:data

    If you’re using RHEL/CentOS and taking the Gluster via POSIXFS storage route I referenced above, you need to populate the “Path” field with your machine’s hostname colon slash volume name instead. Again, this is only if you’re taking the POSIXFS route. With the GlusterFS storage type, that pesky slash won’t prevent the domain from being created, but it’ll cause VM startup to fail mysteriously! Also, in the “VFS Type” field, you’ll need to enter “glusterfs”.

    Click “OK” and wait a few moments for the new storage domain to initialize. Next, click on your detached export domain, choose the “Data Center” tab in the bottom pane, click “Attach,” select “Default” data center, and click “OK.” Perform the same steps with your iso domain.

  9. All Right. You’re back up and running, this time with a GlusterFS Storage Domain. If you ferried any of the VMs you created on the original local storage domain out to your export domain, you can now ferry them back:

    Visit the “Storage” tab, select your export domain, click “VM Import” in the lower pane, select the VM you wish to import, and click “Import.” Click “OK” on the dialog that appears next. If you didn’t remove the VM you’re importing from your local storage domain earlier, you may have to “Import as cloned.”
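
For reference, here is roughly what the edited “volume management” block from step 4 ends up looking like. Treat it as a sketch rather than a verbatim file: the placeholder comment stands in for whatever option lines your GlusterFS release already ships with, and only the last option line is the one you add.

volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    # ... existing options left as shipped ...
    option rpc-auth-allow-insecure on
end-volume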

Next Steps

From here, you can experiment with different types of Gluster volumes for your data domains. For instance, if, after adding a second host to your data center, you want to replicate storage between the two hosts, you’d create a storage brick on both of your hosts, choose the replicated volume type when creating your Gluster volume, create a data domain backed by that volume, and start storing your VMs there.

You can also disable the NFS ISO and Export shares hosted from your AIO machine and re-create them on new Gluster volumes, accessed via Gluster’s built-in NFS server. If you do, make sure to disable your system’s own NFS service, as kernel NFS and Gluster NFS conflict with each other.
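
On a systemd-based host, disabling the kernel NFS server looks something like this (a sketch; the unit name may differ on older distributions):

sudo systemctl stop nfs-server
sudo systemctl disable nfs-server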

by on

oVirt 3.3 Spices Up the Software Defined Datacenter with OpenStack and Gluster Integration

The oVirt 3.3 release may not quite let you manage all the things in the data center, but it’s getting awfully close. Just shy of six months after the oVirt 3.2 release, the team has delivered an update with groundbreaking integration with OpenStack components, GlusterFS, and a number of ways to custom tailor oVirt to your data center’s needs.

What is oVirt?

oVirt is an entirely open source approach to the software defined datacenter. oVirt builds on the industry-standard open source hypervisor, KVM, and delivers a platform that can scale from one system to hundreds of nodes running thousands of instances.

The oVirt project comprises two main components:

  • oVirt Node: A minimal Linux install that includes the KVM hypervisor and is tuned for running massive workloads.
  • oVirt Engine: A full-featured, centralized management portal for managing oVirt Nodes. oVirt Engine gives admins, developers, and users the tools needed to orchestrate their virtual machines across many oVirt Nodes.

See the oVirt Feature Guide for a comprehensive list of oVirt’s features.

What’s New in 3.3?

In just under six months of development, the oVirt team has made some impressive improvements and additions to the platform.

Integration with OpenStack Components

Evaluating or deploying OpenStack in your datacenter? The oVirt team has added integration with Glance and Neutron in 3.3 to enable sharing components between oVirt and OpenStack.

By integrating with Glance, OpenStack’s service for managing disk and server images and snapshots, you’ll be able to leverage your KVM-based disk images between oVirt and OpenStack.

OpenStack Neutron integration allows oVirt to use Neutron as an external network provider. This means you can tap Neutron from oVirt to provide networking capabilities (such as network discovery, provisioning, security groups, etc.) for your oVirt-managed VMs.

oVirt 3.3 also provides integration with Cloud-Init, so oVirt can simplify provisioning of virtual machines with SSH keys, user data, timezone information, and much more.

Gluster Improvements

With the 3.3 release, oVirt gains support for using GlusterFS as a storage domain. This means oVirt can take full advantage of Gluster’s integration with Qemu, providing a performance boost over the previous method of using Gluster’s POSIX exports. Using the native QEMU-GlusterFS integration allows oVirt to bypass the FUSE overhead and access images stored in Gluster as a network block device.

The latest oVirt release also allows admins to use oVirt to manage their Gluster clusters, and oVirt will recognize changes made via Gluster’s command line tools. In short, oVirt has gained tight integration with network-distributed storage, and Gluster users have easy management of their domains with a simple user interface.

Extending oVirt

Out of the proverbial box, oVirt is already a fantastic platform for managing your virtualized data center. However, oVirt can be extended to fit your computing needs precisely.

  • External Tasks give external applications the ability to inject tasks to the oVirt engine via a REST API, and track changes in the oVirt UI.
  • Custom Device Properties allow you to specify custom properties for virtual devices, such as vNICs, when devices may need non-standard settings.
  • Java-SDK is a full SDK for interacting with the oVirt API from external applications.

Getting oVirt 3.3

Ready to take oVirt for a test drive? Head over to the oVirt download page and check out Jason Brooks’ Getting Started with oVirt 3.3 Guide. Have questions? You can find us on IRC or subscribe to the users mailing list to get help from others using oVirt.

by on August 20, 2013

Troubleshooting QEMU-GlusterFS

As described in my previous blog post, QEMU supports talking to GlusterFS using libgfapi, which is a much better way to use GlusterFS to host VM images than accessing GlusterFS volumes through a FUSE mount. However, due to some bugs in GlusterFS 3.4, any invalid specification of a GlusterFS drive on the QEMU command line can result in completely non-obvious error messages from QEMU. The aim of this blog post is to document some of these known error scenarios in QEMU-GlusterFS so that users can figure out what could potentially be wrong in their QEMU-GlusterFS setup.

As of this writing (Aug 2013), I know about two bugs in GlusterFS that cause all this confusion in the failure messages:

  • A bug that results in GlusterFS log messages not reaching stderr because of which no meaningful failure messages are shown from QEMU. This bug has been fixed, and it should eventually make it to some release (probably 3.4.1) of GlusterFS.
  • A bug in a libgfapi routine called glfs_init() that results in returning failure with errno set to 0. QEMU depends on errno here and this leads to unpleasant segmentation faults in QEMU.

Environment

I am using Fedora 19 with distro-provided QEMU and GlusterFS rpms for all the experiments here.

# rpm -qa | grep qemu
qemu-system-x86-1.4.2-5.fc19.x86_64
ipxe-roms-qemu-20130517-2.gitc4bce43.fc19.noarch
qemu-kvm-1.4.2-5.fc19.x86_64
qemu-guest-agent-1.4.2-3.fc19.x86_64
libvirt-daemon-driver-qemu-1.0.5.1-1.fc19.x86_64
qemu-img-1.4.2-5.fc19.x86_64
qemu-common-1.4.2-5.fc19.x86_64

# rpm -qa | grep gluster
glusterfs-3.4.0-8.fc19.x86_64
glusterfs-libs-3.4.0-8.fc19.x86_64
glusterfs-fuse-3.4.0-8.fc19.x86_64
glusterfs-server-3.4.0-8.fc19.x86_64
glusterfs-api-3.4.0-8.fc19.x86_64
glusterfs-debuginfo-3.4.0-8.fc19.x86_64
glusterfs-cli-3.4.0-8.fc19.x86_64

Error scenarios

1. glusterd service not started

On Fedora 19, the glusterd service fails to start during boot, and if you don’t notice it, you can end up using QEMU with GlusterFS while the glusterd service isn’t running. In this scenario, you will encounter the following kind of failure:

[root@localhost ~]# qemu-system-x86_64 --enable-kvm -nographic -smp 2 -m 1024 -drive file=gluster://kvm-gluster/test/F17-qcow2,if=virtio
qemu-system-x86_64: -drive file=gluster://kvm-gluster/test/F17-qcow2,if=virtio: Gluster connection failed for server=kvm-gluster port=0 volume=test image=F17-qcow2 transport=tcp
qemu-system-x86_64: -drive file=gluster://kvm-gluster/test/F17-qcow2,if=virtio: could not open disk image gluster://kvm-gluster/test/F17-qcow2: No data available
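
A quick way to confirm and recover from this state on Fedora 19, where glusterd is managed by systemd:

[root@localhost ~]# systemctl status glusterd
[root@localhost ~]# systemctl start glusterd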

2. GlusterFS volume not started

If QEMU is used to boot a VM image on a GlusterFS volume that is not in the started state, the following kind of error is seen:

[root@localhost ~]# gluster volume status test
Volume test is not started
[root@localhost ~]# qemu-system-x86_64 --enable-kvm -nographic -smp 2 -m 1024 -drive file=gluster://kvm-gluster/test/F17-qcow2,if=virtio
qemu-system-x86_64: -drive file=gluster://kvm-gluster/test/F17-qcow2,if=virtio: Gluster connection failed for server=kvm-gluster port=0 volume=test image=F17-qcow2 transport=tcp
qemu-system-x86_64: -drive file=gluster://kvm-gluster/test/F17-qcow2,if=virtio: could not open disk image gluster://kvm-gluster/test/F17-qcow2: No data available
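
The fix here is simply to start the volume and confirm its state:

[root@localhost ~]# gluster volume start test
[root@localhost ~]# gluster volume status test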

3. QEMU run as non-root user

As of this writing (Aug 2013), GlusterFS doesn’t allow non-root users to access GlusterFS volumes via libgfapi without some manual configuration. This is a very typical scenario for users who don’t run QEMU directly but use it via libvirt and oVirt. In this case, the following kind of error is seen:

[bharata@localhost ~]$ qemu-system-x86_64 --enable-kvm -nographic -smp 2 -m 1024 -drive file=gluster://kvm-gluster/test/F17-qcow2,if=virtio
qemu-system-x86_64: -drive file=gluster://kvm-gluster/test/F17-qcow2,if=virtio: Gluster connection failed for server=kvm-gluster port=0 volume=test image=F17-qcow2 transport=tcp
qemu-system-x86_64: -drive file=gluster://kvm-gluster/test/F17-qcow2,if=virtio: could not open disk image gluster://kvm-gluster/test/F17-qcow2: No data available

As you can see, three different failure scenarios produced similar error messages. This will improve once a GlusterFS release containing the logging fix mentioned above is in use and error logs from GlusterFS start reaching QEMU.
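
For the non-root scenario specifically, the manual settings referred to above are typically the same insecure-port options covered in the oVirt howto earlier on this page. A sketch, assuming a volume named test:

[root@localhost ~]# gluster volume set test server.allow-insecure on
# then add "option rpc-auth-allow-insecure on" under the "volume management" section
# of /etc/glusterfs/glusterd.vol and restart glusterd:
[root@localhost ~]# systemctl restart glusterd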

4. Specifying non-existing GlusterFS volume or server

Specifying invalid or wrong server or volume names can cause nasty failures like this:

[root@localhost ~]# qemu-system-x86_64 --enable-kvm -nographic -smp 2 -m 1024 -drive file=gluster://kvm-gluster/test-xxx/F17-qcow2,if=virtio
qemu-system-x86_64: -drive file=gluster://kvm-gluster/test-xxx/F17-qcow2,if=virtio: Gluster connection failed for server=kvm-gluster port=0 volume=test-xxx image=F17-qcow2 transport=tcp
Segmentation fault (core dumped)

[root@localhost ~]# qemu-system-x86_64 --enable-kvm -nographic -smp 2 -m 1024 -drive file=gluster://kvm-gluster-xxx/test/F17-qcow2,if=virtio
qemu-system-x86_64: -drive file=gluster://kvm-gluster-xxx/test/F17-qcow2,if=virtio: Gluster connection failed for server=kvm-gluster-xxx port=0 volume=test image=F17-qcow2 transport=tcp
Segmentation fault (core dumped)

This should get fixed when the bug in libgfapi is resolved.