OpenNebula & DFS like GlusterFS, Sheepdog & Ceph

Hello,

we at ungleich.ch are testing OpenNebula with Ceph, Gluster and Sheepdog backends. So far we have collected various results,
roughly summarized as:

  • Very bad performance (<30 MiB/s write speed) and VM kernel panics on Ceph
  • Good to great performance with GlusterFS 3.4.2 and 3.6.2 on Ubuntu 14.04 and 3.6.2 on CentOS 7: >50 MiB/s in the VM
  • Bad performance with Sheepdog (~11 MiB/s in the VM), though based on only a small amount of test data from a short test
  • We mostly looked at the Sheepdog integration status - as of 2015-02-15 there seems to be some cleanup required before things work smoothly
  • We think that in theory Sheepdog would be the best fit for a VM cluster, as it is simple and designed solely for VM images
  • We were running Sheepdog in a qemu-only cluster before with great performance
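For reference, in-VM numbers like these can be reproduced with a simple sequential-write test inside the guest; a minimal sketch (file names and sizes here are arbitrary, not what we actually used):

# quick-and-dirty sequential write with O_DIRECT, ~2 GiB
dd if=/dev/zero of=/var/tmp/writetest.img bs=1M count=2048 oflag=direct

# or a slightly more controlled run with fio
fio --name=seqwrite --rw=write --bs=1M --size=2G --direct=1 --filename=/var/tmp/fio.test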

We are interested in your experiences with various filesystems and wanted to share ours here as well.

Try LizardFS http://lizardfs.com/

We are using it on our OpenNebula-powered platform (NodeWeaver). You can simply mount /var/lib/one on all nodes to the LizardFS root and you're good to go, just as if it were an NFS share, but far more scalable.
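For anyone who wants to try it, the mount itself is a one-liner per node; a minimal sketch (assuming your master host is reachable as mfsmaster, and using the MooseFS-derived mfsmount client that LizardFS ships):

# mount the LizardFS root onto OpenNebula's working directory (repeat on every node)
mfsmount /var/lib/one -H mfsmaster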

Very reliable, and we achieved great performance with a bit of SSD caching and some tuning; some numbers are below.


A great advantage of LizardFS is that you can use it with only minor modifications to the shared-filesystem drivers and host everything ONE-related in a single, reliable datastore (the TM driver for MooseFS and LizardFS is here: http://wiki.opennebula.org/ecosystem:moosefs ).
On a two-node setup (2 rotational devices + 2 EnhanceIO SSD caches) we got 11K write IOPS, and we easily reach 90 MB/s within the VMs.
Another advantage is the copy-on-write snapshot capability, which greatly extends what you can do in OpenNebula with thinly provisioned images, without performance problems.
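As an illustration of that point, a snapshot of an image inside the mount is a single command (a sketch using the MooseFS-derived tool names; the paths are placeholders, not from our setup):

# lazy, copy-on-write copy of a VM image
mfsmakesnapshot /var/lib/one/datastores/1/source-image.qcow2 /var/lib/one/datastores/1/source-image-snap.qcow2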


Hello Carlo,
Short question: did you evaluate any other caching systems (e.g. flashcache, bcache, dm-cache)?
I am currently testing different setups - right now bcache and LSI CacheCade - and bcache looks promising (rough setup sketch below). Perhaps you made similar tests and can share your experiences.
I also tried Gluster (which was very good from a performance point of view, especially on 10 Gbit networks, but the usability is not yet very nice).
And at the last ONE conf everybody seemed happy with Ceph, but that requires a more expensive hardware footprint.
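A rough sketch of the bcache part of such a test setup (device names are placeholders; make-bcache comes from bcache-tools):

# SSD as cache device, HDD as backing device
make-bcache -C /dev/nvme0n1
make-bcache -B /dev/sdb

# attach the backing device to the cache set (UUID from 'bcache-super-show /dev/nvme0n1')
echo <cset-uuid> > /sys/block/bcache0/bcache/attach

# the filesystem (e.g. a Gluster brick) then goes on top of /dev/bcache0
mkfs.xfs /dev/bcache0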

Best, Michael

Sheepdog was horrendously unstable when I last used it. Sometimes, simple storage host reboots would destroy all data in the cluster.

I’ve been using Ceph with KVM on Ubuntu for a few years now (since the Argonaut release) and have had very few problems with it. I’ve only recently added OpenNebula to the mix, but it dropped right in with no changes needed to my Ceph config.

The biggest drawback is that Ceph is a network hog. Make sure you have lots and lots of bandwidth for it; if you don’t, you may start seeing enough I/O lag on your VMs to cause problems.
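One thing that helps (a sketch, not taken from my actual config; the subnets are placeholders) is giving Ceph a dedicated replication network so OSD recovery traffic doesn’t compete with client I/O. The relevant ceph.conf options are:

[global]
    public network  = 192.168.1.0/24    # client / VM traffic
    cluster network = 192.168.2.0/24    # OSD replication and recovery traffic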

I’ve not used gluster so I can’t speak to it.

We are using bcache-backed GlusterFS bricks. Don’t expect miracles in benchmarks, but there is a measurable increase in IOPS. Keep in mind, though, that for KVM at least there are quite a few tunables that should be addressed before looking at SSD caching as a performance enhancer.
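For example, the libvirt disk definition is where several of those tunables live; a sketch, not a universal recommendation (the image path is a placeholder):

<disk type='file' device='disk'>
  <!-- cache='none' with io='native' and virtio is a common starting point on shared storage -->
  <driver name='qemu' type='qcow2' cache='none' io='native'/>
  <source file='/var/lib/one/datastores/0/example-disk.qcow2'/>
  <target dev='vda' bus='virtio'/>
</disk>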

I’m also not a fan of the qemu-glusterfs integration; it doesn’t feel complete yet and there is still some work to be done. Keeping the shared-filesystem layer separate from the hypervisor also makes support easier. We are using a glusterfs-fuse backed shared filesystem and it’s working great so far with qemu images.

We are using Sheepdog for a non-production cluster. It is much more stable in version 0.9.1. You also need QEMU version 1.7 or higher for auto-failover support.
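If you want to try it, QEMU talks to Sheepdog directly; a minimal sketch (the VDI name is a placeholder, and 'dog' is the CLI that replaced 'collie'):

# create a 10G VDI on the Sheepdog cluster and list it
qemu-img create sheepdog:one-test-disk 10G
dog vdi list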

We are using GlusterFS with shared storage and the performance is good. The bad point is that sometimes a node hangs and must be rebooted (and the documentation is quite poor, IMHO).

Hey,

great thread! I am just looking into the same issue, and so far I plan to try GlusterFS first.
@nico_opennebula_org : could you describe the storage hardware you use?
I am running on IBM PureFlex nodes and a Storwize storage solution attached via 10 Gbit FCoE.

Cheers,
Christian

Hey Christian,

we are using a very simple architecture: each of our clusters consists of 2 nodes. Each node has two network cards, one connected to the public network and one connected directly to the other host.

The hosts use only the replicated volume type, and we build n of these Gluster clusters.
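For reference, such a two-node replicated volume boils down to something like this (host names and brick paths are placeholders, not our actual ones):

gluster peer probe node2
gluster volume create one replica 2 node1:/data/gluster/brick node2:/data/gluster/brick
gluster volume start one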

Hardware-wise they are mid-range servers (16-128 GiB RAM, 8-32 cores, 1-12 TB of storage).

Cheers,

Nico

I finally managed to set up Gluster in my environment. I use three IBM PureFlex nodes, each attached via a 10 Gbit FCoE uplink to its own volume managed by an IBM Storwize V7000. These volumes are served by a GlusterFS server on each node and accessed by OpenNebula from each node respectively. The volume has a combined size of 6 TB and runs in standard distributed mode (that is: no replication here).

Results from first tests:

  • Overall performance is pretty good (will conduct benchmarks later)
  • Deploying 40 VMs is very quick (<4 minutes)
  • Of 40 simultaneous deployments, 6-10 VMs fail to deploy and have to be re-deployed

Does anybody else see failing deployments with GlusterFS as well?

What options are you using? We’ve had no issues with GlusterFS 3.4.3 (our current version) and we have tested massive loads (50-60 Gbps) across our distributed-replicate clusters. The options on the VMs and on the storage matter. I would hold back on the bleeding edge (GlusterFS 3.6.x) if possible. Are you sure your deployment issues are storage related?

Hi,

Thanks for your reply!
I am using bleeding edge gluster as it seems:

glusterfs.x86_64               3.6.2-1.el7   @glusterfs-epel
glusterfs-api.x86_64           3.6.2-1.el7   @glusterfs-epel
glusterfs-cli.x86_64           3.6.2-1.el7   @glusterfs-epel
glusterfs-fuse.x86_64          3.6.2-1.el7   @glusterfs-epel
glusterfs-libs.x86_64          3.6.2-1.el7   @glusterfs-epel
glusterfs-server.x86_64        3.6.2-1.el7   @glusterfs-epel

And these are my volume settings, including the options from the ‘virt’ group, which the OpenNebula documentation advises you to set (the command that applies them is sketched after the listing):

Volume Name: one
Type: Distribute
Volume ID: e64309f5-88d8-4d55-9272-16611acebe25
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: molokai:/data/gluster/brick
Brick2: lanai:/data/gluster/brick
Brick3: maui:/data/gluster/brick
Options Reconfigured:
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: on
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
storage.owner-gid: 9869
storage.owner-uid: 9869
server.allow-insecure: on
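(Most of the options above are what the predefined ‘virt’ group applies, plus the owner and allow-insecure settings; a sketch of how they are set, assuming the volume name ‘one’ from above:)

gluster volume set one group virt
gluster volume set one storage.owner-uid 9869
gluster volume set one storage.owner-gid 9869
gluster volume set one server.allow-insecure on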

As for the VM options: what do you mean? I use the qcow2 driver on images that are in qcow2 format. What other options could I set on the VMs regarding GlusterFS?

Hi Shankhadeep Shome,

I’m just wondering whether we are exposing all the parameters needed for tuning VM performance. Are you currently relying on RAW? Could we benefit from exposing some of these parameters in Sunstone as advanced options?

Cheers

The exact error message is this:

Wed Mar 11 12:35:57 2015 [Z0][TM][I]: Command execution fail: /var/lib/one/remotes/tm/shared/clone molokai:/var/lib/one/datastores/114/3fbc702cdff9fdced57c7b95c33b2459 lanai:/var/lib/one//datastores/120/169/disk.0 169 114
Wed Mar 11 12:35:57 2015 [Z0][TM][I]: clone: Cloning /var/lib/one/datastores/114/3fbc702cdff9fdced57c7b95c33b2459 in lanai:/var/lib/one//datastores/120/169/disk.0
Wed Mar 11 12:35:57 2015 [Z0][TM][E]: clone: Command "cd /var/lib/one/datastores/120/169; cp /var/lib/one/datastores/114/3fbc702cdff9fdced57c7b95c33b2459 /var/lib/one/datastores/120/169/disk.0" failed: Warning: Permanently added 'lanai,141.22.29.23' (ECDSA) to the list of known hosts.
Wed Mar 11 12:35:57 2015 [Z0][TM][I]: sh: line 3: cd: /var/lib/one/datastores/120/169: No such file or directory
Wed Mar 11 12:35:57 2015 [Z0][TM][I]: cp: cannot create regular file '/var/lib/one/datastores/120/169/disk.0': No such file or directory
Wed Mar 11 12:35:57 2015 [Z0][TM][E]: Error copying molokai:/var/lib/one/datastores/114/3fbc702cdff9fdced57c7b95c33b2459 to lanai:/var/lib/one//datastores/120/169/disk.0
Wed Mar 11 12:35:57 2015 [Z0][TM][I]: ExitCode: 1
Wed Mar 11 12:35:57 2015 [Z0][TM][E]: Error executing image transfer script: Error copying molokai:/var/lib/one/datastores/114/3fbc702cdff9fdced57c7b95c33b2459 to lanai:/var/lib/one//datastores/120/169/disk.0

So the problem results from a directory not being created. I checked, and the directory really does not get created.
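(A quick sanity check, using the host and datastore IDs from the log above, would be whether the system datastore path exists and is actually on the Gluster mount on the target host:)

ssh lanai 'df -h /var/lib/one/datastores/120 && ls -ld /var/lib/one/datastores/120'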

These are the current storage/VM settings; I am just using qcow2 over glusterfs-fuse.

root@XXXXXXX:~# gluster volume info
 
Volume Name: PRODVMCLUSTERSTORE1
Type: Distributed-Replicate
Volume ID: 4cf1fbfd-caf7-44eb-a9fd-081dbd69a979
Status: Started
Number of Bricks: 8 x 2 = 16
Transport-type: tcp
Bricks:
Brick1: sgwa-glusterfs:/GLUSTERBRICK1/GLUSTERBRICK1
Brick2: sgwc-glusterfs:/GLUSTERBRICK1/GLUSTERBRICK1
Brick3: sgwb-glusterfs:/GLUSTERBRICK1/GLUSTERBRICK1
Brick4: sgwd-glusterfs:/GLUSTERBRICK1/GLUSTERBRICK1
Brick5: sgwa-glusterfs:/GLUSTERBRICK2/GLUSTERBRICK2
Brick6: sgwc-glusterfs:/GLUSTERBRICK2/GLUSTERBRICK2
Brick7: sgwb-glusterfs:/GLUSTERBRICK2/GLUSTERBRICK2
Brick8: sgwd-glusterfs:/GLUSTERBRICK2/GLUSTERBRICK2
Brick9: sgwa-glusterfs:/GLUSTERBRICK3/GLUSTERBRICK3
Brick10: sgwc-glusterfs:/GLUSTERBRICK3/GLUSTERBRICK3
Brick11: sgwb-glusterfs:/GLUSTERBRICK3/GLUSTERBRICK3
Brick12: sgwd-glusterfs:/GLUSTERBRICK3/GLUSTERBRICK3
Brick13: sgwa-glusterfs:/GLUSTERBRICK4/GLUSTERBRICK4
Brick14: sgwc-glusterfs:/GLUSTERBRICK4/GLUSTERBRICK4
Brick15: sgwb-glusterfs:/GLUSTERBRICK4/GLUSTERBRICK4
Brick16: sgwd-glusterfs:/GLUSTERBRICK4/GLUSTERBRICK4
Options Reconfigured:
server.allow-insecure: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
storage.owner-uid: 2500
storage.owner-gid: 2500
 
root@XXXXXX:~# cat /etc/fstab
# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
/dev/mapper/SYSTEM-STORAGEOS /               ext4    errors=remount-ro 0       1
/dev/mapper/SYSTEM-SWAP none            swap    sw              0       0
/dev/FSSTORAGEVOL/GLUSTERBRICK1  /GLUSTERBRICK1  xfs     inode64         0       0
/dev/FSSTORAGEVOL/GLUSTERBRICK2  /GLUSTERBRICK2  xfs     inode64         0       0
/dev/FSSTORAGEVOL/GLUSTERBRICK3  /GLUSTERBRICK3  xfs     inode64         0       0
/dev/FSSTORAGEVOL/GLUSTERBRICK4  /GLUSTERBRICK4  xfs     inode64         0       0
 
root@XXXXXX:~# lvs | grep GLUSTER
  GLUSTERBRICK1 FSSTORAGEVOL -wi-ao---   4.00t
  GLUSTERBRICK2 FSSTORAGEVOL -wi-ao---   4.00t
  GLUSTERBRICK3 FSSTORAGEVOL -wi-ao---   4.00t
  GLUSTERBRICK4 FSSTORAGEVOL -wi-ao---   4.00t

Client Side Mount Settings

mount -t glusterfs -o backupvolfile-server=sgwd-glusterfs,log-level=WARNING,log-file=/var/log/gluster.log sgwa-glusterfs:/PRODVMCLUSTERSTORE1 /var/lib/one/datastores/100

VM Settings

<disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/var/lib/one/datastores/100/firs_storage_server2.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</disk>