Mixed storage in a cluster

Hi, I was wondering if the following configuration is possible with OpenNebula (running 5.10.1).

Current Configuration:

Storage:

  • Silverish: 8x SSD ZFS Pool on SmartOS, exposed via NFS over 10Gbps
  • Peacock: 8x HDD + 2x SSD Log Pool on ZoL, exposed via NFS over 10Gbps

Hypervisors:

  • Gosling: 2x SSD Mirrored Pool on SmartOS, with access to storage and internal 10Gbps networks
  • node1-node5: No local storage. Diskless PXE Boot to Debian 9.11, OpenNebula KVM Hypervisors, mounting IMAGE_DS and SYSTEM_DS with the “shared” driver from Silverish and Peacock via 10Gbps

Services:

  • oned: VM running on Gosling that hosts the OpenNebula oned daemon
  • sunstone: VM running on Gosling that hosts OpenNebula Sunstone
  • SQL/Memcached: VMs running on Gosling

All nodes have eth0-3, where eth0 is the 1Gbps services network, eth1 is the 1Gbps external interface to upstream, eth2 is the 10Gbps storage network, and eth3 is the 10Gbps internal network, where live migration, VXLAN, etc. happen.

All hosts are running Debian 9.11.


Proposed changes:

I would like to add node6, but it only has eth0-1, with no 10Gbps connectivity to the existing cluster (so no access to Silverish or Peacock). However, the node will have local storage available.

Before I build a diskless PXE Debian with a ZFS-root initrd, I would like to know if OpenNebula can support a mixed use case, where node1-5 run “shared” and node6 runs “ssh”: the IMAGE_DS will not be available to node6, requiring the oned instance to transfer the system image via ssh over the 1Gbps admin network to the local ZFS pool on node6.

Otherwise, do I need to create a new cluster with node6 and future hypervisors with local storage?

Thanks.

Hello, I think you should create another cluster for nodes like node6. In that new cluster, put node6 and a new DS with the ssh TM. As for the upstream network, you can attach it to both clusters.
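For illustration, that setup might look something like this on the frontend. The cluster and datastore names here are made up, and the exact flags may differ by version, so treat this as a sketch rather than a recipe:

```shell
# Create a separate cluster for hosts with local (ssh) storage.
# Names ("das-cluster", "local_system") are illustrative only.
onecluster create das-cluster

# Define a SYSTEM datastore that uses the ssh transfer driver.
cat > /tmp/local_system.txt <<'EOF'
NAME   = local_system
TYPE   = SYSTEM_DS
TM_MAD = ssh
EOF
onedatastore create /tmp/local_system.txt --cluster das-cluster

# Register node6 as a KVM host in the new cluster.
onehost create node6 --im kvm --vm kvm --cluster das-cluster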

Does OpenNebula allow you to have the same virtual network in different clusters?

Hi,

Yes. A virtual network is enabled by default on the default cluster (id=0). You can then also enable the VNet in any other clusters you may have.

On the virtual network page (in Sunstone) there is a “Clusters” button to select the clusters where you want to enable your network.
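The same can be done from the CLI. As a sketch, using VNet ID 17 and cluster ID 101 as they appear later in this thread:

```shell
# Enable an existing virtual network in an additional cluster.
# Usage: onecluster addvnet <cluster> <vnet>
onecluster addvnet 101 17

# Check which clusters the VNet now belongs to.
onevnet show 17 | grep -A1 CLUSTERS
```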

hmm okay. I will look into it. Thanks!

I’m getting this error:

[Z0][ReM][E]: Req:3440 UID:0 one.vm.deploy result FAILURE [one.vm.deploy] Image Datastore does not support transfer mode: ssh

Configuration:

node6:

oneadmin@oned:~$ onehost show 5
HOST 5 INFORMATION                                                              
ID                    : 5                   
NAME                  : rigel               
CLUSTER               : fmt01.01.15.DAS     
STATE                 : MONITORED           
IM_MAD                : kvm                 
VM_MAD                : kvm                 
LAST MONITORING TIME  : 12/31 07:50:49      

HOST SHARES                                                                     
RUNNING VMS           : 0                   
MEMORY                                                                          
  TOTAL               : 47.2G               
  TOTAL +/- RESERVED  : 47.2G               
  USED (REAL)         : 449.2M              
  USED (ALLOCATED)    : 0K                  
CPU                                                                             
  TOTAL               : 2400                
  TOTAL +/- RESERVED  : 2400                
  USED (REAL)         : 0                   
  USED (ALLOCATED)    : 0                   

LOCAL SYSTEM DATASTORE #106 CAPACITY                                            
TOTAL:                : 3.5T                
USED:                 : 1M                  
FREE:                 : 3.5T                

MONITORING INFORMATION                                                          
ARCH="x86_64"
CLUSTER_ID="101"
CPUSPEED="2268"
HOSTNAME="rigel"
HYPERVISOR="kvm"
IM_MAD="kvm"
KVM_CPU_MODEL="Westmere"
KVM_CPU_MODELS="486 pentium pentium2 pentium3 pentiumpro coreduo n270 core2duo qemu32 kvm32 cpu64-rhel5 cpu64-rhel6 kvm64 qemu64 Conroe Penryn Nehalem Nehalem-IBRS Westmere Westmere-IBRS SandyBridge SandyBridge-IBRS IvyBridge IvyBridge-IBRS Haswell-noTSX Haswell-noTSX-IBRS Haswell Haswell-IBRS Broadwell-noTSX Broadwell-noTSX-IBRS Broadwell Broadwell-IBRS Skylake-Client Skylake-Client-IBRS athlon phenom Opteron_G1 Opteron_G2 Opteron_G3 Opteron_G4 Opteron_G5"
KVM_MACHINES="pc-i440fx-2.8 pc pc-0.12 pc-i440fx-2.4 pc-1.3 pc-q35-2.7 pc-q35-2.6 xenpv pc-i440fx-1.7 pc-i440fx-1.6 pc-i440fx-2.7 pc-0.11 pc-i440fx-2.3 pc-0.10 pc-1.2 pc-i440fx-2.2 isapc pc-q35-2.5 xenfv pc-0.15 pc-0.14 pc-i440fx-1.5 pc-i440fx-2.6 pc-i440fx-1.4 pc-i440fx-2.5 pc-1.1 pc-i440fx-2.1 pc-q35-2.8 q35 pc-1.0 pc-i440fx-2.0 pc-q35-2.4 pc-0.13"
MODELNAME="Intel(R) Xeon(R) CPU           L5640  @ 2.27GHz"
NAME="rigel"
NETRX="366357117"
NETTX="1339344"
PIN_POLICY="NONE"
RESERVED_CPU=""
RESERVED_MEM=""
VERSION="5.10.1"
VM_MAD="kvm"

NUMA NODES

  ID CORES              USED FREE
   0 -- -- -- -- -- --  0    12
   1 -- -- -- -- -- --  0    12

NUMA MEMORY

 NODE_ID TOTAL    USED_REAL            USED_ALLOCATED       FREE    
       0 23.6G    773M                 0K                   22.8G
       1 23.6G    259.9M               0K                   23.4G

Other KVM Hypervisors with shared storage:

oneadmin@oned:~$ onehost show 0
HOST 0 INFORMATION                                                              
ID                    : 0                   
NAME                  : nodding             
CLUSTER               : fmt01.01.15.SAN     
STATE                 : MONITORED           
IM_MAD                : kvm                 
VM_MAD                : kvm                 
LAST MONITORING TIME  : 12/31 07:53:45      

HOST SHARES                                                                     
RUNNING VMS           : 12                  
MEMORY                                                                          
  TOTAL               : 62.8G               
  TOTAL +/- RESERVED  : 62.8G               
  USED (REAL)         : 20G                 
  USED (ALLOCATED)    : 37.5G               
CPU                                                                             
  TOTAL               : 1600                
  TOTAL +/- RESERVED  : 1600                
  USED (REAL)         : 32                  
  USED (ALLOCATED)    : 2000                

MONITORING INFORMATION                                                          
ARCH="x86_64"
CPUSPEED="2518"
HOSTNAME="nodding"
HYPERVISOR="kvm"
IM_MAD="kvm"
KVM_CPU_MODEL="Broadwell-IBRS"
KVM_CPU_MODELS="486 pentium pentium2 pentium3 pentiumpro coreduo n270 core2duo qemu32 kvm32 cpu64-rhel5 cpu64-rhel6 kvm64 qemu64 Conroe Penryn Nehalem Nehalem-IBRS Westmere Westmere-IBRS SandyBridge SandyBridge-IBRS IvyBridge IvyBridge-IBRS Haswell-noTSX Haswell-noTSX-IBRS Haswell Haswell-IBRS Broadwell-noTSX Broadwell-noTSX-IBRS Broadwell Broadwell-IBRS Skylake-Client Skylake-Client-IBRS athlon phenom Opteron_G1 Opteron_G2 Opteron_G3 Opteron_G4 Opteron_G5"
KVM_MACHINES="pc-i440fx-2.8 pc pc-0.12 pc-i440fx-2.4 pc-1.3 pc-q35-2.7 pc-q35-2.6 xenpv pc-i440fx-1.7 pc-i440fx-1.6 pc-i440fx-2.7 pc-0.11 pc-i440fx-2.3 pc-0.10 pc-1.2 pc-i440fx-2.2 isapc pc-q35-2.5 xenfv pc-0.15 pc-0.14 pc-i440fx-1.5 pc-i440fx-2.6 pc-i440fx-1.4 pc-i440fx-2.5 pc-1.1 pc-i440fx-2.1 pc-q35-2.8 q35 pc-1.0 pc-i440fx-2.0 pc-q35-2.4 pc-0.13"
MODELNAME="Intel(R) Xeon(R) CPU D-1540 @ 2.00GHz"
NETRX="48424776086"
NETTX="63540492241"
RESERVED_CPU=""
RESERVED_MEM=""
VERSION="5.10.1"
VM_MAD="kvm"

NUMA NODES

  ID CORES                    USED FREE
   0 -- -- -- -- -- -- -- --  0    16

NUMA MEMORY

 NODE_ID TOTAL    USED_REAL            USED_ALLOCATED       FREE    
       0 62.8G    20.5G                0K                   42.3G

Shared Image Datastore:

oneadmin@oned:~$ onedatastore show 100
DATASTORE 100 INFORMATION                                                       
ID             : 100                 
NAME           : Peacock_Image       
USER           : oneadmin              
GROUP          : oneadmin            
CLUSTERS       : 0,101               
TYPE           : IMAGE               
DS_MAD         : fs                  
TM_MAD         : shared              
BASE PATH      : /var/lib/one//datastores/100
DISK_TYPE      : FILE                
STATE          : READY               

DATASTORE CAPACITY                                                              
TOTAL:         : 20.7T               
FREE:          : 20.7T               
USED:          : 981M                
LIMIT:         : -                   

PERMISSIONS                                                                     
OWNER          : um-                 
GROUP          : u--                 
OTHER          : ---                 

DATASTORE TEMPLATE                                                              
ALLOW_ORPHANS="NO"
CLONE_TARGET="SYSTEM"
DISK_TYPE="FILE"
DS_MAD="fs"
LN_TARGET="NONE"
TM_MAD="shared"
TYPE="IMAGE_DS"

Local storage datastore on node6:

oneadmin@oned:~$ onedatastore show 106
DATASTORE 106 INFORMATION                                                       
ID             : 106                 
NAME           : ZFS_DAS             
USER           : oneadmin              
GROUP          : oneadmin            
CLUSTERS       : 101                 
TYPE           : SYSTEM              
DS_MAD         : -                   
TM_MAD         : ssh                 
BASE PATH      : /var/lib/one//datastores/106
DISK_TYPE      : FILE                
STATE          : READY               

DATASTORE CAPACITY                                                              
TOTAL:         : -                   
FREE:          : -                   
USED:          : -                   
LIMIT:         : -                   

PERMISSIONS                                                                     
OWNER          : um-                 
GROUP          : u--                 
OTHER          : ---                 

DATASTORE TEMPLATE                                                              
ALLOW_ORPHANS="NO"
DISK_TYPE="FILE"
DS_MIGRATE="YES"
RESTRICTED_DIRS="/"
SAFE_DIRS="/var/tmp"
SHARED="NO"
TM_MAD="ssh"
TYPE="SYSTEM_DS"

Question:

Should I change the TM_MAD for the IMAGE_DS to ssh? Will this affect my current cluster with shared IMAGE_DS and SYSTEM_DS?

I appreciate the advice!

hmmm I don’t understand.

==> /var/log/one/oned.log <==
Tue Dec 31 09:11:23 2019 [Z0][ReM][D]: Req:7680 UID:0 IP:127.0.0.1 one.vm.deploy invoked , 439, 5, false, 106, ""
Tue Dec 31 09:11:23 2019 [Z0][DiM][D]: Deploying VM 439
Tue Dec 31 09:11:23 2019 [Z0][ReM][D]: Req:7680 UID:0 one.vm.deploy result SUCCESS, 439

==> /var/log/one/sched.log <==
Tue Dec 31 09:11:23 2019 [Z0][SCHED][D]: Dispatching VMs to hosts:
        VMID    Priority        Host    System DS
        --------------------------------------------------------------
        439     0               5       106


==> /var/log/one/oned.log <==
Tue Dec 31 09:11:24 2019 [Z0][ReM][D]: Req:2416 UID:0 IP:10.50.3.253 one.vm.info invoked , 439, false
Tue Dec 31 09:11:24 2019 [Z0][ReM][D]: Req:2416 UID:0 one.vm.info result SUCCESS, "<VM><ID>439</ID><UID..."
Tue Dec 31 09:11:24 2019 [Z0][TM][D]: Message received: LOG I 439 Command execution failed (exit code: 1): /var/lib/one/remotes/tm/shared/clone oned:/var/lib/one//datastores/100/f5c54ed442e2d4712a07eefb765a1a81 rigel:/var/lib/one//datastores/106/439/disk.0 439 100

Tue Dec 31 09:11:24 2019 [Z0][TM][D]: Message received: LOG I 439 clone: Cloning /var/lib/one/datastores/100/f5c54ed442e2d4712a07eefb765a1a81 in rigel:/var/lib/one//datastores/106/439/disk.0

Tue Dec 31 09:11:24 2019 [Z0][TM][D]: Message received: LOG E 439 clone: Command "    set -e -o pipefail

Tue Dec 31 09:11:24 2019 [Z0][TM][D]: Message received: LOG I 439 

Tue Dec 31 09:11:24 2019 [Z0][TM][D]: Message received: LOG I 439 cd /var/lib/one/datastores/106/439

Tue Dec 31 09:11:24 2019 [Z0][TM][D]: Message received: LOG I 439 rm -f /var/lib/one/datastores/106/439/disk.0

Tue Dec 31 09:11:24 2019 [Z0][TM][D]: Message received: LOG I 439 cp /var/lib/one/datastores/100/f5c54ed442e2d4712a07eefb765a1a81 /var/lib/one/datastores/106/439/disk.0

Tue Dec 31 09:11:24 2019 [Z0][TM][D]: Message received: LOG I 439 

Tue Dec 31 09:11:24 2019 [Z0][TM][D]: Message received: LOG I 439 qemu-img resize /var/lib/one/datastores/106/439/disk.0 10240M" failed: cp: cannot stat '/var/lib/one/datastores/100/f5c54ed442e2d4712a07eefb765a1a81': No such file or directory

Tue Dec 31 09:11:24 2019 [Z0][TM][D]: Message received: LOG E 439 Error copying oned:/var/lib/one//datastores/100/f5c54ed442e2d4712a07eefb765a1a81 to rigel:/var/lib/one//datastores/106/439/disk.0

Tue Dec 31 09:11:24 2019 [Z0][TM][D]: Message received: TRANSFER FAILURE 439 Error copying oned:/var/lib/one//datastores/100/f5c54ed442e2d4712a07eefb765a1a81 to rigel:/var/lib/one//datastores/106/439/disk.0

Even though the IMAGE_DS has these:

DATASTORE TEMPLATE                                                              
ALLOW_ORPHANS="NO"
CLONE_TARGET="SYSTEM"
CLONE_TARGET_SSH="SYSTEM"
DISK_TYPE="FILE"
DISK_TYPE_SSH="FILE"
DS_MAD="fs"
LN_TARGET="NONE"
LN_TARGET_SSH="SYSTEM"
TM_MAD="shared"
TM_MAD_SYSTEM="ssh"
TYPE="IMAGE_DS"
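For reference, the _SSH attributes shown above can be appended to an existing image datastore with `onedatastore update`. A sketch, assuming datastore ID 100 as in this thread (the exact attribute set needed may vary by OpenNebula version):

```shell
# Append SSH-mode attributes to the existing shared image datastore.
cat > /tmp/ds100_ssh.txt <<'EOF'
TM_MAD_SYSTEM="ssh"
LN_TARGET_SSH="SYSTEM"
CLONE_TARGET_SSH="SYSTEM"
DISK_TYPE_SSH="FILE"
EOF
onedatastore update 100 /tmp/ds100_ssh.txt --append
```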

The way I currently see it, the IMAGE_DS is shared and the local SYSTEM_DS is ssh. The drivers don’t seem to be intelligent enough to use tm/shared/clone when deploying to a shared SYSTEM_DS and tm/ssh/clone when deploying to an ssh SYSTEM_DS.

I’m really not sure about what I missed.

I referenced the docs http://docs.opennebula.org/5.8/deployment/open_cloud_storage_setup/ceph_ds.html#ssh-mode, and here is what I tried:

The shared IMAGE_DS:

oneadmin@oned:~$ onedatastore show 100
DATASTORE 100 INFORMATION                                                       
ID             : 100                 
NAME           : Peacock_Image       
USER           : oneadmin              
GROUP          : oneadmin            
CLUSTERS       : 0,101               
TYPE           : IMAGE               
DS_MAD         : fs                  
TM_MAD         : shared              
BASE PATH      : /var/lib/one//datastores/100
DISK_TYPE      : FILE                
STATE          : READY               

DATASTORE CAPACITY                                                              
TOTAL:         : 20.7T               
FREE:          : 20.7T               
USED:          : 981M                
LIMIT:         : -                   

PERMISSIONS                                                                     
OWNER          : um-                 
GROUP          : u--                 
OTHER          : ---                 

DATASTORE TEMPLATE                                                              
ALLOW_ORPHANS="NO"
CLONE_TARGET="SYSTEM"
CLONE_TARGET_SSH="SYSTEM"
DISK_TYPE="FILE"
DISK_TYPE_SSH="FILE"
DS_MAD="fs"
LN_TARGET="NONE"
LN_TARGET_SSH="SYSTEM"
TM_MAD="shared"
TM_MAD_SYSTEM="ssh"
TYPE="IMAGE_DS"

The ssh SYSTEM_DS:

oneadmin@oned:~$ onedatastore show 106
DATASTORE 106 INFORMATION                                                       
ID             : 106                 
NAME           : ZFS_DAS             
USER           : oneadmin              
GROUP          : oneadmin            
CLUSTERS       : 101               
TYPE           : SYSTEM              
DS_MAD         : -                   
TM_MAD         : ssh                 
BASE PATH      : /var/lib/one//datastores/106
DISK_TYPE      : FILE                
STATE          : READY               

DATASTORE CAPACITY                                                              
TOTAL:         : -                   
FREE:          : -                   
USED:          : -                   
LIMIT:         : -                   

PERMISSIONS                                                                     
OWNER          : um-                 
GROUP          : u--                 
OTHER          : ---                 

DATASTORE TEMPLATE                                                              
ALLOW_ORPHANS="NO"
DISK_TYPE="FILE"
DS_MIGRATE="YES"
RESTRICTED_DIRS="/"
SAFE_DIRS="/var/tmp"
SHARED="NO"
TM_MAD="ssh"
TYPE="SYSTEM_DS"

node6 and ZFS_DAS are in cluster 101, and all the IMAGE_DS from cluster 0 are also in cluster 101.


VM Template:

User template
HYPERVISOR = "kvm"
INPUTS_ORDER = "SET_HOSTNAME"
LOGO = "images/logos/debian.png"
MEMORY_UNIT_COST = "MB"
SCHED_DS_REQUIREMENTS = "ID=\"106\""
SET_HOSTNAME = "test-das"
USER_INPUTS = [
  CPU = "M|list||0.5,1,2,4|1",
  MEMORY = "M|list||512,1024,2048,4096,8192,16384|1024",
  SET_HOSTNAME = "M|text|Hostname for the VM",
  VCPU = "O|list||1,2,4,8|2" ]
Template
AUTOMATIC_DS_REQUIREMENTS = "(\"CLUSTERS/ID\" @> 0 | \"CLUSTERS/ID\" @> 101) & (TM_MAD = \"ssh\")"
AUTOMATIC_NIC_REQUIREMENTS = "(\"CLUSTERS/ID\" @> 0 | \"CLUSTERS/ID\" @> 101)"
AUTOMATIC_REQUIREMENTS = "(CLUSTER_ID = 0 | CLUSTER_ID = 101) & !(PUBLIC_CLOUD = YES) & !(PIN_POLICY = PINNED)"
CONTEXT = [
  DISK_ID = "1",
  ETH0_CONTEXT_FORCE_IPV4 = "",
  ETH0_DNS = "208.67.220.220",
  ETH0_EXTERNAL = "",
  ETH0_GATEWAY = "10.0.165.1",
  ETH0_GATEWAY6 = "",
  ETH0_IP = "10.0.165.13",
  ETH0_IP6 = "",
  ETH0_IP6_PREFIX_LENGTH = "",
  ETH0_IP6_ULA = "",
  ETH0_MAC = "02:00:0a:00:a5:0d",
  ETH0_MASK = "255.255.255.0",
  ETH0_MTU = "1496",
  ETH0_NETWORK = "10.0.165.0",
  ETH0_SEARCH_DOMAIN = "",
  ETH0_VLAN_ID = "165",
  ETH0_VROUTER_IP = "",
  ETH0_VROUTER_IP6 = "",
  ETH0_VROUTER_MANAGEMENT = "",
  NETWORK = "YES",
  SET_HOSTNAME = "test-das",
  SSH_PUBLIC_KEY = "[redacted]",
  TARGET = "hda" ]
CPU = "1"
DISK = [
  ALLOW_ORPHANS = "NO",
  CLONE = "YES",
  CLONE_TARGET = "SYSTEM",
  CLUSTER_ID = "0,101",
  DATASTORE = "Peacock_Image",
  DATASTORE_ID = "100",
  DEV_PREFIX = "vd",
  DISK_ID = "0",
  DISK_SNAPSHOT_TOTAL_SIZE = "0",
  DISK_TYPE = "FILE",
  DRIVER = "qcow2",
  IMAGE = "Debian 9",
  IMAGE_ID = "57",
  IMAGE_STATE = "2",
  LN_TARGET = "SYSTEM",
  ORIGINAL_SIZE = "2048",
  READONLY = "NO",
  SAVE = "NO",
  SIZE = "10240",
  SOURCE = "/var/lib/one//datastores/100/f5c54ed442e2d4712a07eefb765a1a81",
  TARGET = "vda",
  TM_MAD = "shared",
  TM_MAD_SYSTEM = "ssh",
  TYPE = "FILE" ]
FEATURES = [
  ACPI = "yes",
  APIC = "yes",
  GUEST_AGENT = "no",
  HYPERV = "yes",
  LOCALTIME = "no",
  PAE = "yes" ]
GRAPHICS = [
  LISTEN = "0.0.0.0",
  TYPE = "VNC" ]
INPUT = [
  BUS = "usb",
  TYPE = "tablet" ]
MEMORY = "1024"
NIC = [
  AR_ID = "0",
  BRIDGE = "protected0",
  BRIDGE_TYPE = "linux",
  CLUSTER_ID = "0,101",
  IP = "10.0.165.13",
  MAC = "02:00:0a:00:a5:0d",
  MODEL = "virtio",
  NAME = "NIC0",
  NETWORK = "Protected",
  NETWORK_ID = "17",
  NIC_ID = "0",
  PHYDEV = "eth1",
  SECURITY_GROUPS = "0",
  TARGET = "one-447-0",
  VLAN_ID = "165",
  VN_MAD = "802.1Q" ]
NIC_DEFAULT = [
  MODEL = "virtio" ]
OS = [
  ARCH = "x86_64",
  BOOT = "" ]
SECURITY_GROUP_RULE = [
  PROTOCOL = "ALL",
  RULE_TYPE = "OUTBOUND",
  SECURITY_GROUP_ID = "0",
  SECURITY_GROUP_NAME = "default" ]
SECURITY_GROUP_RULE = [
  PROTOCOL = "ALL",
  RULE_TYPE = "INBOUND",
  SECURITY_GROUP_ID = "0",
  SECURITY_GROUP_NAME = "default" ]
TEMPLATE_ID = "24"
TM_MAD_SYSTEM = "ssh"
VCPU = "2"
VMID = "447"

The VM template “Debian 9 - DAS” has TM_MAD_SYSTEM = “ssh”, and I selected only ZFS_DAS on the create-VM screen. I’m still getting the error where the driver tries to use the shared driver instead of the ssh driver.

It looks like this PR addresses this cross-type DS situation, but it doesn’t seem to include shared/clone.ssh support. Was this by design?

Sorry for pinging you @jan.orel. It seems you were behind the PR. What side effects might arise from adding shared/clone.ssh?

shared/clone.ssh is missing because it would be more or less the same as shared/clone.

Anyway, in such cases, ONE always falls back to the action script without the suffix (shared/clone in this case).
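That fallback can be sketched as a tiny shell helper. This mimics the behavior described above; the function name and layout are hypothetical, not the actual OpenNebula driver code:

```shell
# Prefer <action>.<system_tm> if an executable with that name exists
# in the driver directory, otherwise fall back to the plain <action>.
resolve_tm_action() {
    dir=$1; action=$2; system_tm=$3
    if [ -x "$dir/$action.$system_tm" ]; then
        printf '%s\n' "$dir/$action.$system_tm"
    else
        printf '%s\n' "$dir/$action"
    fi
}

# Demo against a throwaway directory standing in for tm/shared:
demo=$(mktemp -d)
touch "$demo/clone" && chmod +x "$demo/clone"          # only the plain script
resolve_tm_action "$demo" clone ssh                    # falls back to .../clone
touch "$demo/clone.ssh" && chmod +x "$demo/clone.ssh"
resolve_tm_action "$demo" clone ssh                    # now picks .../clone.ssh
rm -rf "$demo"
```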

The configuration looks good, but it seems the driver cannot access the image on the hypervisor. Even though you use the ssh TM, you still need the image datastore to be accessible on the hosts. Could that be the case?

According to the documentation, it assumes all hosts have access to the shared image datastore (in which case shared/clone makes sense). However, in my configuration this proposed new host doesn’t have access to the shared storage (and that is exactly the point), so it relies on the frontend/oned instance to scp the image over (which is what I would expect from something like shared/clone.ssh).

I hope that somewhat makes sense to you.