How to deal with multiple locations?

I have a bunch of hosts in a local DC and a few hosts with a remote hosting company.

root@one-1:~# onecluster list
   ID NAME                      HOSTS VNETS DATASTORES
    0 default                       0     0          0
  100 FFGT TNG                      6     2          7
  101 UU GUT                        2     2          3
  102 UU FKS                        2     2          3
  103 FFGT Legacy                   0     2          3
  104 FKS-3                         1     1          2

FKS-3 is one host at the remote hoster; trying to instantiate a VM there, I end up with: [TemplateInstantiate] Error allocating a new virtual machine template. Incompatible clusters in NIC. Network for NIC 0 is not in the same cluster as the one used by other VM elements (cluster 100)

Ok, yes, the Datastore is 103, which relates to cluster 100:

root@one-1:~# onetemplate show 84 | grep -B4 -A2 100
DESCRIPTION="Start a new Ubuntu LTS (16.04) VM on HV-local storage"
DISK=[
  CLONE="NO",
  CLONE_TARGET="SYSTEM",
  CLUSTER_ID="100",
  DATASTORE="local_files",
  DATASTORE_ID="103",

Thing is, I don't seem to be able to clone an image to the remote location; I always get the message that the remote (ssh) datastore does not have enough space.

root@one-1:~# onedatastore list
  ID NAME                SIZE AVAIL CLUSTERS     IMAGES TYPE DS      TM      STAT
   0 system            203.6G 15%   100,101,102,      0 sys  -       shared  on  
   1 default           203.6G 15%   100,101,102,      4 img  fs      shared  on  
   2 files             203.6G 15%   100,101,102,      0 fil  fs      shared  on  
 102 local_system           - -     100               0 sys  -       ssh     on  
 103 local_files       393.6G 78%   100              19 img  fs      ssh     on  
 104 lfs_img               3T 66%   100               5 img  fs      shared  on  
 105 lfs_sys               3T 66%   100               0 sys  -       shared  on  
 106 local_img_fks         0M -     104               0 img  fs      ssh     on  
 107 local_sys_fks          - -     104               0 sys  -       ssh     on  

The local oned-VM does not have the remote filesystems mounted; I assume that’s why it reads “0M” and therefore basically “no space left on device”?
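That would match how fs-based datastores are monitored. As a sketch (not the actual OpenNebula monitor script, which lives under the datastore/TM remotes): capacity of an image DS is essentially "df on the datastore directory, as seen from the frontend", so an unmounted remote path reports the frontend's own filesystem, or nothing. `DS_DIR` below is a stand-in:

```shell
# Sketch, assuming GNU df: report the capacity of a datastore directory the
# way the frontend-side monitoring roughly does. If the remote filesystem
# is not mounted at this path, you get the frontend's local fs instead.
DS_DIR=${DS_DIR:-/var/lib/one/datastores/106}
[ -d "$DS_DIR" ] || DS_DIR=/tmp   # fall back so the sketch runs anywhere
read -r total used free <<EOF
$(df -BM -P "$DS_DIR" | awk 'NR==2 {gsub(/M/,""); print $2, $3, $4}')
EOF
echo "TOTAL_MB=$total USED_MB=$used FREE_MB=$free"
```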

root@one-1:~# for i in `cd /var/lib/one/datastores/ ; echo *` ; do echo -n "`printf %-4d $i `" ; df -h /var/lib/one/datastores/$i | tail -1 ; done
0   nfs-int:/nfs    204G  172G   30G  86% /nfs
1   nfs-int:/nfs    204G  172G   30G  86% /nfs
102 tmpnfs-int:/tmp-nfs  394G   66G  309G  18% /tmp-nfs
103 tmpnfs-int:/tmp-nfs  394G   66G  309G  18% /tmp-nfs
104 mfs#lfs:9421    3.0T  1.1T  2.0T  34% /lfs
105 mfs#lfs:9421    3.0T  1.1T  2.0T  34% /lfs
2   nfs-int:/nfs    204G  172G   30G  86% /nfs
root@FKS-3:~# for i in `cd /var/lib/one/datastores/ ; echo *` ; do echo -n "`printf %-4d $i `" ; df -h /var/lib/one/datastores/$i | tail -1 ; done
106 /dev/md2         50G  3.9G   43G   9% /
107 /dev/mapper/vg0-pg--gis  1.8T  509G  1.3T  29% /data

A few questions:

  1. Why does the storage location of the image matter with regard to networking at all?!

  2. How do I copy/clone a locally existing image to a remote ssh-based datastore?

  3. What did I miss/misunderstand regarding OpenNebula concepts? (E.g., why would the oned/Sunstone host need "mounted access" to filesystems defined to be accessible via ssh only?)

Not sure if I understood completely, but the concept is: the Image DS has to be accessible to the frontend (only the frontend, if we talk about the ssh TM), and the System DS has to be accessible to the node (only the node, if ssh). Whenever you instantiate a template, the frontend copies the image via ssh from the frontend-local Image DS to the node-local System DS.
Otherwise it would not be possible to instantiate the image/template on another host.
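A minimal sketch of that flow (hostnames, VM ID, and the disk name are hypothetical here; the real work is done by the TM driver scripts under /var/lib/one/remotes/tm/ssh/ on the frontend):

```shell
# Conceptual sketch of an ssh-TM instantiation: the frontend pushes the
# image from its local image DS into the per-VM directory of the
# node-local system DS. The 'echo' keeps the sketch harmless.
FRONTEND_IMAGE=/var/lib/one/datastores/103/2c36a94863b32e88ff62b15b9ba30ea7
NODE=fks-3.example.org                     # hypothetical node name
VM_DIR=/var/lib/one/datastores/107/69      # hypothetical system-DS VM dir

echo "ssh $NODE mkdir -p $VM_DIR"
echo "scp $FRONTEND_IMAGE $NODE:$VM_DIR/disk.0"
```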

Depending on your needs and the bandwidth between the two locations, you could alternatively create a second OpenNebula installation and create a federation between the two, so you can control it from a central point.

Glad, that’s how I understood it and how it’s set up. Thing here is: I create (via Sunstone if that makes a difference?) a Template for VMs to be deployed on Cluster 104 (which has DS 106 for img and 107 for sys, both with ssh TM) with an Image that resides in DS 103 (ssh TM, type img). DS 106 and 107 belong to Cluster 104 only, DS 103 to Cluster 100 only. (oned-VM runs outside OpenNebula as a Wild VM on a host of Cluster 100, in case that matters.)

All fine, until I try to instantiate a VM with that Template, see below.

I think the message Network for NIC 0 is not in the same cluster as the one used by other VM elements (cluster 100) is wrong; the real issue to me seems to be that the VM-Template generated relates to Cluster 100 for the disk image, but it needs to be stored in a DS of Cluster 104?

My expectation was that, with SCHED_REQUIREMENTS = "CLUSTER_ID=\"104\"", a matching DS of that Cluster would be chosen when I create a VM from the Template, but that seems not to have happened?

The Template for Cluster-104-VM has:

…
DISK=[
  DEV_PREFIX="vd",
  DRIVER="qcow2",
  IMAGE="ffgt_local_fks3",
  IMAGE_UNAME="oneadmin" ]
…
SCHED_DS_RANK="FREE_MB"
SCHED_RANK="-RUNNING_VMS"
SCHED_REQUIREMENTS="CLUSTER_ID=\"104\""
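For what it's worth, SCHED_REQUIREMENTS only constrains the host choice; the scheduler has a separate DS-side counterpart, SCHED_DS_REQUIREMENTS (see the OpenNebula scheduling docs). A hedged sketch of adding it, written to a temp file that could be fed to `onetemplate update --append` on the frontend; whether it changes the image-DS pinning seen above I can't say:

```shell
# Sketch: append a datastore scheduling constraint so the system DS is
# also picked from cluster 104 (IDs from this thread).
cat > /tmp/sched.tpl <<'EOF'
SCHED_REQUIREMENTS="CLUSTER_ID=\"104\""
SCHED_DS_REQUIREMENTS="\"CLUSTERS/ID\" @> 104"
EOF
# onetemplate update --append 84 /tmp/sched.tpl   # run on the frontend
cat /tmp/sched.tpl
```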

The template created when using the above template with "Instantiate as persistent" has:

DISK = [
  CLONE = "NO",
  CLONE_TARGET = "SYSTEM",
  CLUSTER_ID = "100",
  DATASTORE = "local_files",
  DATASTORE_ID = "103",
  DEV_PREFIX = "vd",
  DISK_ID = "0",
  DISK_SNAPSHOT_TOTAL_SIZE = "0",
  DISK_TYPE = "FILE",
  DRIVER = "qcow2",
  IMAGE_ID = "82",
  IMAGE_STATE = "1",
  LN_TARGET = "SYSTEM",
  PERSISTENT = "YES",
  READONLY = "NO",
  SAVE = "YES",
  SIZE = "20480",
  SOURCE = "/var/lib/one//datastores/103/2c36a94863b32e88ff62b15b9ba30ea7",
  TM_MAD = "ssh",
  TYPE = "FILE" ]

So, somehow SCHED_REQUIREMENTS="CLUSTER_ID=\"104\"" wasn’t correctly honoured here? Or what’s the issue?

Maybe later; I’d prefer to keep complexity low for now :wink:

You cannot use an image that belongs to a DS of another cluster.

I see. Still, the error message pointing to the NETWORK being wrong is at least misleading, as it’s the IMAGE which is in an incompatible DS, no?

Will add the frontend’s Image DS to the remote cluster and retry, thanks!

(Just to be sure, what I want to achieve is: having images on the frontend in my local DC and deploying those via ssh to (host-/DC-)local Datastores on hosts in remote DCs. For that, the Image DS needs to be part of all Clusters, but the System DS needs to be part of the appropriate remote Cluster?)

True, the message is misleading.

Exactly. The image DS has to be usable by all clusters you want to use the images in, and there has to be a compatible system DS in the corresponding cluster.
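In CLI terms that advice might look like this (cluster/DS IDs taken from this thread; the `echo` keeps the sketch harmless, drop it to actually run on the frontend):

```shell
# Sketch: make the frontend's image DS (103) a member of every remote
# cluster, while each cluster keeps its own ssh system DS.
for cluster in 104 105 106; do
  echo onecluster adddatastore "$cluster" 103
done
```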

Thanks a lot, image creation is now working (I think) as expected, but I cannot get VMs deployed.

/var/log/one/sched.log reads:

Mon Mar 13 13:29:08 2017 [Z0][VM][D]: Found 1 pending/rescheduling VMs.
Mon Mar 13 13:29:08 2017 [Z0][HOST][D]: Discovered 11 enabled hosts.
Mon Mar 13 13:29:08 2017 [Z0][SCHED][D]: Match-making results for VM 69:
	Cannot schedule VM, there is no suitable host.

Mon Mar 13 13:29:08 2017 [Z0][SCHED][D]: Dispatching VMs to hosts:
	VMID	Host	System DS
	-------------------------

The VMs template has:

root@one-1:~# onevm show 69
VIRTUAL MACHINE 69 INFORMATION                                                  
ID                  : 69                  
NAME                : fks-test-1          
USER                : oneadmin            
GROUP               : oneadmin            
STATE               : PENDING             
LCM_STATE           : LCM_INIT            
RESCHED             : No                  
START TIME          : 03/13 11:05:17      
END TIME            : -                   
DEPLOY ID           : -                   
[…]
VM DISKS                                                                        
 ID DATASTORE  TARGET IMAGE                               SIZE      TYPE SAVE
  0 lfs_img    vda    fks-test-1-disk-0                   -/20G     file  YES
  1 -          hda    CONTEXT                             -/-       -       -

VM NICS                                                                         
 ID NETWORK              BRIDGE       IP              MAC               PCI_ID  
  0 BRFKS                brfks        100.64.120.242  02:00:65:20:78:f2
                                      fe80::400:65ff:fe20:78f2
                                      2001:db8:1703:0:400:65ff:fe20:78f2
[…]
USER TEMPLATE                                                                   
DESCRIPTION="VM in FKS"
HYPERVISOR="kvm"
LOGO="images/logos/ubuntu.png"
SCHED_MESSAGE="Mon Mar 13 13:30:08 2017 : Cannot dispatch VM to any Host. Possible reasons: Not enough capacity in Host or System DS, or dispatch limit reached"
SCHED_REQUIREMENTS="CLUSTER_ID=\"104\" | CLUSTER_ID=\"105\" | CLUSTER_ID=\"106\""
[…]

Clusters 104, 105, 106 are single-host "clusters" due to lack of shared storage; each Cluster has its own system DS (TM type ssh). Why doesn't the scheduler find a suitable Host, and how can I debug this further?

hm.

what is your scheduler config in the vm template?

Sorry, do you mean this (from onevm show 69, part after VIRTUAL MACHINE TEMPLATE)?

AUTOMATIC_DS_REQUIREMENTS="\"CLUSTERS/ID\" @> 104"
AUTOMATIC_REQUIREMENTS="(CLUSTER_ID = 104) & !(PUBLIC_CLOUD = YES)"
CLONING_TEMPLATE_ID="88"
CONTEXT=[
  DISK_ID="1",
  ETH0_CONTEXT_FORCE_IPV4="",
  ETH0_DNS="",
  ETH0_GATEWAY="100.64.120.241",
  ETH0_GATEWAY6="fd42:ffee:ff12:6464::6504",
  ETH0_IP="100.64.120.242",
  ETH0_IP6="2a06:e881:1703:0:400:65ff:fe20:78f2",
  ETH0_IP6_ULA="",
  ETH0_MAC="02:00:65:20:78:f2",
  ETH0_MASK="255.255.255.240",
  ETH0_MTU="",
  ETH0_NETWORK="100.64.120.240",
  ETH0_SEARCH_DOMAIN="",
  ETH0_VLAN_ID="",
  ETH0_VROUTER_IP="",
  ETH0_VROUTER_IP6="",
  ETH0_VROUTER_MANAGEMENT="",
  NETWORK="YES",
  ONEGATE_ENDPOINT="http://one.my.fqdn:5030",
  REPORT_READY="YES",
  SET_HOSTNAME="fks-test-1",
  SSH_PUBLIC_KEY="",
  START_SCRIPT="if [ ! -e /etc/ssh/one_keys_created ]; then /bin/rm /etc/ssh/ssh_host_* && /usr/sbin/dpkg-reconfigure openssh-server 2>&1 >/etc/ssh/one_keys_created ; fi",
  TARGET="hda",
  TOKEN="YES",
  VMID="69" ]
CPU="1"
CREATED_BY="0"
FEATURES=[
  ACPI="yes",
  APIC="yes",
  GUEST_AGENT="yes" ]
GRAPHICS=[
  LISTEN="0.0.0.0",
  TYPE="VNC" ]
MEMORY="1024"
TEMPLATE_ID="90"
VCPU="1"
VMID="69"

You actually don't need 3 clusters for that, but it shouldn't do any harm.

Do your datastores belong to the clusters (image and system datastore)?

That means: the image DS needs to be in all three clusters, and the system DS always in the corresponding one.

The idea is to schedule a new VM and let OpenNebula decide on which host to instantiate it. Network is shared between the three “clusters”, datastore isn’t.

root@one-1:~# onecluster list | grep FKS
  102 UU FKS                        0     3          8
  104 FKS-3                         1     3          6
  105 FKS-2                         1     2          1
  106 FKS-1                         1     2          1
root@one-1:~# onedatastore list --csv 
ID,NAME,SIZE,AVAIL,CLUSTERS,IMAGES,TYPE,DS,TM,STAT
0,system,203.6G,13%,"100,101,102,103",0,sys,-,shared,on
1,default,203.6G,13%,"100,101,102,103,104",4,img,fs,shared,on
2,files,203.6G,13%,"100,101,102,103,104",0,fil,fs,shared,on
102,local_system,-,-,100,0,sys,-,ssh,on
103,local_files,393.6G,78%,"100,101,102,103,104",19,img,fs,ssh,on
104,lfs_img,3T,65%,"100,101,102,103,104",7,img,fs,shared,on
105,lfs_sys,3T,65%,100,0,sys,-,shared,on
106,local_img_fks,49.1G,75%,104,2,img,fs,ssh,on
107,sys_fks3,-,-,"102,104",0,sys,-,ssh,on
108,sys_fks1,-,-,"102,106",0,sys,-,ssh,on
109,sys_fks2,-,-,"102,105",0,sys,-,ssh,on

=> img & fil DS are shared across all Clusters. sys are per-Cluster (per-Host, actually) for these three hosts.

So, if I understand correctly, AUTOMATIC_REQUIREMENTS="(CLUSTER_ID = 104) & !(PUBLIC_CLOUD = YES)" means that on creation of the new VM, the choice was already made to deploy to Cluster 104?
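As I understand the scheduler, yes: AUTOMATIC_REQUIREMENTS is generated from the clusters of the VM's disks and NICs at creation time, and the scheduler ANDs it with SCHED_REQUIREMENTS, so the OR over 104|105|106 effectively collapses to cluster 104. A toy illustration (per-cluster results hand-evaluated for the expressions in the listings above, not computed by real scheduler code):

```shell
# Toy illustration: the final host filter is
#   AUTOMATIC_REQUIREMENTS AND SCHED_REQUIREMENTS
# so only a host whose cluster satisfies both expressions is a candidate.
out=$(for host_cluster in 100 104 105; do
  case $host_cluster in
    104) echo "cluster $host_cluster: auto=yes sched=yes -> candidate" ;;
    105) echo "cluster $host_cluster: auto=no  sched=yes -> filtered out" ;;
    *)   echo "cluster $host_cluster: auto=no  sched=no  -> filtered out" ;;
  esac
done)
printf '%s\n' "$out"
```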

Tried a manual deploy via Sunstone to “fks-2” host; something isn’t really right here:

root@fks-2:~# ls -la /var/lib/one//datastores/109/69/
total 384
drwxrwxr-x 2 oneadmin oneadmin   4096 Mar 13 16:32 .
drwx--x--x 3 oneadmin oneadmin   4096 Mar 13 16:32 ..
-rw-rw-r-- 1 oneadmin oneadmin   1182 Mar 13 16:32 deployment.0
lrwxrwxrwx 1 oneadmin oneadmin     60 Mar 13 16:32 disk.0 -> /var/lib/one/datastores/104/127cd0f37619f3490cd7ebf0a1f2d062
-rw-r--r-- 1 oneadmin oneadmin 374784 Mar 13 16:32 disk.1
root@dorfl:~# ls -la /var/lib/one//datastores/
total 8
drwxrwxr-x 2 oneadmin oneadmin 4096 Mar 10 00:19 .
drwxr-xr-x 6 oneadmin root     4096 Feb 16 23:33 ..
lrwxrwxrwx 1 root     root       23 Mar  7 03:00 109 -> /var/lib/libvirt/images

DS 104 is a shared DS, but it’s not present on that host, therefore the deployment failed.

So, well. All my image DSs are on shared storage in my local DC; therefore they are of type shared. Adding them to the remote DC's clusters made OpenNebula extend the idea of them being local to the hosts there, thus a link instead of a copy. So it looks like I need to make one image DS ssh-only and use that to store images for remote deployments?
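The link-vs-copy difference sketched (paths taken from the listing above; "host" is a placeholder, and this is a conceptual sketch, not the actual TM driver scripts):

```shell
# How a disk conceptually reaches the VM directory in the system DS for
# the two TM types; the commands are only echoed, never run.
IMG=/var/lib/one/datastores/104/127cd0f37619f3490cd7ebf0a1f2d062
DST=/var/lib/one/datastores/109/69/disk.0

SHARED_CMD="ln -s $IMG $DST"     # shared TM: link in place, assumes the
                                 # image DS is mounted on the host
SSH_CMD="scp $IMG host:$DST"     # ssh TM: frontend copies over the wire
echo "shared: $SHARED_CMD"
echo "ssh:    $SSH_CMD"
```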

Created an ssh-backed image DS 110, added it to only the remote clusters, cloned an OS image to that DS, updated the VM template to use that image (and therefore that DS as source), and created a new VM. One of the designated hosts was selected, a directory for the VM was created in the host's system DS, the image was copied over, and in the end the VM got started, yeah :wink:

So, to summarize: make sure that your shared image/file DS (need to fix the file one yet) is linked only to those Hosts that can actually access it. If you have both shared and ssh, use a separate ssh DS. Sorry for the trouble; still learning my way through OpenNebula :wink:

Update: Hmm, still don't get it :frowning: On undeploy, OpenNebula tries to copy the VM to the oned host, into the same DS subdirectory that's used on the remote host:

Mon Mar 13 18:21:25 2017 [Z0][ReM][D]: Req:9456 UID:0 VirtualMachineAction invoked , "undeploy", 71
Mon Mar 13 18:21:25 2017 [Z0][DiM][D]: Undeploying VM 71
…
Mon Mar 13 18:21:29 2017 [Z0][TM][D]: Message received: LOG I 71 Command execution fail: /var/lib/one/remotes/tm/ssh/mv dorfl.uu.org:/var/lib/one//datastores/109/71 one-1:/var/lib/one//datastores/109/71 71 109
Mon Mar 13 18:21:29 2017 [Z0][TM][D]: Message received: LOG I 71 mv: Moving dorfl.uu.org:/var/lib/one/datastores/109/71 to one-1:/var/lib/one/datastores/109/71
Mon Mar 13 18:21:29 2017 [Z0][TM][D]: Message received: LOG E 71 mv: Command "set -e -o pipefail
Mon Mar 13 18:21:29 2017 [Z0][TM][D]: Message received: LOG I 71 
Mon Mar 13 18:21:29 2017 [Z0][TM][D]: Message received: LOG I 71 tar -C /var/lib/one/datastores/109 --sparse -cf - 71 | ssh one-1 'tar -C /var/lib/one/datastores/109 --sparse -xf -'
Mon Mar 13 18:21:29 2017 [Z0][TM][D]: Message received: LOG I 71 rm -rf /var/lib/one/datastores/109/71" failed: ssh: Could not resolve hostname one-1: Name or service not known
Mon Mar 13 18:21:29 2017 [Z0][TM][D]: Message received: LOG E 71 Error copying disk directory to target host
Mon Mar 13 18:21:29 2017 [Z0][TM][D]: Message received: LOG I 71 ExitCode: 255
Mon Mar 13 18:21:29 2017 [Z0][TM][D]: Message received: TRANSFER FAILURE 71 Error copying disk directory to target host
Mon Mar 13 18:21:30 2017 [Z0][VMM][I]: --Mark--
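As I read the log, the ssh TM's mv script transfers the VM's *system DS* directory (datastores/109/71) from the node back to the frontend under the same path on undeploy; the image DS only comes into play when a persistent disk is saved back on terminate. A sketch of the transfer it attempts (hosts and paths from the log; note the frontend name must be resolvable from the node):

```shell
# Simplified sketch of the tar-over-ssh pipeline tm/ssh/mv runs on the
# node, per the log above; only echoed here, never executed.
SRC_HOST=dorfl.uu.org
DST_HOST=one-1                      # must resolve from the node!
DS_DIR=/var/lib/one/datastores/109  # system DS, hence 109 and not 110
VMID=71

CMD="ssh $SRC_HOST 'tar -C $DS_DIR --sparse -cf - $VMID | ssh $DST_HOST \"tar -C $DS_DIR --sparse -xf -\"'"
echo "$CMD"
```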

I was actually fortunate that ssh'ing home via the short name didn't work (fixed now):

root@one-1:~# ls -la /var/lib/one/datastores/109
total 8
drwxrwxr-x 2 oneadmin oneadmin 4096 Mär 13 18:21 .
drwxr-xr-x 3 oneadmin oneadmin 4096 Mär 13 18:21 ..
root@one-1:~# df -h /var/lib/one/datastores/109
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1        15G   11G  4,4G  70% /

Why is the VM moved to /var/lib/one/datastores/109 on the oned host instead of /var/lib/one/datastores/110 where it came from?

root@one-1:~# onevm show 71
…
VM DISKS                                                                        
 ID DATASTORE  TARGET IMAGE                               SIZE      TYPE SAVE
  0 ssh-backed vda    fks-test-1-disk-0                   3.5G/20G  file  YES
  1 -          hda    CONTEXT                             1M/-      -       -
…
root@one-1:~# oneimage show fks-test-1-disk-0
IMAGE 92 INFORMATION                                                            
ID             : 92                  
NAME           : fks-test-1-disk-0   
USER           : oneadmin            
GROUP          : oneadmin            
DATASTORE      : ssh-backed-image-ds 
TYPE           : OS                  
REGISTER TIME  : 03/13 18:02:23      
PERSISTENT     : Yes                 
SOURCE         : /var/lib/one//datastores/110/44d3e6140ef13e7f1b06b8174f477e84
PATH           : /var/lib/one//datastores/110/bf6ae83ad786def4261fd6d640a230f4
FSTYPE         : qcow2               
SIZE           : 20G                 
STATE          : used                
RUNNING_VMS    : 1                   
[…]
root@one-1:~# df -h /var/lib/one//datastores/110/
Filesystem      Size  Used Avail Use% Mounted on
mfs#lfs:9421    3,0T  1,1T  2,0T  35% /lfs

So, what did I miss this time? :wink: