Unable to instantiate a VM on a freshly installed system

I just installed OpenNebula according to the instructions on the OpenNebula site and am now at the step where I test the installation, following http://docs.opennebula.org/5.4/deployment/node_installation/verify.html#verify-installation. I get the following error when trying to instantiate a VM.

> Fri Sep 15 21:16:37 2017 [Z0][VM][I]: New state is ACTIVE
> Fri Sep 15 21:16:37 2017 [Z0][VM][I]: New LCM state is PROLOG
> Fri Sep 15 21:16:38 2017 [Z0][TM][I]: Command execution fail: /var/lib/one/remotes/tm/shared/clone front1.dasilva.network:/var/lib/one/datastores/102/cf51861787298d32feeee5e3cd66647d nebula2.dasilva.network:/var/lib/one/datastores/103/2/disk.0 2 102
> Fri Sep 15 21:16:38 2017 [Z0][TM][I]: clone: Cloning /var/lib/one/datastores/102/cf51861787298d32feeee5e3cd66647d in nebula2.dasilva.network:/var/lib/one/datastores/103/2/disk.0
> Fri Sep 15 21:16:38 2017 [Z0][TM][E]: clone: Command "cd /var/lib/one/datastores/103/2;     rm -f /var/lib/one/datastores/103/2/disk.0;     cp /var/lib/one/datastores/102/cf51861787298d32feeee5e3cd66647d /var/lib/one/datastores/103/2/disk.0     " failed: cp: cannot create regular file '/var/lib/one/datastores/103/2/disk.0': Permission denied
> Fri Sep 15 21:16:38 2017 [Z0][TM][E]: Error copying front1.dasilva.network:/var/lib/one/datastores/102/cf51861787298d32feeee5e3cd66647d to nebula2.dasilva.network:/var/lib/one/datastores/103/2/disk.0
> Fri Sep 15 21:16:38 2017 [Z0][TM][I]: ExitCode: 1
> Fri Sep 15 21:16:39 2017 [Z0][TM][E]: Error executing image transfer script: Error copying front1.dasilva.network:/var/lib/one/datastores/102/cf51861787298d32feeee5e3cd66647d to nebula2.dasilva.network:/var/lib/one/datastores/103/2/disk.0
> Fri Sep 15 21:16:39 2017 [Z0][VM][I]: New LCM state is PROLOG_FAILURE
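
If it helps with debugging: the failing step should be reproducible by hand, since the TM drivers appear to be plain shell scripts under /var/lib/one/remotes. A sketch, replaying the exact arguments from the log above:

# as oneadmin on the frontend, replay the clone call that failed
su - oneadmin -c '/var/lib/one/remotes/tm/shared/clone \
  front1.dasilva.network:/var/lib/one/datastores/102/cf51861787298d32feeee5e3cd66647d \
  nebula2.dasilva.network:/var/lib/one/datastores/103/2/disk.0 2 102'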

Here are the datastore attributes:
** Datastore 0 System **
ALLOW_ORPHANS NO
DISK_TYPE FILE
DS_MIGRATE YES
RESTRICTED_DIRS /
SAFE_DIRS /var/tmp
SHARED YES
TM_MAD shared
TYPE SYSTEM_DS

** Datastore 1 default **
ALLOW_ORPHANS NO
CLONE_TARGET SYSTEM
DISK_TYPE FILE
DS_MAD fs
LN_TARGET NONE
RESTRICTED_DIRS /
SAFE_DIRS /var/tmp
TM_MAD shared
TYPE IMAGE_DS

** Datastore 2 files **
ALLOW_ORPHANS NO
CLONE_TARGET SYSTEM
DS_MAD fs
LN_TARGET NONE
RESTRICTED_DIRS /
SAFE_DIRS /var/tmp
TM_MAD shared
TYPE FILE_DS
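
In case it is easier to compare, the same attributes can also be dumped from the CLI as oneadmin:

# full template of the system datastore (ID 0); repeat for IDs 1 and 2
onedatastore show 0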


Versions of the related components and OS (frontend, hypervisors, VMs):
OpenNebula v5.4.0 running on VirtualBox using Libvirt.

Steps to reproduce:
Follow the instructions outlined in http://docs.opennebula.org/5.4/deployment/opennebula_installation/index.html and run the test outlined on the page Verify Installation.

Current results:
The error log already quoted at the top of this post: the VM goes ACTIVE, enters PROLOG, and ends in PROLOG_FAILURE after cp fails with Permission denied.

Expected results:
The VM should be instantiated as described in the documentation.

BTW: There is no permission issue on the folder /var/lib/one/datastores/103/2/, as I can create files there with the user oneadmin.

oneadmin@front1:~/datastores/103/2$ whoami
oneadmin
oneadmin@front1:~/datastores/103/2$ ls -la
total 8
drwxrwsr-x 2 oneadmin oneadmin 4096 sep 15 21:16 .
drwxr-sr-x 3 oneadmin oneadmin 4096 sep 15 21:16 ..
---------- 1 oneadmin oneadmin 0 jan 23 1971 disk.0
oneadmin@front1:~/datastores/103/2$ pwd
/var/lib/one/datastores/103/2
oneadmin@front1:~/datastores/103/2$ touch test1
oneadmin@front1:~/datastores/103/2$ ls -la
total 8
drwxrwsr-x 2 oneadmin oneadmin 4096 sep 16 13:20 .
drwxr-sr-x 3 oneadmin oneadmin 4096 sep 15 21:16 ..
---------- 1 oneadmin oneadmin 0 jan 23 1971 disk.0
-rw-rw-r-- 1 oneadmin oneadmin 0 sep 16 13:20 test1
oneadmin@front1:~/datastores/103/2$ rm test1
oneadmin@front1:~/datastores/103/2$ ls -la
total 8
drwxrwsr-x 2 oneadmin oneadmin 4096 sep 16 13:20 .
drwxr-sr-x 3 oneadmin oneadmin 4096 sep 15 21:16 ..
---------- 1 oneadmin oneadmin 0 jan 23 1971 disk.0
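
One thing I am not sure about: the log prefixes the destination with nebula2.dasilva.network:, so the cp may actually run on the node rather than on the frontend where I tested above. If so, the equivalent test would be the following (a sketch, assuming the passwordless oneadmin ssh setup from the install guide):

# repeat the write test on the node named in the failing clone command
su - oneadmin -c 'ssh nebula2.dasilva.network touch /var/lib/one/datastores/103/2/test1'
su - oneadmin -c 'ssh nebula2.dasilva.network rm /var/lib/one/datastores/103/2/test1'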

Provide the current output of

onedatastore list

It looks like all your datastores are shared; can nebula2.dasilva.network access front1.dasilva.network:/var/lib/one/datastores?
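
A quick way to check, from the frontend (assuming the standard oneadmin ssh setup):

# can oneadmin on the frontend read the datastores directory on the node?
su - oneadmin -c 'ssh nebula2.dasilva.network ls -la /var/lib/one/datastores'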

Consider basing your test setup on https://github.com/marcindulak/vagrant-opennebula-tutorial-centos7 (configures an OpenNebula cluster using libvirt/kvm nested virtualization in Vagrant) - it uses kvm all the way down, and nfs as shared storage.

Thank you for your answer.
The nodes can access the datastores, as they are shared through NFS. Please see the proof below:

Create a test1 file on nebula2:

oneadmin@nebula2:~/datastores/103/3$ ls -la
total 8
drwxrwsr-x 2 oneadmin oneadmin 4096 sep 17 13:18 .
drwxr-sr-x 3 oneadmin oneadmin 4096 sep 17 13:18 ..
---------- 1 oneadmin oneadmin 0 mar 15 1972 disk.0
oneadmin@nebula2:~/datastores/103/3$ touch test1
oneadmin@nebula2:~/datastores/103/3$ ls -la
total 8
drwxrwsr-x 2 oneadmin oneadmin 4096 sep 17 13:19 .
drwxr-sr-x 3 oneadmin oneadmin 4096 sep 17 13:18 ..
---------- 1 oneadmin oneadmin 0 mar 15 1972 disk.0
-rw-rw-r-- 1 oneadmin oneadmin 0 sep 17 13:19 test1

Remove the file on front1:

oneadmin@front1:~/datastores/103/3$ ls -la
total 8
drwxrwsr-x 2 oneadmin oneadmin 4096 sep 17 13:19 .
drwxr-sr-x 3 oneadmin oneadmin 4096 sep 17 13:18 ..
---------- 1 oneadmin oneadmin 0 mar 15 1972 disk.0
-rw-rw-r-- 1 oneadmin oneadmin 0 sep 17 13:19 test1
oneadmin@front1:~/datastores/103/3$ rm test1
oneadmin@front1:~/datastores/103/3$ ls -la
total 8
drwxrwsr-x 2 oneadmin oneadmin 4096 sep 17 13:21 .
drwxr-sr-x 3 oneadmin oneadmin 4096 sep 17 13:18 ..
---------- 1 oneadmin oneadmin 0 mar 15 1972 disk.0
oneadmin@front1:~/datastores/103/3$ whoami
oneadmin
oneadmin@front1:~/datastores/103/3$ pwd
/var/lib/one/datastores/103/3

Verify the removal of the test1 file on nebula2:
oneadmin@nebula2:~/datastores/103/3$ ls -l
total 0
---------- 1 oneadmin oneadmin 0 mar 15 1972 disk.0
oneadmin@nebula2:~/datastores/103/3$ whoami
oneadmin
oneadmin@nebula2:~/datastores/103/3$ pwd
/var/lib/one/datastores/103/3

Below is the /etc/fstab entry for each node's datastore:

front1
nas.dasilva.network:/ImageDatastore /var/lib/one/datastores nfs soft,intr,rsize=32768,wsize=32768 0 0

nebula1 and nebula2
nas.dasilva.network:/ImageDatastore /var/lib/one/datastores nfs soft,intr,rsize=32768,wsize=32768 0 0
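
The share is mounted on all three machines; for example, one way to confirm it (findmnt ships with util-linux on Ubuntu 16.04):

# show what is actually mounted at the datastores path
findmnt /var/lib/one/datastores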

And I can ssh without a password to and from each host:
front1 <-> nebula1
front1 <-> nebula2
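
Checked non-interactively, so a password prompt would show up as a failure:

# BatchMode makes ssh fail instead of asking for a password
su - oneadmin -c 'ssh -o BatchMode=yes nebula2.dasilva.network true' && echo OK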

Can you list permissions and sizes under the datastores?

tree -ps /var/lib/one/datastores/

“---------- 1 oneadmin oneadmin 0 mar 15 1972 disk.0” looks strange.

Is ntp working on all hosts?

Hi!
I switched to a different NFS server to see if that was the issue and repeated the test, but I got the same error.

Here is the output you asked for:

/var/lib/one/datastores/
├── [drwxr-xr-x        4096]  0
│   ├── [drwxrwxr-x        4096]  0
│   └── [drwxrwxr-x        4096]  7
├── [drwxr-xr-x        4096]  1
└── [drwxr-xr-x        4096]  2

5 directories, 0 files

NTP is working on all nodes. Here is the output of ntpstat, which is similar on all nodes (front1, nebula1 and nebula2):

synchronised to NTP server (192.36.143.130) at stratum 2 
   time correct to within 30 ms
   polling server every 512 s

Here is the /etc/exports file from the NFS server:

/mnt/vm-images/datastores 192.168.1.0/24(rw,sync,no_subtree_check,root_squash)
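
After editing the exports file, it can be reloaded and verified with the standard NFS tools on the server:

# apply /etc/exports without restarting the NFS server, then list what is exported
sudo exportfs -ra
showmount -e localhost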

Here is the /etc/fstab entry used on the machines (front1, nebula1 and nebula2), which all mount the same share:

192.168.1.2:/mnt/vm-images/datastores /var/lib/one/datastores nfs soft,intr,rsize=32768,wsize=32768 0 0

Here are the latest log messages from oned.log, for reference, from when I retried creating another VM on the “new” NFS server:

Sun Sep 17 19:32:47 2017 [Z0][ReM][D]: Req:2672 UID:0 one.acl.info result SUCCESS, "<ACL_POOL><ACL><ID>0..."
Sun Sep 17 19:32:47 2017 [Z0][ReM][D]: Req:2400 UID:0 one.vmgrouppool.info invoked , -2, -1, -1
Sun Sep 17 19:32:47 2017 [Z0][ReM][D]: Req:2400 UID:0 one.vmgrouppool.info result SUCCESS, "<VM_GROUP_POOL></VM_..."
Sun Sep 17 19:32:47 2017 [Z0][ReM][D]: Req:6080 UID:0 one.vmpool.info invoked , -2, -1, -1, -1
Sun Sep 17 19:32:47 2017 [Z0][ReM][D]: Req:6080 UID:0 one.vmpool.info result SUCCESS, "<VM_POOL><VM><ID>7</..."
Sun Sep 17 19:32:47 2017 [Z0][ReM][D]: Req:1280 UID:0 one.vm.deploy invoked , 7, 2, false, 0
Sun Sep 17 19:32:47 2017 [Z0][DiM][D]: Deploying VM 7
Sun Sep 17 19:32:47 2017 [Z0][ReM][D]: Req:1280 UID:0 one.vm.deploy result SUCCESS, 7
Sun Sep 17 19:32:47 2017 [Z0][ReM][D]: Req:7216 UID:0 one.vm.info invoked , 7
Sun Sep 17 19:32:47 2017 [Z0][ReM][D]: Req:7216 UID:0 one.vm.info result SUCCESS, "<VM><ID>7</ID><UID>0..."
Sun Sep 17 19:32:48 2017 [Z0][TM][D]: Message received: LOG I 7 Command execution fail: /var/lib/one/remotes/tm/shared/clone front1.dasilva.network:/var/lib/one/datastores/1/82227540314cbffe541ba2753a8b6a8e nebula2.dasilva.network:/var/lib/one/datastores/0/7/disk.0 7 1

Sun Sep 17 19:32:48 2017 [Z0][TM][D]: Message received: LOG I 7 clone: Cloning /var/lib/one/datastores/1/82227540314cbffe541ba2753a8b6a8e in nebula2.dasilva.network:/var/lib/one/datastores/0/7/disk.0

Sun Sep 17 19:32:48 2017 [Z0][TM][D]: Message received: LOG E 7 clone: Command "cd /var/lib/one/datastores/0/7;     rm -f /var/lib/one/datastores/0/7/disk.0;     cp /var/lib/one/datastores/1/82227540314cbffe541ba2753a8b6a8e /var/lib/one/datastores/0/7/disk.0     " failed: cp: cannot stat '/var/lib/one/datastores/1/82227540314cbffe541ba2753a8b6a8e': No such file or directory

Sun Sep 17 19:32:48 2017 [Z0][TM][D]: Message received: LOG E 7 Error copying front1.dasilva.network:/var/lib/one/datastores/1/82227540314cbffe541ba2753a8b6a8e to nebula2.dasilva.network:/var/lib/one/datastores/0/7/disk.0

Sun Sep 17 19:32:48 2017 [Z0][TM][D]: Message received: LOG I 7 ExitCode: 1

Sun Sep 17 19:32:48 2017 [Z0][TM][D]: Message received: TRANSFER FAILURE 7 Error copying front1.dasilva.network:/var/lib/one/datastores/1/82227540314cbffe541ba2753a8b6a8e to nebula2.dasilva.network:/var/lib/one/datastores/0/7/disk.0

Sun Sep 17 19:32:48 2017 [Z0][InM][D]: Host nebula1.dasilva.network (1) successfully monitored.
Sun Sep 17 19:32:52 2017 [Z0][AuM][D]: Message received: AUTHENTICATE SUCCESS 67 -

Sun Sep 17 19:32:52 2017 [Z0][ReM][D]: Req:1248 UID:0 one.vm.info invoked , 7
Sun Sep 17 19:32:52 2017 [Z0][ReM][D]: Req:1248 UID:0 one.vm.info result SUCCESS, "<VM><ID>7</ID><UID>0..."
Sun Sep 17 19:33:03 2017 [Z0][InM][D]: Host nebula2.dasilva.network (2) successfully monitored.
Sun Sep 17 19:33:08 2017 [Z0][InM][D]: Host nebula1.dasilva.network (1) successfully monitored.
Sun Sep 17 19:33:17 2017 [Z0][ReM][D]: Req:3072 UID:0 one.zone.raftstatus invoked 
Sun Sep 17 19:33:17 2017 [Z0][ReM][D]: Req:3072 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>-1<..."
Sun Sep 17 19:33:17 2017 [Z0][ReM][D]: Req:8496 UID:0 one.vmpool.info invoked , -2, -1, -1, -1
Sun Sep 17 19:33:17 2017 [Z0][ReM][D]: Req:8496 UID:0 one.vmpool.info result SUCCESS, "<VM_POOL><VM><ID>7</..."
Sun Sep 17 19:33:17 2017 [Z0][ReM][D]: Req:1600 UID:0 one.vmpool.info invoked , -2, -1, -1, -1
Sun Sep 17 19:33:17 2017 [Z0][ReM][D]: Req:1600 UID:0 one.vmpool.info result SUCCESS, "<VM_POOL><VM><ID>7</..."
Sun Sep 17 19:33:23 2017 [Z0][InM][D]: Host nebula2.dasilva.network (2) successfully monitored.
Sun Sep 17 19:33:25 2017 [Z0][MKP][D]: Monitoring marketplace OpenNebula Public (0)
Sun Sep 17 19:33:25 2017 [Z0][InM][D]: Monitoring datastore system (0)
Sun Sep 17 19:33:25 2017 [Z0][InM][D]: Monitoring datastore default (1)
Sun Sep 17 19:33:25 2017 [Z0][InM][D]: Monitoring datastore files (2)
Sun Sep 17 19:33:26 2017 [Z0][ImM][D]: Datastore default (1) successfully monitored.
Sun Sep 17 19:33:26 2017 [Z0][ImM][D]: Datastore files (2) successfully monitored.
Sun Sep 17 19:33:26 2017 [Z0][ImM][D]: Datastore system (0) successfully monitored.
Sun Sep 17 19:33:26 2017 [Z0][MKP][D]: Marketplace OpenNebula Public (0) successfully monitored.
Sun Sep 17 19:33:28 2017 [Z0][InM][D]: Host nebula1.dasilva.network (1) successfully monitored.
Sun Sep 17 19:33:43 2017 [Z0][InM][D]: Host nebula2.dasilva.network (2) successfully monitored.
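
Since the error this time is a missing source file rather than a permission problem, one more thing worth checking is whether the image is visible under the same path from both machines (it should be, if both mount the same share):

# compare the view from front1 and from nebula2
ls -l /var/lib/one/datastores/1/82227540314cbffe541ba2753a8b6a8e
ssh nebula2.dasilva.network ls -l /var/lib/one/datastores/1/82227540314cbffe541ba2753a8b6a8e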

Hope this information is of any help.
Please let me know if you need any additional information.

How does the output look after an image has been downloaded into the datastore?

[root@frontend ~]# tree -ps /var/lib/one/datastores/
/var/lib/one/datastores/
├── [drwxr-x---          15]  0
│   └── [drwxrwxr-x          54]  0
│       ├── [-rw-rw-r--        1014]  deployment.0
│       ├── [-rw-r--r--   399373824]  disk.0
│       └── [-rw-r--r--      372736]  disk.1
├── [drwxr-x---          46]  1
│   └── [-rw-r--r--   399373824]  5a433c01114abf776996734148b447ba
└── [drwxr-xr-x           6]  2

Try also to experiment with no_root_squash instead (https://github.com/marcindulak/vagrant-opennebula-tutorial-centos7/blob/dbf8ec68593eebd5b8908c0da74583b0a5aa8f5f/Vagrantfile#L225) - if this turns out to be the problem, it is worth investigating what OpenNebula really does on nfs.
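
Based on the exports line you posted, that amounts to just swapping the squash option (adjust the path and network to your setup):

/mnt/vm-images/datastores 192.168.1.0/24(rw,sync,no_subtree_check,no_root_squash)

followed by exportfs -ra on the NFS server.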

Hi!
I turned off IPv6 and the firewall on all machines, including the VirtualBox host, modified /etc/exports to use no_root_squash, and restarted the NFS server and all the VMs.
I downloaded the ttylinux - kvm appliance from Apps and gave the template a different name (ttylinux1 - kvm).
I instantiated it, and now I get a different error message:

Mon Sep 18 21:34:27 2017 [Z0][VM][I]: New state is ACTIVE
Mon Sep 18 21:34:27 2017 [Z0][VM][I]: New LCM state is PROLOG
Mon Sep 18 21:34:29 2017 [Z0][VM][I]: New LCM state is BOOT
Mon Sep 18 21:34:29 2017 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/9/deployment.0
Mon Sep 18 21:34:30 2017 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_context.
Mon Sep 18 21:34:30 2017 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Mon Sep 18 21:34:31 2017 [Z0][VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/kvm/deploy '/var/lib/one/datastores/0/9/deployment.0' 'nebula2.dasilva.network' 9 nebula2.dasilva.network
Mon Sep 18 21:34:31 2017 [Z0][VMM][I]: error: Failed to create domain from /var/lib/one/datastores/0/9/deployment.0
Mon Sep 18 21:34:31 2017 [Z0][VMM][I]: error: invalid argument: could not find capabilities for arch=x86_64 domaintype=kvm
Mon Sep 18 21:34:31 2017 [Z0][VMM][E]: Could not create domain from /var/lib/one/datastores/0/9/deployment.0
Mon Sep 18 21:34:31 2017 [Z0][VMM][I]: ExitCode: 255
Mon Sep 18 21:34:31 2017 [Z0][VMM][I]: Failed to execute virtualization driver operation: deploy.
Mon Sep 18 21:34:31 2017 [Z0][VMM][E]: Error deploying virtual machine: Could not create domain from /var/lib/one/datastores/0/9/deployment.0
Mon Sep 18 21:34:31 2017 [Z0][VM][I]: New LCM state is BOOT_FAILURE

Below is the output of the tree command:

oneadmin@front1:/home/cesar$ tree -ps /var/lib/one/datastores/
/var/lib/one/datastores/
├── [drwxr-xr-x        4096]  0
│   ├── [drwxrwxr-x        4096]  0
│   └── [drwxrwxr-x        4096]  9
│       ├── [-rw-rw-r--         854]  deployment.0
│       ├── [-rw-r--r--    24933376]  disk.0
│       └── [-rw-r--r--      372736]  disk.1
├── [drwxr-xr-x        4096]  1
│   └── [-rw-r--r--    24933376]  c199101eba7f4f990f3ec3d011020530
└── [drwxr-xr-x        4096]  2

5 directories, 4 files

I also have the KVM module loaded on all the machines:
cesar@nebula1:~$ lsmod| grep kvm
kvm 544768 0
irqbypass 16384 1 kvm
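
As far as I understand, the module being loaded does not guarantee that the CPU virtualization extensions are actually visible inside the VirtualBox guests; this is how it can be checked (a sketch; nested VT-x may simply not be exposed by VirtualBox 5.1):

# 0 means no vmx/svm flags, i.e. no hardware virtualization inside the guest
egrep -c '(vmx|svm)' /proc/cpuinfo
# list the domain types libvirt actually supports on this node
virsh capabilities | grep 'domain type'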

See if the suggestion from https://forum.opennebula.io/t/test-vm-trouble helps, but I would recommend switching to kvm, as in the details of my first post. Also, there is no point in working with ttylinux; in my experience its networking has recently not worked out of the box. Use the marketplace centos7 or another full Linux instead.

Good news! I got it working now, just by editing the /etc/one/oned.conf file and changing from KVM to QEMU, according to the instructions in the link from your previous reply.
Thank you for your patience and support.
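
For anyone hitting the same error later: after the change, the libvirt deployment file that OpenNebula generates should declare the qemu domain type instead of kvm, which is easy to confirm (path taken from my failed run above; a new deployment gets a new VM ID):

# the opening element of the domain XML carries the driver type
grep '<domain' /var/lib/one/datastores/0/9/deployment.0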

Just a question: what OS do your frontend and worker nodes use?

https://docs.opennebula.org/5.4/deployment/open_cloud_storage_setup/fs_ds.html contains a passage “In case the files must be read by root the option no_root_squash must be added”, which is not clear to me. Which files are read by root?
Is the no_root_squash requirement limited to CentOS 7 only (https://www.redhat.com/archives/libvirt-users/2016-April/msg00019.html)? The problem may be worth investigating by one of the developers.

The host is running Linux Mint 18.2. The virtual machines on VirtualBox (version 5.1.28 r117968 (Qt5.5.1)) on which OpenNebula is installed are as follows: one front-end and two nodes, all running the latest Ubuntu 16.04 with the latest security fixes.