Bionic lxd won't boot on ssh datastore

I get:
Sun Oct 27 22:24:17 2019 [Z0][VM][I]: New LCM state is BOOT
Sun Oct 27 22:24:17 2019 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/13/deployment.1
Sun Oct 27 22:24:20 2019 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_context.
Sun Oct 27 22:24:23 2019 [Z0][VMM][I]: ExitCode: 0
Sun Oct 27 22:24:23 2019 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Sun Oct 27 22:24:23 2019 [Z0][VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/lxd/deploy ‘/var/lib/one//datastores/115/13/deployment.1’ ‘lxd.cloud.sa16.lan’ 13 lxd.cloud.sa16.lan
Sun Oct 27 22:24:23 2019 [Z0][VMM][E]: deploy: Error: not found
Sun Oct 27 22:24:23 2019 [Z0][VMM][I]: /var/tmp/one/vmm/lxd/client.rb:102:in wait': {"type"=>"sync", "status"=>"Success", "status_code"=>200, "operation"=>"", "error_code"=>0, "error"=>"", "metadata"=>{"id"=>"a95afb7f-5d5f-4972-9f72-fd75217ae910", "class"=>"task", "description"=>"Creating container", "created_at"=>"2019-10-27T22:24:23.486291571+08:00", "updated_at"=>"2019-10-27T22:24:23.486291571+08:00", "status"=>"Failure", "status_code"=>400, "resources"=>{"containers"=>["/1.0/containers/one-13"]}, "metadata"=>nil, "may_cancel"=>false, "err"=>"Invalid devices: Missing source '/var/lib/one/datastores/115/13/mapper/disk.1' for disk 'context'", "location"=>"none"}} (LXDError) Sun Oct 27 22:24:23 2019 [Z0][VMM][I]: from /var/tmp/one/vmm/lxd/container.rb:473:in wait?’
Sun Oct 27 22:24:23 2019 [Z0][VMM][I]: from /var/tmp/one/vmm/lxd/container.rb:134:in create' Sun Oct 27 22:24:23 2019 [Z0][VMM][I]: from /var/tmp/one/vmm/lxd/deploy:52:in
Sun Oct 27 22:24:23 2019 [Z0][VMM][I]: ExitCode: 1
Sun Oct 27 22:24:23 2019 [Z0][VMM][I]: Failed to execute virtualization driver operation: deploy.
Sun Oct 27 22:24:23 2019 [Z0][VMM][E]: Error deploying virtual machine
Sun Oct 27 22:24:24 2019 [Z0][VM][I]: New LCM state is BOOT_FAILURE


Versions of the related components and OS (frontend, hypervisors, VMs):
On Ubuntu 18.04 Opennebula 5.8.5

Steps to reproduce:

  1. Download lxd image from market place
  2. Create ssh SYSTEM datastore
  3. Create ssh IMAGE datastore
  4. Create instance of lxd image pointing it to just created ssh SYSTEM datastore
  5. See error in log

Current results:
See error log above

Expected results:
It would boot.

Thanks for your help.

LXD complains about the context disk missing. The problem shouldn’t be the bionic image. Could you try instantiating the imported app template without context and then debug from there ?

So I did a fresh install of 5.8.5 and the KVM verification worked, but the LXD won’t. I removed the context from the bionic template, as you said, and got this:

Thu Oct 31 23:56:37 2019 [Z0][VM][I]: New state is ACTIVE
Thu Oct 31 23:56:37 2019 [Z0][VM][I]: New LCM state is PROLOG
Thu Oct 31 23:56:52 2019 [Z0][VM][I]: New LCM state is BOOT
Thu Oct 31 23:56:53 2019 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/5/deployment.0
Thu Oct 31 23:56:53 2019 [Z0][VM][I]: Virtual Machine has no context
Thu Oct 31 23:56:53 2019 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Thu Oct 31 23:57:08 2019 [Z0][VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/lxd/deploy '/var/lib/one//datastores/0/5/deployment.0' 'lxd.cloud.sa16.lan' 5 lxd.cloud.sa16.lan
Thu Oct 31 23:57:08 2019 [Z0][VMM][I]: deploy: Processing disk 0
Thu Oct 31 23:57:08 2019 [Z0][VMM][I]: deploy: Using raw filesystem mapper for /var/lib/one/datastores/0/5/disk.0
Thu Oct 31 23:57:08 2019 [Z0][VMM][I]: deploy: Mapping disk at /var/snap/lxd/common/lxd/storage-pools/default/containers/one-5/rootfs using device /dev/loop12
Thu Oct 31 23:57:08 2019 [Z0][VMM][I]: deploy: Mounting /dev/loop12 at /var/snap/lxd/common/lxd/storage-pools/default/containers/one-5/rootfs
Thu Oct 31 23:57:08 2019 [Z0][VMM][I]: deploy: --- Starting container ---
Thu Oct 31 23:57:08 2019 [Z0][VMM][I]: deploy: Name: one-5
Thu Oct 31 23:57:08 2019 [Z0][VMM][I]: Location: none
Thu Oct 31 23:57:08 2019 [Z0][VMM][I]: Remote: unix://
Thu Oct 31 23:57:08 2019 [Z0][VMM][I]: Architecture: x86_64
Thu Oct 31 23:57:08 2019 [Z0][VMM][I]: Created: 2019/10/31 15:56 UTC
Thu Oct 31 23:57:08 2019 [Z0][VMM][I]: Status: Stopped
Thu Oct 31 23:57:08 2019 [Z0][VMM][I]: Type: persistent
Thu Oct 31 23:57:08 2019 [Z0][VMM][I]: Profiles: default
Thu Oct 31 23:57:08 2019 [Z0][VMM][I]:
Thu Oct 31 23:57:08 2019 [Z0][VMM][I]: Log:
Thu Oct 31 23:57:08 2019 [Z0][VMM][I]:
Thu Oct 31 23:57:08 2019 [Z0][VMM][I]: lxc one-5 20191031155702.613 ERROR start - start.c:start:2124 - No such file or directory - Failed to exec "/sbin/init"
Thu Oct 31 23:57:08 2019 [Z0][VMM][I]: lxc one-5 20191031155702.613 ERROR sync - sync.c:__sync_wait:62 - An error occurred in another process (expected sequence number 7)
Thu Oct 31 23:57:08 2019 [Z0][VMM][I]: lxc one-5 20191031155702.613 ERROR lxccontainer - lxccontainer.c:wait_on_daemonized_start:873 - Received container state "ABORTING" instead of "RUNNING"
Thu Oct 31 23:57:08 2019 [Z0][VMM][I]: lxc one-5 20191031155702.614 ERROR start - start.c:__lxc_start:2039 - Failed to spawn container "one-5"
Thu Oct 31 23:57:08 2019 [Z0][VMM][I]: lxc 20191031155702.654 WARN commands - commands.c:lxc_cmd_rsp_recv:135 - Connection reset by peer - Failed to receive response for command "get_state"
Thu Oct 31 23:57:08 2019 [Z0][VMM][I]:
Thu Oct 31 23:57:08 2019 [Z0][VMM][I]: deploy: Processing disk 0
Thu Oct 31 23:57:08 2019 [Z0][VMM][I]: deploy: Using raw filesystem mapper for /var/lib/one/datastores/0/5/disk.0
Thu Oct 31 23:57:08 2019 [Z0][VMM][I]: deploy: Unmapping disk at /var/snap/lxd/common/lxd/storage-pools/default/containers/one-5/rootfs
Thu Oct 31 23:57:08 2019 [Z0][VMM][I]: deploy: Umounting disk mapped at /dev/loop12
Thu Oct 31 23:57:08 2019 [Z0][VMM][I]: /var/tmp/one/vmm/lxd/client.rb:102:in `wait': {"type"=>"sync", "status"=>"Success", "status_code"=>200, "operation"=>"", "error_code"=>0, "error"=>"", "metadata"=>{"id"=>"c78dbf32-6d77-4015-9650-5eb7634fb6c1", "class"=>"task", "description"=>"Starting container", "created_at"=>"2019-10-31T23:57:01.51995379+08:00", "updated_at"=>"2019-10-31T23:57:01.51995379+08:00", "status"=>"Failure", "status_code"=>400, "resources"=>{"containers"=>["/1.0/containers/one-5"]}, "metadata"=>nil, "may_cancel"=>false, "err"=>"Failed to run: /snap/lxd/current/bin/lxd forkstart one-5 /var/snap/lxd/common/lxd/containers /var/snap/lxd/common/lxd/logs/one-5/lxc.conf: ", "location"=>"none"}} (LXDError)
Thu Oct 31 23:57:08 2019 [Z0][VMM][I]: from /var/tmp/one/vmm/lxd/container.rb:473:in `wait?'
Thu Oct 31 23:57:08 2019 [Z0][VMM][I]: from /var/tmp/one/vmm/lxd/container.rb:485:in `change_state'
Thu Oct 31 23:57:08 2019 [Z0][VMM][I]: from /var/tmp/one/vmm/lxd/container.rb:210:in `start'
Thu Oct 31 23:57:08 2019 [Z0][VMM][I]: from /var/tmp/one/vmm/lxd/deploy:68:in `<main>'
Thu Oct 31 23:57:08 2019 [Z0][VMM][I]: ExitCode: 1
Thu Oct 31 23:57:08 2019 [Z0][VMM][I]: Failed to execute virtualization driver operation: deploy.
Thu Oct 31 23:57:08 2019 [Z0][VMM][E]: Error deploying virtual machine
Thu Oct 31 23:57:08 2019 [Z0][VM][I]: New LCM state is BOOT_FAILURE

Thanks for your help.

Sure enough when I run lxd it says:

/snap/lxd/current/bin/lxd: error while loading shared libraries: libdqlite.so.0: cannot open shared object file: No such file or directory

I guess I’ll test first if lxd is going to run: LXD ERROR create VM

Thu Oct 31 23:57:08 2019 [Z0][VMM][I]: lxc one-5 20191031155702.613 ERROR start - start.c:start:2124 - No such file or directory - Failed to exec “/sbin/init”

It is possible that the image could be empty. Can you try manually mounting the VM disk located in the system datastore and list its contents ? It should be like /var/lib/one/datastores/0/5/disk.0 with 5 in this case being the VM id.

I had lxd 3.18 installed before reinstalling opennebula. So i purge opennebula-lxd-snap opennebula-node-lxd and lxd snap and reinstalled them and it worked:

Fri Nov 1 01:01:19 2019 [Z0][VM][I]: New state is ACTIVE
Fri Nov 1 01:01:19 2019 [Z0][VM][I]: New LCM state is PROLOG
Fri Nov 1 01:01:37 2019 [Z0][VM][I]: New LCM state is BOOT
Fri Nov 1 01:01:37 2019 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/6/deployment.0
Fri Nov 1 01:01:40 2019 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_context.
Fri Nov 1 01:01:41 2019 [Z0][VMM][I]: ExitCode: 0
Fri Nov 1 01:01:41 2019 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Fri Nov 1 01:01:51 2019 [Z0][VMM][I]: deploy: Processing disk 0
Fri Nov 1 01:01:51 2019 [Z0][VMM][I]: deploy: Using raw filesystem mapper for /var/lib/one/datastores/0/6/disk.0
Fri Nov 1 01:01:51 2019 [Z0][VMM][I]: deploy: Mapping disk at /var/snap/lxd/common/lxd/storage-pools/default/containers/one-6/rootfs using device /dev/loop11
Fri Nov 1 01:01:51 2019 [Z0][VMM][I]: deploy: Mounting /dev/loop11 at /var/snap/lxd/common/lxd/storage-pools/default/containers/one-6/rootfs
Fri Nov 1 01:01:51 2019 [Z0][VMM][I]: deploy: Mapping disk at /var/lib/one/datastores/0/6/mapper/disk.1 using device /dev/loop12
Fri Nov 1 01:01:51 2019 [Z0][VMM][I]: deploy: Mounting /dev/loop12 at /var/lib/one/datastores/0/6/mapper/disk.1
Fri Nov 1 01:01:51 2019 [Z0][VMM][I]: deploy: --- Starting container ---
Fri Nov 1 01:01:51 2019 [Z0][VMM][I]: ExitCode: 0
Fri Nov 1 01:01:51 2019 [Z0][VMM][I]: Successfully execute virtualization driver operation: deploy.
Fri Nov 1 01:01:51 2019 [Z0][VMM][I]: ExitCode: 0
Fri Nov 1 01:01:51 2019 [Z0][VMM][I]: Successfully execute network driver operation: post.
Fri Nov 1 01:01:51 2019 [Z0][VM][I]: New LCM state is RUNNING 

Thanks for your help.

Glad it worked. Keep in mind we only support lxd 3.0.x