What to do with a ghost VM?

FrankKC · March 19, 2024, 10:06pm

The Problem:

Hi there, guys. My team and I recently upgraded one of our OpenNebula clusters, but after the upgrade one VM somehow became a ghost: it only shows up in Sunstone and in the output of onevm list. If I try to open the VM's page in Sunstone, I get an error.

And when I try to show it via the OpenNebula CLI, I get the very same error:

onevm show 62
[one.vm.info] Error getting virtual machine [62].

I can't do anything with that VM: I can't delete it, show it, resume it, or even undeploy it! And because I can't undeploy it, the VM still has 2 NICs attached, which I now can't release.

Context:

OpenNebula is running on an upgraded Ubuntu 22.04.4 LTS.
The previous version was OpenNebula 6.4, and we upgraded to 6.8.
To upgrade, we followed the official documentation.

We noticed that the VM's log (/var/log/one/{$ID}.log) has completely disappeared! BUT the disks attached to that VM did not disappear from our SDS.

What have we tried in order to recover or at least delete the VM?

The list below shows the actions we took to recover or delete the VM, with no luck at all:

We tried the recover options in Sunstone
We tried recovering via the CLI: onevm recover --delete-db $ID
We tried onedb purge-history --id $ID

With every one of these attempts we got the same error: [one.vm.info] Error getting virtual machine [62].

What do we want?

At this point we only want to know what the heck happened to that VM and how to get rid of it!

We also want to delete it, since we have already replicated the VM in a new template using its disks from the SDS.

Thanks!!!

dclavijo · March 20, 2024, 2:01pm

Try running onedb fsck to correct the information in the database; things like references to a missing VM should be handled by it. Is the VM actually running on the hypervisor node? Also look in /var/log/one/oned.log for errors referencing that VM. Please also send the output of the following SQL commands, run against the OpenNebula database:

select oid from vm_pool;
select body from vm_pool where oid=62;
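If you want to repeat these two checks from a script (for example against a restored copy of the pre-upgrade backup), a minimal Python sketch could look like the one below. It is only an illustration under assumptions: it uses an in-memory SQLite database with a simplified vm_pool(oid, body) table instead of the full OpenNebula schema, and inspect_vm is a hypothetical helper name. On a MySQL/MariaDB backend you would pass a connection from your driver instead, whose placeholder style may differ from SQLite's "?". It only reads: it checks that the VM is listed, that its body exists and parses as XML, and reports STATE and LCM_STATE.

```python
import sqlite3
import xml.etree.ElementTree as ET


def inspect_vm(conn, vm_id):
    """Hypothetical helper: run the two diagnostic queries for one VM.

    Assumes a simplified table vm_pool(oid, body) where body holds the VM XML.
    Uses sqlite3's '?' placeholder style.
    """
    cur = conn.cursor()
    # Equivalent of: select oid from vm_pool;
    oids = [row[0] for row in cur.execute("SELECT oid FROM vm_pool ORDER BY oid")]
    # Equivalent of: select body from vm_pool where oid=62;
    row = cur.execute("SELECT body FROM vm_pool WHERE oid = ?", (vm_id,)).fetchone()
    report = {"listed": vm_id in oids, "body_found": row is not None}
    if row is None:
        return report
    try:
        root = ET.fromstring(row[0])
    except ET.ParseError as err:
        # A body that is not well-formed XML is reported, not fixed.
        report["xml_error"] = str(err)
        return report
    report["xml_error"] = None
    report["state"] = root.findtext("STATE")
    report["lcm_state"] = root.findtext("LCM_STATE")
    return report


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE vm_pool (oid INTEGER PRIMARY KEY, body TEXT)")
    # Made-up, heavily trimmed body; a real one is much longer.
    conn.execute(
        "INSERT INTO vm_pool VALUES (?, ?)",
        (62, "<VM><ID>62</ID><STATE>3</STATE><LCM_STATE>3</LCM_STATE></VM>"),
    )
    print(inspect_vm(conn, 62))
```

With the sample row it prints {'listed': True, 'body_found': True, 'xml_error': None, 'state': '3', 'lcm_state': '3'}.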

As for what happened, it's hard to tell. Ideally you have a backup of the database from before the upgrade, so you can repeat the upgrade process and inspect what happens by running onedb upgrade in --verbose mode.

FrankKC · March 20, 2024, 10:40pm

Hi there @dclavijo, thanks so much for your kind answer.

Yes, I do have a backup of the database from before any upgrade. My team and I recreated the situation in a virtual environment to learn more about this.

Before proceeding: I tried the onedb restore -f -v command (please note the -v), but I got no feedback on the command line. Is there any way to get feedback in my terminal from onedb restore?

So… we repeated the process and, hoping to get more info about this problem, we raised the log level to 5 in /etc/one/oned.conf:

LOG = [
  SYSTEM      = "file",
  DEBUG_LEVEL = 5,
  USE_VMS_LOCATION = "NO"
]

But no luck at all: we got no info about this VM in /var/log/one/oned.log.

Here is the output of the onedb upgrade --verbose:

And we still got the very same error:

As for the command outputs you asked for, I'll leave them here (sorry about the long texts):

select_oid.txt (534 Bytes)
vm62_body.txt (12.2 KB)

I guess we could somehow delete the VM from the database, but is there any way to recover the VM from this state?

dclavijo · March 25, 2024, 3:15pm

Have you tried running onedb fsck? In the output you sent, the VM appears in the vm_pool table with a valid-looking XML template in its body column, yet somehow your system is not able to query it.

You can manually tinker with the database entries using onedb change-body and onedb update-body to correct problems like this. You'd have to make the VM's database entry match what is really happening on the hypervisor node. Take a look at this section.

pczerny · March 28, 2024, 9:05pm

You write that you executed onedb purge-history --id $ID, but in the VM body I can see HISTORY_RECORDS->HISTORY->SEQ = 16. This is strange, because after purge-history the SEQ should be 1.
This indicates the issue is in the VM history records. Please paste the output of the following SQL:

select * from history where vid = 62; - this is to check the VM history
select * from local_db_versioning; - to check the history of OpenNebula upgrades

Do you remember if onevm show 62 works on version 6.4?
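To look for the inconsistency described above from a script, a rough Python sketch could compare the SEQ found in the VM body with the seq values stored for that VM in the history table. This is a hedged illustration, not an OpenNebula tool: history_mismatch is a hypothetical helper, the tables are simplified to vm_pool(oid, body) and history(vid, seq, body), the demo runs on in-memory SQLite with made-up rows, and treating "the body's SEQ has no matching history row" as the symptom is an assumption. As with any manual check, run it against a copy of the database.

```python
import sqlite3
import xml.etree.ElementTree as ET


def history_mismatch(conn, vm_id):
    """Hypothetical helper: compare the SEQ in a VM body with history rows.

    Assumes simplified tables vm_pool(oid, body) and history(vid, seq, body).
    Returns (seq_in_body, seqs_in_history_table, consistent).
    """
    row = conn.execute("SELECT body FROM vm_pool WHERE oid = ?", (vm_id,)).fetchone()
    if row is None:
        raise LookupError(f"VM {vm_id} not found in vm_pool")
    # SEQ of the (first) HISTORY element embedded in the VM XML body.
    seq_text = ET.fromstring(row[0]).findtext("HISTORY_RECORDS/HISTORY/SEQ")
    body_seq = int(seq_text) if seq_text is not None else None
    # Same idea as: select * from history where vid = 62;
    table_seqs = sorted(r[0] for r in conn.execute(
        "SELECT seq FROM history WHERE vid = ?", (vm_id,)))
    # Assumed symptom: the body references a SEQ with no matching history row.
    consistent = body_seq is None or body_seq in table_seqs
    return body_seq, table_seqs, consistent


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE vm_pool (oid INTEGER PRIMARY KEY, body TEXT)")
    conn.execute("CREATE TABLE history (vid INTEGER, seq INTEGER, body TEXT)")
    # Made-up rows: the body says SEQ 16, the table only holds seq 1.
    conn.execute(
        "INSERT INTO vm_pool VALUES (?, ?)",
        (62, "<VM><HISTORY_RECORDS><HISTORY><SEQ>16</SEQ>"
             "</HISTORY></HISTORY_RECORDS></VM>"),
    )
    conn.execute("INSERT INTO history VALUES (62, 1, '')")
    print(history_mismatch(conn, 62))
```

With the sample rows it prints (16, [1], False), meaning the body points at a history record that the table does not contain.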
