Hi there guys. My team and I recently upgraded one of our OpenNebula clusters, but after the upgrade one VM somehow became a ghost: it only shows up in Sunstone and in the output of onevm list. If I try to open the VM page I get an error.
And when I try to show it via the OpenNebula CLI I get the very same error:
onevm show 62
[one.vm.info] Error getting virtual machine [62].
I can’t do anything with that VM: I can’t delete it, show it, resume it, or even undeploy it! And since I can’t undeploy it, the VM still has 2 NICs attached that I now can’t release.
Context:
OpenNebula is running on an upgraded Ubuntu 22.04.4 LTS.
The previous version was OpenNebula 6.4, and we upgraded to 6.8.
We noticed that the VM’s log (the one under /var/log/one/{$ID}.log) has completely disappeared! BUT the disks attached to that VM did not disappear from our SDS.
What have we tried in order to recover or at least delete the VM?
This is the list of actions we have taken to recover or delete the VM, with no luck at all:
We tried the recover options from Sunstone
We tried recovering via the CLI: onevm recover --delete-db $ID
We tried onedb purge-history --id $ID
Every one of these attempts returned the same [one.vm.info] Error getting virtual machine [62].
What do we want?
At this point we only want to know what the heck happened with that VM and how we can get rid of it!
We also want to delete it because we have already been able to replicate the VM in a new template using the disks from the SDS.
Try running onedb fsck to correct inconsistencies in the database. Things like references to a missing VM should be handled by it. Is the VM actually running on the hypervisor node? Also try looking in /var/log/one/oned.log for errors referencing that VM. Please also send the output of the following SQL commands run against the OpenNebula database:
select oid from vm_pool;
select body from vm_pool where oid=62;
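For anyone who wants to run the same two queries and sanity-check the result in one go, here is a minimal sketch. It assumes the SQLite backend and a vm_pool table reduced to just the oid and body columns; a MySQL/MariaDB setup would need a different driver. The helper names (check_vm_bodies, demo_connection) and the toy rows are made up for illustration and are not taken from the real database.

```python
import sqlite3
import xml.etree.ElementTree as ET


def check_vm_bodies(conn):
    """Return {oid: problem} for every vm_pool row whose body looks suspicious."""
    problems = {}
    for oid, body in conn.execute("SELECT oid, body FROM vm_pool ORDER BY oid"):
        if body is None:
            problems[oid] = "body is NULL"
            continue
        try:
            root = ET.fromstring(body)
        except ET.ParseError as exc:
            problems[oid] = f"body is not well-formed XML: {exc}"
            continue
        # The <ID> stored inside the XML should agree with the oid column.
        xml_id = root.findtext("ID")
        if xml_id != str(oid):
            problems[oid] = f"<ID> in body is {xml_id!r}, expected {oid}"
    return problems


def demo_connection():
    """Build a throwaway in-memory database with two hypothetical VMs."""
    conn = sqlite3.connect(":memory:")
    # Assumption: only the two columns used by the queries above are modelled.
    conn.execute("CREATE TABLE vm_pool (oid INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO vm_pool VALUES (?, ?)",
        [
            (61, "<VM><ID>61</ID><NAME>ok</NAME></VM>"),
            (62, "<VM><ID>62</ID><NAME>ghost</NAME>"),  # truncated on purpose
        ],
    )
    return conn


if __name__ == "__main__":
    for oid, problem in check_vm_bodies(demo_connection()).items():
        print(oid, problem)
```

To point it at a copy of a real SQLite database, replace demo_connection() with sqlite3.connect on that copy. An empty result only means the bodies parse and their IDs match; it does not rule out problems in other tables.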
As for what happened, it is hard to tell. Ideally you have a backup of the database from before the upgrade, so you can repeat the upgrade process and inspect what happens by running onedb upgrade with the --verbose option.
Hi there @dclavijo, thanks so much for your kind answer.
Yes, I do have a backup of the database from before any upgrade. My team and I recreated the situation in a virtual environment to learn more about this.
Before proceeding: I tried the onedb restore -f -v command (please note the -v) but got no feedback on the command line. Is there any way to get feedback in my terminal from onedb restore?
So… we repeated the process and, hoping to get more info about this problem, we raised the log level to 5 in /etc/one/oned.conf:
Have you tried running onedb fsck? In the output you sent, the VM appears in the vm_pool table with a clear XML template in its body column, yet somehow your system is not able to query it.
You can manually tinker with the database entries using onedb change-body and onedb update-body to correct problems like this. You’d have to make it so the VM database entry matches what is really happening on the hypervisor node. Take a look at this section.
You wrote that you executed onedb purge-history --id $ID, but in the VM body I can see HISTORY_RECORDS->HISTORY->SEQ = 16. This is strange, because after purge-history the SEQ should be 1.
This indicates the issue is in the VM history records. Please paste the output of the following SQL:
select * from history where vid = 62; - this is to check the VM history
select * from local_db_versioning; - to check the history of OpenNebula upgrades
Do you remember if onevm show 62 worked on version 6.4?
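The history check above can also be scripted. The sketch below reads the SEQ that the VM body references and looks for a matching row in the history table. It assumes the SQLite backend and simplified tables with only the oid/body and vid/seq columns, and the demo data is invented to show one possible kind of mismatch, not a confirmed diagnosis of this case. history_mismatch is a hypothetical helper name.

```python
import sqlite3
import xml.etree.ElementTree as ET


def history_mismatch(conn, vid):
    """Return a description of the mismatch for VM `vid`, or None if consistent."""
    row = conn.execute("SELECT body FROM vm_pool WHERE oid = ?", (vid,)).fetchone()
    if row is None:
        return f"VM {vid} is not in vm_pool"
    # Path taken from the thread: HISTORY_RECORDS -> HISTORY -> SEQ
    seq_text = ET.fromstring(row[0]).findtext("HISTORY_RECORDS/HISTORY/SEQ")
    if seq_text is None:
        return None  # body carries no history record, nothing to compare
    body_seq = int(seq_text)
    seqs = [
        r[0]
        for r in conn.execute(
            "SELECT seq FROM history WHERE vid = ? ORDER BY seq", (vid,)
        )
    ]
    if body_seq not in seqs:
        return f"body references SEQ {body_seq}, history table has {seqs}"
    return None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Assumption: reduced schemas, only the columns this check needs.
    conn.execute("CREATE TABLE vm_pool (oid INTEGER PRIMARY KEY, body TEXT)")
    conn.execute("CREATE TABLE history (vid INTEGER, seq INTEGER, body TEXT)")
    body = (
        "<VM><ID>62</ID><HISTORY_RECORDS><HISTORY>"
        "<OID>62</OID><SEQ>16</SEQ>"
        "</HISTORY></HISTORY_RECORDS></VM>"
    )
    conn.execute("INSERT INTO vm_pool VALUES (62, ?)", (body,))
    # Invented scenario: no history rows exist for VM 62.
    print(history_mismatch(conn, 62))
```

Run it only against a copy of the database. If it reports a mismatch, that is a hint about where to look with the SQL above rather than something to fix by hand straight away.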