[SOLVED] 4.14.2 - VMs state change to POWEROFF for no evident reason

Hi,

I had a problem this morning where some VMs' state changed to POWEROFF and I don't understand why. I was not interacting with OpenNebula (I was sleeping), yet the VMs went through the SAVE_MIGRATE, PROLOG_MIGRATE and BOOT_MIGRATE states. The command "/var/tmp/one/vnm/802.1Q/clean" failed and finally the VMs entered the POWEROFF state.

I created a pastebin with the VM log: http://pastebin.com/BihPEhAt

Thank you.

ONE 4.14.2 on CentOS 7.2

I found the error below in oned.log but I don't know why it occurred.

Thu Dec 17 23:24:31 2015 [Z0][ONE][E]: Error monitoring Host r620 (10): error: failed to connect to the hypervisor
error: authentication failed : access denied by policy
ERROR MESSAGE --8<------
Error executing kvm.rb
ERROR MESSAGE ------>8–
ARCH=x86_64
MODELNAME="Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz"
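
You can reproduce this monitoring failure by hand by opening a libvirt connection as the oneadmin user on the host (a quick check of my own; it goes through the same polkit CheckAuthorization path that the monitor driver uses):

# run on the failing host as root
sudo -u oneadmin virsh -c qemu:///system list --all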

My problem comes from the polkitd error below on the host. It is also related to this post. I will try the Vlastimil_Holer solution.

dec 19 01:43:29 x10 kernel: traps: polkitd[22331] general protection ip:7f8439aa27f2 sp:7fff04b9cb10 error:0 in libmozjs-17.0.so[7f8439966000+3af000]
dec 19 01:43:29 x10 libvirtd[1747]: error from service: CheckAuthorization: Message did not receive a reply (timeout by message bus)
dec 19 01:43:29 x10 libvirtd[1747]: End of file while reading data: Erreur d'entrée/sortie

("Erreur d'entrée/sortie" is a localized "input/output error".)

Same problem as in this post.
Solved by applying Vlastimil_Holer's solution.
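
For reference, the kind of workaround that avoids the crashing polkitd path altogether is to let libvirt rely on plain Unix socket permissions instead of polkit. A minimal sketch (it may differ from the exact fix in the linked post):

# In /etc/libvirt/libvirtd.conf, replace polkit auth with socket permissions:
#   unix_sock_group = "oneadmin"
#   unix_sock_rw_perms = "0770"
#   auth_unix_ro = "none"
#   auth_unix_rw = "none"
systemctl restart libvirtd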

I had a similar problem when preparing a migration script to migrate VMs between clusters. I made several mistakes there, so my VMs were counted on multiple hosts.

So if you have these two types of alternating, repeating messages in the VM log:

[Z0][LCM][I]: VM found again by the drivers
[Z0][VM][I]: New LCM state is RUNNING

and

[Z0][LCM][I]: VM running but monitor state is POWEROFF
[Z0][VM][I]: New LCM state is SHUTDOWN_POWEROFF
[Z0][VM][I]: New state is POWEROFF
[Z0][VM][I]: New LCM state is LCM_INIT

And a long placement history with a lot of monitor operations, then you have exactly the same situation.
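
You can check how many history records a VM has accumulated with a quick xmlstarlet query (my own check, along the same lines as the script below; ${VMID} is a placeholder):

onevm show -x ${VMID} | xmlstarlet sel -t -v 'count(/VM/HISTORY_RECORDS/HISTORY)' -n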

This is how I solved it.

Find the duplicated VMs:

# List every VM ID reported by every host, count duplicates,
# then print the hosts that claim each duplicated VM.
onehost list -x | xmlstarlet sel -t -v '/HOST_POOL/HOST/VMS/ID' -n | sort | uniq -c | sort | while read dup vm; do
  # "dup" is the count from uniq -c, "vm" the VM ID
  if [ "$dup" != "1" ]; then
    echo "vm $vm duplicated $dup times, on hosts:"
    onehost list -x | xmlstarlet sel -t -v "/HOST_POOL/HOST[VMS/ID/.=${vm}]/ID" -n
  fi
done
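
With hypothetical IDs, the output looks like this (VM 42 recorded on hosts 3 and 7 at the same time):

vm 42 duplicated 2 times, on hosts:
3
7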

Remove the information about the VM from the host in the DB:

onedb change-body host --id "${HOSTID}" "HOST/VMS/ID[.=${VMID}]" -d
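
Note that onedb edits the database directly; I stopped oned and took a backup before running these commands. A sketch, assuming the default SQLite backend (use the MySQL connection options otherwise):

systemctl stop opennebula
# keep a copy of the DB before touching it; adjust -s for your setup
onedb backup -v -s /var/lib/one/one.db /var/lib/one/one.db.bak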

Clear the placement history for the VM:

onedb change-body vm --id ${VMID} '/VM/HISTORY_RECORDS/HISTORY/SEQ' 0
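
Afterwards, start oned again and verify the result (these xmlstarlet checks are just my own way of verifying; ${VMID} is a placeholder):

systemctl start opennebula
# the VM should now be listed on a single host only
onehost list -x | xmlstarlet sel -t -v "/HOST_POOL/HOST[VMS/ID/.=${VMID}]/ID" -n
# and the placement history should start again from SEQ 0
onevm show -x ${VMID} | xmlstarlet sel -t -v '/VM/HISTORY_RECORDS/HISTORY/SEQ' -n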