Problems setting up HA

I have the following problem: I have implemented OpenNebula 5.4 HA as described in the documentation:

http://docs.opennebula.org/5.4/advanced_components/ha/frontend_ha_setup.html

I get the following errors when I create a VM. I have searched all over the web without finding any solution.

Thu Sep 28 16:45:45 2017 [Z0][VM][I]: New state is ACTIVE
Thu Sep 28 16:45:45 2017 [Z0][VM][I]: New LCM state is PROLOG
Thu Sep 28 16:45:47 2017 [Z0][TM][I]: Command execution fail: /var/lib/one/remotes/tm/ssh/clone century-node1:/var/lib/one//datastores/1/f3d3b86a9d8892d4dccce2a2a7d28121 century-node2:/var/lib/one//datastores/0/13/disk.0 13 1
Thu Sep 28 16:45:47 2017 [Z0][TM][I]: clone: Cloning century-node1:/var/lib/one//datastores/1/f3d3b86a9d8892d4dccce2a2a7d28121 in /var/lib/one/datastores/0/13/disk.0
Thu Sep 28 16:45:47 2017 [Z0][TM][E]: clone: Command "scp -r century-node1:/var/lib/one//datastores/1/f3d3b86a9d8892d4dccce2a2a7d28121 century-node2:/var/lib/one//datastores/0/13/disk.0" failed: /var/lib/one//datastores/1/f3d3b86a9d8892d4dccce2a2a7d28121: No such file or directory
Thu Sep 28 16:45:47 2017 [Z0][TM][E]: Error copying century-node1:/var/lib/one//datastores/1/f3d3b86a9d8892d4dccce2a2a7d28121 to century-node2:/var/lib/one//datastores/0/13/disk.0
Thu Sep 28 16:45:47 2017 [Z0][TM][I]: ExitCode: 1
Thu Sep 28 16:45:47 2017 [Z0][TM][E]: Error executing image transfer script: Error copying century-node1:/var/lib/one//datastores/1/f3d3b86a9d8892d4dccce2a2a7d28121 to century-node2:/var/lib/one//datastores/0/13/disk.0
Thu Sep 28 16:45:47 2017 [Z0][VM][I]: New LCM state is PROLOG_FAILURE
Thu Sep 28 17:12:16 2017 [Z0][VM][I]: New LCM state is CLEANUP_DELETE
Thu Sep 28 17:12:20 2017 [Z0][VM][I]: New state is DONE
Thu Sep 28 17:12:20 2017 [Z0][VM][I]: New LCM state is LCM_INIT
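
A check like the following, run as the oneadmin user, should show whether the image file from the log is actually present on century-node1 (the path is the one reported by the TM driver above):

ssh century-node1 'ls -l /var/lib/one/datastores/1/f3d3b86a9d8892d4dccce2a2a7d28121'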

I hope you can help me.

Hi Oscar,

Can you provide some more details regarding your setup?

Without more context it is hard to figure out the roles of ‘century-node1’ and ‘century-node2’, and why OpenNebula expects to have an IMAGE datastore there.

Best Regards,
Anton Todorov

Thanks for your answer, Anton.

I have the following configuration.

In the zones:

ZONE 0 INFORMATION
ID : 0
NAME : Opennebula

ZONE SERVERS
ID NAME ENDPOINT
0 century-node1 http://192.168.14.2:2633/RPC2
1 century-node2 http://192.168.14.3:2633/RPC2

HA & FEDERATION SYNC STATUS
ID NAME STATE TERM INDEX COMMIT VOTE FED_INDEX
0 century-node1 leader 3811 86231 86231 0 -1
1 century-node2 follower 3811 86231 86231 0 -1

ZONE TEMPLATE
ENDPOINT="http://localhost:2633/RPC2"

And in the file /etc/one/oned.conf:

FEDERATION = [
MODE = "STANDALONE",
ZONE_ID = 0,
SERVER_ID = 0, # 0 on node1, 1 on node2
MASTER_ONED = ""
]

RAFT = [
LOG_RETENTION = 500000,
LOG_PURGE_TIMEOUT = 600,
ELECTION_TIMEOUT_MS = 2500,
BROADCAST_TIMEOUT_MS = 500,
XMLRPC_TIMEOUT_MS = 2000
]

# Executed when a server transits from follower->leader

RAFT_LEADER_HOOK = [
COMMAND = "raft/vip.sh",
ARGUMENTS = "leader br0 192.168.14.5/24"
]

# Executed when a server transits from leader->follower

RAFT_FOLLOWER_HOOK = [
COMMAND = "raft/vip.sh",
ARGUMENTS = "follower br0 192.168.14.5/24"
]
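
As a sanity check for these hooks, which node currently holds the floating IP can be verified on each server with something like:

ip addr show dev br0 | grep 192.168.14.5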

I have MySQL on both servers, no Galera, no cluster; the documentation recommends using MySQL separately on each node.

SSH keys are set up on both servers, and both servers have the same configuration, so the same files in /var/lib/one/.one.
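
For example, passwordless SSH and matching /var/lib/one/.one files can be verified with something like this, run as oneadmin from century-node1:

ssh century-node2 hostname
md5sum /var/lib/one/.one/*
ssh century-node2 'md5sum /var/lib/one/.one/*'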

I hope you can help me.

If you need more information, let me know.

Best Regards

Hi Oscar,

Thank you for the provided details.

First of all, your setup is in a weird state, because two nodes are not enough to reach consensus on which one should be the leader in a split-brain situation. You can find more on the topic in the Requirements and Architecture section of the documentation. The above link also hints at the root cause of your issue: by default you must have a shared filesystem between the HA nodes to hold the image files (this requirement can be lifted if you are using distributed storage with a driver that supports it).
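
For illustration only (century-node3 and the address 192.168.14.4 are placeholders, not part of your setup), registering a third server in the zone from the current leader would be something like:

onezone server-add 0 --name century-node3 --rpc http://192.168.14.4:2633/RPC2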

So, leaving the incomplete setup aside, you are in a situation where an appliance was imported on one of the nodes, then the leader changed, and when you try to instantiate a VM from the other node the file is not available.

If you can’t provide a third server with the same functionality, I would suggest leaving the HA setup for now and using only one of the nodes as the front-end. If you can add a third node, make sure there is a shared filesystem between the nodes to hold the imported files.
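
If you decide to go with a single front-end for now, the follower can be removed from the zone with something like this (server ID 1 is century-node2 in your listing above):

onezone server-del 0 1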

Hope this helps,

Best Regards,
Anton Todorov

I appreciate your help, it has been very valuable. Just one more thing: which folder do you recommend sharing between the nodes for failover, /var/lib/one or /var/lib/one/datastores?

Thanks for your help.

Regards.

Hi Oscar,

For the most flexibility I would suggest mounting a shared folder somewhere and replacing the datastore folders with symlinks to the relevant folders in the shared mount point.

Because the nodes are both controllers and hypervisors, I suggest switching to a setup with a completely shared filesystem. Something like:

mv /var/lib/one/datastores/0 /shared_mountpoint/0 && ln -s /shared_mountpoint/0 /var/lib/one/datastores/0
mv /var/lib/one/datastores/1 /shared_mountpoint/1 && ln -s /shared_mountpoint/1 /var/lib/one/datastores/1
mv /var/lib/one/datastores/2 /shared_mountpoint/2 && ln -s /shared_mountpoint/2 /var/lib/one/datastores/2

Where 0, 1 and 2 are the IDs of the default SYSTEM, IMAGES and FILES datastores. Take care to have proper permissions in place.
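
For example, assuming the default oneadmin user and group, something like:

chown -R oneadmin:oneadmin /shared_mountpoint/0 /shared_mountpoint/1 /shared_mountpoint/2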

Change TM_MAD to shared in the SYSTEM datastore attributes to complete the shared-datastore setup.
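
For example (0 is the SYSTEM datastore ID from the symlinks above):

onedatastore update 0              # opens the template in $EDITOR, set TM_MAD = "shared"
onedatastore show 0 | grep TM_MAD  # verify the change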

Best Regards,
Anton Todorov

Hi Anton,

Thank you again for your support. Apologies for my boldness, but is there a manual or URL you could point me to for guidance?
Thanks for the support.

Regards.

You can look at https://github.com/marcindulak/vagrant-opennebula-ha-tutorial-centos7 - it’s an example of a so-called hyper-converged setup, consisting of 3 OpenNebula hosts configured for HA using Gluster and SQLite.