OpenNebula 5.6 RAFT two nodes


(Razvan Crainea) #1

Hello!

I’m running OpenNebula 5.6.1 on Debian 9 with two nodes in an HA setup using RAFT. Everything runs fine until the leader fails (or is shut down using systemctl). When I stop the leader (systemctl stop opennebula), the second node, which (I hope) should become leader, gets stuck in the candidate state:
HA & FEDERATION SYNC STATUS
ID  NAME      STATE      TERM  INDEX  COMMIT  VOTE  FED_INDEX
 0  10.0.0.2  error      -     -      -       -     -
 1  10.0.0.3  candidate  1006  27400  0       -1    -1

The only logs I see are:
Mon Oct 15 19:25:00 2018 [Z0][RCM][I]: Error requesting vote from follower 0:libcurl failed to execute the HTTP POST transaction, explaining: Failed to connect to 10.0.0.2 port 2633: Connection refused
Mon Oct 15 19:25:00 2018 [Z0][RCM][I]: No leader found, starting new election in 2790ms

I expect to see these errors, since the leader node is down, but I also expect the failover node (10.0.0.3) to take over and become leader. Are my expectations correct, or is there something wrong with my scenario?

Thank you!
Răzvan


(Anton Todorov) #2

Hi @razvanc,

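OpenNebula recommends 3 or 5 nodes.

The RAFT consensus algorithm needs a majority of the nodes, floor(N/2)+1, to be available to form a quorum; this requirement is what protects against split-brain. With two nodes the majority is both of them, so the remaining node in your case cannot win an election on its own and stays a candidate until the other node becomes available again.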
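
To make the arithmetic concrete, here is a minimal Python sketch of the majority rule. It is only an illustration of generic Raft quorum math, not OpenNebula code, and the function names are made up for this example.

```python
def quorum(n_servers: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return n_servers // 2 + 1


def tolerated_failures(n_servers: int) -> int:
    """How many servers can be down while a quorum is still reachable."""
    return n_servers - quorum(n_servers)


if __name__ == "__main__":
    for n in (2, 3, 5):
        print(f"servers={n} quorum={quorum(n)} "
              f"tolerated_failures={tolerated_failures(n)}")
```

With two servers the quorum is 2, so no failure can be tolerated; three servers tolerate one failure and five tolerate two.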

Hope this helps.

Best Regards,
Anton Todorov


(Razvan Crainea) #3

Hi, Anton!

Thank you for your prompt response! This is actually what I suspected too, but I couldn’t find the hard requirement of having N/2+1 nodes stated anywhere; I only saw the recommendation you pointed out. TBH, I didn’t read the RAFT specification, sorry about that!
Do you know if there is a way to add a third, lightweight node (not a full installation) just to ensure consensus? I’m not sure whether deploying a generic RAFT implementation would help, since it would still need to implement some of the OpenNebula logic.

Thanks,
Răzvan


(Anton Todorov) #4

Hi Razvan,

I am not sure whether it is possible to add just a voting beacon. Please feel free to open a feature request, though.

Best Regards,
Anton Todorov


(Razvan Crainea) #5

Done! One can follow the feature request here.

Thank you very much for your help!
Răzvan


(petr108m) #6

The status of the feature says "Code committed to upstream release/hotfix branches".

Does that mean it can already be downloaded and installed?
Can you provide details?


(Razvan Crainea) #7

According to the ticket, nothing has been done yet. Those are just checklist items to be ticked off when completed.


(Ruben S. Montero) #8

Hi,

AFAIK this is not possible for RAFT. A node must be leader, follower or candidate. Note that log entries are committed once a majority of followers have replicated the entry, so the algorithm assumes that any of them could take the leadership in case of failure…
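
As a rough illustration of the commit rule, here is a small Python sketch. It assumes a generic Raft model in which each server (leader included) reports the last log index it has stored; the function name is hypothetical, this is not how oned implements it, and it ignores Raft's additional rule that the entry must belong to the leader's current term.

```python
from typing import Sequence


def commit_index(replicated: Sequence[int]) -> int:
    """Highest log index that is stored on a strict majority of servers.

    `replicated` holds one entry per server (leader included): the last
    log index that server is known to have stored.
    """
    majority = len(replicated) // 2 + 1
    # After sorting in descending order, the value at position
    # majority - 1 is stored by at least `majority` servers.
    return sorted(replicated, reverse=True)[majority - 1]


if __name__ == "__main__":
    # Three servers: two have index 27400, one lags at 27390.
    print(commit_index([27400, 27400, 27390]))
    # Two servers: nothing commits past what both have stored.
    print(commit_index([27400, 27395]))
```

Every server counted toward that majority holds the full committed log, which is why a vote-only member that stores nothing does not fit the model.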

I guess the lightweight approach would be to create a VM with your third oned server running in it…