Debugging error adding host

#1

Hi,

The first host I added in Sunstone was working, but I had issues creating guests, so I reformatted it. After removing the old host entry, I tried to add it again in Sunstone and ran into a problem.

The host fails to add, with:

Mon Feb 11 06:35:44 2019 : Error monitoring Host xxxxx (7): -

I pushed the logging level to 99, but I still only get one line in the log: error monitoring host.

I don't know what to do with so little information, so I was wondering whether /var/log/one/*log is the only place I can look for clues, because the message does not help a lot :)

Best regards,
Ghislain.


(Alejandro Huertas) #2

Hello,

After changing the logging level, did you restart oned?

Could you please send me the output of onehost show <HOST_ID> -x?

Thanks.


#3

Yes, but no change. So I ran apt-get upgrade to see, and got an upgrade from 5.7.80 to 5.7.85. Since then, oned refuses to restart, claiming it needs a DB of version 5.6.0 but found version 5.7.80…

So it seems I broke everything. I will reformat and start again from zero, but I will keep this command to try if I hit the same issue again.

Ghislain.


#4

Well, after a complete reinstall I get the same result. I set the debug level to 99 and restarted 'opennebula'. The only error I have in /var/log/one is:

Tue Feb 12 14:52:23 2019 [Z0][ONE][E]: Error monitoring Host xxxx

I replaced the host name with xxx, and I left out the -x option as its output interferes with the forum.

HOST 2 INFORMATION
ID : 2
NAME : xxxxxxxxxxxxxxxxxxxxxx
CLUSTER : default
STATE : ERROR
IM_MAD : lxd
VM_MAD : lxd
LAST MONITORING TIME : 02/12 16:12:05

HOST SHARES
RUNNING VMS : 0
MEMORY
TOTAL : 0K
TOTAL +/- RESERVED : 0K
USED (REAL) : 0K
USED (ALLOCATED) : 0K
CPU
TOTAL : 0
TOTAL +/- RESERVED : 0
USED (REAL) : 0
USED (ALLOCATED) : 0

MONITORING INFORMATION
CLUSTER_ID="0"
ERROR="Tue Feb 12 15:12:05 2019 : Error monitoring Host xxxxxxxxxxxxxxxx (2): -"
IM_MAD="lxd"
NAME="xxxxxxxxxxxxxxxx"
RESERVED_CPU=""
RESERVED_MEM=""
VM_MAD="lxd"

WILD VIRTUAL MACHINES

NAME IMPORT_ID CPU MEMORY

VIRTUAL MACHINES

ID USER     GROUP    NAME            STAT UCPU    UMEM HOST             TIME

(Alejandro Huertas) #5

Hello @ghis_le_curieux,

Could you please try the solution proposed here: 5.8 beta 2 upgrade?

Let me know if it works after making the changes.


#6

I reformatted both the controller and the host and redid the install completely from the start, but again I get the same thing. The host refuses to be added, with a single error line saying:

[Z0][ONE][E]: Error monitoring Host xxxxx

and nothing to help me debug further.

I see the controller connecting to the host over SSH, but it just opens and closes SSH connections and that is all that happens. I tried to install the host as KVM instead of LXD and got the same result.


#7

on the host:

Feb 13 14:18:44 xxx sshd[10603]: User child is on pid 10681
Feb 13 14:18:44 xxx sshd[10681]: Starting session: command for oneadmin from xx.77.134.120 port 42186 id 0
Feb 13 14:18:44 xxx sshd[10681]: Close session: user oneadmin from xx.77.134.120 port 42186 id 0
Feb 13 14:18:44 xxx sshd[10681]: Received disconnect from xx.77.134.120 port 42186:11: disconnected by user
Feb 13 14:18:44 xxx sshd[10681]: Disconnected from user oneadmin xx.77.134.120 port 42186


#8

Upgraded to 5.7.90: no change, no logs, no error other than the one line.


#9

ohhhh

just got one:
Wed Feb 13 15:39:15 2019 [Z0][InM][I]: Command execution failed (exit code: 1): scp -rp /var/lib/one/remotes/. xxxxxxxxxxxx:/var/tmp/one
Wed Feb 13 15:39:15 2019 [Z0][InM][I]: scp: error: unexpected filename: .

Going to look into it.
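
Note that the useful lines here are logged at `[I]` level by `[InM]`, not as `[E]` errors, so filtering the log for errors only would miss them. A minimal sketch of a filter that surfaces both, assuming the daemon log is /var/log/one/oned.log (the thread only names the /var/log/one directory) and using a hypothetical helper name:

```shell
#!/bin/bash
# Hypothetical helper: print monitoring errors and failed driver commands,
# each followed by one line of context (where the scp message appeared above).
show_monitor_failures() {
    local logfile="${1:-/var/log/one/oned.log}"   # assumed default path
    grep -A1 -E 'Command execution failed|Error monitoring Host' "$logfile"
}
```

Calling `show_monitor_failures` with no argument reads the assumed default log; pass another path to inspect a different file.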


#10

Well, I tried, but I cannot find why the scp fails. The errors are:

https://paste.lucko.me/bdag4tlZ6b

I really don't see why scp fails, or why the command uses dir/. instead of dir/.

Both the controller and the host are stock Ubuntu 18.04 installs. Any idea?

Ghislain


#11

If I try the very same scp command without the '.' at the end, it works well…
Copying the files manually does not help the host get added.


#12

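So I created a proxy command:

`#!/bin/bash
# Proxy for scp: strip a trailing '.' from each argument, then run scp.

commande='scp'
#commande='echo'   # uncomment to print the arguments instead of copying

for string in "$@"; do
    string=${string%.}
    commande+=" $string"
done

# Left unquoted on purpose so the string splits back into separate arguments.
$commande`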

and


(Daniel Clavijo Coca) #13

hey, this may solve your issues https://github.com/OpenNebula/one/commit/f020fa20b90154d37ce06b37f07ea38cac5fbeb2


#14

Hi Daniel,

Well, it does not. :(

The funny thing is that I still get the same message, even though grepping the whole directory shows no other occurrence of the scp … thing…

I even rebooted the host to be sure nothing was cached.

For the directory creation, should it be

    #recreate remote dir structure
    SSHCommand.run("mkdir -p #{remote_dir}",host,logger)

instead of the first part? This is how it is done in /usr/lib/one/ruby/CommandManager.rb.

So I got the same error even after a complete reboot of the controller:

Fri Feb 15 17:33:16 2019 [Z0][InM][I]: Command execution failed (exit code: 1): scp -r /var/lib/one/remotes/. xxxxxxxx:/var/tmp/one
Fri Feb 15 17:33:16 2019 [Z0][InM][I]: scp: error: unexpected filename: .
Fri Feb 15 17:33:16 2019 [Z0][ONE][E]: Error monitoring Host

There is no longer any scp with a . in either /usr/lib/one/ruby/CommandManager.rb or onehost_helper.rb, but I still get this error… I don't understand.

regards,
Ghislain.


#15

It seems that changing
/var/lib/one/remotes/tm/ssh/ln

by adding the following before SCP

SRC_PATH_SNAP=${SRC_PATH_SNAP%.}
SRC=${SRC%.}

solves it.
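
The workaround relies on the shell parameter expansion `${VAR%.}`, which removes a single trailing `.` from the value if one is present and leaves it untouched otherwise. A small sketch of that behaviour, with a hypothetical function name and example paths borrowed from the log lines earlier in the thread:

```shell
#!/bin/bash
# Hypothetical helper illustrating ${VAR%.}: drop one trailing dot, if any.
strip_trailing_dot() {
    local path="$1"
    printf '%s\n' "${path%.}"
}

strip_trailing_dot "/var/lib/one/remotes/."   # prints /var/lib/one/remotes/
strip_trailing_dot "/var/tmp/one"             # prints /var/tmp/one (unchanged)
```

Only the last character is examined, so paths that merely contain a dot elsewhere are not affected.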


#16

It seems the bug has been found and corrected for the next release.


(Rafael) #17

this fix doesn’t work for me… perhaps I have to restart something?
