Bonding, VLAN, bridge setup fails

Hello everyone,

I have some serious problems getting our large setup working. The main problem seems to be with ARP.

First, some info about my setup:

OS: CentOS 7
One: 5.4
Network: VLAN setup
Storage: Ceph, but this is not the problem at all :slight_smile: 

I have several nodes using KVM-based virtualization. I have set up a bond named bond1, which is trunked to deliver several VLANs to my nodes. The bond uses 802.3ad with LACP, and so far this is working.
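For reference, the bond can be sketched with iproute2 like this (interface names and option values are assumptions based on my setup; the persistent version lives in ifcfg files on CentOS 7, and the commands need root and real NICs):

```shell
# Sketch only: create an 802.3ad (LACP) bond and enslave two NICs.
ip link add bond1 type bond mode 802.3ad miimon 100 lacp_rate fast
ip link set ens3f0 down && ip link set ens3f0 master bond1
ip link set ens3f1 down && ip link set ens3f1 master bond1
ip link set bond1 up
```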

Now I have created several VMs, served by the following virtual network (IPs are changed…):

[root@sun01 ~]# onevnet show 0
VIRTUAL NETWORK 0 INFORMATION
ID             : 0
NAME           : cloudtest-193
USER           : oneadmin
GROUP          : oneadmin
CLUSTERS       : 0
BRIDGE         : onebr.57
VN_MAD         : 802.1Q
PHYSICAL DEVICE: bond1
VLAN ID        : 57
USED LEASES    : 7

PERMISSIONS
OWNER          : um-
GROUP          : ---
OTHER          : ---

VIRTUAL NETWORK TEMPLATE
BRIDGE="onebr.57"
DNS="8.8.8.8"
GATEWAY="aa.bb.193.1"
NETWORK_MASK="255.255.255.0"
PHYDEV="bond1"
SECURITY_GROUPS="0"
VLAN_ID="57"
VN_MAD="802.1Q"

ADDRESS RANGE POOL
AR 0
SIZE           : 50
LEASES         : 7

RANGE                                   FIRST                               LAST
MAC                         02:00:3e:71:c1:0a                  02:00:3e:71:c1:3b
IP                              aa.bb.193.10                      aa.bb.193.59


LEASES
AR  OWNER                         MAC              IP                        IP6
0   V:0             02:00:3e:71:c1:0a   aa.bb.193.10                          -
0   V:1             02:00:3e:71:c1:0b   aa.bb.193.11                          -
0   V:3             02:00:3e:71:c1:0c   aa.bb.193.12                          -
0   V:5             02:00:3e:71:c1:0d   aa.bb.193.13                          -
0   V:6             02:00:3e:71:c1:0e   aa.bb.193.14                          -
0   V:7             02:00:3e:71:c1:0f   aa.bb.193.15                          -
0   V:8             02:00:3e:71:c1:10   aa.bb.193.16                          -

VIRTUAL ROUTERS
[root@sun01 ~]#

So the VMs are created fine, the bridges are created correctly, and the MACs seem to be learned. The problem is that I have no connectivity: I cannot ping my VMs, nor can the VMs reach the outside world.

See the MACs on the bridge of one of the virtualization nodes:

[root@virt01 ~]# brctl showmacs onebr.57
port no    mac addr        is local?    ageing timer
  2    02:00:3e:71:c1:0a    no         135.40
  3    02:00:3e:71:c1:10    no         111.37
  1    48:df:37:03:00:10    yes           0.00
  1    ec:3e:f7:93:9b:c0    no         111.36
  2    fe:00:3e:71:c1:0a    yes           0.00
  2    fe:00:3e:71:c1:0a    yes           0.00
  3    fe:00:3e:71:c1:10    yes           0.00
  3    fe:00:3e:71:c1:10    yes           0.00
[root@virt01 ~]#

The MACs are also correctly learned on the switch (the switch sees some more MACs from other nodes):

user@fra1-pod02-c18-vccs01> show ethernet-switching table vlan 57
Ethernet-switching table: 6 unicast entries
  VLAN                MAC address       Type         Age Interfaces
  vlan57            *                 Flood          - All-members
  vlan57            02:00:3e:71:c1:0a Learn          0 ae14.0
  vlan57            02:00:3e:71:c1:0b Learn         52 ae15.0
  vlan57            02:00:3e:71:c1:0d Learn       2:57 ae15.0
  vlan57            02:00:3e:71:c1:0e Learn       2:06 ae15.0
  vlan57            02:00:3e:71:c1:10 Learn       3:02 ae14.0
  vlan57            ec:3e:f7:93:9b:c0 Learn          0 ae0.0

{master:0}
user@fra1-pod02-c18-vccs01>

I have double-checked the network configuration on our switches and routers, and everything is fine. I have even connected a server directly to the VLAN over a bond, and that works.

From what I have found out so far, there seems to be a problem with ARP. I have done some basic tcpdump tests and see that ARP packets do not seem to reach the VMs. Somehow the ARP traffic is not being distributed correctly.

Is there anyone out there who has faced a similar problem or has an idea what might be causing this? Maybe I am just missing a kernel option?
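One kernel-level setting that might be worth ruling out is bridge-netfilter, which passes bridged frames through iptables/arptables when enabled (these sysctls only exist while the br_netfilter code is active, and checking them needs root):

```shell
# Show whether bridged traffic is run through the netfilter hooks;
# a value of 1 means iptables/arptables also see bridged frames.
sysctl net.bridge.bridge-nf-call-iptables \
       net.bridge.bridge-nf-call-arptables 2>/dev/null
```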

Thank you for your input.

Could you check the firewall rules? Something like iptables-save on the hypervisors, just to double-check that your security group configuration is not filtering out ICMP.
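A minimal version of that check, with packet counters so a matching DROP rule stands out (needs root on the hypervisor):

```shell
# Dump all rules with packet/byte counters; a DROP rule whose
# counter grows while you ping is the one filtering your traffic.
iptables-save -c
iptables -L -n -v
```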

I have checked for iptables rules. There are no rules except the ones created by OpenNebula. I am using the default security group, which accepts everything as far as I understand. I have now dug a bit deeper:

  1. I created a monitor port on the switch interface to see if the packets are transmitted correctly. They are; see here:

     [root@sun03 ~]# tcpdump -n -i eno49 -v "icmp"
     tcpdump: WARNING: eno49: no IPv4 address assigned
     tcpdump: listening on eno49, link-type EN10MB (Ethernet), capture size 65535 bytes
     14:36:29.532780 IP (tos 0x0, ttl 57, id 44291, offset 0, flags [none], proto ICMP (1), length 84)
         xx.yy.250.54 > aa.bb.193.10: ICMP echo request, id 54901, seq 765, length 64
     14:36:30.532912 IP (tos 0x0, ttl 57, id 50880, offset 0, flags [none], proto ICMP (1), length 84)
         xx.yy.250.54 > aa.bb.193.10: ICMP echo request, id 54901, seq 766, length 64
     14:36:31.533265 IP (tos 0x0, ttl 57, id 46975, offset 0, flags [none], proto ICMP (1), length 84)
         xx.yy.250.54 > aa.bb.193.10: ICMP echo request, id 54901, seq 767, length 64
     14:36:32.533542 IP (tos 0x0, ttl 57, id 16161, offset 0, flags [none], proto ICMP (1), length 84)
         xx.yy.250.54 > aa.bb.193.10: ICMP echo request, id 54901, seq 768, length 64
     14:36:33.535101 IP (tos 0x0, ttl 57, id 42287, offset 0, flags [none], proto ICMP (1), length 84)
         xx.yy.250.54 > aa.bb.193.10: ICMP echo request, id 54901, seq 769, length 64
     14:36:34.533960 IP (tos 0x0, ttl 57, id 15481, offset 0, flags [none], proto ICMP (1), length 84)
         xx.yy.250.54 > aa.bb.193.10: ICMP echo request, id 54901, seq 770, length 64
    

So the ICMP echo request is transmitted via the switch interface towards my node.

  2. In the next step I checked whether the packet reaches my bridge:

     [root@virt01 ~]# tcpdump -n -i onebr.57 -v "icmp or arp"
     tcpdump: WARNING: onebr.57: no IPv4 address assigned
     tcpdump: listening on onebr.57, link-type EN10MB (Ethernet), capture size 65535 bytes
     14:36:17.667438 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has aa.bb.193.1 tell aa.bb.193.10, length 28
     14:36:17.682304 ARP, Ethernet (len 6), IPv4 (len 4), Reply aa.bb.193.1 is-at ec:3e:f7:93:9b:c0, length 46
     14:37:01.603323 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has aa.bb.193.1 tell aa.bb.193.10, length 28
     14:37:01.610296 ARP, Ethernet (len 6), IPv4 (len 4), Reply aa.bb.193.1 is-at ec:3e:f7:93:9b:c0, length 46
     14:37:43.267190 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has aa.bb.193.1 tell aa.bb.193.10, length 28
     14:37:43.277501 ARP, Ethernet (len 6), IPv4 (len 4), Reply aa.bb.193.1 is-at ec:3e:f7:93:9b:c0, length 46
     14:38:13.427124 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has aa.bb.193.1 tell aa.bb.193.10, length 28
     14:38:13.439502 ARP, Ethernet (len 6), IPv4 (len 4), Reply aa.bb.193.1 is-at ec:3e:f7:93:9b:c0, length 46
     14:38:47.363104 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has aa.bb.193.1 tell aa.bb.193.10, length 28
     14:38:47.379275 ARP, Ethernet (len 6), IPv4 (len 4), Reply aa.bb.193.1 is-at ec:3e:f7:93:9b:c0, length 46
    

I do not receive any ICMP requests on my bridge, so I have no connectivity.

  3. So I checked my bonding configuration again and do not see any problems. I have the same bonding setup running on several servers, but without bridges attached to it.

I see that the packets are coming in on one of my bonding slaves:

[root@virt01 ~]# tcpdump -n -i ens3f0 -v "icmp"
tcpdump: WARNING: ens3f0: no IPv4 address assigned
tcpdump: listening on ens3f0, link-type EN10MB (Ethernet), capture size 65535 bytes
14:43:28.233809 IP (tos 0x0, ttl 57, id 63566, offset 0, flags [none], proto ICMP (1), length 84)
    xx.yy.250.54 > aa.bb.193.10: ICMP echo request, id 54901, seq 1183, length 64
14:43:29.234201 IP (tos 0x0, ttl 57, id 18080, offset 0, flags [none], proto ICMP (1), length 84)
    xx.yy.250.54 > aa.bb.193.10: ICMP echo request, id 54901, seq 1184, length 64
14:43:30.234246 IP (tos 0x0, ttl 57, id 47359, offset 0, flags [none], proto ICMP (1), length 84)
    xx.yy.250.54 > aa.bb.193.10: ICMP echo request, id 54901, seq 1185, length 64

But they never make it to the bridge that serves my VMs.
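When a flow arrives on a slave but never reaches the bridge, comparing the bonding driver's view of both slaves can narrow it down (the second slave name is an assumption; the commands need root):

```shell
# LACP/link state per slave as seen by the bonding driver
cat /proc/net/bonding/bond1

# Capture a few packets on each slave to see which link the flow hashes to
tcpdump -n -c 5 -i ens3f0 icmp
tcpdump -n -c 5 -i ens3f1 icmp
```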

I have now spent several days on this problem and really have no clue what is going on. Do you know of anyone using a setup like mine? Am I hitting some limitation? I really don’t know, but it is driving me crazy. :laughing:

Maybe I am wrong, but a dot in an interface name is conventionally used to select a VLAN sub-interface. I suggest you name the bridge without a dot, onebr57 for example.

I am using VLAN sub-interfaces, so onebr.57 is my bridge for VLAN 57 on bond1 (in my config).

See brctl show:

bridge name	bridge id		STP enabled	interfaces
onebr.57		8000.48df37030011	no		bond1.57
							one-0-0

bond1 = physical interface
bond1.57 = tagged VLAN interface with VLAN ID 57
onebr.57 = the bridge automatically created by OpenNebula

bond1.57 is not tagged; just try renaming it.

How would I rename it? It is created automatically. See the log excerpt from my VM:

Fri Jul 21 14:57:59 2017 [Z0][VMM][I]: pre: Executed "sudo brctl addbr onebr.57".
Fri Jul 21 14:57:59 2017 [Z0][VMM][I]: pre: Executed "sudo ip link set onebr.57 up".
Fri Jul 21 14:57:59 2017 [Z0][VMM][I]: pre: Executed "sudo ip link add link bond1 name bond1.57 mtu 1500 type vlan id 57 ".
Fri Jul 21 14:57:59 2017 [Z0][VMM][I]: pre: Executed "sudo ip link set bond1.57 up".
Fri Jul 21 14:57:59 2017 [Z0][VMM][I]: pre: Executed "sudo brctl addif onebr.57 bond1.57".

From what I understand, this adds a VLAN sub-interface to bond1, so it is tagged. Or what do you mean?

I mean that packets on that interface are stripped of the tag. Tagged frames exist only on bond1.
BRIDGE : onebr.57 -> BRIDGE : onebr57

bond1 is only the physical interface. The tagged frames should reach bond1.57, which is configured correctly:

11: bond1.57@bond1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master onebr.57 state UP mode DEFAULT qlen 1000
link/ether 48:df:37:03:00:11 brd ff:ff:ff:ff:ff:ff promiscuity 1
vlan protocol 802.1Q id 57 <REORDER_HDR>
bridge_slave addrgenmode none

It has tag 57 and is connected to onebr.57.

The bridge itself cannot have a VLAN tag, as it acts as a simple software switch. Maybe I am not understanding exactly what you mean.

Packets that the OS sees on bond1.57 have no tag, AFAIK,
so the bridge doesn’t receive tagged traffic; it receives plain untagged packets.
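You can see this with tcpdump: -e prints the link-level header, and the vlan qualifier matches tagged frames, which only appear on the trunk device (needs root):

```shell
# On the bond itself the frames still carry the 802.1Q tag ...
tcpdump -e -n -i bond1 vlan 57

# ... while on the sub-interface the same traffic appears untagged
tcpdump -e -n -i bond1.57 icmp or arp
```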

I finally found my problem.

It seems to be a bug in the CentOS kernel. I deactivated one link of my bonding interface and everything works without a problem. As soon as I enable it again, I start losing packets.

I dug a bit deeper and found that the different packet streams are evenly balanced across the bond links. The problem is that the packets of one link are lost on the way to my bridge.

I will file a bug report with CentOS and see whether the problem is in the kernel (bridge code) or in the kernel’s ixgbe driver.
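For anyone who wants to reproduce the test: a slave can be failed out at runtime, without touching the persistent configuration (interface name from my setup; needs root):

```shell
# Take one slave down; LACP re-converges on the remaining link
ip link set ens3f0 down
# ... verify connectivity to the VMs, then restore the link
ip link set ens3f0 up
```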

Thank you for your help.

Hi tobx,

have you tried to work around this bug by using different bonding modes? Round-robin, for example, or active-backup (failover) mode, just for debugging?

Is the bug related to the 802.3ad mode you use, or to any bonding mode ?

Also, 802.3ad with LACP seems like extra complexity compared to simple round-robin; is there an advantage that made you pick it?

I’ve done a similar configuration and faced difficulties configuring the virtual routers; it would be interesting to see if they work in your setup.

Thanks,

K.

Hello Kita,

we have now started using ELRepo kernels and no longer have these issues.

I am using 802.3ad + LACP for better fault tolerance. Both NICs are connected to separate switches, so I can lose one switch and my host remains online.
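To verify that both links are actually aggregated (and to watch a failover), the standard bonding proc interface is enough:

```shell
# Both slaves should report "MII Status: up" and share one Aggregator ID
grep -E 'Slave Interface|MII Status|Aggregator ID' /proc/net/bonding/bond1
```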

Hello tobx, I also used CentOS 7 in the past, but we finally switched to Fedora 26: newer kernel, newer libvirt and QEMU, newer corosync/pacemaker… I dislike EL7 a bit because of a libqb bug in releases prior to 7.3 that randomly hung our cluster, so I had to use backported Fedora packages of libqb and also ELRepo kernels… Now we are on Fedora 26 and it is working well. Everything is configured via Ansible, so updates will be simple.

Thank you, Kristian. We will stay with CentOS, as it has given us the best experience so far.