VXLAN, 2 KVM Hosts, 50K multicasts/second

(Spencer) #1

Hello!

TL;DR:
Using 1 VXLAN across 2 KVM hosts causes instability due to high multicast packet volume. Is this normal, or just me?

Physical Environment:
-1x CentOS Linux release 7.6.1810 front-end server, mode4/802.3ad/LACP bond
-2x CentOS Linux release 7.6.1810 KVM hosts, mode4/802.3ad/LACP bond
-1x Windows Server 2016 Hyper-V host for support VMs (remote access etc.), NIC teaming (aggregate)
NICs for the ONe nodes are Mellanox ConnectX-4, and ConnectX-2 for the Hyper-V host (10 Gb/s interfaces and switch).

OpenNebula Version:
5.4.13 #This issue also occurred on 5.6. I rolled back to practice the upgrade process

Networking:
Bond Creation:
nmcli con add type bond con-name bond0 ifname bond0 mode 4 ipv4.method disabled ipv6.method ignore
nmcli con add type bond-slave ifname enp2s0f0 master bond0
nmcli con add type bond-slave ifname enp2s0f1 master bond0
nmcli con up bond-slave-enp2s0f0
nmcli con up bond-slave-enp2s0f1
nmcli con up bond0
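Before layering the bridges on top, it may be worth confirming that the LACP bond actually negotiated with the switch (a diagnostic sketch; interface names match the commands above):

```shell
# Verify 802.3ad negotiation: both slaves should be up and report
# the same Aggregator ID, otherwise traffic hashes onto a dead leg.
grep -E 'Bonding Mode|MII Status|Aggregator ID' /proc/net/bonding/bond0
```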

KVM Hosts have 4 Tagged Bridges created via the following commands:
nmcli c add type bridge con-name br1002 ifname br1002 ipv4.method manual ipv6.method ignore   #Management network for internet and remote access
nmcli c add type bridge con-name br1003 ifname br1003 ipv4.method disabled ipv6.method ignore #For ONe virtual networks
nmcli c add type bridge con-name br1004 ifname br1004 ipv4.method manual ipv6.method ignore   #For Ceph cluster
nmcli c add type bridge con-name br1011 ifname br1011 ipv4.method disabled ipv6.method ignore #For external-facing IPs for tenants' router/firewall
nmcli con add type vlan ifname bond0.1002 dev bond0 id 1002 master br1002 slave-type bridge   #repeated for each bridge/VLAN ID
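For reference, the `bond0.1003.2` device that shows up later in the `nmcli c` output is created by OpenNebula's vxlan driver at VM deployment time, roughly equivalent to the following sketch (the group and port values are the driver/Linux defaults as I understand them, not something I configured):

```shell
# Roughly what the vxlan driver builds on top of the tagged bond:
# a VXLAN device named <phydev>.<vlan_id>, flooding unknown
# destinations to a multicast group, enslaved to the ONe bridge.
ip link add bond0.1003.2 type vxlan id 2 \
    group 239.0.0.2 dev bond0.1003 dstport 8472
ip link set bond0.1003.2 master br1003
ip link set bond0.1003.2 up
```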

This followed the guide here:
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/networking_guide/sec-vlan_on_bond_and_bridge_using_the_networkmanager_command_line_tool_nmcli

I then created a VXLAN via the ONe web ui.

[root@cloud1 ~]# su oneadmin -c 'onevnet list'
ID USER GROUP NAME CLUSTERS BRIDGE LEASES
0 oneadmin oneadmin vxlan-upgrade 0 br1003 6
6 oneadmin oneadmin edge 0 br1011 1

[root@cloud1 ~]# su oneadmin -c 'onevnet show 0'
VIRTUAL NETWORK 0 INFORMATION
ID : 0
NAME : vxlan-upgrade
USER : oneadmin
GROUP : oneadmin
CLUSTERS : 0
BRIDGE : br1003
VN_MAD : vxlan
PHYSICAL DEVICE: bond0.1003
VLAN ID : 2
USED LEASES : 6

PERMISSIONS
OWNER : um-
GROUP : ---
OTHER : ---

VIRTUAL NETWORK TEMPLATE
BRIDGE="br1003"
GATEWAY="10.0.0.1"
NETWORK_ADDRESS="10.0.0.0"
NETWORK_MASK="255.255.255.0"
PHYDEV="bond0.1003"
SECURITY_GROUPS="0"
VN_MAD="vxlan"

ADDRESS RANGE POOL
AR 0
SIZE : 100
LEASES : 6

RANGE FIRST LAST
MAC 02:00:0a:00:00:01 02:00:0a:00:00:64
IP 10.0.0.1 10.0.0.100

LEASES
AR OWNER MAC IP IP6
0 V:184 02:00:0a:00:00:01 10.0.0.1 -
0 V:185 02:00:0a:00:00:02 10.0.0.2 -
0 V:186 02:00:0a:00:00:03 10.0.0.3 -
0 V:187 02:00:0a:00:00:04 10.0.0.4 -
0 V:188 02:00:0a:00:00:05 10.0.0.5 -
0 V:178 02:00:0a:00:00:06 10.0.0.6 -

VIRTUAL ROUTERS
6
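A side note on the lease and FDB tables above: the MACs are not random. As I understand it, OpenNebula derives each lease MAC from the default MAC_PREFIX (02:00 in oned.conf) followed by the four bytes of the lease IP in hex. A quick sketch of the mapping:

```shell
# Derive the lease MAC from MAC_PREFIX (02:00 by default) plus the
# four IP octets rendered as hex bytes.
ip="10.0.0.1"                        # first lease in AR 0
IFS=. read -r o1 o2 o3 o4 <<< "$ip"
mac=$(printf '02:00:%02x:%02x:%02x:%02x' "$o1" "$o2" "$o3" "$o4")
echo "$mac"
```

Which is why 10.0.0.1 appears as 02:00:0a:00:00:01 in both the address range pool and the `brctl showmacs` output below.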

[root@cloud3 ~]# brctl showmacs br1003
port no mac addr is local? ageing timer
1 00:01:e8:8b:2e:9c no 1.78
6 02:00:0a:00:00:01 no 0.88
7 02:00:0a:00:00:02 no 0.87
4 02:00:0a:00:00:03 no 0.87
5 02:00:0a:00:00:04 no 2.94
3 02:00:0a:00:00:05 no 0.88
8 02:00:0a:00:00:06 no 1.30
2 e2:58:51:a9:8f:a6 yes 0.00
2 e2:58:51:a9:8f:a6 yes 0.00
1 ec:0d:9a:9c:79:52 yes 0.00
1 ec:0d:9a:9c:79:52 yes 0.00
1 ec:0d:9a:9c:79:5e no 0.00
6 fe:00:0a:00:00:01 yes 0.00
6 fe:00:0a:00:00:01 yes 0.00
7 fe:00:0a:00:00:02 yes 0.00
7 fe:00:0a:00:00:02 yes 0.00
4 fe:00:0a:00:00:03 yes 0.00
4 fe:00:0a:00:00:03 yes 0.00
5 fe:00:0a:00:00:04 yes 0.00
5 fe:00:0a:00:00:04 yes 0.00
3 fe:00:0a:00:00:05 yes 0.00
3 fe:00:0a:00:00:05 yes 0.00
8 fe:00:0a:00:00:06 yes 0.00
8 fe:00:0a:00:00:06 yes 0.00

Currently all VMs are on this KVM host.

[root@cloud3 ~]# nmcli c
NAME UUID TYPE DEVICE
bond0 9154687c-309a-4ccb-aa2c-03b212b6a9a1 bond bond0
bond0.1003.2 881ba273-627c-40cf-9d94-3d0e597d7be0 vxlan bond0.1003.2
bond-slave-enp2s0f0 fbf1fc1a-3af8-4315-b705-3fefb8cbdc63 ethernet enp2s0f0
bond-slave-enp2s0f1 79494087-b028-4f54-96c8-cec802bd29f3 ethernet enp2s0f1
br1002 07f971c9-94bd-41e4-917c-278cf546740b bridge br1002
br1003 169e3e6d-9398-4694-90c4-7752277236c0 bridge br1003
br1004 59e8e5a6-c465-445b-8a5b-84c1ccb9ce3b bridge br1004
br1011 0915a3c2-649a-4fa2-9fd0-0627c14abcbc bridge br1011
bridge-slave-bond0.1002 c2160853-0c08-4df8-bb3a-4edb579f148e vlan bond0.1002
bridge-slave-bond0.1003 40a8e165-aad0-4669-a777-0ced65d8ca4d vlan bond0.1003
bridge-slave-bond0.1004 21d6ad24-1343-44ca-9aae-0765a2aa616b vlan bond0.1004
bridge-slave-bond0.1011 98f2ec39-03ad-4fe0-b2fa-97f9e1adfeff vlan bond0.1011
one-178-0 f813a77d-4fec-4955-b566-c0ba266c43f9 tun one-178-0
one-184-0 3e77c05b-e509-4453-b695-7c7954e92a43 tun one-184-0
one-184-1 0e04bb78-bb3c-418b-ab08-d1092d1501c6 tun one-184-1
one-185-0 4fc12c26-5a9f-4662-8ff0-3b14f2e5f4eb tun one-185-0
one-186-0 36c521ec-f215-44af-84ce-b81acfd99c56 tun one-186-0
one-187-0 0bbf24cf-2dcf-4047-9641-9c72bba6cc55 tun one-187-0
one-188-0 b33a7c79-6d57-46da-b6f5-ba1d5c0d07e4 tun one-188-0
enp2s0f0 abec93f5-45b0-4809-aae0-3687f7a929b5 ethernet --
enp2s0f1 7d570c77-11f3-4df1-8b26-229531766edb ethernet --
enp4s0f0 76ef8b34-4b97-42d6-bfc2-970ecd5b046b ethernet --
enp4s0f1 3bdc9e25-ef49-4d94-b326-8d4545a72d60 ethernet --
enp4s0f2 722e5282-bc81-47bd-ab14-0b81d184d1ad ethernet --
enp4s0f3 16adcdcc-8958-434e-a0f1-54691eca525e ethernet --

Virtual Machines:
one-184 is a router with 2 vNICs, one on the edge network and one on the VXLAN, performing NAT.
one-178 is a Windows server which performed Ceph testing.
one-187 is a Zabbix monitoring server, which pings out to Google every 5 seconds.
The rest are Zabbix monitoring agents, which transfer some data every half hour to update what to monitor, plus a 5-second server->agent ping.
These are not generating 50K+ multicast packets / second.

Problems that occurred and prompted investigation:
The switch I use for ONe was trunked to another switch serving a Hyper-V lab cluster. One day the Hyper-V lab cluster's networking became interrupted. I deleted my ONe bridges, executed `systemctl restart network`, and the network stabilized. We then isolated our ONe test environment. The current issue appeared when I tried to create another VM on my Hyper-V host: the moment I attached vNICs to the VM, the Hyper-V host locked up. Disconnecting the physical networking from the server made the host operable again.

Investigative Findings:
My coworker ran a packet counter on the switch and observed roughly 500K multicast packets per second being transmitted through the LACP port channels associated with my KVM hosts.
On the switch we mirrored one port channel to a laptop to run a Wireshark capture.

Some context for the following PDFs:
At first my VMs were balanced across my KVM hosts. I migrated all VMs to one KVM host and then executed on the now-empty host: `nmcli c down br1003; nmcli c up br1003`. This cleared the multicasts and provided a baseline.
Then I live-migrated 1 VM to the other KVM host and ran a 1-minute packet capture to measure packets over time.
Each subsequent test adds one more VM.
I finished by migrating all VMs back to the original KVM host and running a final capture. The main takeaway is that the multicasts were still being transmitted.


https://www.scribd.com/document/401908515/1vm1min
https://www.scribd.com/document/401908553/2vm1min
https://www.scribd.com/document/401908560/3vm1min
https://www.scribd.com/document/401908569/Post-Migrate-1-Min

A tcpdump on the KVM host (`tcpdump -n multicast`) shows packets like the following:

IP .filenet-tms > 239.0.0.2.otv: OTV, flags [I] (0x08), overlay 0, instance 2
IP .filenet-tms > 239.0.0.2.otv: OTV, flags [I] (0x08), overlay 0, instance 2
IP .filenet-tms > 239.0.0.2.otv: OTV, flags [I] (0x08), overlay 0, instance 2
IP .filenet-tms > 239.0.0.2.otv: OTV, flags [I] (0x08), overlay 0, instance 2
IP .filenet-tms > 239.0.0.2.otv: OTV, flags [I] (0x08), overlay 0, instance 2
IP .44278 > 239.0.0.2.otv: OTV, flags [I] (0x08), overlay 0, instance 2
IP .41381 > 239.0.0.2.otv: OTV, flags [I] (0x08), overlay 0, instance 2
IP > 239.0.0.2: ip-proto-17
15:01:13.059226 IP .41381 > 239.0.0.2.otv: OTV, flags [I] (0x08), overlay 0, instance 2
IP .38244 > 239.0.0.2.otv: OTV, flags [I] (0x08), overlay 0, instance 2
IP truncated-ip - 50 bytes missing! .41381 > 239.0.0.2.otv: OTV, flags [I] (0x08), overlay 0, instance 2
IP truncated-ip - 50 bytes missing! .38244 > 239.0.0.2.otv: OTV, flags [I] (0x08), overlay 0, instance 2
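Two decoding notes on this capture. tcpdump labels the traffic "OTV" because the VXLAN device sits on UDP port 8472, the legacy Linux kernel default, which IANA assigns to OTV. And 239.0.0.2 is not arbitrary: as far as I can tell, OpenNebula's vxlan driver joins a multicast group computed as the configured base address (239.0.0.0 by default) plus the network's VLAN ID, which is 2 here. A sketch of that computation:

```shell
# Multicast group = base (239.0.0.0 default) + VLAN_ID, as a 32-bit add.
vni=2
base="239.0.0.0"
IFS=. read -r a b c d <<< "$base"
n=$(( ((a << 24) | (b << 16) | (c << 8) | d) + vni ))
group="$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
echo "$group"
```

So every VXLAN network on VLAN ID 2 across both hosts floods to the same 239.0.0.2 group.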

Related thoughts/questions/ignorance:
-I do not think I have a loop, as the volume is not rising exponentially.
-Do I need to assign an IP to the VLAN-tagged bridge that I use for my ONe VNets? Traffic passes between my VMs on different hosts without one, but the thread "Not getting traffic through vxlan bridge" states that I should. I am confused!
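On the IP question: for multicast-mode VXLAN the kernel mainly needs a way to route the multicast group out of the PHYDEV, and an address on that device provides a connected route implicitly. A hedged sketch of the two usual options (the address below is hypothetical; since the VXLAN device is created with `dev bond0.1003`, this may not be strictly required on all kernels):

```shell
# Option 1: give the tagged interface an otherwise-unused address,
# which installs a connected route the multicast join can use.
ip addr add 192.0.2.3/24 dev bond0.1003   # hypothetical address

# Option 2: add only a multicast route, no address needed.
ip route add 239.0.0.0/8 dev bond0.1003
```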

VXLAN as configured is consuming an enormous amount of bandwidth. Does anybody have any config or troubleshooting ideas?

(Spencer) #2

Resolved this issue by upgrading the kernel to the elrepo LT 4.4 kernel.

If anyone has any ideas on where to look, or how I should submit a bug report to the CentOS community, that would be super helpful.

(David) #3

Hello, I'm fairly new to VXLAN, but I think this is the expected behavior of this technology.

Each packet sent by your virtual machines can cause a multicast send to discover the destination VTEP.
To solve this, it is recommended to use a switch that supports BGP EVPN, which can learn the routes without the need to flood the network with multicast.
You can take a look at a recent post in the forum that poses questions about the suitability of implementing these technologies and how to do it.
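The flood-and-learn behaviour described above can be replaced by a control plane: with BGP EVPN, VTEPs exchange MAC/IP routes, so the VXLAN device is created without a multicast group and with data-plane learning disabled. A rough sketch, assuming something like FRR populates the FDB (device name and local address are hypothetical):

```shell
# Unicast VXLAN device for an EVPN control plane (sketch):
# no multicast group, no flood-and-learn; FDB entries are
# installed by the BGP EVPN daemon instead.
ip link add vxlan2 type vxlan id 2 dstport 4789 \
    local 192.0.2.3 nolearning
ip link set vxlan2 master br1003
ip link set vxlan2 up
```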

As a source of information on how this protocol (VXLAN) works, you can watch a series of videos at the following URL:
Vxlan Videos

In particular, I think number 4 in the series will be very interesting to watch.

Greetings.