[Solved] Help with 802.1Q VLAN

Hello,

Bridging previously worked without a problem, but after setting up 802.1Q VLANs by following the deployment guide, the VMs attached to that vNet are not reachable from any host inside or outside our OpenNebula environment (even though their network gets configured without any trouble).
I looked for posts describing this problem but could not find any (somebody had a similar issue, but it was related to CentOS not handling his LACP bond setup well…).

Our environment:

  • OpenNebula 5.4.1

  • Ubuntu 16.04 LTS

    • 802.1Q module is loaded on the KVM host:
      root@xxxxxx:~# lsmod | grep 802
      8021q 32768 0
      garp 16384 1 8021q
      mrp 20480 1 8021q
  • vNet configuration:
    BRIDGE = "onebr10"
    DNS = "XX.XX.XX.XX"
    FILTER_IP_SPOOFING = "YES"
    FILTER_MAC_SPOOFING = "YES"
    GATEWAY = "XX.XX.XX.XX"
    GUEST_MTU = "1500"
    MTU = "1500"
    NETWORK_ADDRESS = "XX.XX.XX.XX"
    NETWORK_MASK = "XX.XX.XX.XX"
    PHYDEV = "bond0"
    SECURITY_GROUPS = "0"
    VLAN_ID = "481"
    VN_MAD = "802.1Q"

“bond0” => it’s a LACP aggregation that has been tested and it’s working no problem.

  • VM creation log:
    Wed Jan 31 12:17:38 2018 [Z0][VM][I]: New state is ACTIVE
    Wed Jan 31 12:17:38 2018 [Z0][VM][I]: New LCM state is PROLOG
    Wed Jan 31 12:17:43 2018 [Z0][VM][I]: New LCM state is BOOT
    Wed Jan 31 12:17:43 2018 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/89/deployment.0
    Wed Jan 31 12:17:43 2018 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_context.
    Wed Jan 31 12:17:44 2018 [Z0][VMM][I]: pre: Executed "sudo brctl addbr onebr10".
    Wed Jan 31 12:17:44 2018 [Z0][VMM][I]: pre: Executed "sudo ip link set onebr10 up".
    Wed Jan 31 12:17:44 2018 [Z0][VMM][I]: pre: Executed "sudo ip link add link bond0 name bond0.481 mtu 1500 type vlan id 481 ".
    Wed Jan 31 12:17:44 2018 [Z0][VMM][I]: pre: Executed "sudo ip link set bond0.481 up".
    Wed Jan 31 12:17:44 2018 [Z0][VMM][I]: pre: Executed "sudo brctl addif onebr10 bond0.481".
    Wed Jan 31 12:17:44 2018 [Z0][VMM][I]: ExitCode: 0
    Wed Jan 31 12:17:44 2018 [Z0][VMM][I]: Successfully execute network driver operation: pre.
    Wed Jan 31 12:17:46 2018 [Z0][VMM][I]: ExitCode: 0
    Wed Jan 31 12:17:46 2018 [Z0][VMM][I]: Successfully execute virtualization driver operation: deploy.
    Wed Jan 31 12:17:46 2018 [Z0][VMM][I]: ExitCode: 0
    Wed Jan 31 12:17:46 2018 [Z0][VMM][I]: Successfully execute network driver operation: post.
    Wed Jan 31 12:17:46 2018 [Z0][VM][I]: New LCM state is RUNNING

  • Dynamic bridge created by OpenNebula on KVM host:
    root@xxxxxx:~# brctl show
    bridge name     bridge id               STP enabled     interfaces
    onebr10         8000.6cae8b1edf1a       no              bond0.481
                                                            one-89-0
    virbr0          8000.52540079c46e       yes             virbr0-nic

Could anybody point out whether I am missing anything, please?
Although the documentation says that I don’t have to do any extra configuration on the KVM host, should I actually do something else? Like enabling 802.1Q on the “bond0” interface?

Thanks a lot,

Alex


Hi,
I am not an Ubuntu specialist, but:
If you run “tcpdump -i one-89-0 -n -a -A”, do you see any packets from/to the host? ARP or IP?
If you ping from/to the vhost, do you see anything?
Have you enabled “mac spoofing” on the vnet?
Same question for virbr0?

Regards,
Nicolas.

Hello Nicolas,

Thanks for helping…! (Things are a “bit” busy here, sorry for the delay…)

ARP traffic seems to be going through, but not IP…:
root@xxxxxx:/proc/net/vlan# tcpdump -i one-89-0 -n -a -A
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on one-89-0, link-type EN10MB (Ethernet), capture size 262144 bytes
09:20:54.136493 ARP, Request who-has xxx.xxx.xxx.254 tell xxx.xxx.xxx.34, length 28
…j.".j."…j…
09:20:57.144571 ARP, Request who-has xxx.xxx.xxx.254 tell xxx.xxx.xxx.34, length 28
…j.".j."…j…
09:20:58.144414 ARP, Request who-has xxx.xxx.xxx.254 tell xxx.xxx.xxx.34, length 28
…j.".j."…j…
09:20:59.144385 ARP, Request who-has xxx.xxx.xxx.254 tell xxx.xxx.xxx.34, length 28
…j.".j."…j…
09:21:02.149850 ARP, Request who-has xxx.xxx.xxx.254 tell xxx.xxx.xxx.34, length 28
…j.".j."…j…
09:21:03.148319 ARP, Request who-has xxx.xxx.xxx.254 tell xxx.xxx.xxx.34, length 28
…j.".j."…j…
09:21:04.148303 ARP, Request who-has xxx.xxx.xxx.254 tell xxx.xxx.xxx.34, length 28
…j.".j."…j…
09:21:07.155082 ARP, Request who-has xxx.xxx.xxx.254 tell xxx.xxx.xxx.34, length 28
…j.".j."…j…

No, the VMs are completely isolated. The only way to get to them is from their console.

Yes, “mac_spoofing” is enabled on the vnet. But for the life of me, I could not find how to check that on virbr0.
I just double-checked, and “net.ipv4.ip_forward” is set to “1” in “/etc/sysctl.conf”.

(I’m still wrapping my head around VLAN tagging, so forgive me if my next questions make no sense whatsoever.)
The switches’ interfaces where the OpenNebula environment is connected are already associated to an existing VLAN tag (481) which is the same I’m defining within my vNet. Should I be using a different/non-used tag number?

And again: do I have to change any configuration on the bond0 interface regarding enabling 802.1Q?

And again, Thanks…!

Alex

Sorry for the delay as well :)
I am using bond0 with the 802.1Q driver on CentOS, with:

DEVICE=bond0
BOOTPROTO=none
USERCTL=no
ONBOOT=yes
BONDING_OPTS="mode=active-backup miimon=100 primary=em1 primary_reselect=always"

OpenNebula then creates the VLANs on top of bond0:

onebr14         8000.90b11c4c6208       no              bond0.1006
                                                        vnet11
                                                        vnet14

If you enabled MAC protection, perhaps you have a deny rule in ebtables?

 ebtables -L

gives you the rules. Without any restrictions, you should have:

Bridge table: filter

Bridge chain: INPUT, entries: 0, policy: ACCEPT

Bridge chain: FORWARD, entries: 0, policy: ACCEPT

Bridge chain: OUTPUT, entries: 0, policy: ACCEPT
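
On a host with more chains, the same check can be scripted. What follows is a minimal sketch, assuming the `Bridge chain: NAME, entries: N, policy: P` line layout shown above; the function name `ebtables_restrictions` is made up for illustration. It reads `ebtables -L` output on standard input and prints only the chains that hold rules or whose policy is not ACCEPT.

```shell
# Reads `ebtables -L` output on stdin.
# Prints "CHAIN ENTRIES POLICY" for every chain that has at least one rule
# or a policy other than ACCEPT; prints nothing if nothing is restricted.
ebtables_restrictions() {
    awk '/^Bridge chain:/ {
        chain = $3;   sub(/,$/, "", chain)     # "INPUT,"  -> "INPUT"
        entries = $5; sub(/,$/, "", entries)   # "0,"      -> "0"
        policy = $7
        if (entries != "0" || policy != "ACCEPT") print chain, entries, policy
    }'
}
```

Typical use would be `ebtables -L | ebtables_restrictions` as root. `ebtables -L` only lists the filter table by default, so `ebtables -t nat -L` can be piped through the same function to cover the nat table as well.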

As for your switches, you should have a ‘trunk’ configuration, not an access configuration. You can then allow all VLANs in that configuration, or just the required ones.
You can also check that tcpdump shows some packets on the switch-facing side:

tcpdump -i bond0.481 -n -a -A
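
To see whether frames actually leave the host with the 802.1Q tag, it can also help to capture on the parent device and print the link-level header. This is a minimal sketch, assuming the `bond0` / VLAN 481 setup from this thread; the helper names `vlan_arp_filter` and `capture_tagged_arp` are made up for illustration.

```shell
# Builds a pcap filter expression matching ARP frames that carry the given VLAN tag.
vlan_arp_filter() {
    printf 'vlan %s and arp' "$1"
}

# Captures a few tagged ARP frames on the parent device (needs root).
# -e prints the link-level header, which includes the 802.1Q tag when present.
# $1 = parent device, $2 = VLAN ID, $3 = number of frames to capture (default 10)
capture_tagged_arp() {
    dev=$1; vid=$2; count=${3:-10}
    tcpdump -i "$dev" -n -e -c "$count" "$(vlan_arp_filter "$vid")"
}
```

For example, `capture_tagged_arp bond0 481`. If the VM's ARP requests show up here with the expected tag but no replies ever come back, the switch port is a likely place to look next.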

Good luck with the investigation :)

Hey Nicolas,

No worries at all, and thanks for continuing to help…!

A bit of improvement here: the VMs can ping each other, but that’s it.

ebtables is not restricting anything; I got the same output as you posted.

I’m really starting to think that I’m missing something very basic, given my lack of experience with network tagging… Here is what I have learned so far (sorry, this may become a long post…)

If the network guys enable trunking on the switch interface the KVM host is connected to and I don’t change anything in that host’s bonding configuration, the host stops talking to the network. So I had to modify the host’s /etc/network/interfaces file to look like this:

# eno2 configuration

auto eno2
iface eno2 inet manual
    bond-master bond0

# eno3 configuration

auto eno3
iface eno3 inet manual
    bond-master bond0

# bond0 configuration

auto bond0
iface bond0 inet manual
    bond-slaves eno2 eno3
    bond-mode 4
    bond-miimon 100
    bond-lacp-rate fast
    bond-downdelay 0
    bond-updelay 0
    bond-xmit_hash_policy 1

# 802.1Q access definition

auto vlan481
iface vlan481 inet static
    vlan-raw-device bond0
    address aaa.bbb.ccc.11
    netmask 255.255.255.0
    network aaa.bbb.ccc.0
    broadcast aaa.bbb.ccc.255
    gateway aaa.bbb.ccc.254
    dns-nameservers xxx.xxx.xxx.xxx
    dns-search

OK, the KVM host can talk to anybody now. But…

Here is where I think it becomes confusing/wrong:

  • VLAN 481 already has network aaa.bbb.ccc.0 associated to it and is defined on both switches where the LACP bonding interfaces are connected to.
  • For now (until we need more IPs) I’m trying to use aaa.bbb.ccc.[1-32] for the physical servers in our environment and aaa.bbb.ccc.[33-244] for the 802.1Q vNet.

If I set the vNet like this:
VIRTUAL NETWORK TEMPLATE
BRIDGE="br481"
DNS="xxx.xxx.xxx"
FILTER_IP_SPOOFING="YES"
FILTER_MAC_SPOOFING="YES"
GATEWAY="aaa.bbb.ccc.254"
GUEST_MTU="1500"
MTU="1500"
NETWORK_ADDRESS="aaa.bbb.ccc.0"
NETWORK_MASK="255.255.255.0"
PHYDEV="vlan481"
SECURITY_GROUPS="0"
VLAN_ID="481"
VN_MAD="802.1Q"

The bridge gets created on top of the vlan481 interface, but VM creation fails during network setup (with a message like “…file exists…”). I attribute that to the fact that “481” is already defined elsewhere.

If I change VLAN_ID to “10”, which the network guys told me the switches know about, the VMs get created without a problem but can only talk to each other. (They can’t even ping the default gateway.)

Should I actually use a different network on the 802.1Q vNet, say “aaa.bbb.yyy.0”, for the VMs and ask the network guys to associate it with VLAN 10?

On your bond0 setup, I didn’t notice any reference to 802.1Q. Didn’t you have to change anything on the OS side (other than loading the 802.1q module) after trunking was enabled on the switch’s interfaces?

Best regards,

Alex

Try the vnet definition with

PHYDEV="bond0"

On my configuration, I have:
em1/em2 as master/slave on bond0 (not LACP, but I don’t think that matters here)
My vnet is then defined like this:

BRIDGE="onebr14"
CONTEXT_FORCE_IPV4="yes"
DNS="192.0.2.254"
GATEWAY="192.0.2.254"
GATEWAY6="2001:db8::ffff"
NETWORK_ADDRESS="192.0.2.0"
NETWORK_MASK="255.255.255.0"
PHYDEV="bond0"
VLAN="YES"
VLAN_ID="1006"

(do not copy/paste, addresses are “documentation” addresses)

Hi Nicolas,

I tried that:

VIRTUAL NETWORK TEMPLATE
BRIDGE="onebr481"
DNS="xxx.xxx.xxx.xxx"
FILTER_IP_SPOOFING="YES"
FILTER_MAC_SPOOFING="YES"
GATEWAY="aaa.bbb.ccc.254"
GUEST_MTU="1500"
MTU="1500"
NETWORK_ADDRESS="aaa.bbb.ccc.0"
NETWORK_MASK="255.255.255.0"
PHYDEV="bond0"
SECURITY_GROUPS="0"
VLAN_ID="481"
VN_MAD="802.1Q"

But VM creation fails:
pre: Executed "sudo brctl addbr onebr481".
pre: Executed "sudo ip link set onebr481 up".
pre: Command "sudo ip link add link bond0 name bond0.481 mtu 1500 type vlan id 481 " failed.
pre: RTNETLINK answers: File exists
RTNETLINK answers: File exists

ExitCode: 2
Failed to execute network driver operation: pre.
Error deploying virtual machine: 802.1Q: RTNETLINK answers: File exists
New LCM state is BOOT_FAILURE

Could it be because the KVM host already has “vlan481” interface attached to VLAN 481? (I’m grasping at straws here…)

root@yyyy:~# ls /proc/net/vlan
config vlan481
root@yyyy:~# cat /proc/net/vlan/vlan481
vlan481 VID: 481 REORDER_HDR: 1 dev->priv_flags: 1001
total frames received 169785
total bytes received 344952036
Broadcast/Multicast Rcvd 8757

  total frames transmitted        30604
   total bytes transmitted    338087205

Device: bond0
INGRESS priority mappings: 0:0 1:0 2:0 3:0 4:0 5:0 6:0 7:0
EGRESS priority mappings:
root@yyyy:~#
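
The kernel refuses to create a second VLAN device with the same VLAN ID on the same parent device, which is consistent with the `File exists` error appearing while `vlan481` already sits on `bond0`. A minimal sketch for spotting such a clash before deploying, assuming the usual `name | VLAN ID | parent` layout of `/proc/net/vlan/config`; the function name `find_vlan_dev` is made up for illustration.

```shell
# Reads /proc/net/vlan/config on stdin.
# Prints the name of any VLAN device that already uses VLAN ID $2 on parent $1.
find_vlan_dev() {
    awk -F'|' -v parent="$1" -v vid="$2" '
        NF == 3 {
            # Trim the padding around each of the three columns.
            for (i = 1; i <= 3; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
            if ($2 == vid && $3 == parent) print $1
        }'
}
```

On the host above, `find_vlan_dev bond0 481 < /proc/net/vlan/config` would be expected to print `vlan481`, meaning `bond0.481` cannot be added until that device is removed or reused.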

Best regards,

Alex

Hello,

It’s working…! VMs can go anywhere in our network.
But I’m still open to comments/suggestions, as I’m not sure if this actually is the best/scalable solution, because we’re expecting this POC environment to get much bigger when it goes into production…

So, I changed the KVM node /etc/network/interfaces configuration so it’s looking like this:

# eno2 configuration

auto eno2
iface eno2 inet manual
    bond-master bond0

# eno3 configuration

auto eno3
iface eno3 inet manual
    bond-master bond0

# bond0 configuration

auto bond0
iface bond0 inet manual
    bond-slaves eno2 eno3
    bond-mode 4
    bond-miimon 100
    bond-lacp-rate fast
    bond-downdelay 0
    bond-updelay 0
    bond-xmit_hash_policy 1

# local interface for vlan 481

auto bond0.481
iface bond0.481 inet static
    vlan-raw-device bond0

# bridge for vlan 481

auto br481
iface br481 inet static
    bridge_ports bond0.481
    address aaa.bbb.ccc.11
    network aaa.bbb.ccc.0
    netmask 255.255.255.0
    broadcast aaa.bbb.ccc.255
    gateway aaa.bbb.ccc.254
    dns-nameservers xxx.xxx.xxx.xxx
    dns-search xxx.xxx.xxx.xxx
    bridge_hello 2
    bridge_maxage 12
    bridge_stp off
    bridge_fd 9

And for the vNet configuration, I gave it the already tagged bridge, br481, and did not assign any physical device to it:

VIRTUAL NETWORK TEMPLATE
BRIDGE="br481"
DNS="xxx.xxx.xxx.xxx"
FILTER_IP_SPOOFING="YES"
FILTER_MAC_SPOOFING="YES"
GATEWAY="aaa.bbb.ccc.254"
GUEST_MTU="1500"
MTU="1500"
NETWORK_ADDRESS="aaa.bbb.ccc.0"
NETWORK_MASK="255.255.255.0"
PHYDEV=""
SECURITY_GROUPS="0"
VLAN_ID="481"
VN_MAD="802.1Q"

My goal now is to improve the VMs’ network performance: when I ping to/from the VMs, traffic is about twice as slow as when I ping to/from the KVM node.
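
To put a number on that difference, the average round-trip time can be pulled out of the ping summary on both sides. A minimal sketch, assuming the iputils summary line format (`rtt min/avg/max/mdev = ... ms`); the function name `ping_avg_ms` is made up for illustration.

```shell
# Reads `ping` output on stdin and prints the average round-trip time in ms.
# The summary line splits on "/" so that the fifth field is the average.
ping_avg_ms() {
    awk -F'/' '/^rtt / { print $5 }'
}
```

Running something like `ping -c 20 aaa.bbb.ccc.254 | ping_avg_ms` once on the KVM node and once inside a VM gives two figures that can be compared directly.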

Nicolas, thank you very much for your insights and comments…!

Alex

Hi, Alex… good to see it working :)

Just a question.
Is the switch port you have a VLAN trunk with only one member?
Why not use a ‘simple’ access port, untagged on the right VLAN?
The configuration would be simpler.

so, you can have:

  • 1 access port (vlan 481) untagged and build a br481 on top of the bond interface
    or
  • 1 trunk port, tagged port, and vlan tagging on the top of the (trunk) bond interface

If you manage only one vlan, the first is easier.
If you plan to manage several vlans, the second is right.
You can then handle many VLANs (bond0.XX) and keep the VMs on each one separate from the others.
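
For the trunk option, each VLAN ends up with its own bond0.XX subinterface and its own bridge. A minimal sketch that only prints the commands involved, in the same order as the "pre" step in the VM log earlier in the thread; the function name `vlan_bridge_cmds`, VLAN 482 and the `onebrNNN` bridge names are made up for illustration.

```shell
# Prints (does not run) the commands that put one tagged VLAN of a trunk
# on its own bridge.
# $1 = parent device, $2 = VLAN ID, $3 = bridge name, $4 = MTU (default 1500)
vlan_bridge_cmds() {
    phydev=$1; vlan_id=$2; bridge=$3; mtu=${4:-1500}
    echo "brctl addbr $bridge"
    echo "ip link set $bridge up"
    echo "ip link add link $phydev name $phydev.$vlan_id mtu $mtu type vlan id $vlan_id"
    echo "ip link set $phydev.$vlan_id up"
    echo "brctl addif $bridge $phydev.$vlan_id"
}

# Example: two VLANs carried over the same trunked bond.
for vid in 481 482; do
    vlan_bridge_cmds bond0 "$vid" "onebr$vid"
done
```

With the 802.1Q driver and PHYDEV set to the bond, OpenNebula issues these commands itself, so the sketch is mainly useful for seeing what each additional VLAN adds on the host.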

I have deployed the same configuration in the past, successfully.
I have since moved to Open vSwitch, because I plan to use VXLAN and to manage more complex network configurations on VLANs (port mirroring, …)

Finally, you should take a look at /var/log/one/oned.log in debug mode to trace the commands that are called when instantiating a VNET/VM, to understand more precisely how it is created and what the expected ‘start’ state of the vnet is ;)

Hey Nicolas,

Thanks! I’ll give that a try and see how it goes…

One restriction I think I could face is the fact that the KVM host interface has to be tagged to 481. Otherwise it can’t talk to anybody.

Regards,

Alex