How to request two different PCI devices with the same vendor/class values

solved

(Álvaro Simón) #1

Hi all

We have the following question. The PCI passthrough (PCI PT) feature works great in OpenNebula, and you get the right device just by requesting the VENDOR/CLASS/DEVICE values, but for some use cases we have found a problem with how the scheduler handles this.

In our case we want to use Infiniband PCI devices in an HA setup. These cards have several virtual functions assigned; in fact, this is how the Mellanox IB device looks in the onehost command output:

  5e:00.1 15b3:1014:0207 MT27700 Family [ConnectX-4 Virtual Function]
  5e:00.2 15b3:1014:0207 MT27700 Family [ConnectX-4 Virtual Function]
  5e:00.3 15b3:1014:0207 MT27700 Family [ConnectX-4 Virtual Function]
  5e:00.4 15b3:1014:0207 MT27700 Family [ConnectX-4 Virtual Function]
  5e:00.5 15b3:1014:0207 MT27700 Family [ConnectX-4 Virtual Function]
  5e:00.6 15b3:1014:0207 MT27700 Family [ConnectX-4 Virtual Function]
  5e:00.7 15b3:1014:0207 MT27700 Family [ConnectX-4 Virtual Function]
  5e:01.0 15b3:1014:0207 MT27700 Family [ConnectX-4 Virtual Function]

For this use case we have added a second Infiniband card (same vendor and class), so the onehost command and lspci report the same values for it, only at a different address:

  d8:00.0 15b3:1013:0207 MT27700 Family [ConnectX-4]
  d8:00.1 15b3:1014:0207 MT27700 Family [ConnectX-4 Virtual Function]
  d8:00.2 15b3:1014:0207 MT27700 Family [ConnectX-4 Virtual Function]
  d8:00.3 15b3:1014:0207 MT27700 Family [ConnectX-4 Virtual Function]
  d8:00.4 15b3:1014:0207 MT27700 Family [ConnectX-4 Virtual Function]
  d8:00.5 15b3:1014:0207 MT27700 Family [ConnectX-4 Virtual Function]
  d8:00.6 15b3:1014:0207 MT27700 Family [ConnectX-4 Virtual Function]
  d8:00.7 15b3:1014:0207 MT27700 Family [ConnectX-4 Virtual Function]
  d8:01.0 15b3:1014:0207 MT27700 Family [ConnectX-4 Virtual Function]

We want to add 2 PCI IB devices to our VM for an HA setup with 2 different cards (the cards are connected to different switches in case one network goes down). Adding the devices is fine because we can put several PCI sections in our VM template, but the problem is that it is not possible to choose two different cards if they have the same VENDOR/CLASS/DEVICE values (https://docs.opennebula.org/5.6/deployment/open_cloud_host_setup/pci_passthrough.html).

From the OpenNebula code, it looks like the plugin just executes the lspci command, and the scheduler then picks the first available address from the list (that is, the first one not used by any VM).
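
To make this concrete, here is a minimal Python sketch of a first-available selection. It is not OpenNebula's scheduler code, only an illustration of the behaviour as I understand it from the description above. The names `PciDevice`, `pick_first_available` and `allocate` are hypothetical, and `HOST_DEVICES` is a shortened copy of the listings above.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class PciDevice:
    address: str    # short PCI address, e.g. "5e:00.1"
    vendor: str     # vendor ID, e.g. "15b3"
    device: str     # device ID, e.g. "1014"
    pci_class: str  # class ID, e.g. "0207"


# Shortened copy of the host listing: two cards, same vendor/device/class on the VFs.
HOST_DEVICES = [
    PciDevice("5e:00.1", "15b3", "1014", "0207"),
    PciDevice("5e:00.2", "15b3", "1014", "0207"),
    PciDevice("d8:00.0", "15b3", "1013", "0207"),  # physical function, different device ID
    PciDevice("d8:00.1", "15b3", "1014", "0207"),
    PciDevice("d8:00.2", "15b3", "1014", "0207"),
]


def pick_first_available(devices: List[PciDevice], used: Set[str],
                         vendor: str, device: str, pci_class: str) -> Optional[PciDevice]:
    """Return the first matching device whose address is not in use (assumed policy)."""
    for dev in devices:
        if dev.address in used:
            continue
        if (dev.vendor, dev.device, dev.pci_class) == (vendor, device, pci_class):
            return dev
    return None


def allocate(devices: List[PciDevice], count: int,
             vendor: str, device: str, pci_class: str) -> List[str]:
    """Serve `count` identical PCI requests one after another, as for one VM."""
    used: Set[str] = set()
    picked: List[str] = []
    for _ in range(count):
        dev = pick_first_available(devices, used, vendor, device, pci_class)
        if dev is None:
            break
        used.add(dev.address)
        picked.append(dev.address)
    return picked


if __name__ == "__main__":
    print(allocate(HOST_DEVICES, 2, "15b3", "1014", "0207"))
```

With two identical PCI sections, both picks come from the 5e card (`5e:00.1` and `5e:00.2`), so the VM never gets a function from the d8 card while the first card still has free ones.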

Would it be possible to use a round-robin mechanism for the PCI PT scheduler?
Or to force the usage of a specific card by requesting the ADDRESS value directly in the PCI section as well (instead of, or in addition to, the vendor/class… values)?
This would help a lot in these HA cases.

I know that this is not a regular use case, and maybe only oneadmin should be able to request it, but it could help in cases like this one, where you have several cards in your hypervisor and want to spread your VMs across them in a round-robin fashion (as the OpenNebula scheduler does when it deploys VMs to different hypervisors).
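
As a rough idea of what I mean, here is a small Python sketch of a selection that spreads requests across cards. It is only a proposal, not existing OpenNebula behaviour. It uses a least-used policy (which gives a round-robin effect) rather than a strict rotation, and it assumes that each card sits on its own PCI bus, as in the listings above (5e and d8). The function names `bus_of` and `pick_spread` are made up for the example.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set


def bus_of(address: str) -> str:
    """Return the bus part of a short PCI address, e.g. "5e" for "5e:00.1"."""
    return address.split(":")[0]


def pick_spread(addresses: List[str], used: Set[str]) -> Optional[str]:
    """Pick a free address from the card (bus) with the fewest addresses in use.

    `addresses` is assumed to be already filtered by vendor/device/class.
    Ties go to the bus that appears first in `addresses`.
    """
    free_by_bus: Dict[str, List[str]] = defaultdict(list)
    used_count: Dict[str, int] = defaultdict(int)
    for addr in addresses:
        bus = bus_of(addr)
        if addr in used:
            used_count[bus] += 1
        else:
            free_by_bus[bus].append(addr)
    if not free_by_bus:
        return None  # nothing free on any card
    best_bus = min(free_by_bus, key=lambda b: used_count[b])
    return free_by_bus[best_bus][0]


if __name__ == "__main__":
    vfs = ["5e:00.1", "5e:00.2", "d8:00.1", "d8:00.2"]
    used: Set[str] = set()
    for _ in range(3):
        addr = pick_spread(vfs, used)
        print(addr)
        used.add(addr)
```

With this policy, two PCI sections in the same VM would get one function from each card, which is what the HA setup needs.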

Thanks a lot in advance!
Álvaro


(Álvaro Simón) #2

Hi, we have fixed this issue.
We changed the pci.rb probe (/var/lib/one/remotes/im/kvm-probes.d/pci.rb) to alter the PCI host behaviour a bit: it now replaces the original vendor name for the second card (only for Mellanox devices).

However, we found an issue when running the “updated” PCI probe: onehost show kept displaying the same values, and the database was not changed after running onehost sync.

To update the host's PCI values with the new probe values, we removed the host and created it again with onehost delete and onehost create.

We have opened an issue related to this: https://github.com/OpenNebula/one/issues/2574