By default, configuration only needs to be applied to a single Physical Function (PF). However, if you plan to use the esw_multiport feature (described in a later step), the mstconfig and devlink dev eswitch commands below must be executed for each PF on the NIC.
Reset the PF configuration:
mstconfig -y -d <pci_address> reset
Configure it (LAG_RESOURCE_ALLOCATION=1 is only required when using esw_multiport or bonding):
mstconfig -y -d <pci_address> set LAG_RESOURCE_ALLOCATION=1 PF_BAR2_ENABLE=0 PER_PF_NUM_SF=1 PF_TOTAL_SF=65535 PF_SF_BAR_SIZE=6 SRIOV_EN=0
Note: Replace <pci_address> with the Bus-Device-Function identifier for the target PF. If using esw_multiport, repeat these steps for all PFs before rebooting.
After applying the commands, reboot the system to ensure the changes take effect.
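When esw_multiport is planned, the reset and set commands above have to be repeated for every PF. A minimal sketch of that repetition follows, assuming a hypothetical helper name (print_pf_config_cmds) and placeholder PCI addresses. It only prints the commands so they can be reviewed before anything is written to the NIC.

```shell
# Print (do not run) the mstconfig reset/set commands for each PF address
# passed as an argument. The helper name and the addresses used in the
# example are illustrative, not part of any tool.
print_pf_config_cmds() {
    for pf in "$@"; do
        printf 'mstconfig -y -d %s reset\n' "$pf"
        printf 'mstconfig -y -d %s set LAG_RESOURCE_ALLOCATION=1 PF_BAR2_ENABLE=0 PER_PF_NUM_SF=1 PF_TOTAL_SF=65535 PF_SF_BAR_SIZE=6 SRIOV_EN=0\n' "$pf"
    done
}

# Example: review the output first, then pipe it to "sh" to apply it.
# print_pf_config_cmds 0000:03:00.0 0000:03:00.1
```

Printing instead of executing keeps the sketch safe to try on any machine; a reboot is still needed after the commands have actually been applied.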
Set the embedded switch (eswitch) mode to switchdev:
devlink dev eswitch set pci/<pci_address> mode switchdev
Note: This command only needs to be run for each PF if esw_multiport will be enabled. Otherwise, you can run it only on one PF.
To enable multiport mode, run the following command on the desired PF:
devlink dev param set pci/<pci_address> name esw_multiport value 1 cmode runtime
Important: If you enable esw_multiport, you must ensure the mstconfig and devlink dev eswitch commands in the previous sections were applied to all PFs on the NIC.
Create a Scalable Function (SF) on a PF:
devlink port add pci/<pci_address> flavour pcisf pfnum <pf_index> sfnum <sf_index>
Note: Replace <pf_index> with the index of the PF (e.g., 0). Replace <sf_index> with the desired SF index.
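The output of devlink port add carries the port index of the new SF. A small sketch of extracting it is shown here, assuming the output format used in the kernel devlink port documentation, where the first line starts with pci/<pci_address>/<port_index>:. The helper name sf_port_id_from_output and the sample values are hypothetical.

```shell
# Read the output of "devlink port add" on stdin and print the port index,
# i.e. the last "/"-separated component of the first field of the first line.
# Assumed first line (illustrative values):
#   pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf ... sfnum 88 ...
sf_port_id_from_output() {
    awk 'NR == 1 { sub(/:$/, "", $1); n = split($1, parts, "/"); print parts[n] }'
}

# Example (requires real hardware):
# devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88 | sf_port_id_from_output
```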
Set a MAC address for the newly created SF and activate it:
devlink port function set pci/<pci_address>/<sf_port_id> hw_addr <unicast_mac_address> state active
Note: <sf_port_id> is obtained from the output of the devlink port add command.
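The hw_addr value must be a unicast MAC address, which means the least significant bit of the first octet is 0. A minimal sketch of checking a candidate address before passing it to devlink follows; the function name is_unicast_mac is hypothetical.

```shell
# Succeed if $1 is a well-formed MAC address (six colon-separated hex octets)
# whose first octet has the multicast bit (bit 0) clear.
is_unicast_mac() {
    mac=$1
    printf '%s\n' "$mac" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$' || return 1
    first_octet=${mac%%:*}
    # Bit 0 of the first octet set means multicast or broadcast.
    [ $(( 0x$first_octet & 1 )) -eq 0 ]
}

# Example:
# is_unicast_mac 02:11:22:33:44:55 && echo "ok to use as hw_addr"
```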
Identify the auxiliary device index for the SF:
devlink dev | grep sf
Use this index (referred to as <aux_dev> below) for subsequent configuration steps.
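For scripting, the numeric index can be pulled out of the devlink dev listing directly. This is a sketch that assumes SF devices appear as lines of the form auxiliary/mlx5_core.sf.<index>, matching the device names used in the commands of this guide; the helper name sf_aux_indices is hypothetical.

```shell
# Read "devlink dev" output on stdin and print one SF auxiliary device index
# per line. Lines that are not mlx5 SF auxiliary devices are ignored.
sf_aux_indices() {
    sed -n 's|^auxiliary/mlx5_core\.sf\.\([0-9][0-9]*\)[[:space:]]*$|\1|p'
}

# Example (requires real hardware):
# devlink dev | sf_aux_indices
```

Matching the full device name avoids false hits that a plain grep for "sf" could produce on unrelated lines.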
Enable vnet (vDPA) device for the SF (optional):
devlink dev param set auxiliary/mlx5_core.sf.<aux_dev> name enable_vnet value 1 cmode driverinit
Enable Remote Direct Memory Access (RDMA) and RDMA over Converged Ethernet (RoCE) for the SF (optional):
devlink dev param set auxiliary/mlx5_core.sf.<aux_dev> name enable_rdma value 1 cmode driverinit
devlink dev param set auxiliary/mlx5_core.sf.<aux_dev> name enable_roce value 1 cmode driverinit
Enable a regular ("non-representor") netdev for the SF, which can be used from the host (optional):
devlink dev param set auxiliary/mlx5_core.sf.<aux_dev> name enable_eth value 1 cmode driverinit
Note: This creates a netdev with 's' in its name instead of 'sf' (udev naming).
Reload the driver for the SF to apply the changes:
devlink dev reload auxiliary/mlx5_core.sf.<aux_dev>
Verify the vDPA management devices (when enable_vnet=1):
vdpa mgmtdev show
Verify the RDMA device (when enable_rdma=1):
rdma dev
Verify the non-representor netdev (when enable_eth=1):
ip link
Create a vDPA device:
vdpa dev add name vdpa0 mgmtdev auxiliary/mlx5_core.sf.<aux_dev> mac <mac_address> max_vqp <max_vqp_count> mtu <mtu>
List all vDPA devices to confirm creation:
vdpa dev
Locate the character device file for use with QEMU:
file "/dev/$(echo /sys/bus/vdpa/devices/vdpa0/vhost-vdpa-* | rev | cut -d/ -f-1 | rev)"
The SF representor netdev is used to configure switching for the SF in the eSwitch of the NIC. For example, you can create a bridge and attach both the SF representor netdev and the PF netdev to it as ports. (As far as I understand, you cannot attach foreign netdevs, that is, netdevs that do not belong to the same physical NIC, to this bridge, because hardware offloading will not work.)
Create a bridge:
ip link add br0 type bridge
Attach the netdevs to the bridge as ports:
ip link set dev nic0pf0 master br0
ip link set dev nic0pf0sf0 master br0
The SF is then in the same L2 domain as the physical port.
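The same bridge setup can be parameterized for other bridge and netdev names. The sketch below uses a hypothetical helper (print_bridge_cmds) that only prints the ip link commands; the names br0, nic0pf0 and nic0pf0sf0 are the ones from the example above and will differ per system.

```shell
# Print the ip-link commands that create a bridge and attach the given
# netdevs to it as ports. Usage: print_bridge_cmds <bridge> <netdev>...
print_bridge_cmds() {
    bridge=$1
    shift
    printf 'ip link add %s type bridge\n' "$bridge"
    for dev in "$@"; do
        printf 'ip link set dev %s master %s\n' "$dev" "$bridge"
    done
}

# Example: review the output, then pipe it to "sh" as root to apply it.
# print_bridge_cmds br0 nic0pf0 nic0pf0sf0
```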
- When using esw_multiport mode, PF ports are "isolated" in the bridge. When using NPAR (NIC partitioning, e.g. when NUM_OF_PF > physical port count), the paired (second) port is in the same bridge (without isolation) as the first port.
- The maximum number of SFs that can be created per PF (tested on ConnectX-6 Dx) is 511. NPAR pairs (second ports) cannot have SFs, so you cannot double the number of SFs that way.
- It is not possible to prevent MAC address changes for eth and vnet (vDPA) ports. You can only set the source MAC filter for the whole SF. See https://forums.developer.nvidia.com/t/338784.
- It is not possible to sniff internal traffic in the eSwitch, and DSA-like tags are not available either. Debugging is almost impossible.
- Minor SF RX packet loss occurs when receiving packets from a physical port, and packet loss is 100% when jumbo frames are used. If packets come from the eSwitch (e.g. directly from a PF) rather than from a physical port, no packet loss is observed and jumbo frames work fine.