Skip to content

Instantly share code, notes, and snippets.

@Kamillaova
Last active November 17, 2025 14:06
Show Gist options
  • Select an option

  • Save Kamillaova/287c242c57aadc548efdb763243c13c4 to your computer and use it in GitHub Desktop.

Select an option

Save Kamillaova/287c242c57aadc548efdb763243c13c4 to your computer and use it in GitHub Desktop.
Mellanox Scalable Functions and vDPA Confguration Guide (CC0 license)

Mellanox Scalable Functions and vDPA Configuration Guide

Initial Configuration and Reset

By default, configuration only needs to be applied to a single Physical Function (PF). However, if you plan to use the esw_multiport feature (described in a later step), the mstconfig and commands below must be executed for each PF on the NIC.

Reset the PF configuration:

mstconfig -y -d <pci_address> reset

Configure it (LAG_RESOURCE_ALLOCATION=1 is only required when using esw_multiport or bonding):

mstconfig -y -d <pci_address> set LAG_RESOURCE_ALLOCATION=1 PF_BAR2_ENABLE=0 PER_PF_NUM_SF=1 PF_TOTAL_SF=65535 PF_SF_BAR_SIZE=6 SRIOV_EN=0
Note
Replace <pci_address> with the Bus-Device-Function identifier for the target PF. If using esw_multiport, repeat these steps for all PFs before rebooting.

After applying the commands, reboot the system to ensure the changes take effect.

Configuring Eswitch Mode

Set the embedded switch (eswitch) mode to switchdev:

devlink dev eswitch set pci/<pci_address> mode switchdev
Note
This command only needs to be run for each PF if esw_multiport will be enabled. Otherwise, you can run it only on one PF.

Optionally Enabling Multiport

To enable multiport mode, run the following command on the desired PF.

devlink dev param set pci/<pci_address> name esw_multiport value 1 cmode runtime
Important
If you enable esw_multiport, you must ensure the mstconfig and devlink dev eswitch commands in the previous sections were applied to all PFs on the NIC.

Creating and Configuring a Scalable Function

Create a Sclable Function (SF) on a PF:

devlink port add pci/<pci_address> flavour pcisf pfnum <pf_index> sfnum <sf_index>
Note
Replace <pf_index> with the index of the PF (e.g., 0). Replace <sf_index> with the desired SF index.

Set a MAC address for the newly created SF and activate it:

devlink port function set pci/<pci_address>/<sf_port_id> hw_addr <unicast_mac_address> state active
Note
<sf_port_id> is obtained from the output of the devlink port add command.

Finding the Auxiliary Device Index

Identify the auxiliary device index for the SF:

devlink dev | grep sf

Use this index (referred to as <aux_dev> below) for subsequent configuration steps.

Optionally Enabling vDPA

Enable vnet (vDPA) device for the SF (optional):

devlink dev param set auxiliary/mlx5_core.sf.<aux_dev> name enable_vnet value 1 cmode driverinit

Optionally Enabling RDMA and ROCE

Enable Remote Direct Memory Access (RDMA) and RDMA over Converged Ethernet (ROCE) for the SF (optional):

devlink dev param set auxiliary/mlx5_core.sf.<aux_dev> name enable_rdma value 1 cmode driverinit
devlink dev param set auxiliary/mlx5_core.sf.<aux_dev> name enable_roce value 1 cmode driverinit

Optionally Enabling Ethernet port netdev (for host)

Enable netdev for the SF ("non-representor"), can be used from host (optional):

devlink dev param set auxiliary/mlx5_core.sf.<aux_dev> name enable_eth value 1 cmode driverinit
Note
This creates a netdev with 's' in name instead of 'sf' (udev).

Reloading the Driver and Verifying Setup

Reload the driver for the SF to apply the changes:

devlink dev reload auxiliary/mlx5_core.sf.<aux_dev>

Verify the vDPA management devices (when enable_vnet=1):

vdpa mgmtdev show

Verify RDMA device (when enable_rdma=1):

rdma dev

Verify non-representor netdev (when enable_eth=1):

ip link

Setting up a vDPA device

Create a vDPA device:

vdpa dev add name vdpa0 mgmtdev auxiliary/mlx5_core.sf.<aux_dev> mac <mac_address> max_vqp <max_vqp_count> mtu <mtu>

List all VDPA devices to confirm creation:

vdpa dev

Locate the character device file for use with QEMU:

file "/dev/$(echo /sys/bus/vdpa/devices/vdpa0/vhost-vdpa-* | rev | cut -d/ -f-1 | rev)"

Configuring the representor netdev

SF representor netdev is used to configure SF switching in eSwitch of the NIC. For example, you can create a bridge and set both SF representor netdev and PF netdev as masters of that bridge. (AFAIU You cannot add foreign (not from the same PHYSICAL NIC) netdevs as masters of this bridge because hardware offloading will not work)

Create a bridge

ip link add br0 type bridge

Set netdevs as masters of this bridge

ip link set dev nic0pf0 master br0
ip link set dev nic0pf0sf0 master br0

And the SF will be in the same L2 Domain as the phys. port.

Additional notes

Scalable Functions and and other related restrictions
  1. When using esw_multiport mode, PFs ports are "isolated" in the bridge. When using NPAR (NIC partitioning, e.g. when NUM_OF_PF > phys. port count), the paired (second) port is in the same bridge (without isolation) as the first port.

  2. The maximum number of SFs that can be created per PF (tested on Cx6 Dx) is 511. NPAR pairs (second ports) cannot have SFs, you cannot double the number of SFs.

  3. It is not possible to prevent MAC address changes for eth and vnet (vDPA) ports. You can only set the src mac filter for whole SF. See https://forums.developer.nvidia.com/t/338784.

  4. It is not possible to sniff internal traffic in the eSwitch. Needless to say, DSA-like tags are obviously out of the question. Debugging is almost impossible.

Scalable Functions bugs
  1. Minor SF RX packetloss when receiving packets from a phys. port. And it is 100% packetloss when jumbo frames are used. If packets come not from a phys. port, but from eSwitch (e.g. directly from PF), no packetloss is observed and jumbo frames work fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment