@Chester-Gillon
Last active October 27, 2025 02:59

Raw Ethernet packet pacing on a ConnectX-4 Lx

0. Introduction

Hardware packet pacing could be useful for the switch test to get a deterministic transmit rate that avoids software variation, and potentially to allow the software to queue a number of packets in one go and then let the hardware output them at the requested rate.

Mellanox documentation:

  1. Raw Ethernet Programming: Packet Pacing - Code Example
  2. HowTo Configure Packet Pacing on ConnectX-4. States that it applies to both the ConnectX-4 and ConnectX-4 Lx.
  3. Supported Non-Volatile Configurations from the release notes for the 14.32.1010 firmware in use. Nothing in the list appears related to packet pacing.
  4. Rate Limit in the MLNX_OFED v5.4-3.1.0.0 documentation contains:

    Rate limit defines a maximum bandwidth allowed for a TC. Please note that 10% deviation from the requested values is considered acceptable.

For this, looking at a ConnectX-4 Lx dual-port 10GbE MCX4121A-XCAT with 14.32.1010 firmware, running AlmaLinux 8.5.

1. ibv_devinfo initially reports packet pacing as not supported

Before attempting any configuration changes, the ConnectX-4 Lx wasn't reported as supporting packet pacing in the ibv_devinfo output:

$ ibv_devinfo -v
  <snip>
	packet_pacing_caps:
		qp_rate_limit_min:	0kbps
		qp_rate_limit_max:	0kbps

2. Saving configuration state before attempting any changes

Saving ibv_devinfo output:

[mr_halfword@haswell-alma connectx_4_lx_config]$ ibv_devinfo -v > before_ibv_devinfo.log

Saving the MFT (mlxconfig) configuration:

[mr_halfword@haswell-alma connectx_4_lx_config]$ sudo mst start
[sudo] password for mr_halfword: 
Starting MST (Mellanox Software Tools) driver set
Loading MST PCI module - Success
Loading MST PCI configuration module - Success
Create devices
Unloading MST PCI module (unused) - Success
[mr_halfword@haswell-alma connectx_4_lx_config]$ sudo mst status
MST modules:
------------
    MST PCI module is not loaded
    MST PCI configuration module loaded

MST devices:
------------
/dev/mst/mt4117_pciconf0         - PCI configuration cycles access.
                                   domain:bus:dev.fn=0000:01:00.0 addr.reg=88 data.reg=92 cr_bar.gw_offset=-1
                                   Chip revision is: 00

[mr_halfword@haswell-alma connectx_4_lx_config]$ sudo mlxconfig -d /dev/mst/mt4117_pciconf0 q > before_mlxconfig_query.log

Attempting to take a backup reports that no TLVs were found, and no backup file is created:

[mr_halfword@haswell-alma connectx_4_lx_config]$ sudo mlxconfig -d /dev/mst/mt4117_pciconf0 -f backup.conf backup
Collecting...
No TLVs were found.

3. Getting list of all TLVs

The following generates a list of all TLVs. It doesn't require a device, so it doesn't show which TLVs are supported by which devices:

[mr_halfword@haswell-alma connectx_4_lx_config]$ mlxconfig gen_tlvs_file all_tlvs.txt
Saving output...
Done!

The generated file contains:

nv_packet_pacing                                  0

4. Finding the names for TLVs

The Firmware Activation section of HowTo Configure Packet Pacing on ConnectX-4 gives commands to create a raw TLV file to activate packet pacing in the firmware, but doesn't define the significance of the hex numbers. Create a text file as per Activate Packet Pacing in the Firmware:

[mr_halfword@haswell-alma connectx_4_lx_config]$ echo "MLNX_RAW_TLV_FILE" > mlxconfig_raw_pacing.txt
[mr_halfword@haswell-alma connectx_4_lx_config]$ echo "0x00000004 0x0000010c 0x00000000 0x00000001" >> mlxconfig_raw_pacing.txt 

Convert the raw text TLV to XML:

[mr_halfword@haswell-alma connectx_4_lx_config]$ mlxconfig raw2xml mlxconfig_raw_pacing.txt mlxconfig_raw_pacing.xml
Saving output...
Done!

Which has the contents:

<?xml version="1.0" encoding="UTF-8"?>
<config xmlns="http://www.mellanox.com/config">
<nv_packet_pacing ovr_en='0' rd_en='0' writer_id='0'>

	<!-- Legal Values: False/True -->
	<packet_pacing>True</packet_pacing>

	<!-- Legal Values: False/True -->
	<lag_disable>False</lag_disable>

</nv_packet_pacing>

I.e. the raw TLV does enable packet_pacing, and the TLV also has a <lag_disable> field.

5. Set raw configuration

Set the raw configuration file created above, which reports information about the one raw TLV being set:

[mr_halfword@haswell-alma connectx_4_lx_config]$ sudo mlxconfig -d /dev/mst/mt4117_pciconf0 -f mlxconfig_raw_pacing.txt set_raw
Raw TLV #1 Info:
Length: 0x4
Version: 0
OverrideEn: 0
Type: 0x0000010c
Data: 0x00000001 


 Operation intended for advanced users.
 Are you sure you want to apply raw TLV file? (y/n) [n] : y
Applying... Done!
-I- Please reboot machine to load new configurations.

6. Changes in reported configuration after setting raw configuration

Rebooted the PC after the above set_raw operation. ibv_devinfo is now reporting packet pacing as supported for both devices (one device per port):

[mr_halfword@haswell-alma connectx_4_lx_config]$ ibv_devinfo -v > after_ibv_devinfo.log 
[mr_halfword@haswell-alma connectx_4_lx_config]$ diff before_ibv_devinfo.log after_ibv_devinfo.log 
99,100c99,102
< 		qp_rate_limit_min:	0kbps
< 		qp_rate_limit_max:	0kbps
---
> 		qp_rate_limit_min:	1kbps
> 		qp_rate_limit_max:	10000000kbps
> 		supported_qp:
> 					SUPPORT_RAW_PACKET
231,232c233,236
< 		qp_rate_limit_min:	0kbps
< 		qp_rate_limit_max:	0kbps
---
> 		qp_rate_limit_min:	1kbps
> 		qp_rate_limit_max:	10000000kbps
> 		supported_qp:
> 					SUPPORT_RAW_PACKET

There was no change to the configuration reported by a mlxconfig query:

[mr_halfword@haswell-alma connectx_4_lx_config]$ sudo mst start
[sudo] password for mr_halfword: 
Starting MST (Mellanox Software Tools) driver set
Loading MST PCI module - Success
Loading MST PCI configuration module - Success
Create devices
Unloading MST PCI module (unused) - Success
[mr_halfword@haswell-alma connectx_4_lx_config]$ sudo mlxconfig -d /dev/mst/mt4117_pciconf0 q > after_mlxconfig_query.log
[mr_halfword@haswell-alma connectx_4_lx_config]$ diff before_mlxconfig_query.log after_mlxconfig_query.log 

And a mlxconfig backup now obtains the TLVs which were set:

[mr_halfword@haswell-alma connectx_4_lx_config]$ sudo mlxconfig -d /dev/mst/mt4117_pciconf0 -f backup.conf backup
Collecting...
Saving output...
Done!
[mr_halfword@haswell-alma connectx_4_lx_config]$ cat backup.conf 
MLNX_RAW_TLV_FILE
% TLV Type: 0x0000010c, Writer ID: ICMD MLXCONFIG SET RAW(0x0c), Writer Host ID: 0x00
0x000c0004 0x0000010c 0x00000000 0x00000001 

7. Initial tests on packet pacing

The ibv_raw_packet_tx program was used to test the effect of packet pacing.

When capturing packets in Wireshark to look at the timestamps, set up a mirror port in the T1700G-28TQ switch and redirected only rx packets to the mirror port.

7.1. Burst of packets

For this configuration:

  1. packet_pacing enabled in the ConnectX-4 Lx by use of a TLV type 0x0000010c.
  2. mlxconfig reported the default of ACCURATE_TX_SCHEDULER False(0)
  3. ibv_raw_packet_tx only using IBV_QP_RATE_LIMIT.

Ran the test at a low rate to see the inter-packet gap:

[mr_halfword@haswell-alma release]$ ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 1000
Send 1536 frames over 15.692908 secs, average 97.9 Hz

The Wireshark capture shows bursts of packets:

  • 43 packets back-to-back in ~160 microseconds
  • Gap between bursts of ~537254 microseconds

7.2. ibv_modify_qp_rate_limit failing due to lack of burst support

The ibv_raw_packet_tx program was modified to have an option to call ibv_modify_qp_rate_limit to try and control the burst rate, but the call failed with EINVAL. The error was from the following in mlx5_modify_qp_rate_limit:

	if ((attr->max_burst_sz ||
	     attr->typical_pkt_sz) &&
	    (!attr->rate_limit ||
	     !(mctx->packet_pacing_caps.cap_flags &
	       MLX5_IB_PP_SUPPORT_BURST)))
		return EINVAL;

The contents of the packet_pacing_caps shown in the debugger were:

packet_pacing_caps	struct mlx5_packet_pacing_caps	{...}	
	qp_rate_limit_min	__u32	1	
	qp_rate_limit_max	__u32	10000000	
	supported_qpts	__u32	256	
	cap_flags	__u8	0 '\0'	

I.e. the cap_flags doesn't have MLX5_IB_PP_SUPPORT_BURST (1).

It is the mlx5_ib_query_device function in the AlmaLinux 8.5 linux-4.18.0-348.20.1.el8.x86_64/drivers/infiniband/hw/mlx5/main.c kernel file which sets MLX5_IB_PP_SUPPORT_BURST:

		if (MLX5_CAP_QOS(mdev, packet_pacing) &&
		    MLX5_CAP_GEN(mdev, qos)) {
			resp.packet_pacing_caps.qp_rate_limit_max =
				MLX5_CAP_QOS(mdev, packet_pacing_max_rate);
			resp.packet_pacing_caps.qp_rate_limit_min =
				MLX5_CAP_QOS(mdev, packet_pacing_min_rate);
			resp.packet_pacing_caps.supported_qpts |=
				1 << IB_QPT_RAW_PACKET;
			if (MLX5_CAP_QOS(mdev, packet_pacing_burst_bound) &&
			    MLX5_CAP_QOS(mdev, packet_pacing_typical_size))
				resp.packet_pacing_caps.cap_flags |=
					MLX5_IB_PP_SUPPORT_BURST;

I.e. MLX5_IB_PP_SUPPORT_BURST is only set when both the packet_pacing_burst_bound and packet_pacing_typical_size capability fields read from the device are non-zero.

7.3. mlxconfig parameters which could affect burst support

Generated a description of the available configuration parameters for the ConnectX-4 Lx device:

[mr_halfword@haswell-alma connectx_4_lx_config]$ sudo mlxconfig -d /dev/mst/mt4117_pciconf0 show_confs | cat > show_confs.txt

Ones which look like they could impact burst support:

ACCURATE_TX_SCHEDULER=<False|True> When TRUE, the device will optimize the transmit scheduler for high accuracy. may hurt performance When False, the device defaults will apply for the scheduler.

TX_SCHEDULER_BURST=<NUM> Log (base2) of the transmission scheduler default burst size, given in bytes, Value 0x0 indicates using device defaults.

7.4. Enabling accurate tx scheduler

Attempted to enable the accurate tx scheduler with:

[mr_halfword@haswell-alma connectx_4_lx_config]$ sudo mlxconfig -d /dev/mst/mt4117_pciconf0 set ACCURATE_TX_SCHEDULER=1

Device #1:
----------

Device type:    ConnectX4LX     
Name:           MCX4121A-XCA_Ax 
Description:    ConnectX-4 Lx EN network interface card; 10GbE dual-port SFP28; PCIe3.0 x8; ROHS R6
Device:         /dev/mst/mt4117_pciconf0

Configurations:                              Next Boot       New
         ACCURATE_TX_SCHEDULER               False(0)        True(1)         

 Apply new Configuration? (y/n) [n] : y
Applying... Done!
-I- Please reboot machine to load new configurations.

After a reset there was no change in the ibv_devinfo output:

[mr_halfword@haswell-alma connectx_4_lx_config]$ ibv_devinfo -v> after_tx_accurate_scheduler_devinfo.log
[mr_halfword@haswell-alma connectx_4_lx_config]$ diff after_ibv_devinfo.log after_tx_accurate_scheduler_devinfo.log 

And a mlxconfig query reported the expected difference:

[mr_halfword@haswell-alma connectx_4_lx_config]$ sudo mlxconfig  -d /dev/mst/mt4117_pciconf0 q|cat > after_tx_accurate_scheduler_mlxconfig.log 
[mr_halfword@haswell-alma connectx_4_lx_config]$ diff after_mlxconfig_query.log  after_tx_accurate_scheduler_mlxconfig.log 
31c31
<          ACCURATE_TX_SCHEDULER               False(0)        
---
>          ACCURATE_TX_SCHEDULER               True(1)       

Testing the effect of the change:

  1. The ibv_modify_qp_rate_limit call still failed with EINVAL as mctx->packet_pacing_caps.cap_flags was still zero.
  2. Initially thought there was no difference in the gap between packets, but on further experiments the gap seems sensitive to the MTU set on the Ethernet device.

8. Added printf debugging to mlx5_core driver for packet pacing information

With the in-box driver in AlmaLinux 8.5 the following is reported for each ConnectX-4 Lx device in the dmesg output:

[    4.029819] mlx5_core 0000:01:00.0: Rate limit: 13 rates are supported, range: 0Mbps to 9765Mbps

Where that message is from the mlx5_init_rl_table function in the drivers/net/ethernet/mellanox/mlx5/core/rl.c source file, which reads the device capabilities and shifts right by 10 bits (i.e. divides by 1024) to change the packet pacing rates from kbps to approximately Mbps. The 1 kbps minimum rate is therefore reported as 0Mbps:

	/* First entry is reserved for unlimited rate */
	table->max_size = MLX5_CAP_QOS(dev, packet_pacing_rate_table_size) - 1;
	table->max_rate = MLX5_CAP_QOS(dev, packet_pacing_max_rate);
	table->min_rate = MLX5_CAP_QOS(dev, packet_pacing_min_rate);

	mlx5_core_info(dev, "Rate limit: %u rates are supported, range: %uMbps to %uMbps\n",
		       table->max_size,
		       table->min_rate >> 10,
		       table->max_rate >> 10);

In order to load a mlx5_core module instrumented to report the raw packet pacing fields from the device:

  1. Had already performed the steps in "18. Install Kernel source", which prepared the Kernel source for AlmaLinux 8.5.
  2. Took a copy of the prepared Kernel source:
[mr_halfword@haswell-alma ~]$ cd ~
[mr_halfword@haswell-alma ~]$ cp -pr ~/rpmbuild/BUILD/kernel-4.18.0-348.20.1.el8_5 ~/mlx5_packet_pacing_diags
  3. Using "How to recompile just a single kernel module?" as a guide, used the copy of the Kernel source tree to build just the mlx5_core module. Change to the root of the source tree:
[mr_halfword@haswell-alma ~]$ cd ~/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/
  4. Import the configuration from the running Kernel:
[mr_halfword@haswell-alma linux-4.18.0-348.20.1.el8.x86_64]$ cp /boot/config-`uname -r` .config
[mr_halfword@haswell-alma linux-4.18.0-348.20.1.el8.x86_64]$ make oldconfig
scripts/kconfig/conf  --oldconfig Kconfig
#
# configuration written to .config
#
  5. Make scripts, prepare and modules_prepare:
[mr_halfword@haswell-alma linux-4.18.0-348.20.1.el8.x86_64]$ make scripts
scripts/kconfig/conf  --syncconfig Kconfig
  HOSTCC  scripts/basic/bin2c
  WRAP    arch/x86/include/generated/uapi/asm/bpf_perf_event.h
  WRAP    arch/x86/include/generated/uapi/asm/poll.h
  WRAP    arch/x86/include/generated/uapi/asm/socket.h
  WRAP    arch/x86/include/generated/asm/dma-contiguous.h
  WRAP    arch/x86/include/generated/asm/early_ioremap.h
  WRAP    arch/x86/include/generated/asm/mcs_spinlock.h
  WRAP    arch/x86/include/generated/asm/mm-arch-hooks.h
  WRAP    arch/x86/include/generated/asm/mmiowb.h
  HOSTCC  scripts/genksyms/genksyms.o
  YACC    scripts/genksyms/parse.tab.c
  HOSTCC  scripts/genksyms/parse.tab.o
  LEX     scripts/genksyms/lex.lex.c
  YACC    scripts/genksyms/parse.tab.h
  HOSTCC  scripts/genksyms/lex.lex.o
  HOSTLD  scripts/genksyms/genksyms
  UPD     include/generated/uapi/linux/version.h
  CC      scripts/mod/empty.o
  HOSTCC  scripts/mod/mk_elfconfig
  MKELF   scripts/mod/elfconfig.h
  HOSTCC  scripts/mod/modpost.o
  CC      scripts/mod/devicetable-offsets.s
  UPD     scripts/mod/devicetable-offsets.h
  HOSTCC  scripts/mod/file2alias.o
  HOSTCC  scripts/mod/sumversion.o
  HOSTLD  scripts/mod/modpost
  HOSTCC  scripts/selinux/genheaders/genheaders
  HOSTCC  scripts/selinux/mdp/mdp
  HOSTCC  scripts/kallsyms
  HOSTCC  scripts/pnmtologo
  HOSTCC  scripts/conmakehash
  HOSTCC  scripts/recordmcount
  HOSTCC  scripts/sortextable
  HOSTCC  scripts/asn1_compiler
  HOSTCC  scripts/sign-file
  HOSTCC  scripts/extract-cert
[mr_halfword@haswell-alma linux-4.18.0-348.20.1.el8.x86_64]$ make prepare
  SYSTBL  arch/x86/include/generated/asm/syscalls_32.h
  SYSHDR  arch/x86/include/generated/asm/unistd_32_ia32.h
  SYSHDR  arch/x86/include/generated/asm/unistd_64_x32.h
  SYSTBL  arch/x86/include/generated/asm/syscalls_64.h
  HYPERCALLS arch/x86/include/generated/asm/xen-hypercalls.h
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_32.h
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_64.h
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_x32.h
  HOSTCC  arch/x86/tools/relocs_32.o
  HOSTCC  arch/x86/tools/relocs_64.o
  HOSTCC  arch/x86/tools/relocs_common.o
  HOSTLD  arch/x86/tools/relocs
  UPD     include/config/kernel.release
  UPD     include/generated/utsrelease.h
  CC      kernel/bounds.s
  UPD     include/generated/bounds.h
  UPD     include/generated/timeconst.h
  CC      arch/x86/kernel/asm-offsets.s
  UPD     include/generated/asm-offsets.h
  CALL    scripts/checksyscalls.sh
  DESCEND  objtool
  HOSTCC   /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/objtool/fixdep.o
  HOSTLD   /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/objtool/fixdep-in.o
  LINK     /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/objtool/fixdep
  CC       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/objtool/exec-cmd.o
  CC       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/objtool/help.o
  CC       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/objtool/pager.o
  CC       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/objtool/parse-options.o
  CC       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/objtool/run-command.o
  CC       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/objtool/sigchain.o
  CC       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/objtool/subcmd-config.o
  LD       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/objtool/libsubcmd-in.o
  AR       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/objtool/libsubcmd.a
  MKDIR    /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/objtool/arch/x86/lib/
  GEN      /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/objtool/arch/x86/lib/inat-tables.c
  CC       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/objtool/arch/x86/decode.o
  LD       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/objtool/arch/x86/objtool-in.o
  CC       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/objtool/builtin-check.o
  CC       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/objtool/builtin-orc.o
  CC       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/objtool/check.o
  CC       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/objtool/orc_gen.o
  CC       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/objtool/orc_dump.o
  CC       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/objtool/elf.o
  CC       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/objtool/special.o
  CC       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/objtool/objtool.o
  CC       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/objtool/libstring.o
  CC       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/objtool/libctype.o
  CC       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/objtool/str_error_r.o
  LD       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/objtool/objtool-in.o
Warning: synced file at 'tools/objtool/arch/x86/include/asm/inat.h' differs from latest kernel version at 'arch/x86/include/asm/inat.h'
Warning: synced file at 'tools/objtool/arch/x86/include/asm/insn.h' differs from latest kernel version at 'arch/x86/include/asm/insn.h'
Warning: synced file at 'tools/objtool/arch/x86/lib/inat.c' differs from latest kernel version at 'arch/x86/lib/inat.c'
Warning: synced file at 'tools/objtool/arch/x86/lib/insn.c' differs from latest kernel version at 'arch/x86/lib/insn.c'
  LINK     /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/objtool/objtool
  DESCEND  bpf/resolve_btfids

Auto-detecting system features:
...                        libelf: [ on  ]
...                          zlib: [ on  ]
...                           bpf: [ on  ]

  GEN      /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/bpf/resolve_btfids/bpf_helper_defs.h
  MKDIR    /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/bpf/resolve_btfids/staticobjs/
  CC       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/bpf/resolve_btfids/staticobjs/libbpf.o
  CC       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/bpf/resolve_btfids/staticobjs/bpf.o
  CC       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/bpf/resolve_btfids/staticobjs/nlattr.o
  CC       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/bpf/resolve_btfids/staticobjs/btf.o
  CC       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/bpf/resolve_btfids/staticobjs/libbpf_errno.o
  CC       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/bpf/resolve_btfids/staticobjs/str_error.o
  CC       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/bpf/resolve_btfids/staticobjs/netlink.o
  CC       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/bpf/resolve_btfids/staticobjs/bpf_prog_linfo.o
  CC       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/bpf/resolve_btfids/staticobjs/libbpf_probes.o
  CC       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/bpf/resolve_btfids/staticobjs/xsk.o
  CC       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/bpf/resolve_btfids/staticobjs/hashmap.o
  CC       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/bpf/resolve_btfids/staticobjs/btf_dump.o
  CC       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/bpf/resolve_btfids/staticobjs/ringbuf.o
  LD       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/bpf/resolve_btfids/staticobjs/libbpf-in.o
  LINK     /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/bpf/resolve_btfids/libbpf.a
  HOSTCC   /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/bpf/resolve_btfids/fixdep.o
  HOSTLD   /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/bpf/resolve_btfids/fixdep-in.o
  LINK     /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/bpf/resolve_btfids/fixdep
  CC       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/bpf/resolve_btfids/exec-cmd.o
  CC       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/bpf/resolve_btfids/help.o
  CC       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/bpf/resolve_btfids/pager.o
  CC       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/bpf/resolve_btfids/parse-options.o
  CC       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/bpf/resolve_btfids/run-command.o
  CC       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/bpf/resolve_btfids/sigchain.o
  CC       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/bpf/resolve_btfids/subcmd-config.o
  LD       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/bpf/resolve_btfids/libsubcmd-in.o
  AR       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/bpf/resolve_btfids/libsubcmd.a
  CC       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/bpf/resolve_btfids/main.o
  CC       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/bpf/resolve_btfids/rbtree.o
  CC       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/bpf/resolve_btfids/zalloc.o
  CC       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/bpf/resolve_btfids/string.o
  CC       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/bpf/resolve_btfids/ctype.o
  CC       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/bpf/resolve_btfids/str_error_r.o
  LD       /home/mr_halfword/mlx5_packet_pacing_diags/linux-4.18.0-348.20.1.el8.x86_64/tools/bpf/resolve_btfids/resolve_btfids-in.o
  LINK     resolve_btfids
[mr_halfword@haswell-alma linux-4.18.0-348.20.1.el8.x86_64]$ make modules_prepare
  CALL    scripts/checksyscalls.sh
  DESCEND  objtool
  DESCEND  bpf/resolve_btfids
  6. Copy the Module.symvers which matches the running kernel:
[mr_halfword@haswell-alma linux-4.18.0-348.20.1.el8.x86_64]$ cp /usr/src/kernels/`uname -r`/Module.symvers .
  7. Compile the mlx5_core module, with no changes, to check that it compiles without error:
[mr_halfword@haswell-alma linux-4.18.0-348.20.1.el8.x86_64]$ make modules SUBDIRS=drivers/net/ethernet/mellanox/mlx5/core
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/main.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/cmd.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/debugfs.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/fw.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/eq.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/uar.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/pagealloc.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/health.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/mcg.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/cq.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/alloc.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/port.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/mr.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/pd.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/transobj.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/vport.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/sriov.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/fs_core.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/pci_irq.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/fs_counters.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/rl.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/lag.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/dev.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/events.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/wq.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/lib/gid.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/lib/dm.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/diag/fs_tracepoint.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/devlink.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/diag/rsc_dump.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/fw_reset.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/qos.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en_main.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en_common.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en_fs.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en_tx.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en_rx.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en_dim.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en_txrx.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en/xdp.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en_stats.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en_selftest.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en/port.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en/monitor_stats.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en/health.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en/params.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en/xsk/pool.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en/devlink.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en/ptp.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en/qos.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en/trap.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en_arfs.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/lag_mp.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/lib/geneve.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/lib/port_tun.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en_rep.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en/rep/bond.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en/mod_hdr.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en/mapping.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en_tc.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en/rep/neigh.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/lib/fs_chains.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/esw/indir_table.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_vxlan.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_gre.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_geneve.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_mplsoudp.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/diag/en_tc_tracepoint.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/eswitch.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_termtbl.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/ecpf.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/rdma.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/esw/legacy.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/esw/acl/helper.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_lgcy.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_ofld.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/esw/acl/ingress_lgcy.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/esw/acl/ingress_ofld.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/esw/vporttbl.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/esw/sample.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/lib/clock.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib_vlan.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec_offload.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/lib/crypto.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/accel/tls.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/fpga/cmd.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/fpga/core.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_stats.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_stats.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en_accel/fs_tcp.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_txrx.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_rx.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/steering/dr_domain.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/steering/dr_table.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/steering/dr_matcher.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/steering/dr_rule.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/steering/dr_icm_pool.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/steering/dr_buddy.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste_v0.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste_v1.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/steering/dr_cmd.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/steering/dr_fw.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/steering/fs_dr.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/sf/vhca_event.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/sf/cmd.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/lib/vxlan.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/lib/hv.o
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.o
  LD [M]  drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.o
  Building modules, stage 2.
  MODPOST 1 modules
  CC      drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.mod.o
  LD [M]  drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko
  1. In the drivers/net/ethernet/mellanox/mlx5/core/rl.c file change from:
	mlx5_core_info(dev, "Rate limit: %u rates are supported, range: %uMbps to %uMbps\n",
		       table->max_size,
		       table->min_rate >> 10,
		       table->max_rate >> 10);

To:

	mlx5_core_info(dev, "Rate limit: %u rates are supported, range: %uKbps to %uKbps packet_pacing_burst_bound=%u packet_pacing_typical_size=%u\n",
		       table->max_size,
		       table->min_rate,
		       table->max_rate,
                       MLX5_CAP_QOS(dev, packet_pacing_burst_bound),
                       MLX5_CAP_QOS(dev, packet_pacing_typical_size));
  1. Rebuild the module with the changes:
[mr_halfword@haswell-alma linux-4.18.0-348.20.1.el8.x86_64]$ make modules SUBDIRS=drivers/net/ethernet/mellanox/mlx5/core
  CC [M]  drivers/net/ethernet/mellanox/mlx5/core/rl.o
  LD [M]  drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.o
  Building modules, stage 2.
  MODPOST 1 modules
  CC      drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.mod.o
  LD [M]  drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko
  1. Load the modified module:
[mr_halfword@haswell-alma linux-4.18.0-348.20.1.el8.x86_64]$ sudo rmmod mlx5_ib mlx5_core;sudo insmod drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko

9. Impact of ConnectX-4 Lx firmware configuration on packet pacing capabilities seen by Kernel module

For all these tests the TLV change to enable packet-pacing was left enabled.

9.1 ACCURATE_TX_SCHEDULER=True and TX_SCHEDULER_BURST=0

With the configuration as left from above, i.e. ACCURATE_TX_SCHEDULER=True and TX_SCHEDULER_BURST=0 then the mlx5_core module with the diagnostics reports:

[11098.480121] mlx5_core 0000:01:00.0: Rate limit: 13 rates are supported, range: 1Kbps to 10000000Kbps packet_pacing_burst_bound=1 packet_pacing_typical_size=0

9.2 ACCURATE_TX_SCHEDULER=False and TX_SCHEDULER_BURST=0

Changed ACCURATE_TX_SCHEDULER back to its default of False:

[mr_halfword@haswell-alma linux-4.18.0-348.20.1.el8.x86_64]$ sudo mlxconfig -d /dev/mst/mt4117_pciconf0 set ACCURATE_TX_SCHEDULER=0

Device #1:
----------

Device type:    ConnectX4LX     
Name:           MCX4121A-XCA_Ax 
Description:    ConnectX-4 Lx EN network interface card; 10GbE dual-port SFP28; PCIe3.0 x8; ROHS R6
Device:         /dev/mst/mt4117_pciconf0

Configurations:                              Next Boot       New
         ACCURATE_TX_SCHEDULER               True(1)         False(0)        

 Apply new Configuration? (y/n) [n] : y
Applying... Done!
-I- Please reboot machine to load new configurations.

After a reboot there was no change to the reported packet pacing capabilities:

[  199.073230] mlx5_core 0000:01:00.0: Rate limit: 13 rates are supported, range: 1Kbps to 10000000Kbps packet_pacing_burst_bound=1 packet_pacing_typical_size=0

9.3 ACCURATE_TX_SCHEDULER=True and TX_SCHEDULER_BURST=11

Re-enabled ACCURATE_TX_SCHEDULER and set the base-log2 TX_SCHEDULER_BURST to 11 (2048 bytes, to allow for the switch test packet size):

[mr_halfword@haswell-alma ~]$ sudo mlxconfig -d /dev/mst/mt4117_pciconf0 set ACCURATE_TX_SCHEDULER=1 TX_SCHEDULER_BURST=11

Device #1:
----------

Device type:    ConnectX4LX     
Name:           MCX4121A-XCA_Ax 
Description:    ConnectX-4 Lx EN network interface card; 10GbE dual-port SFP28; PCIe3.0 x8; ROHS R6
Device:         /dev/mst/mt4117_pciconf0

Configurations:                              Next Boot       New
         ACCURATE_TX_SCHEDULER               False(0)        True(1)         
         TX_SCHEDULER_BURST                  0               11              

 Apply new Configuration? (y/n) [n] : y
Applying... Done!
-I- Please reboot machine to load new configurations.

After a reboot there was no change to the reported packet pacing capabilities:

[  146.267158] mlx5_core 0000:01:00.0: Rate limit: 13 rates are supported, range: 1Kbps to 10000000Kbps packet_pacing_burst_bound=1 packet_pacing_typical_size=0

With these changes there was no noticeable difference in the packet burst size compared to 7.1 Burst of packets:

  1. With the MTU on the Ethernet interface set to 9216 Wireshark indicates bursts of 43 packets.
[mr_halfword@haswell-alma release]$  ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 1000
Send 1280 frames over 15.876766 secs, average 80.6 Hz
  2. With the MTU set to 1500 Wireshark indicates bursts of 3 packets.
[mr_halfword@haswell-alma release]$  ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 1000 
Send 1280 frames over 14.426067 secs, average 88.7 Hz
  3. With the MTU set to 2048 Wireshark indicates bursts of 2 packets.
[mr_halfword@haswell-alma release]$  ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 1000 
Send 1536 frames over 12.896631 secs, average 119.1 Hz
  4. With the MTU set to 2048 Wireshark indicates bursts of 2 packets.
[mr_halfword@haswell-alma release]$  ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 500 
Send 1024 frames over 17.178638 secs, average 59.6 Hz

10. Discrepancies between user space and kernel about support for changing typical_pkt_sz

As per .2. ibv_modify_qp_rate_limit failing due to lack of burst support, the user space mlx5_modify_qp_rate_limit function in the rdma-core mlx5 provider doesn't allow max_burst_sz or typical_pkt_sz to be modified in a Queue-Pair unless both:

  1. rate_limit is non-zero in the request.
  2. The MLX5_IB_PP_SUPPORT_BURST flag is set in the packet_pacing_caps.cap_flags reported by the kernel for the device. As per the above analysis this flag is only set when the reported device QOS capabilities support both packet_pacing_burst_bound and packet_pacing_typical_size.

The __mlx5_ib_modify_qp function in the kernel drivers/infiniband/hw/mlx5/qp.c tests independently whether modifications to max_burst_sz or typical_pkt_sz are allowed:

		if (attr_mask & IB_QP_RATE_LIMIT) {
			raw_qp_param.rl.rate = attr->rate_limit;

			if (ucmd->burst_info.max_burst_sz) {
				if (attr->rate_limit &&
				    MLX5_CAP_QOS(dev->mdev, packet_pacing_burst_bound)) {
					raw_qp_param.rl.max_burst_sz =
						ucmd->burst_info.max_burst_sz;
				} else {
					err = -EINVAL;
					goto out;
				}
			}

			if (ucmd->burst_info.typical_pkt_sz) {
				if (attr->rate_limit &&
				    MLX5_CAP_QOS(dev->mdev, packet_pacing_typical_size)) {
					raw_qp_param.rl.typical_pkt_sz =
						ucmd->burst_info.typical_pkt_sz;
				} else {
					err = -EINVAL;
					goto out;
				}
			}

			raw_qp_param.set_mask |= MLX5_RAW_QP_RATE_LIMIT;
		}

Based upon the kernel code, changed the user space code:

  1. In verbs.c the test in mlx5_modify_qp_rate_limit on the MLX5_IB_PP_SUPPORT_BURST flag was commented out.
  2. In ibv_raw_packet_tx.c modified the call to ibv_modify_qp_rate_limit to only set rate_limit and max_burst_sz.

With both user space code changes in place, it was then possible to successfully call mlx5_modify_qp_rate_limit with non-zero values for rate_limit and max_burst_sz.

11. Runs measuring effect of burst options

11.1. ACCURATE_TX_SCHEDULER=True and TX_SCHEDULER_BURST=11

Left the ConnectX-4 Lx configuration unchanged from the previous tests.

Output from running the tests:

[mr_halfword@haswell-alma ibv_message_passing]$ ./run_raw_packet_pacing_tests.sh ibv_message_passing_c_project/bin/packet_pacing/accurate_tx_scheduler_tx_scheduler_burst11

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 200000 -c ibv_message_passing_c_project/bin/packet_pacing/accurate_tx_scheduler_tx_scheduler_burst11_m9216_l200000.csv 
HCA core clock 156250 KHz
Send 181525 frames over 10.027515 secs (CLOCK_MONOTONIC), average 18102.7 Hz
Average bit rate for untagged frames = 222.7 Mbps
HCA elapsed time = 10.027382 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 200000 -c ibv_message_passing_c_project/bin/packet_pacing/accurate_tx_scheduler_tx_scheduler_burst11_m9216_l200000-b.csv -b
HCA core clock 156250 KHz
Send 133897 frames over 10.038814 secs (CLOCK_MONOTONIC), average 13337.9 Hz
Average bit rate for untagged frames = 164.1 Mbps
HCA elapsed time = 10.038680 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 1200000 -c ibv_message_passing_c_project/bin/packet_pacing/accurate_tx_scheduler_tx_scheduler_burst11_m9216_l1200000.csv 
HCA core clock 156250 KHz
Send 1067564 frames over 10.004821 secs (CLOCK_MONOTONIC), average 106705.0 Hz
Average bit rate for untagged frames = 1312.9 Mbps
HCA elapsed time = 10.004688 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 1200000 -c ibv_message_passing_c_project/bin/packet_pacing/accurate_tx_scheduler_tx_scheduler_burst11_m9216_l1200000-b.csv -b
HCA core clock 156250 KHz
Send 1067564 frames over 10.004799 secs (CLOCK_MONOTONIC), average 106705.2 Hz
Average bit rate for untagged frames = 1312.9 Mbps
HCA elapsed time = 10.004666 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 2400000 -c ibv_message_passing_c_project/bin/packet_pacing/accurate_tx_scheduler_tx_scheduler_burst11_m9216_l2400000.csv 
HCA core clock 156250 KHz
Send 2134630 frames over 10.002422 secs (CLOCK_MONOTONIC), average 213411.3 Hz
Average bit rate for untagged frames = 2625.8 Mbps
HCA elapsed time = 10.002288 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 2400000 -c ibv_message_passing_c_project/bin/packet_pacing/accurate_tx_scheduler_tx_scheduler_burst11_m9216_l2400000-b.csv -b
HCA core clock 156250 KHz
Send 2134623 frames over 10.002406 secs (CLOCK_MONOTONIC), average 213411.0 Hz
Average bit rate for untagged frames = 2625.8 Mbps
HCA elapsed time = 10.002273 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 4800000 -c ibv_message_passing_c_project/bin/packet_pacing/accurate_tx_scheduler_tx_scheduler_burst11_m9216_l4800000.csv 
HCA core clock 156250 KHz
Send 3963866 frames over 10.001304 secs (CLOCK_MONOTONIC), average 396334.9 Hz
Average bit rate for untagged frames = 4876.5 Mbps
HCA elapsed time = 10.001172 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 4800000 -c ibv_message_passing_c_project/bin/packet_pacing/accurate_tx_scheduler_tx_scheduler_burst11_m9216_l4800000-b.csv -b
HCA core clock 156250 KHz
Send 4268741 frames over 10.001207 secs (CLOCK_MONOTONIC), average 426822.6 Hz
Average bit rate for untagged frames = 5251.6 Mbps
HCA elapsed time = 10.001074 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 200000 -c ibv_message_passing_c_project/bin/packet_pacing/accurate_tx_scheduler_tx_scheduler_burst11_m2048_l200000.csv 
HCA core clock 156250 KHz
Send 191057 frames over 10.026802 secs (CLOCK_MONOTONIC), average 19054.6 Hz
Average bit rate for untagged frames = 234.4 Mbps
HCA elapsed time = 10.026668 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 200000 -c ibv_message_passing_c_project/bin/packet_pacing/accurate_tx_scheduler_tx_scheduler_burst11_m2048_l200000-b.csv -b
HCA core clock 156250 KHz
Send 305386 frames over 10.016788 secs (CLOCK_MONOTONIC), average 30487.4 Hz
Average bit rate for untagged frames = 375.1 Mbps
HCA elapsed time = 10.016654 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 1200000 -c ibv_message_passing_c_project/bin/packet_pacing/accurate_tx_scheduler_tx_scheduler_burst11_m2048_l1200000.csv 
HCA core clock 156250 KHz
Send 1067571 frames over 10.004830 secs (CLOCK_MONOTONIC), average 106705.6 Hz
Average bit rate for untagged frames = 1312.9 Mbps
HCA elapsed time = 10.004697 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 1200000 -c ibv_message_passing_c_project/bin/packet_pacing/accurate_tx_scheduler_tx_scheduler_burst11_m2048_l1200000-b.csv -b
HCA core clock 156250 KHz
Send 1220006 frames over 10.004190 secs (CLOCK_MONOTONIC), average 121949.5 Hz
Average bit rate for untagged frames = 1500.5 Mbps
HCA elapsed time = 10.004057 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 2400000 -c ibv_message_passing_c_project/bin/packet_pacing/accurate_tx_scheduler_tx_scheduler_burst11_m2048_l2400000.csv 
HCA core clock 156250 KHz
Send 2134623 frames over 10.002397 secs (CLOCK_MONOTONIC), average 213411.1 Hz
Average bit rate for untagged frames = 2625.8 Mbps
HCA elapsed time = 10.002264 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 2400000 -c ibv_message_passing_c_project/bin/packet_pacing/accurate_tx_scheduler_tx_scheduler_burst11_m2048_l2400000-b.csv -b
HCA core clock 156250 KHz
Send 1220006 frames over 10.004191 secs (CLOCK_MONOTONIC), average 121949.5 Hz
Average bit rate for untagged frames = 1500.5 Mbps
HCA elapsed time = 10.004058 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 4800000 -c ibv_message_passing_c_project/bin/packet_pacing/accurate_tx_scheduler_tx_scheduler_burst11_m2048_l4800000.csv 
HCA core clock 156250 KHz
Send 4268734 frames over 10.001201 secs (CLOCK_MONOTONIC), average 426822.2 Hz
Average bit rate for untagged frames = 5251.6 Mbps
HCA elapsed time = 10.001067 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 4800000 -c ibv_message_passing_c_project/bin/packet_pacing/accurate_tx_scheduler_tx_scheduler_burst11_m2048_l4800000-b.csv -b
HCA core clock 156250 KHz
Send 1220004 frames over 10.004185 secs (CLOCK_MONOTONIC), average 121949.4 Hz
Average bit rate for untagged frames = 1500.5 Mbps
HCA elapsed time = 10.004052 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 200000 -c ibv_message_passing_c_project/bin/packet_pacing/accurate_tx_scheduler_tx_scheduler_burst11_m1500_l200000.csv 
HCA core clock 156250 KHz
Send 152948 frames over 10.033561 secs (CLOCK_MONOTONIC), average 15243.6 Hz
Average bit rate for untagged frames = 187.6 Mbps
HCA elapsed time = 10.033427 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 200000 -c ibv_message_passing_c_project/bin/packet_pacing/accurate_tx_scheduler_tx_scheduler_burst11_m1500_l200000-b.csv -b
HCA core clock 156250 KHz
Send 152949 frames over 10.033574 secs (CLOCK_MONOTONIC), average 15243.7 Hz
Average bit rate for untagged frames = 187.6 Mbps
HCA elapsed time = 10.033441 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 1200000 -c ibv_message_passing_c_project/bin/packet_pacing/accurate_tx_scheduler_tx_scheduler_burst11_m1500_l1200000.csv 
HCA core clock 156250 KHz
Send 915133 frames over 10.005597 secs (CLOCK_MONOTONIC), average 91462.1 Hz
Average bit rate for untagged frames = 1125.3 Mbps
HCA elapsed time = 10.005463 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 1200000 -c ibv_message_passing_c_project/bin/packet_pacing/accurate_tx_scheduler_tx_scheduler_burst11_m1500_l1200000-b.csv -b
HCA core clock 156250 KHz
Send 610259 frames over 10.008387 secs (CLOCK_MONOTONIC), average 60974.8 Hz
Average bit rate for untagged frames = 750.2 Mbps
HCA elapsed time = 10.008253 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 2400000 -c ibv_message_passing_c_project/bin/packet_pacing/accurate_tx_scheduler_tx_scheduler_burst11_m1500_l2400000.csv 
HCA core clock 156250 KHz
Send 1829750 frames over 10.002804 secs (CLOCK_MONOTONIC), average 182923.7 Hz
Average bit rate for untagged frames = 2250.7 Mbps
HCA elapsed time = 10.002671 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 2400000 -c ibv_message_passing_c_project/bin/packet_pacing/accurate_tx_scheduler_tx_scheduler_burst11_m1500_l2400000-b.csv -b
HCA core clock 156250 KHz
Send 610259 frames over 10.008390 secs (CLOCK_MONOTONIC), average 60974.7 Hz
Average bit rate for untagged frames = 750.2 Mbps
HCA elapsed time = 10.008256 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 4800000 -c ibv_message_passing_c_project/bin/packet_pacing/accurate_tx_scheduler_tx_scheduler_burst11_m1500_l4800000.csv 
HCA core clock 156250 KHz
Send 4268715 frames over 10.001197 secs (CLOCK_MONOTONIC), average 426820.4 Hz
Average bit rate for untagged frames = 5251.6 Mbps
HCA elapsed time = 10.001063 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 4800000 -c ibv_message_passing_c_project/bin/packet_pacing/accurate_tx_scheduler_tx_scheduler_burst11_m1500_l4800000-b.csv -b
HCA core clock 156250 KHz
Send 610256 frames over 10.008383 secs (CLOCK_MONOTONIC), average 60974.5 Hz
Average bit rate for untagged frames = 750.2 Mbps
HCA elapsed time = 10.008249 secs

11.2. ACCURATE_TX_SCHEDULER=False and TX_SCHEDULER_BURST=0

Reverted to the original scheduler related firmware configuration options and then rebooted:

[mr_halfword@haswell-alma linux-4.18.0-348.20.1.el8.x86_64]$ sudo mlxconfig -d /dev/mst/mt4117_pciconf0 set ACCURATE_TX_SCHEDULER=0 TX_SCHEDULER_BURST=0
[sudo] password for mr_halfword: 

Device #1:
----------

Device type:    ConnectX4LX     
Name:           MCX4121A-XCA_Ax 
Description:    ConnectX-4 Lx EN network interface card; 10GbE dual-port SFP28; PCIe3.0 x8; ROHS R6
Device:         /dev/mst/mt4117_pciconf0

Configurations:                              Next Boot       New
         ACCURATE_TX_SCHEDULER               True(1)         False(0)        
         TX_SCHEDULER_BURST                  11              0               

 Apply new Configuration? (y/n) [n] : y
Applying... Done!
-I- Please reboot machine to load new configurations.

Output from running the tests:

[mr_halfword@haswell-alma ibv_message_passing]$ ./run_raw_packet_pacing_tests.sh ibv_message_passing_c_project/bin/packet_pacing/scheduler_defaults

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 200000 -c ibv_message_passing_c_project/bin/packet_pacing/scheduler_defaults_m9216_l200000.csv 
HCA core clock 156250 KHz
Send 181525 frames over 10.027740 secs (CLOCK_MONOTONIC), average 18102.3 Hz
Average bit rate for untagged frames = 222.7 Mbps
HCA elapsed time = 10.027610 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 200000 -c ibv_message_passing_c_project/bin/packet_pacing/scheduler_defaults_m9216_l200000-b.csv -b
HCA core clock 156250 KHz
Send 133890 frames over 10.038414 secs (CLOCK_MONOTONIC), average 13337.8 Hz
Average bit rate for untagged frames = 164.1 Mbps
HCA elapsed time = 10.038285 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 1200000 -c ibv_message_passing_c_project/bin/packet_pacing/scheduler_defaults_m9216_l1200000.csv 
HCA core clock 156250 KHz
Send 1067564 frames over 10.004826 secs (CLOCK_MONOTONIC), average 106704.9 Hz
Average bit rate for untagged frames = 1312.9 Mbps
HCA elapsed time = 10.004696 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 1200000 -c ibv_message_passing_c_project/bin/packet_pacing/scheduler_defaults_m9216_l1200000-b.csv -b
HCA core clock 156250 KHz
Send 1067571 frames over 10.004809 secs (CLOCK_MONOTONIC), average 106705.8 Hz
Average bit rate for untagged frames = 1312.9 Mbps
HCA elapsed time = 10.004679 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 2400000 -c ibv_message_passing_c_project/bin/packet_pacing/scheduler_defaults_m9216_l2400000.csv 
HCA core clock 156250 KHz
Send 2134616 frames over 10.002410 secs (CLOCK_MONOTONIC), average 213410.2 Hz
Average bit rate for untagged frames = 2625.8 Mbps
HCA elapsed time = 10.002290 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 2400000 -c ibv_message_passing_c_project/bin/packet_pacing/scheduler_defaults_m9216_l2400000-b.csv -b
HCA core clock 156250 KHz
Send 2134628 frames over 10.002393 secs (CLOCK_MONOTONIC), average 213411.7 Hz
Average bit rate for untagged frames = 2625.8 Mbps
HCA elapsed time = 10.002285 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 4800000 -c ibv_message_passing_c_project/bin/packet_pacing/scheduler_defaults_m9216_l4800000.csv 
HCA core clock 156250 KHz
Send 3963861 frames over 10.001304 secs (CLOCK_MONOTONIC), average 396334.4 Hz
Average bit rate for untagged frames = 4876.5 Mbps
HCA elapsed time = 10.001187 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 4800000 -c ibv_message_passing_c_project/bin/packet_pacing/scheduler_defaults_m9216_l4800000-b.csv -b
HCA core clock 156250 KHz
Send 4268742 frames over 10.001197 secs (CLOCK_MONOTONIC), average 426823.1 Hz
Average bit rate for untagged frames = 5251.6 Mbps
HCA elapsed time = 10.001079 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 200000 -c ibv_message_passing_c_project/bin/packet_pacing/scheduler_defaults_m2048_l200000.csv 
HCA core clock 156250 KHz
Send 191057 frames over 10.026776 secs (CLOCK_MONOTONIC), average 19054.7 Hz
Average bit rate for untagged frames = 234.4 Mbps
HCA elapsed time = 10.026658 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 200000 -c ibv_message_passing_c_project/bin/packet_pacing/scheduler_defaults_m2048_l200000-b.csv -b
HCA core clock 156250 KHz
Send 305384 frames over 10.016781 secs (CLOCK_MONOTONIC), average 30487.2 Hz
Average bit rate for untagged frames = 375.1 Mbps
HCA elapsed time = 10.016663 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 1200000 -c ibv_message_passing_c_project/bin/packet_pacing/scheduler_defaults_m2048_l1200000.csv 
HCA core clock 156250 KHz
Send 1067564 frames over 10.004839 secs (CLOCK_MONOTONIC), average 106704.8 Hz
Average bit rate for untagged frames = 1312.9 Mbps
HCA elapsed time = 10.004690 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 1200000 -c ibv_message_passing_c_project/bin/packet_pacing/scheduler_defaults_m2048_l1200000-b.csv -b
HCA core clock 156250 KHz
Send 1219998 frames over 10.004198 secs (CLOCK_MONOTONIC), average 121948.6 Hz
Average bit rate for untagged frames = 1500.5 Mbps
HCA elapsed time = 10.004031 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 2400000 -c ibv_message_passing_c_project/bin/packet_pacing/scheduler_defaults_m2048_l2400000.csv 
HCA core clock 156250 KHz
Send 2134609 frames over 10.002404 secs (CLOCK_MONOTONIC), average 213409.6 Hz
Average bit rate for untagged frames = 2625.8 Mbps
HCA elapsed time = 10.002236 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 2400000 -c ibv_message_passing_c_project/bin/packet_pacing/scheduler_defaults_m2048_l2400000-b.csv -b
HCA core clock 156250 KHz
Send 1219996 frames over 10.004190 secs (CLOCK_MONOTONIC), average 121948.5 Hz
Average bit rate for untagged frames = 1500.5 Mbps
HCA elapsed time = 10.004022 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 4800000 -c ibv_message_passing_c_project/bin/packet_pacing/scheduler_defaults_m2048_l4800000.csv 
HCA core clock 156250 KHz
Send 4268699 frames over 10.001198 secs (CLOCK_MONOTONIC), average 426818.7 Hz
Average bit rate for untagged frames = 5251.6 Mbps
HCA elapsed time = 10.001031 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 4800000 -c ibv_message_passing_c_project/bin/packet_pacing/scheduler_defaults_m2048_l4800000-b.csv -b
HCA core clock 156250 KHz
Send 1220002 frames over 10.004191 secs (CLOCK_MONOTONIC), average 121949.1 Hz
Average bit rate for untagged frames = 1500.5 Mbps
HCA elapsed time = 10.004025 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 200000 -c ibv_message_passing_c_project/bin/packet_pacing/scheduler_defaults_m1500_l200000.csv 
HCA core clock 156250 KHz
Send 152948 frames over 10.033536 secs (CLOCK_MONOTONIC), average 15243.7 Hz
Average bit rate for untagged frames = 187.6 Mbps
HCA elapsed time = 10.033409 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 200000 -c ibv_message_passing_c_project/bin/packet_pacing/scheduler_defaults_m1500_l200000-b.csv -b
HCA core clock 156250 KHz
Send 152949 frames over 10.033538 secs (CLOCK_MONOTONIC), average 15243.8 Hz
Average bit rate for untagged frames = 187.6 Mbps
HCA elapsed time = 10.033414 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 1200000 -c ibv_message_passing_c_project/bin/packet_pacing/scheduler_defaults_m1500_l1200000.csv 
HCA core clock 156250 KHz
Send 915134 frames over 10.005606 secs (CLOCK_MONOTONIC), average 91462.1 Hz
Average bit rate for untagged frames = 1125.4 Mbps
HCA elapsed time = 10.005483 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 1200000 -c ibv_message_passing_c_project/bin/packet_pacing/scheduler_defaults_m1500_l1200000-b.csv -b
HCA core clock 156250 KHz
Send 610259 frames over 10.008389 secs (CLOCK_MONOTONIC), average 60974.7 Hz
Average bit rate for untagged frames = 750.2 Mbps
HCA elapsed time = 10.008266 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 2400000 -c ibv_message_passing_c_project/bin/packet_pacing/scheduler_defaults_m1500_l2400000.csv 
HCA core clock 156250 KHz
Send 1829744 frames over 10.002803 secs (CLOCK_MONOTONIC), average 182923.1 Hz
Average bit rate for untagged frames = 2250.7 Mbps
HCA elapsed time = 10.002680 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 2400000 -c ibv_message_passing_c_project/bin/packet_pacing/scheduler_defaults_m1500_l2400000-b.csv -b
HCA core clock 156250 KHz
Send 610259 frames over 10.008387 secs (CLOCK_MONOTONIC), average 60974.8 Hz
Average bit rate for untagged frames = 750.2 Mbps
HCA elapsed time = 10.008263 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 4800000 -c ibv_message_passing_c_project/bin/packet_pacing/scheduler_defaults_m1500_l4800000.csv 
HCA core clock 156250 KHz
Send 4268741 frames over 10.001204 secs (CLOCK_MONOTONIC), average 426822.7 Hz
Average bit rate for untagged frames = 5251.6 Mbps
HCA elapsed time = 10.001079 secs

ibv_message_passing_c_project/bin/release/ibv_switch_test/ibv_raw_packet_tx -i rocep1s0f0 -n 1 -p 25-26 -l 4800000 -c ibv_message_passing_c_project/bin/packet_pacing/scheduler_defaults_m1500_l4800000-b.csv -b
HCA core clock 156250 KHz
Send 610259 frames over 10.008387 secs (CLOCK_MONOTONIC), average 60974.8 Hz
Average bit rate for untagged frames = 750.2 Mbps
HCA elapsed time = 10.008256 secs

11.3. Summary

Changing the value of the ACCURATE_TX_SCHEDULER and TX_SCHEDULER_BURST firmware parameters didn't affect the average bit rate generated.

Setting a non-zero max_burst_sz can cause the average bit rate to fall off dramatically at the higher requested rate limits, e.g. to only 16% of that requested.

Some temporary changes to increase rate_limit_attr.max_burst_sz showed that it needs to be around 6 times the packet size to achieve the higher requested rates, but:

  • The transmit completion timestamps then showed packets being sent in bursts.
  • There was still some interaction with the MTU set on the device, in terms of the difference between the requested and actual transmit rate in bits per second.

I.e. it is still not clear how to get a deterministic send interval between packets.
