In tests we have been trying to use the kernel isolcpus parameter, and other techniques, to perform CPU isolation, splitting the cores into:
- One OS cpu per processor package to run background processes and interrupts.
- The other cpus in the processor package each run one busy-polling application thread per core. On these cpus we want to prevent unwanted interrupts from causing latency spikes (see the pinning sketch after this list).
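To make the intended split concrete, here is a minimal sketch (not the actual application) of pinning a busy-polling thread to an isolated cpu. The cpu numbering is a hypothetical example: with two 8-core packages, cpus 0 and 8 as the OS cpus, and a kernel command line containing isolcpus=1-7,9-15, cpu 1 is one of the application cpus.

/*
 * Minimal sketch of pinning a busy-polling thread to an isolated cpu.
 * Cpu 1 is a hypothetical number matching an isolcpus=1-7,9-15 layout.
 * Build with: gcc -pthread pin.c -o pin
 */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    cpu_set_t set;
    int rc;

    /* Pin the calling thread to isolated cpu 1. */
    CPU_ZERO(&set);
    CPU_SET(1, &set);
    rc = pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    if (rc != 0) {
        fprintf(stderr, "pthread_setaffinity_np: %s\n", strerror(rc));
        return 1;
    }

    for (;;) {
        /* Busy-polling application work goes here. */
    }
}

Explicit pinning is needed because isolcpus only removes the cpus from the scheduler's general load balancing; threads still have to be placed on them deliberately, e.g. with pthread_setaffinity_np or taskset.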
This is a RHEL-based kernel, in which CONFIG_NO_HZ_FULL is not set, meaning we are unable to use the nohz_full parameter to turn the application cpus into adaptive-ticks cpus.
Running the tests, we found that the application cpus could sometimes be preempted for several milliseconds. Enabling kernel ftrace showed that the duration of these preemptions was explained by how long TIMER_SOFTIRQ processing ran for. A function graph of run_timer_softirq didn't show any single type of work taking the time; rather, the kernel appeared to be scanning timer lists, as shown by sequences of multiple calls to cascade.
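As an aside, here is a minimal sketch of one way such preemption gaps can be observed from a busy-polling thread (not necessarily how these tests measured them): spin reading CLOCK_MONOTONIC and report any gap between consecutive readings above a threshold.

/*
 * Minimal sketch of detecting multi-millisecond preemptions of a
 * busy-polling thread. Any gap between consecutive clock readings that
 * exceeds the threshold indicates the thread lost the cpu for that long.
 */
#include <stdio.h>
#include <stdint.h>
#include <time.h>

static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

int main(void)
{
    const uint64_t threshold_ns = 100000; /* report gaps over 100 us */
    uint64_t prev = now_ns();

    for (;;) {
        uint64_t now = now_ns();

        if (now - prev > threshold_ns)
            printf("gap of %.3f ms\n", (now - prev) / 1e6);
        prev = now;
    }
}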
We looked at the kernel source code to try and find whether timer processing can be caused to run on what should be an isolated application cpu.
In kernel/timer.c there is the following inside the __mod_timer function:
        cpu = smp_processor_id();

#if defined(CONFIG_NO_HZ_COMMON) && defined(CONFIG_SMP)
        if (!pinned && get_sysctl_timer_migration() && idle_cpu(cpu))
                cpu = get_nohz_timer_target();
#endif

The config options which conditionally compile the above block (CONFIG_NO_HZ_COMMON and CONFIG_SMP) are enabled.
Looking at the above logic:
- pinned will only be true for timers created by mod_timer_pinned, which appears to be used in only a few specific cases, e.g. the Intel pstate cpufreq driver.
- get_sysctl_timer_migration reads the value of the kernel.timer_migration sysctl parameter, which on the system being used for this analysis has defaulted to 1.
- idle_cpu determines if the calling CPU is considered idle.
- The get_nohz_timer_target function in kernel/sched/core.c is commented as:
/*
 * In the semi idle case, use the nearest busy cpu for migrating timers
 * from an idle cpu. This is good for power-savings.
 *
 * We don't do similar optimization for completely idle system, as
 * selecting an idle cpu will add more delays to the timers than intended
 * (as that cpu's timer base may not be uptodate wrt jiffies etc).
 */

Therefore, if one of the OS CPUs becomes idle then, with kernel.timer_migration=1, timers armed on the OS CPUs can be migrated to one of the application CPUs.
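To make the mechanism concrete, here is a minimal sketch of a kernel module (illustrative only, using the 3.10-era setup_timer/mod_timer API) that arms an ordinary, unpinned timer. Because it uses mod_timer rather than mod_timer_pinned, every re-arm goes through the __mod_timer logic quoted above, so with kernel.timer_migration=1 the timer is eligible to be moved off an idle CPU by get_nohz_timer_target.

/*
 * Minimal sketch of a kernel module arming an ordinary (unpinned) timer
 * with the 3.10-era setup_timer()/mod_timer() API. Each re-arm goes
 * through __mod_timer(), so the timer can be migrated off an idle CPU.
 */
#include <linux/module.h>
#include <linux/timer.h>
#include <linux/jiffies.h>

static struct timer_list demo_timer;

static void demo_timer_fn(unsigned long data)
{
        /* Re-arm one second later; eligible for migration on each re-arm. */
        mod_timer(&demo_timer, jiffies + HZ);
}

static int __init demo_init(void)
{
        setup_timer(&demo_timer, demo_timer_fn, 0);
        mod_timer(&demo_timer, jiffies + HZ);
        return 0;
}

static void __exit demo_exit(void)
{
        del_timer_sync(&demo_timer);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");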
Bug 1797629 - Disable timer_migration on cpu-partitioning profile has some information on the impact of this. Setting kernel.timer_migration=0 makes the get_sysctl_timer_migration() check above fail, so timers stay on the CPU which armed them.
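For reference, the mitigation is a one-line sysctl change, sysctl -w kernel.timer_migration=0. The sketch below does the equivalent from C by writing to the sysctl's procfs file (requires root):

/*
 * Sketch: disable timer migration at runtime, equivalent to
 * "sysctl -w kernel.timer_migration=0", by writing to the sysctl's
 * procfs file.
 */
#include <stdio.h>

int main(void)
{
        FILE *f = fopen("/proc/sys/kernel/timer_migration", "w");

        if (!f) {
                perror("/proc/sys/kernel/timer_migration");
                return 1;
        }
        fputs("0\n", f);
        fclose(f);
        return 0;
}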
Note: The above analysis was performed using Eclipse to navigate the kernel source, but the Eclipse indexer configuration hadn't been set up to match the configuration used to build the kernel. This meant the cross-referencer couldn't be relied upon, since:
- Some code was considered conditionally compiled out by the Eclipse indexer
- Some syntax errors were reported by the Eclipse indexer
HowTo use the CDT to navigate Linux kernel source has some guidance on setting up the Eclipse indexer to match the kernel configuration, e.g. importing the pre-processor macro definitions from the include/linux/kconfig.h used when building the kernel. That requires the kernel configuration to be set up, for which Added printf debugging to mlx5_core driver for packet pacing information has some notes.
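As an illustration of what needs importing: include/linux/kconfig.h pulls in the generated include/generated/autoconf.h, which contains one simple define per enabled option, for example (values depend on the kernel configuration being matched):

/* Example lines of the form found in a generated autoconf.h; the actual
 * contents depend on the kernel configuration. Importing these defines
 * lets the Eclipse indexer resolve conditional compilation correctly. */
#define CONFIG_SMP 1
#define CONFIG_NO_HZ_COMMON 1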
The kernel default is also kernel.timer_migration=1, i.e. the value seen on the analysed system wasn't a local override.
We haven't compared the timer migration code against that in the 3.10.33-rt32.33.el6rt.x86_64 kernel.