kexec

syscalls

1. Load new kernel + initrd from files:
   - kexec_file_load
   - Go syscall
2. Call reboot with LINUX_REBOOT_CMD_KEXEC:
   - reboot

(2) means that kexec is only available if reboot() is available as well.
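The two steps above can be sketched in Go roughly as follows. This is a minimal sketch, not the actual machined code. It assumes Linux on amd64 or arm64, and the helper names `kexecFileLoad` and `kexec` are made up for illustration. The standard `syscall` package does not export a number for kexec_file_load, so the numbers are hard-coded here. `golang.org/x/sys/unix` provides a `KexecFileLoad` wrapper that would normally be preferable.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"syscall"
	"unsafe"
)

// kexecFileLoadNR maps GOARCH to the kexec_file_load syscall number.
// Hard-coded because package syscall does not export it: 320 on x86-64,
// 294 in the generic syscall table used by arm64.
var kexecFileLoadNR = map[string]uintptr{"amd64": 320, "arm64": 294}

// kexecFileLoad (hypothetical helper) stages a kernel and initrd that are
// passed as open file descriptors. It requires CAP_SYS_BOOT.
func kexecFileLoad(kernelFd, initrdFd int, cmdline string, flags uintptr) error {
	nr, ok := kexecFileLoadNR[runtime.GOARCH]
	if !ok {
		return fmt.Errorf("kexec_file_load: no syscall number for %s", runtime.GOARCH)
	}
	// The kernel expects a NUL-terminated command line and a length that
	// includes the terminator.
	ptr, err := syscall.BytePtrFromString(cmdline)
	if err != nil {
		return fmt.Errorf("kexec_file_load: %w", err)
	}
	_, _, errno := syscall.Syscall6(nr,
		uintptr(kernelFd), uintptr(initrdFd),
		uintptr(len(cmdline)+1), uintptr(unsafe.Pointer(ptr)),
		flags, 0)
	if errno != 0 {
		return fmt.Errorf("kexec_file_load: %w", errno)
	}
	return nil
}

// kexec (hypothetical helper) performs both steps: load, then reboot.
func kexec(kernelPath, initrdPath, cmdline string) error {
	kernel, err := os.Open(kernelPath)
	if err != nil {
		return err
	}
	defer kernel.Close()

	initrd, err := os.Open(initrdPath)
	if err != nil {
		return err
	}
	defer initrd.Close()

	// Step 1: load the new kernel + initrd from files.
	if err := kexecFileLoad(int(kernel.Fd()), int(initrd.Fd()), cmdline, 0); err != nil {
		return err
	}
	// Step 2: jump into the staged kernel; this does not return on success.
	return syscall.Reboot(syscall.LINUX_REBOOT_CMD_KEXEC)
}

func main() {
	if len(os.Args) != 4 {
		fmt.Fprintln(os.Stderr, "usage: kexec <kernel> <initrd> <cmdline>")
		return
	}
	if err := kexec(os.Args[1], os.Args[2], os.Args[3]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Without CAP_SYS_BOOT the first step fails before anything is staged, so running this unprivileged only produces an error.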

There are two syscalls for loading a kexec kernel:

- kexec_load, which takes arbitrary memory (enabled via CONFIG_KEXEC)
- kexec_file_load, which takes file descriptors and might do signature validation (enabled via CONFIG_KEXEC_FILE)

KSPP talks only about CONFIG_KEXEC, not about CONFIG_KEXEC_FILE. At the same time, it recommends setting the kernel.kexec_load_disabled sysctl, which disables both flavors.
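To see whether kexec loading is currently disabled, the sysctl can be read from procfs. The sketch below assumes the standard `/proc/sys/kernel/kexec_load_disabled` path. The helper name `parseKexecLoadDisabled` is made up for illustration.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// Path of the kernel.kexec_load_disabled sysctl in procfs.
const kexecLoadDisabledPath = "/proc/sys/kernel/kexec_load_disabled"

// parseKexecLoadDisabled (hypothetical helper) interprets the sysctl value:
// "1" means both kexec_load and kexec_file_load are refused.
func parseKexecLoadDisabled(raw string) (bool, error) {
	switch strings.TrimSpace(raw) {
	case "0":
		return false, nil
	case "1":
		return true, nil
	default:
		return false, fmt.Errorf("unexpected kexec_load_disabled value %q", raw)
	}
}

func main() {
	data, err := os.ReadFile(kexecLoadDisabledPath)
	if err != nil {
		// The file is absent on kernels built without kexec support.
		fmt.Fprintln(os.Stderr, "cannot read sysctl:", err)
		return
	}
	disabled, err := parseKexecLoadDisabled(string(data))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println("kexec loading disabled:", disabled)
}
```

The sysctl is one-way: once it is set to 1 it cannot be set back to 0 until the next boot, so a process that needs kexec has to load its kernel before the toggle is flipped.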

kexec source code.

Capabilities

kexec_file_load and reboot require the CAP_SYS_BOOT capability.

reboot() inside a user namespace (more precisely, per reboot(2), inside a non-initial PID namespace) doesn't reboot the system; it reboots the namespace, killing it (proof). If LINUX_REBOOT_CMD_KEXEC is used, the call fails with EINVAL. This in turn means that no container can actually use kexec unless it breaks out of its namespace, and if it does, security is compromised anyway.

We can further limit kexec by dropping the CAP_SYS_BOOT capability for any process forked from machined (init). The path towards that is not yet entirely clear to me, but here are some pointers:

Creating a user namespace re-enables all capabilities, but capabilities inside the user namespace apply only to resources scoped under that user namespace (more info).

In other words, on protecting kexec from being used by processes other than machined:

- For processes directly forked from machined (which include udevd, containerd, etc.), we can try to drop capabilities as we fork into those processes.
- For containers created by containerd (both system and k8s), kexec shouldn't be available as they reside in a user namespace.
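One way to check whether a capability drop actually took effect is to inspect the capability masks in `/proc/<pid>/status`. The sketch below only checks for CAP_SYS_BOOT (bit 22 in `<linux/capability.h>`); it does not drop anything. The helper names `hasCap` and `capField` are made up for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// capSysBoot is the bit index of CAP_SYS_BOOT in <linux/capability.h>.
const capSysBoot = 22

// hasCap (hypothetical helper) reports whether the given capability bit is
// set in a hex mask as printed in /proc/<pid>/status.
func hasCap(hexMask string, bit uint) (bool, error) {
	mask, err := strconv.ParseUint(strings.TrimSpace(hexMask), 16, 64)
	if err != nil {
		return false, err
	}
	return mask&(1<<bit) != 0, nil
}

// capField (hypothetical helper) extracts the value of a field such as
// "CapEff" or "CapBnd" from the text of a status file.
func capField(status, field string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, field+":") {
			return strings.TrimSpace(strings.TrimPrefix(line, field+":")), true
		}
	}
	return "", false
}

func main() {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	// CapEff is the effective set, CapBnd the bounding set.
	for _, field := range []string{"CapEff", "CapBnd"} {
		value, ok := capField(string(data), field)
		if !ok {
			continue
		}
		set, err := hasCap(value, capSysBoot)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		fmt.Printf("%s has CAP_SYS_BOOT: %v\n", field, set)
	}
}
```

A process that is supposed to be locked out of kexec should report false for both sets. Checking the bounding set matters because a capability still present there can be regained later, for example by executing a file with file capabilities.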

