Disclaimer: this document was generated by ChatGPT.
cgroups — short for control groups — are a Linux kernel feature that lets you limit, prioritize, account for, and isolate resource usage (CPU, memory, I/O, etc.) of processes.
They are one of the core building blocks of modern container systems like Docker and Kubernetes.
Before cgroups, Linux had:
- nice for CPU priority
- ulimit for per-process limits
- setrlimit() syscall
- cpusets for CPU affinity
But there was no:
- Hierarchical resource control
- Way to manage groups of processes
- Unified interface for memory + CPU + I/O together
Google engineers developed cgroups in 2006 to solve this. It was merged into Linux 2.6.24 (2008).
A cgroup is a group of processes.
You can:
- Limit their CPU time
- Restrict memory usage
- Limit disk I/O
- Control network traffic
- Track usage statistics
Unlike ulimit, cgroups:
- Work hierarchically
- Apply to groups
- Are dynamic
- Work well with containers
A cgroup itself is just a set of processes. Controllers do the actual work; each controller manages one type of resource:
- CPU
- Memory
- I/O
- PIDs
- etc.
cgroups are arranged in a tree.
Child groups inherit restrictions from parents.
There are two versions.
cgroups v1 was released around 2008.
Features:
- Each controller mounted separately
- Flexible but messy
- Controllers could be attached independently
Problems:
- Complex
- Controllers not unified
- Hard to reason about
- Inconsistent behavior
Modern Linux uses cgroups v2, introduced gradually during the 4.x series (the unified hierarchy was declared stable in Linux 4.5, 2016).
Major improvements:
- Single unified hierarchy
- Better delegation model
- Cleaner resource model
- Improved security
- More predictable behavior
If you use modern systemd (like Ubuntu 22+, Debian 12+, etc.), you are likely using v2.
Check:
mount | grep cgroup
If you see cgroup2, you're on v2.
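The same check can be scripted. A small sketch: `stat -fc %T` reports the filesystem type of a mount point, and `cgroup2fs` is the type of the unified v2 hierarchy.

```shell
# Report which cgroup version /sys/fs/cgroup is mounted with.
# "cgroup2fs" means the unified v2 hierarchy; a v1 or hybrid setup
# mounts a tmpfs with per-controller subdirectories instead.
fstype=$(stat -fc %T /sys/fs/cgroup 2>/dev/null)
if [ "$fstype" = "cgroup2fs" ]; then
  echo "cgroups v2"
else
  echo "cgroups v1 (or hybrid)"
fi
```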
Mounted at:
/sys/fs/cgroup
Everything is a file.
Example:
/sys/fs/cgroup/mygroup/
Files inside:
cpu.max
memory.max
memory.current
pids.max
io.max
You control behavior by writing to these files.
Controls CPU bandwidth and distribution.
Key files:
cpu.max
cpu.weight
cpu.stat
Example:
echo "20000 100000" > cpu.max
Meaning:
- 20ms CPU time every 100ms
- So max 20% CPU
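The arithmetic, made explicit (both numbers in cpu.max are microseconds):

```shell
# cpu.max is "<quota> <period>": the group may consume <quota> us of CPU
# time in every <period> us window. 20000/100000 -> 20% of one CPU.
quota=20000
period=100000
echo "cap: $(( quota * 100 / period ))% of one CPU"
```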
Controls RAM usage.
Files:
memory.max
memory.current
memory.high
memory.swap.max
Example:
echo 500M > memory.max
If exceeded:
- OOM kill
- Or throttling (if using memory.high)
Important difference:
memory.high → soft limit (throttle)
memory.max → hard limit (OOM kill)
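A sketch combining the two limits (hypothetical group path; requires root on a cgroups v2 host):

```shell
G=/sys/fs/cgroup/mygroup
echo 400M > "$G/memory.high"   # past this: reclaim pressure and throttling
echo 500M > "$G/memory.max"    # past this: the group's OOM killer runs
```

Setting memory.high somewhat below memory.max gives a workload a chance to slow down before it is killed.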
Controls disk bandwidth and IOPS.
File:
io.max
Example:
echo "8:0 rbps=1048576" > io.max
Limits device 8:0 (e.g., /dev/sda) to 1MB/s read.
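io.max wants the device's major:minor numbers rather than a path. A sketch of finding them (assumes /dev/sda exists; writing the limit requires root):

```shell
lsblk -dno MAJ:MIN /dev/sda          # prints the "8:0"-style pair io.max expects
# Cap reads at 1 MB/s and writes at 2 MB/s on that device:
echo "8:0 rbps=1048576 wbps=2097152" > io.max
```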
Limits number of processes.
echo 100 > pids.max
Prevents fork bombs.
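A sketch (hypothetical group path /sys/fs/cgroup/test; requires root on a v2 host):

```shell
echo 100 > /sys/fs/cgroup/test/pids.max     # hard cap on tasks in the group
cat /sys/fs/cgroup/test/pids.current        # how many it holds right now
```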
Limits which CPUs or NUMA nodes are allowed.
cpuset.cpus
cpuset.mems
Used heavily in high-performance systems.
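A sketch (same hypothetical group; the cpuset controller must be enabled in the parent's cgroup.subtree_control):

```shell
echo "0-3" > /sys/fs/cgroup/test/cpuset.cpus   # allow CPUs 0 through 3 only
echo "0"   > /sys/fs/cgroup/test/cpuset.mems   # allow NUMA node 0 only
```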
When you run:
docker run -m 512m --cpus=1 nginx
Docker:
- Creates a new cgroup
- Writes limits into memory.max and cpu.max
- Adds container processes to the group
That’s it.
Containers are basically:
- Namespaces (isolation)
- cgroups (resource control)
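You can see this placement from any process: its cgroup membership is listed in /proc. This runs on any Linux host; inside a container, the path reflects the container's subtree.

```shell
# On a cgroups v2 host this prints a single line, e.g. "0::/user.slice/...".
# On v1 it prints one line per controller ("N:controller:path").
cat /proc/self/cgroup
```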
Modern Linux systems use systemd.
systemd:
- Manages services via cgroups
- Each service runs in its own cgroup
- You can set limits in unit files:
Example:
[Service]
MemoryMax=500M
CPUQuota=50%
systemd translates this into cgroup settings.
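Those unit settings end up as plain cgroup files. A sketch of where to look, assuming cgroups v2 with systemd as cgroup manager and a hypothetical unit named myservice:

```shell
# CPUQuota=50% becomes cpu.max = "50000 100000" (50000us per 100000us window);
# MemoryMax=500M becomes memory.max in bytes.
systemctl show myservice.service -p MemoryMax
cat /sys/fs/cgroup/system.slice/myservice.service/cpu.max
cat /sys/fs/cgroup/system.slice/myservice.service/memory.max
```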
cgroups v2 supports safe delegation.
Example:
- systemd owns root
- It delegates a subtree to Docker
- Docker manages containers inside that subtree
Security rule:
- A process can only control its subtree.
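In the filesystem this is explicit (a sketch; requires root on a v2 host): a parent must opt controllers into its children via cgroup.subtree_control before the children can use them.

```shell
# Which controllers are available at this level:
cat /sys/fs/cgroup/cgroup.controllers        # e.g. "cpuset cpu io memory pids"
# Enable cpu and memory for direct children -- this is what a delegating
# manager like systemd does for the subtrees it hands out:
echo "+cpu +memory" > /sys/fs/cgroup/cgroup.subtree_control
```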
Modern kernels (4.20+) expose PSI (Pressure Stall Information) files:
/proc/pressure/cpu
/proc/pressure/memory
/proc/pressure/io
This shows resource contention metrics.
Extremely useful for:
- Performance tuning
- Autoscaling systems
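Each PSI file contains `some` and `full` lines with rolling averages. A quick way to pull one value out; the sample line is hard-coded so the snippet runs anywhere, but on a real host you would read /proc/pressure/memory.

```shell
# A PSI line looks like:
#   some avg10=1.23 avg60=0.50 avg300=0.10 total=123456
# avg10 is the share of time (%) tasks were stalled over the last 10s.
line='some avg10=1.23 avg60=0.50 avg300=0.10 total=123456'
avg10=$(printf '%s\n' "$line" | sed -n 's/.*avg10=\([0-9.]*\).*/\1/p')
echo "memory pressure (10s avg): ${avg10}%"
```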
In v2:
- OOM is per-cgroup
- Not system-wide
Meaning:
- Only that group gets killed
- Not random system processes
cgroup namespaces are used by containers so they:
- Only see their own subtree
- Cannot see host hierarchy
v2 also supports thread-level resource distribution (threaded cgroups).
Rarely used directly but powerful.
Create group:
mkdir /sys/fs/cgroup/test
Limit memory:
echo 100M > /sys/fs/cgroup/test/memory.max
Add process:
echo <PID> > /sys/fs/cgroup/test/cgroup.procs
Now that process cannot exceed 100MB.
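The same walkthrough as one copy-pasteable script, with cleanup (requires root on a cgroups v2 host; uses a throwaway sleep process rather than a real workload):

```shell
#!/bin/sh
set -e
G=/sys/fs/cgroup/test
mkdir "$G"
echo 100M > "$G/memory.max"      # hard memory cap for the group
sleep 60 &                       # a throwaway process to confine
pid=$!
echo "$pid" > "$G/cgroup.procs"  # move it into the group
cat "$G/memory.current"          # bytes the group is using right now
kill "$pid"
wait "$pid" 2>/dev/null || true
rmdir "$G"                       # an empty cgroup dir can be removed
```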
Internally:
- Each task_struct has pointer to cgroup
- Scheduler checks cpu controller
- Memory allocator checks memory controller
- I/O layer checks io controller
Hooks exist in:
- Scheduler
- Memory allocator
- VFS layer
- Block layer
It is deeply integrated into kernel subsystems.
cgroups prevent:
- Resource exhaustion attacks
- Fork bombs
- Memory DoS
- Container escape via resource abuse
But they are not isolation alone — namespaces + seccomp + capabilities are also required.
Common use cases:
- Containers
- Cloud platforms
- Multi-tenant hosting
- CI/CD runners
- Serverless runtimes
- HPC resource partitioning
- Desktop app sandboxing
Useful inspection tools:
systemd-cgls
systemd-cgtop
cat /proc/<pid>/cgroup
Or:
cat /sys/fs/cgroup/<group>/memory.current
Common pitfalls:
- Mixing v1 and v2: they are not compatible.
- Hard limits (memory.max) can cause abrupt OOM kills.
- cpu.max throttles CPU time; it does not cap CPU frequency.
- memory.swap.max must be configured separately, or a memory-limited group can still push into swap.
Performance characteristics:
- cgroups add minimal overhead
- CPU controller uses CFS bandwidth control
- Memory controller adds accounting overhead
- I/O controller depends on I/O scheduler
For HPC or ultra-low-latency systems:
- Fine-tuning may be necessary
The current trajectory:
- v1 is legacy (effectively deprecated)
- v2 is standard
- Deep integration with container runtimes
- PSI becoming more important for autoscaling
- eBPF integration improving observability
cgroups are:
- A Linux kernel feature
- Used for resource control
- Essential for containers
- Hierarchical
- File-based interface
- Deeply integrated into kernel subsystems
They are one of the most important kernel features in modern cloud computing.
