Observability Tools for Linux Performance Investigation
Performance issues on Linux systems can arise from multiple sources: CPU saturation, disk I/O bottlenecks, memory pressure, network latency, or file system inefficiencies. To diagnose and resolve these problems effectively, engineers need a comprehensive set of observability tools that provide deep insight into system behavior.
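This wiki page serves as a central reference for Linux observability tools that can be used to investigate performance issues from five critical perspectives:
CPU: Measure utilization, run queue latency, and interrupt time to identify CPU saturation and scheduling delays.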
Disk: Analyze I/O patterns, latency, and throughput to identify storage-related bottlenecks.
Memory: Monitor physical and virtual memory usage, NUMA behavior, and reclaim activity.
File System: Trace file operations, cache statistics, and latency sources to optimize file access.
Network: Examine socket statistics, interface throughput, and TCP/IP stack behavior for connectivity and bandwidth issues.
Each section includes commonly used tools and their purpose. Whether you are troubleshooting high latency, unexpected resource consumption, or system stalls, these tools provide the visibility needed to make informed decisions.
Linux CPU observability tools
| Tool | Description |
|------|-------------|
| uptime | Load averages |
| vmstat | Includes system-wide CPU averages |
| mpstat | Per-CPU statistics |
| sar | Historical statistics |
| ps | Process status |
| top | Monitor per-process/thread CPU usage |
| pidstat | Per-process/thread CPU breakdowns |
| time, ptime | Time a command, with CPU breakdowns |
| turbostat | Show CPU clock rate and other states |
| showboost | Show CPU clock rate and turbo boost |
| pmcarch | Show high-level CPU cycle usage |
| tlbstat | Summarize TLB cycles |
| perf | CPU profiling and PMC analysis |
| profile | Sample CPU stack traces |
| cpudist | Summarize on-CPU time |
| runqlat | Summarize CPU run queue latency |
| runqlen | Summarize CPU run queue length |
| softirqs | Summarize soft interrupt time |
| hardirqs | Summarize hard interrupt time |
| bpftrace | Tracing programs for CPU analysis |
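Tools such as vmstat, mpstat, and sar report CPU percentages by sampling the cumulative tick counters in /proc/stat twice and dividing each state's delta by the total delta. The following is a minimal Python sketch of that calculation, assuming the field order of the aggregate `cpu` line documented in proc(5). The helper names are hypothetical, and this is not the actual code of any of those tools.

```python
"""Sketch: system-wide CPU utilization from two /proc/stat samples.

Assumes the aggregate "cpu" line layout documented in proc(5). Helper
names are hypothetical; this is not vmstat's or mpstat's actual code.
"""
from typing import Dict

# Field order of the aggregate "cpu" line, in USER_HZ ticks.
FIELDS = ("user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice")

# guest and guest_nice are already included in user and nice,
# so they are left out of the total to avoid double counting.
ACCOUNTED = FIELDS[:8]


def parse_cpu_line(stat_text: str) -> Dict[str, int]:
    """Return the tick counters from the aggregate 'cpu' line."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            # zip() stops early on older kernels that report fewer fields.
            return dict(zip(FIELDS, (int(v) for v in parts[1:])))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_percentages(before: str, after: str) -> Dict[str, float]:
    """Percentage of elapsed ticks spent in each CPU state between samples."""
    a, b = parse_cpu_line(before), parse_cpu_line(after)
    deltas = {f: b[f] - a[f] for f in ACCOUNTED if f in a and f in b}
    total = sum(deltas.values())
    if total <= 0:
        return {f: 0.0 for f in deltas}
    return {f: 100.0 * d / total for f, d in deltas.items()}
```

On a live system, read /proc/stat, wait about a second, read it again, and pass both strings to `cpu_percentages()`. Busy time is then roughly 100 minus the idle and iowait percentages.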
Disk observability tools
| Tool | Description |
|------|-------------|
| iostat | Various per-disk statistics |
| sar | Historical disk statistics |
| PSI | Disk pressure stall information |
| pidstat | Disk I/O usage by process |
| perf | Record block I/O tracepoints |
| biolatency | Summarize disk I/O latency as a histogram |
| biosnoop | Trace disk I/O with PID and latency |
| iotop, biotop | Top for disks: summarize disk I/O by process |
| biostacks | Show disk I/O with initialization stacks |
| blktrace | Disk I/O event tracing |
| bpftrace | Custom disk tracing |
| MegaCli | LSI controller statistics |
| smartctl | Disk health (S.M.A.R.T.) statistics |
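iostat derives its per-device rates, average latency (await), and utilization from the cumulative counters in /proc/diskstats. A minimal Python sketch of that arithmetic follows, assuming the field layout described in the kernel's iostats documentation (sectors are 512-byte units, times are in milliseconds). The helper names are hypothetical, and this is not iostat's actual implementation.

```python
"""Sketch: iostat-style per-device metrics from two /proc/diskstats samples.

Assumes the field layout in the kernel's Documentation/admin-guide/iostats.rst
(sectors are 512-byte units, times are in milliseconds). Names are
hypothetical; this is not iostat's actual implementation.
"""
from dataclasses import dataclass
from typing import Dict

SECTOR_BYTES = 512


@dataclass
class DiskCounters:
    reads: int            # reads completed
    sectors_read: int
    ms_reading: int       # total time spent on reads
    writes: int           # writes completed
    sectors_written: int
    ms_writing: int       # total time spent on writes
    ms_doing_io: int      # time the device had I/O in flight


def parse_diskstats(text: str) -> Dict[str, DiskCounters]:
    """Map device name to its cumulative counters."""
    stats = {}
    for line in text.splitlines():
        p = line.split()
        if len(p) < 14:  # major, minor, name, then at least 11 counters
            continue
        stats[p[2]] = DiskCounters(
            reads=int(p[3]), sectors_read=int(p[5]), ms_reading=int(p[6]),
            writes=int(p[7]), sectors_written=int(p[9]),
            ms_writing=int(p[10]), ms_doing_io=int(p[12]))
    return stats


def iostat_like(before: str, after: str,
                interval_s: float) -> Dict[str, Dict[str, float]]:
    """Rates, average latency per I/O (await, ms), and %util per device."""
    a, b = parse_diskstats(before), parse_diskstats(after)
    result = {}
    for dev in sorted(a.keys() & b.keys()):
        x, y = a[dev], b[dev]
        reads, writes = y.reads - x.reads, y.writes - x.writes
        kb = SECTOR_BYTES / 1024
        result[dev] = {
            "r/s": reads / interval_s,
            "w/s": writes / interval_s,
            "rkB/s": (y.sectors_read - x.sectors_read) * kb / interval_s,
            "wkB/s": (y.sectors_written - x.sectors_written) * kb / interval_s,
            # Average time per completed I/O over the interval.
            "r_await": (y.ms_reading - x.ms_reading) / reads if reads else 0.0,
            "w_await": (y.ms_writing - x.ms_writing) / writes if writes else 0.0,
            # Share of wall-clock time with at least one I/O in flight.
            "%util": 100.0 * (y.ms_doing_io - x.ms_doing_io) / (interval_s * 1000),
        }
    return result
```

Note that %util only shows that the device was busy: for SSDs and RAID volumes that service requests in parallel, 100% does not necessarily mean saturation. An average await can also hide outliers, which is where a latency histogram from biolatency helps.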
Linux memory observability tools
| Tool | Description |
|------|-------------|
| vmstat | Virtual and physical memory statistics |
| PSI | Memory pressure stall information |
| swapon | Swap device usage |
| sar | Historical statistics |
| slabtop | Kernel slab allocator statistics |
| numastat | NUMA statistics |
| ps | Process status |
| top | Monitor per-process memory usage |
| pmap | Process address space statistics |
| perf | Memory PMC and tracepoint analysis |
| drsnoop | Direct reclaim tracing |
| wss | Working set size estimation |
| bpftrace | Tracing programs for memory analysis |
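The PSI entries in the disk and memory tables refer to the files under /proc/pressure (cpu, memory, io), available on kernels 4.20 and later with PSI enabled. Each file has a `some` line (share of time at least one task was stalled on the resource) and usually a `full` line (share of time all non-idle tasks were stalled at once). The avg10, avg60, and avg300 fields are running averages in percent, and total is cumulative stall time in microseconds. A minimal Python parser, with hypothetical helper names:

```python
"""Sketch: parse Linux pressure stall information (PSI).

Assumes the /proc/pressure/{cpu,memory,io} format described in the
kernel's Documentation/accounting/psi.rst. Helper names are hypothetical.
"""
from typing import Dict, Union

Number = Union[int, float]


def parse_psi(text: str) -> Dict[str, Dict[str, Number]]:
    """Return {'some': {...}, 'full': {...}} from a /proc/pressure file."""
    result: Dict[str, Dict[str, Number]] = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        values: Dict[str, Number] = {}
        for field in parts[1:]:
            key, _, raw = field.partition("=")
            # avg10/avg60/avg300 are percentages; total is cumulative microseconds.
            values[key] = int(raw) if key == "total" else float(raw)
        result[parts[0]] = values
    return result


def stall_percent(before: str, after: str, kind: str, interval_s: float) -> float:
    """Percent of the interval spent stalled, for 'some' or 'full'."""
    delta_us = parse_psi(after)[kind]["total"] - parse_psi(before)[kind]["total"]
    return 100.0 * delta_us / (interval_s * 1_000_000)
```

Differencing `total` between two reads gives a stall percentage over exactly the interval you choose, rather than over the fixed 10, 60, or 300 second windows.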
File system observability tools
| Tool | Description |
|------|-------------|
| mount | List file systems and their mount flags |
| free | Cache capacity statistics |
| top | Includes memory usage summary |
| vmstat | Virtual memory statistics |
| sar | Various statistics, including historic |
| slabtop | Kernel slab allocator statistics |
| strace | System call tracing |
| fatrace | Trace file system operations using fanotify |
| latencytop | Show system-wide latency sources |
| opensnoop | Trace files opened |
| filetop | Top files in use by IOPS and bytes |
| cachestat | Page cache statistics |
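free reads its cache capacity figures from /proc/meminfo. According to the procps-ng free(1) man page, the buffers column is the Buffers field and the cache column is Cached plus SReclaimable; older versions of free may compute these differently. A minimal Python sketch of that arithmetic, assuming the procps-ng definitions and using hypothetical helper names:

```python
"""Sketch: free(1)-style buffer and cache figures from /proc/meminfo.

Assumes procps-ng's definitions: buffers = Buffers, cache = Cached +
SReclaimable. Values in /proc/meminfo are in KiB (labelled "kB").
Helper names are hypothetical.
"""
from typing import Dict, Optional


def parse_meminfo(text: str) -> Dict[str, int]:
    """Map each /proc/meminfo key to its integer value."""
    info: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])
    return info


def cache_summary(text: str) -> Dict[str, Optional[int]]:
    """Buffer, cache, and available memory in KiB, as free(1) reports them."""
    m = parse_meminfo(text)
    buffers = m.get("Buffers", 0)
    cache = m.get("Cached", 0) + m.get("SReclaimable", 0)
    return {
        "total": m.get("MemTotal"),
        "buffers": buffers,
        "cache": cache,
        "buff/cache": buffers + cache,
        # MemAvailable exists on kernels 3.14 and later; None if absent.
        "available": m.get("MemAvailable"),
    }
```

A large buff/cache value is usually healthy on a file-serving system, since the kernel can reclaim most of it; cachestat then shows whether that cache is actually producing hits.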
Network observability tools
| Tool | Description |
|------|-------------|
| ss | Socket statistics |
| ip | Network interface and route statistics |
| ifconfig | Network interface statistics |
| nstat | Network stack statistics |
| netstat | Various network stack and interface statistics |
| sar | Historical statistics |
| nicstat | Network interface throughput and utilization |
| ethtool | Network interface driver statistics |
| tcplife | Trace TCP session lifespans with connection details |
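Interface throughput tools such as sar -n DEV and nicstat compute rates from cumulative per-interface byte counters, which Linux exposes in /proc/net/dev. Each interface line has eight receive counters followed by eight transmit counters, with bytes first in each group. A minimal Python sketch of the rate calculation, with hypothetical helper names (not the tools' actual code):

```python
"""Sketch: per-interface throughput from two /proc/net/dev samples.

Assumes the usual layout: "iface:" followed by 8 receive counters and
8 transmit counters, with bytes first in each group. Helper names are
hypothetical; this is not sar's or nicstat's actual code.
"""
from typing import Dict, Tuple


def parse_net_dev(text: str) -> Dict[str, Tuple[int, int]]:
    """Map interface name to cumulative (rx_bytes, tx_bytes)."""
    counters: Dict[str, Tuple[int, int]] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        # The two header lines contain no ':' and are skipped here.
        if not sep or len(fields) < 16:
            continue
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def throughput(before: str, after: str,
               interval_s: float) -> Dict[str, Dict[str, float]]:
    """Receive and transmit rates in bytes per second for each interface."""
    a, b = parse_net_dev(before), parse_net_dev(after)
    return {
        iface: {
            "rx_B/s": (b[iface][0] - a[iface][0]) / interval_s,
            "tx_B/s": (b[iface][1] - a[iface][1]) / interval_s,
        }
        for iface in sorted(a.keys() & b.keys())
    }
```

Dividing these rates by the link speed reported by ethtool gives an approximate utilization figure similar in spirit to nicstat's.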