uptime
dmesg | tail
vmstat 1
mpstat -P ALL 1
pidstat 1
iostat -xz 1
free -m
sar -n DEV 1
sar -n TCP,ETCP 1
top
perf
Load averages: the number of processes wanting to run, averaged over 1, 5, and 15 minutes. On Linux this also includes processes blocked in uninterruptible IO.
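A quick way to see whether load is trending up or down is to rerun the command over time; as a minimal sketch, watch can repeat it every second:
watch -n 1 uptime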
Look for errors that can cause performance problems. "Out of memory", "TCP: ... dropping request", etc.
dmesg | grep oom-killer
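If your dmesg is from util-linux, -T prints human-readable timestamps, which makes it easier to match errors to an incident window:
dmesg -T | tail -n 20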
- r: Number of processes running or waiting for a CPU. Doesn't include processes waiting on IO.
IF r > num_cpu THEN saturation.
- b: Number of processes blocked by IO.
- free: Free memory in kB.
- buffers: buffer cache, used for block device I/O.
- cached: page cache, used by file systems.
- si, so: Swap-ins and swap-outs.
IF si,so != 0 THEN out_of_memory.
- us, sy, id, wa, st: breakdown of CPU time, averaged across all CPUs.
- us: user time.
- sy: system time.
IF sy > 20% THEN kernel is processing IO inefficiently.
- id: idle.
- wa: IO wait.
- st: stolen time, time spent by the hypervisor servicing other VMs.
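A bounded run is often enough for a first look; for example, five one-second samples (note that the first line shows averages since boot):
vmstat 1 5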
Look out for a single hot CPU, which can indicate a single-threaded bottleneck.
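As with vmstat, you can cap the number of samples; five one-second snapshots of every CPU are usually enough to spot an imbalance:
mpstat -P ALL 1 5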
Per-process statistics; like top, but prints a rolling summary instead of clearing the screen.
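pidstat defaults to CPU usage; sysstat's -r and -d switches give the same rolling view of per-process memory and disk IO (a sketch, pick the resource you suspect):
pidstat -u 1    # per-process CPU
pidstat -r 1    # per-process memory (page faults, RSS)
pidstat -d 1    # per-process disk IO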
- r/s: delivered reads per second
- rkB/s: kB read per second
- w/s: delivered writes per second
- wkB/s: kB write per second
- await: the average time for the IO in milliseconds. Includes both time queued and time being serviced.
IF high THEN device_saturation | device_problems.
- avgqu-sz: the average number of requests issued to the device.
IF avgqu-sz > 1 THEN could_be saturation. However, multiple back-end disk devices can service requests in parallel, so a queue length above 1 is not conclusive on its own.
- %util: device utilization (busy %).
IF %util > 60% THEN poor_performance (double-check with await). IF %util ~= 100% THEN saturation.
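For example, five one-second samples of extended stats, skipping idle devices; note that newer sysstat releases rename some of these columns (e.g. avgqu-sz becomes aqu-sz):
iostat -xz 1 5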
- buffers: buffer cache, used for block device I/O.
- cached: page cache, used by file systems.
- buff/cache: sum of buffers and cached.
- available: used for caches but could be quickly made available for the application.
IF buffers or cached ~= 0 THEN higher disk IO (the file system caches are nearly empty).
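free can also repeat on an interval, which makes cache growth or shrinkage easier to watch; -h prints human-readable units:
free -h -s 5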
- rxpck/s: number of packets received per second.
- txpck/s: number of packets transmitted per second.
- rxkB/s: number of kilobytes received per second.
- txkB/s: number of kilobytes transmitted per second.
- %ifutil: utilization percentage of the network interface. Could be unreliable.
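As with the other sysstat tools, a sample count bounds the run; for example, five one-second samples of all interfaces:
sar -n DEV 1 5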
- active/s: number of locally-initiated TCP connections per second (e.g., via connect()). (~ outbound)
- passive/s: number of remotely-initiated TCP connections per second (e.g., via accept()). (~ inbound)
- retrans/s: number of TCP retransmits per second. Sign of network or server issue.
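The keywords can be combined in one invocation if you want connection rates, retransmits, and interface throughput side by side:
sar -n DEV,TCP,ETCP 1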
You'll need a call graph:
- --call-graph lbr: aka Last Branch Record, uses special hardware registers that store a limited call graph of the last branching instructions (expect around ~32 entries). Very fast, but requires modern hardware (> Haswell, > ARMv9.2-A).
- --call-graph fp: uses the frame pointer to determine the call graph; use it if your binary is built with frame pointers (-fno-omit-frame-pointer).
- --call-graph dwarf: saves 8k of call stack to be analyzed later together with debug info. Produces large perf.data records, which are extremely slow to perf report. Practically unusable with a high sampling rate, so limit the sampling rate to 99 Hz with -F99.
Example commands:
Attach to a running process and sample it for 10 seconds with a 1000 Hz sample rate and an LBR call graph. Creates a perf.data record.
perf record -p <pid> --call-graph lbr -F1000 -- sleep 10
Sample the whole system for 10 seconds with DWARF debug info, limiting the sampling rate to 99 Hz.
perf record -a --call-graph dwarf -F99 -- sleep 10
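For completeness, the same attach-and-sample pattern with frame-pointer unwinding (assumes the target binary was built with -fno-omit-frame-pointer):
perf record -p <pid> --call-graph fp -F1000 -- sleep 10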
If running on a remote system, pack all the information needed to analyze the profile later on a host system. This creates a .tar archive; copy it together with perf.data to the host system.
perf archive
On the host system, unpack the .tar archive. This extracts it to ~/.debug.
perf archive --unpack
You can then generate a report from the remote system's perf.data:
perf report
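perf report reads perf.data from the current directory by default; -i selects another file and --stdio produces a plain-text report that is easy to save or diff:
perf report -i perf.data --stdio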
perf-archive is missing from the Ubuntu perf packages. Get it from the Linux source tree:
mkdir -p /usr/libexec/perf-core/
wget -O /usr/libexec/perf-core/perf-archive https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/tools/perf/perf-archive.sh
chmod +x /usr/libexec/perf-core/perf-archive