OOM Investigation - 2026-01-24

Summary

An OOM (Out of Memory) event occurred at 2026-01-24 00:56:33 that killed 48 processes and triggered a logout/login cycle (not a reboot - uptime is 13+ days).

Evidence Gathered

OOM Event Timeline

00:56:13: GNOME session started shutting down services
00:56:18: Zoom segfault (likely triggered by memory pressure)
00:56:33: OOM killer activated, 48 processes SIGKILL'd
00:56:33: [email protected] terminated
00:56:33: New GNOME session started (PID 3806336)

Processes Killed (by type)

Process Type	Count
bd (beads)	14
zsh	11
http-server	5
python	4
claude	2
sh	2
MainThread	2
zoom	1
Others	7
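
A tally like the one above can be rebuilt from the kernel log. The sketch below assumes the kills were logged by the kernel OOM killer in its usual "Killed process PID (name)" form. If a userspace killer such as systemd-oomd was responsible, the log lines look different and this parser will print nothing. The function name `oom_victims` is made up for illustration.

```shell
# Count OOM victims by process name from kernel log lines on stdin.
# Assumes lines of the form "... Killed process 1234 (name) ...".
oom_victims() {
  sed -n 's/.*Killed process [0-9][0-9]* (\([^)]*\)).*/\1/p' |
    awk '{ n[$0]++ } END { for (k in n) print n[k] "\t" k }' |
    sort -k1,1nr -k2
}

# Kernel messages from the current boot (boot 0)
journalctl -k -b 0 --no-pager | oom_victims
```

Each output line is a count and a process name, separated by a tab, with the most frequent name first.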

Key Observations

14 orphaned bd processes - Beads daemon processes accumulated and weren't cleaned up
5 http-server processes - Multiple http-server instances running
2 Claude sessions killed (PIDs 948978, 1257542)
Large Claude transcript files:
- 491MB: worktree-1/bf35e43e-*.jsonl (Jan 22)
- 348MB: worktree-1/45c23588-*.jsonl (Jan 24 - most recent)
- 64 subagent files in worktree-2's 74MB subagent dir
Memory state at investigation time: 42GB/62GB used with 80GB swap available

Three Hypotheses

Hypothesis 1: Beads (`bd`) Process Accumulation (HIGHEST CONFIDENCE)

Evidence:

14 orphaned bd processes were killed during OOM
These processes should terminate after completion but accumulated over 13+ days of uptime
Each bd process holds memory for issue parsing, graph computation, and IPC

Root Cause Theory: The bd (beads) command-line tool spawns processes for triage, listing, and other operations. When invoked via Claude Code subagents or shell commands, these processes may not properly terminate if:

Parent processes exit before children
Signal handlers don't propagate to forked bd instances
The TUI component has zombie process handling issues

Confidence: HIGH (14 processes is a strong signal)

Test:

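# Monitor the number of bd processes over time (exact name match; pgrep never counts itself)
watch -n 60 'pgrep -xc bd'

# Check if bd processes are orphaned (PPID 1).
# In a systemd user session, orphans may instead be adopted by `systemd --user`, so check for that PPID too.
ps -eo pid,ppid,comm | awk '$2 == 1 && $3 == "bd"'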
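
A process count alone does not show how much memory the bd processes hold. As a rough check, the sketch below sums resident memory per command name. The function name `rss_by_command` is hypothetical, and RSS counts shared pages once per process, so the totals are an upper bound rather than an exact figure.

```shell
# Read "RSS_KiB COMMAND" lines on stdin and print, per command name:
# total RSS in KiB, process count, name (tab-separated, largest total first).
rss_by_command() {
  awk '{
    kib = $1
    sub(/^[ \t]*[0-9]+[ \t]+/, "")   # drop the RSS column, keep the full name
    total[$0] += kib
    count[$0]++
  }
  END {
    for (name in total) printf "%d\t%d\t%s\n", total[name], count[name], name
  }' | sort -k1,1nr
}

# Top ten command names by total resident memory
ps -eo rss=,comm= | rss_by_command | head -n 10
```

If bd appears near the top of this list after a few days of uptime, that would support this hypothesis. If it does not, the count of 14 may be incidental.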

Hypothesis 2: Claude Code Session Context Accumulation

Evidence:

Two massive transcript files: 491MB and 348MB
64 subagent files in worktree-2 alone (74MB subagent directory)
Session 45c23588 ran for an extended period (last modified 00:01, still active at 00:56)
Claude processes were among those killed

Root Cause Theory: Long-running Claude Code sessions with:

Large context windows loaded in memory
Multiple subagents running concurrently
Transcript files being written/read continuously
No memory limits on Claude Code Node.js processes

Over the 13+ days of uptime, memory fragmentation and leaks in long-lived Node.js processes may accumulate; this has not yet been measured directly.

Confidence: MEDIUM-HIGH (large files are evidence, but unclear if loaded in memory)

Test:

# Check Claude process memory before and after session restart
ps aux --sort=-rss | grep "[c]laude"   # the [c] keeps grep from matching its own command line

# Monitor during active session
watch -n 10 'ps aux --sort=-rss | head -5'
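
RSS can understate a process's footprint once some of its pages have been swapped out, and this machine has a large swap area. As a complementary check, the sketch below reads both resident and swapped memory from /proc. The helper name `rss_and_swap` is hypothetical, and matching on the string "claude" is an assumption about how the processes appear on this system.

```shell
# Print "VmRSS_kB VmSwap_kB" from a /proc/<pid>/status-style file.
rss_and_swap() {
  awk '/^VmRSS:/  { rss = $2 }
       /^VmSwap:/ { swap = $2 }
       END { print rss + 0, swap + 0 }' "$1"
}

# One line per matching process: PID, resident kB, swapped kB
for pid in $(pgrep -f claude); do
  if [ -r "/proc/$pid/status" ]; then
    printf '%s ' "$pid"
    rss_and_swap "/proc/$pid/status"
  fi
done
```

Running this at intervals during a long session would show whether a Claude process's footprint tracks the growth of its transcript file.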

Hypothesis 3: http-server Process Leak from Claude Code Automation

Evidence:

5 http-server processes killed during OOM
http-server is commonly spawned by Claude Code for previewing HTML/web content
These processes run in background and may not be cleaned up

Root Cause Theory: Claude Code workflows frequently spawn http-server for serving local files. When:

Sessions are interrupted or crash
Terminal contexts are lost
Background processes are forked without cleanup tracking

These http-server instances persist indefinitely, each consuming memory.

Confidence: MEDIUM (5 processes is notable but probably not the primary cause)

Test:

# Check for orphaned http-server processes (pgrep never matches itself)
pgrep -af http-server

# Find http-server parent relationships (-s shows each process's ancestors)
for pid in $(pgrep -f http-server); do pstree -sp "$pid"; done

Recommended Mitigations

Immediate Actions

Clean up orphaned processes regularly:
```
# Add to cron (weekly)
# ";" rather than "&&", so http-server cleanup still runs when no bd process was found
pkill -x bd; pkill -f "http-server"
```
Refine this if you have a long-running http-server instance that you actually do want to keep alive.
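
One possible refinement is to act only on instances older than a cutoff, so recently started servers are left alone. The sketch below assumes the processes appear under the command name http-server, as they do in the tally of killed processes. The one-day threshold and the helper name `older_than` are placeholders.

```shell
# Read "PID ELAPSED_SECONDS" lines on stdin and print the PIDs
# whose elapsed time exceeds the given number of seconds.
older_than() {
  awk -v min="$1" '$2 + 0 > min + 0 { print $1 }'
}

# Dry run: list http-server PIDs that have been running for more than a day
ps -C http-server -o pid=,etimes= | older_than 86400
```

Once the listed PIDs look right, the output can be piped to `xargs -r kill` to terminate them.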

Monitor process accumulation:

# Add to ~/.bashrc or cron
count=$(pgrep -u "$(id -un)" -xc bd)   # this user's bd processes, exact name match
if [ "$count" -gt 5 ]; then
  notify-send "Warning: $count bd processes running"
fi

Long-term Fixes

Investigate beads process lifecycle - Why are bd processes not terminating?
Set memory limits on Claude Code - Use systemd slice or cgroups
Implement session cleanup hooks - Kill orphaned processes on session end
Archive old Claude transcripts - Move 300MB+ files to cold storage
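
As a starting point for the memory-limit item, the sketch below launches a command in a transient systemd scope with a memory cap. It assumes a systemd user session on cgroup v2 with the memory controller available to the user manager. The wrapper name `run_capped` and its `DRY_RUN` switch are hypothetical conveniences, and the right limit would need to be chosen from observed usage.

```shell
# Run a command in a transient user scope with a hard memory cap.
# Usage: run_capped LIMIT COMMAND [ARGS...]
# With DRY_RUN set, print the systemd-run command instead of executing it.
run_capped() {
  limit=$1
  shift
  if [ -n "${DRY_RUN:-}" ]; then
    echo systemd-run --user --scope -p "MemoryMax=$limit" "$@"
  else
    systemd-run --user --scope -p "MemoryMax=$limit" "$@"
  fi
}
```

For example, `run_capped 8G claude` would start a session capped at 8G, where 8G is only an illustrative value. If the cap is exceeded, the kernel should kill processes inside that scope rather than across the whole desktop session.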

Memory Snapshot at Investigation Time

total: 62Gi | used: 42Gi | available: 20Gi
swap:  80Gi | used: 2.2Gi
uptime: 13 days, 21:47

Current top memory consumers:

<vpn-client>: 798MB
<voice-daemon>: 728MB
claude: 595MB
gnome-shell: 515MB

Related Historical OOM Events

From boot 0 (current, since Jan 10):

Jan 10 03:38: gnome-shell, google-chrome, <voice-daemon> killed by OOM
Jan 24 00:56: Current event (48 processes killed)

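Pattern: The Jan 24 event followed 13+ days of uptime, which fits gradual process accumulation. The Jan 10 event, however, falls on the day this boot started, so it does not by itself support a correlation between extended uptime and OOM events.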

micahstubbs/oom-investigation-2026-01-24-redacted.md
