@tomfuertes
Last active March 8, 2026
Claude Code /convos skill - extract session history into queryable markdown files

# /convos - Claude Code Conversation Extractor

Extract Claude Code session history into readable, queryable markdown files.

## What it does

Reads `~/.claude/projects/<project>/*.jsonl` session files and produces structured markdown in `convos/` with:

  • Your prompts (full text)
  • Assistant responses (text only, tool calls compressed to one-liners)
  • Insight blocks collected into a summary section
  • Metadata table (date, branch, duration, commits)
  • Index file with topic clustering

Thinking blocks, system reminders, and progress noise are stripped.

## Usage

```
/convos                              # current project
/convos /path/to/project             # specific project
/convos --since=2026-01-01           # date filter
/convos --session=<uuid>             # single session
/convos --reindex                    # regenerate existing files
```

## Setup

Drop `convos.md` into `.claude/commands/` (per-repo) or `~/.claude/commands/` (global). Add `convos/` to `.gitignore`; these are local extracts, not version-controlled.
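The per-repo variant can be scripted; a minimal sketch, assuming you saved the gist file as `convos.md` in the current directory:

```sh
# Per-repo install: copy the command file and gitignore the output dir.
mkdir -p .claude/commands
cp convos.md .claude/commands/convos.md
# Append convos/ to .gitignore only if it's not already listed.
grep -qxF 'convos/' .gitignore 2>/dev/null || echo 'convos/' >> .gitignore
```

The `grep -qxF` guard keeps the step idempotent, so rerunning it never duplicates the `.gitignore` entry.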

## Output

```
convos/
  INDEX.md
  2026-02-10-daft-punk-sound-hooks.md
  2026-02-15-devcontainer-firewall-setup.md
  2026-03-01-claude-md-restructure.md
```
```yaml
---
name: convos
description: Extract past Claude Code conversations into readable markdown files in convos/. Highlights user prompts and assistant insights. Use for interview prep, project history, and knowledge mining.
args:
  - name: args
    description: "[project-path] [--since=YYYY-MM-DD] [--session=UUID] [--reindex]"
    required: false
---
```

# Convos - Conversation Extractor

Extract Claude Code session history into queryable markdown files. Optimized for surfacing high-signal content (your prompts, insights, decisions, architecture reasoning) while compressing low-signal content (tool I/O, file reads, bash output).

## Overview

Conversations live at `~/.claude/projects/<mangled-path>/*.jsonl`. Each JSONL line has:

  • type: user | assistant | progress | system | file-history-snapshot | queue-operation
  • timestamp: ISO 8601
  • message.content: string (user) or array of blocks (assistant)
  • gitBranch, sessionId, version
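To get a feel for a session's composition before extracting, a quick jq probe can tally the line types (`session.jsonl` is a placeholder path):

```sh
# Count how many lines of each type a session contains,
# most frequent first. Lines without a type tally as "unknown".
jq -r '.type // "unknown"' session.jsonl | sort | uniq -c | sort -rn
```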

## Step 0: Resolve Target Project

Determine which project to extract from:

  1. If $ARGUMENTS contains a project path, mangle it: replace / with -, prepend -
    • e.g. /Users/tom/sandbox/git-repos/foo -> -Users-tom-sandbox-git-repos-foo
  2. If no path given, use the current repo's mangled path
  3. Verify ~/.claude/projects/<mangled>/ exists and has .jsonl files
```sh
PROJECT_DIR="$HOME/.claude/projects/<mangled>"
ls "$PROJECT_DIR"/*.jsonl 2>/dev/null | wc -l
```

Set `OUTPUT_DIR` to `convos/` in the current working directory (the repo you're in).
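The mangling rule above can be sketched as a one-liner; since absolute paths start with `/`, translating every `/` to `-` also produces the required leading dash:

```sh
# Mangle an absolute project path into the directory name described above.
mangle() { printf '%s\n' "$1" | tr '/' '-'; }
mangle /Users/tom/sandbox/git-repos/foo   # -> -Users-tom-sandbox-git-repos-foo
```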

## Step 1: Enumerate Sessions

```sh
# List all sessions with line counts and date range
for f in "$PROJECT_DIR"/*.jsonl; do
  ID=$(basename "$f" .jsonl)
  LINES=$(wc -l < "$f")
  FIRST_TS=$(jq -r 'select(.timestamp) | .timestamp' "$f" | head -1)
  echo "$FIRST_TS $LINES $ID"
done | sort
```

Filter by `--since=` if provided. If `--session=` is provided, process only that session.

Skip sessions with <10 lines (hook-only noise).

If --reindex not set, skip sessions that already have a markdown file in convos/.
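Combining the filters above, a sketch in bash (assumes `PROJECT_DIR` and an optional `SINCE` date are set; plain lexicographic `<` comparison works because the dates are ISO 8601):

```sh
SINCE="${SINCE:-}"                      # e.g. 2026-01-01, empty = no filter
for f in "$PROJECT_DIR"/*.jsonl; do
  [ "$(wc -l < "$f")" -lt 10 ] && continue           # hook-only noise
  FIRST_TS=$(jq -r 'select(.timestamp) | .timestamp' "$f" | head -1)
  if [ -n "$SINCE" ] && [ "${FIRST_TS%%T*}" \< "$SINCE" ]; then continue; fi
  echo "$f"                                          # session to process
done
```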

## Step 2: Extract Each Session

For each session JSONL, extract into a structured markdown file. This is the core logic - it runs per session and requires judgment, not just jq piping.

### 2a: Metadata Header

```sh
# Extract from JSONL
jq -r 'select(.timestamp) | .timestamp' "$SESSION" | head -1  # start
jq -r 'select(.timestamp) | .timestamp' "$SESSION" | tail -1  # end
jq -r 'select(.gitBranch) | .gitBranch' "$SESSION" | head -1  # branch
jq -r '.version // empty' "$SESSION" | head -1                 # CC version
```
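The Duration field can be derived from the first and last timestamps; a sketch assuming GNU `date` (on macOS, `date -d` is not available, so coreutils' `gdate` would be needed instead):

```sh
START=$(jq -r 'select(.timestamp) | .timestamp' "$SESSION" | head -1)
END=$(jq -r 'select(.timestamp) | .timestamp' "$SESSION" | tail -1)
# Whole minutes between the two ISO 8601 timestamps.
echo "~$(( ($(date -d "$END" +%s) - $(date -d "$START" +%s)) / 60 ))m"
```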

### 2b: Content Extraction (jq patterns)

User prompts (ALWAYS include, full text):

```sh
jq -r 'select(.type == "user") |
  "---\n### " + .timestamp + " [USER]\n\n" +
  (if .message.content | type == "string" then .message.content
   elif .message.content | type == "array" then
     [.message.content[] | select(.type == "text") | .text] | join("\n")
   else "" end)' "$SESSION"
```

Assistant text blocks (include, but filter):

```sh
jq -r 'select(.type == "assistant") |
  .message.content[] | select(.type == "text") | .text' "$SESSION"
```

Tool use blocks (COMPRESS - just tool name + file path or command summary):

```sh
jq -r 'select(.type == "assistant") |
  .message.content[] | select(.type == "tool_use") |
  "  > " + .name + ": " +
  (if .name == "Read" then .input.file_path
   elif .name == "Bash" then (.input.description // (.input.command | .[0:80]))
   elif .name == "Edit" then .input.file_path
   elif .name == "Write" then .input.file_path
   elif .name == "Glob" then .input.pattern
   elif .name == "Grep" then .input.pattern
   else (.input | tostring | .[0:80])
   end)' "$SESSION"
```

Thinking blocks (OMIT by default - these are internal reasoning, not user-facing insights).

Progress/system (OMIT entirely - hook execution noise).

### 2c: Insight Extraction (HIGH PRIORITY)

Insights are the most valuable content. Extract them separately for the summary section:

```sh
jq -r 'select(.type == "assistant") |
  .message.content[] | select(.type == "text") | .text' "$SESSION" |
  awk '/★ Insight/{found=1} found{print} /^─+$/{if(found){print ""; found=0}}'
```

### 2d: Commit Detection

Sessions often end with commits. Extract any git SHAs mentioned:

```sh
jq -r 'select(.type == "assistant") |
  .message.content[] | select(.type == "text") | .text' "$SESSION" |
  grep -oE '[0-9a-f]{7,40}' | sort -u
```

Cross-reference with git log --oneline to find actual commits from this session.
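One way to do the cross-reference: keep only the candidates that resolve to real commits in the repo (`candidate-shas.txt` is a hypothetical file holding the grep output above):

```sh
# Print a one-line summary for each candidate that is an actual commit;
# hex strings that don't resolve are silently dropped.
while read -r SHA; do
  git cat-file -e "$SHA^{commit}" 2>/dev/null && git log -1 --oneline "$SHA"
done < candidate-shas.txt
```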

## Step 3: Assemble Markdown

### File Naming Convention

`convos/YYYY-MM-DD-<slug>.md`

Where <slug> is derived from the session content:

  • Read the first user message
  • Generate a 3-5 word kebab-case slug capturing the topic
  • If multiple sessions on same day + same topic, append -2, -3

Examples:

  • 2026-02-10-daft-punk-sound-hooks.md
  • 2026-02-15-devcontainer-firewall-setup.md
  • 2026-03-01-claude-md-restructure.md
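The agent normally writes the slug itself, but the rule is mechanical enough to sketch as a hypothetical helper:

```sh
# Lowercase, collapse non-alphanumeric runs to dashes, keep at most 5 words.
slugify() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' \
    | tr -cs 'a-z0-9' '-' | sed 's/^-//; s/-$//' | cut -d- -f1-5
}
slugify "Restructure CLAUDE.md for clarity"   # -> restructure-claude-md-for-clarity
```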

### Markdown Structure

```markdown
# <Title derived from first user message>

| Field | Value |
|-------|-------|
| Date | YYYY-MM-DD |
| Session | `<uuid>` |
| Branch | `<branch>` |
| Duration | ~Xm |
| Commits | `abc1234`, `def5678` |

## Insights

> Collected from Insight blocks throughout the session.

<all insight blocks, in order>

## Conversation

### HH:MM [You]

<user prompt text>

### HH:MM [Claude]

<assistant text, with tool calls compressed to one-line summaries>

  > Read: path/to/file.ts
  > Bash: Run tests
  > Edit: path/to/file.ts

<next assistant text block>

### HH:MM [You]

...

## Files Touched

- `path/to/file.ts` (edited)
- `path/to/new-file.sh` (created)
```

Key formatting rules:

  • Strip <system-reminder> tags from user messages
  • Strip HTML from user messages where it's clearly paste noise (keep if it's the actual content like HTML templates)
  • Tool summaries are indented blockquotes (one line each)
  • Consecutive tool calls group together without blank lines between them
  • User messages preserve original formatting (code blocks, lists, etc.)
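Stripping the reminder tags needs a multi-line-aware substitution, since the tags often span lines; a sketch using perl (`message.txt` is a placeholder for one extracted user message):

```sh
# Remove <system-reminder>...</system-reminder> spans, even across newlines.
# -0777 slurps the whole file; /s lets . match newlines; .*? stays non-greedy.
perl -0777 -pe 's/<system-reminder>.*?<\/system-reminder>//gs' message.txt
```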

## Step 4: Index File

After processing all sessions, generate convos/INDEX.md:

```markdown
# Conversation Index - <project-name>

Generated: YYYY-MM-DD | Sessions: N | Date range: YYYY-MM-DD to YYYY-MM-DD

## By Date

| Date | Title | Duration | Insights | Commits |
|------|-------|----------|----------|---------|
| 2026-03-07 | [Title](./2026-03-07-slug.md) | ~15m | 3 | `abc1234` |
| ... | ... | ... | ... | ... |

## Topics

<group sessions by detected topic clusters - e.g. "DevContainer", "Hooks", "Sound System", "CLAUDE.md">
```

## Processing Strategy

  • <20 sessions: Process sequentially in this agent
  • 20-50 sessions: Spawn 3-4 sonnet subagents, each handling a batch of sessions
  • 50+ sessions: Spawn subagents in waves of 5, wait for completion, then next wave

Each subagent gets: the JSONL path, the output dir, and the naming convention. The lead agent handles the INDEX.md assembly.

## Guardrails

  • Output to convos/ in CWD only. Never write to ~/.claude/.
  • convos/ should be gitignored (add to .gitignore if not already present). These are local extracts, not version-controlled artifacts.
  • If a session JSONL is >5000 lines, warn the user and ask before processing (it will be slow).
  • Truncate individual user messages at 2000 chars in the output (they can be very long pastes).
  • Never include thinking block content in output.
  • Strip system-reminder tags completely.
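The 2000-char truncation rule, sketched as a hypothetical bash helper (`${msg:0:2000}` is a bash-only substring expansion):

```sh
# Truncate one message body at 2000 characters, marking the cut.
truncate_msg() {
  local msg=$1
  if [ "${#msg}" -gt 2000 ]; then
    printf '%s… [truncated]\n' "${msg:0:2000}"
  else
    printf '%s\n' "$msg"
  fi
}
```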