Carving out codectl's role across two systems — as a standalone repo context generator and as contextctl's codebase intelligence engine.
Sources: contextctl research report, repoprompt reverse engineering bible, knowctl repoprompt topic (40 docs), Codex architecture review, competitive landscape research (Aider, Kit, CatCoder, Sourcegraph Cody, Moderne Prethink).
RepoPrompt's central thesis is right: context over convenience. Models perform better when they receive curated, token-efficient context upfront than when they discover it through exploratory tool calls. But RepoPrompt delivers this as a separate discovery phase before implementation. We want to deliver it during implementation — injecting the right context at the right time as the session unfolds.
codectl sits at the intersection. It understands the codebase and expresses that understanding in forms that agents (and contextctl) can consume.
codectl is NOT "two co-equal jobs." It's one repo-intelligence core with two delivery modes:
- Machine adapter — `probe` and `snapshot` produce structured output for contextctl's scoring engine
- Human adapter — `context` assembles a prompt bundle for pasting into chat or piping to an agent
Prompt assembly is downstream of extraction. The core identity is: codectl is a deterministic repo profile engine. Everything else is presentation.
This matters because if the standalone bundle generator becomes a first-class peer, it will drag the architecture toward prompt UX, presets, pasteability, and formatting concerns too early. Keep the two-adapter framing for explaining value, but don't let it drive the implementation model.
| Feature | Why | RP Equivalent |
|---|---|---|
| Tree-sitter codemaps | 10-40x token reduction, the single most differentiating feature | get_code_structure |
| File tree generation | Foundation for all context, multiple views (full/selected/auto) | get_file_tree |
| .gitignore-aware filtering | Essential for tree + search | Built into file tree |
| File slicing | First-class concept, not just "read line ranges" — deliberate slice-based selection as a token-saving primitive alongside codemaps | read_file + manage_selection |
| Token counting + budget visibility | Show where tokens went: tree, codemaps, slices, files, diffs. Without this, bundle quality is hard to improve. | Per-file breakdown |
| Search | Path + content search is essential for manual refinement and future AI discovery | file_search |
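The budget-visibility feature above amounts to a per-section token ledger. A minimal sketch — the chars/4 heuristic and the section names are assumptions; a real implementation would use a proper tokenizer such as tiktoken:

```python
# Rough per-section token accounting for a context bundle.
# The 4-chars-per-token estimate is a crude assumption, not a real tokenizer.

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def budget_report(sections: dict[str, str], budget: int) -> dict[str, int]:
    """Return token counts per section plus totals, so the bundle
    can show where the budget went."""
    report = {name: estimate_tokens(body) for name, body in sections.items()}
    report["total"] = sum(report.values())
    report["remaining"] = budget - report["total"]
    return report

report = budget_report(
    {"tree": "apps/\n  viewctl/\n", "codemaps": "def render() ...", "files": "x" * 400},
    budget=32_000,
)
```

Without this breakdown, a bundle that blows its budget gives no hint whether the tree, the codemaps, or the raw files are to blame.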
| Feature | Why |
|---|---|
| Non-opinionated handoff output | Facts, relationships, and open questions — not a pre-solved plan. RepoPrompt's builder is valuable because it produces neutral discovery, not pre-biased solutions. |
| Auto-codemap for dependencies | Select a file in Full mode → its imports get codemaps automatically. Validated by CatCoder research: 1-order dependency type context beats naive retrieval. |
| Git-aware context | Diffs and recent changes are often more valuable than more file tree |
| Multiple tree views | Selected-only tree is frequently better than full tree (saves tokens, focuses attention) |
| Multi-root workspaces | Monorepo support |
| Relevance-weighted codemaps | Aider's insight: rank symbols by call-graph distance from active code, not just flat signature lists |
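Aider-style relevance weighting can be illustrated with a toy personalized PageRank over the import graph: rank mass flows out from the active file, so nearby symbols outrank distant ones. The graph, damping factor, and names below are illustrative assumptions, not Aider's actual implementation:

```python
# Toy personalized PageRank over an import graph. Teleportation goes
# only to the "seed" (active) files, so reachable files rank higher
# than unrelated ones instead of getting a flat signature listing.

def pagerank(edges: dict[str, list[str]], seeds: set[str],
             damping: float = 0.85, iters: int = 50) -> dict[str, float]:
    nodes = set(edges) | {t for ts in edges.values() for t in ts}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    for _ in range(iters):
        nxt = {n: (1 - damping) * teleport[n] for n in nodes}
        for src, targets in edges.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    nxt[t] += share
        rank = nxt
    return rank

imports = {
    "page": ["button", "auth"],   # active file imports these
    "button": ["icons"],
    "unrelated": [],              # nothing links here
}
rank = pagerank(imports, seeds={"page"})
# Files reachable from the active file outrank unrelated ones.
```

A codemap weighted by these scores spends its token budget on the symbols the model is most likely to need next.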
| Feature | Why Defer |
|---|---|
| Context Builder (AI discovery) | High complexity, v2 feature. For v1, manual file selection + auto-codemap is sufficient. |
| MCP server | Not needed until other tools want to consume codectl |
| Apply/review editing | GUI concern, out of scope |
| Preset system | Nice polish, not core |
Hybrid with guardrails. Not the soft version.
- Custom fast path for `probe`: scope detection, atom normalization, command detection, output schema. Must be <200ms.
- Kit-backed heavy path for `snapshot`: codemaps, symbol extraction, dependency analysis, file-walk caching via cased/kit (MIT, Python, 16-language tree-sitter, incremental symbol caching, Rust-powered file walking).
- Do NOT use Kit's summarization (overlaps with contextctl)
- Do NOT use Kit's MCP server
- Do NOT use Kit as the product architecture
- Put it behind a narrow extraction adapter so it's swappable
- If Kit makes cold-start probe latency bad, the probe path stays fully custom forever
- Building multi-language tree-sitter extraction and incremental symbol caching from scratch is expensive, boring, and not differentiating
- Adopting Kit as the full foundation is too much baggage (numpy, fastapi, openai SDK, redis — most irrelevant to codectl)
- Hybrid gives leverage without surrendering the boundary
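The "narrow extraction adapter" guardrail might look like a small Protocol that codectl owns. A hypothetical sketch — the stub backend stands in for a Kit-backed implementation, since kit's real API is not pinned down here:

```python
# The contract codectl owns: callers talk to the Protocol, never to
# kit directly, so the heavy path stays swappable.
from typing import Protocol

class SymbolExtractor(Protocol):
    def extract_symbols(self, path: str) -> list[dict]: ...

class StubExtractor:
    """Stand-in backend so this sketch runs without kit installed.
    A KitExtractor would implement the same method by delegating to kit."""
    def extract_symbols(self, path: str) -> list[dict]:
        return [{"path": path, "exports": ["Button"], "functions": ["renderIcon"]}]

def build_codemaps(extractor: SymbolExtractor, paths: list[str]) -> list[dict]:
    # Flatten per-file symbol records into one codemap payload.
    return [s for p in paths for s in extractor.extract_symbols(p)]

maps = build_codemaps(StubExtractor(), ["src/Button.tsx"])
```

If Kit disappoints during evaluation, only the adapter class changes; the `snapshot` schema and every caller stay untouched.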
Spend a day actually using Kit against the arthack monorepo. Benchmark probe latency, test tree-sitter coverage for Python + TypeScript, evaluate the dependency tree weight. Decide empirically before committing.
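A possible shape for that benchmark day — `run_probe` is a placeholder for whatever probe-equivalent call actually gets evaluated:

```python
# Minimal latency harness: time repeated probe-equivalent runs and
# report p50/p95 in milliseconds against the <200ms target.
import time

def run_probe() -> None:
    time.sleep(0.001)  # placeholder workload; swap in the real probe call

def benchmark(fn, runs: int = 20) -> dict[str, float]:
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)  # ms
    samples.sort()
    return {"p50": samples[len(samples) // 2],
            "p95": samples[int(len(samples) * 0.95)]}

stats = benchmark(run_probe)
print(f"p50={stats['p50']:.1f}ms p95={stats['p95']:.1f}ms")
```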
| Approach | Verdict |
|---|---|
| Kit as full foundation | Too much baggage. Framework-shaped, not library-shaped. |
| Build lean from scratch | Months of tree-sitter wiring per language. Not differentiating work. |
| RepoPrompt port | Swift/SwiftUI, wrong architecture, different goals. |
| Kit fork (trimmed) | Maintenance burden, drift from upstream. |
DirtyOverlay and session-aware mutation tracking are premature.
- Fingerprint-based snapshot reuse (git HEAD + manifest hashes)
- Content-addressed symbol caching (Kit gives this for free)
- Re-probe on demand or obvious invalidators (manifest change, scope change, explicit refresh)
- Tracking every Write/Edit event
- Session dirty overlay
- Hook-driven per-file refresh
- "Live" mid-session updates
- The critical path for contextctl is the cheap `probe`, not perfectly fresh codemaps
- No evidence yet that mid-session refresh materially changes outcomes
- DirtyOverlay is the kind of clever subsystem that eats weeks and creates stale-state bugs
- Ship with cheap re-probe + cached snapshot. If snapshot freshness becomes a proven bottleneck, add incremental invalidation then
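Fingerprint-based reuse can be as simple as hashing git HEAD together with manifest contents. A sketch with assumed field names:

```python
# Snapshot cache key: git HEAD sha plus an order-independent digest
# over manifest file contents. Reuse the cached snapshot while the
# fingerprint is unchanged; re-probe when it moves.
import hashlib

def manifest_digest(contents: dict[str, bytes]) -> str:
    h = hashlib.sha256()
    for name in sorted(contents):      # sorted → order-independent
        h.update(name.encode())
        h.update(contents[name])
    return h.hexdigest()[:12]

def fingerprint(head_sha: str, contents: dict[str, bytes]) -> str:
    return f"{head_sha[:7]}-{manifest_digest(contents)}"

fp = fingerprint("abc1234def", {"package.json": b'{"name": "viewctl"}'})
```

This covers the obvious invalidators (new commit, manifest change) without any event tracking; an explicit refresh flag handles the rest.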
Three commands. Everything else is deferred.
`codectl probe` — fast (<200ms), deterministic, machine-readable. The contextctl fast path and the main debugging surface.
```yaml
fingerprint:
  head_sha: abc123
  manifest_digest: def456
scope:
  root: ~/code/arthack
  primary: apps/viewctl
  confidence: 0.85
  reasons: [cwd, nearest_manifest]
  related: [apps/cli_common]
atoms:
  - framework:nextjs
  - lang:typescript
  - pkgmgr:pnpm
  - tool:turbo
commands:
  build: pnpm --filter viewctl build
  dev: pnpm --filter viewctl dev
  test: pnpm --filter viewctl test
```

`codectl snapshot` — full structured repo context for one scope. The core artifact. Cached by fingerprint.
```yaml
fingerprint: { ... }
scope: { ... }
atoms: [ ... ]
tree:
  - path: apps/viewctl/src
    kind: dir
    importance: 0.91
  - path: apps/viewctl/src/app/page.tsx
    kind: file
    importance: 0.85
symbols:
  - path: apps/viewctl/src/components/Button.tsx
    exports: [Button, ButtonProps]
    functions: [renderIcon]
    lines: 142
    relevance: 0.78  # call-graph distance from active scope
deps:
  - from: apps/viewctl
    to: apps/cli_common
    type: import
commands: { ... }
git:
  branch: main
  recent_commits: [...]
  changed_files: [...]
```

`tree` and `codemap` are NOT top-level commands — they're views on snapshot via flags (`codectl snapshot --tree-only`, `codectl snapshot --codemaps-only`). Separate commands fragment the product before the core contract is stable.
`codectl context` — human-facing bundle export built from snapshot. Proves standalone value without changing the core.
```shell
codectl context --files src/auth/ src/types/ --budget 32k > bundle.md
codectl context "implement auth middleware"   # v2: AI discovery
```

Output: assembled markdown prompt (tree + codemaps + file contents + git context + instructions). Includes budget visibility — shows where tokens went.
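Budget-aware assembly might look like the following sketch — priority-ordered sections, a crude chars/4 token estimate (an assumption; a real tokenizer would replace it), and a trailing breakdown showing where tokens went:

```python
# Sketch of bundle assembly: append sections in priority order until
# the budget runs out, then report the spend per section.

def assemble_bundle(sections: list[tuple[str, str]], budget: int) -> str:
    used, parts, spent = 0, [], {}
    for name, body in sections:
        cost = max(1, len(body) // 4)   # crude token estimate
        if used + cost > budget:
            continue                    # drop lower-priority sections that don't fit
        parts.append(f"## {name}\n\n{body}")
        used += cost
        spent[name] = cost
    breakdown = ", ".join(f"{k}: {v}" for k, v in spent.items())
    parts.append(f"<!-- budget: {used}/{budget} tokens ({breakdown}) -->")
    return "\n\n".join(parts)

bundle = assemble_bundle(
    [("tree", "apps/viewctl/src"), ("codemaps", "Button(props)"), ("files", "x" * 4000)],
    budget=500,
)
```

Here the oversized `files` section is dropped rather than truncated; whether to slice instead is exactly the kind of decision the budget breakdown makes visible.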
The hardest practical problem. A 30+ app monorepo can't be fully scanned every time.
- Shallow root overlay: turbo.json, pnpm-workspace.yaml, root configs — always scanned, cheap
- Deep active scope: Full codemaps, dep analysis for the app being worked on — expensive, cached
- Shallow related scopes: Just manifests + export signatures for sibling packages imported by the active scope
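The three layers reduce to a simple scan plan; the names for the depth levels are assumptions:

```python
# Layered monorepo scan: the root overlay is always cheap, only the
# active scope pays for deep analysis, and related packages get
# manifest-level treatment.

def plan_scan(active: str, related: list[str]) -> list[tuple[str, str]]:
    plan = [("root", "shallow")]          # turbo.json, workspace config
    plan.append((active, "deep"))         # full codemaps + dep analysis
    plan.extend((pkg, "manifest-only") for pkg in related)
    return plan

plan = plan_scan("apps/viewctl", related=["apps/cli_common"])
```

A 30-app monorepo then costs one deep scan plus a handful of manifest reads per invocation, instead of thirty deep scans.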
In priority order:
- Explicit path argument (highest confidence)
- cwd relative to repo root
- Files mentioned in prompt (contextctl passes this)
- Recently touched files (from hook events)
- git diff paths
When confidence < threshold, codectl emits multi-scope with confidence scores and lets the consumer decide. Never silently pick the wrong scope. The --explain flag shows confidence, reasons, and scan decisions for debugging.
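One way to implement that priority order is max-over-weighted-signals with a threshold fallback; the weights and threshold below are illustrative, not spec values:

```python
# Scope resolution sketch: each signal votes for a scope with a fixed
# weight; below the confidence threshold, emit multi-scope and let the
# consumer decide rather than silently picking wrong.
SIGNAL_WEIGHTS = {
    "explicit_path": 0.95,
    "cwd": 0.6,
    "prompt_mention": 0.5,
    "recent_files": 0.3,
    "git_diff": 0.2,
}

def resolve_scope(signals: dict[str, str], threshold: float = 0.7):
    scores: dict[str, float] = {}
    for signal, scope in signals.items():
        w = SIGNAL_WEIGHTS.get(signal, 0.0)
        scores[scope] = max(scores.get(scope, 0.0), w)
    ranked = sorted(scores.items(), key=lambda kv: -kv[1])
    best_scope, confidence = ranked[0]
    if confidence < threshold:
        return {"multi_scope": ranked}   # consumer decides
    return {"scope": best_scope, "confidence": confidence}

result = resolve_scope({"cwd": "apps/viewctl", "git_diff": "apps/cli_common"})
# cwd alone scores 0.6 < 0.7, so this emits multi-scope with both candidates
```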
```
┌────────────────────────────────────────────────┐
│ contextctl                                     │
│ Snippet Registry → Selection Engine → Composer │
│         ↑                      ↑               │
│ Provider Adapters    codectl probe + snapshot  │
└────────────────────────────────────────────────┘
        ↑                          ↓
   Hook events              additionalContext
   (stdin JSON)             (stdout markdown)
```
The purity boundary: codectl knows nothing about snippets, scoring, session state, or hooks. It's a pure function: (repo path, scope hint) → structured context. contextctl consumes its output.
| Use Case | Who Calls codectl | How |
|---|---|---|
| Kickstart agent session | Human | codectl context "task" \| pbcopy |
| Feed snippet scoring | contextctl hook | codectl probe --json → atoms for trigger matching |
| Inject repo awareness | contextctl hook | codectl snapshot --scope apps/viewctl → context map |
| Ad-hoc exploration | Human/agent | codectl snapshot --codemaps-only src/auth/ |
| Pipe to external model | Human | codectl context --budget 60k > context.md |
contextctl's vision is a context compiler that scores and injects the right snippets at the right time. For that to work well, it needs one hard missing input: a deterministic understanding of the current repo and current scope. codectl provides that.
But the deeper foundation is not a CLI — it's:
- A stable `probe` schema
- A stable `snapshot` schema
- A clear purity boundary
If we get that right, codectl becomes the right first implementation of repo awareness for contextctl. If we get seduced into building prompt assembly, AI discovery, MCP, and incremental live-refresh all at once, we delay the thing contextctl actually needs.
| Tool | Approach | Key Insight |
|---|---|---|
| RepoPrompt | Pre-computed context bundles via GUI + MCP | "Context over convenience" — front-load discovery |
| Aider | PageRank over AST call graphs, auto-included per prompt | Relevance-weighted codemaps, not just flat signatures |
| Sourcegraph Cody | Semantic code search + code graph at enterprise scale | @-mentions for pulling specific context |
| CatCoder | 1-order dependency type context via language servers | Auto-codemap for dependencies beats naive retrieval |
| Moderne Prethink | Pre-resolved data tables from static analysis, versioned | Context that lives alongside code |
| Kit (cased/kit) | 16-language tree-sitter + incremental caching + dep analysis | Library-shaped extraction engine (our proposed backend) |
Proactive, session-aware context injection via hooks. RepoPrompt's MCP gives agents tools to pull context. Aider's repo-map is automatic but static per prompt. Nobody is pushing dynamically scored context into the agent's awareness as the conversation unfolds. That's contextctl's differentiator, and codectl's atoms + snapshots are what make the scoring possible.
The field is splitting into pre-computed context (RepoPrompt, Prethink) vs. runtime retrieval (Aider, Claude Code's tools, Cody). We're building the hybrid — pre-computed structure (codemaps, atoms) delivered at runtime (via contextctl hooks). That's the right bet.
| Gap | Why It Matters | Priority |
|---|---|---|
| Versioned output schema | contextctl depends on stable probe and snapshot contracts. Define required/optional fields, compatibility rules, failure modes. | Must-have before contextctl integration |
| Safety policy | Generated files, vendored code, secrets, .env, lockfiles, minified bundles — what gets excluded or redacted? | Must-have |
| Scope confidence + fallback | What happens at 0.45 confidence? Multi-scope? Repo-wide shallow? | Must-have |
| Language support policy | v1: Python, TypeScript/JavaScript. Define unsupported behavior explicitly. Add more by request. | Must-have |
| JSON-first machine output | YAML for humans, JSON for contextctl. YAML-only is wrong for an integration boundary. | Should-have |
| Success criteria | p50/p95 latency targets, scope accuracy targets, cache hit rates | Should-have |
| Debuggability | --explain mode showing confidence, reasons, scan decisions | Should-have |
| Cache invalidation edge cases | Uncommitted edits, stash, worktrees | Defer to v2 |
Q1: Kit evaluation — should we spend a day benchmarking Kit on the arthack monorepo before committing? (Codex and the spec both recommend yes.)
Q2: v1 language scope — Python + TypeScript only, or also Go and Rust? The monorepo is primarily Python + TypeScript. Adding Go/Rust multiplies tree-sitter work.
Q3: Python or something else? Kit is Python. contextctl is Python. The arthack ecosystem is Python. But for a CLI that needs to start in <200ms, Python's startup overhead matters. Should the probe path be a compiled binary (Rust/Go) while the full snapshot stays Python?
Q4: How smart should AI discovery be in v2? Feed tree + codemaps to an LLM and ask it to select files? Or build a full agentic loop like RepoPrompt's Context Builder?
Q5: Should codectl live in this repo or in the arthack monorepo? It's currently standalone, but contextctl will live in the monorepo. Keeping them close reduces integration friction.
Q6: MCP server — ever? Kit has one. RepoPrompt's MCP is their main integration. Worth planning for even if deferred?
Q7: What's the first real test? Use codectl to generate a context bundle for a real task, compare agent performance with vs. without. What repo/task would be the best benchmark?
Q8: Is incremental context actually valuable? Nobody has proven that mid-session context updates improve agent performance. Should we design an experiment before building the infrastructure?
AQ1: What is the minimum output contextctl actually needs to outperform keyword-only snippet matching? If atoms alone (without the full snapshot) give contextctl 80% of its scoring power, the snapshot becomes optional for the integration path.
AQ2: How do we benchmark scope detection accuracy? Proposed: record the human's actual working directory + files touched per session, then compare codectl's scope prediction against ground truth. What's the right accuracy target?
AQ3: What atom vocabulary covers the arthack monorepo? Enumerate all the atoms codectl would emit for the arthack monorepo today. This tests whether the atom design is expressive enough before building the detection logic.
AQ4: How does Aider's PageRank-over-AST compare to flat codemaps in token efficiency and model performance? Worth a literature review or empirical test.
Step 1: Kit evaluation sprint (1 day)
Install Kit, run it against the arthack monorepo. Benchmark probe-equivalent latency, check tree-sitter coverage for Python + TypeScript, evaluate dependency weight. This answers the approach question empirically.
Step 2: Fast standalone MVP (1-2 weeks)
Ship `probe`, `snapshot`, `context`. Define the output schemas. Prove codemaps and repo profiles work on real code. Don't wire to contextctl yet — just prove the extraction value.
Step 3: contextctl integration (after MVP works)
Wire probe to contextctl's atom-based snippet triggers. Wire snapshot to a repo-awareness snippet. Test the full loop: hook fires → codectl scans → contextctl scores → advice injected.
Path A: contextctl-first — Build codectl directly as a contextctl component. Start with probe (atoms) and snapshot (context map). Test the full hook loop. Slower to standalone value, faster to the bigger goal.
Path B: Standalone-first — Focus entirely on the bundle generator use case. Ship a tool humans use to paste into ChatGPT/Claude. Defer contextctl integration. Faster to tangible value, slower to the bigger vision.