Carving out codectl's role across two systems — as a standalone repo context generator and as contextctl's codebase intelligence engine.
Sources: contextctl research report, repoprompt reverse engineering bible, knowctl repoprompt topic (40 docs), Codex architecture review, competitive landscape research (Aider, Kit, CatCoder, Sourcegraph Cody, Moderne Prethink).
RepoPrompt's central thesis is right: context over convenience. Models perform better when they receive curated, token-efficient context upfront than when they discover it through exploratory tool calls. But RepoPrompt delivers this as a separate discovery phase before implementation. We want to deliver it during implementation — injecting the right context at the right time as the session unfolds.
codectl sits at the intersection. It understands the codebase and expresses that understanding in forms that agents (and contextctl) can consume.
codectl is NOT "two co-equal jobs." It's one repo-intelligence core with two delivery modes:
- Machine adapter — `probe` and `snapshot` produce structured output for contextctl's scoring engine
- Human adapter — `context` assembles a prompt bundle for pasting into chat or piping to an agent
Prompt assembly is downstream of extraction. The core identity is: codectl is a deterministic repo profile engine. Everything else is presentation.
This matters because if the standalone bundle generator becomes a first-class peer, it will drag the architecture toward prompt UX, presets, pasteability, and formatting concerns too early. Keep the two-adapter framing for explaining value, but don't let it drive the implementation model.
| Feature | Why | RP Equivalent |
|---|---|---|
| Tree-sitter codemaps | 10-40x token reduction, the single most differentiating feature | get_code_structure |
| File tree generation | Foundation for all context, multiple views (full/selected/auto) | get_file_tree |
| .gitignore-aware filtering | Essential for tree + search | Built into file tree |
| File slicing | First-class concept, not just "read line ranges" — deliberate slice-based selection as a token-saving primitive alongside codemaps | read_file + manage_selection |
| Token counting + budget visibility | Show where tokens went: tree, codemaps, slices, files, diffs. Without this, bundle quality is hard to improve. | Per-file breakdown |
| Search | Path + content search is essential for manual refinement and future AI discovery | file_search |
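The budget-visibility feature above amounts to a per-section token ledger. A minimal sketch — the chars/4 heuristic and the section names are assumptions; a real implementation would use a proper tokenizer such as tiktoken:

```python
# Rough per-section token accounting for a context bundle.
# The 4-chars-per-token estimate is a crude assumption, not a real tokenizer.

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def budget_report(sections: dict[str, str], budget: int) -> dict[str, int]:
    """Return token counts per section plus totals, so the bundle
    can show where the budget went."""
    report = {name: estimate_tokens(body) for name, body in sections.items()}
    report["total"] = sum(report.values())
    report["remaining"] = budget - report["total"]
    return report

report = budget_report(
    {"tree": "apps/\n  viewctl/\n", "codemaps": "def render() ...", "files": "x" * 400},
    budget=32_000,
)
```

Without this breakdown, a bundle that blows its budget gives no hint whether the tree, the codemaps, or the raw files are to blame.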
| Feature | Why |
|---|---|
| Non-opinionated handoff output | Facts, relationships, and open questions — not a pre-solved plan. RepoPrompt's builder is valuable because it produces neutral discovery, not pre-biased solutions. |
| Auto-codemap for dependencies | Select a file in Full mode → its imports get codemaps automatically. Validated by CatCoder research: 1-order dependency type context beats naive retrieval. |
| Git-aware context | Diffs and recent changes are often more valuable than more file tree |
| Multiple tree views | Selected-only tree is frequently better than full tree (saves tokens, focuses attention) |
| Multi-root workspaces | Monorepo support |
| Relevance-weighted codemaps | Aider's insight: rank symbols by call-graph distance from active code, not just flat signature lists |
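Aider-style relevance weighting can be illustrated with a toy personalized PageRank over the import graph: rank mass flows out from the active file, so nearby symbols outrank distant ones. The graph, damping factor, and names below are illustrative assumptions, not Aider's actual implementation:

```python
# Toy personalized PageRank over an import graph. Teleportation goes
# only to the "seed" (active) files, so reachable files rank higher
# than unrelated ones instead of getting a flat signature listing.

def pagerank(edges: dict[str, list[str]], seeds: set[str],
             damping: float = 0.85, iters: int = 50) -> dict[str, float]:
    nodes = set(edges) | {t for ts in edges.values() for t in ts}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    for _ in range(iters):
        nxt = {n: (1 - damping) * teleport[n] for n in nodes}
        for src, targets in edges.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    nxt[t] += share
        rank = nxt
    return rank

imports = {
    "page": ["button", "auth"],   # active file imports these
    "button": ["icons"],
    "unrelated": [],              # nothing links here
}
rank = pagerank(imports, seeds={"page"})
# Files reachable from the active file outrank unrelated ones.
```

A codemap weighted by these scores spends its token budget on the symbols the model is most likely to need next.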
| Feature | Why Defer |
|---|---|
| Context Builder (AI discovery) | High complexity, v2 feature. For v1, manual file selection + auto-codemap is sufficient. |
| MCP server | Not needed until other tools want to consume codectl |
| Apply/review editing | GUI concern, out of scope |
| Preset system | Nice polish, not core |
Hybrid with guardrails. Not the soft version.
- Custom fast path for `probe`: scope detection, atom normalization, command detection, output schema. Must be <200ms.
- Kit-backed heavy path for `snapshot`: codemaps, symbol extraction, dependency analysis, file-walk caching via cased/kit (MIT, Python, 16-language tree-sitter, incremental symbol caching, Rust-powered file walking).
- Do NOT use Kit's summarization (overlaps with contextctl)
- Do NOT use Kit's MCP server
- Do NOT use Kit as the product architecture
- Put it behind a narrow extraction adapter so it's swappable
- If Kit makes cold-start probe latency bad, the probe path stays fully custom forever
- Building multi-language tree-sitter extraction and incremental symbol caching from scratch is expensive, boring, and not differentiating
- Adopting Kit as the full foundation is too much baggage (numpy, fastapi, openai SDK, redis — most irrelevant to codectl)
- Hybrid gives leverage without surrendering the boundary
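The "narrow extraction adapter" guardrail might look like a small Protocol that codectl owns. A hypothetical sketch — the stub backend stands in for a Kit-backed implementation, since kit's real API is not pinned down here:

```python
# The contract codectl owns: callers talk to the Protocol, never to
# kit directly, so the heavy path stays swappable.
from typing import Protocol

class SymbolExtractor(Protocol):
    def extract_symbols(self, path: str) -> list[dict]: ...

class StubExtractor:
    """Stand-in backend so this sketch runs without kit installed.
    A KitExtractor would implement the same method by delegating to kit."""
    def extract_symbols(self, path: str) -> list[dict]:
        return [{"path": path, "exports": ["Button"], "functions": ["renderIcon"]}]

def build_codemaps(extractor: SymbolExtractor, paths: list[str]) -> list[dict]:
    # Flatten per-file symbol records into one codemap payload.
    return [s for p in paths for s in extractor.extract_symbols(p)]

maps = build_codemaps(StubExtractor(), ["src/Button.tsx"])
```

If Kit disappoints during evaluation, only the adapter class changes; the `snapshot` schema and every caller stay untouched.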
Spend a day actually using Kit against the arthack monorepo. Benchmark probe latency, test tree-sitter coverage for Python + TypeScript, evaluate the dependency tree weight. Decide empirically before committing.
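A possible shape for that benchmark day — `run_probe` is a placeholder for whatever probe-equivalent call actually gets evaluated:

```python
# Minimal latency harness: time repeated probe-equivalent runs and
# report p50/p95 in milliseconds against the <200ms target.
import time

def run_probe() -> None:
    time.sleep(0.001)  # placeholder workload; swap in the real probe call

def benchmark(fn, runs: int = 20) -> dict[str, float]:
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)  # ms
    samples.sort()
    return {"p50": samples[len(samples) // 2],
            "p95": samples[int(len(samples) * 0.95)]}

stats = benchmark(run_probe)
print(f"p50={stats['p50']:.1f}ms p95={stats['p95']:.1f}ms")
```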
| Approach | Verdict |
|---|---|
| Kit as full foundation | Too much baggage. Framework-shaped, not library-shaped. |
| Build lean from scratch | Months of tree-sitter wiring per language. Not differentiating work. |
| RepoPrompt port | Swift/SwiftUI, wrong architecture, different goals. |
| Kit fork (trimmed) | Maintenance burden, drift from upstream. |
DirtyOverlay and session-aware mutation tracking are premature.
- Fingerprint-based snapshot reuse (git HEAD + manifest hashes)
- Content-addressed symbol caching (Kit gives this for free)
- Re-probe on demand or obvious invalidators (manifest change, scope change, explicit refresh)
- Tracking every Write/Edit event
- Session dirty overlay
- Hook-driven per-file refresh
- "Live" mid-session updates
- The critical path for contextctl is the cheap `probe`, not perfectly fresh codemaps
- No evidence yet that mid-session refresh materially changes outcomes
- DirtyOverlay is the kind of clever subsystem that eats weeks and creates stale-state bugs
- Ship with cheap re-probe + cached snapshot. If snapshot freshness becomes a proven bottleneck, add incremental invalidation then
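Fingerprint-based reuse can be as simple as hashing git HEAD together with manifest contents. A sketch with assumed field names:

```python
# Snapshot cache key: git HEAD sha plus an order-independent digest
# over manifest file contents. Reuse the cached snapshot while the
# fingerprint is unchanged; re-probe when it moves.
import hashlib

def manifest_digest(contents: dict[str, bytes]) -> str:
    h = hashlib.sha256()
    for name in sorted(contents):      # sorted → order-independent
        h.update(name.encode())
        h.update(contents[name])
    return h.hexdigest()[:12]

def fingerprint(head_sha: str, contents: dict[str, bytes]) -> str:
    return f"{head_sha[:7]}-{manifest_digest(contents)}"

fp = fingerprint("abc1234def", {"package.json": b'{"name": "viewctl"}'})
```

This covers the obvious invalidators (new commit, manifest change) without any event tracking; an explicit refresh flag handles the rest.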
Three commands. Everything else is deferred.
`codectl probe` — fast (<200ms), deterministic, machine-readable. The contextctl fast path and the main debugging surface.
```yaml
fingerprint:
  head_sha: abc123
  manifest_digest: def456
scope:
  root: ~/code/arthack
  primary: apps/viewctl
  confidence: 0.85
  reasons: [cwd, nearest_manifest]
  related: [apps/cli_common]
atoms:
  - framework:nextjs
  - lang:typescript
  - pkgmgr:pnpm
  - tool:turbo
commands:
  build: pnpm --filter viewctl build
  dev: pnpm --filter viewctl dev
  test: pnpm --filter viewctl test
```

`codectl snapshot` — full structured repo context for one scope. The core artifact. Cached by fingerprint.
```yaml
fingerprint: { ... }
scope: { ... }
atoms: [ ... ]
tree:
  - path: apps/viewctl/src
    kind: dir
    importance: 0.91
  - path: apps/viewctl/src/app/page.tsx
    kind: file
    importance: 0.85
symbols:
  - path: apps/viewctl/src/components/Button.tsx
    exports: [Button, ButtonProps]
    functions: [renderIcon]
    lines: 142
    relevance: 0.78  # call-graph distance from active scope
deps:
  - from: apps/viewctl
    to: apps/cli_common
    type: import
commands: { ... }
git:
  branch: main
  recent_commits: [...]
  changed_files: [...]
```

`tree` and `codemap` are NOT top-level commands — they're views on snapshot via flags (`codectl snapshot --tree-only`, `codectl snapshot --codemaps-only`). Separate commands fragment the product before the core contract is stable.
`codectl context` — human-facing bundle export built from snapshot. Proves standalone value without changing the core.
```shell
codectl context --files src/auth/ src/types/ --budget 32k > bundle.md
codectl context "implement auth middleware"   # v2: AI discovery
```

Output: assembled markdown prompt (tree + codemaps + file contents + git context + instructions). Includes budget visibility — shows where tokens went.
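Budget-aware assembly might look like the following sketch — priority-ordered sections, a crude chars/4 token estimate (an assumption; a real tokenizer would replace it), and a trailing breakdown showing where tokens went:

```python
# Sketch of bundle assembly: append sections in priority order until
# the budget runs out, then report the spend per section.

def assemble_bundle(sections: list[tuple[str, str]], budget: int) -> str:
    used, parts, spent = 0, [], {}
    for name, body in sections:
        cost = max(1, len(body) // 4)   # crude token estimate
        if used + cost > budget:
            continue                    # drop lower-priority sections that don't fit
        parts.append(f"## {name}\n\n{body}")
        used += cost
        spent[name] = cost
    breakdown = ", ".join(f"{k}: {v}" for k, v in spent.items())
    parts.append(f"<!-- budget: {used}/{budget} tokens ({breakdown}) -->")
    return "\n\n".join(parts)

bundle = assemble_bundle(
    [("tree", "apps/viewctl/src"), ("codemaps", "Button(props)"), ("files", "x" * 4000)],
    budget=500,
)
```

Here the oversized `files` section is dropped rather than truncated; whether to slice instead is exactly the kind of decision the budget breakdown makes visible.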
The hardest practical problem. A 30+ app monorepo can't be fully scanned every time.
- Shallow root overlay: turbo.json, pnpm-workspace.yaml, root configs — always scanned, cheap
- Deep active scope: Full codemaps, dep analysis for the app being worked on — expensive, cached
- Shallow related scopes: Just manifests + export signatures for sibling packages imported by the active scope
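The three layers reduce to a simple scan plan; the names for the depth levels are assumptions:

```python
# Layered monorepo scan: the root overlay is always cheap, only the
# active scope pays for deep analysis, and related packages get
# manifest-level treatment.

def plan_scan(active: str, related: list[str]) -> list[tuple[str, str]]:
    plan = [("root", "shallow")]          # turbo.json, workspace config
    plan.append((active, "deep"))         # full codemaps + dep analysis
    plan.extend((pkg, "manifest-only") for pkg in related)
    return plan

plan = plan_scan("apps/viewctl", related=["apps/cli_common"])
```

A 30-app monorepo then costs one deep scan plus a handful of manifest reads per invocation, instead of thirty deep scans.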
In priority order:
- Explicit path argument (highest confidence)
- cwd relative to repo root
- Files mentioned in prompt (contextctl passes this)
- Recently touched files (from hook events)
- git diff paths
When confidence < threshold, codectl emits multi-scope with confidence scores and lets the consumer decide. Never silently pick the wrong scope. The --explain flag shows confidence, reasons, and scan decisions for debugging.
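One way to implement that priority order is max-over-weighted-signals with a threshold fallback; the weights and threshold below are illustrative, not spec values:

```python
# Scope resolution sketch: each signal votes for a scope with a fixed
# weight; below the confidence threshold, emit multi-scope and let the
# consumer decide rather than silently picking wrong.
SIGNAL_WEIGHTS = {
    "explicit_path": 0.95,
    "cwd": 0.6,
    "prompt_mention": 0.5,
    "recent_files": 0.3,
    "git_diff": 0.2,
}

def resolve_scope(signals: dict[str, str], threshold: float = 0.7):
    scores: dict[str, float] = {}
    for signal, scope in signals.items():
        w = SIGNAL_WEIGHTS.get(signal, 0.0)
        scores[scope] = max(scores.get(scope, 0.0), w)
    ranked = sorted(scores.items(), key=lambda kv: -kv[1])
    best_scope, confidence = ranked[0]
    if confidence < threshold:
        return {"multi_scope": ranked}   # consumer decides
    return {"scope": best_scope, "confidence": confidence}

result = resolve_scope({"cwd": "apps/viewctl", "git_diff": "apps/cli_common"})
# cwd alone scores 0.6 < 0.7, so this emits multi-scope with both candidates
```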
```
┌────────────────────────────────────────────────┐
│ contextctl                                     │
│ Snippet Registry → Selection Engine → Composer │
│         ↑                      ↑               │
│ Provider Adapters    codectl probe + snapshot  │
└────────────────────────────────────────────────┘
        ↑                          ↓
   Hook events              additionalContext
   (stdin JSON)             (stdout markdown)
```
The purity boundary: codectl knows nothing about snippets, scoring, session state, or hooks. It's a pure function: (repo path, scope hint) → structured context. contextctl consumes its output.
| Use Case | Who Calls codectl | How |
|---|---|---|
| Kickstart agent session | Human | codectl context "task" \| pbcopy |
| Feed snippet scoring | contextctl hook | codectl probe --json → atoms for trigger matching |
| Inject repo awareness | contextctl hook | codectl snapshot --scope apps/viewctl → context map |
| Ad-hoc exploration | Human/agent | codectl snapshot --codemaps-only src/auth/ |
| Pipe to external model | Human | codectl context --budget 60k > context.md |
contextctl's vision is a context compiler that scores and injects the right snippets at the right time. For that to work well, it needs one hard missing input: a deterministic understanding of the current repo and current scope. codectl provides that.
But the deeper foundation is not a CLI — it's:
- A stable `probe` schema
- A stable `snapshot` schema
- A clear purity boundary
If we get that right, codectl becomes the right first implementation of repo awareness for contextctl. If we get seduced into building prompt assembly, AI discovery, MCP, and incremental live-refresh all at once, we delay the thing contextctl actually needs.
| Tool | Approach | Key Insight |
|---|---|---|
| RepoPrompt | Pre-computed context bundles via GUI + MCP | "Context over convenience" — front-load discovery |
| Aider | PageRank over AST call graphs, auto-included per prompt | Relevance-weighted codemaps, not just flat signatures |
| Sourcegraph Cody | Semantic code search + code graph at enterprise scale | @-mentions for pulling specific context |
| CatCoder | 1-order dependency type context via language servers | Auto-codemap for dependencies beats naive retrieval |
| Moderne Prethink | Pre-resolved data tables from static analysis, versioned | Context that lives alongside code |
| Kit (cased/kit) | 16-language tree-sitter + incremental caching + dep analysis | Library-shaped extraction engine (our proposed backend) |
Proactive, session-aware context injection via hooks. RepoPrompt's MCP gives agents tools to pull context. Aider's repo-map is automatic but static per prompt. Nobody is pushing dynamically scored context into the agent's awareness as the conversation unfolds. That's contextctl's differentiator, and codectl's atoms + snapshots are what make the scoring possible.
The field is splitting into pre-computed context (RepoPrompt, Prethink) vs. runtime retrieval (Aider, Claude Code's tools, Cody). We're building the hybrid — pre-computed structure (codemaps, atoms) delivered at runtime (via contextctl hooks). That's the right bet.
| Gap | Why It Matters | Priority |
|---|---|---|
| Versioned output schema | contextctl depends on stable probe and snapshot contracts. Define required/optional fields, compatibility rules, failure modes. | Must-have before contextctl integration |
| Safety policy | Generated files, vendored code, secrets, .env, lockfiles, minified bundles — what gets excluded or redacted? | Must-have |
| Scope confidence + fallback | What happens at 0.45 confidence? Multi-scope? Repo-wide shallow? | Must-have |
| Language support policy | v1: Python, TypeScript/JavaScript. Define unsupported behavior explicitly. Add more by request. | Must-have |
| JSON-first machine output | YAML for humans, JSON for contextctl. YAML-only is wrong for an integration boundary. | Should-have |
| Success criteria | p50/p95 latency targets, scope accuracy targets, cache hit rates | Should-have |
| Debuggability | --explain mode showing confidence, reasons, scan decisions | Should-have |
| Cache invalidation edge cases | Uncommitted edits, stash, worktrees | Defer to v2 |
Q1: Kit evaluation — should we spend a day benchmarking Kit on the arthack monorepo before committing? (Codex and the spec both recommend yes.)
Q2: v1 language scope — Python + TypeScript only, or also Go and Rust? The monorepo is primarily Python + TypeScript. Adding Go/Rust multiplies tree-sitter work.
Q3: Python or something else? Kit is Python. contextctl is Python. The arthack ecosystem is Python. But for a CLI that needs to start in <200ms, Python's startup overhead matters. Should the probe path be a compiled binary (Rust/Go) while the full snapshot stays Python?
Q4: How smart should AI discovery be in v2? Feed tree + codemaps to an LLM and ask it to select files? Or build a full agentic loop like RepoPrompt's Context Builder?
Q5: Should codectl live in this repo or in the arthack monorepo? It's currently standalone, but contextctl will live in the monorepo. Keeping them close reduces integration friction.
Q6: MCP server — ever? Kit has one. RepoPrompt's MCP is their main integration. Worth planning for even if deferred?
Q7: What's the first real test? Use codectl to generate a context bundle for a real task, compare agent performance with vs. without. What repo/task would be the best benchmark?
Q8: Is incremental context actually valuable? Nobody has proven that mid-session context updates improve agent performance. Should we design an experiment before building the infrastructure?
AQ1: What is the minimum output contextctl actually needs to outperform keyword-only snippet matching? If atoms alone (without the full snapshot) give contextctl 80% of its scoring power, the snapshot becomes optional for the integration path.
AQ2: How do we benchmark scope detection accuracy? Proposed: record the human's actual working directory + files touched per session, then compare codectl's scope prediction against ground truth. What's the right accuracy target?
AQ3: What atom vocabulary covers the arthack monorepo? Enumerate all the atoms codectl would emit for the arthack monorepo today. This tests whether the atom design is expressive enough before building the detection logic.
AQ4: How does Aider's PageRank-over-AST compare to flat codemaps in token efficiency and model performance? Worth a literature review or empirical test.
Step 1: Kit evaluation sprint (1 day)
Install Kit, run it against the arthack monorepo. Benchmark probe-equivalent latency, check tree-sitter coverage for Python + TypeScript, evaluate dependency weight. This answers the approach question empirically.
Step 2: Fast standalone MVP (1-2 weeks)
Ship `probe`, `snapshot`, `context`. Define the output schemas. Prove codemaps and repo profiles work on real code. Don't wire to contextctl yet — just prove the extraction value.
Step 3: contextctl integration (after MVP works)
Wire probe to contextctl's atom-based snippet triggers. Wire snapshot to a repo-awareness snippet. Test the full loop: hook fires → codectl scans → contextctl scores → advice injected.
Path A: contextctl-first — Build codectl directly as a contextctl component. Start with probe (atoms) and snapshot (context map). Test the full hook loop. Slower to standalone value, faster to the bigger goal.
Path B: Standalone-first — Focus entirely on the bundle generator use case. Ship a tool humans use to paste into ChatGPT/Claude. Defer contextctl integration. Faster to tangible value, slower to the bigger vision.