@possibilities
Created March 12, 2026 01:17
codectl: Specification & Design Document

Carving out codectl's role across two systems — as a standalone repo context generator and as contextctl's codebase intelligence engine.

Sources: contextctl research report, repoprompt reverse engineering bible, knowctl repoprompt topic (40 docs), Codex architecture review, competitive landscape research (Aider, Kit, CatCoder, Sourcegraph Cody, Moderne Prethink).


The Core Insight

RepoPrompt's central thesis is right: context over convenience. Models perform better when they receive curated, token-efficient context upfront than when they discover it through exploratory tool calls. But RepoPrompt delivers this as a separate discovery phase before implementation. We want to deliver it during implementation — injecting the right context at the right time as the session unfolds.

codectl sits at the intersection. It understands the codebase and expresses that understanding in forms that agents (and contextctl) can consume.


Identity: One Core, Two Adapters

codectl is NOT "two co-equal jobs." It's one repo-intelligence core with two delivery modes:

  1. Machine adapter: probe and snapshot produce structured output for contextctl's scoring engine
  2. Human adapter: context assembles a prompt bundle for pasting into chat or piping to an agent

Prompt assembly is downstream of extraction. The core identity is: codectl is a deterministic repo profile engine. Everything else is presentation.

This matters because if the standalone bundle generator becomes a first-class peer, it will drag the architecture toward prompt UX, presets, pasteability, and formatting concerns too early. Keep the two-adapter framing for explaining value, but don't let it drive the implementation model.


What to Extract from RepoPrompt

Must Clone (Tier 1 — Core Value)

| Feature | Why | RP Equivalent |
| --- | --- | --- |
| Tree-sitter codemaps | 10-40x token reduction, the single most differentiating feature | get_code_structure |
| File tree generation | Foundation for all context, multiple views (full/selected/auto) | get_file_tree |
| .gitignore-aware filtering | Essential for tree + search | Built into file tree |
| File slicing | First-class concept, not just "read line ranges" — deliberate slice-based selection as a token-saving primitive alongside codemaps | read_file + manage_selection |
| Token counting + budget visibility | Show where tokens went: tree, codemaps, slices, files, diffs. Without this, bundle quality is hard to improve. | Per-file breakdown |
| Search | Path + content search is essential for manual refinement and future AI discovery | file_search |

Should Clone (Tier 2 — High Value)

| Feature | Why |
| --- | --- |
| Non-opinionated handoff output | Facts, relationships, and open questions — not a pre-solved plan. RepoPrompt's builder is valuable because it produces neutral discovery, not pre-biased solutions. |
| Auto-codemap for dependencies | Select a file in Full mode → its imports get codemaps automatically. Validated by CatCoder research: 1-order dependency type context beats naive retrieval. |
| Git-aware context | Diffs and recent changes are often more valuable than more file tree |
| Multiple tree views | Selected-only tree is frequently better than full tree (saves tokens, focuses attention) |
| Multi-root workspaces | Monorepo support |
| Relevance-weighted codemaps | Aider's insight: rank symbols by call-graph distance from active code, not just flat signature lists |

Defer

| Feature | Why Defer |
| --- | --- |
| Context Builder (AI discovery) | High complexity, v2 feature. For v1, manual file selection + auto-codemap is sufficient. |
| MCP server | Not needed until other tools want to consume codectl |
| Apply/review editing | GUI concern, out of scope |
| Preset system | Nice polish, not core |

Implementation Approach: Strict Hybrid

The Decision

Hybrid with guardrails. Not the soft version.

  • Custom fast path for probe: scope detection, atom normalization, command detection, output schema. Must be <200ms.
  • Kit-backed heavy path for snapshot: codemaps, symbol extraction, dependency analysis, file-walk caching via cased/kit (MIT, Python, 16-language tree-sitter, incremental symbol caching, Rust-powered file walking).

Guardrails

  • Do NOT use Kit's summarization (overlaps with contextctl)
  • Do NOT use Kit's MCP server
  • Do NOT use Kit as the product architecture
  • Put it behind a narrow extraction adapter so it's swappable
  • If Kit makes cold-start probe latency bad, the probe path stays fully custom forever
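
The "narrow extraction adapter" guardrail can be sketched as a small protocol. All names here (`ExtractionBackend`, `NaiveBackend`, `Symbol`) are illustrative, not from the spec; a Kit-backed implementation would sit behind the same interface, and the core never imports Kit directly:

```python
# Sketch of the narrow extraction adapter: the codectl core depends only on
# this protocol, so the Kit-backed heavy path stays swappable.
from __future__ import annotations

import re
from dataclasses import dataclass
from pathlib import Path
from typing import Protocol


@dataclass(frozen=True)
class Symbol:
    path: str
    name: str
    kind: str  # "function" or "class" in this sketch


class ExtractionBackend(Protocol):
    """The only surface the codectl core may call into."""

    def symbols(self, root: Path) -> list[Symbol]: ...


class NaiveBackend:
    """Regex fallback for top-level Python definitions; a KitBackend would
    implement the same protocol on top of cased/kit's tree-sitter extraction."""

    _defn = re.compile(r"^(def|class)\s+(\w+)", re.MULTILINE)

    def symbols(self, root: Path) -> list[Symbol]:
        out: list[Symbol] = []
        for f in sorted(root.rglob("*.py")):
            for kind, name in self._defn.findall(f.read_text(encoding="utf-8")):
                out.append(Symbol(str(f.relative_to(root)), name,
                                  "function" if kind == "def" else "class"))
        return out
```

The design point is the seam, not the regex: if Kit's cold-start latency disappoints, only the backend class changes.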

Why Hybrid

  • Building multi-language tree-sitter extraction and incremental symbol caching from scratch is expensive, boring, and not differentiating
  • Adopting Kit as the full foundation is too much baggage (numpy, fastapi, openai SDK, redis — most irrelevant to codectl)
  • Hybrid gives leverage without surrendering the boundary

Pre-requisite

Spend a day actually using Kit against the arthack monorepo. Benchmark probe latency, test tree-sitter coverage for Python + TypeScript, evaluate the dependency tree weight. Decide empirically before committing.
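
The latency side of that evaluation needs only a tiny harness. A sketch (the <200ms budget is from this spec; the harness details are assumptions, and `fn` would wrap the probe-equivalent Kit call):

```python
# Minimal p50/p95 latency harness for the Kit evaluation sprint.
import statistics
import time


def bench(fn, repeats: int = 20) -> dict[str, float]:
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1000)  # ms
    cuts = statistics.quantiles(samples, n=20)  # 19 cut points: 5%..95%
    return {"p50_ms": statistics.median(samples), "p95_ms": cuts[18]}
```

Pass/fail is then a one-liner against the spec budget, e.g. `bench(probe_equivalent)["p95_ms"] < 200`.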

Why Not the Other Options

| Approach | Verdict |
| --- | --- |
| Kit as full foundation | Too much baggage. Framework-shaped, not library-shaped. |
| Build lean from scratch | Months of tree-sitter wiring per language. Not differentiating work. |
| RepoPrompt port | Swift/SwiftUI, wrong architecture, different goals. |
| Kit fork (trimmed) | Maintenance burden, drift from upstream. |

Incremental Context: Not in v1

DirtyOverlay and session-aware mutation tracking are premature.

What's worth doing (v1)

  • Fingerprint-based snapshot reuse (git HEAD + manifest hashes)
  • Content-addressed symbol caching (Kit gives this for free)
  • Re-probe on demand or obvious invalidators (manifest change, scope change, explicit refresh)
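
The fingerprint itself can be sketched as git HEAD plus a digest over workspace manifests. A minimal sketch; the `MANIFESTS` list is an illustrative assumption, not a spec'd set:

```python
# Sketch: snapshot cache key = git HEAD + digest of workspace manifests.
import hashlib
import subprocess
from pathlib import Path

MANIFESTS = ("package.json", "pyproject.toml", "pnpm-workspace.yaml", "turbo.json")


def manifest_digest(root: Path) -> str:
    digest = hashlib.sha256()
    for m in sorted(p for name in MANIFESTS for p in root.rglob(name)):
        digest.update(str(m.relative_to(root)).encode())  # path matters too
        digest.update(m.read_bytes())
    return digest.hexdigest()


def fingerprint(root: Path) -> dict[str, str]:
    head = subprocess.run(
        ["git", "-C", str(root), "rev-parse", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return {"head_sha": head, "manifest_digest": manifest_digest(root)}
```

A cached snapshot is reused only when both fields match; any commit or manifest edit changes the key and forces a re-scan, with no mutation tracking required.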

What's deferred (v2+, only if proven needed)

  • Tracking every Write/Edit event
  • Session dirty overlay
  • Hook-driven per-file refresh
  • "Live" mid-session updates

Why defer

  • The critical path for contextctl is the cheap probe, not perfectly fresh codemaps
  • No evidence yet that mid-session refresh materially changes outcomes
  • DirtyOverlay is the kind of clever subsystem that eats weeks and creates stale-state bugs
  • Ship with cheap re-probe + cached snapshot. If snapshot freshness becomes a proven bottleneck, add incremental invalidation then

v1 Commands

Three commands. Everything else is deferred.

codectl probe

Fast (<200ms), deterministic, machine-readable. The contextctl fast path and the main debugging surface.

```yaml
fingerprint:
  head_sha: abc123
  manifest_digest: def456
scope:
  root: ~/code/arthack
  primary: apps/viewctl
  confidence: 0.85
  reasons: [cwd, nearest_manifest]
  related: [apps/cli_common]
atoms:
  - framework:nextjs
  - lang:typescript
  - pkgmgr:pnpm
  - tool:turbo
commands:
  build: pnpm --filter viewctl build
  dev: pnpm --filter viewctl dev
  test: pnpm --filter viewctl test
```
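
On the consuming side, contextctl's atom-based trigger matching could be as small as a set comparison. A sketch that assumes `codectl probe --json` mirrors the fields of the YAML example; the trigger shape is illustrative:

```python
# Sketch: match snippet triggers against probe atoms.
import json


def matching_triggers(probe_json: str, triggers: dict[str, set[str]]) -> list[str]:
    """Return trigger names whose required atoms are all present in the probe."""
    atoms = set(json.loads(probe_json)["atoms"])
    return [name for name, required in triggers.items() if required <= atoms]
```

For the example probe above, a trigger requiring `{"framework:nextjs"}` fires and one requiring `{"lang:rust"}` does not.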

codectl snapshot

Full structured repo context for one scope. The core artifact. Cached by fingerprint.

```yaml
fingerprint: { ... }
scope: { ... }
atoms: [ ... ]
tree:
  - path: apps/viewctl/src
    kind: dir
    importance: 0.91
  - path: apps/viewctl/src/app/page.tsx
    kind: file
    importance: 0.85
symbols:
  - path: apps/viewctl/src/components/Button.tsx
    exports: [Button, ButtonProps]
    functions: [renderIcon]
    lines: 142
    relevance: 0.78  # call-graph distance from active scope
deps:
  - from: apps/viewctl
    to: apps/cli_common
    type: import
commands: { ... }
git:
  branch: main
  recent_commits: [...]
  changed_files: [...]
```

tree and codemap are NOT top-level commands — they're views on snapshot via flags (codectl snapshot --tree-only, codectl snapshot --codemaps-only). Separate commands fragment the product before the core contract is stable.

codectl context

Human-facing bundle export built from snapshot. Proves standalone value without changing the core.

```sh
codectl context --files src/auth/ src/types/ --budget 32k > bundle.md
codectl context "implement auth middleware"  # v2: AI discovery
```

Output: assembled markdown prompt (tree + codemaps + file contents + git context + instructions). Includes budget visibility — shows where tokens went.
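
Budget visibility could start as a simple per-section report. A sketch; the 4-characters-per-token estimate is a crude stand-in for a real tokenizer, and the section names are illustrative:

```python
# Sketch of "show where tokens went": rough per-section token estimates.
def budget_report(sections: dict[str, str], budget: int) -> str:
    est = {name: max(1, len(text) // 4) for name, text in sections.items()}
    total = sum(est.values())
    lines = [f"{name:<10} ~{n:>6} tokens" for name, n in est.items()]
    lines.append(f"{'total':<10} ~{total:>6} / {budget} budget")
    return "\n".join(lines)
```

Even this crude breakdown answers the key question the spec raises: which of tree, codemaps, slices, and diffs is eating the budget.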


Monorepo Scoping

The hardest practical problem. A 30+ app monorepo can't be fully scanned every time.

Three Scanning Bands

  1. Shallow root overlay: turbo.json, pnpm-workspace.yaml, root configs — always scanned, cheap
  2. Deep active scope: Full codemaps, dep analysis for the app being worked on — expensive, cached
  3. Shallow related scopes: Just manifests + export signatures for sibling packages imported by the active scope
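
The band decision can be sketched as a pure classifier. Illustrative only: it assumes packages live two path components deep (as in apps/viewctl), which is a convention of this sketch, not the spec:

```python
# Sketch: classify a repo-relative path into a scanning band,
# given the active scope and its related sibling scopes.
from pathlib import PurePosixPath


def scan_band(path: str, active: str, related: set[str]) -> str:
    p = PurePosixPath(path)
    if len(p.parts) == 1:
        return "shallow-root"     # root configs: turbo.json, pnpm-workspace.yaml
    pkg = str(PurePosixPath(*p.parts[:2]))
    if pkg == active:
        return "deep-active"      # full codemaps + dep analysis, cached
    if pkg in related:
        return "shallow-related"  # manifests + export signatures only
    return "skip"                 # the other 29 apps are never touched
```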

Scope Detection Signals

In priority order:

  1. Explicit path argument (highest confidence)
  2. cwd relative to repo root
  3. Files mentioned in prompt (contextctl passes this)
  4. Recently touched files (from hook events)
  5. git diff paths
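
In code, the priority ordering might look like the following sketch. The confidence values are placeholders, only the first three signals are shown, and the "two components = package dir" convention is an assumption:

```python
# Sketch of priority-ordered scope detection; confidences are illustrative.
from pathlib import Path
from typing import Optional


def detect_scope(repo_root: Path,
                 explicit: Optional[str] = None,
                 cwd: Optional[Path] = None,
                 prompt_files: tuple[str, ...] = (),
                 ) -> tuple[Optional[str], float, list[str]]:
    """Return (scope, confidence, reasons) from the highest-priority signal."""
    if explicit:
        return explicit, 0.99, ["explicit_path"]
    if cwd and repo_root in cwd.parents:
        # e.g. cwd=/repo/apps/viewctl under root=/repo -> scope apps/viewctl
        return cwd.relative_to(repo_root).as_posix(), 0.85, ["cwd"]
    if prompt_files:
        parts = Path(prompt_files[0]).parts
        if len(parts) >= 2:  # package dir of the first mentioned file
            return "/".join(parts[:2]), 0.60, ["prompt_files"]
    return None, 0.0, []  # fall through: emit multi-scope, let the consumer decide
```

Returning the `reasons` list alongside the score is what makes the `--explain` flag cheap to implement later.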

Scope Ambiguity

When confidence < threshold, codectl emits multi-scope with confidence scores and lets the consumer decide. Never silently pick the wrong scope. The --explain flag shows confidence, reasons, and scan decisions for debugging.


Relationship to contextctl

```
┌───────────────────────────────────────────────────┐
│                    contextctl                     │
│  Snippet Registry → Selection Engine → Composer   │
│        ↑                     ↑                    │
│  Provider Adapters    codectl probe + snapshot    │
└───────────────────────────────────────────────────┘
         ↑                     ↓
    Hook events          additionalContext
    (stdin JSON)         (stdout markdown)
```

The purity boundary: codectl knows nothing about snippets, scoring, session state, or hooks. It's a pure function: (repo path, scope hint) → structured context. contextctl consumes its output.

How They Connect

| Use Case | Who Calls codectl | How |
| --- | --- | --- |
| Kickstart agent session | Human | codectl context "task" \| pbcopy |
| Feed snippet scoring | contextctl hook | codectl probe --json → atoms for trigger matching |
| Inject repo awareness | contextctl hook | codectl snapshot --scope apps/viewctl → context map |
| Ad-hoc exploration | Human/agent | codectl snapshot --codemaps-only src/auth/ |
| Pipe to external model | Human | codectl context --budget 60k > context.md |

Why codectl Is the Right Foundation

contextctl's vision is a context compiler that scores and injects the right snippets at the right time. For that to work well, it needs one critical input it currently lacks: a deterministic understanding of the current repo and current scope. codectl provides that.

But the deeper foundation is not a CLI — it's:

  • A stable probe schema
  • A stable snapshot schema
  • A clear purity boundary

If we get that right, codectl becomes the right first implementation of repo awareness for contextctl. If we get seduced into building prompt assembly, AI discovery, MCP, and incremental live-refresh all at once, we delay the thing contextctl actually needs.


Competitive Landscape

What Others Do

| Tool | Approach | Key Insight |
| --- | --- | --- |
| RepoPrompt | Pre-computed context bundles via GUI + MCP | "Context over convenience" — front-load discovery |
| Aider | PageRank over AST call graphs, auto-included per prompt | Relevance-weighted codemaps, not just flat signatures |
| Sourcegraph Cody | Semantic code search + code graph at enterprise scale | @-mentions for pulling specific context |
| CatCoder | 1-order dependency type context via language servers | Auto-codemap for dependencies beats naive retrieval |
| Moderne Prethink | Pre-resolved data tables from static analysis, versioned | Context that lives alongside code |
| Kit (cased/kit) | 16-language tree-sitter + incremental caching + dep analysis | Library-shaped extraction engine (our proposed backend) |

What Nobody Else Is Doing

Proactive, session-aware context injection via hooks. RepoPrompt's MCP gives agents tools to pull context. Aider's repo-map is automatic but static per prompt. Nobody is pushing dynamically scored context into the agent's awareness as the conversation unfolds. That's contextctl's differentiator, and codectl's atoms + snapshots are what make the scoring possible.

The Two Camps

The field is splitting into pre-computed context (RepoPrompt, Prethink) vs. runtime retrieval (Aider, Claude Code's tools, Cody). We're building the hybrid — pre-computed structure (codemaps, atoms) delivered at runtime (via contextctl hooks). That's the right bet.


Spec Gaps to Close

| Gap | Why It Matters | Priority |
| --- | --- | --- |
| Versioned output schema | contextctl depends on stable probe and snapshot contracts. Define required/optional fields, compatibility rules, failure modes. | Must-have before contextctl integration |
| Safety policy | Generated files, vendored code, secrets, .env, lockfiles, minified bundles — what gets excluded or redacted? | Must-have |
| Scope confidence + fallback | What happens at 0.45 confidence? Multi-scope? Repo-wide shallow? | Must-have |
| Language support policy | v1: Python, TypeScript/JavaScript. Define unsupported behavior explicitly. Add more by request. | Must-have |
| JSON-first machine output | YAML for humans, JSON for contextctl. YAML-only is wrong for an integration boundary. | Should-have |
| Success criteria | p50/p95 latency targets, scope accuracy targets, cache hit rates | Should-have |
| Debuggability | --explain mode showing confidence, reasons, scan decisions | Should-have |
| Cache invalidation edge cases | Uncommitted edits, stash, worktrees | Defer to v2 |

Open Questions for Humans

Architecture

Q1: Kit evaluation — should we spend a day benchmarking Kit on the arthack monorepo before committing? (Codex and the spec both recommend yes.)

Q2: v1 language scope — Python + TypeScript only, or also Go and Rust? The monorepo is primarily Python + TypeScript. Adding Go/Rust multiplies tree-sitter work.

Q3: Python or something else? Kit is Python. contextctl is Python. The arthack ecosystem is Python. But for a CLI that needs to start in <200ms, Python's startup overhead matters. Should the probe path be a compiled binary (Rust/Go) while the full snapshot stays Python?

Scope

Q4: How smart should AI discovery be in v2? Feed tree + codemaps to an LLM and ask it to select files? Or build a full agentic loop like RepoPrompt's Context Builder?

Q5: Should codectl live in this repo or in the arthack monorepo? It's currently standalone, but contextctl will live in the monorepo. Keeping them close reduces integration friction.

Q6: MCP server — ever? Kit has one. RepoPrompt's MCP is their main integration. Worth planning for even if deferred?

Validation

Q7: What's the first real test? Use codectl to generate a context bundle for a real task, compare agent performance with vs. without. What repo/task would be the best benchmark?

Q8: Is incremental context actually valuable? Nobody has proven that mid-session context updates improve agent performance. Should we design an experiment before building the infrastructure?


Open Questions for Agents

AQ1: What is the minimum output contextctl actually needs to outperform keyword-only snippet matching? If atoms alone (without the full snapshot) give contextctl 80% of its scoring power, the snapshot becomes optional for the integration path.

AQ2: How do we benchmark scope detection accuracy? Proposed: record the human's actual working directory + files touched per session, then compare codectl's scope prediction against ground truth. What's the right accuracy target?

AQ3: What atom vocabulary covers the arthack monorepo? Enumerate all the atoms codectl would emit for the arthack monorepo today. This tests whether the atom design is expressive enough before building the detection logic.

AQ4: How does Aider's PageRank-over-AST compare to flat codemaps in token efficiency and model performance? Worth a literature review or empirical test.


Next Steps

Recommended Path

Step 1: Kit evaluation sprint (1 day) Install Kit, run it against the arthack monorepo. Benchmark probe-equivalent latency, check tree-sitter coverage for Python + TypeScript, evaluate dependency weight. This answers the approach question empirically.

Step 2: Fast standalone MVP (1-2 weeks) Ship probe, snapshot, context. Define the output schemas. Prove codemaps and repo profiles work on real code. Don't wire to contextctl yet — just prove the extraction value.

Step 3: contextctl integration (after MVP works) Wire probe to contextctl's atom-based snippet triggers. Wire snapshot to a repo-awareness snippet. Test the full loop: hook fires → codectl scans → contextctl scores → advice injected.

Alternative Paths

Path A: contextctl-first — Build codectl directly as a contextctl component. Start with probe (atoms) and snapshot (context map). Test the full hook loop. Slower to standalone value, faster to the bigger goal.

Path B: Standalone-first — Focus entirely on the bundle generator use case. Ship a tool humans use to paste into ChatGPT/Claude. Defer contextctl integration. Faster to tangible value, slower to the bigger vision.
