Generate a production‑grade Rust workspace that implements an autonomous, multi‑agent Test‑Driven Development machine for code katas. The tool must run locally as a CLI, orchestrate three agents (tester, implementor, refactorer), and follow a strict red‑green‑refactor loop for a configurable number of steps. It must store state in git and allow each agent to read the last commit message, the last git diff, and the entire working tree.
- Create a Rust workspace with clean boundaries, strong types, and testable modules.
- Implement an orchestrator that cycles over agents: tester → implementor → refactorer → tester → … for N steps.
- Each agent must run tests and compile checks, and must be able to edit the codebase across multiple files and modules.
- Persist progress via conventional commits. Commit messages must include all context needed by the next agent.
- Consume a kata description from a Markdown file. Agents should align their actions to that document.
- Support pluggable LLMs through OpenAI‑compatible APIs for each role, configurable per role.
- No GUI. CLI only.
- No remote execution or containers. Local process execution is fine.
- No patch‑file handoffs. Handoff happens by reading the repo state and git history.
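The role rotation named in the goals can be sketched as a small state machine. This is a minimal sketch, assuming the loop wraps from the refactorer back to the tester and that one "step" means one agent turn; `Role::next` and `schedule` are hypothetical helper names, not part of the spec.

```rust
/// The three agent roles named in the spec.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    Tester,
    Implementor,
    Refactorer,
}

impl Role {
    /// Next role in the red-green-refactor loop.
    /// Assumption: the refactorer hands back to the tester.
    pub fn next(self) -> Role {
        match self {
            Role::Tester => Role::Implementor,
            Role::Implementor => Role::Refactorer,
            Role::Refactorer => Role::Tester,
        }
    }
}

/// Roles visited over `steps` agent turns, starting with the tester.
pub fn schedule(steps: usize) -> Vec<Role> {
    std::iter::successors(Some(Role::Tester), |role| Some(role.next()))
        .take(steps)
        .collect()
}
```

Keeping the rotation in one pure function makes it easy to unit test in `tdd-core` without touching git or an LLM.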
Create a cargo workspace with these crates:
/ Cargo.toml # workspace table
/ crates/
  tdd-cli/       # binary, user entrypoint
  tdd-core/      # domain model, orchestrator, traits, commit policy
  tdd-agents/    # role implementations that call the LLMs
  tdd-exec/      # test runner, git, fs, process utilities
  tdd-llm/       # client and adapters for OpenAI‑compatible providers
  tdd-fixtures/  # sample katas and tests for e2e validation (dev-dependency)
- `clap` for CLI.
- `serde`, `serde_yaml`, `serde_json` for config and logs.
- `tokio` for async when calling LLMs.
- `reqwest` for HTTP.
- `git2` for git operations.
- `tempfile`, `anyhow`, `thiserror` for ergonomics.
- `walkdir`, `ignore` for repo scanning.
- `which`, `duct` or `tokio::process` for process execution.
Implement `tdd.yaml` at repo root. Example:
kata_description: "path/to/kata-description.md" # path to markdown
language: "rust" # used by runner to select commands
steps: 20
max_attempts_per_agent: 5
roles:
  tester:
    model: "openai:gpt-4.1-mini"
    temperature: 0.4
  implementor:
    model: "deepseek:coder-v2"
    temperature: 0.2
  refactorer:
    model: "glm:glm-4-air"
    temperature: 0.3
llm:
  base_url: "http://localhost:11434/v1" # OpenAI‑compatible
  api_key_env: "LLM_API_KEY"
ci:
  test_cmd: ["cargo", "test", "--all"]
  check_cmd: ["cargo", "clippy", "--all", "--", "-D", "warnings"]
  fmt_cmd: ["cargo", "fmt"]
commit:
  author_name: "TDD Machine"
  author_email: "tdd@local"
USAGE:
tdd-cli init # initialize repo and scaffolding
tdd-cli run --steps N # run N full TDD steps
tdd-cli step # run a single agent step (debug)
tdd-cli status # show current agent, step counter, last commit summary
tdd-cli doctor # verify tools, versions, environment
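The command surface above maps naturally onto an enum. The spec names `clap` for the real CLI; the following is only a std-only stand-in to show the shape of the parsed commands. `CliCommand` and `parse_args` are hypothetical names, and the sketch assumes `--steps` is the only flag.

```rust
/// Parsed form of the subcommands listed in the usage text.
#[derive(Debug, PartialEq, Eq)]
pub enum CliCommand {
    Init,
    Run { steps: u32 },
    Step,
    Status,
    Doctor,
}

/// Parses the arguments that follow the binary name.
/// Assumption: `run` always takes `--steps N`; a real CLI would use clap instead.
pub fn parse_args(args: &[&str]) -> Result<CliCommand, String> {
    match args {
        ["init"] => Ok(CliCommand::Init),
        ["run", "--steps", n] => n
            .parse::<u32>()
            .map(|steps| CliCommand::Run { steps })
            .map_err(|e| format!("invalid --steps value {n:?}: {e}")),
        ["step"] => Ok(CliCommand::Step),
        ["status"] => Ok(CliCommand::Status),
        ["doctor"] => Ok(CliCommand::Doctor),
        other => Err(format!("unrecognized arguments: {other:?}")),
    }
}
```

With `clap`, the same enum could carry the derive attributes and the manual `match` would disappear.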
Define these types and traits:
pub enum Role { Tester, Implementor, Refactorer }

pub struct StepContext {
    pub role: Role,
    pub step_index: u32,
    pub kata_description: String,
    pub git_last_commit_msg: String,
    pub git_last_diff: String, // unified diff
    pub repo_snapshot_paths: Vec<String>, // files in repo
}

pub struct StepResult {
    pub files_changed: Vec<String>, // absolute or repo‑relative
    pub commit_message: String, // conventional commit
    pub notes: String, // extra info for logs
}

#[async_trait::async_trait]
pub trait Agent {
    fn role(&self) -> Role;
    async fn plan(&self, ctx: &StepContext) -> anyhow::Result<String>; // reasoning and change plan
    async fn edit(&self, ctx: &StepContext) -> anyhow::Result<StepResult>; // apply edits to fs
}

// A bare `async fn` in a public trait trips the `async_fn_in_trait` lint under `-D warnings`,
// so this trait uses `async_trait` as well.
#[async_trait::async_trait]
pub trait Orchestrator {
    fn current_role(&self) -> Role;
    async fn next(&mut self) -> anyhow::Result<()>; // run a role step, commit, rotate role
}
- Start with `Tester` when the repo is empty. `Tester` must initialize git and a basic scaffold for the chosen language.
- Every step:
  - Build `StepContext` from the working tree, last commit message, last diff, and the kata description.
  - Call `Agent::plan` and persist the plan to `.tdd/plan/step-N-role.md` for traceability.
  - Call `Agent::edit`, which writes files to disk.
  - Run `fmt`, `check`, then `test` via `tdd-exec`.
  - If tests fail for Implementor or Refactorer, allow up to `max_attempts_per_agent` retries. Between retries the agent can undo with `git checkout .` or adjust edits.
  - On success, create a conventional commit using `git2` with a detailed message.
  - Rotate the role according to the red‑green‑refactor loop.
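The retry rule in the step loop can be sketched independently of git and the LLM. This is a minimal sketch, assuming `max_attempts_per_agent` counts total attempts and that the working tree is reset after every failed attempt; `run_with_retries` and `StepOutcome` are hypothetical names, and the two closures stand in for "edit, then run fmt/check/test" and "undo the edits".

```rust
/// Result of running one agent step under the retry budget.
#[derive(Debug, PartialEq, Eq)]
pub enum StepOutcome {
    /// Checks passed on the given 1-based attempt; the caller may commit.
    Passed { attempt: u32 },
    /// Every attempt failed; the last failure is kept for the step log.
    Exhausted { last_error: String },
}

/// Calls `attempt` up to `max_attempts` times, calling `reset` after each failure.
pub fn run_with_retries<A, R>(max_attempts: u32, mut attempt: A, mut reset: R) -> StepOutcome
where
    A: FnMut(u32) -> Result<(), String>,
    R: FnMut(),
{
    let mut last_error = String::from("no attempts were made");
    for n in 1..=max_attempts {
        match attempt(n) {
            Ok(()) => return StepOutcome::Passed { attempt: n },
            Err(e) => {
                last_error = e;
                // Hypothetical hook: e.g. restore the working tree before retrying.
                reset();
            }
        }
    }
    StepOutcome::Exhausted { last_error }
}
```

Passing the attempt and reset logic as closures keeps the policy testable with plain counters, in line with the dependency-injection goal.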
All agents must follow the TDD cycle and respect the kata Markdown. They must not change public behavior during refactor.
Tester:
- If repo is uninitialized, create `Cargo.toml`, src layout, test module, and enable `rustfmt` and `clippy`.
- Write the smallest failing test that advances behavior in line with the kata.
- Verify the test fails before handing off.
- Commit `test:` with a message that explains the intent and the smallest behavior slice.
Implementor:
- Read last commit message and diff. Implement the minimal change to pass all tests.
- Prefer simplest design that works. Defer structure to refactorer.
- On failure, iterate up to `max_attempts_per_agent`. May reset the working tree between attempts.
- Commit `feat:` or `fix:` depending on the change.
Refactorer:
- Improve structure without changing observable behavior. May split files, extract modules, rename for clarity, improve API shapes, add types.
- Ensure tests keep passing.
- Commit `refactor:` with a detailed rationale.
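The role rules above imply a different commit gate per role: the tester hands off on red, the other two on green. A minimal sketch of that gate follows, assuming the check command succeeds whenever the code compiles and lints cleanly; `Checks` and `ready_to_commit` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    Tester,
    Implementor,
    Refactorer,
}

/// Outcome of the check and test commands for one attempt.
#[derive(Debug, Clone, Copy)]
pub struct Checks {
    pub check_ok: bool,
    pub tests_ok: bool,
}

/// Decides whether a role's work may be committed.
pub fn ready_to_commit(role: Role, checks: Checks) -> bool {
    match role {
        // Red: the new test must build but fail against current code.
        Role::Tester => checks.check_ok && !checks.tests_ok,
        // Green and refactor: everything must pass.
        Role::Implementor | Role::Refactorer => checks.check_ok && checks.tests_ok,
    }
}
```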
Use the following format, with a rich body so the next agent has context.
<type>(scope): short summary
Context:
- Role: <Tester|Implementor|Refactorer>
- Step: <N>
- Kata goal: <one sentence from kata.md>
Rationale:
- Why this change right now
- For tests: what behavior is introduced and why
- For implementor: minimal implementation strategy
- For refactorer: structural improvements and intended flexibility
Diff summary:
- List key files touched and intent per file
Verification:
- Test results summary
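The commit format can be rendered from a small struct so that every agent emits the same sections. This is a sketch only: `CommitInfo` and `render_commit_message` are hypothetical names, and the blank lines between sections are an assumption (Conventional Commits requires one between the subject and the body).

```rust
/// Fields needed to fill the commit template.
pub struct CommitInfo<'a> {
    pub kind: &'a str, // "test", "feat", "fix" or "refactor"
    pub scope: &'a str,
    pub summary: &'a str,
    pub role: &'a str,
    pub step: u32,
    pub kata_goal: &'a str,
    pub rationale: &'a [&'a str],
    pub diff_summary: &'a [&'a str],
    pub verification: &'a str,
}

/// Renders the subject line followed by the four body sections.
pub fn render_commit_message(c: &CommitInfo<'_>) -> String {
    // Each item becomes one "- item" line.
    let bullets = |items: &[&str]| -> String {
        items.iter().map(|item| format!("- {item}\n")).collect()
    };
    format!(
        "{kind}({scope}): {summary}\n\n\
         Context:\n- Role: {role}\n- Step: {step}\n- Kata goal: {goal}\n\n\
         Rationale:\n{rationale}\n\
         Diff summary:\n{diff}\n\
         Verification:\n- {verification}\n",
        kind = c.kind,
        scope = c.scope,
        summary = c.summary,
        role = c.role,
        step = c.step,
        goal = c.kata_goal,
        rationale = bullets(c.rationale),
        diff = bullets(c.diff_summary),
        verification = c.verification,
    )
}
```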
- Provide a generic `LlmClient` with `chat(messages: Vec<Message>) -> String`.
- Allow per‑role model, temperature, and provider selection via config.
- Support any OpenAI‑compatible base URL and API key.
- Utilities to run commands with timeouts, capture stdout and stderr, and map to a `Result<TestOutcome>`.
- `run_fmt`, `run_check`, `run_tests` with language‑aware defaults. Focus on Rust in v1.
- Git helpers: init repo, stage changes, read last commit message, compute diff, commit with author, rollback.
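The "run commands with timeouts, capture stdout and stderr" utility could look roughly like the following. It is a std-only sketch, not the `duct` or `tokio::process` version the spec allows; `run_with_timeout`, `RunError` and the 20 ms polling interval are assumptions, and `RunnerOutcome` mirrors the struct defined later in the spec.

```rust
use std::io::Read;
use std::process::{Command, Stdio};
use std::thread;
use std::time::{Duration, Instant};

#[derive(Debug)]
pub struct RunnerOutcome {
    pub ok: bool,
    pub stdout: String,
    pub stderr: String,
}

#[derive(Debug)]
pub enum RunError {
    EmptyCommand,
    Spawn(std::io::Error),
    Wait(std::io::Error),
    Timeout(Duration),
}

/// Reads a pipe to the end on its own thread so a chatty child cannot block on a full pipe.
fn drain<R: Read + Send + 'static>(pipe: Option<R>) -> thread::JoinHandle<String> {
    thread::spawn(move || {
        let mut bytes = Vec::new();
        if let Some(mut pipe) = pipe {
            let _ = pipe.read_to_end(&mut bytes);
        }
        String::from_utf8_lossy(&bytes).into_owned()
    })
}

/// Runs `argv`, killing the child if it outlives `timeout`.
pub fn run_with_timeout(argv: &[&str], timeout: Duration) -> Result<RunnerOutcome, RunError> {
    let (program, args) = argv.split_first().ok_or(RunError::EmptyCommand)?;
    let mut child = Command::new(program)
        .args(args)
        .stdin(Stdio::null())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()
        .map_err(RunError::Spawn)?;
    let stdout = drain(child.stdout.take());
    let stderr = drain(child.stderr.take());

    let deadline = Instant::now() + timeout;
    let status = loop {
        match child.try_wait().map_err(RunError::Wait)? {
            Some(status) => break status,
            None if Instant::now() >= deadline => {
                let _ = child.kill();
                let _ = child.wait(); // reap the killed child
                return Err(RunError::Timeout(timeout));
            }
            None => thread::sleep(Duration::from_millis(20)),
        }
    };
    Ok(RunnerOutcome {
        ok: status.success(),
        stdout: stdout.join().unwrap_or_default(),
        stderr: stderr.join().unwrap_or_default(),
    })
}
```

A call such as `run_with_timeout(&["cargo", "test", "--all"], Duration::from_secs(300))` would then back `run_tests`; the timeout value here is illustrative.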
- Agents produce an edit plan as structured JSON:
{
  "edits": [
    {"path": "src/lib.rs", "action": "upsert", "content": "..."},
    {"path": "src/game.rs", "action": "upsert", "content": "..."},
    {"path": "src/mod.rs", "action": "upsert", "content": "..."}
  ]
}
- The `tdd-agents` crate turns this plan into file system changes. Avoid patch files.
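Turning an edit plan into file system changes might be sketched as below. JSON decoding (via `serde_json` in the real crate) is left out so the block stays std-only; `Edit`, `apply_upserts` and the rule that rejects absolute paths, `.` and `..` components are assumptions meant to keep an LLM-produced path inside the repo.

```rust
use std::fs;
use std::io;
use std::path::{Component, Path, PathBuf};

/// One decoded "upsert" entry from the edit plan.
pub struct Edit {
    pub path: String,
    pub content: String,
}

/// Joins `rel` onto `root`, refusing anything but plain relative components.
fn resolve(root: &Path, rel: &str) -> io::Result<PathBuf> {
    let rel = Path::new(rel);
    let plain = rel.components().all(|c| matches!(c, Component::Normal(_)));
    if rel.as_os_str().is_empty() || !plain {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            format!("unsafe edit path: {}", rel.display()),
        ));
    }
    Ok(root.join(rel))
}

/// Writes every edit under `root`, creating parent directories as needed.
/// Returns the paths written, in plan order.
pub fn apply_upserts(root: &Path, edits: &[Edit]) -> io::Result<Vec<PathBuf>> {
    let mut written = Vec::new();
    for edit in edits {
        let target = resolve(root, &edit.path)?;
        if let Some(parent) = target.parent() {
            fs::create_dir_all(parent)?;
        }
        fs::write(&target, &edit.content)?;
        written.push(target);
    }
    Ok(written)
}
```

The returned paths can feed `StepResult::files_changed` directly.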
Embed a system prompt per role that is fed to the LLM along with StepContext and selected file snippets. Ensure the model obeys the file editing protocol.
You are the Tester in a TDD cycle for a Rust kata. Your responsibilities:
- Read the kata.md and propose the smallest meaningful test that advances behavior.
- Write or update tests only. Do not implement production code.
- Tests must compile and be focused on one behavior slice.
- Provide a JSON edit plan with files and full contents, not a diff.
- After writing the test, ensure it fails when run against current code.
- Produce a conventional commit message with the `test:` type.
You are the Implementor in a TDD cycle for a Rust kata. Your responsibilities:
- Read the last commit message, the last diff, and the full tree.
- Implement the smallest change that makes all tests pass.
- Keep the design simple. You may add files, structs, modules.
- Provide a JSON edit plan with files and full contents.
- Produce a conventional commit message with `feat:` or `fix:`.
You are the Refactorer in a TDD cycle for a Rust kata. Your responsibilities:
- Improve structure and readability without changing behavior.
- You may reorganize modules, extract types, rename for clarity.
- Do not modify test assertions, only restructure code under test.
- Provide a JSON edit plan with files and full contents.
- Produce a `refactor:` commit message.
Implement these in tdd-core and tdd-exec:
pub struct RepoState {
    pub last_commit_message: String,
    pub last_diff: String,
    pub files: Vec<String>,
}

pub struct RunnerOutcome { pub ok: bool, pub stdout: String, pub stderr: String }

pub trait Runner {
    fn fmt(&self) -> anyhow::Result<RunnerOutcome>;
    fn check(&self) -> anyhow::Result<RunnerOutcome>;
    fn test(&self) -> anyhow::Result<RunnerOutcome>;
}

pub trait Vcs {
    fn init_if_needed(&self) -> anyhow::Result<()>;
    fn read_state(&self) -> anyhow::Result<RepoState>;
    fn stage_all(&self) -> anyhow::Result<()>;
    fn commit(&self, message: &str) -> anyhow::Result<String>; // returns commit id
}
- `tdd-cli init` should create a new cargo library with a `tests/` folder, `.gitignore`, `rust-toolchain.toml`, and preconfigured `clippy` and `fmt`.
- It should create a `kata.md` placeholder and `tdd.yaml` with defaults.
- Persist JSON logs per step in `.tdd/logs/step-N-role.json` including plan, runner outputs, and commit id.
- Provide `tdd-cli status` to print the latest step summary and failing diagnostics if any.
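The plan and log file names follow one pattern, so a pair of helpers can keep them consistent. A small sketch, assuming the role appears in lowercase in the file name; `plan_path`, `log_path` and `slug` are hypothetical names.

```rust
use std::path::PathBuf;

#[derive(Debug, Clone, Copy)]
pub enum Role {
    Tester,
    Implementor,
    Refactorer,
}

impl Role {
    /// Lowercase name used in file names (an assumption, the spec only says "role").
    fn slug(self) -> &'static str {
        match self {
            Role::Tester => "tester",
            Role::Implementor => "implementor",
            Role::Refactorer => "refactorer",
        }
    }
}

/// `.tdd/plan/step-N-role.md`, relative to the repo root.
pub fn plan_path(step: u32, role: Role) -> PathBuf {
    PathBuf::from(".tdd")
        .join("plan")
        .join(format!("step-{step}-{}.md", role.slug()))
}

/// `.tdd/logs/step-N-role.json`, relative to the repo root.
pub fn log_path(step: u32, role: Role) -> PathBuf {
    PathBuf::from(".tdd")
        .join("logs")
        .join(format!("step-{step}-{}.json", role.slug()))
}
```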
- Running `tdd-cli init` on an empty folder initializes a working Rust kata scaffold and a git repo.
- Running `tdd-cli run --steps 3` produces alternating commits from tester, implementor, refactorer, with passing tests after implementor and refactorer steps.
- Each commit follows the specified conventional commit format and includes context sections.
- Agents can create multiple files and modules. The tool compiles and runs tests at every step.
- Config supports different models per role and a custom OpenAI‑compatible base URL.
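The sample config writes each role's model as `provider:model` (for example `openai:gpt-4.1-mini`). One way to support per-role routing is to split that string once at load time. This is a sketch under the assumption that everything before the first colon names the provider; `ModelSpec` and `parse_model_spec` are hypothetical names.

```rust
/// A role's model setting, split into provider and model name.
#[derive(Debug, PartialEq, Eq)]
pub struct ModelSpec {
    pub provider: String,
    pub model: String,
}

/// Splits `provider:model` at the first colon, so model names may contain colons.
pub fn parse_model_spec(raw: &str) -> Result<ModelSpec, String> {
    match raw.split_once(':') {
        Some((provider, model)) if !provider.is_empty() && !model.is_empty() => Ok(ModelSpec {
            provider: provider.to_string(),
            model: model.to_string(),
        }),
        _ => Err(format!("expected \"provider:model\", got {raw:?}")),
    }
}
```

Validating the string when the config is loaded surfaces typos through `tdd-cli doctor` rather than in the middle of a run.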
- Fully compilable workspace with crates listed above.
- Unit tests for `tdd-core` and `tdd-exec` utilities.
- Example kata fixture such as String Calculator or Bowling, under `tdd-fixtures`, to validate the loop.
- A `README.md` describing usage, config, and architecture.
- Create the cargo workspace and crate skeletons with `Cargo.toml` files.
- Implement `tdd-exec` for process runner and git.
- Implement `tdd-llm` with a minimal OpenAI‑compatible client and per‑role routing.
- Implement `tdd-core` domain types, traits, orchestrator, and commit policy.
- Implement `tdd-agents` with the three role templates and JSON edit plan mechanism.
- Implement `tdd-cli` with `init`, `run`, `step`, `status`, `doctor` commands.
- Add `tdd-fixtures` with a sample kata and an e2e test that runs a few steps using a mocked LLM client.
- Write the `README.md` with instructions.
- The code must compile with stable Rust and pass `cargo clippy -- -D warnings`.
- All public functions should have doc comments.
- Provide meaningful error messages and map process failures to structured errors.
- Keep modules small and cohesive. Favor traits and dependency injection for testability.
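The preference for traits and dependency injection can be illustrated with a scripted fake for the `Runner` trait. This is a sketch only: it uses `Result<_, String>` in place of `anyhow::Result` to stay std-only, and `FakeRunner` and `verify` are hypothetical names.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RunnerOutcome {
    pub ok: bool,
    pub stdout: String,
    pub stderr: String,
}

/// Same shape as the spec's `Runner`, with a plain `String` error for this sketch.
pub trait Runner {
    fn fmt(&self) -> Result<RunnerOutcome, String>;
    fn check(&self) -> Result<RunnerOutcome, String>;
    fn test(&self) -> Result<RunnerOutcome, String>;
}

/// Test double whose `test()` results are scripted up front.
pub struct FakeRunner {
    test_results: RefCell<VecDeque<bool>>,
}

impl FakeRunner {
    pub fn with_test_results(results: &[bool]) -> Self {
        Self { test_results: RefCell::new(results.iter().copied().collect()) }
    }
}

fn outcome(ok: bool) -> RunnerOutcome {
    RunnerOutcome { ok, stdout: String::new(), stderr: String::new() }
}

impl Runner for FakeRunner {
    fn fmt(&self) -> Result<RunnerOutcome, String> {
        Ok(outcome(true))
    }
    fn check(&self) -> Result<RunnerOutcome, String> {
        Ok(outcome(true))
    }
    fn test(&self) -> Result<RunnerOutcome, String> {
        self.test_results
            .borrow_mut()
            .pop_front()
            .map(outcome)
            .ok_or_else(|| "no scripted test result left".to_string())
    }
}

/// Orchestrator-side code depends only on the trait, so the fake drops in.
pub fn verify(runner: &dyn Runner) -> Result<bool, String> {
    Ok(runner.fmt()?.ok && runner.check()?.ok && runner.test()?.ok)
}
```

The same pattern applies to `Vcs` and the LLM client, which is what lets the e2e fixture run with a mocked model.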
- Add per‑language runners, starting with Node and Python.
- Add a sandbox mode that runs agents in a temporary worktree and only commits on success.
- Add a risk budget that controls how many files a refactor is allowed to move in a single step.
- Add a guard that blocks refactor commits if test coverage drops.