- Seed – Start with expert demonstrations (JSONL) and feed them into the Early Experience (EE) pipeline.
- Explore & Reflect – Roll out trajectories, compare alternatives, and generate self-reflections.
- Guard – Apply deterministic guardrails (tests, formulas, MSE/SSIM) to clamp outputs to canonical results.
- Distill – ACE ingests the guardrail-corrected lessons and appends delta updates to the playbook.
- Self-Improve – The playbook informs the next loop; the agent keeps teaching itself (see the sketch after this list).
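Concretely, one pass through this cycle can be sketched in a few lines of Python. Everything below is illustrative: `run_episode`, `guardrail_clamp`, and the playbook layout are hypothetical placeholders, not the repo's actual API.

```python
# Minimal sketch of one Seed -> Explore -> Guard -> Distill cycle.
# run_episode, guardrail_clamp, and the playbook layout are placeholders,
# not the repository's actual API.
import json

def run_episode(demo, context):
    """Placeholder rollout: a real agent would explore and reflect here;
    we just perturb the canonical answer to simulate drift."""
    return demo["canonical"] + 0.01 * (len(context) + 1)

def guardrail_clamp(output, canonical, tolerance=1e-3):
    """Deterministic guardrail (stand-in for tests/formulas/MSE/SSIM):
    clamp the output to the canonical result when it drifts too far."""
    passed = abs(output - canonical) <= tolerance
    return (output if passed else canonical), passed

def self_improve_cycle(demo_path, episodes=50):
    # Seed: load expert demonstrations from JSONL.
    with open(demo_path) as fh:
        demos = [json.loads(line) for line in fh]

    playbook = []  # delta updates only; nothing is rewritten in place
    for _ in range(episodes):
        for demo in demos:
            # Explore & Reflect: roll out with the playbook as context.
            output = run_episode(demo, context=playbook)
            # Guard: clamp to the canonical result and record pass/fail.
            lesson, passed = guardrail_clamp(output, demo["canonical"])
            # Distill: append the guardrail-corrected lesson as a delta.
            playbook.append({"lesson": lesson, "guardrail_pass": passed})
    # Self-Improve: the returned playbook feeds the next loop's context.
    return playbook
```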
```mermaid
flowchart LR
    A[Expert Demos] --> B(Early Experience Pipeline)
    subgraph EE[Early Experience]
        B --> C[World Model]
        B --> D[Exploration]
        B --> E[Reflection]
        B --> F[Policy]
    end
    F --> G[Live Loop Episodes]
    G --> H{Deterministic Guardrails}
    H -->|Clamp & Log| I[ACE Playbook]
    I -->|Updated Context| G
```
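For the image-editing loop, the SSIM guardrail mentioned above could be wired up roughly like this. This is a sketch using scikit-image's `structural_similarity`; the repo's actual guardrail code and the 0.9 threshold are assumptions here.

```python
# Sketch of an image guardrail: accept the model's edit only if it is
# structurally close to the canonical result, otherwise clamp to it.
# Uses scikit-image; the repository's actual metric code may differ.
import numpy as np
from skimage.metrics import structural_similarity

def image_guardrail(edited: np.ndarray, canonical: np.ndarray,
                    threshold: float = 0.9) -> tuple[np.ndarray, bool]:
    # SSIM over RGB images (uint8, channels last).
    score = structural_similarity(edited, canonical,
                                  channel_axis=-1, data_range=255)
    passed = score >= threshold
    return (edited if passed else canonical), passed
```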
```mermaid
timeline
    title Reward-Free Self-Improvement
    section Preparation
        2025-05-10 : Expert demos loaded
        2025-05-11 : Guardrails scaffolded
    section Learning Loop
        SWE-bench loop (50 eps) : Guardrail passes=50
                                : ACE updates shadow bullets (+9 new, +20 helpful)
        MagicBrush loop (50 eps) : Guardrail passes=50
                                 : ACE updates shadow bullets (+28 total)
    section Expansion
        Add new domain : scaffold_domain.py
        Run benchmark harness : --offline
        Live loop with ACE : --ace
```
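The expansion steps map onto the commands below: first the offline benchmark harness, then the ACE-enabled live loops.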
```bash
# Finance benchmark
PYTHONPATH=src python scripts/run_benchmark.py benchmarks/finance_subset.jsonl --domain finance --offline

# SWE-bench live loop with ACE
ACE_ENABLED=1 ACE_DOMAIN_ID=swe-bench ACE_TARGET_STAGE=shadow \
DATABASE_URL=sqlite:///ace_playbook.db \
PYTHONPATH=src python examples/live_loop_swe_magic.py --domain swe-bench --episodes 50 --ace

# MagicBrush live loop with ACE
ACE_ENABLED=1 ACE_DOMAIN_ID=magicbrush ACE_TARGET_STAGE=shadow \
DATABASE_URL=sqlite:///ace_playbook.db \
PYTHONPATH=src python examples/live_loop_swe_magic.py --domain magicbrush --episodes 50 --ace
```
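Both live loops append to the same SQLite playbook (`ace_playbook.db`), with `ACE_DOMAIN_ID` presumably scoping entries per domain. A minimal sketch for peeking at what accumulated after a run (the schema is the repo's own, so this only lists table names rather than assuming any columns):

```python
# List the playbook database's tables after a run; no schema is assumed.
import sqlite3

con = sqlite3.connect("ace_playbook.db")
tables = con.execute(
    "SELECT name FROM sqlite_master WHERE type='table'"
).fetchall()
print(tables)
con.close()
```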