- Seed – Start with expert demonstrations (JSONL) and feed them into the Early Experience (EE) pipeline.
- Explore & Reflect – Roll out trajectories, compare alternatives, and generate self-reflections.
- Guard – Apply deterministic guardrails (tests, formulas, MSE/SSIM) to clamp outputs to canonical results.
- Distill – ACE ingests the guardrail-corrected lessons and appends delta updates to the playbook.
- Self-Improve – The playbook informs the next loop; the agent keeps teaching itself (see the sketch after this list).
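Concretely, one pass through this cycle can be sketched in a few lines of Python. Everything below is illustrative: `run_episode`, `guardrail_clamp`, and the playbook layout are hypothetical placeholders, not the repo's actual API.

```python
# Minimal sketch of one Seed -> Explore -> Guard -> Distill cycle.
# run_episode, guardrail_clamp, and the playbook layout are placeholders,
# not the repository's actual API.
import json

def run_episode(demo, context):
    """Placeholder rollout: a real agent would explore and reflect here;
    we just perturb the canonical answer to simulate drift."""
    return demo["canonical"] + 0.01 * (len(context) + 1)

def guardrail_clamp(output, canonical, tolerance=1e-3):
    """Deterministic guardrail (stand-in for tests/formulas/MSE/SSIM):
    clamp the output to the canonical result when it drifts too far."""
    passed = abs(output - canonical) <= tolerance
    return (output if passed else canonical), passed

def self_improve_cycle(demo_path, episodes=50):
    # Seed: load expert demonstrations from JSONL.
    with open(demo_path) as fh:
        demos = [json.loads(line) for line in fh]

    playbook = []  # delta updates only; nothing is rewritten in place
    for _ in range(episodes):
        for demo in demos:
            # Explore & Reflect: roll out with the playbook as context.
            output = run_episode(demo, context=playbook)
            # Guard: clamp to the canonical result and record pass/fail.
            lesson, passed = guardrail_clamp(output, demo["canonical"])
            # Distill: append the guardrail-corrected lesson as a delta.
            playbook.append({"lesson": lesson, "guardrail_pass": passed})
    # Self-Improve: the returned playbook feeds the next loop's context.
    return playbook
```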
```mermaid
flowchart LR
    A[Expert Demos] --> B(Early Experience Pipeline)
    subgraph EE[Early Experience]
        B --> C[World Model]
        B --> D[Exploration]
        B --> E[Reflection]
        B --> F[Policy]
    end
    F --> G[Live Loop Episodes]
    G --> H{Deterministic Guardrails}
    H -->|Clamp & Log| I[ACE Playbook]
    I -->|Updated Context| G
```
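For the image-editing loop, the SSIM guardrail mentioned above could be wired up roughly like this. This is a sketch using scikit-image's `structural_similarity`; the repo's actual guardrail code and the 0.9 threshold are assumptions here.

```python
# Sketch of an image guardrail: accept the model's edit only if it is
# structurally close to the canonical result, otherwise clamp to it.
# Uses scikit-image; the repository's actual metric code may differ.
import numpy as np
from skimage.metrics import structural_similarity

def image_guardrail(edited: np.ndarray, canonical: np.ndarray,
                    threshold: float = 0.9) -> tuple[np.ndarray, bool]:
    # SSIM over RGB images (uint8, channels last).
    score = structural_similarity(edited, canonical,
                                  channel_axis=-1, data_range=255)
    passed = score >= threshold
    return (edited if passed else canonical), passed
```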
```mermaid
timeline
    title Reward-Free Self-Improvement
    section Preparation
        2025-05-10 : Expert demos loaded
        2025-05-11 : Guardrails scaffolded
    section Learning Loop
        SWE-bench loop (50 eps) : Guardrail passes=50
                                : ACE updates shadow bullets (+9 new, +20 helpful)
        MagicBrush loop (50 eps) : Guardrail passes=50
                                 : ACE updates shadow bullets (+28 total)
    section Expansion
        Add new domain : scaffold_domain.py
        Run benchmark harness : --offline
        Live loop with ACE : --ace
```
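The expansion steps map onto the commands below: first the offline benchmark harness, then the ACE-enabled live loops.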
```bash
# Finance benchmark
PYTHONPATH=src python scripts/run_benchmark.py benchmarks/finance_subset.jsonl --domain finance --offline

# SWE-bench live loop with ACE
ACE_ENABLED=1 ACE_DOMAIN_ID=swe-bench ACE_TARGET_STAGE=shadow \
DATABASE_URL=sqlite:///ace_playbook.db \
PYTHONPATH=src python examples/live_loop_swe_magic.py --domain swe-bench --episodes 50 --ace

# MagicBrush live loop with ACE
ACE_ENABLED=1 ACE_DOMAIN_ID=magicbrush ACE_TARGET_STAGE=shadow \
DATABASE_URL=sqlite:///ace_playbook.db \
PYTHONPATH=src python examples/live_loop_swe_magic.py --domain magicbrush --episodes 50 --ace
```
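Both live loops append to the same SQLite playbook (`ace_playbook.db`), with `ACE_DOMAIN_ID` presumably scoping entries per domain. A minimal sketch for peeking at what accumulated after a run (the schema is the repo's own, so this only lists table names rather than assuming any columns):

```python
# List the playbook database's tables after a run; no schema is assumed.
import sqlite3

con = sqlite3.connect("ace_playbook.db")
tables = con.execute(
    "SELECT name FROM sqlite_master WHERE type='table'"
).fetchall()
print(tables)
con.close()
```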