@jmanhype
Created October 20, 2025 02:14
Reward-free EE + ACE loop narrative

Reward-Free Agent Loop Summary

Narrative Frames

  1. Seed – Start with expert demonstrations (JSONL). Feed them into the Early Experience (EE) pipeline.
  2. Explore & Reflect – Roll out trajectories, compare alternatives, and generate self-reflections.
  3. Guard – Apply deterministic guardrails (tests, formulas, image-similarity metrics such as MSE/SSIM) to clamp outputs to canonical results.
  4. Distill – ACE ingests the guardrail-corrected lessons and appends delta updates to the playbook.
  5. Self-Improve – The playbook informs the next loop; the agent keeps teaching itself.
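The Guard and Distill frames can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: `Playbook`, `guard`, and `run_loop` are hypothetical names, the guardrail is reduced to an exact-match check against a canonical answer, and a "delta update" is modelled as appending a lesson string that is not already present.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Playbook:
    """Hypothetical stand-in for the ACE playbook: an append-only list of lessons."""

    bullets: list[str] = field(default_factory=list)

    def append_delta(self, lesson: str) -> bool:
        """Append a lesson only if it is new; return True when something was added."""
        if lesson in self.bullets:
            return False
        self.bullets.append(lesson)
        return True


def guard(task_id: str, output: str, canonical: dict[str, str]) -> tuple[str, str | None]:
    """Deterministic guardrail (assumed exact-match): clamp output to the canonical result.

    Returns the possibly corrected output, plus a lesson string when a
    correction was needed (None otherwise).
    """
    expected = canonical[task_id]
    if output == expected:
        return output, None
    lesson = f"{task_id}: produced {output!r}, canonical result is {expected!r}"
    return expected, lesson


def run_loop(episodes, canonical, playbook):
    """Process (task_id, raw_output) pairs: guard each one, then distill any lesson."""
    results = []
    for task_id, raw_output in episodes:
        clamped, lesson = guard(task_id, raw_output, canonical)
        if lesson is not None:
            playbook.append_delta(lesson)  # no reward signal involved
        results.append(clamped)
    return results
```

In this sketch the only learning signal is the guardrail's correction, which is what keeps the loop reward-free: a repeated mistake yields the same lesson string and is not appended twice.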

Flowchart

flowchart LR
    A[Expert Demos] --> B(Early Experience Pipeline)
    subgraph EE[Early Experience]
        B --> C[World Model]
        B --> D[Exploration]
        B --> E[Reflection]
        B --> F[Policy]
    end
    F --> G[Live Loop Episodes]
    G --> H{Deterministic Guardrails}
    H -->|Clamp & Log| I[ACE Playbook]
    I -->|Updated Context| G

Timeline

timeline
    title Reward-Free Self-Improvement
    section Preparation
      Expert demos loaded: 2025-05-10
      Guardrails scaffolded: 2025-05-11
    section Learning Loop
      SWE-bench loop (50 eps): Guardrail passes=50
      ACE updates shadow bullets (+9 new, +20 helpful)
      MagicBrush loop (50 eps): Guardrail passes=50
      ACE updates shadow bullets (+28 total)
    section Expansion
      Add new domain via scaffold_domain.py
      Run benchmark harness (--offline)
      Live loop with ACE (--ace)

Key Commands

# Finance benchmark
PYTHONPATH=src python scripts/run_benchmark.py benchmarks/finance_subset.jsonl --domain finance --offline

# SWE-bench live loop with ACE
ACE_ENABLED=1 ACE_DOMAIN_ID=swe-bench ACE_TARGET_STAGE=shadow \
DATABASE_URL=sqlite:///ace_playbook.db \
PYTHONPATH=src python examples/live_loop_swe_magic.py --domain swe-bench --episodes 50 --ace

# MagicBrush live loop with ACE
ACE_ENABLED=1 ACE_DOMAIN_ID=magicbrush ACE_TARGET_STAGE=shadow \
DATABASE_URL=sqlite:///ace_playbook.db \
PYTHONPATH=src python examples/live_loop_swe_magic.py --domain magicbrush --episodes 50 --ace
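The two live-loop commands differ only in the domain name, so a small launcher can build them from one place. The helper below is a hypothetical convenience, not part of the repository; it only mirrors the environment variables and flags shown in the commands above and assumes it is run from the repository root.

```python
import os


def live_loop_command(domain: str, episodes: int = 50,
                      db_url: str = "sqlite:///ace_playbook.db"):
    """Return (env, argv) mirroring the live-loop shell commands for one domain."""
    env = dict(os.environ)  # inherit the current environment
    env.update({
        "ACE_ENABLED": "1",
        "ACE_DOMAIN_ID": domain,
        "ACE_TARGET_STAGE": "shadow",
        "DATABASE_URL": db_url,
        "PYTHONPATH": "src",
    })
    argv = [
        "python", "examples/live_loop_swe_magic.py",
        "--domain", domain,
        "--episodes", str(episodes),
        "--ace",
    ]
    return env, argv
```

To launch a run, pass the result to the standard library: `env, argv = live_loop_command("swe-bench")` followed by `subprocess.run(argv, env=env, check=True)`.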