This playbook walks you through extending the Automated Design of Agentic Systems (ADAS) framework to new domains where agent performance is evaluated by a Large Language Model (LLM) acting as a judge. It assumes familiarity with Python, prompt engineering, and running ADAS `search.py` pipelines.
- Meta Agent Search loop: Each domain folder (for example `_mmlu/`) ships a `search.py` that orchestrates agent generation, evaluation, and archiving. The meta-agent proposes Python snippets that implement `AgentSystem.forward(...)`; a hand-written example follows this list.
- Evaluation hook: `evaluate_forward_fn(args, forward_str)` dynamically injects the candidate `forward` method, runs it across the task suite, and converts raw results into a fitness string via `bootstrap_confidence_interval` (injection and fitness sketches appear below).
- Artifacts: Runs append entries to `results/_run_archive.json`, combining candidate metadata, code, and fitness. Treat the archive as the single source of truth for each run (an inspection snippet closes this section).
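For orientation, here is a hand-written sketch of the kind of snippet the meta-agent might propose. The `call_llm` stub is an illustrative placeholder rather than one of the framework's real helpers; only the shape of `forward` matters.

```python
def call_llm(system_prompt: str, user_prompt: str) -> str:
    """Stand-in for your LLM client; replace with a real API call."""
    return "42"

def forward(self, taskInfo):
    # One chain-of-thought worker: reason about the task, then answer.
    system_prompt = "You are a careful problem solver. Think step by step."
    reasoning = call_llm(system_prompt, f"Task: {taskInfo}\nLay out your reasoning.")
    answer = call_llm(system_prompt,
                      f"Task: {taskInfo}\nReasoning: {reasoning}\nReply with only the final answer.")
    return answer

# Quick check outside the framework; `self` is unused in this toy snippet.
print(forward(None, "What is 6 x 7?"))
```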
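The injection step can be pictured as executing the candidate string and grafting the resulting function onto the agent class. The `inject_forward` helper below is a minimal sketch under that assumption, not the framework's actual implementation.

```python
class AgentSystem:
    """Empty shell; the search loop attaches a candidate `forward` at evaluation time."""
    pass

def inject_forward(forward_str: str) -> None:
    namespace: dict = {}
    exec(forward_str, namespace)                            # define `forward` in an isolated namespace
    setattr(AgentSystem, "forward", namespace["forward"])   # attach it as a method on the class

# Usage with a trivial candidate that just echoes the task.
candidate = "def forward(self, taskInfo):\n    return f'echo: {taskInfo}'"
inject_forward(candidate)
print(AgentSystem().forward("2 + 2 = ?"))  # -> echo: 2 + 2 = ?
```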
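Turning raw per-task scores into a fitness string can be approximated with a percentile bootstrap. The real `bootstrap_confidence_interval` may use a different statistic, resample count, and output format; the `bootstrap_fitness` sketch below only illustrates the idea.

```python
import random
import statistics

def bootstrap_fitness(scores, n_resamples=1000, alpha=0.05):
    """Percentile bootstrap over mean accuracy; returns a human-readable fitness string."""
    means = []
    for _ in range(n_resamples):
        resample = [random.choice(scores) for _ in scores]  # sample with replacement
        means.append(statistics.mean(resample))
    means.sort()
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return f"{1 - alpha:.0%} Bootstrap CI of mean accuracy: ({lo:.1%}, {hi:.1%})"

print(bootstrap_fitness([0.8, 0.6, 1.0, 0.7, 0.9]))
```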
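Because the archive is the source of truth, a quick inspection script is often the first thing you want after a run. The entry keys used below (`name`, `fitness`) are assumptions based on the fields described above; adapt them to the actual archive schema.

```python
import json
from pathlib import Path

archive_path = Path("results/_run_archive.json")  # path as written above; adjust for your domain
if archive_path.exists():
    entries = json.loads(archive_path.read_text())
    for entry in entries:
        # Each entry bundles candidate metadata, the generated code, and its fitness string.
        print(entry.get("name", "<unnamed>"), "->", entry.get("fitness", "no fitness recorded"))
else:
    print(f"No archive at {archive_path}; run the search first.")
```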