@nibzard
Last active March 10, 2026 01:45
SkillPA optimization report: opt-booking-20260308-192032
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>Optimization Report - booking</title>
<style>
:root {
--bg: #f6f7f3;
--surface: #ffffff;
--ink: #14212e;
--muted: #5b6673;
--accent: #126b79;
--line: #d6dbe2;
--ok: #2c7a42;
--bad: #9a2e2e;
}
* { box-sizing: border-box; }
body {
margin: 0;
font-family: 'IBM Plex Sans', 'Segoe UI', sans-serif;
color: var(--ink);
background: linear-gradient(180deg, #f8f9f6 0%, #edf2ee 100%);
}
.wrap { max-width: 1180px; margin: 0 auto; padding: 24px; }
h1 { margin: 0 0 8px; font-size: 30px; }
.sub { color: var(--muted); margin-bottom: 18px; }
.cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 10px; margin-bottom: 18px; }
.card { background: var(--surface); border: 1px solid var(--line); border-radius: 10px; padding: 12px; }
.k { color: var(--muted); font-size: 12px; text-transform: uppercase; letter-spacing: .03em; }
.v { font-size: 24px; font-weight: 650; margin-top: 4px; }
section { background: var(--surface); border: 1px solid var(--line); border-radius: 10px; padding: 14px; margin-bottom: 14px; overflow-x: auto; }
h3 { margin: 0 0 6px; font-size: 15px; }
h2 { margin: 0 0 10px; font-size: 18px; }
table { width: 100%; border-collapse: collapse; font-size: 13px; }
th, td { border-bottom: 1px solid var(--line); padding: 8px 6px; text-align: left; vertical-align: top; }
th { color: var(--muted); font-weight: 600; }
ul { margin: 0; padding-left: 18px; }
.lineage { display: grid; grid-template-columns: repeat(auto-fit, minmax(220px, 1fr)); gap: 10px; }
.lineage-node { border: 1px solid var(--line); border-radius: 10px; padding: 12px; background: linear-gradient(180deg, #ffffff 0%, #f6faf8 100%); }
.lineage-iteration { background: linear-gradient(180deg, #ffffff 0%, #f7fbff 100%); }
.lineage-promoted { background: linear-gradient(180deg, #ffffff 0%, #f6fcf6 100%); }
.lineage-title { font-weight: 700; margin-bottom: 2px; }
.lineage-sub, .lineage-path, .lineage-state, .lineage-detail { font-size: 12px; color: var(--muted); margin-top: 4px; }
.iteration-block { margin-top: 14px; }
.iteration-head { margin-bottom: 8px; }
.candidate-grid { display: grid; grid-template-columns: repeat(auto-fit, minmax(250px, 1fr)); gap: 10px; }
.candidate-card { border: 1px solid var(--line); border-radius: 10px; padding: 12px; background: #fcfcfa; }
.candidate-head { display: flex; justify-content: space-between; gap: 10px; align-items: flex-start; margin-bottom: 8px; }
.candidate-title { font-weight: 700; }
.badge-row { display: flex; flex-wrap: wrap; gap: 6px; justify-content: flex-end; }
.badge { border: 1px solid var(--line); border-radius: 999px; padding: 2px 8px; font-size: 11px; color: var(--muted); background: #f5f7f9; }
.badge-mode { background: #eef7fb; color: #25556f; }
.badge-validated { background: #eef7ef; color: #2b6540; }
.badge-selected { background: #fff4dc; color: #8b5a00; }
.badge-promoted { background: #e8f7e9; color: #21643a; }
.badge-missing { background: #fdecec; color: #8e3434; }
.candidate-metrics { display: grid; grid-template-columns: repeat(3, minmax(0, 1fr)); gap: 6px; margin-bottom: 8px; }
.metric-k { display: block; font-size: 11px; color: var(--muted); text-transform: uppercase; }
.metric-v { display: block; font-size: 15px; font-weight: 650; margin-top: 2px; }
.candidate-detail { font-size: 12px; color: var(--muted); margin-top: 4px; }
.journey-step { border: 1px solid var(--line); border-radius: 10px; padding: 10px 12px; background: #fafcfb; margin-bottom: 10px; }
.journey-step summary { cursor: pointer; font-weight: 650; }
.grid-line { stroke: #e6ebf0; stroke-width: 1; }
.axis-line { stroke: #8b97a5; stroke-width: 1; }
.axis-label { fill: #647182; font-size: 11px; }
.score-line { stroke: var(--accent); stroke-width: 2.3; }
.line-point { fill: #126b79; }
.cost-bar { fill: #2d6b9f; opacity: .82; }
.diff-file { color: #9ec1ff; display: block; }
.diff-hunk { color: #ffd479; display: block; }
.diff-add { color: #65d28d; background: #123321; display: block; }
.diff-del { color: #ff9da1; background: #3b1818; display: block; }
.diff-neutral { color: #d9e4ef; display: block; }
.foot { color: var(--muted); font-size: 12px; margin-top: 16px; }
</style>
</head>
<body>
<div class="wrap">
<h1>Optimization Report</h1>
<div class="sub">Domain: <strong>booking</strong> | Run generated: 2026-03-08T21:26:17.197126+00:00</div>
<div class="cards">
<div class="card"><div class="k">Baseline val score</div><div class="v">0.8062</div></div>
<div class="card"><div class="k">Final test score</div><div class="v">0.5200</div></div>
<div class="card"><div class="k">Delta (final test vs. baseline val)</div><div class="v">-0.2862</div></div>
<div class="card"><div class="k">Total cost</div><div class="v">$22.5332</div></div>
<div class="card"><div class="k">Wall clock</div><div class="v">2h 5m 45s</div></div>
<div class="card"><div class="k">Total invocations</div><div class="v">100</div></div>
<div class="card"><div class="k">Input tokens</div><div class="v">1,073,410</div></div>
<div class="card"><div class="k">Output tokens</div><div class="v">183,551</div></div>
<div class="card"><div class="k">Cache read tokens</div><div class="v">39,434,752</div></div>
<div class="card"><div class="k">Total tokens (incl cache)</div><div class="v">40,691,713</div></div>
<div class="card"><div class="k">Input from cache</div><div class="v">97.35%</div></div>
</div>
<section>
<h2>Score Over Iterations</h2>
<svg width="760" height="240" role="img" aria-label="Score trend chart"><line x1="46" y1="20.0" x2="742" y2="20.0" class="grid-line" /><line x1="46" y1="65.5" x2="742" y2="65.5" class="grid-line" /><line x1="46" y1="111.0" x2="742" y2="111.0" class="grid-line" /><line x1="46" y1="156.5" x2="742" y2="156.5" class="grid-line" /><line x1="46" y1="202.0" x2="742" y2="202.0" class="grid-line" /><line x1="46" y1="202.0" x2="742" y2="202.0" class="axis-line" /><line x1="46" y1="20" x2="46" y2="202.0" class="axis-line" /><polyline fill="none" points="46.0,31.2 278.0,20.0 510.0,25.6 742.0,202.0" class="score-line" /><circle cx="46.0" cy="31.2" r="3.5" class="line-point"><title>baseline-val: 0.8062</title></circle><circle cx="278.0" cy="20.0" r="3.5" class="line-point"><title>iter-1-selected-val: 0.8250</title></circle><circle cx="510.0" cy="25.6" r="3.5" class="line-point"><title>iter-2-selected-val: 0.8157</title></circle><circle cx="742.0" cy="202.0" r="3.5" class="line-point"><title>final-test: 0.5200</title></circle><text x="40" y="24.0" text-anchor="end" class="axis-label">0.83</text><text x="40" y="69.5" text-anchor="end" class="axis-label">0.75</text><text x="40" y="115.0" text-anchor="end" class="axis-label">0.67</text><text x="40" y="160.5" text-anchor="end" class="axis-label">0.60</text><text x="40" y="206.0" text-anchor="end" class="axis-label">0.52</text><text x="46.0" y="228" text-anchor="middle" class="axis-label">baseline-val</text><text x="278.0" y="228" text-anchor="middle" class="axis-label">iter-1-selected-val</text><text x="510.0" y="228" text-anchor="middle" class="axis-label">iter-2-selected-val</text><text x="742.0" y="228" text-anchor="middle" class="axis-label">final-test</text></svg>
</section>
<section>
<h2>Cost Over Phases (USD)</h2>
<svg width="760" height="220" role="img" aria-label="Cost chart"><line x1="46" y1="174.0" x2="742" y2="174.0" class="axis-line" /><rect x="51.0" y="66.0" width="23.2" height="108.0" class="cost-bar"><title>baseline-val: 1.6928</title></rect><rect x="84.1" y="171.7" width="23.2" height="2.3" class="cost-bar"><title>baseline-core: 0.0358</title></rect><rect x="117.3" y="167.0" width="23.2" height="7.0" class="cost-bar"><title>iter-1-mutation: 0.1092</title></rect><rect x="150.4" y="126.9" width="23.2" height="47.1" class="cost-bar"><title>iter-1-train-candidate-00: 0.7378</title></rect><rect x="183.5" y="77.8" width="23.2" height="96.2" class="cost-bar"><title>iter-1-train-candidate-01: 1.5082</title></rect><rect x="216.7" y="25.4" width="23.2" height="148.6" class="cost-bar"><title>iter-1-train-candidate-02: 2.3290</title></rect><rect x="249.8" y="16.0" width="23.2" height="158.0" class="cost-bar"><title>iter-1-train-candidate-03: 2.4766</title></rect><rect x="283.0" y="150.1" width="23.2" height="23.9" class="cost-bar"><title>iter-1-val-candidate-00: 0.3748</title></rect><rect x="316.1" y="37.2" width="23.2" height="136.8" class="cost-bar"><title>iter-1-val-candidate-03: 2.1446</title></rect><rect x="349.3" y="172.1" width="23.2" height="1.9" class="cost-bar"><title>iter-1-core-candidate-00: 0.0305</title></rect><rect x="382.4" y="172.2" width="23.2" height="1.8" class="cost-bar"><title>iter-1-core-candidate-03: 0.0278</title></rect><rect x="415.5" y="172.1" width="23.2" height="1.9" class="cost-bar"><title>iter-2-mutation: 0.0295</title></rect><rect x="448.7" y="27.4" width="23.2" height="146.6" class="cost-bar"><title>iter-2-train-candidate-00: 2.2984</title></rect><rect x="481.8" y="53.9" width="23.2" height="120.1" class="cost-bar"><title>iter-2-train-candidate-01: 1.8831</title></rect><rect x="515.0" y="29.0" width="23.2" height="145.0" class="cost-bar"><title>iter-2-train-candidate-02: 2.2729</title></rect><rect x="548.1" y="170.1" width="23.2" height="3.9" 
class="cost-bar"><title>iter-2-train-candidate-03: 0.0615</title></rect><rect x="581.3" y="79.1" width="23.2" height="94.9" class="cost-bar"><title>iter-2-val-candidate-00: 1.4868</title></rect><rect x="614.4" y="127.0" width="23.2" height="47.0" class="cost-bar"><title>iter-2-val-candidate-02: 0.7370</title></rect><rect x="647.5" y="168.9" width="23.2" height="5.1" class="cost-bar"><title>iter-2-core-candidate-00: 0.0805</title></rect><rect x="680.7" y="172.2" width="23.2" height="1.8" class="cost-bar"><title>iter-2-core-candidate-02: 0.0283</title></rect><rect x="713.8" y="34.4" width="23.2" height="139.6" class="cost-bar"><title>final-test: 2.1880</title></rect><text x="62.6" y="204" text-anchor="middle" class="axis-label">baseline-val</text><text x="95.7" y="204" text-anchor="middle" class="axis-label">baseline-core</text><text x="128.9" y="204" text-anchor="middle" class="axis-label">iter-1-mutation</text><text x="162.0" y="204" text-anchor="middle" class="axis-label">iter-1-train-candidate-00</text><text x="195.1" y="204" text-anchor="middle" class="axis-label">iter-1-train-candidate-01</text><text x="228.3" y="204" text-anchor="middle" class="axis-label">iter-1-train-candidate-02</text><text x="261.4" y="204" text-anchor="middle" class="axis-label">iter-1-train-candidate-03</text><text x="294.6" y="204" text-anchor="middle" class="axis-label">iter-1-val-candidate-00</text><text x="327.7" y="204" text-anchor="middle" class="axis-label">iter-1-val-candidate-03</text><text x="360.9" y="204" text-anchor="middle" class="axis-label">iter-1-core-candidate-00</text><text x="394.0" y="204" text-anchor="middle" class="axis-label">iter-1-core-candidate-03</text><text x="427.1" y="204" text-anchor="middle" class="axis-label">iter-2-mutation</text><text x="460.3" y="204" text-anchor="middle" class="axis-label">iter-2-train-candidate-00</text><text x="493.4" y="204" text-anchor="middle" class="axis-label">iter-2-train-candidate-01</text><text x="526.6" y="204" 
text-anchor="middle" class="axis-label">iter-2-train-candidate-02</text><text x="559.7" y="204" text-anchor="middle" class="axis-label">iter-2-train-candidate-03</text><text x="592.9" y="204" text-anchor="middle" class="axis-label">iter-2-val-candidate-00</text><text x="626.0" y="204" text-anchor="middle" class="axis-label">iter-2-val-candidate-02</text><text x="659.1" y="204" text-anchor="middle" class="axis-label">iter-2-core-candidate-00</text><text x="692.3" y="204" text-anchor="middle" class="axis-label">iter-2-core-candidate-02</text><text x="725.4" y="204" text-anchor="middle" class="axis-label">final-test</text></svg>
</section>
<section>
<h2>Overlay Lineage</h2>
<div class="sub">How the champion evolved across iterations, including held-champion steps when gates rejected the selected candidate.</div>
<div class="lineage">
<div class="lineage-node lineage-baseline">
<div class="lineage-title">Initial overlay</div>
<div class="lineage-sub">starting champion</div>
<div class="lineage-path">inputs/initial.overlay.md</div>
<div class="lineage-state">present</div>
</div>
<div class="lineage-node lineage-iteration">
<div class="lineage-title">Iteration 1</div>
<div class="lineage-sub">selected candidate-03</div>
<div class="lineage-path">candidates/iter-1/candidate-03.md</div>
<div class="lineage-state">present</div>
<div class="lineage-detail">selected: candidate-03</div><div class="lineage-detail">champion after gates: iter-1 candidate-03</div><div class="lineage-detail">selected val: 0.8250</div><div class="lineage-detail">improvement: 0.0188</div><div class="lineage-detail">result: promoted</div>
</div>
<div class="lineage-node lineage-iteration">
<div class="lineage-title">Iteration 2</div>
<div class="lineage-sub">selected candidate-00</div>
<div class="lineage-path">candidates/iter-1/candidate-03.md</div>
<div class="lineage-state">present</div>
<div class="lineage-detail">selected: candidate-00</div><div class="lineage-detail">champion after gates: iter-1 candidate-03</div><div class="lineage-detail">selected val: 0.8157</div><div class="lineage-detail">improvement: -0.0093</div><div class="lineage-detail">result: held previous champion</div>
</div>
<div class="lineage-node lineage-promoted">
<div class="lineage-title">Promoted overlay</div>
<div class="lineage-sub">final published artifact</div>
<div class="lineage-path">promoted.overlay.md</div>
<div class="lineage-state">present</div>
</div>
</div>
</section>
<section>
<h2>Iteration Candidates</h2>
<div class="sub">Per-iteration view of the explored candidate overlays, their scores, and whether they survived validation and gates.</div>
<div class="iteration-block">
<div class="iteration-head">
<h3>Iteration 1</h3>
<div class="sub">selected candidate: candidate-03 | result: promoted</div>
<div class="sub">gate reasons: none</div>
</div>
<div class="candidate-grid">
<div class="candidate-card">
<div class="candidate-head">
<div class="candidate-title">candidate-00</div>
<div class="badge-row"><span class="badge">train #4</span><span class="badge badge-mode">carry_forward</span><span class="badge badge-validated">validated</span></div>
</div>
<div class="candidate-metrics">
<div><span class="metric-k">Train</span><span class="metric-v">0.4100</span></div>
<div><span class="metric-k">Val</span><span class="metric-v">0.6150</span></div>
<div><span class="metric-k">Core</span><span class="metric-v">0.9550</span></div>
</div>
<div class="candidate-detail">overlay: candidates/iter-1/candidate-00.md</div>
<div class="candidate-detail">exec counts: train 1 | val 1 | core 2</div>
<div class="candidate-detail">errors: train 0/0 | val 0/0 | core 0/0</div>
</div>
<div class="candidate-card">
<div class="candidate-head">
<div class="candidate-title">candidate-01</div>
<div class="badge-row"><span class="badge">train #3</span><span class="badge badge-mode">claude</span></div>
</div>
<div class="candidate-metrics">
<div><span class="metric-k">Train</span><span class="metric-v">0.7226</span></div>
<div><span class="metric-k">Val</span><span class="metric-v">n/a</span></div>
<div><span class="metric-k">Core</span><span class="metric-v">n/a</span></div>
</div>
<div class="candidate-detail">overlay: candidates/iter-1/candidate-01.md</div>
<div class="candidate-detail">exec counts: train 2 | val n/a | core n/a</div>
<div class="candidate-detail">errors: train 0/0 | val n/a | core n/a</div>
</div>
<div class="candidate-card">
<div class="candidate-head">
<div class="candidate-title">candidate-02</div>
<div class="badge-row"><span class="badge">train #2</span><span class="badge badge-mode">claude</span></div>
</div>
<div class="candidate-metrics">
<div><span class="metric-k">Train</span><span class="metric-v">0.9867</span></div>
<div><span class="metric-k">Val</span><span class="metric-v">n/a</span></div>
<div><span class="metric-k">Core</span><span class="metric-v">n/a</span></div>
</div>
<div class="candidate-detail">overlay: candidates/iter-1/candidate-02.md</div>
<div class="candidate-detail">exec counts: train 3 | val n/a | core n/a</div>
<div class="candidate-detail">errors: train 0/0 | val n/a | core n/a</div>
</div>
<div class="candidate-card">
<div class="candidate-head">
<div class="candidate-title">candidate-03</div>
<div class="badge-row"><span class="badge">train #1</span><span class="badge badge-mode">claude</span><span class="badge badge-validated">validated</span><span class="badge badge-selected">selected</span><span class="badge badge-promoted">promoted</span></div>
</div>
<div class="candidate-metrics">
<div><span class="metric-k">Train</span><span class="metric-v">1.0000</span></div>
<div><span class="metric-k">Val</span><span class="metric-v">0.8250</span></div>
<div><span class="metric-k">Core</span><span class="metric-v">0.9100</span></div>
</div>
<div class="candidate-detail">overlay: candidates/iter-1/candidate-03.md</div>
<div class="candidate-detail">exec counts: train 3 | val 2 | core 2</div>
<div class="candidate-detail">errors: train 0/0 | val 0/0 | core 0/0</div>
</div>
</div>
</div>
<div class="iteration-block">
<div class="iteration-head">
<h3>Iteration 2</h3>
<div class="sub">selected candidate: candidate-00 | result: held previous champion</div>
<div class="sub">gate reasons: val_improvement_below_threshold: improvement=-0.0093, threshold=0.0100</div>
</div>
<div class="candidate-grid">
<div class="candidate-card">
<div class="candidate-head">
<div class="candidate-title">candidate-00</div>
<div class="badge-row"><span class="badge">train #1</span><span class="badge badge-mode">carry_forward</span><span class="badge badge-validated">validated</span><span class="badge badge-selected">selected</span></div>
</div>
<div class="candidate-metrics">
<div><span class="metric-k">Train</span><span class="metric-v">0.7408</span></div>
<div><span class="metric-k">Val</span><span class="metric-v">0.8157</span></div>
<div><span class="metric-k">Core</span><span class="metric-v">0.9100</span></div>
</div>
<div class="candidate-detail">overlay: candidates/iter-2/candidate-00.md</div>
<div class="candidate-detail">exec counts: train 2 | val 2 | core 2</div>
<div class="candidate-detail">errors: train 0/0 | val 0/0 | core 0/0</div>
</div>
<div class="candidate-card">
<div class="candidate-head">
<div class="candidate-title">candidate-01</div>
<div class="badge-row"><span class="badge">train #3</span><span class="badge badge-mode">fallback_copy</span></div>
</div>
<div class="candidate-metrics">
<div><span class="metric-k">Train</span><span class="metric-v">0.5958</span></div>
<div><span class="metric-k">Val</span><span class="metric-v">n/a</span></div>
<div><span class="metric-k">Core</span><span class="metric-v">n/a</span></div>
</div>
<div class="candidate-detail">overlay: candidates/iter-2/candidate-01.md</div>
<div class="candidate-detail">exec counts: train 2 | val n/a | core n/a</div>
<div class="candidate-detail">errors: train 0/0 | val n/a | core n/a</div>
</div>
<div class="candidate-card">
<div class="candidate-head">
<div class="candidate-title">candidate-02</div>
<div class="badge-row"><span class="badge">train #2</span><span class="badge badge-mode">fallback_copy</span><span class="badge badge-validated">validated</span></div>
</div>
<div class="candidate-metrics">
<div><span class="metric-k">Train</span><span class="metric-v">0.6592</span></div>
<div><span class="metric-k">Val</span><span class="metric-v">0.5850</span></div>
<div><span class="metric-k">Core</span><span class="metric-v">0.9550</span></div>
</div>
<div class="candidate-detail">overlay: candidates/iter-2/candidate-02.md</div>
<div class="candidate-detail">exec counts: train 2 | val 1 | core 2</div>
<div class="candidate-detail">errors: train 0/0 | val 0/0 | core 0/0</div>
</div>
<div class="candidate-card">
<div class="candidate-head">
<div class="candidate-title">candidate-03</div>
<div class="badge-row"><span class="badge">train #4</span><span class="badge badge-mode">claude</span></div>
</div>
<div class="candidate-metrics">
<div><span class="metric-k">Train</span><span class="metric-v">0.2300</span></div>
<div><span class="metric-k">Val</span><span class="metric-v">n/a</span></div>
<div><span class="metric-k">Core</span><span class="metric-v">n/a</span></div>
</div>
<div class="candidate-detail">overlay: candidates/iter-2/candidate-03.md</div>
<div class="candidate-detail">exec counts: train 0 | val n/a | core n/a</div>
<div class="candidate-detail">errors: train 0/0 | val n/a | core n/a</div>
</div>
</div>
</div>
</section>
<section>
<h2>Promotion Decisions</h2>
<table>
<thead>
<tr>
<th>Iteration</th><th>Selected candidate</th><th>Selected val score</th><th>Improvement</th><th>Promoted</th><th>Reasons</th>
</tr>
</thead>
<tbody>
<tr><td>1</td><td>candidate-03</td><td>0.8250</td><td>0.0188</td><td>yes</td><td>none</td></tr>
<tr><td>2</td><td>candidate-00</td><td>0.8157</td><td>-0.0093</td><td>no</td><td>val_improvement_below_threshold: improvement=-0.0093, threshold=0.0100</td></tr>
</tbody>
</table>
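<div class="sub">The two decisions above are consistent with a single validation-improvement gate. The sketch below is illustrative only: the helper name <strong>gate_decision</strong> is hypothetical, and it assumes promotion requires the improvement to be at least the reported threshold of 0.0100 (whether the real comparison is strict or inclusive is not stated in this report).</div>
<pre style="white-space: pre-wrap; overflow-x: auto; background: #0f1720; color: #d9e4ef; border-radius: 8px; padding: 10px; font-size: 12px;">
```python
from decimal import Decimal


def gate_decision(champion_val, candidate_val, threshold="0.0100"):
    """Return (promoted, improvement, reason) for one iteration.

    Hypothetical helper: promote only when the selected candidate beats
    the current champion's val score by at least `threshold`.
    """
    improvement = Decimal(candidate_val) - Decimal(champion_val)
    if improvement >= Decimal(threshold):
        return True, improvement, None
    reason = (
        "val_improvement_below_threshold: "
        f"improvement={improvement}, threshold={threshold}"
    )
    return False, improvement, reason


# Iteration 1: baseline val 0.8062, selected candidate-03 val 0.8250.
print(gate_decision("0.8062", "0.8250"))
# Iteration 2: champion val 0.8250, selected candidate-00 val 0.8157.
print(gate_decision("0.8250", "0.8157"))
```
</pre>
<div class="sub">Decimal strings are used so the differences come out as the same four-decimal figures shown in the table rather than as binary floating-point approximations.</div>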
</section>
<section>
<h2>Token and Cost Breakdown</h2>
<table>
<thead>
<tr>
<th>Role</th><th>Provider</th><th>Model</th><th>Input</th><th>Output</th><th>Cache read</th><th>Cache create</th><th>Cost</th>
</tr>
</thead>
<tbody>
<tr><td>executor</td><td>claude</td><td>glm-4.5-air</td><td>1,370,591</td><td>33,419</td><td>387,357</td><td>0</td><td>$4.7293</td></tr>
<tr><td>executor</td><td>claude</td><td>glm-4.7</td><td>979,440</td><td>158,833</td><td>38,913,792</td><td>0</td><td>$16.9950</td></tr>
<tr><td>judge</td><td>claude</td><td>glm-4.7</td><td>90,158</td><td>18,512</td><td>407,168</td><td>0</td><td>$0.6703</td></tr>
<tr><td>judge</td><td>claude</td><td>sonnet</td><td>n/a</td><td>n/a</td><td>n/a</td><td>n/a</td><td>n/a</td></tr>
<tr><td>mutator</td><td>claude</td><td>glm-4.7</td><td>3,812</td><td>6,206</td><td>113,792</td><td>0</td><td>$0.1387</td></tr>
</tbody>
</table>
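<div class="sub">The rows above can be checked against the headline token cards with a few sums. The sketch below is a reader-side check, not part of the report generator: the helper names are hypothetical, and the cache-share formula (cache read divided by input plus cache read) is an assumption that happens to reproduce the reported 97.35%. With these figures, the headline input, output, and cache-read totals equal the sum of the glm-4.7 rows alone.</div>
<pre style="white-space: pre-wrap; overflow-x: auto; background: #0f1720; color: #d9e4ef; border-radius: 8px; padding: 10px; font-size: 12px;">
```python
# Rows copied from the table above: (role, model, input, output, cache_read).
# The judge/sonnet row is omitted because every field is n/a.
ROWS = [
    ("executor", "glm-4.5-air", 1_370_591, 33_419, 387_357),
    ("executor", "glm-4.7", 979_440, 158_833, 38_913_792),
    ("judge", "glm-4.7", 90_158, 18_512, 407_168),
    ("mutator", "glm-4.7", 3_812, 6_206, 113_792),
]


def totals(rows):
    """Sum input, output, and cache-read tokens over the given rows."""
    return tuple(sum(r[i] for r in rows) for i in (2, 3, 4))


def cache_share(input_tokens, cache_read):
    """Share of input-side tokens served from cache, as a percentage."""
    return 100 * cache_read / (input_tokens + cache_read)


all_rows = totals(ROWS)
glm47_only = totals([r for r in ROWS if r[1] == "glm-4.7"])
print(all_rows)
print(glm47_only)
print(round(cache_share(glm47_only[0], glm47_only[2]), 2))
```
</pre>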
</section>
<section>
<h2>Provider Invocation Breakdown</h2>
<table>
<thead>
<tr>
<th>Provider</th><th>Total</th><th>Mutator</th><th>Executor</th><th>Judge</th><th>Input</th><th>Output</th><th>Cache read</th><th>Cost</th>
</tr>
</thead>
<tbody>
<tr><td>claude</td><td>100</td><td>4</td><td>48</td><td>48</td><td>1,073,410</td><td>183,551</td><td>39,434,752</td><td>$22.5332</td></tr>
</tbody>
</table>
</section>
<section>
<h2>Run Diagnostics</h2>
<table>
<thead>
<tr>
<th>Metric</th><th>Value</th>
</tr>
</thead>
<tbody>
<tr><td>Run started at (UTC)</td><td>2026-03-08T19:20:32.414232+00:00</td></tr>
<tr><td>Run ended at (UTC)</td><td>2026-03-08T21:26:17.197126+00:00</td></tr>
<tr><td>Wall clock duration</td><td>2h 5m 45s</td></tr>
<tr><td>Active runtime (eval + mutator)</td><td>2h 8m 55s</td></tr>
<tr><td>Orchestration overhead</td><td>0s</td></tr>
<tr><td>Eval runtime total</td><td>2h 8m 0s</td></tr>
<tr><td>Mutator runtime total</td><td>55s</td></tr>
<tr><td>Eval tasks with runtime</td><td>48</td></tr>
<tr><td>Mutator calls with duration</td><td>4</td></tr>
<tr><td>Candidate overlays generated</td><td>8</td></tr>
<tr><td>Candidate metadata records</td><td>8</td></tr>
<tr><td>Candidate overlays train-tested</td><td>8</td></tr>
<tr><td>Candidate overlays val/core-tested</td><td>4</td></tr>
<tr><td>Promotion events</td><td>1</td></tr>
</tbody>
</table>
</section>
<section>
<h2>Quality Flags</h2>
<ul><li>missing_cost_usage</li><li>missing_directional_token_usage</li></ul>
</section>
<section>
<h2>Overlay Diff (Initial vs Promoted)</h2>
<div class="sub">Initial: /home/agent/skillpa/reports/opt-booking-20260308-192032/inputs/initial.overlay.md<br/>Promoted: /home/agent/skillpa/reports/opt-booking-20260308-192032/promoted.overlay.md</div>
<div class="sub">Initial lines: 37 | Promoted lines: 64 | +27 / -0</div>
<pre style="white-space: pre-wrap; overflow-x: auto; background: #0f1720; color: #d9e4ef; border-radius: 8px; padding: 10px; font-size: 12px;"><span class="diff-file">--- opt-booking-20260308-192032/inputs/initial.overlay.md</span>
<span class="diff-file">+++ opt-booking-20260308-192032/promoted.overlay.md</span>
<span class="diff-hunk">@@ -30,8 +30,35 @@</span>
<span class="diff-neutral"> - Favor element refs from snapshots (`@e1`) over brittle CSS when possible.</span>
<span class="diff-neutral"> - Use `wait --load networkidle` or `wait --text` for synchronization.</span>
<span class="diff-neutral"> - Always capture final URL via `steel browser get url`.</span>
<span class="diff-add">+- Use `fill` with the positional value argument, e.g. `steel browser fill e16 &quot;Rome&quot; --session booking-search`.</span>
<span class="diff-add">+- Do not use deprecated `--value`; use value as the second positional argument.</span>
<span class="diff-add">+- Use `steel browser screenshot` (not `steel screenshot`) and keep `--session` with each command.</span>
<span class="diff-add">+- Always call `steel browser stop` in success and failure paths.</span>
<span class="diff-neutral"> </span>
<span class="diff-neutral"> ## Output discipline</span>
<span class="diff-neutral"> - Commands must include session lifecycle (`start` and `stop`).</span>
<span class="diff-neutral"> - Final answer must explicitly confirm whether all user constraints were met.</span>
<span class="diff-neutral"> - If constraints cannot be fully met, provide best available alternatives and note the gap.</span>
<span class="diff-add">+</span>
<span class="diff-add">+## Execution contract</span>
<span class="diff-add">+- Every workflow MUST begin with `steel browser start` to establish a session.</span>
<span class="diff-add">+- Every workflow MUST end with `steel browser stop` for cleanup.</span>
<span class="diff-add">+- Session ID must be captured and reused: `steel browser start --name my-session`.</span>
<span class="diff-add">+- Never omit `stop` even if commands fail mid-sequence.</span>
<span class="diff-add">+</span>
<span class="diff-add">+## Failure policy</span>
<span class="diff-add">+- If snapshot yields no matching elements, retry with `wait --load networkidle` then re-snapshot.</span>
<span class="diff-add">+- If a selector fails, fall back to broader selectors or text-based search.</span>
<span class="diff-add">+- If extraction fails after 3 attempts, return partial results with clear limitation notice.</span>
<span class="diff-add">+- Always execute `steel browser stop` regardless of task outcome.</span>
<span class="diff-add">+- Report failures with context: step number, command attempted, and error message.</span>
<span class="diff-add">+</span>
<span class="diff-add">+## Known error map</span>
<span class="diff-add">+- **Missing `--name` on start**: Always use `--name` parameter for session reusability.</span>
<span class="diff-add">+- **Omitted `stop` command**: Never end workflows without cleanup; `stop` is mandatory.</span>
<span class="diff-add">+- **Brittle CSS selectors**: Prefer snapshot refs (`@e1`) over raw CSS/XPath.</span>
<span class="diff-add">+- **No wait before extraction**: Always synchronize with `wait --load networkidle` before scraping.</span>
<span class="diff-add">+- **Hardcoded dates**: Use relative dates or user-provided values; avoid fixed date strings.</span>
<span class="diff-add">+- **Snapshot before page load**: Never snapshot before `wait` confirms page readiness.</span>
<span class="diff-add">+- **Invalid steel commands**: Do not use `click` directly; use `steel browser click` with element refs.</span>
<span class="diff-add">+- **Missing URL capture**: Always `steel browser get url` to preserve traceability.</span></pre>
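<div class="sub">The session lifecycle rules in the promoted overlay (start first, capture the URL, always stop) map naturally onto a try/finally wrapper. The sketch below is a hypothetical illustration: <strong>run_workflow</strong> and its parameters are not part of the report or of the steel CLI, the command strings are taken from the overlay text above, and passing <strong>--session</strong> to <strong>get url</strong> and <strong>stop</strong> is an assumption based on the overlay's "keep --session with each command" rule.</div>
<pre style="white-space: pre-wrap; overflow-x: auto; background: #0f1720; color: #d9e4ef; border-radius: 8px; padding: 10px; font-size: 12px;">
```python
def run_workflow(run, session, steps):
    """Run `steps` between start and stop, stopping even on failure.

    `run` is any callable that executes one argv list, for example
    `lambda argv: subprocess.run(argv, check=True)`.
    """
    run(["steel", "browser", "start", "--name", session])
    try:
        for step in steps:
            run(["steel", "browser", *step, "--session", session])
        # Assumption: `get url` accepts `--session` like `fill` does.
        run(["steel", "browser", "get", "url", "--session", session])
    finally:
        # Assumption: `stop` accepts `--session` like the other commands.
        run(["steel", "browser", "stop", "--session", session])


# Dry run that records commands instead of executing them.
log = []
run_workflow(log.append, "booking-search", [["fill", "e16", "Rome"]])
for argv in log:
    print(" ".join(argv))
```
</pre>
<div class="sub">Injecting the runner keeps the sketch testable without a browser session; the finally block is what guarantees that stop is issued when a step raises mid-sequence.</div>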
</section>
<section>
<h2>Stepwise Overlay Diffs</h2>
<div class="sub">Attempted changes per iteration, diffed against the current champion at that point in the run.</div>
<details class="journey-step">
<summary>Iteration 1: selected candidate-03</summary>
<div class="sub">opt-booking-20260308-192032/inputs/initial.overlay.md -> opt-booking-20260308-192032/candidates/iter-1/candidate-03.md</div>
<div class="sub">selected val 0.8250 | improvement 0.0188 | promoted</div>
<div class="sub">lines: 37 -> 60 | +23 / -0</div>
<pre style="white-space: pre-wrap; overflow-x: auto; background: #0f1720; color: #d9e4ef; border-radius: 8px; padding: 10px; font-size: 12px;"><span class="diff-file">--- opt-booking-20260308-192032/inputs/initial.overlay.md</span>
<span class="diff-file">+++ opt-booking-20260308-192032/candidates/iter-1/candidate-03.md</span>
<span class="diff-hunk">@@ -35,3 +35,26 @@</span>
<span class="diff-neutral"> - Commands must include session lifecycle (`start` and `stop`).</span>
<span class="diff-neutral"> - Final answer must explicitly confirm whether all user constraints were met.</span>
<span class="diff-neutral"> - If constraints cannot be fully met, provide best available alternatives and note the gap.</span>
<span class="diff-add">+</span>
<span class="diff-add">+## Execution contract</span>
<span class="diff-add">+- Every workflow MUST begin with `steel browser start` to establish a session.</span>
<span class="diff-add">+- Every workflow MUST end with `steel browser stop` for cleanup.</span>
<span class="diff-add">+- Session ID must be captured and reused: `steel browser start --name my-session`.</span>
<span class="diff-add">+- Never omit `stop` even if commands fail mid-sequence.</span>
<span class="diff-add">+</span>
<span class="diff-add">+## Failure policy</span>
<span class="diff-add">+- If snapshot yields no matching elements, retry with `wait --load networkidle` then re-snapshot.</span>
<span class="diff-add">+- If a selector fails, fall back to broader selectors or text-based search.</span>
<span class="diff-add">+- If extraction fails after 3 attempts, return partial results with clear limitation notice.</span>
<span class="diff-add">+- Always execute `steel browser stop` regardless of task outcome.</span>
<span class="diff-add">+- Report failures with context: step number, command attempted, and error message.</span>
<span class="diff-add">+</span>
<span class="diff-add">+## Known error map</span>
<span class="diff-add">+- **Missing `--name` on start**: Always use `--name` parameter for session reusability.</span>
<span class="diff-add">+- **Omitted `stop` command**: Never end workflows without cleanup; `stop` is mandatory.</span>
<span class="diff-add">+- **Brittle CSS selectors**: Prefer snapshot refs (`@e1`) over raw CSS/XPath.</span>
<span class="diff-add">+- **No wait before extraction**: Always synchronize with `wait --load networkidle` before scraping.</span>
<span class="diff-add">+- **Hardcoded dates**: Use relative dates or user-provided values; avoid fixed date strings.</span>
<span class="diff-add">+- **Snapshot before page load**: Never snapshot before `wait` confirms page readiness.</span>
<span class="diff-add">+- **Invalid steel commands**: Do not use `click` directly; use `steel browser click` with element refs.</span>
<span class="diff-add">+- **Missing URL capture**: Always `steel browser get url` to preserve traceability.</span></pre>
</details>
<details class="journey-step">
<summary>Iteration 2: selected candidate-00</summary>
<div class="sub">opt-booking-20260308-192032/candidates/iter-1/candidate-03.md -> opt-booking-20260308-192032/candidates/iter-2/candidate-00.md</div>
<div class="sub">selected val 0.8157 | improvement -0.0093 | not promoted | val_improvement_below_threshold: improvement=-0.0093, threshold=0.0100</div>
<div class="sub">lines: 60 -> 60 | +0 / -0</div>
<pre style="white-space: pre-wrap; overflow-x: auto; background: #0f1720; color: #d9e4ef; border-radius: 8px; padding: 10px; font-size: 12px;">No textual changes detected.</pre>
</details>
<details class="journey-step">
<summary>Final champion to promoted overlay</summary>
<div class="sub">opt-booking-20260308-192032/candidates/iter-1/candidate-03.md -> opt-booking-20260308-192032/promoted.overlay.md</div>
<div class="sub">artifact diff</div>
<div class="sub">lines: 60 -> 64 | +4 / -0</div>
<pre style="white-space: pre-wrap; overflow-x: auto; background: #0f1720; color: #d9e4ef; border-radius: 8px; padding: 10px; font-size: 12px;"><span class="diff-file">--- opt-booking-20260308-192032/candidates/iter-1/candidate-03.md</span>
<span class="diff-file">+++ opt-booking-20260308-192032/promoted.overlay.md</span>
<span class="diff-hunk">@@ -30,6 +30,10 @@</span>
<span class="diff-neutral"> - Favor element refs from snapshots (`@e1`) over brittle CSS when possible.</span>
<span class="diff-neutral"> - Use `wait --load networkidle` or `wait --text` for synchronization.</span>
<span class="diff-neutral"> - Always capture final URL via `steel browser get url`.</span>
<span class="diff-add">+- Use `fill` with the positional value argument, e.g. `steel browser fill e16 &quot;Rome&quot; --session booking-search`.</span>
<span class="diff-add">+- Do not use deprecated `--value`; use value as the second positional argument.</span>
<span class="diff-add">+- Use `steel browser screenshot` (not `steel screenshot`) and keep `--session` with each command.</span>
<span class="diff-add">+- Always call `steel browser stop` in success and failure paths.</span>
<span class="diff-neutral"> </span>
<span class="diff-neutral"> ## Output discipline</span>
<span class="diff-neutral"> - Commands must include session lifecycle (`start` and `stop`).</span></pre>
</details>
</section>
<div class="foot">Report schema v1 | Generated: 2026-03-10T01:45:04.380848+00:00</div>
</div>
</body>
</html>