Gist: nibzard/fade1c6d3d099fbc56df0fcdf2ede1f2
Last active: March 10, 2026 01:45
SkillPA optimization report: opt-booking-20260308-192032
| <!doctype html> | |
| <html lang="en"> | |
| <head> | |
| <meta charset="utf-8" /> | |
| <meta name="viewport" content="width=device-width, initial-scale=1" /> | |
| <title>Optimization Report - booking</title> | |
| <style> | |
| :root { | |
| --bg: #f6f7f3; | |
| --surface: #ffffff; | |
| --ink: #14212e; | |
| --muted: #5b6673; | |
| --accent: #126b79; | |
| --line: #d6dbe2; | |
| --ok: #2c7a42; | |
| --bad: #9a2e2e; | |
| } | |
| * { box-sizing: border-box; } | |
| body { | |
| margin: 0; | |
| font-family: 'IBM Plex Sans', 'Segoe UI', sans-serif; | |
| color: var(--ink); | |
| background: linear-gradient(180deg, #f8f9f6 0%, #edf2ee 100%); | |
| } | |
| .wrap { max-width: 1180px; margin: 0 auto; padding: 24px; } | |
| h1 { margin: 0 0 8px; font-size: 30px; } | |
| .sub { color: var(--muted); margin-bottom: 18px; } | |
| .cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 10px; margin-bottom: 18px; } | |
| .card { background: var(--surface); border: 1px solid var(--line); border-radius: 10px; padding: 12px; } | |
| .k { color: var(--muted); font-size: 12px; text-transform: uppercase; letter-spacing: .03em; } | |
| .v { font-size: 24px; font-weight: 650; margin-top: 4px; } | |
| section { background: var(--surface); border: 1px solid var(--line); border-radius: 10px; padding: 14px; margin-bottom: 14px; overflow-x: auto; } | |
| h3 { margin: 0 0 6px; font-size: 15px; } | |
| h2 { margin: 0 0 10px; font-size: 18px; } | |
| table { width: 100%; border-collapse: collapse; font-size: 13px; } | |
| th, td { border-bottom: 1px solid var(--line); padding: 8px 6px; text-align: left; vertical-align: top; } | |
| th { color: var(--muted); font-weight: 600; } | |
| ul { margin: 0; padding-left: 18px; } | |
| .lineage { display: grid; grid-template-columns: repeat(auto-fit, minmax(220px, 1fr)); gap: 10px; } | |
| .lineage-node { border: 1px solid var(--line); border-radius: 10px; padding: 12px; background: linear-gradient(180deg, #ffffff 0%, #f6faf8 100%); } | |
| .lineage-iteration { background: linear-gradient(180deg, #ffffff 0%, #f7fbff 100%); } | |
| .lineage-promoted { background: linear-gradient(180deg, #ffffff 0%, #f6fcf6 100%); } | |
| .lineage-title { font-weight: 700; margin-bottom: 2px; } | |
| .lineage-sub, .lineage-path, .lineage-state, .lineage-detail { font-size: 12px; color: var(--muted); margin-top: 4px; } | |
| .iteration-block { margin-top: 14px; } | |
| .iteration-head { margin-bottom: 8px; } | |
| .candidate-grid { display: grid; grid-template-columns: repeat(auto-fit, minmax(250px, 1fr)); gap: 10px; } | |
| .candidate-card { border: 1px solid var(--line); border-radius: 10px; padding: 12px; background: #fcfcfa; } | |
| .candidate-head { display: flex; justify-content: space-between; gap: 10px; align-items: flex-start; margin-bottom: 8px; } | |
| .candidate-title { font-weight: 700; } | |
| .badge-row { display: flex; flex-wrap: wrap; gap: 6px; justify-content: flex-end; } | |
| .badge { border: 1px solid var(--line); border-radius: 999px; padding: 2px 8px; font-size: 11px; color: var(--muted); background: #f5f7f9; } | |
| .badge-mode { background: #eef7fb; color: #25556f; } | |
| .badge-validated { background: #eef7ef; color: #2b6540; } | |
| .badge-selected { background: #fff4dc; color: #8b5a00; } | |
| .badge-promoted { background: #e8f7e9; color: #21643a; } | |
| .badge-missing { background: #fdecec; color: #8e3434; } | |
| .candidate-metrics { display: grid; grid-template-columns: repeat(3, minmax(0, 1fr)); gap: 6px; margin-bottom: 8px; } | |
| .metric-k { display: block; font-size: 11px; color: var(--muted); text-transform: uppercase; } | |
| .metric-v { display: block; font-size: 15px; font-weight: 650; margin-top: 2px; } | |
| .candidate-detail { font-size: 12px; color: var(--muted); margin-top: 4px; } | |
| .journey-step { border: 1px solid var(--line); border-radius: 10px; padding: 10px 12px; background: #fafcfb; margin-bottom: 10px; } | |
| .journey-step summary { cursor: pointer; font-weight: 650; } | |
| .grid-line { stroke: #e6ebf0; stroke-width: 1; } | |
| .axis-line { stroke: #8b97a5; stroke-width: 1; } | |
| .axis-label { fill: #647182; font-size: 11px; } | |
| .score-line { stroke: var(--accent); stroke-width: 2.3; } | |
| .line-point { fill: #126b79; } | |
| .cost-bar { fill: #2d6b9f; opacity: .82; } | |
| .diff-file { color: #9ec1ff; display: block; } | |
| .diff-hunk { color: #ffd479; display: block; } | |
| .diff-add { color: #65d28d; background: #123321; display: block; } | |
| .diff-del { color: #ff9da1; background: #3b1818; display: block; } | |
| .diff-neutral { color: #d9e4ef; display: block; } | |
| .foot { color: var(--muted); font-size: 12px; margin-top: 16px; } | |
| </style> | |
| </head> | |
| <body> | |
| <div class="wrap"> | |
| <h1>Optimization Report</h1> | |
| <div class="sub">Domain: <strong>booking</strong> | Run generated: 2026-03-08T21:26:17.197126+00:00</div> | |
| <div class="cards"> | |
| <div class="card"><div class="k">Baseline score</div><div class="v">0.8062</div></div> | |
| <div class="card"><div class="k">Final test score</div><div class="v">0.5200</div></div> | |
| <div class="card"><div class="k">Delta (final test vs. baseline val)</div><div class="v">-0.2862</div></div> | |
| <div class="card"><div class="k">Total cost</div><div class="v">$22.5332</div></div> | |
| <div class="card"><div class="k">Wall clock</div><div class="v">2h 5m 45s</div></div> | |
| <div class="card"><div class="k">Total invocations</div><div class="v">100</div></div> | |
| <div class="card"><div class="k">Input tokens</div><div class="v">1,073,410</div></div> | |
| <div class="card"><div class="k">Output tokens</div><div class="v">183,551</div></div> | |
| <div class="card"><div class="k">Cache read tokens</div><div class="v">39,434,752</div></div> | |
| <div class="card"><div class="k">Total tokens (incl cache)</div><div class="v">40,691,713</div></div> | |
| <div class="card"><div class="k">Input from cache</div><div class="v">97.35%</div></div> | |
| </div> | |
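The derived figures in the summary cards can be reproduced from the raw ones. The sketch below is only a cross-check under an assumption: the "Input from cache" formula (cache reads divided by input plus cache reads) is inferred from the numbers, not stated by the report, and the variable names are illustrative.

```python
# Figures copied from the summary cards above.
baseline_val = 0.8062
final_test = 0.5200
input_tokens = 1_073_410
output_tokens = 183_551
cache_read_tokens = 39_434_752

# Delta: final test score minus baseline validation score.
delta = round(final_test - baseline_val, 4)

# Total tokens including cache reads.
total_tokens = input_tokens + output_tokens + cache_read_tokens

# Assumed definition: cache reads as a share of all input-side tokens.
cache_share = cache_read_tokens / (input_tokens + cache_read_tokens)

print(f"delta: {delta:+.4f}")
print(f"total tokens: {total_tokens:,}")
print(f"input from cache: {cache_share:.2%}")
```

Note that the delta compares a test-split score against a validation-split baseline, so it mixes two evaluation sets.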
| <section> | |
| <h2>Score Over Iterations</h2> | |
| <svg width="760" height="240" role="img" aria-label="Score trend chart"><line x1="46" y1="20.0" x2="742" y2="20.0" class="grid-line" /><line x1="46" y1="65.5" x2="742" y2="65.5" class="grid-line" /><line x1="46" y1="111.0" x2="742" y2="111.0" class="grid-line" /><line x1="46" y1="156.5" x2="742" y2="156.5" class="grid-line" /><line x1="46" y1="202.0" x2="742" y2="202.0" class="grid-line" /><line x1="46" y1="202.0" x2="742" y2="202.0" class="axis-line" /><line x1="46" y1="20" x2="46" y2="202.0" class="axis-line" /><polyline fill="none" points="46.0,31.2 278.0,20.0 510.0,25.6 742.0,202.0" class="score-line" /><circle cx="46.0" cy="31.2" r="3.5" class="line-point"><title>baseline-val: 0.8062</title></circle><circle cx="278.0" cy="20.0" r="3.5" class="line-point"><title>iter-1-selected-val: 0.8250</title></circle><circle cx="510.0" cy="25.6" r="3.5" class="line-point"><title>iter-2-selected-val: 0.8157</title></circle><circle cx="742.0" cy="202.0" r="3.5" class="line-point"><title>final-test: 0.5200</title></circle><text x="40" y="24.0" text-anchor="end" class="axis-label">0.83</text><text x="40" y="69.5" text-anchor="end" class="axis-label">0.75</text><text x="40" y="115.0" text-anchor="end" class="axis-label">0.67</text><text x="40" y="160.5" text-anchor="end" class="axis-label">0.60</text><text x="40" y="206.0" text-anchor="end" class="axis-label">0.52</text><text x="46.0" y="228" text-anchor="middle" class="axis-label">baseline-val</text><text x="278.0" y="228" text-anchor="middle" class="axis-label">iter-1-selected-val</text><text x="510.0" y="228" text-anchor="middle" class="axis-label">iter-2-selected-val</text><text x="742.0" y="228" text-anchor="middle" class="axis-label">final-test</text></svg> | |
| </section> | |
| <section> | |
| <h2>Cost Over Phases (USD)</h2> | |
| <svg width="760" height="220" role="img" aria-label="Cost chart"><line x1="46" y1="174.0" x2="742" y2="174.0" class="axis-line" /><rect x="51.0" y="66.0" width="23.2" height="108.0" class="cost-bar"><title>baseline-val: 1.6928</title></rect><rect x="84.1" y="171.7" width="23.2" height="2.3" class="cost-bar"><title>baseline-core: 0.0358</title></rect><rect x="117.3" y="167.0" width="23.2" height="7.0" class="cost-bar"><title>iter-1-mutation: 0.1092</title></rect><rect x="150.4" y="126.9" width="23.2" height="47.1" class="cost-bar"><title>iter-1-train-candidate-00: 0.7378</title></rect><rect x="183.5" y="77.8" width="23.2" height="96.2" class="cost-bar"><title>iter-1-train-candidate-01: 1.5082</title></rect><rect x="216.7" y="25.4" width="23.2" height="148.6" class="cost-bar"><title>iter-1-train-candidate-02: 2.3290</title></rect><rect x="249.8" y="16.0" width="23.2" height="158.0" class="cost-bar"><title>iter-1-train-candidate-03: 2.4766</title></rect><rect x="283.0" y="150.1" width="23.2" height="23.9" class="cost-bar"><title>iter-1-val-candidate-00: 0.3748</title></rect><rect x="316.1" y="37.2" width="23.2" height="136.8" class="cost-bar"><title>iter-1-val-candidate-03: 2.1446</title></rect><rect x="349.3" y="172.1" width="23.2" height="1.9" class="cost-bar"><title>iter-1-core-candidate-00: 0.0305</title></rect><rect x="382.4" y="172.2" width="23.2" height="1.8" class="cost-bar"><title>iter-1-core-candidate-03: 0.0278</title></rect><rect x="415.5" y="172.1" width="23.2" height="1.9" class="cost-bar"><title>iter-2-mutation: 0.0295</title></rect><rect x="448.7" y="27.4" width="23.2" height="146.6" class="cost-bar"><title>iter-2-train-candidate-00: 2.2984</title></rect><rect x="481.8" y="53.9" width="23.2" height="120.1" class="cost-bar"><title>iter-2-train-candidate-01: 1.8831</title></rect><rect x="515.0" y="29.0" width="23.2" height="145.0" class="cost-bar"><title>iter-2-train-candidate-02: 2.2729</title></rect><rect x="548.1" y="170.1" width="23.2" height="3.9" 
class="cost-bar"><title>iter-2-train-candidate-03: 0.0615</title></rect><rect x="581.3" y="79.1" width="23.2" height="94.9" class="cost-bar"><title>iter-2-val-candidate-00: 1.4868</title></rect><rect x="614.4" y="127.0" width="23.2" height="47.0" class="cost-bar"><title>iter-2-val-candidate-02: 0.7370</title></rect><rect x="647.5" y="168.9" width="23.2" height="5.1" class="cost-bar"><title>iter-2-core-candidate-00: 0.0805</title></rect><rect x="680.7" y="172.2" width="23.2" height="1.8" class="cost-bar"><title>iter-2-core-candidate-02: 0.0283</title></rect><rect x="713.8" y="34.4" width="23.2" height="139.6" class="cost-bar"><title>final-test: 2.1880</title></rect><text x="62.6" y="204" text-anchor="middle" class="axis-label">baseline-val</text><text x="95.7" y="204" text-anchor="middle" class="axis-label">baseline-core</text><text x="128.9" y="204" text-anchor="middle" class="axis-label">iter-1-mutation</text><text x="162.0" y="204" text-anchor="middle" class="axis-label">iter-1-train-candidate-00</text><text x="195.1" y="204" text-anchor="middle" class="axis-label">iter-1-train-candidate-01</text><text x="228.3" y="204" text-anchor="middle" class="axis-label">iter-1-train-candidate-02</text><text x="261.4" y="204" text-anchor="middle" class="axis-label">iter-1-train-candidate-03</text><text x="294.6" y="204" text-anchor="middle" class="axis-label">iter-1-val-candidate-00</text><text x="327.7" y="204" text-anchor="middle" class="axis-label">iter-1-val-candidate-03</text><text x="360.9" y="204" text-anchor="middle" class="axis-label">iter-1-core-candidate-00</text><text x="394.0" y="204" text-anchor="middle" class="axis-label">iter-1-core-candidate-03</text><text x="427.1" y="204" text-anchor="middle" class="axis-label">iter-2-mutation</text><text x="460.3" y="204" text-anchor="middle" class="axis-label">iter-2-train-candidate-00</text><text x="493.4" y="204" text-anchor="middle" class="axis-label">iter-2-train-candidate-01</text><text x="526.6" y="204" 
text-anchor="middle" class="axis-label">iter-2-train-candidate-02</text><text x="559.7" y="204" text-anchor="middle" class="axis-label">iter-2-train-candidate-03</text><text x="592.9" y="204" text-anchor="middle" class="axis-label">iter-2-val-candidate-00</text><text x="626.0" y="204" text-anchor="middle" class="axis-label">iter-2-val-candidate-02</text><text x="659.1" y="204" text-anchor="middle" class="axis-label">iter-2-core-candidate-00</text><text x="692.3" y="204" text-anchor="middle" class="axis-label">iter-2-core-candidate-02</text><text x="725.4" y="204" text-anchor="middle" class="axis-label">final-test</text></svg> | |
| </section> | |
| <section> | |
| <h2>Overlay Lineage</h2> | |
| <div class="sub">How the champion evolved across iterations, including held-champion steps when gates rejected the selected candidate.</div> | |
| <div class="lineage"> | |
| <div class="lineage-node lineage-baseline"> | |
| <div class="lineage-title">Initial overlay</div> | |
| <div class="lineage-sub">starting champion</div> | |
| <div class="lineage-path">inputs/initial.overlay.md</div> | |
| <div class="lineage-state">present</div> | |
| </div> | |
| <div class="lineage-node lineage-iteration"> | |
| <div class="lineage-title">Iteration 1</div> | |
| <div class="lineage-sub">selected candidate-03</div> | |
| <div class="lineage-path">candidates/iter-1/candidate-03.md</div> | |
| <div class="lineage-state">present</div> | |
| <div class="lineage-detail">selected: candidate-03</div><div class="lineage-detail">champion after gates: iter-1 candidate-03</div><div class="lineage-detail">selected val: 0.8250</div><div class="lineage-detail">improvement: 0.0188</div><div class="lineage-detail">result: promoted</div> | |
| </div> | |
| <div class="lineage-node lineage-iteration"> | |
| <div class="lineage-title">Iteration 2</div> | |
| <div class="lineage-sub">selected candidate-00</div> | |
| <div class="lineage-path">candidates/iter-1/candidate-03.md</div> | |
| <div class="lineage-state">present</div> | |
| <div class="lineage-detail">selected: candidate-00</div><div class="lineage-detail">champion after gates: iter-1 candidate-03</div><div class="lineage-detail">selected val: 0.8157</div><div class="lineage-detail">improvement: -0.0093</div><div class="lineage-detail">result: held previous champion</div> | |
| </div> | |
| <div class="lineage-node lineage-promoted"> | |
| <div class="lineage-title">Promoted overlay</div> | |
| <div class="lineage-sub">final published artifact</div> | |
| <div class="lineage-path">promoted.overlay.md</div> | |
| <div class="lineage-state">present</div> | |
| </div> | |
| </div> | |
| </section> | |
| <section> | |
| <h2>Iteration Candidates</h2> | |
| <div class="sub">Per-iteration view of the explored candidate overlays, their scores, and whether they survived validation and gates.</div> | |
| <div class="iteration-block"> | |
| <div class="iteration-head"> | |
| <h3>Iteration 1</h3> | |
| <div class="sub">selected candidate: candidate-03 | result: promoted</div> | |
| <div class="sub">gate reasons: none</div> | |
| </div> | |
| <div class="candidate-grid"> | |
| <div class="candidate-card"> | |
| <div class="candidate-head"> | |
| <div class="candidate-title">candidate-00</div> | |
| <div class="badge-row"><span class="badge">train #4</span><span class="badge badge-mode">carry_forward</span><span class="badge badge-validated">validated</span></div> | |
| </div> | |
| <div class="candidate-metrics"> | |
| <div><span class="metric-k">Train</span><span class="metric-v">0.4100</span></div> | |
| <div><span class="metric-k">Val</span><span class="metric-v">0.6150</span></div> | |
| <div><span class="metric-k">Core</span><span class="metric-v">0.9550</span></div> | |
| </div> | |
| <div class="candidate-detail">overlay: candidates/iter-1/candidate-00.md</div> | |
| <div class="candidate-detail">exec counts: train 1 | val 1 | core 2</div> | |
| <div class="candidate-detail">errors: train 0/0 | val 0/0 | core 0/0</div> | |
| </div> | |
| <div class="candidate-card"> | |
| <div class="candidate-head"> | |
| <div class="candidate-title">candidate-01</div> | |
| <div class="badge-row"><span class="badge">train #3</span><span class="badge badge-mode">claude</span></div> | |
| </div> | |
| <div class="candidate-metrics"> | |
| <div><span class="metric-k">Train</span><span class="metric-v">0.7226</span></div> | |
| <div><span class="metric-k">Val</span><span class="metric-v">n/a</span></div> | |
| <div><span class="metric-k">Core</span><span class="metric-v">n/a</span></div> | |
| </div> | |
| <div class="candidate-detail">overlay: candidates/iter-1/candidate-01.md</div> | |
| <div class="candidate-detail">exec counts: train 2 | val n/a | core n/a</div> | |
| <div class="candidate-detail">errors: train 0/0 | val n/a/n/a | core n/a/n/a</div> | |
| </div> | |
| <div class="candidate-card"> | |
| <div class="candidate-head"> | |
| <div class="candidate-title">candidate-02</div> | |
| <div class="badge-row"><span class="badge">train #2</span><span class="badge badge-mode">claude</span></div> | |
| </div> | |
| <div class="candidate-metrics"> | |
| <div><span class="metric-k">Train</span><span class="metric-v">0.9867</span></div> | |
| <div><span class="metric-k">Val</span><span class="metric-v">n/a</span></div> | |
| <div><span class="metric-k">Core</span><span class="metric-v">n/a</span></div> | |
| </div> | |
| <div class="candidate-detail">overlay: candidates/iter-1/candidate-02.md</div> | |
| <div class="candidate-detail">exec counts: train 3 | val n/a | core n/a</div> | |
| <div class="candidate-detail">errors: train 0/0 | val n/a/n/a | core n/a/n/a</div> | |
| </div> | |
| <div class="candidate-card"> | |
| <div class="candidate-head"> | |
| <div class="candidate-title">candidate-03</div> | |
| <div class="badge-row"><span class="badge">train #1</span><span class="badge badge-mode">claude</span><span class="badge badge-validated">validated</span><span class="badge badge-selected">selected</span><span class="badge badge-promoted">promoted</span></div> | |
| </div> | |
| <div class="candidate-metrics"> | |
| <div><span class="metric-k">Train</span><span class="metric-v">1.0000</span></div> | |
| <div><span class="metric-k">Val</span><span class="metric-v">0.8250</span></div> | |
| <div><span class="metric-k">Core</span><span class="metric-v">0.9100</span></div> | |
| </div> | |
| <div class="candidate-detail">overlay: candidates/iter-1/candidate-03.md</div> | |
| <div class="candidate-detail">exec counts: train 3 | val 2 | core 2</div> | |
| <div class="candidate-detail">errors: train 0/0 | val 0/0 | core 0/0</div> | |
| </div> | |
| </div> | |
| </div> | |
| <div class="iteration-block"> | |
| <div class="iteration-head"> | |
| <h3>Iteration 2</h3> | |
| <div class="sub">selected candidate: candidate-00 | result: held previous champion</div> | |
| <div class="sub">gate reasons: val_improvement_below_threshold: improvement=-0.0093, threshold=0.0100</div> | |
| </div> | |
| <div class="candidate-grid"> | |
| <div class="candidate-card"> | |
| <div class="candidate-head"> | |
| <div class="candidate-title">candidate-00</div> | |
| <div class="badge-row"><span class="badge">train #1</span><span class="badge badge-mode">carry_forward</span><span class="badge badge-validated">validated</span><span class="badge badge-selected">selected</span></div> | |
| </div> | |
| <div class="candidate-metrics"> | |
| <div><span class="metric-k">Train</span><span class="metric-v">0.7408</span></div> | |
| <div><span class="metric-k">Val</span><span class="metric-v">0.8157</span></div> | |
| <div><span class="metric-k">Core</span><span class="metric-v">0.9100</span></div> | |
| </div> | |
| <div class="candidate-detail">overlay: candidates/iter-2/candidate-00.md</div> | |
| <div class="candidate-detail">exec counts: train 2 | val 2 | core 2</div> | |
| <div class="candidate-detail">errors: train 0/0 | val 0/0 | core 0/0</div> | |
| </div> | |
| <div class="candidate-card"> | |
| <div class="candidate-head"> | |
| <div class="candidate-title">candidate-01</div> | |
| <div class="badge-row"><span class="badge">train #3</span><span class="badge badge-mode">fallback_copy</span></div> | |
| </div> | |
| <div class="candidate-metrics"> | |
| <div><span class="metric-k">Train</span><span class="metric-v">0.5958</span></div> | |
| <div><span class="metric-k">Val</span><span class="metric-v">n/a</span></div> | |
| <div><span class="metric-k">Core</span><span class="metric-v">n/a</span></div> | |
| </div> | |
| <div class="candidate-detail">overlay: candidates/iter-2/candidate-01.md</div> | |
| <div class="candidate-detail">exec counts: train 2 | val n/a | core n/a</div> | |
| <div class="candidate-detail">errors: train 0/0 | val n/a/n/a | core n/a/n/a</div> | |
| </div> | |
| <div class="candidate-card"> | |
| <div class="candidate-head"> | |
| <div class="candidate-title">candidate-02</div> | |
| <div class="badge-row"><span class="badge">train #2</span><span class="badge badge-mode">fallback_copy</span><span class="badge badge-validated">validated</span></div> | |
| </div> | |
| <div class="candidate-metrics"> | |
| <div><span class="metric-k">Train</span><span class="metric-v">0.6592</span></div> | |
| <div><span class="metric-k">Val</span><span class="metric-v">0.5850</span></div> | |
| <div><span class="metric-k">Core</span><span class="metric-v">0.9550</span></div> | |
| </div> | |
| <div class="candidate-detail">overlay: candidates/iter-2/candidate-02.md</div> | |
| <div class="candidate-detail">exec counts: train 2 | val 1 | core 2</div> | |
| <div class="candidate-detail">errors: train 0/0 | val 0/0 | core 0/0</div> | |
| </div> | |
| <div class="candidate-card"> | |
| <div class="candidate-head"> | |
| <div class="candidate-title">candidate-03</div> | |
| <div class="badge-row"><span class="badge">train #4</span><span class="badge badge-mode">claude</span></div> | |
| </div> | |
| <div class="candidate-metrics"> | |
| <div><span class="metric-k">Train</span><span class="metric-v">0.2300</span></div> | |
| <div><span class="metric-k">Val</span><span class="metric-v">n/a</span></div> | |
| <div><span class="metric-k">Core</span><span class="metric-v">n/a</span></div> | |
| </div> | |
| <div class="candidate-detail">overlay: candidates/iter-2/candidate-03.md</div> | |
| <div class="candidate-detail">exec counts: train 0 | val n/a | core n/a</div> | |
| <div class="candidate-detail">errors: train 0/0 | val n/a/n/a | core n/a/n/a</div> | |
| </div> | |
| </div> | |
| </div> | |
| </section> | |
| <section> | |
| <h2>Promotion Decisions</h2> | |
| <table> | |
| <thead> | |
| <tr> | |
| <th>Iteration</th><th>Selected candidate</th><th>Selected val score</th><th>Improvement</th><th>Promoted</th><th>Reasons</th> | |
| </tr> | |
| </thead> | |
| <tbody> | |
| <tr><td>1</td><td>candidate-03</td><td>0.8250</td><td>0.0188</td><td>yes</td><td>none</td></tr> | |
| <tr><td>2</td><td>candidate-00</td><td>0.8157</td><td>-0.0093</td><td>no</td><td>val_improvement_below_threshold: improvement=-0.0093, threshold=0.0100</td></tr> | |
| </tbody> | |
| </table> | |
| </section> | |
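The promotion decisions above are consistent with a simple gate: the selected candidate's validation score is compared with the current champion's, and the candidate is promoted only if the improvement reaches the threshold. The following is a minimal sketch of that reading, not the tool's actual implementation. The names `GateDecision` and `gate` are hypothetical, and whether the real comparison is strict or inclusive at the threshold is not stated in the report.

```python
from dataclasses import dataclass


@dataclass
class GateDecision:
    improvement: float
    promoted: bool
    reason: str


def gate(selected_val: float, champion_val: float, threshold: float = 0.01) -> GateDecision:
    """Promote only if the validation gain over the champion reaches the threshold."""
    improvement = round(selected_val - champion_val, 4)
    if improvement < threshold:  # assumption: inclusive at the threshold
        reason = (
            "val_improvement_below_threshold: "
            f"improvement={improvement:.4f}, threshold={threshold:.4f}"
        )
        return GateDecision(improvement, False, reason)
    return GateDecision(improvement, True, "")


# Replay the two iterations using the scores from the table.
champion_val = 0.8062  # baseline validation score
decisions = []
for iteration, selected_val in [(1, 0.8250), (2, 0.8157)]:
    decision = gate(selected_val, champion_val)
    if decision.promoted:
        champion_val = selected_val  # the selected candidate becomes champion
    decisions.append(decision)
    print(iteration, f"{decision.improvement:+.4f}", decision.promoted, decision.reason)
```

Under this reading, iteration 2 is measured against the iteration 1 champion (0.8250), which is why its improvement is negative even though 0.8157 is above the original baseline.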
| <section> | |
| <h2>Token and Cost Breakdown</h2> | |
| <table> | |
| <thead> | |
| <tr> | |
| <th>Role</th><th>Provider</th><th>Model</th><th>Input</th><th>Output</th><th>Cache read</th><th>Cache create</th><th>Cost</th> | |
| </tr> | |
| </thead> | |
| <tbody> | |
| <tr><td>executor</td><td>claude</td><td>glm-4.5-air</td><td>1,370,591</td><td>33,419</td><td>387,357</td><td>0</td><td>$4.7293</td></tr> | |
| <tr><td>executor</td><td>claude</td><td>glm-4.7</td><td>979,440</td><td>158,833</td><td>38,913,792</td><td>0</td><td>$16.9950</td></tr> | |
| <tr><td>judge</td><td>claude</td><td>glm-4.7</td><td>90,158</td><td>18,512</td><td>407,168</td><td>0</td><td>$0.6703</td></tr> | |
| <tr><td>judge</td><td>claude</td><td>sonnet</td><td>n/a</td><td>n/a</td><td>n/a</td><td>n/a</td><td>n/a</td></tr> | |
| <tr><td>mutator</td><td>claude</td><td>glm-4.7</td><td>3,812</td><td>6,206</td><td>113,792</td><td>0</td><td>$0.1387</td></tr> | |
| </tbody> | |
| </table> | |
| </section> | |
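Summing the rows of this table against the headline totals shows a mismatch worth knowing about: the reported token totals equal the glm-4.7 rows only, while the reported cost equals all rows (to within rounding). The sketch below just performs that reconciliation. The `Row` type is illustrative, and any link to the quality flags listed later in the report is a guess rather than something the report states.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    role: str
    model: str
    input_tokens: int
    output_tokens: int
    cache_read: int
    cost: float


# Rows copied from the table; the judge/sonnet row is all n/a and is omitted.
rows = [
    Row("executor", "glm-4.5-air", 1_370_591, 33_419, 387_357, 4.7293),
    Row("executor", "glm-4.7", 979_440, 158_833, 38_913_792, 16.9950),
    Row("judge", "glm-4.7", 90_158, 18_512, 407_168, 0.6703),
    Row("mutator", "glm-4.7", 3_812, 6_206, 113_792, 0.1387),
]


def totals(selected: List[Row]):
    """Return (input, output, cache_read, cost) summed over the given rows."""
    return (
        sum(r.input_tokens for r in selected),
        sum(r.output_tokens for r in selected),
        sum(r.cache_read for r in selected),
        round(sum(r.cost for r in selected), 4),
    )


all_rows = totals(rows)
glm47_only = totals([r for r in rows if r.model == "glm-4.7"])

print("all rows:    ", all_rows)
print("glm-4.7 only:", glm47_only)
print("reported:    ", (1_073_410, 183_551, 39_434_752, 22.5332))
```

When reading cost per token from this report, it is therefore safer to use the per-row figures than the headline totals.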
| <section> | |
| <h2>Provider Invocation Breakdown</h2> | |
| <table> | |
| <thead> | |
| <tr> | |
| <th>Provider</th><th>Total</th><th>Mutator</th><th>Executor</th><th>Judge</th><th>Input</th><th>Output</th><th>Cache read</th><th>Cost</th> | |
| </tr> | |
| </thead> | |
| <tbody> | |
| <tr><td>claude</td><td>100</td><td>4</td><td>48</td><td>48</td><td>1,073,410</td><td>183,551</td><td>39,434,752</td><td>$22.5332</td></tr> | |
| </tbody> | |
| </table> | |
| </section> | |
| <section> | |
| <h2>Run Diagnostics</h2> | |
| <table> | |
| <thead> | |
| <tr> | |
| <th>Metric</th><th>Value</th> | |
| </tr> | |
| </thead> | |
| <tbody> | |
| <tr><td>Run started at (UTC)</td><td>2026-03-08T19:20:32.414232+00:00</td></tr> | |
| <tr><td>Run ended at (UTC)</td><td>2026-03-08T21:26:17.197126+00:00</td></tr> | |
| <tr><td>Wall clock duration</td><td>2h 5m 45s</td></tr> | |
| <tr><td>Active runtime (eval + mutator)</td><td>2h 8m 55s</td></tr> | |
| <tr><td>Orchestration overhead</td><td>0s</td></tr> | |
| <tr><td>Eval runtime total</td><td>2h 8m 0s</td></tr> | |
| <tr><td>Mutator runtime total</td><td>55s</td></tr> | |
| <tr><td>Eval tasks with runtime</td><td>48</td></tr> | |
| <tr><td>Mutator calls with duration</td><td>4</td></tr> | |
| <tr><td>Candidate overlays generated</td><td>8</td></tr> | |
| <tr><td>Candidate metadata records</td><td>8</td></tr> | |
| <tr><td>Candidate overlays train-tested</td><td>8</td></tr> | |
| <tr><td>Candidate overlays val/core-tested</td><td>4</td></tr> | |
| <tr><td>Promotion events</td><td>1</td></tr> | |
| </tbody> | |
| </table> | |
| </section> | |
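The runtime rows can be cross-checked from the two timestamps. In this run the active runtime (eval plus mutator) is longer than the wall clock, so the reported overhead of 0s looks like a difference clamped at zero. That clamp is an assumption made for this sketch; overlapping evaluation tasks would be another explanation the report does not rule out.

```python
from datetime import datetime, timedelta

# Timestamps and runtimes copied from the diagnostics table.
started = datetime.fromisoformat("2026-03-08T19:20:32.414232+00:00")
ended = datetime.fromisoformat("2026-03-08T21:26:17.197126+00:00")
eval_runtime = timedelta(hours=2, minutes=8)
mutator_runtime = timedelta(seconds=55)

wall = ended - started
active = eval_runtime + mutator_runtime
# Assumed rule: overhead is wall clock minus active runtime, never negative.
overhead = max(wall - active, timedelta(0))


def fmt(td: timedelta) -> str:
    """Format a duration as 'Xh Ym Zs', rounded to whole seconds."""
    total = round(td.total_seconds())
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours}h {minutes}m {seconds}s"


print("wall clock:", fmt(wall))
print("active:    ", fmt(active))
print("overhead:  ", fmt(overhead))
```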
| <section> | |
| <h2>Quality Flags</h2> | |
| <ul><li>missing_cost_usage</li><li>missing_directional_token_usage</li></ul> | |
| </section> | |
| <section> | |
| <h2>Overlay Diff (Initial vs Promoted)</h2> | |
| <div class="sub">Initial: /home/agent/skillpa/reports/opt-booking-20260308-192032/inputs/initial.overlay.md<br/>Promoted: /home/agent/skillpa/reports/opt-booking-20260308-192032/promoted.overlay.md</div> | |
| <div class="sub">Initial lines: 37 | Promoted lines: 64 | +27 / -0</div> | |
| <pre style="white-space: pre-wrap; overflow-x: auto; background: #0f1720; color: #d9e4ef; border-radius: 8px; padding: 10px; font-size: 12px;"><span class="diff-file">--- opt-booking-20260308-192032/inputs/initial.overlay.md</span> | |
| <span class="diff-file">+++ opt-booking-20260308-192032/promoted.overlay.md</span> | |
| <span class="diff-hunk">@@ -30,8 +30,35 @@</span> | |
| <span class="diff-neutral"> - Favor element refs from snapshots (`@e1`) over brittle CSS when possible.</span> | |
| <span class="diff-neutral"> - Use `wait --load networkidle` or `wait --text` for synchronization.</span> | |
| <span class="diff-neutral"> - Always capture final URL via `steel browser get url`.</span> | |
| <span class="diff-add">+- Use `fill` with the positional value argument, e.g. `steel browser fill e16 "Rome" --session booking-search`.</span> | |
| <span class="diff-add">+- Do not use deprecated `--value`; use value as the second positional argument.</span> | |
| <span class="diff-add">+- Use `steel browser screenshot` (not `steel screenshot`) and keep `--session` with each command.</span> | |
| <span class="diff-add">+- Always call `steel browser stop` in success and failure paths.</span> | |
| <span class="diff-neutral"> </span> | |
| <span class="diff-neutral"> ## Output discipline</span> | |
| <span class="diff-neutral"> - Commands must include session lifecycle (`start` and `stop`).</span> | |
| <span class="diff-neutral"> - Final answer must explicitly confirm whether all user constraints were met.</span> | |
| <span class="diff-neutral"> - If constraints cannot be fully met, provide best available alternatives and note the gap.</span> | |
| <span class="diff-add">+</span> | |
| <span class="diff-add">+## Execution contract</span> | |
| <span class="diff-add">+- Every workflow MUST begin with `steel browser start` to establish a session.</span> | |
| <span class="diff-add">+- Every workflow MUST end with `steel browser stop` for cleanup.</span> | |
| <span class="diff-add">+- Session ID must be captured and reused: `steel browser start --name my-session`.</span> | |
| <span class="diff-add">+- Never omit `stop` even if commands fail mid-sequence.</span> | |
| <span class="diff-add">+</span> | |
| <span class="diff-add">+## Failure policy</span> | |
| <span class="diff-add">+- If snapshot yields no matching elements, retry with `wait --load networkidle` then re-snapshot.</span> | |
| <span class="diff-add">+- If a selector fails, fall back to broader selectors or text-based search.</span> | |
| <span class="diff-add">+- If extraction fails after 3 attempts, return partial results with clear limitation notice.</span> | |
| <span class="diff-add">+- Always execute `steel browser stop` regardless of task outcome.</span> | |
| <span class="diff-add">+- Report failures with context: step number, command attempted, and error message.</span> | |
| <span class="diff-add">+</span> | |
| <span class="diff-add">+## Known error map</span> | |
| <span class="diff-add">+- **Missing `--name` on start**: Always use `--name` parameter for session reusability.</span> | |
| <span class="diff-add">+- **Omitted `stop` command**: Never end workflows without cleanup; `stop` is mandatory.</span> | |
| <span class="diff-add">+- **Brittle CSS selectors**: Prefer snapshot refs (`@e1`) over raw CSS/XPath.</span> | |
| <span class="diff-add">+- **No wait before extraction**: Always synchronize with `wait --load networkidle` before scraping.</span> | |
| <span class="diff-add">+- **Hardcoded dates**: Use relative dates or user-provided values; avoid fixed date strings.</span> | |
| <span class="diff-add">+- **Snapshot before page load**: Never snapshot before `wait` confirms page readiness.</span> | |
| <span class="diff-add">+- **Invalid steel commands**: Do not use `click` directly; use `steel browser click` with element refs.</span> | |
| <span class="diff-add">+- **Missing URL capture**: Always `steel browser get url` to preserve traceability.</span></pre> | |
| </section> | |
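The promoted overlay's execution contract (start a named session, pass `--session` on each command, always stop even on failure) maps naturally onto a `try`/`finally`. The sketch below illustrates that shape only. The command strings are taken from the overlay text and have not been checked against the steel CLI; `build_steps`, `run_workflow`, and the injected `runner` are hypothetical helpers, and the demo uses a dry-run runner so nothing is executed.

```python
import subprocess
from typing import Callable, List

Command = List[str]


def build_steps(session: str, ref: str, value: str) -> List[Command]:
    """Commands as quoted in the overlay; flags are unverified assumptions."""
    return [
        ["steel", "browser", "fill", ref, value, "--session", session],
        ["steel", "browser", "get", "url", "--session", session],
    ]


def run_workflow(session: str, steps: List[Command],
                 runner: Callable[[Command], None]) -> None:
    """Start a session, run the steps, and stop the session no matter what."""
    runner(["steel", "browser", "start", "--name", session])
    try:
        for command in steps:
            runner(command)
    finally:
        # Runs on success and on failure; if start itself fails,
        # there is no session to stop and this line is never reached.
        runner(["steel", "browser", "stop", "--session", session])


def real_runner(command: Command) -> None:
    """Runner that would invoke the CLI; raises if a command exits non-zero."""
    subprocess.run(command, check=True)


def dry_runner(command: Command) -> None:
    """Runner that only prints the command, for inspection."""
    print(" ".join(command))


run_workflow("booking-search", build_steps("booking-search", "e16", "Rome"), dry_runner)
```

Injecting the runner keeps the lifecycle logic testable without a browser: a runner that raises on one step can confirm that `stop` is still issued.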
| <section> | |
| <h2>Stepwise Overlay Diffs</h2> | |
| <div class="sub">Attempted changes per iteration, diffed against the current champion at that point in the run.</div> | |
| <details class="journey-step"> | |
| <summary>Iteration 1: selected candidate-03</summary> | |
| <div class="sub">opt-booking-20260308-192032/inputs/initial.overlay.md -> opt-booking-20260308-192032/candidates/iter-1/candidate-03.md</div> | |
| <div class="sub">selected val 0.8250 | improvement 0.0188 | promoted</div> | |
| <div class="sub">lines: 37 -> 60 | +23 / -0</div> | |
| <pre style="white-space: pre-wrap; overflow-x: auto; background: #0f1720; color: #d9e4ef; border-radius: 8px; padding: 10px; font-size: 12px;"><span class="diff-file">--- opt-booking-20260308-192032/inputs/initial.overlay.md</span> | |
| <span class="diff-file">+++ opt-booking-20260308-192032/candidates/iter-1/candidate-03.md</span> | |
| <span class="diff-hunk">@@ -35,3 +35,26 @@</span> | |
| <span class="diff-neutral"> - Commands must include session lifecycle (`start` and `stop`).</span> | |
| <span class="diff-neutral"> - Final answer must explicitly confirm whether all user constraints were met.</span> | |
| <span class="diff-neutral"> - If constraints cannot be fully met, provide best available alternatives and note the gap.</span> | |
| <span class="diff-add">+</span> | |
| <span class="diff-add">+## Execution contract</span> | |
| <span class="diff-add">+- Every workflow MUST begin with `steel browser start` to establish a session.</span> | |
| <span class="diff-add">+- Every workflow MUST end with `steel browser stop` for cleanup.</span> | |
| <span class="diff-add">+- Session ID must be captured and reused: `steel browser start --name my-session`.</span> | |
| <span class="diff-add">+- Never omit `stop` even if commands fail mid-sequence.</span> | |
| <span class="diff-add">+</span> | |
| <span class="diff-add">+## Failure policy</span> | |
| <span class="diff-add">+- If snapshot yields no matching elements, retry with `wait --load networkidle` then re-snapshot.</span> | |
| <span class="diff-add">+- If a selector fails, fall back to broader selectors or text-based search.</span> | |
| <span class="diff-add">+- If extraction fails after 3 attempts, return partial results with clear limitation notice.</span> | |
| <span class="diff-add">+- Always execute `steel browser stop` regardless of task outcome.</span> | |
| <span class="diff-add">+- Report failures with context: step number, command attempted, and error message.</span> | |
| <span class="diff-add">+</span> | |
| <span class="diff-add">+## Known error map</span> | |
| <span class="diff-add">+- **Missing `--name` on start**: Always use `--name` parameter for session reusability.</span> | |
| <span class="diff-add">+- **Omitted `stop` command**: Never end workflows without cleanup; `stop` is mandatory.</span> | |
| <span class="diff-add">+- **Brittle CSS selectors**: Prefer snapshot refs (`@e1`) over raw CSS/XPath.</span> | |
| <span class="diff-add">+- **No wait before extraction**: Always synchronize with `wait --load networkidle` before scraping.</span> | |
| <span class="diff-add">+- **Hardcoded dates**: Use relative dates or user-provided values; avoid fixed date strings.</span> | |
| <span class="diff-add">+- **Snapshot before page load**: Never snapshot before `wait` confirms page readiness.</span> | |
| <span class="diff-add">+- **Invalid steel commands**: Do not use `click` directly; use `steel browser click` with element refs.</span> | |
| <span class="diff-add">+- **Missing URL capture**: Always `steel browser get url` to preserve traceability.</span></pre> | |
| </details> | |
| <details class="journey-step"> | |
| <summary>Iteration 2: selected candidate-00</summary> | |
| <div class="sub">opt-booking-20260308-192032/candidates/iter-1/candidate-03.md -> opt-booking-20260308-192032/candidates/iter-2/candidate-00.md</div> | |
| <div class="sub">selected val 0.8157 | improvement -0.0093 | not promoted | val_improvement_below_threshold: improvement=-0.0093, threshold=0.0100</div> | |
| <div class="sub">lines: 60 -> 60 | +0 / -0</div> | |
| <pre style="white-space: pre-wrap; overflow-x: auto; background: #0f1720; color: #d9e4ef; border-radius: 8px; padding: 10px; font-size: 12px;">No textual changes detected.</pre> | |
| </details> | |
| <details class="journey-step"> | |
| <summary>Final champion to promoted overlay</summary> | |
| <div class="sub">opt-booking-20260308-192032/candidates/iter-1/candidate-03.md -> opt-booking-20260308-192032/promoted.overlay.md</div> | |
| <div class="sub">artifact diff</div> | |
| <div class="sub">lines: 60 -> 64 | +4 / -0</div> | |
| <pre style="white-space: pre-wrap; overflow-x: auto; background: #0f1720; color: #d9e4ef; border-radius: 8px; padding: 10px; font-size: 12px;"><span class="diff-file">--- opt-booking-20260308-192032/candidates/iter-1/candidate-03.md</span> | |
| <span class="diff-file">+++ opt-booking-20260308-192032/promoted.overlay.md</span> | |
| <span class="diff-hunk">@@ -30,6 +30,10 @@</span> | |
| <span class="diff-neutral"> - Favor element refs from snapshots (`@e1`) over brittle CSS when possible.</span> | |
| <span class="diff-neutral"> - Use `wait --load networkidle` or `wait --text` for synchronization.</span> | |
| <span class="diff-neutral"> - Always capture final URL via `steel browser get url`.</span> | |
| <span class="diff-add">+- Use `fill` with the positional value argument, e.g. `steel browser fill e16 "Rome" --session booking-search`.</span> | |
| <span class="diff-add">+- Do not use deprecated `--value`; use value as the second positional argument.</span> | |
| <span class="diff-add">+- Use `steel browser screenshot` (not `steel screenshot`) and keep `--session` with each command.</span> | |
| <span class="diff-add">+- Always call `steel browser stop` in success and failure paths.</span> | |
| <span class="diff-neutral"> </span> | |
| <span class="diff-neutral"> ## Output discipline</span> | |
| <span class="diff-neutral"> - Commands must include session lifecycle (`start` and `stop`).</span></pre> | |
| </details> | |
| </section> | |
| <div class="foot">Report schema v1 | Generated: 2026-03-10T01:45:04.380848+00:00</div> | |
| </div> | |
| </body> | |
| </html> |