dual_rater.md

  DUAL RATER PARALLEL FLOW
  ════════════════════════

  For each sample:
                      ┌─────────────────┐
                      │   Sample text   │
                      └────────┬────────┘
                               │
                ┌──────────────┴──────────────┐
                │      ThreadPoolExecutor     │
                │        max_workers=2        │
                └──────────────┬──────────────┘
                      ┌────────┴────────┐
                      ▼                 ▼
           ┌──────────────────┐ ┌──────────────────┐
           │ claude -p        │ │ claude -p        │
           │ --model haiku    │ │ --model opus     │
           │                  │ │                  │
           │ {realness: 4,    │ │ {realness: 3,    │
           │  intent: 5,      │ │  intent: 4,      │
           │  genuine: 4}     │ │  genuine: 4}     │
           └────────┬─────────┘ └─────────┬────────┘
                    │                     │
                    └──────────┬──────────┘
                               ▼
                     ┌───────────────────┐
                     │     COMBINER      │
                     ├───────────────────┤
                     │ avg_realness: 3.5 │
                     │ realness_diff: 1  │
                     │ strong_agree: ✓   │
                     └───────────────────┘
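
The flow in the diagram can be sketched roughly as follows. This is a minimal illustration, not the contents of `scripts/quality_audit_dual.py`: the prompt wording, the function names (`rate_with_cli`, `combine`, `rate_sample`), and the assumption that the CLI replies with bare JSON are all hypothetical. The rule that `strong_agree` means no dimension differs by more than 1 is also a guess, chosen only because it matches the example values in the diagram.

```python
import json
import subprocess
from concurrent.futures import ThreadPoolExecutor

DIMENSIONS = ("realness", "intent", "genuine")
MODELS = ("haiku", "opus")


def rate_with_cli(model, sample_text):
    """Ask one model to score a sample via `claude -p` (hypothetical prompt)."""
    prompt = (
        "Rate the following text from 1 to 5 on realness, intent and genuine. "
        "Reply with a JSON object only.\n\n" + sample_text
    )
    result = subprocess.run(
        ["claude", "-p", "--model", model, prompt],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)  # assumes the reply is bare JSON


def combine(haiku, opus):
    """Merge two score dicts into per-dimension averages and differences."""
    combined = {}
    for dim in DIMENSIONS:
        combined[f"avg_{dim}"] = (haiku[dim] + opus[dim]) / 2
        combined[f"{dim}_diff"] = abs(haiku[dim] - opus[dim])
    # Assumption: "strong agreement" means no dimension differs by more than 1.
    combined["strong_agree"] = all(combined[f"{d}_diff"] <= 1 for d in DIMENSIONS)
    return combined


def rate_sample(sample_text, rater=rate_with_cli):
    """Run both raters concurrently (two worker threads), then combine."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = {m: pool.submit(rater, m, sample_text) for m in MODELS}
        scores = {m: f.result() for m, f in futures.items()}
    return combine(scores["haiku"], scores["opus"])
```

Passing the rater as a parameter keeps the combiner testable without calling the CLI: any callable taking `(model, sample_text)` and returning a score dict can stand in for `rate_with_cli`.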

  At end: Compute Pearson r across all (haiku, opus) pairs
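
For reference, the Pearson step might look like the sketch below, using only the standard library. The function name `pearson_r` and the sample scores are illustrative assumptions, not output from the audit. Whether the script computes one r per dimension or a single r over all scores is not stated above, so this example shows one dimension only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one rater gives constant scores
    return cov / math.sqrt(var_x * var_y)


# Illustrative scores only, not results from the audit.
haiku_realness = [4, 5, 2, 3, 4]
opus_realness = [3, 5, 1, 3, 4]
print(f"realness r = {pearson_r(haiku_realness, opus_realness):.3f}")
```

A value near 1 would suggest the two raters rank samples similarly even if their absolute scores differ, which is why it complements the per-sample `realness_diff`.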

  Run it:
  python scripts/quality_audit_dual.py --limit 50         # test run on 50 samples
  python scripts/quality_audit_dual.py --sample-rate 0.1  # 10% of dataset
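
One plausible way the two flags could be parsed and applied is sketched below. The semantics are assumptions rather than a description of the actual script: `--limit` is taken to cap the number of rated samples, `--sample-rate` to keep each sample with the given probability, and `parse_args` and `select_samples` are hypothetical names.

```python
import argparse
import random


def parse_args(argv=None):
    """Parse the two flags shown in the run commands."""
    parser = argparse.ArgumentParser(description="Dual-rater quality audit")
    parser.add_argument("--limit", type=int, default=None,
                        help="rate at most this many samples")
    parser.add_argument("--sample-rate", type=float, default=1.0,
                        help="fraction of the dataset to rate, e.g. 0.1")
    return parser.parse_args(argv)


def select_samples(samples, limit=None, sample_rate=1.0, seed=0):
    """Pick the subset of samples to rate (assumed semantics for both flags)."""
    rng = random.Random(seed)  # fixed seed keeps the audit subset reproducible
    chosen = [s for s in samples if rng.random() < sample_rate]
    return chosen[:limit] if limit is not None else chosen
```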
bigsnarfdude/dual_rater.md
