DUAL RATER PARALLEL FLOW
════════════════════════
For each sample:
```text
            ┌─────────────────┐
            │   Sample text   │
            └────────┬────────┘
                     │
      ┌──────────────┴──────────────┐
      │     ThreadPoolExecutor      │
      │        max_workers=2        │
      └──────────────┬──────────────┘
         ┌───────────┴───────────┐
         ▼                       ▼
┌──────────────────┐   ┌──────────────────┐
│ claude -p        │   │ claude -p        │
│ --model haiku    │   │ --model opus     │
│                  │   │                  │
│ {realness: 4,    │   │ {realness: 3,    │
│  intent: 5,      │   │  intent: 4,      │
│  genuine: 4}     │   │  genuine: 4}     │
└────────┬─────────┘   └─────────┬────────┘
         │                       │
         └───────────┬───────────┘
                     ▼
           ┌───────────────────┐
           │     COMBINER      │
           ├───────────────────┤
           │ avg_realness: 3.5 │
           │ realness_diff: 1  │
           │ strong_agree: ✓   │
           └───────────────────┘
```
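The per-sample flow in the diagram can be sketched in Python roughly as follows. This is a minimal sketch, not the code in `quality_audit_dual.py`: the helper names (`rate_with_claude`, `rate_both`, `combine`), the prompt text, the assumption that `claude -p` replies with bare JSON, and the agreement rule (no dimension differs by more than 1) are all hypothetical. The rater is passed in as a function so the combiner can be exercised without calling the CLI.

```python
import json
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Tuple

Scores = Dict[str, int]
DIMENSIONS = ("realness", "intent", "genuine")

# Hypothetical prompt: the real script's prompt is not part of this gist.
PROMPT = (
    "Rate the text below from 1 to 5 on realness, intent and genuine. "
    'Reply with JSON only, e.g. {{"realness": 4, "intent": 5, "genuine": 4}}.\n\n{text}'
)


def rate_with_claude(model: str, text: str) -> Scores:
    """Run `claude -p --model <model>` once and parse its reply (assumed to be bare JSON)."""
    result = subprocess.run(
        ["claude", "-p", "--model", model, PROMPT.format(text=text)],
        capture_output=True,
        text=True,
        check=True,
        timeout=300,
    )
    return json.loads(result.stdout)


def rate_both(
    text: str, rater: Callable[[str, str], Scores] = rate_with_claude
) -> Tuple[Scores, Scores]:
    """Submit the haiku and opus ratings to a 2-worker pool so they run concurrently."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        haiku = pool.submit(rater, "haiku", text)
        opus = pool.submit(rater, "opus", text)
        return haiku.result(), opus.result()


def combine(haiku: Scores, opus: Scores, max_diff: int = 1) -> Dict[str, object]:
    """Per dimension: average and absolute difference; plus an overall agreement flag."""
    row: Dict[str, object] = {}
    for dim in DIMENSIONS:
        row[f"avg_{dim}"] = (haiku[dim] + opus[dim]) / 2
        row[f"{dim}_diff"] = abs(haiku[dim] - opus[dim])
    # Assumed rule: strong agreement means no dimension differs by more than max_diff.
    row["strong_agree"] = all(abs(haiku[d] - opus[d]) <= max_diff for d in DIMENSIONS)
    return row


if __name__ == "__main__":
    # Stub rater returning the scores shown in the diagram, so no CLI call is made.
    fake = {
        "haiku": {"realness": 4, "intent": 5, "genuine": 4},
        "opus": {"realness": 3, "intent": 4, "genuine": 4},
    }
    h, o = rate_both("Sample text", rater=lambda model, text: fake[model])
    print(combine(h, o))
    # → {'avg_realness': 3.5, 'realness_diff': 1, 'avg_intent': 4.5, 'intent_diff': 1,
    #    'avg_genuine': 4.0, 'genuine_diff': 0, 'strong_agree': True}
```

Threads are a reasonable fit here because each worker spends almost all of its time waiting on a subprocess, so the GIL is not a bottleneck and two workers are enough for one haiku call and one opus call per sample.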
At the end, compute Pearson r across all (haiku, opus) score pairs.
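Pearson r is the covariance of the two raters' scores divided by the product of their standard deviations, so r = 1 means opus moves in lockstep with haiku. A dependency-free sketch, assuming the scores for one dimension have been collected into two equal-length lists; the name `pearson_r` is hypothetical. On Python 3.10+ the standard library's `statistics.correlation` computes the same quantity.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between paired haiku (xs) and opus (ys) scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A rater that gives every sample the same score has no defined correlation.
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    haiku_realness = [4, 2, 5, 3]  # illustrative scores, not real audit data
    opus_realness = [3, 2, 5, 3]
    print(round(pearson_r(haiku_realness, opus_realness), 3))  # → 0.923
```

Computing r separately per dimension (realness, intent, genuine) shows which criterion the two models agree on least.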
Run it:
```bash
python scripts/quality_audit_dual.py --limit 50          # test
python scripts/quality_audit_dual.py --sample-rate 0.1   # 10% of dataset
```
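The two flags suggest two ways of choosing what to rate: a fixed cap for quick tests and a random fraction of the dataset. The sketch below shows one way such selection could work with `argparse`; the `select_samples` helper, the extra `--seed` flag, and the order of operations (sample the fraction first, then apply the cap) are assumptions, not the documented behaviour of `quality_audit_dual.py`.

```python
import argparse
import random
from typing import List, Optional, Sequence


def select_samples(
    samples: Sequence[str],
    limit: Optional[int] = None,
    sample_rate: Optional[float] = None,
    seed: int = 0,
) -> List[str]:
    """Hypothetical selection logic for --limit / --sample-rate."""
    chosen = list(samples)
    if sample_rate is not None:
        if not 0 < sample_rate <= 1:
            raise ValueError("--sample-rate must be in (0, 1]")
        # At least one sample if any exist; a seeded RNG makes the subset reproducible.
        k = min(len(chosen), max(1, round(len(chosen) * sample_rate)))
        chosen = random.Random(seed).sample(chosen, k)
    if limit is not None:
        # Assumed order: the cap is applied after sampling.
        chosen = chosen[:limit]
    return chosen


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Dual-rater quality audit (sketch)")
    parser.add_argument("--limit", type=int, help="rate at most N samples")
    parser.add_argument("--sample-rate", type=float, help="fraction of the dataset to rate")
    parser.add_argument("--seed", type=int, default=0, help="hypothetical: RNG seed")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args(["--sample-rate", "0.1"])
    rows = [f"sample {i}" for i in range(200)]
    print(len(select_samples(rows, args.limit, args.sample_rate, args.seed)))  # → 20
```

Seeding the sampler means a rerun of a 10% audit rates the same samples, which keeps Pearson r comparable between runs.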
Created January 23, 2026 15:59 (gist bigsnarfdude/4a3536694cf8212be1be146beca37be6)