@bigsnarfdude
Created January 23, 2026 15:59
dual_rater.md
  DUAL RATER PARALLEL FLOW
  ════════════════════════

  For each sample:
                      ┌─────────────────┐
                      │   Sample text   │
                      └────────┬────────┘
                               │
                ┌──────────────┴──────────────┐
                │      ThreadPoolExecutor     │
                │        max_workers=2        │
                └──────────────┬──────────────┘
                      ┌────────┴────────┐
                      ▼                 ▼
           ┌──────────────────┐ ┌──────────────────┐
           │ claude -p        │ │ claude -p        │
           │ --model haiku    │ │ --model opus     │
           │                  │ │                  │
           │ {realness: 4,    │ │ {realness: 3,    │
           │  intent: 5,      │ │  intent: 4,      │
           │  genuine: 4}     │ │  genuine: 4}     │
           └────────┬─────────┘ └─────────┬────────┘
                    │                     │
                    └──────────┬──────────┘
                               ▼
                     ┌───────────────────┐
                     │     COMBINER      │
                     ├───────────────────┤
                     │ avg_realness: 3.5 │
                     │ realness_diff: 1  │
                     │ strong_agree: ✓   │
                     └───────────────────┘
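The per-sample flow above can be sketched in Python. This is a minimal illustration, not the code in `quality_audit_dual.py`. The names `claude_rate`, `rate_both`, `combine`, `PROMPT` and `agree_tol` are hypothetical. The prompt wording and the assumption that the model replies with a bare JSON object are also assumptions. The `strong_agree` rule (every dimension differs by at most 1 point) is a guess that happens to match the example in the diagram. The rater is injectable, so the combiner can be exercised without calling the CLI.

```python
import json
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Sequence

Scores = Dict[str, float]
RateFn = Callable[[str, str], Scores]  # (model, text) -> scores
DIMENSIONS = ("realness", "intent", "genuine")

# Hypothetical prompt: the real wording used by quality_audit_dual.py is not shown here.
PROMPT = (
    "Rate the sample below from 1 to 5 on realness, intent and genuine. "
    'Reply with a JSON object only, e.g. {"realness": 4, "intent": 5, "genuine": 4}.\n\n'
)


def claude_rate(model: str, text: str) -> Scores:
    """Ask one model for scores via the Claude Code CLI in print mode (-p).

    Assumes the reply is a bare JSON object; real replies may need more
    forgiving parsing (code fences, extra prose) and retries.
    """
    result = subprocess.run(
        ["claude", "-p", "--model", model, PROMPT + text],
        capture_output=True, text=True, check=True, timeout=300,
    )
    return json.loads(result.stdout)


def rate_both(text: str, rate_fn: RateFn = claude_rate,
              models: Sequence[str] = ("haiku", "opus")) -> Dict[str, Scores]:
    """Run both raters concurrently, one worker per model (max_workers=2)."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = {m: pool.submit(rate_fn, m, text) for m in models}
        return {m: f.result() for m, f in futures.items()}


def combine(a: Scores, b: Scores, agree_tol: float = 1.0) -> dict:
    """Average and diff each dimension.

    Assumption: strong_agree means every dimension differs by <= agree_tol.
    """
    out: dict = {}
    for dim in DIMENSIONS:
        out[f"avg_{dim}"] = (a[dim] + b[dim]) / 2
        out[f"{dim}_diff"] = abs(a[dim] - b[dim])
    out["strong_agree"] = all(out[f"{d}_diff"] <= agree_tol for d in DIMENSIONS)
    return out


if __name__ == "__main__":
    # Stub rater returning the scores from the diagram, so no CLI call is made.
    fake = {"haiku": {"realness": 4, "intent": 5, "genuine": 4},
            "opus": {"realness": 3, "intent": 4, "genuine": 4}}
    scores = rate_both("Sample text", rate_fn=lambda m, t: fake[m])
    c = combine(scores["haiku"], scores["opus"])
    print(c["avg_realness"], c["realness_diff"], c["strong_agree"])  # 3.5 1 True
```

Each rater call is a blocking subprocess, so threads are sufficient here. The two calls overlap, and the wall time per sample is roughly that of the slower model.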

  At the end: compute Pearson r across all (haiku, opus) score pairs
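A sketch of the final agreement step follows. The source does not say whether r is computed per dimension or over pooled scores, so computing one r per dimension is an assumption. `pearson_r` and `agreement_report` are hypothetical names. On Python 3.10+, `statistics.correlation` computes the same quantity. It raises an error instead of returning NaN when one rater's scores are constant.

```python
import math
from typing import Dict, List, Sequence, Tuple

Scores = Dict[str, float]


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between paired ratings (e.g. haiku vs opus realness)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired ratings of equal length")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when a rater gives the same score every time
    return cov / (sx * sy)


def agreement_report(pairs: List[Tuple[Scores, Scores]],
                     dimensions: Sequence[str] = ("realness", "intent", "genuine"),
                     ) -> Dict[str, float]:
    """pairs: one (haiku_scores, opus_scores) tuple per rated sample."""
    return {
        dim: pearson_r([h[dim] for h, _ in pairs], [o[dim] for _, o in pairs])
        for dim in dimensions
    }
```

On a 1 to 5 scale, many ties are likely. A rank correlation such as Spearman's may be a useful companion to Pearson r, although the source only mentions Pearson.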

  Run it:
  python scripts/quality_audit_dual.py --limit 50  # quick test, capped at 50 samples
  python scripts/quality_audit_dual.py --sample-rate 0.1  # rate 10% of the dataset
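The two flags above could be wired up as in the following sketch. Only `--limit` and `--sample-rate` come from the source. How the script actually interprets them is not shown, so the semantics here are assumptions: a seeded random draw for `--sample-rate`, then a cap from `--limit`. `select_samples`, `parse_args` and the `seed` parameter are hypothetical.

```python
import argparse
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def select_samples(samples: Sequence[T], limit: Optional[int] = None,
                   sample_rate: Optional[float] = None, seed: int = 0) -> List[T]:
    """Pick which samples to audit (assumed semantics, see lead-in)."""
    chosen = list(samples)
    if sample_rate is not None:
        if not 0 < sample_rate <= 1:
            raise ValueError("sample_rate must be in (0, 1]")
        # At least one sample when the dataset is non-empty; never more than exist.
        k = min(len(chosen), max(1, round(len(chosen) * sample_rate)))
        chosen = random.Random(seed).sample(chosen, k)
    if limit is not None:
        chosen = chosen[:limit]
    return chosen


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Dual-rater quality audit (sketch)")
    parser.add_argument("--limit", type=int, default=None,
                        help="maximum number of samples to rate")
    parser.add_argument("--sample-rate", type=float, default=None,
                        help="fraction of the dataset to rate, e.g. 0.1 for 10%%")
    return parser.parse_args(argv)
```

A caller would then do something like `args = parse_args()` followed by `batch = select_samples(dataset, args.limit, args.sample_rate)`. Note that argparse exposes `--sample-rate` as `args.sample_rate`. A fixed seed keeps a 10% audit reproducible across runs.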
  
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment