Prepared by: Ibrahim Elsherbini
For: Jason Vanderheyden
Date: February 2026
This brief is intentionally scoped to the two items requested for the paid micro-milestone:
- Electron audio routing: USB mic + system audio into the selected WebRTC transport (LiveKit or Daily).
- Chrome Manifest V3 sync engine: ACRCloud offset matching + drift mitigation in the extension.
Out of scope for this document: vendor pricing analysis, long-term roadmap, and non-MVP product strategy.
Source View
Creator Studio (Electron)
├─ Mic input ───────────────┐
├─ System audio input ──────┼─> Mix path -> WebRTC publish -> LiveKit/Daily room
└─ System audio clean split ┘
|
v
Sync API (backend service) -> ACRCloud (reference ingest + identify)
Viewer (Chrome MV3)
Popup/UI -> Service Worker (orchestration) -> Offscreen Document (persistent audio runtime)
├-> Captures tab audio sample for ACRCloud matching
└-> Subscribes to creator stream from LiveKit/Daily
and plays aligned creator audio
Design principle: transport and sync are decoupled. LiveKit/Daily handles low-latency delivery; synchronization decisions come from the ACRCloud offset plus the local playback clock, not from provider timestamps alone, so the strategy stays the same whether LiveKit or Daily is selected.
- USB microphone (creator voice)
- System audio (game/stadium feed)
System Audio -> Split
├-> Gain(game) -> Mixer -> Out track (creator program mix)
└-> Clean reference channel (no creator voice)
Mic -----------> Gain(mic) ----^
- Broadcast output (mixed): published to LiveKit/Daily as the creator audio program heard by viewers.
- Sync reference output (clean system audio): sent to the backend as the reference for fingerprint matching.
Reference feed use: the clean system-audio split is the reference input for the ACRCloud matching workflow; the exact ingest mode will be validated during M1.
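The split-and-mix routing above can be illustrated on raw sample frames. This is a minimal sketch, assuming mono PCM frames of equal length in the range [-1, 1]; in the Electron app the same graph would more likely be built from Web Audio nodes. `mixProgramFrame`, `RoutedFrame`, and the gain values are hypothetical names for illustration.

```typescript
// Hypothetical helper: builds both outputs from one frame of each input.
// Assumes mono PCM in the range [-1, 1] and equal frame lengths.
interface RoutedFrame {
  programMix: Float32Array;     // mic + game audio, published to viewers
  cleanReference: Float32Array; // untouched system audio for fingerprinting
}

function mixProgramFrame(
  systemAudio: Float32Array,
  mic: Float32Array,
  gameGain: number,
  micGain: number,
): RoutedFrame {
  if (systemAudio.length !== mic.length) {
    throw new Error("frames must have the same length");
  }
  const programMix = new Float32Array(systemAudio.length);
  for (let i = 0; i < systemAudio.length; i++) {
    const mixed = systemAudio[i] * gameGain + mic[i] * micGain;
    // Hard clip to stay in range; a real mixer would use a limiter instead.
    programMix[i] = Math.max(-1, Math.min(1, mixed));
  }
  // Copy the system audio so later processing of the mix cannot touch the reference.
  return { programMix, cleanReference: systemAudio.slice() };
}
```

The key property is that the creator's voice only ever enters `programMix`; the reference channel stays free of it, which is what keeps it usable for fingerprint matching.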
The implementation is provider-agnostic at the architecture level:
- Publisher side: use the selected provider's SDK/API to publish the creator's mixed audio track.
- Subscriber side: the extension subscribes to the provider's audio stream for playback.
- No sync logic is tied to provider internals: switching LiveKit <-> Daily changes the publish/subscribe integration, not the sync algorithm.
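One way to keep that boundary explicit is a small transport interface that the sync side depends on. The sketch below is an assumption about shape, not the LiveKit or Daily API: `CreatorAudioTransport`, `LoopbackTransport`, and `countReceivedSamples` are hypothetical, and the in-memory class stands in for a real SDK adapter.

```typescript
// Hypothetical interface: the only surface the sync engine sees.
// A LiveKit or Daily adapter would wrap the chosen SDK behind it.
interface CreatorAudioTransport {
  readonly provider: string;
  publishProgramMix(trackLabel: string): void;
  subscribeCreatorAudio(onFrame: (frame: Float32Array) => void): void;
}

// In-memory stand-in used here instead of a real provider SDK.
class LoopbackTransport implements CreatorAudioTransport {
  readonly provider = "loopback";
  private listeners: Array<(frame: Float32Array) => void> = [];
  published: string[] = [];

  publishProgramMix(trackLabel: string): void {
    this.published.push(trackLabel);
  }

  subscribeCreatorAudio(onFrame: (frame: Float32Array) => void): void {
    this.listeners.push(onFrame);
  }

  // Test hook: deliver one frame to every subscriber.
  emit(frame: Float32Array): void {
    for (const listener of this.listeners) listener(frame);
  }
}

// Consumer code depends on the interface only, never on provider internals.
function countReceivedSamples(transport: CreatorAudioTransport): () => number {
  let total = 0;
  transport.subscribeCreatorAudio((frame) => {
    total += frame.length;
  });
  return () => total;
}
```

Swapping providers would then mean writing a second adapter class; code like `countReceivedSamples` would not change.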
- Windows: WASAPI loopback for system audio capture.
- macOS: virtual loopback driver (for example BlackHole) required for reliable system audio routing.
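The per-OS capture choice can be isolated in one function so the rest of the routing code does not branch on platform. A minimal sketch, assuming the platform string follows Node's `process.platform` values; `CaptureStrategy` and `pickSystemAudioCapture` are hypothetical, and treating every other platform as unsupported is my assumption, since only Windows and macOS are covered here.

```typescript
type CaptureStrategy =
  | { kind: "wasapi-loopback" }
  | { kind: "virtual-device"; deviceHint: string }
  | { kind: "unsupported"; reason: string };

// `platform` is expected to hold a process.platform value ("win32", "darwin", ...).
function pickSystemAudioCapture(platform: string): CaptureStrategy {
  switch (platform) {
    case "win32":
      return { kind: "wasapi-loopback" };
    case "darwin":
      // BlackHole is the example driver named above; any loopback driver would do.
      return { kind: "virtual-device", deviceHint: "BlackHole" };
    default:
      return { kind: "unsupported", reason: `no capture path defined for ${platform}` };
  }
}
```

On macOS the `deviceHint` would only be a label to search for among input devices; whether the driver is actually installed still has to be checked at setup time.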
- Service Worker: lifecycle orchestration, permission flow, start/stop actions.
- Offscreen Document: persistent runtime for Web Audio + WebRTC playback + sync loop.
- Popup/Content UI: viewer controls and status display.
- Viewer clicks Start.
- Service worker creates/ensures offscreen document.
- Offscreen document captures tab audio sample (before muting native page audio/output).
- Offscreen sends sample to backend; backend calls ACRCloud identify API.
- Backend returns matched offset (play position reference).
- Offscreen subscribes/plays creator stream from LiveKit/Daily aligned to returned position.
- Only after alignment is confirmed, extension mutes native tab audio.
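The ordering constraint in this flow (mute last, and only after alignment) can be made explicit as a linear state machine. This is a sketch under assumptions: the phase names, `nextPhase`, and `isTabMuted` are hypothetical, and dropping back to "idle" on any failed step is my choice for illustration.

```typescript
const STARTUP_PHASES = [
  "idle",
  "starting",         // viewer clicked Start
  "offscreen-ready",  // service worker ensured the offscreen document
  "sample-captured",  // tab audio sampled while native audio is still audible
  "offset-known",     // backend returned the ACRCloud offset
  "creator-aligned",  // creator stream playing at the matched position
  "tab-muted",        // native tab audio muted as the final step
] as const;

type StartupPhase = (typeof STARTUP_PHASES)[number];

// Advances exactly one step. Any failed step returns to "idle", so the tab
// is never left muted without aligned creator audio.
function nextPhase(current: StartupPhase, stepSucceeded: boolean): StartupPhase {
  if (!stepSucceeded) return "idle";
  const index = STARTUP_PHASES.indexOf(current);
  return STARTUP_PHASES[Math.min(index + 1, STARTUP_PHASES.length - 1)];
}

function isTabMuted(phase: StartupPhase): boolean {
  return phase === "tab-muted";
}
```

Because "tab-muted" is only reachable from "creator-aligned", the viewer keeps hearing the native game audio through every earlier step, including failures.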
At each sync point, the extension resolves:
viewer_position = matched_reference_timestamp + play_offset_ms
It then maps viewer_position onto the creator audio timeline and updates the playback cursor accordingly. The offscreen runtime compares the current playback time with the target viewer_position and applies rate nudging for small drift, a re-seek/rebuffer for larger drift, or a hard re-sync when drift is large or the state is unstable.
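The position formula can be written out as a small function. A minimal sketch with two labeled assumptions: the `elapsedMs` term (local time passed since the tab sample was captured) is my addition to account for the identify round trip, and the creator timeline is assumed to share the reference timeline so no extra mapping is applied. `OffsetMatch`, `viewerPositionMs`, and `playbackDriftMs` are hypothetical names.

```typescript
interface OffsetMatch {
  matchedReferenceTimestampMs: number; // reference position where the matched segment starts
  playOffsetMs: number;                // offset inside the matched segment
  sampleCapturedAtMs: number;          // local clock reading when the tab sample was taken (assumption)
}

// viewer_position = matched_reference_timestamp + play_offset_ms,
// advanced by the local time elapsed since the sample was captured.
function viewerPositionMs(match: OffsetMatch, nowMs: number): number {
  const elapsedMs = Math.max(0, nowMs - match.sampleCapturedAtMs);
  return match.matchedReferenceTimestampMs + match.playOffsetMs + elapsedMs;
}

// Positive result: creator audio is ahead of the viewer's game moment.
function playbackDriftMs(creatorPlaybackMs: number, targetViewerMs: number): number {
  return creatorPlaybackMs - targetViewerMs;
}
```

For example, a match at reference 600000 ms with a 12500 ms offset, evaluated 800 ms after the sample was taken, gives a target of 613300 ms; creator playback at 613450 ms would then be 150 ms ahead.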
- Small drift (for example <= ~200 ms): playback-rate nudging (about +/-1% to +/-2% max).
- Medium drift: short micro-seek/rebuffer adjustment.
- Large drift or unstable state: hard re-sync using fresh fingerprint sample.
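The three tiers can be expressed as one decision function. A minimal sketch: the 200 ms nudge limit and the 2% rate cap come from the figures above, while the 20 ms deadband and the 2000 ms seek limit are placeholders I introduced, pending the empirical tuning. `planDriftCorrection` and `DriftAction` are hypothetical names.

```typescript
type DriftAction =
  | { kind: "hold" }
  | { kind: "nudge"; playbackRate: number }
  | { kind: "micro-seek"; seekByMs: number }
  | { kind: "hard-resync" };

// Placeholder thresholds; expected to be tuned during validation.
const DEADBAND_MS = 20;      // assumption: ignore drift this small
const NUDGE_LIMIT_MS = 200;  // small-drift ceiling
const SEEK_LIMIT_MS = 2000;  // assumption: beyond this, re-fingerprint
const MAX_RATE_DELTA = 0.02; // +/-2% playback-rate cap

// driftMs > 0 means creator audio is ahead of the viewer position.
function planDriftCorrection(driftMs: number, stateStable: boolean): DriftAction {
  const magnitude = Math.abs(driftMs);
  if (!stateStable || magnitude > SEEK_LIMIT_MS) return { kind: "hard-resync" };
  if (magnitude <= DEADBAND_MS) return { kind: "hold" };
  if (magnitude <= NUDGE_LIMIT_MS) {
    // Scale the rate change with the drift, reaching the cap at the nudge limit.
    const delta = MAX_RATE_DELTA * (magnitude / NUDGE_LIMIT_MS);
    return { kind: "nudge", playbackRate: driftMs > 0 ? 1 - delta : 1 + delta };
  }
  // Medium drift: jump by the opposite of the drift to land on target.
  return { kind: "micro-seek", seekByMs: -driftMs };
}
```

Scaling the rate with the drift, rather than always using the cap, is a design choice meant to keep small corrections inaudible; a fixed step would also satisfy the tiers described above.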
- Periodic interval (for example every 30–60 seconds).
- Buffering/stall detection (tab audio silence or playback stall).
- Viewer seek/jump events.
- Drift threshold breach.
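The four triggers can be checked in a single place on each tick of the sync loop. A minimal sketch; `SyncSnapshot`, `TriggerConfig`, and `resyncReason` are hypothetical names, and the priority order among triggers is my assumption.

```typescript
type ResyncReason = "periodic" | "stall" | "viewer-seek" | "drift-threshold";

interface SyncSnapshot {
  nowMs: number;
  lastSyncAtMs: number;
  stallDetected: boolean;
  viewerSeeked: boolean;
  absDriftMs: number;
}

interface TriggerConfig {
  intervalMs: number;       // for example 30000 to 60000
  driftThresholdMs: number; // tuned during validation
}

// Returns the highest-priority reason to take a fresh fingerprint sample,
// or null when no trigger applies. The ordering below is an assumption.
function resyncReason(snapshot: SyncSnapshot, config: TriggerConfig): ResyncReason | null {
  if (snapshot.stallDetected) return "stall";
  if (snapshot.viewerSeeked) return "viewer-seek";
  if (snapshot.absDriftMs > config.driftThresholdMs) return "drift-threshold";
  if (snapshot.nowMs - snapshot.lastSyncAtMs >= config.intervalMs) return "periodic";
  return null;
}
```

Returning the reason rather than a boolean makes it easy to log which trigger fired, which should help when tuning the interval and the drift threshold.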
No-match fallback (MVP): coast on last known offset for a short window, retry fingerprint, and expose manual resync control if confidence remains low.
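The fallback can be reduced to a decision on two inputs: how long ago the last good match was, and how many identify attempts have failed since. A minimal sketch; the 10-second coast window and the limit of three failed attempts are placeholders of mine, and `planNoMatchFallback` is a hypothetical name.

```typescript
interface FallbackPlan {
  coastOnLastOffset: boolean; // keep playing at the last known offset
  showManualResync: boolean;  // surface the manual control to the viewer
}

// Placeholder limits; not specified in this brief.
const COAST_WINDOW_MS = 10000;
const MAX_FAILED_ATTEMPTS = 3;

// Fingerprint retries continue in both cases; this only decides whether the
// last offset is still trusted and whether the viewer is offered manual control.
function planNoMatchFallback(msSinceLastMatch: number, failedAttempts: number): FallbackPlan {
  const confidenceLow =
    msSinceLastMatch > COAST_WINDOW_MS || failedAttempts >= MAX_FAILED_ATTEMPTS;
  return {
    coastOnLastOffset: !confidenceLow,
    showManualResync: confidenceLow,
  };
}
```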
If the viewer's network buffers for ~3 seconds:
- Detect stall/silence in viewer tab.
- Immediately pause creator playback (avoid speaking over frozen game).
- Capture fresh tab sample after stream resumes.
- Re-identify offset via ACRCloud.
- Resume creator audio at corrected position (optional short crossfade for smoothness).
Result: playback re-locks to viewer’s current game moment without manual user adjustment.
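The stall scenario can be sketched as a small monitor fed with tab-audio frames. Assumptions are labeled in the code: silence is detected by RMS level, the threshold and frame count are placeholders, and `StallMonitor` is a hypothetical name. A genuinely quiet moment in the game would also trip an RMS check, so a real detector would likely combine it with a playback-stall signal from the page.

```typescript
const SILENCE_RMS = 0.001;    // placeholder silence threshold
const STALL_AFTER_FRAMES = 3; // placeholder: consecutive silent frames before a stall

function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < frame.length; i++) sumSquares += frame[i] * frame[i];
  return Math.sqrt(sumSquares / frame.length);
}

type CreatorPlayback = "playing" | "paused-for-stall" | "awaiting-resync";

class StallMonitor {
  private silentFrames = 0;
  state: CreatorPlayback = "playing";

  // Feed one tab-audio frame; returns the updated creator playback state.
  onTabFrame(frame: Float32Array): CreatorPlayback {
    const silent = rms(frame) < SILENCE_RMS;
    this.silentFrames = silent ? this.silentFrames + 1 : 0;
    if (this.state === "playing" && this.silentFrames >= STALL_AFTER_FRAMES) {
      this.state = "paused-for-stall"; // pause creator audio immediately
    } else if (this.state === "paused-for-stall" && !silent) {
      this.state = "awaiting-resync";  // stream resumed: capture a fresh sample
    }
    return this.state;
  }

  // Call once the fresh ACRCloud offset has been applied.
  onResyncApplied(): void {
    if (this.state === "awaiting-resync") this.state = "playing";
  }
}
```

Creator audio stays paused through "awaiting-resync", so it only resumes once the corrected position is known.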
- System audio capture differs by OS and requires setup validation (especially macOS loopback path).
- Commentary that differs between broadcast providers can reduce fingerprint confidence; validate with real game samples in M1.
- ACRCloud no-match/outage path requires fallback behavior (coast + retry + manual +/- slider).
- Drift thresholds are tuned empirically during validation (network conditions and stream variance).
- Final transport SDK specifics depend on whether LiveKit or Daily is selected before implementation starts.
This brief is intentionally scoped to architecture alignment for the MVP micro-milestone.
If helpful, I can provide a separate technical appendix with implementation pseudocode and validation test cases.
