SportVoice MVP — Micro-Milestone Architecture Brief

Prepared by: Ibrahim Elsherbini
For: Jason Vanderheyden
Date: February 2026

1. Scope of This Brief

This brief is intentionally scoped to the two items requested for the paid micro-milestone:

Electron audio routing: USB mic + system audio into the selected WebRTC transport (LiveKit or Daily).
Chrome Manifest V3 sync engine: ACRCloud offset matching + drift mitigation in the extension.

Out of scope for this document: vendor pricing analysis, long-term roadmap, and non-MVP product strategy.

2. High-Level Architecture

Source View

Creator Studio (Electron)
  ├─ Mic input ───────────────┐
  ├─ System audio input ──────┴─> Mix path -> WebRTC publish -> LiveKit/Daily room
  └─ System audio clean split (no mic; feeds the Sync API below)
                 |
                 v
        Sync API (backend service) -> ACRCloud (reference ingest + identify)

Viewer (Chrome MV3)
  Popup/UI -> Service Worker (orchestration) -> Offscreen Document (persistent audio runtime)
                                              ├-> Captures tab audio sample for ACRCloud matching
                                              └-> Subscribes to creator stream from LiveKit/Daily
                                                   and plays aligned creator audio

Design principle: transport and sync are decoupled. LiveKit/Daily handles low-latency delivery, while synchronization decisions come from the ACRCloud offset and the local playback clock rather than from provider timestamps alone, so the strategy stays the same whether LiveKit or Daily is selected.

3. Electron Studio Audio Routing (LiveKit/Daily)

Inputs

USB microphone (creator voice)
System audio (game/stadium feed)

Internal Audio Graph

System Audio -> Split
                ├-> Gain(game) -> Mixer -> Out track (creator program mix)
                └-> Clean reference channel (no creator voice)
Mic -----------> Gain(mic) ----^

Outputs

Broadcast output (mixed): published to LiveKit/Daily as the creator audio program heard by viewers.
Sync reference output (clean system audio): sent to backend for fingerprint matching reference.

Reference feed use: the clean system-audio split is the reference input for the ACRCloud matching workflow; the exact ingest mode is to be validated during M1.
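
As an illustration of the graph and its two outputs, here is a minimal sketch using Web Audio style calls (createMediaStreamSource, createGain, createMediaStreamDestination). The structural types, the function name buildStudioGraph, and the default gain values are assumptions made for this example; a browser AudioContext is assumed to satisfy the AudioContextLike shape.

```typescript
// Minimal structural types so the sketch type-checks without DOM typings.
// Assumption: a browser AudioContext satisfies AudioContextLike<MediaStream>.
interface GraphNode {
  connect(destination: GraphNode): unknown;
}
interface GainLike extends GraphNode {
  gain: { value: number };
}
interface StreamSink<TStream> extends GraphNode {
  stream: TStream;
}
interface AudioContextLike<TStream> {
  createMediaStreamSource(stream: TStream): GraphNode;
  createGain(): GainLike;
  createMediaStreamDestination(): StreamSink<TStream>;
}

interface StudioOutputs<TStream> {
  programMix: TStream; // mic + game audio, published to viewers
  cleanReference: TStream; // game audio only, sent to the sync backend
}

// Hypothetical helper; gain defaults are placeholders, not tuned values.
function buildStudioGraph<TStream>(
  ctx: AudioContextLike<TStream>,
  micStream: TStream,
  systemStream: TStream,
  gains: { mic: number; game: number } = { mic: 1.0, game: 0.6 }
): StudioOutputs<TStream> {
  const micSource = ctx.createMediaStreamSource(micStream);
  const systemSource = ctx.createMediaStreamSource(systemStream);

  const micGain = ctx.createGain();
  micGain.gain.value = gains.mic;
  const gameGain = ctx.createGain();
  gameGain.gain.value = gains.game;

  const mixer = ctx.createMediaStreamDestination();
  const clean = ctx.createMediaStreamDestination();

  // Program mix: both sources pass through their gain stage into the mixer.
  micSource.connect(micGain);
  micGain.connect(mixer);
  systemSource.connect(gameGain);
  gameGain.connect(mixer);

  // Clean reference: system audio only, tapped before any mic audio is mixed in.
  systemSource.connect(clean);

  return { programMix: mixer.stream, cleanReference: clean.stream };
}
```

The key property is that the clean reference has exactly one upstream source (system audio), so the creator's voice can never leak into the fingerprint reference.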

Provider Integration (Scoped)

Implementation is provider-agnostic at architecture level:

Publisher side: use selected provider SDK/API to publish creator mixed audio track.
Subscriber side: extension subscribes to provider audio stream for playback.
No sync logic tied to provider internals: switching LiveKit <-> Daily changes publish/subscribe integration, not the sync algorithm.
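
One way to keep that boundary explicit is a small adapter interface that the sync code depends on instead of a provider SDK. The sketch below is illustrative only: AudioTransport, LoopbackTransport, and their method names are hypothetical and are not taken from the LiveKit or Daily SDKs; a real adapter would wrap whichever SDK is selected.

```typescript
// Hypothetical adapter boundary: these names are illustrative and are NOT
// taken from the LiveKit or Daily SDKs.
type TrackListener<TTrack> = (track: TTrack) => void;

interface AudioTransport<TTrack> {
  readonly providerName: string;
  publish(roomId: string, track: TTrack): Promise<void>;
  // Resolves to an unsubscribe function.
  subscribe(roomId: string, onTrack: TrackListener<TTrack>): Promise<() => void>;
}

// In-memory stand-in for a real provider adapter, useful for exercising
// sync code without a network connection.
class LoopbackTransport<TTrack> implements AudioTransport<TTrack> {
  readonly providerName = "loopback";
  private listeners = new Map<string, Set<TrackListener<TTrack>>>();

  async publish(roomId: string, track: TTrack): Promise<void> {
    this.listeners.get(roomId)?.forEach((listener) => listener(track));
  }

  async subscribe(roomId: string, onTrack: TrackListener<TTrack>): Promise<() => void> {
    const roomListeners = this.listeners.get(roomId) ?? new Set<TrackListener<TTrack>>();
    roomListeners.add(onTrack);
    this.listeners.set(roomId, roomListeners);
    return () => {
      roomListeners.delete(onTrack);
    };
  }
}
```

With this shape, swapping LiveKit for Daily would mean writing a second class that implements AudioTransport, while the sync engine keeps calling the same two methods.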

OS Capture Assumptions

Windows: WASAPI loopback for system audio capture.
macOS: virtual loopback driver (for example BlackHole) required for reliable system audio routing.

4. MV3 Sync Engine (ACRCloud + Drift Control)

MV3 Component Roles

Service Worker: lifecycle orchestration, permission flow, start/stop actions.
Offscreen Document: persistent runtime for Web Audio + WebRTC playback + sync loop.
Popup/Content UI: viewer controls and status display.

Playback and Sync Sequence

Viewer clicks Start.
Service worker creates/ensures offscreen document.
Offscreen document captures tab audio sample (before muting native page audio/output).
Offscreen sends sample to backend; backend calls ACRCloud identify API.
Backend returns matched offset (play position reference).
Offscreen subscribes/plays creator stream from LiveKit/Daily aligned to returned position.
Only after alignment is confirmed, extension mutes native tab audio.
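
The ordering above can be sketched as a single async function with injected dependencies. Everything here is hypothetical (StartDeps, startSync, and the result names are made up for illustration); in a real extension, ensureOffscreenDocument would likely wrap the chrome.offscreen API and captureTabSample the chrome.tabCapture API, but those details are left out so the sequencing is easy to read.

```typescript
// All dependencies are hypothetical and injected so the ordering can be
// read (and tested) in isolation from Chrome and provider APIs.
interface StartDeps {
  ensureOffscreenDocument(): Promise<void>;
  captureTabSample(): Promise<Uint8Array>;
  identifyOffset(sample: Uint8Array): Promise<{ viewerPositionMs: number } | null>;
  // Resolves true once creator playback is confirmed aligned.
  playCreatorStreamAt(positionMs: number): Promise<boolean>;
  muteTabAudio(): Promise<void>;
}

type StartResult = "synced" | "no-match" | "not-aligned";

async function startSync(deps: StartDeps): Promise<StartResult> {
  await deps.ensureOffscreenDocument();

  // The sample is taken while native tab audio is still audible.
  const sample = await deps.captureTabSample();

  const match = await deps.identifyOffset(sample);
  if (match === null) return "no-match"; // native audio stays on

  const aligned = await deps.playCreatorStreamAt(match.viewerPositionMs);
  if (!aligned) return "not-aligned"; // native audio stays on

  // Muting is the last step and only happens after alignment is confirmed.
  await deps.muteTabAudio();
  return "synced";
}
```

The early returns are the point of the sketch: on a failed match or failed alignment the viewer keeps hearing the native broadcast rather than silence.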

Offset Mapping

At sync points, extension resolves:

viewer_position = matched_reference_timestamp + play_offset_ms

It then maps viewer_position onto the creator audio timeline and updates the playback cursor accordingly. The offscreen runtime compares the current playback time against the target viewer_position and applies the drift policy in Section 5: rate nudging for small drift, a micro-seek/rebuffer for medium drift, and a hard re-sync for large drift.
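
A minimal sketch of that arithmetic follows. It assumes both terms of the formula are expressed in milliseconds on the same reference timeline, and it adds one assumption not stated above: between sync points, the viewer position is extrapolated using the local monotonic clock. The type and function names are hypothetical.

```typescript
interface SyncPoint {
  matchedReferenceTimestampMs: number; // assumed: in ms on the reference timeline
  playOffsetMs: number; // offset returned by the identify call
  capturedAtMs: number; // local monotonic clock reading when the sample was captured
}

// viewer_position at the sync point, advanced by local time elapsed since capture
// (the extrapolation step is an assumption of this sketch).
function viewerPositionMs(sync: SyncPoint, nowMs: number): number {
  const positionAtCapture = sync.matchedReferenceTimestampMs + sync.playOffsetMs;
  return positionAtCapture + (nowMs - sync.capturedAtMs);
}

// Positive result: creator audio is ahead of the viewer. Negative: behind.
function computeDriftMs(creatorPlaybackMs: number, sync: SyncPoint, nowMs: number): number {
  return creatorPlaybackMs - viewerPositionMs(sync, nowMs);
}

const example: SyncPoint = {
  matchedReferenceTimestampMs: 600000,
  playOffsetMs: 4500,
  capturedAtMs: 10000,
};
// Two seconds after capture the viewer is assumed to be at 606500 ms.
const target = viewerPositionMs(example, 12000); // 606500
const drift = computeDriftMs(606650, example, 12000); // 150 (creator ahead)
```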

5. Drift Mitigation and Buffer Recovery

Drift Policy

Small drift (for example <= ~200 ms): playback-rate nudging (about +/-1% to +/-2% max).
Medium drift: short micro-seek/rebuffer adjustment.
Large drift or unstable state: hard re-sync using fresh fingerprint sample.
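
The three tiers can be sketched as a pure decision function. Only the ~200 ms small-drift bound and the +/-2% rate cap come from the policy above; the deadband, the medium-drift bound, and the linear scaling of the rate correction are placeholder assumptions to be tuned during validation. All names are hypothetical.

```typescript
type DriftAction =
  | { kind: "hold" }
  | { kind: "nudge"; playbackRate: number }
  | { kind: "micro-seek"; seekByMs: number }
  | { kind: "hard-resync" };

interface DriftThresholds {
  deadbandMs: number; // below this, leave playback alone (assumed)
  smallMs: number; // upper bound for rate nudging (~200 ms in the policy)
  mediumMs: number; // upper bound for micro-seek (assumed placeholder)
  maxRateDelta: number; // 0.02 corresponds to the +/-2% cap
}

const DEFAULT_THRESHOLDS: DriftThresholds = {
  deadbandMs: 20,
  smallMs: 200,
  mediumMs: 1000,
  maxRateDelta: 0.02,
};

// driftMs > 0 means creator audio is ahead of the viewer; < 0 means behind.
function chooseDriftAction(
  driftMs: number,
  t: DriftThresholds = DEFAULT_THRESHOLDS
): DriftAction {
  const magnitude = Math.abs(driftMs);
  if (magnitude <= t.deadbandMs) return { kind: "hold" };

  if (magnitude <= t.smallMs) {
    // Scale the correction with the error; it reaches maxRateDelta at smallMs.
    const delta = t.maxRateDelta * (magnitude / t.smallMs);
    // Ahead -> play slightly slower; behind -> play slightly faster.
    return { kind: "nudge", playbackRate: driftMs > 0 ? 1 - delta : 1 + delta };
  }

  if (magnitude <= t.mediumMs) {
    // Jump the creator cursor by the opposite of the error.
    return { kind: "micro-seek", seekByMs: -driftMs };
  }

  return { kind: "hard-resync" };
}
```

For example, with these defaults a drift of 100 ms yields a nudge to roughly 0.99x, a drift of -500 ms yields a 500 ms forward micro-seek, and a drift of 3000 ms requests a hard re-sync.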

Re-sync Triggers

Periodic interval (for example every 30–60 seconds).
Buffering/stall detection (tab audio silence or playback stall).
Viewer seek/jump events.
Drift threshold breach.
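
As a rough sketch, the four triggers can be folded into one check that the sync loop calls on each tick. The 45-second interval is simply a value inside the 30–60 second range above, the drift threshold is a placeholder, and the priority order (seek, stall, drift, periodic) is an assumption of this example rather than a stated requirement. All names are hypothetical.

```typescript
type ResyncReason = "viewer-seek" | "stall" | "drift" | "periodic";

interface ResyncInputs {
  nowMs: number;
  lastSyncAtMs: number;
  stallDetected: boolean; // tab-audio silence or playback stall
  viewerSeeked: boolean; // seek/jump event seen since the last sync
  absDriftMs: number; // magnitude of the current drift estimate
}

interface ResyncConfig {
  periodicIntervalMs: number; // within the 30-60 s range
  driftThresholdMs: number; // placeholder; tuned empirically
}

const DEFAULT_RESYNC_CONFIG: ResyncConfig = {
  periodicIntervalMs: 45000,
  driftThresholdMs: 1000,
};

// Returns why a re-sync should start now, or null if none is needed.
function resyncReason(
  input: ResyncInputs,
  config: ResyncConfig = DEFAULT_RESYNC_CONFIG
): ResyncReason | null {
  if (input.viewerSeeked) return "viewer-seek";
  if (input.stallDetected) return "stall";
  if (input.absDriftMs > config.driftThresholdMs) return "drift";
  if (input.nowMs - input.lastSyncAtMs >= config.periodicIntervalMs) return "periodic";
  return null;
}
```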

No-match fallback (MVP): coast on last known offset for a short window, retry fingerprint, and expose manual resync control if confidence remains low.
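
A minimal sketch of that fallback as a pure state update is shown below. The coast window (10 seconds) and the miss limit (3) are placeholder assumptions, since the brief only says "a short window"; the type and function names are hypothetical.

```typescript
interface FallbackPolicy {
  coastWindowMs: number; // how long the last known offset is trusted (placeholder)
  maxMisses: number; // consecutive no-match results before offering manual resync (placeholder)
}

interface MatchState {
  lastMatchAtMs: number | null;
  consecutiveMisses: number;
}

type FallbackDecision = "use-new-offset" | "coast-and-retry" | "offer-manual-resync";

function onIdentifyResult(
  state: MatchState,
  matched: boolean,
  nowMs: number,
  policy: FallbackPolicy = { coastWindowMs: 10000, maxMisses: 3 }
): { state: MatchState; decision: FallbackDecision } {
  if (matched) {
    return {
      state: { lastMatchAtMs: nowMs, consecutiveMisses: 0 },
      decision: "use-new-offset",
    };
  }

  const next: MatchState = {
    lastMatchAtMs: state.lastMatchAtMs,
    consecutiveMisses: state.consecutiveMisses + 1,
  };
  const withinCoastWindow =
    state.lastMatchAtMs !== null && nowMs - state.lastMatchAtMs <= policy.coastWindowMs;

  // Keep coasting only while the last match is recent and misses are few.
  const decision: FallbackDecision =
    withinCoastWindow && next.consecutiveMisses < policy.maxMisses
      ? "coast-and-retry"
      : "offer-manual-resync";

  return { state: next, decision };
}
```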

3-Second Buffer Case (Interview Question)

If viewer network buffers for ~3 seconds:

Detect stall/silence in viewer tab.
Immediately pause creator playback (avoid speaking over frozen game).
Capture fresh tab sample after stream resumes.
Re-identify offset via ACRCloud.
Resume creator audio at corrected position (optional short crossfade for smoothness).

Result: playback re-locks to viewer’s current game moment without manual user adjustment.
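
The buffer case can be sketched as a three-phase state machine whose transitions emit commands for the offscreen runtime. The phase, event, and command names are hypothetical, and the 150 ms crossfade is a placeholder for the optional crossfade mentioned above.

```typescript
type Phase = "playing" | "paused-for-stall" | "reidentifying";

type BufferEvent =
  | { type: "stall-detected" }
  | { type: "tab-audio-resumed" }
  | { type: "offset-resolved"; viewerPositionMs: number };

type Command =
  | { action: "pause-creator" }
  | { action: "capture-and-identify" }
  | { action: "resume-creator-at"; positionMs: number; crossfadeMs: number };

const CROSSFADE_MS = 150; // assumed placeholder

function nextPhase(
  phase: Phase,
  event: BufferEvent
): { phase: Phase; commands: Command[] } {
  if (phase === "playing" && event.type === "stall-detected") {
    // Pause immediately so the creator does not talk over a frozen game.
    return { phase: "paused-for-stall", commands: [{ action: "pause-creator" }] };
  }
  if (phase === "paused-for-stall" && event.type === "tab-audio-resumed") {
    // Fingerprint only after the viewer's stream is playing again.
    return { phase: "reidentifying", commands: [{ action: "capture-and-identify" }] };
  }
  if (phase === "reidentifying" && event.type === "offset-resolved") {
    return {
      phase: "playing",
      commands: [
        {
          action: "resume-creator-at",
          positionMs: event.viewerPositionMs,
          crossfadeMs: CROSSFADE_MS,
        },
      ],
    };
  }
  // Events that do not apply to the current phase are ignored.
  return { phase, commands: [] };
}
```

Keeping the transitions pure makes the ordering easy to verify: creator audio cannot resume unless a stall, a resume, and a fresh offset have been seen in that order.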

6. Risks and Assumptions (MVP)

System audio capture differs by OS and requires setup validation (especially macOS loopback path).
Commentary differences between broadcast providers (the creator's feed vs the viewer's feed) can reduce fingerprint confidence; validate with real game samples in M1.
ACRCloud no-match/outage path requires fallback behavior (coast + retry + manual resync control, for example a +/- offset slider).
Drift thresholds are tuned empirically during validation (network conditions and stream variance).
Final transport SDK specifics depend on whether LiveKit or Daily is selected before implementation starts.

This brief is intentionally scoped to architecture alignment for the MVP micro-milestone.
If helpful, I can provide a separate technical appendix with implementation pseudocode and validation test cases.
