Probe Clusters in str0m

This document outlines a plan for implementing probe clusters in str0m, matching libWebRTC's bandwidth probing approach.

Scope

This plan covers probe clusters for existing media RTX/padding only. Pre-media SSRC 0 probing is out of scope and will be addressed in a separate PR that builds on this infrastructure.

Current State

str0m currently has basic padding support:

  • The pacer sets padding_rate based on min(estimate, desired_bitrate)
  • Padding is sent as RTX packets (either retransmissions of cached packets or pure padding)
  • No concept of probe clusters - padding is continuous for as long as the pacer requests it
  • No coordination with TWCC feedback for probe evaluation

libWebRTC's Probe Cluster Approach

Key Concepts

A probe cluster is a short burst of packets sent at a target bitrate to measure available bandwidth. Key properties:

Parameter                Default value   Purpose
min_probe_duration       15 ms           Minimum time to sustain probe rate
min_probe_packets_sent   5               Minimum packets per cluster
min_probe_delta          2 ms            Minimum time between probe packets
max_probe_bitrate        5 Mbit/s        Cap on probe rate
max_waiting_time         1 second        Wait for TWCC feedback before next probe

Probe Cluster Lifecycle

1. Create a probe cluster with a target bitrate
2. Transition to the Active state when a media packet is sent
3. Send packets at the target rate for min_probe_duration
4. Complete the cluster when both conditions are met (see the sketch below):
   - sent_bytes >= min_bytes (rate × duration)
   - sent_packets >= min_probe_packets_sent
5. Wait for TWCC feedback (up to 1 second)
6. Evaluate results and potentially create the next probe cluster
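
As a rough sketch of the completion check in step 4, using a simplified stand-in for the ProbeCluster type proposed later (names and units are illustrative, not final API):

use std::time::Duration;

struct ProbeCluster {
    target_bps: u64,        // target probe bitrate, bits per second
    min_duration: Duration, // e.g. 15 ms
    min_packets: usize,     // e.g. 5
    sent_bytes: usize,
    sent_packets: usize,
}

impl ProbeCluster {
    /// Bytes required to sustain the target rate for min_duration (rate × duration).
    fn min_bytes(&self) -> usize {
        (self.target_bps as f64 * self.min_duration.as_secs_f64() / 8.0).ceil() as usize
    }

    /// Completion check from step 4: enough bytes AND enough packets.
    fn is_complete(&self) -> bool {
        self.sent_bytes >= self.min_bytes() && self.sent_packets >= self.min_packets
    }
}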

Probing States

┌──────────┐    create cluster    ┌──────────┐
│ Disabled │ ─────────────────────► Inactive │
└──────────┘                      └────┬─────┘
                                       │
                             media sent│
                                       ▼
                                  ┌─────────┐
                                  │  Active │
                                  └────┬────┘
                                       │
                         cluster done  │
                                       ▼
                                  ┌──────────┐
                                  │ Inactive │ (wait for feedback)
                                  └──────────┘

When Probes Are Created

libWebRTC creates probe clusters at specific triggers:

  1. Initial probing - On first media packet (exponential: 3x and 6x initial estimate)
  2. Allocation change - When max allocated bitrate increases
  3. ALR exit - When exiting Application Limited Region
  4. Recovery - After significant bitrate drop
  5. Network state - Based on network state estimator hints

Cluster ID and TWCC Correlation

libWebRTC assigns each probe cluster a unique ID. This ID is NOT sent in the RTP packets themselves. Instead:

  1. The PacedPacketInfo struct carries the cluster ID through the pacing/sending chain
  2. When a packet is sent, its TWCC sequence number is associated with the cluster ID internally
  3. When TWCC feedback arrives, the received sequence numbers are matched to their cluster IDs
  4. This allows the ProbeBitrateEstimator to calculate bitrate estimates per cluster

Sending side:
┌─────────────┐    PacedPacketInfo     ┌─────────────┐
│ ProbeCluster│ ──────────────────────► RTP Packet  │
│  id: 42     │    (cluster_id: 42)    │ twcc_seq: 5 │
└─────────────┘                        └─────────────┘
                                              │
Internal tracking:                            │
  twcc_seq 5 → cluster 42                     │
  twcc_seq 6 → cluster 42                     ▼
  twcc_seq 7 → cluster 42              ┌─────────────┐
                                       │TWCC Feedback│
Feedback arrives:                      │ seq 5,6,7   │
  Match seq 5,6,7 to cluster 42        └─────────────┘
  Calculate probe bitrate for cluster 42
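
A minimal sketch of that internal tracking, assuming a hypothetical ProbeTracker with plain integer sequence numbers; str0m's actual TWCC plumbing will differ:

use std::collections::HashMap;

#[derive(Default)]
struct ProbeTracker {
    /// TWCC sequence number → probe cluster id, recorded at send time.
    seq_to_cluster: HashMap<u64, u32>,
    /// Accumulated (bytes, packets) per cluster id, built up from feedback.
    received: HashMap<u32, (usize, usize)>,
}

impl ProbeTracker {
    /// Called when a probe packet is sent with a given TWCC sequence number.
    fn on_sent(&mut self, twcc_seq: u64, cluster_id: u32) {
        self.seq_to_cluster.insert(twcc_seq, cluster_id);
    }

    /// Called when TWCC feedback reports a sequence number as received.
    fn on_feedback(&mut self, twcc_seq: u64, size_bytes: usize) {
        if let Some(cluster_id) = self.seq_to_cluster.remove(&twcc_seq) {
            let entry = self.received.entry(cluster_id).or_insert((0, 0));
            entry.0 += size_bytes;
            entry.1 += 1;
        }
    }
}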

Application Limited Region (ALR)

ALR (Application Limited Region) is when the sending application isn't requesting enough bandwidth to fill the available network capacity. This is NOT about whether we can send padding - it's about application demand.

Key insight: Padding is driven by desired_bitrate:

padding_rate = min(estimate, desired_bitrate) - current_bitrate

Even with a full RTX cache, padding only happens when the application asks for more bandwidth than it's currently using. If desired_bitrate is low, no padding is requested.
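
The same formula as a small helper, assuming plain bits-per-second values rather than str0m's Bitrate type:

/// padding_rate = min(estimate, desired_bitrate) - current_bitrate, floored at zero.
fn padding_rate_bps(estimate: u64, desired: u64, current: u64) -> u64 {
    // Never pad beyond what the application asked for, even if the estimate is
    // higher, and request nothing if current sending already covers the demand.
    estimate.min(desired).saturating_sub(current)
}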

ALR examples:

  • Video paused → no new frames, low desired_bitrate
  • Screen share with static content → very few frames needed
  • Application reduces quality tier → lower desired_bitrate
  • Audio-only period in video call

In ALR state:

  • Send queue is consistently empty
  • Sending well below estimated capacity
  • But application hasn't requested more bandwidth
  • BWE estimates become stale (no congestion signal)

Why probe when exiting ALR?

When exiting ALR (e.g., video resumes), the application suddenly wants high bitrate again, but the BWE estimate is stale from the idle period. Probing rediscovers actual network capacity.

Detection methods:

  • Track if send queue is consistently empty
  • Compare actual send rate to estimated capacity
  • Check if desired_bitrate < estimate for extended period
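
ALR detection itself is future work, but to illustrate the third method above, a hedged sketch that flags ALR once demand has stayed below the estimate for some window (names and threshold are illustrative only):

use std::time::{Duration, Instant};

struct AlrDetector {
    below_since: Option<Instant>,
    window: Duration, // how long demand must stay below the estimate, e.g. 2 s
}

impl AlrDetector {
    /// Returns true while we consider the sender application limited.
    fn update(&mut self, now: Instant, desired_bps: u64, estimate_bps: u64) -> bool {
        if desired_bps < estimate_bps {
            let since = *self.below_since.get_or_insert(now);
            now.duration_since(since) >= self.window
        } else {
            self.below_since = None;
            false
        }
    }
}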

Proposed Design for str0m

New Types

/// Configuration for probe clusters
pub struct ProbeClusterConfig {
    /// Target bitrate for this probe
    pub target_rate: Bitrate,
    /// Minimum duration to sustain probe
    pub min_duration: Duration,
    /// Minimum packets to send
    pub min_packets: usize,
    /// Minimum time between packets
    pub min_delta: Duration,
    /// Unique cluster ID for TWCC correlation
    pub id: u32,
}

/// State of a probe cluster
pub struct ProbeCluster {
    config: ProbeClusterConfig,
    created_at: Instant,
    started_at: Option<Instant>,
    sent_bytes: usize,
    sent_packets: usize,
}

/// Probe cluster manager
pub struct BitrateProber {
    state: ProbingState,
    clusters: VecDeque<ProbeCluster>,
    next_probe_time: Instant,
    next_cluster_id: u32,
    config: BitrateProberConfig,
}

pub struct BitrateProberConfig {
    pub max_probe_bitrate: Bitrate,           // 5 Mbit/s
    pub min_probe_duration: Duration,          // 15ms
    pub min_probe_packets: usize,              // 5
    pub min_probe_delta: Duration,             // 2ms
    pub max_probe_delay: Duration,             // 10ms (discard if delayed)
    pub cluster_timeout: Duration,             // 5 seconds
}
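
BitrateProber references a ProbingState that is not spelled out above; an assumed shape matching the state diagram:

/// States from the "Probing States" diagram (assumed definition).
enum ProbingState {
    /// No probing wanted; new clusters are discarded.
    Disabled,
    /// Clusters are queued but waiting (for a media packet, or for feedback).
    Inactive,
    /// A cluster is currently being sent at its target rate.
    Active,
}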

Integration Points

1. Pacer Changes

The pacer needs to understand probe clusters:

impl Pacer {
    /// Check if we should send a probe packet now
    fn should_send_probe(&self, now: Instant) -> Option<&ProbeCluster> {
        self.prober.current_cluster(now)
    }
    
    /// Register that a probe packet was sent
    fn probe_sent(&mut self, now: Instant, size: DataSize) {
        self.prober.probe_sent(now, size);
    }
    
    /// Get recommended probe packet size
    fn recommended_probe_size(&self) -> DataSize {
        self.prober.recommended_min_probe_size()
    }
}

2. Probe Controller

A new component to decide WHEN to probe:

pub struct ProbeController {
    config: ProbeControllerConfig,
    
    // State
    max_bitrate: Bitrate,
    estimated_bitrate: Option<Bitrate>,
    last_probe_time: Instant,
    
    // Triggers
    in_alr: bool,
    allocation_changed: bool,
}

impl ProbeController {
    /// Called on BWE estimate update
    fn on_estimate(&mut self, estimate: Bitrate) -> Vec<ProbeClusterConfig>;
    
    /// Called when max allocated bitrate changes  
    fn on_max_allocation(&mut self, rate: Bitrate) -> Vec<ProbeClusterConfig>;
    
    /// Called periodically
    fn process(&mut self, now: Instant) -> Vec<ProbeClusterConfig>;
}
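
As an illustration of the initial exponential probing trigger (3x and 6x the first estimate, per libWebRTC's defaults), a sketch with assumed types and an illustrative helper name:

/// Build the (cluster_id, target_bps) pairs for initial probing, capped at the
/// configured maximum probe bitrate.
fn initial_probes(first_estimate_bps: u64, max_probe_bps: u64, next_id: &mut u32) -> Vec<(u32, u64)> {
    let mut probes = Vec::new();
    for factor in [3u64, 6] {
        let target = (first_estimate_bps * factor).min(max_probe_bps);
        let id = *next_id;
        *next_id += 1;
        probes.push((id, target));
    }
    probes
}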

3. Session Integration

impl Session {
    fn handle_timeout(&mut self, now: Instant) {
        // Check for new probe clusters
        if let Some(bwe) = &mut self.bwe {
            let new_clusters = bwe.probe_controller.process(now);
            for cluster in new_clusters {
                self.pacer.create_probe_cluster(cluster);
            }
        }
        
        // Existing pacer/queue logic...
    }
}

Padding Mechanism (RTX-based)

str0m already prefers RTX payload padding over pure padding. In StreamTx::poll_packet_padding():

// From src/streams/send.rs
if self.padding > MIN_SPURIOUS_PADDING_SIZE {  // 50 bytes
    // Try to find a cached packet to retransmit as padding
    if let Some(pkt) = self.rtx_cache.get_cached_packet_smaller_than(max_size) {
        // Use RTX retransmission (larger, more efficient)
        return Some(NextPacket { kind: NextPacketKind::Resend(...) });
    }
}
// Fall back to blank padding (max 240 bytes)
return Some(NextPacket { kind: NextPacketKind::Blank(...) });

This means:

  • For padding requests > 50 bytes: Try RTX first (can be ~1200 bytes)
  • Fall back to pure padding only if no suitable cached packet exists
  • Pure padding limited to 240 bytes per packet

Probe clusters will use this existing mechanism - they just control WHEN and HOW MUCH padding is requested.
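
A sketch of the "how much" part: the padding an active cluster still owes at a given instant, i.e. what the target rate says should have been sent minus what actually has been. Names follow the earlier ProbeCluster sketch and remain illustrative:

use std::time::Instant;

/// Padding bytes still needed to keep the cluster on its target rate.
fn outstanding_probe_bytes(
    target_bps: u64,
    started_at: Instant,
    sent_bytes: usize,
    now: Instant,
) -> usize {
    let elapsed = now.duration_since(started_at).as_secs_f64();
    let budget = (target_bps as f64 * elapsed / 8.0) as usize;
    budget.saturating_sub(sent_bytes)
}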

Implementation Plan

Phase 1: BitrateProber (Core)

  1. Create src/packet/prober.rs with:
    • ProbeCluster struct
    • BitrateProber struct
    • Probe state machine (Disabled → Inactive → Active)
    • create_cluster(), current_cluster(), probe_sent()
  2. Unit tests for:
    • Cluster lifecycle
    • State transitions
    • Timing constraints

Phase 2: Pacer Integration

  1. Add BitrateProber to Pacer
  2. Modify padding generation to respect probe cluster timing
  3. Track probe packet sends via probe_sent()
  4. Handle recommended_probe_size()

Phase 3: ProbeController

  1. Create src/packet/probe_controller.rs
  2. Implement probe triggers:
    • Initial exponential probing (on first media)
    • Allocation-based probing
  3. Wire into BWE feedback loop

Phase 4: Configuration

  1. Add probe cluster settings to RtcConfig
  2. Expose key tunables:
    • max_probe_bitrate
    • min_probe_duration
    • Enable/disable probing

Future Work (Out of Scope)

  • Pre-media SSRC 0 probing: Will build on this infrastructure in a separate PR
  • ALR detection: Application Limited Region detection for smarter probing
  • Network state estimator: Additional probing hints

Testing Strategy

Unit Tests

  • Probe cluster state machine
  • Timing constraints (min_delta, min_duration)
  • Cluster completion conditions
  • Multiple pending clusters
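
As an example of the first items, a test sketch against the simplified ProbeCluster from the lifecycle section (the API is assumed, not str0m's final one):

#[cfg(test)]
mod tests {
    use super::*;
    use std::time::Duration;

    #[test]
    fn cluster_completes_on_bytes_and_packets() {
        let mut cluster = ProbeCluster {
            target_bps: 1_000_000, // 1 Mbit/s
            min_duration: Duration::from_millis(15),
            min_packets: 5,
            sent_bytes: 0,
            sent_packets: 0,
        };
        assert!(!cluster.is_complete());

        // 1 Mbit/s over 15 ms is 1875 bytes; five packets of ~400 bytes covers it.
        cluster.sent_bytes = 2000;
        cluster.sent_packets = 5;
        assert!(cluster.is_complete());
    }
}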

Integration Tests

  • Probe clusters generate expected packet counts
  • TWCC feedback correlates with probe cluster IDs
  • Probing respects bitrate caps
  • Probing triggers on allocation changes

Performance Tests

  • CPU usage at max probe rate
  • Memory usage with multiple pending clusters
  • Latency impact during probing
