This document outlines a plan for implementing probe clusters in str0m, matching libWebRTC's bandwidth probing approach.
This plan covers probe clusters for existing media RTX/padding only. Pre-media SSRC 0 probing is out of scope and will be addressed in a separate PR that builds on this infrastructure.
str0m currently has basic padding support:
- Pacer sets `padding_rate` based on `min(estimate, desired_bitrate)`
- Padding is sent as RTX packets (retransmitting old packets or pure padding)
- No concept of probe clusters - padding is continuous as long as pacer requests it
- No coordination with TWCC feedback for probe evaluation
A probe cluster is a short burst of packets sent at a target bitrate to measure available bandwidth. Key properties:
| Parameter | Default Value | Purpose |
|---|---|---|
| `min_probe_duration` | 15 ms | Minimum time to sustain probe rate |
| `min_probe_packets_sent` | 5 | Minimum packets per cluster |
| `min_probe_delta` | 2 ms | Minimum time between probe packets |
| `max_probe_bitrate` | 5 Mbit/s | Cap on probe rate |
| `max_waiting_time` | 1 second | Wait for TWCC feedback before next probe |
1. Create probe cluster with target bitrate
2. Transition to Active state when media packet is sent
3. Send packets at target rate for min_probe_duration
4. Complete cluster when both conditions met:
   - sent_bytes >= min_bytes (rate × duration)
   - sent_packets >= min_probe_packets_sent
5. Wait for TWCC feedback (up to 1 second)
6. Evaluate results and potentially create next probe cluster
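
The completion rule in step 4 can be sketched as below. This is a minimal illustration, not str0m or libWebRTC code: the function names are hypothetical, and bitrates are plain `u64` bits per second rather than str0m's `Bitrate` type.

```rust
use std::time::Duration;

/// Bytes a cluster must send: the target rate (bits/s) sustained for `min_duration`.
/// Hypothetical helper; integer math in microseconds avoids float rounding.
fn min_probe_bytes(target_bps: u64, min_duration: Duration) -> u64 {
    target_bps * min_duration.as_micros() as u64 / 8 / 1_000_000
}

/// A cluster completes only when BOTH the byte and the packet thresholds are met.
fn cluster_complete(sent_bytes: u64, sent_packets: u32, min_bytes: u64, min_packets: u32) -> bool {
    sent_bytes >= min_bytes && sent_packets >= min_packets
}
```

For example, a 900 kbit/s probe held for 15 ms needs 1687 bytes. Two 1200-byte packets cover the bytes but not the 5-packet minimum, so the cluster stays open until three more packets have been sent.
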

    ┌──────────┐    create cluster    ┌──────────┐
    │ Disabled │ ───────────────────► │ Inactive │
    └──────────┘                      └────┬─────┘
                                           │
                                media sent │
                                           ▼
                                      ┌──────────┐
                                      │  Active  │
                                      └────┬─────┘
                                           │
                              cluster done │
                                           ▼
                                      ┌──────────┐
                                      │ Inactive │ (wait for feedback)
                                      └──────────┘

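
The three transitions in the diagram map onto a small enum. The sketch below assumes hypothetical names (`ProbeEvent`, `next_state`) and models only the edges drawn above. Any other state/event pair leaves the state unchanged, and whether `Inactive → Active` should also require a pending cluster is left open here.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ProbingState {
    Disabled,
    Inactive,
    Active,
}

/// Hypothetical event type; the plan does not name one.
#[derive(Debug, Clone, Copy)]
enum ProbeEvent {
    ClusterCreated,
    MediaSent,
    ClusterDone,
}

/// Apply one event to the current state, following the diagram's edges only.
fn next_state(state: ProbingState, event: ProbeEvent) -> ProbingState {
    match (state, event) {
        (ProbingState::Disabled, ProbeEvent::ClusterCreated) => ProbingState::Inactive,
        (ProbingState::Inactive, ProbeEvent::MediaSent) => ProbingState::Active,
        (ProbingState::Active, ProbeEvent::ClusterDone) => ProbingState::Inactive,
        // Everything else is a no-op in this sketch.
        (unchanged, _) => unchanged,
    }
}
```
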
libWebRTC creates probe clusters at specific triggers:
- Initial probing - On first media packet (exponential: 3x and 6x initial estimate)
- Allocation change - When max allocated bitrate increases
- ALR exit - When exiting Application Limited Region
- Recovery - After significant bitrate drop
- Network state - Based on network state estimator hints
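
As an illustration of the initial trigger, the sketch below derives the two exponential probe rates from an initial estimate. The function name is hypothetical, rates are plain bits per second, and clamping to `max_probe_bitrate` is an assumption based on the "Cap on probe rate" row in the parameter table.

```rust
/// Target rates for the two initial probes: 3x and 6x the initial estimate,
/// each clamped to the configured maximum probe bitrate (assumed behavior).
fn initial_probe_rates(initial_estimate_bps: u64, max_probe_bps: u64) -> [u64; 2] {
    [3u64, 6].map(|scale| (initial_estimate_bps * scale).min(max_probe_bps))
}
```

With a 300 kbit/s initial estimate and the 5 Mbit/s cap this yields probes at 900 kbit/s and 1.8 Mbit/s. With a 1 Mbit/s estimate the second probe is clamped to 5 Mbit/s.
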
libWebRTC assigns each probe cluster a unique ID. This ID is NOT sent in the RTP packets themselves. Instead:
- The `PacedPacketInfo` struct carries the cluster ID through the pacing/sending chain
- When a packet is sent, its TWCC sequence number is associated with the cluster ID internally
- When TWCC feedback arrives, the received sequence numbers are matched to their cluster IDs
- This allows the `ProbeBitrateEstimator` to calculate bitrate estimates per cluster

    Sending side:
    ┌─────────────┐     PacedPacketInfo     ┌─────────────┐
    │ ProbeCluster│ ──────────────────────► │ RTP Packet  │
    │ id: 42      │    (cluster_id: 42)     │ twcc_seq: 5 │
    └─────────────┘                         └─────────────┘
                                                   │
    Internal tracking:                             │
      twcc_seq 5 → cluster 42                      │
      twcc_seq 6 → cluster 42                      ▼
      twcc_seq 7 → cluster 42               ┌─────────────┐
                                            │TWCC Feedback│
    Feedback arrives:                       │ seq 5,6,7   │
      Match seq 5,6,7 to cluster 42         └─────────────┘
      Calculate probe bitrate for cluster 42

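
The internal tracking shown above could look roughly like this. `ProbeTracker` and its methods are hypothetical names, and the rate calculation is a deliberately simplified receive-side estimate (bytes after the first arrival divided by the arrival time span), not libWebRTC's `ProbeBitrateEstimator` logic.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Hypothetical tracker mapping TWCC sequence numbers to probe cluster IDs.
#[derive(Default)]
struct ProbeTracker {
    /// twcc_seq -> cluster id, recorded when the packet is sent.
    seq_to_cluster: HashMap<u64, u32>,
    /// cluster id -> (arrival time since some epoch, size in bytes) per acked packet.
    received: HashMap<u32, Vec<(Duration, usize)>>,
}

impl ProbeTracker {
    fn on_sent(&mut self, twcc_seq: u64, cluster_id: u32) {
        self.seq_to_cluster.insert(twcc_seq, cluster_id);
    }

    /// Returns the cluster the packet belonged to, or None for non-probe packets.
    fn on_feedback(&mut self, twcc_seq: u64, arrival: Duration, size: usize) -> Option<u32> {
        let cluster_id = self.seq_to_cluster.remove(&twcc_seq)?;
        self.received.entry(cluster_id).or_default().push((arrival, size));
        Some(cluster_id)
    }

    /// Simplified receive rate in bits/s; None with fewer than two distinct arrival times.
    fn receive_rate_bps(&self, cluster_id: u32) -> Option<u64> {
        let pkts = self.received.get(&cluster_id)?;
        let first = pkts.iter().min_by_key(|p| p.0)?;
        let last = pkts.iter().max_by_key(|p| p.0)?;
        let interval_us = (last.0 - first.0).as_micros() as u64;
        if interval_us == 0 {
            return None;
        }
        // The first packet only marks the start of the interval, so its bytes are excluded.
        let total: usize = pkts.iter().map(|p| p.1).sum();
        Some((total - first.1) as u64 * 8 * 1_000_000 / interval_us)
    }
}
```
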
ALR (Application Limited Region) is when the sending application isn't requesting enough bandwidth to fill the available network capacity. This is NOT about whether we can send padding - it's about application demand.
Key insight: Padding is driven by desired_bitrate:
padding_rate = min(estimate, desired_bitrate) - current_bitrate
Even with a full RTX cache, padding only happens when the application asks for more bandwidth than it's currently using. If desired_bitrate is low, no padding is requested.
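
A minimal sketch of the formula above, with rates as plain bits per second. The function name is hypothetical, and saturating at zero (rather than going negative when the current rate already exceeds the target) is an assumption.

```rust
/// padding_rate = min(estimate, desired_bitrate) - current_bitrate, floored at zero.
fn padding_rate_bps(estimate_bps: u64, desired_bps: u64, current_bps: u64) -> u64 {
    estimate_bps.min(desired_bps).saturating_sub(current_bps)
}
```

With a 2 Mbit/s estimate and 1 Mbit/s of media, a desired bitrate of 1.5 Mbit/s requests 500 kbit/s of padding, while a desired bitrate of 300 kbit/s requests none.
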
ALR examples:
- Video paused → no new frames, low `desired_bitrate`
- Screen share with static content → very few frames needed
- Application reduces quality tier → lower `desired_bitrate`
- Audio-only period in video call
In ALR state:
- Send queue is consistently empty
- Sending well below estimated capacity
- But application hasn't requested more bandwidth
- BWE estimates become stale (no congestion signal)
Why probe when exiting ALR?
When exiting ALR (e.g., video resumes), the application suddenly wants high bitrate again, but the BWE estimate is stale from the idle period. Probing rediscovers actual network capacity.
Detection methods:
- Track if send queue is consistently empty
- Compare actual send rate to estimated capacity
- Check if `desired_bitrate` < estimate for extended period
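
One way to combine the second and third detection methods is a small time-based detector. Everything here is an assumption for illustration: the `AlrDetector` name, the usage threshold, and the minimum duration are hypothetical, and libWebRTC's own ALR detector works differently.

```rust
use std::time::{Duration, Instant};

/// Hypothetical detector: in ALR once the send rate has stayed below
/// `usage_threshold * estimate` for at least `min_duration`.
struct AlrDetector {
    usage_threshold: f64,
    min_duration: Duration,
    below_since: Option<Instant>,
}

impl AlrDetector {
    fn new(usage_threshold: f64, min_duration: Duration) -> Self {
        Self { usage_threshold, min_duration, below_since: None }
    }

    /// Feed the latest send rate and estimate; returns true while in ALR.
    fn update(&mut self, now: Instant, send_rate_bps: u64, estimate_bps: u64) -> bool {
        let limited = (send_rate_bps as f64) < self.usage_threshold * estimate_bps as f64;
        if !limited {
            // Demand is back: reset the timer. A true -> false edge is the "ALR exit" trigger.
            self.below_since = None;
            return false;
        }
        let since = *self.below_since.get_or_insert(now);
        now.duration_since(since) >= self.min_duration
    }
}
```

A caller would watch for the return value flipping from `true` to `false` and treat that as the ALR exit trigger for a new probe.
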

    /// Configuration for probe clusters
    pub struct ProbeClusterConfig {
        /// Target bitrate for this probe
        pub target_rate: Bitrate,
        /// Minimum duration to sustain probe
        pub min_duration: Duration,
        /// Minimum packets to send
        pub min_packets: usize,
        /// Minimum time between packets
        pub min_delta: Duration,
        /// Unique cluster ID for TWCC correlation
        pub id: u32,
    }

    /// State of a probe cluster
    pub struct ProbeCluster {
        config: ProbeClusterConfig,
        created_at: Instant,
        started_at: Option<Instant>,
        sent_bytes: usize,
        sent_packets: usize,
    }

    /// Probe cluster manager
    pub struct BitrateProber {
        state: ProbingState,
        clusters: VecDeque<ProbeCluster>,
        next_probe_time: Instant,
        next_cluster_id: u32,
        config: BitrateProberConfig,
    }

    pub struct BitrateProberConfig {
        pub max_probe_bitrate: Bitrate,   // 5 Mbit/s
        pub min_probe_duration: Duration, // 15ms
        pub min_probe_packets: usize,     // 5
        pub min_probe_delta: Duration,    // 2ms
        pub max_probe_delay: Duration,    // 10ms (discard if delayed)
        pub cluster_timeout: Duration,    // 5 seconds
    }

The pacer needs to understand probe clusters:

    impl Pacer {
        /// Check if we should send a probe packet now
        fn should_send_probe(&self, now: Instant) -> Option<&ProbeCluster> {
            self.prober.current_cluster(now)
        }

        /// Register that a probe packet was sent
        fn probe_sent(&mut self, now: Instant, size: DataSize) {
            self.prober.probe_sent(now, size);
        }

        /// Get recommended probe packet size
        fn recommended_probe_size(&self) -> DataSize {
            self.prober.recommended_min_probe_size()
        }
    }

A new component to decide WHEN to probe:

    pub struct ProbeController {
        config: ProbeControllerConfig,
        // State
        max_bitrate: Bitrate,
        estimated_bitrate: Option<Bitrate>,
        last_probe_time: Instant,
        // Triggers
        in_alr: bool,
        allocation_changed: bool,
    }

    impl ProbeController {
        /// Called on BWE estimate update
        fn on_estimate(&mut self, estimate: Bitrate) -> Vec<ProbeClusterConfig> {
            todo!() // body not specified in this plan
        }

        /// Called when max allocated bitrate changes
        fn on_max_allocation(&mut self, rate: Bitrate) -> Vec<ProbeClusterConfig> {
            todo!() // body not specified in this plan
        }

        /// Called periodically
        fn process(&mut self, now: Instant) -> Vec<ProbeClusterConfig> {
            todo!() // body not specified in this plan
        }
    }

    impl Session {
        fn handle_timeout(&mut self, now: Instant) {
            // Check for new probe clusters
            if let Some(bwe) = &mut self.bwe {
                let new_clusters = bwe.probe_controller.process(now);
                for cluster in new_clusters {
                    self.pacer.create_probe_cluster(cluster);
                }
            }
            // Existing pacer/queue logic...
        }
    }

str0m already prefers RTX payload padding over pure padding. In `StreamTx::poll_packet_padding()`:

    // From src/streams/send.rs
    if self.padding > MIN_SPURIOUS_PADDING_SIZE { // 50 bytes
        // Try to find a cached packet to retransmit as padding
        if let Some(pkt) = self.rtx_cache.get_cached_packet_smaller_than(max_size) {
            // Use RTX retransmission (larger, more efficient)
            return Some(NextPacket { kind: NextPacketKind::Resend(...) });
        }
    }
    // Fall back to blank padding (max 240 bytes)
    return Some(NextPacket { kind: NextPacketKind::Blank(...) });

This means:
- For padding requests > 50 bytes: Try RTX first (can be ~1200 bytes)
- Fall back to pure padding only if no suitable cached packet exists
- Pure padding limited to 240 bytes per packet
Probe clusters will use this existing mechanism - they just control WHEN and HOW MUCH padding is requested.
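
To make "when" and "how much" concrete, the sketch below shows one possible way to derive both from a cluster's target rate. The function names and formulas are assumptions for illustration and are not taken from str0m or libWebRTC. Rates are plain bits per second and `target_bps` is assumed to be non-zero.

```rust
use std::time::{Duration, Instant};

/// WHEN: the next probe packet is due once the bytes already sent
/// "fit" under the target rate, measured from the cluster's start.
fn next_probe_time(started_at: Instant, sent_bytes: u64, target_bps: u64) -> Instant {
    let micros = sent_bytes * 8 * 1_000_000 / target_bps;
    started_at + Duration::from_micros(micros)
}

/// HOW MUCH: bytes per probe packet so that packets sent at the target rate
/// end up spaced `min_delta` apart.
fn probe_packet_size(target_bps: u64, min_delta: Duration) -> u64 {
    target_bps * min_delta.as_micros() as u64 / 8 / 1_000_000
}
```

At 2 Mbit/s with a 2 ms `min_delta`, each probe packet should carry about 500 bytes. That exceeds the 240-byte blank padding limit, so such a probe would have to be served from the RTX cache.
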
- Create `src/packet/prober.rs` with:
  - `ProbeCluster` struct
  - `BitrateProber` struct
  - Probe state machine (Disabled → Inactive → Active)
  - `create_cluster()`, `current_cluster()`, `probe_sent()`
- Unit tests for:
  - Cluster lifecycle
  - State transitions
  - Timing constraints
- Add `BitrateProber` to `Pacer`
- Modify padding generation to respect probe cluster timing
- Track probe packet sends via `probe_sent()`
- Handle `recommended_probe_size()`
- Create `src/packet/probe_controller.rs`
- Implement probe triggers:
  - Initial exponential probing (on first media)
  - Allocation-based probing
- Wire into BWE feedback loop
- Add probe cluster settings to `RtcConfig`
- Expose key tunables:
  - `max_probe_bitrate`
  - `min_probe_duration`
  - Enable/disable probing
- Pre-media SSRC 0 probing: Will build on this infrastructure in a separate PR
- ALR detection: Application Limited Region detection for smarter probing
- Network state estimator: Additional probing hints
- Probe cluster state machine
- Timing constraints (min_delta, min_duration)
- Cluster completion conditions
- Multiple pending clusters
- Probe clusters generate expected packet counts
- TWCC feedback correlates with probe cluster IDs
- Probing respects bitrate caps
- Probing triggers on allocation changes
- CPU usage at max probe rate
- Memory usage with multiple pending clusters
- Latency impact during probing
- libWebRTC `modules/pacing/bitrate_prober.cc`
- libWebRTC `modules/congestion_controller/goog_cc/probe_controller.cc`
- WebRTC Hacks: Bandwidth Probing