Adding a safe_mode parameter to inflate_fast() allows the fast path to run with
as few as 3 bytes of avail_out (down from 260). This eliminates the performance
cliff where PNG-style row-by-row decompression falls back to the slow inflate()
state-machine path for the last 260 bytes of each row.

Related: zlib-ng/zlib-ng#2062

| Spec | Value |
|---|---|
| CPU | Apple M3 |
| RAM | 24 GB |
| Arch | arm64 |
| OS | macOS 15.7.4 |
| Compiler | Apple Clang (default) |
| Build | Release, static |

The first benchmark simulates PNG-style row-by-row decompression with a constrained avail_out: 256 KB of compressible data is decompressed in fixed-size chunks.

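
For readers unfamiliar with the access pattern, here is a minimal sketch of decompression into a small, fixed-size output window. It is not the benchmark's code. It uses Python's standard zlib module for brevity, where `max_length` plays roughly the role of avail_out. The helper name `inflate_rows` and the test data are hypothetical, and whether Python's zlib is backed by zlib-ng depends on the build.

```python
import zlib
from typing import Iterator


def inflate_rows(compressed: bytes, row_size: int) -> Iterator[bytes]:
    """Yield decompressed output in pieces of at most row_size bytes."""
    decomp = zlib.decompressobj()
    pending = compressed
    while not decomp.eof:
        # Cap the output of this call, as a PNG decoder does per row.
        row = decomp.decompress(pending, row_size)
        # Input that was not consumed because the output cap was hit.
        pending = decomp.unconsumed_tail
        if not row and not pending:
            break  # input ended before the compressed stream did
        if row:
            yield row


# Hypothetical input: 256 KB of highly compressible data.
data = bytes(range(256)) * 1024
compressed = zlib.compress(data)
rows = list(inflate_rows(compressed, 64))
```

Each call hands the decompressor only a small output window, so a fast path that requires a large minimum of free output space is rarely or never eligible to run.
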
| avail_out (bytes) | Baseline (ns) | Contender (ns) | Change |
|---|---|---|---|
| 64 | 143,288 | 118,668 | -17.2% |
| 128 | 100,689 | 79,391 | -21.2% |
| 256 | 80,936 | 55,975 | -30.8% |
| 512 | 58,234 | 47,555 | -18.3% |
| 1024 | 45,580 | 40,797 | -10.5% |
| 2048 | 39,171 | 36,858 | -5.9% |
| 4096 | 36,570 | 35,171 | -3.8% |
| 16384 | 34,097 | 33,515 | -1.7% |

Times are CPU means over 5 repetitions each.

The second benchmark is the standard inflate benchmark with large output buffers. It verifies that the normal (non-safe) code path does not regress.

| Input size | Baseline (ns) | Contender (ns) | Change |
|---|---|---|---|
| 1 | 19.2 | 19.2 | +0.1% |
| 64 | 134 | 136 | +1.2% |
| 1,024 | 294 | 291 | -0.9% |
| 16,384 | 3,813 | 3,827 | +0.4% |
| 131,072 | 15,036 | 15,077 | +0.3% |
| 1,048,576 | 105,320 | 106,299 | +0.9% |

Times are CPU means over 5 repetitions each. All changes are within noise, so there is no regression.

Small output buffers (64 to 512 bytes, typical PNG row sizes) decompress 17% to 31%
faster. The improvement diminishes as avail_out grows, since larger buffers
already spend most of their time in the fast path. No regression was observed on
standard large-buffer inflate.
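
The diminishing returns can be illustrated with a simplified model, which is an assumption and not a measurement. Suppose the fast path stops once fewer than a threshold number of output bytes remain, so at most that many bytes per chunk go through the slow path. The function name `slow_path_fraction` is hypothetical. The model ignores per-call overhead and input-side limits, so it does not reproduce the exact percentages in the table.

```python
def slow_path_fraction(avail_out: int, fast_min_left: int) -> float:
    """Upper bound on the share of a chunk that cannot use the fast path."""
    return min(avail_out, fast_min_left) / avail_out


# Thresholds taken from the description: 260 bytes before, 3 bytes after.
for n in (64, 128, 256, 512, 1024, 2048, 4096, 16384):
    before = slow_path_fraction(n, 260)
    after = slow_path_fraction(n, 3)
    print(f"{n:>6}  before={before:6.1%}  after={after:6.1%}")
```

Under this model, chunks smaller than 260 bytes never reach the fast path with the old threshold, while a 16384-byte chunk loses at most 260 of its bytes to the slow path. That matches the direction of the measured trend.
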