This benchmark reproduces the scenario discussed in chat:
- 4 Puma workers
- open-model traffic at 12 req/s (k6 constant-arrival-rate)
- synthetic Rack app with ~250ms service time
- service alternates CPU and IO slices randomly every ~10ms
- compare `threads=5` vs `threads=25`
- during traffic: kill one worker, wait for restart, let it serve ~5s, then kill a different worker (round-robin)
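The synthetic app described above can be sketched as a tiny Rack endpoint that interleaves ~10ms busy-loop (CPU) and sleep (IO) slices until ~250ms have been spent. This is a minimal sketch of the shape, not the repo's actual `app.ru`; the constants and the 50/50 coin flip are assumptions taken from the bullet list.

```ruby
# Sketch: ~250ms of service time, split into ~10ms slices that are
# randomly CPU-bound (busy loop, holds the GVL) or IO-bound (sleep,
# releases the GVL). Assumed shape; the real app.ru may differ.
TOTAL_MS = 250
SLICE_MS = 10

def cpu_slice(ms)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + ms / 1000.0
  # Busy-loop for roughly `ms` milliseconds without yielding the GVL.
  nil while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
end

def io_slice(ms)
  # sleep releases the GVL, modeling an IO wait.
  sleep(ms / 1000.0)
end

app = lambda do |_env|
  (TOTAL_MS / SLICE_MS).times do
    rand < 0.5 ? cpu_slice(SLICE_MS) : io_slice(SLICE_MS)
  end
  [200, { "content-type" => "text/plain" }, ["ok\n"]]
end

run app if defined?(run) # rackup entrypoint; no-op when loaded directly
```

With a 50/50 mix, each request spends about half its 250ms holding the GVL, which is what makes the thread-count comparison interesting under open-model load.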
Files:

- `app.ru` - synthetic 50/50 CPU/IO Rack app
- `puma.rb` - Puma config (WEB_CONCURRENCY=4, thread count via env)
- `docker-compose.yml` - Puma + k6 services (defaults to 4 CPUs/container)
- `k6/open_traffic.js` - k6 open-model load generator
- `run.fish` - runs both thread scenarios and performs TERM/restart loop
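A `puma.rb` matching that description plausibly looks like the config fragment below. This is a sketch, not the repo's actual file; the `PUMA_THREADS` env var name and the control-server port are assumptions (the control server itself is implied by the `puma-stats.ndjson` snapshots mentioned later).

```ruby
# Sketch of a Puma config matching the description above:
# 4 workers via WEB_CONCURRENCY, thread count via env.
workers Integer(ENV.fetch("WEB_CONCURRENCY", 4))

# PUMA_THREADS is an assumed variable name for the 5-vs-25 scenarios.
threads_count = Integer(ENV.fetch("PUMA_THREADS", 5))
threads threads_count, threads_count

# Control server so the harness can poll stats each second.
activate_control_app "tcp://127.0.0.1:9293", no_token: true

preload_app!
```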
```fish
cd benchmarks/docker/term-capacity
# optional override, defaults to 4 CPUs per container
set -x CONTAINER_CPUS 4
./run.fish
```

Results are written under:

`benchmarks/docker/term-capacity/results/<timestamp>/`
Each scenario folder (threads-5, threads-25) includes:
- `k6-summary.json`
- `k6-metrics.json`
- `puma-stats.ndjson` (control server snapshots each second)
- `kill-events.ndjson` (every TERM + observed restart)
- `kill-summary.txt` (counts: terms, restarts, timeouts, boot lines)
- `puma.log`
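Since `kill-events.ndjson` is newline-delimited JSON, turning it into the counts that `kill-summary.txt` reports is a one-pass tally. The field names below (`"event"`, `"term"`, `"restart"`, `"timeout"`) are assumptions for illustration; the real event schema may differ.

```ruby
require "json"

# Sketch: tally event types from kill-events.ndjson lines into the
# kind of counts kill-summary.txt reports. Field names are assumed.
def summarize_kill_events(lines)
  counts = Hash.new(0)
  lines.each do |line|
    event = JSON.parse(line)
    counts[event["event"]] += 1
  end
  counts
end

# Hypothetical sample lines, not real benchmark output:
sample = [
  '{"event":"term","worker":0}',
  '{"event":"restart","worker":0}',
  '{"event":"term","worker":1}',
  '{"event":"timeout","worker":1}',
]
summarize_kill_events(sample)
# => {"term"=>2, "restart"=>1, "timeout"=>1}
```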