A comprehensive load generation and benchmarking toolkit from Weka for testing LLM inference performance with realistic KV cache patterns.
The weka-new-kv-cache-tester is a Python-based toolkit designed to benchmark LLM inference servers with realistic agentic coding workloads. It includes multiple testing tools and a dataset of 588 real Claude Code conversation traces.
Author: Callan Fox (Weka)
License: Apache 2.0