@AaronBeier
Created July 24, 2025 11:07
Comparing llama.cpp vs llama.cpp + AMD's BLIS fork
llama.cpp b5970
blis 837d3974d43eaa84bb8758e4b80385b4150306b2
gcc 15.1.1+r7+gf36ec88aa85a
linux 6.15.7.arch1-1
Benchmark command: `llama-bench --model Qwen3-Embedding-8B-Q5_K_M.gguf --embeddings 1 --prio 2 --threads 12`
Default build:
| model | size | params | backend | threads | embd | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ---------: | --------------: | -------------------: |
| qwen3 8B Q5_K - Medium | 5.04 GiB | 7.57 B | CPU | 12 | 1 | pp512 | 68.82 ± 0.22 |
| qwen3 8B Q5_K - Medium | 5.04 GiB | 7.57 B | CPU | 12 | 1 | tg128 | 12.86 ± 0.00 |
AMD's BLIS fork:
| model | size | params | backend | threads | embd | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ---------: | --------------: | -------------------: |
| qwen3 8B Q5_K - Medium | 5.04 GiB | 7.57 B | BLAS | 12 | 1 | pp512 | 87.01 ± 0.38 |
| qwen3 8B Q5_K - Medium | 5.04 GiB | 7.57 B | BLAS | 12 | 1 | tg128 | 12.87 ± 0.00 |
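
To make the comparison easier to read, here is a small sketch that computes the relative speedup from the mean t/s values in the two tables above. The `speedup` helper and the `results` dictionary are illustrative names, not part of llama.cpp or llama-bench, and the ± deviations are ignored for simplicity.

```python
def speedup(baseline_tps: float, candidate_tps: float) -> float:
    """Ratio of candidate throughput to baseline throughput."""
    return candidate_tps / baseline_tps


# Mean tokens/second copied from the tables above: (default build, BLIS build).
results = {
    "pp512": (68.82, 87.01),
    "tg128": (12.86, 12.87),
}

speedups = {test: speedup(base, blis) for test, (base, blis) in results.items()}

for test, ratio in speedups.items():
    # Print the ratio and the equivalent percentage change.
    print(f"{test}: {ratio:.3f}x ({(ratio - 1) * 100:+.1f}%)")
```

With these numbers, prompt processing (pp512) comes out roughly 26% faster with the BLIS build, while token generation (tg128) is essentially unchanged.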