@AaronBeier
Created June 1, 2025 12:11
llama.cpp benchmarks, AMD Ryzen 9 9900X, Intel Arc A380
llama.cpp              b5466
intel-compute-runtime  25.18.33578.6
intel-media-driver     25.2.3
intel-oneapi-basekit   2025.0.1.46
vulkan-intel           1:25.1.1
openblas               0.3.29
gcc                    15.1.1+r7+gf36ec88aa85a
linux                  6.14.9.arch1-1
model                  Unsloth Phi-4-Mini-Reasoning Q5_K_M
common options         --ctx-size 4096 --flash-attn --mlock --jinja
first prompt           How to solve 3*x^2+4*x+5=1?
second prompt          solve {\left( {z - 2} \right)^2} - 36 = 0
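
For reproducing a run, the options listed above can be assembled into a command line. The following is only a sketch under assumptions: the gist does not say which llama.cpp binary was used, so `llama-server` is a guess; the model filename is hypothetical; and mapping "N layers" to `--n-gpu-layers` is an interpretation, not something the gist states.

```python
import shlex

# Options copied from the "common options" row above.
COMMON_OPTIONS = ["--ctx-size", "4096", "--flash-attn", "--mlock", "--jinja"]


def build_command(model_path, gpu_layers=None, kv_cache_type=None,
                  binary="llama-server"):
    """Build an argument list for one benchmark configuration.

    binary and model_path are assumptions; the gist names neither.
    """
    cmd = [binary, "--model", model_path, *COMMON_OPTIONS]
    if kv_cache_type is not None:
        # Listed in the gist only for the "cpu only" configuration.
        cmd += ["--cache-type-k", kv_cache_type, "--cache-type-v", kv_cache_type]
    if gpu_layers is not None:
        # Assumed meaning of "10 layers" / "20 layers": layers offloaded to the GPU.
        cmd += ["--n-gpu-layers", str(gpu_layers)]
    return cmd


# Hypothetical model filename, for illustration only.
MODEL = "Phi-4-mini-reasoning-Q5_K_M.gguf"

print(shlex.join(build_command(MODEL, kv_cache_type="q8_0")))  # cpu only
print(shlex.join(build_command(MODEL, gpu_layers=10)))         # 10 layers offloaded
```

The function only builds the argument list and does not start anything, so it can be checked without llama.cpp installed.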
cpu only (--cache-type-k q8_0 --cache-type-v q8_0)
prompt eval time = 231.52 ms / 34 tokens ( 6.81 ms per token, 146.85 tokens per second)
eval time = 91585.48 ms / 1874 tokens ( 48.87 ms per token, 20.46 tokens per second)
total time = 91817.00 ms / 1908 tokens
prompt eval time = 3476.85 ms / 455 tokens ( 7.64 ms per token, 130.87 tokens per second)
eval time = 61639.12 ms / 1256 tokens ( 49.08 ms per token, 20.38 tokens per second)
total time = 65115.96 ms / 1711 tokens
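
The per-token and per-second figures in each timing line follow directly from the elapsed time and the token count. A minimal sketch of parsing one line and recomputing them, assuming the line format shown in this gist (the regular expression and the function name `parse_timing` are illustrative, not part of llama.cpp):

```python
import re

# Matches lines such as:
#   "prompt eval time = 231.52 ms / 34 tokens ( 6.81 ms per token, ...)"
#   "total time = 91817.00 ms / 1908 tokens"
TIMING_RE = re.compile(
    r"^(?P<phase>prompt eval|eval|total) time\s*=\s*"
    r"(?P<ms>\d+(?:\.\d+)?) ms\s*/\s*(?P<tokens>\d+) tokens"
)


def parse_timing(line):
    """Return phase, elapsed ms, token count and derived rates, or None."""
    m = TIMING_RE.match(line.strip())
    if m is None:
        return None
    ms = float(m.group("ms"))
    tokens = int(m.group("tokens"))
    return {
        "phase": m.group("phase"),
        "ms": ms,
        "tokens": tokens,
        "ms_per_token": ms / tokens,
        "tokens_per_second": tokens / ms * 1000.0,
    }


sample = ("eval time = 91585.48 ms / 1874 tokens "
          "( 48.87 ms per token, 20.46 tokens per second)")
print(parse_timing(sample))
```

Recomputed values can differ from the logged ones in the last digit, because the log prints the elapsed time rounded to two decimals.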
vulkan 10 layers
prompt eval time = 1490.77 ms / 34 tokens ( 43.85 ms per token, 22.81 tokens per second)
eval time = 81832.84 ms / 1134 tokens ( 72.16 ms per token, 13.86 tokens per second)
total time = 83323.61 ms / 1168 tokens
vulkan 20 layers
prompt eval time = 1410.32 ms / 34 tokens ( 41.48 ms per token, 24.11 tokens per second)
eval time = 136721.74 ms / 1454 tokens ( 94.03 ms per token, 10.63 tokens per second)
total time = 138132.06 ms / 1488 tokens
cpu openblas
prompt eval time = 1146.41 ms / 34 tokens ( 33.72 ms per token, 29.66 tokens per second)
eval time = 83908.54 ms / 1785 tokens ( 47.01 ms per token, 21.27 tokens per second)
total time = 85054.95 ms / 1819 tokens
prompt eval time = 2545.14 ms / 372 tokens ( 6.84 ms per token, 146.16 tokens per second)
eval time = 81429.75 ms / 1599 tokens ( 50.93 ms per token, 19.64 tokens per second)
total time = 83974.90 ms / 1971 tokens
sycl 10 layers
prompt eval time = 1934.28 ms / 34 tokens ( 56.89 ms per token, 17.58 tokens per second)
eval time = 116342.05 ms / 1731 tokens ( 67.21 ms per token, 14.88 tokens per second)
total time = 118276.33 ms / 1765 tokens
prompt eval time = 65.88 ms / 1 tokens ( 65.88 ms per token, 15.18 tokens per second)
eval time = 96242.63 ms / 1359 tokens ( 70.82 ms per token, 14.12 tokens per second)
total time = 96308.51 ms / 1360 tokens