@Randomblock1
Created March 5, 2026 22:48
llama.cpp Vulkan vs HIP 6.4.4 backend Radeon 890M

llama.cpp benchmarks for various compiler settings and backends on my machine

CPU: Ryzen 9 HX 370
GPU: Radeon 890M (gfx1150)
HIP version: 6.4.43484-9999 (HSA_OVERRIDE_GFX_VERSION=11.0.0)
Kernel: Linux framework 6.18.13-200.fc43.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Feb 19 19:54:01 UTC 2026 x86_64 GNU/Linux
OS: Fedora Linux 43 (KDE Plasma Desktop Edition)

llama2-7b

Default compile settings

build/bin/llama-bench -m ~/Downloads/llama-2-7b.Q4_0.gguf -ngl 99 -fa 0,1 -dev Vulkan0,ROCm0
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon 890M Graphics, gfx1100 (0x1100), VMM: no, Wave Size: 32
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon 890M Graphics (RADV GFX1150) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat

| model | size | params | backend | ngl | fa | dev | test | t/s |
| --- | ---: | ---: | --- | ---: | ---: | --- | ---: | ---: |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | ROCm,Vulkan | 99 | 0 | Vulkan0 | pp512 | 267.79 ± 4.90 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | ROCm,Vulkan | 99 | 0 | Vulkan0 | tg128 | 17.14 ± 0.23 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | ROCm,Vulkan | 99 | 1 | Vulkan0 | pp512 | 302.60 ± 3.91 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | ROCm,Vulkan | 99 | 1 | Vulkan0 | tg128 | 18.22 ± 0.07 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | ROCm,Vulkan | 99 | 0 | ROCm0 | pp512 | 332.76 ± 3.10 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | ROCm,Vulkan | 99 | 0 | ROCm0 | tg128 | 15.52 ± 0.17 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | ROCm,Vulkan | 99 | 1 | ROCm0 | pp512 | 402.08 ± 5.89 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | ROCm,Vulkan | 99 | 1 | ROCm0 | tg128 | 16.04 ± 0.97 |

build: a0ed91a44 (8211)
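
To make the backend comparison easier to read, here is a small sketch that divides the ROCm throughput by the Vulkan throughput for each row of the default-build llama2-7b results above. The numbers are copied from that table; the dictionary layout and the helper name `rocm_over_vulkan` are my own and are not part of llama.cpp.

```python
# Throughput in tokens/s, copied from the default-build llama2-7b results above.
# Keys are (test, flash_attention) pairs.
vulkan = {("pp512", 0): 267.79, ("tg128", 0): 17.14,
          ("pp512", 1): 302.60, ("tg128", 1): 18.22}
rocm = {("pp512", 0): 332.76, ("tg128", 0): 15.52,
        ("pp512", 1): 402.08, ("tg128", 1): 16.04}


def rocm_over_vulkan(test, fa):
    """Return ROCm throughput divided by Vulkan throughput for one row."""
    return rocm[(test, fa)] / vulkan[(test, fa)]


for test, fa in sorted(vulkan):
    ratio = rocm_over_vulkan(test, fa)
    print(f"{test} fa={fa}: ROCm/Vulkan = {ratio:.2f}")
```

A ratio above 1 means ROCm is faster. On this run ROCm leads on prompt processing (pp512) while Vulkan leads on token generation (tg128).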

With -DGGML_CUDA_ENABLE_UNIFIED_MEMORY=1 and -DGGML_HIP_ROCWMMA_FATTN=ON

build/bin/llama-bench -m ~/Downloads/llama-2-7b.Q4_0.gguf -ngl 99 -fa 0,1 -dev Vulkan0,ROCm0
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon 890M Graphics, gfx1100 (0x1100), VMM: no, Wave Size: 32
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon 890M Graphics (RADV GFX1150) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat

| model | size | params | backend | ngl | fa | dev | test | t/s |
| --- | ---: | ---: | --- | ---: | ---: | --- | ---: | ---: |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | ROCm,Vulkan | 99 | 0 | Vulkan0 | pp512 | 272.62 ± 2.09 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | ROCm,Vulkan | 99 | 0 | Vulkan0 | tg128 | 17.34 ± 0.08 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | ROCm,Vulkan | 99 | 1 | Vulkan0 | pp512 | 297.01 ± 5.06 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | ROCm,Vulkan | 99 | 1 | Vulkan0 | tg128 | 18.41 ± 0.02 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | ROCm,Vulkan | 99 | 0 | ROCm0 | pp512 | 341.77 ± 3.12 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | ROCm,Vulkan | 99 | 0 | ROCm0 | tg128 | 15.72 ± 0.26 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | ROCm,Vulkan | 99 | 1 | ROCm0 | pp512 | 398.34 ± 3.52 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | ROCm,Vulkan | 99 | 1 | ROCm0 | tg128 | 17.01 ± 0.53 |

build: a0ed91a44 (8211)

Qwen3.5-35B-A3B

Default compile settings

build/bin/llama-bench -m ~/Downloads/Qwen3.5-35B-A3B-UD-Q4_K_XL.gguf -ngl 99 -fa 0,1 -dev Vulkan0,ROCm0
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon 890M Graphics, gfx1100 (0x1100), VMM: no, Wave Size: 32
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon 890M Graphics (RADV GFX1150) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat

| model | size | params | backend | ngl | fa | dev | test | t/s |
| --- | ---: | ---: | --- | ---: | ---: | --- | ---: | ---: |
| qwen35moe 35B.A3B Q4_K - Medium | 20.70 GiB | 34.66 B | ROCm,Vulkan | 99 | 0 | Vulkan0 | pp512 | 212.05 ± 1.62 |
| qwen35moe 35B.A3B Q4_K - Medium | 20.70 GiB | 34.66 B | ROCm,Vulkan | 99 | 0 | Vulkan0 | tg128 | 14.85 ± 0.05 |
| qwen35moe 35B.A3B Q4_K - Medium | 20.70 GiB | 34.66 B | ROCm,Vulkan | 99 | 1 | Vulkan0 | pp512 | 212.82 ± 1.05 |
| qwen35moe 35B.A3B Q4_K - Medium | 20.70 GiB | 34.66 B | ROCm,Vulkan | 99 | 1 | Vulkan0 | tg128 | 14.83 ± 0.09 |
| qwen35moe 35B.A3B Q4_K - Medium | 20.70 GiB | 34.66 B | ROCm,Vulkan | 99 | 0 | ROCm0 | pp512 | 227.54 ± 2.71 |
| qwen35moe 35B.A3B Q4_K - Medium | 20.70 GiB | 34.66 B | ROCm,Vulkan | 99 | 0 | ROCm0 | tg128 | 12.68 ± 0.14 |
| qwen35moe 35B.A3B Q4_K - Medium | 20.70 GiB | 34.66 B | ROCm,Vulkan | 99 | 1 | ROCm0 | pp512 | 229.81 ± 2.53 |
| qwen35moe 35B.A3B Q4_K - Medium | 20.70 GiB | 34.66 B | ROCm,Vulkan | 99 | 1 | ROCm0 | tg128 | 13.18 ± 0.61 |

build: a0ed91a44 (8211)

With -DGGML_CUDA_ENABLE_UNIFIED_MEMORY=1 and -DGGML_HIP_ROCWMMA_FATTN=ON

build/bin/llama-bench -m ~/Downloads/Qwen3.5-35B-A3B-UD-Q4_K_XL.gguf -ngl 99 -fa 0,1 -dev Vulkan0,ROCm0
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon 890M Graphics, gfx1100 (0x1100), VMM: no, Wave Size: 32
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon 890M Graphics (RADV GFX1150) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat

| model | size | params | backend | ngl | fa | dev | test | t/s |
| --- | ---: | ---: | --- | ---: | ---: | --- | ---: | ---: |
| qwen35moe 35B.A3B Q4_K - Medium | 20.70 GiB | 34.66 B | ROCm,Vulkan | 99 | 0 | Vulkan0 | pp512 | 210.64 ± 1.11 |
| qwen35moe 35B.A3B Q4_K - Medium | 20.70 GiB | 34.66 B | ROCm,Vulkan | 99 | 0 | Vulkan0 | tg128 | 14.57 ± 0.28 |
| qwen35moe 35B.A3B Q4_K - Medium | 20.70 GiB | 34.66 B | ROCm,Vulkan | 99 | 1 | Vulkan0 | pp512 | 203.18 ± 0.38 |
| qwen35moe 35B.A3B Q4_K - Medium | 20.70 GiB | 34.66 B | ROCm,Vulkan | 99 | 1 | Vulkan0 | tg128 | 14.36 ± 0.26 |
| qwen35moe 35B.A3B Q4_K - Medium | 20.70 GiB | 34.66 B | ROCm,Vulkan | 99 | 0 | ROCm0 | pp512 | 227.87 ± 3.01 |
| qwen35moe 35B.A3B Q4_K - Medium | 20.70 GiB | 34.66 B | ROCm,Vulkan | 99 | 0 | ROCm0 | tg128 | 12.61 ± 0.11 |
| qwen35moe 35B.A3B Q4_K - Medium | 20.70 GiB | 34.66 B | ROCm,Vulkan | 99 | 1 | ROCm0 | pp512 | 231.49 ± 2.51 |
| qwen35moe 35B.A3B Q4_K - Medium | 20.70 GiB | 34.66 B | ROCm,Vulkan | 99 | 1 | ROCm0 | tg128 | 13.44 ± 0.84 |

build: a0ed91a44 (8211)
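
One rough way to judge whether the extra compile flags changed anything is to check whether two results differ by more than their combined reported standard deviations. The sketch below does that for the Qwen3.5-35B-A3B rows with fa=1, using the numbers from the two Qwen tables above. The `within_noise` helper and its threshold are my own heuristic, not a proper statistical test and not something llama-bench provides.

```python
def within_noise(a, b):
    """Crude check: do two (mean, stddev) results differ by no more than
    the sum of their standard deviations? (Heuristic, not a formal test.)"""
    (mean_a, std_a), (mean_b, std_b) = a, b
    return abs(mean_a - mean_b) <= std_a + std_b


# Each entry: (default build, build with the extra flags), in tokens/s.
# Values are the Qwen3.5-35B-A3B rows with fa=1.
pairs = {
    "ROCm0 pp512": ((229.81, 2.53), (231.49, 2.51)),
    "ROCm0 tg128": ((13.18, 0.61), (13.44, 0.84)),
    "Vulkan0 pp512": ((212.82, 1.05), (203.18, 0.38)),
    "Vulkan0 tg128": ((14.83, 0.09), (14.36, 0.26)),
}

for name, (default, flagged) in pairs.items():
    verdict = "within noise" if within_noise(default, flagged) else "differs"
    print(f"{name}: {verdict}")
```

By this measure the ROCm rows are indistinguishable between the two builds, while the Vulkan rows differ even though the changed flags target the HIP backend. That may simply mean run-to-run variation on this machine is larger than the per-run standard deviation suggests.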
