Last active: May 9, 2025 19:41
2025-05-10
ubuntu@t120h-k80:~/llama.cpp (master)$ ./build/bin/llama-bench -p 0 -n 128,256,512 \
> -m ~/.cache/llama.cpp/unsloth_Qwen3-32B-GGUF_Qwen3-32B-Q8_0.gguf \
> -m ~/.cache/llama.cpp/unsloth_Qwen3-30B-A3B-GGUF_Qwen3-30B-A3B-Q8_0.gguf \
> -m ~/.cache/llama.cpp/mmns_Qwen3-32B-F16.gguf \
> -m ~/.cache/llama.cpp/mmns_Qwen3-30B-A3B-F16.gguf \
> -m ~/.cache/llama.cpp/unsloth_DeepSeek-R1-Distill-Llama-70B-GGUF_DeepSeek-R1-Distill-Llama-70B-Q4_K_M.gguf
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 8 CUDA devices:
  Device 0: Tesla K80, compute capability 3.7, VMM: yes
  Device 1: Tesla K80, compute capability 3.7, VMM: yes
  Device 2: Tesla K80, compute capability 3.7, VMM: yes
  Device 3: Tesla K80, compute capability 3.7, VMM: yes
  Device 4: Tesla K80, compute capability 3.7, VMM: yes
  Device 5: Tesla K80, compute capability 3.7, VMM: yes
  Device 6: Tesla K80, compute capability 3.7, VMM: yes
  Device 7: Tesla K80, compute capability 3.7, VMM: yes
| model                          |       size |     params | backend    | ngl |             test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ---------------: | -------------------: |
| qwen3 32B Q8_0                 |  32.42 GiB |    32.76 B | CUDA       |  99 |            tg128 |          2.06 ± 0.00 |
| qwen3 32B Q8_0                 |  32.42 GiB |    32.76 B | CUDA       |  99 |            tg256 |          2.02 ± 0.00 |
| qwen3 32B Q8_0                 |  32.42 GiB |    32.76 B | CUDA       |  99 |            tg512 |          1.95 ± 0.00 |
| qwen3moe 30B.A3B Q8_0          |  30.25 GiB |    30.53 B | CUDA       |  99 |            tg128 |         10.19 ± 0.00 |
| qwen3moe 30B.A3B Q8_0          |  30.25 GiB |    30.53 B | CUDA       |  99 |            tg256 |          9.89 ± 0.00 |
| qwen3moe 30B.A3B Q8_0          |  30.25 GiB |    30.53 B | CUDA       |  99 |            tg512 |          9.27 ± 0.00 |
| qwen3 32B F16                  |  61.03 GiB |    32.76 B | CUDA       |  99 |            tg128 |          1.28 ± 0.00 |
| qwen3 32B F16                  |  61.03 GiB |    32.76 B | CUDA       |  99 |            tg256 |          1.28 ± 0.00 |
| qwen3 32B F16                  |  61.03 GiB |    32.76 B | CUDA       |  99 |            tg512 |          1.28 ± 0.01 |
| qwen3moe 30B.A3B F16           |  56.89 GiB |    30.53 B | CUDA       |  99 |            tg128 |          6.53 ± 0.00 |
| qwen3moe 30B.A3B F16           |  56.89 GiB |    30.53 B | CUDA       |  99 |            tg256 |          6.40 ± 0.00 |
| qwen3moe 30B.A3B F16           |  56.89 GiB |    30.53 B | CUDA       |  99 |            tg512 |          6.13 ± 0.00 |
| llama 70B Q4_K - Medium        |  39.59 GiB |    70.55 B | CUDA       |  99 |            tg128 |          1.15 ± 0.06 |
| llama 70B Q4_K - Medium        |  39.59 GiB |    70.55 B | CUDA       |  99 |            tg256 |          1.05 ± 0.02 |
| llama 70B Q4_K - Medium        |  39.59 GiB |    70.55 B | CUDA       |  99 |            tg512 |          1.07 ± 0.01 |

build: b486ba05 (5321)
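A minimal sketch of the arithmetic implied by the table above: the MoE model (Qwen3-30B-A3B) activates only ~3B parameters per token, so its token-generation throughput is roughly 5× that of the dense 32B model at the same quantization on this hardware. The `results` dict below is hand-copied from the table; the `moe_speedup` helper is a hypothetical name introduced here for illustration, not part of llama-bench.

```python
# Token-generation rates (t/s) transcribed from the llama-bench table above.
results = {
    ("qwen3-32b", "q8_0"):     {"tg128": 2.06,  "tg256": 2.02, "tg512": 1.95},
    ("qwen3-30b-a3b", "q8_0"): {"tg128": 10.19, "tg256": 9.89, "tg512": 9.27},
    ("qwen3-32b", "f16"):      {"tg128": 1.28,  "tg256": 1.28, "tg512": 1.28},
    ("qwen3-30b-a3b", "f16"):  {"tg128": 6.53,  "tg256": 6.40, "tg512": 6.13},
}

def moe_speedup(quant: str, test: str) -> float:
    """MoE (30B-A3B) throughput divided by dense (32B) throughput."""
    return results[("qwen3-30b-a3b", quant)][test] / results[("qwen3-32b", quant)][test]

for quant in ("q8_0", "f16"):
    for test in ("tg128", "tg256", "tg512"):
        print(f"{quant} {test}: {moe_speedup(quant, test):.2f}x")
```

At tg128 this works out to about 4.95× for Q8_0 and 5.10× for F16, consistent with the decode step being memory-bandwidth-bound: the MoE model reads far fewer weights per generated token.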