Created: November 14, 2025 13:33
llama-swap configuration
# llama-swap configuration
models:
  VibeThinker-1.5B:
    cmd: llama-server --port ${PORT} -c 0 --model /home/ugo/.cache/llama.cpp/VibeThinker-1.5B.f16.gguf -ngl 99
  Aquif-3.5-Max-42B-A3B:
    cmd: >
      llama-server --port ${PORT}
      --model /home/ugo/.cache/llama.cpp/unsloth-aquif-3.5-Max-42B-A3B-GGUF/aquif-3.5-Max-42B-A3B-UD-Q6_K_XL.gguf -ngl 99 -fa on
  Aquif-3.5-Max-42B-A3B-Coding-Q6_K_XL-KVQ8:
    cmd: >
      llama-server --port ${PORT} --model /home/ugo/.cache/llama.cpp/unsloth-aquif-3.5-Max-42B-A3B-GGUF/aquif-3.5-Max-42B-A3B-UD-Q6_K_XL.gguf
      -ngl 99
      --ctx-size 196000 --temp 0.2 --top-p 0.90
      --repeat-penalty 1.1 --min-p 0.05 --top-k 40 --jinja -ctk q8_0 -ctv q8_0
  Aquif-3.5-Max-42B-A3B-Coding-Q6_K_XL-KVQ4:
    cmd: >
      llama-server --port ${PORT} --model /home/ugo/.cache/llama.cpp/unsloth-aquif-3.5-Max-42B-A3B-GGUF/aquif-3.5-Max-42B-A3B-UD-Q6_K_XL.gguf
      -ngl 99
      --ctx-size 196000 --temp 0.2 --top-p 0.90
      --repeat-penalty 1.1 --min-p 0.05 --top-k 40 --jinja -ctk q4_0 -ctv q4_0
  Aquif-3.5-Max-42B-A3B-Coding-BF16-KVQ4:
    cmd: >
      llama-server --port ${PORT} --model /home/ugo/.cache/llama.cpp/unsloth/aquif-3.5-max-42b-a3b-gguf/bf16/aquif-3.5-max-42b-a3b-bf16-00001-of-00002.gguf
      -ngl 99
      --ctx-size 196000 --temp 0.2 --top-p 0.90
      --repeat-penalty 1.1 --min-p 0.05 --top-k 40 --jinja -ctk q4_0 -ctv q4_0
  MiniMax-M2-Q2_K-KV4:
    cmd: >
      llama-server --port ${PORT} --model /home/ugo/.cache/llama.cpp/unsloth_MiniMax-M2-GGUF_Q2_K_MiniMax-M2-Q2_K-00001-of-00002.gguf
      -ngl 99
      --ctx-size 131000 --temp 0.2 --top-p 0.90
      --repeat-penalty 1.1 --min-p 0.05 --top-k 40 --jinja -ctk q4_0 -ctv q4_0
  Qwen3-VL-30B-A3B-Thinking:
    cmd: >
      llama-server --port ${PORT} --model /home/ugo/.cache/llama.cpp/unsloth/Qwen3-VL-30B-A3B-Thinking-GGUF/Q8_0/Qwen3-VL-30B-A3B-Thinking-Q8_0.gguf
      -ngl 99 --jinja -ctk q8_0 -ctv q8_0 --ctx-size 131000 --mmproj /home/ugo/.cache/llama.cpp/unsloth/Qwen3-VL-30B-A3B-Thinking-GGUF/Q8_0/mmproj-BF16.gguf
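llama-swap starts the llama-server command whose key under `models:` matches the `model` field of an incoming OpenAI-style request, substituting the proxy-assigned `${PORT}` at spawn time. A minimal sanity-check sketch (a hypothetical helper, not part of llama-swap) that lists the model names in a config like the one above and confirms each entry forwards `${PORT}`:

```python
import re

# Trimmed excerpt of the config above; model paths shortened for brevity.
CONFIG = """\
models:
  VibeThinker-1.5B:
    cmd: llama-server --port ${PORT} -c 0 --model VibeThinker-1.5B.f16.gguf -ngl 99
  Aquif-3.5-Max-42B-A3B:
    cmd: >
      llama-server --port ${PORT}
      --model aquif-3.5-Max-42B-A3B-UD-Q6_K_XL.gguf -ngl 99 -fa on
"""

def model_names(text: str) -> list[str]:
    # Model entries are the keys indented exactly two spaces under 'models:';
    # 'cmd:' lines and folded-scalar continuations are indented deeper.
    return re.findall(r"^  (\S+):\s*$", text, flags=re.MULTILINE)

names = model_names(CONFIG)
print(names)  # ['VibeThinker-1.5B', 'Aquif-3.5-Max-42B-A3B']

# Every cmd should pass the proxy-assigned port to llama-server.
assert CONFIG.count("--port ${PORT}") == len(names)
```

These names are what clients send as `"model"` in requests to the llama-swap endpoint to trigger a swap to the corresponding server.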