@ugovaretto
Created November 14, 2025 13:33
llama-swap configuration
# llama-swap configuration
models:
  VibeThinker-1.5B:
    cmd: llama-server --port ${PORT} -c 0 --model /home/ugo/.cache/llama.cpp/VibeThinker-1.5B.f16.gguf -ngl 99
  Aquif-3.5-Max-42B-A3B:
    cmd: >
      llama-server --port ${PORT}
      --model /home/ugo/.cache/llama.cpp/unsloth-aquif-3.5-Max-42B-A3B-GGUF/aquif-3.5-Max-42B-A3B-UD-Q6_K_XL.gguf -ngl 99 -fa on
  Aquif-3.5-Max-42B-A3B-Coding-Q6_K_XL-KVQ8:
    cmd: >
      llama-server --port ${PORT} --model /home/ugo/.cache/llama.cpp/unsloth-aquif-3.5-Max-42B-A3B-GGUF/aquif-3.5-Max-42B-A3B-UD-Q6_K_XL.gguf
      -ngl 99
      --ctx-size 196000 --temp 0.2 --top-p 0.90
      --repeat-penalty 1.1 --min-p 0.05 --top-k 40 --jinja -ctk q8_0 -ctv q8_0
  Aquif-3.5-Max-42B-A3B-Coding-Q6_K_XL-KVQ4:
    cmd: >
      llama-server --port ${PORT} --model /home/ugo/.cache/llama.cpp/unsloth-aquif-3.5-Max-42B-A3B-GGUF/aquif-3.5-Max-42B-A3B-UD-Q6_K_XL.gguf
      -ngl 99
      --ctx-size 196000 --temp 0.2 --top-p 0.90
      --repeat-penalty 1.1 --min-p 0.05 --top-k 40 --jinja -ctk q4_0 -ctv q4_0
  Aquif-3.5-Max-42B-A3B-Coding-BF16-KVQ4:
    cmd: >
      llama-server --port ${PORT} --model /home/ugo/.cache/llama.cpp/unsloth/aquif-3.5-max-42b-a3b-gguf/bf16/aquif-3.5-max-42b-a3b-bf16-00001-of-00002.gguf
      -ngl 99
      --ctx-size 196000 --temp 0.2 --top-p 0.90
      --repeat-penalty 1.1 --min-p 0.05 --top-k 40 --jinja -ctk q4_0 -ctv q4_0
  MiniMax-M2-Q2_K-KV4:
    cmd: >
      llama-server --port ${PORT} --model /home/ugo/.cache/llama.cpp/unsloth_MiniMax-M2-GGUF_Q2_K_MiniMax-M2-Q2_K-00001-of-00002.gguf
      -ngl 99
      --ctx-size 131000 --temp 0.2 --top-p 0.90
      --repeat-penalty 1.1 --min-p 0.05 --top-k 40 --jinja -ctk q4_0 -ctv q4_0
  Qwen3-VL-30B-A3B-Thinking:
    cmd: >
      llama-server --port ${PORT} --model /home/ugo/.cache/llama.cpp/unsloth/Qwen3-VL-30B-A3B-Thinking-GGUF/Q8_0/Qwen3-VL-30B-A3B-Thinking-Q8_0.gguf
      -ngl 99 --jinja -ctk q8_0 -ctv q8_0 --ctx-size 131000 --mmproj /home/ugo/.cache/llama.cpp/unsloth/Qwen3-VL-30B-A3B-Thinking-GGUF/Q8_0/mmproj-BF16.gguf
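llama-swap serves an OpenAI-compatible API and routes each request on its `model` field: when the name matches one of the keys under `models:`, it starts (or swaps in) the corresponding llama-server command, substituting the port it assigns for `${PORT}`, then proxies the request through. A minimal client-side sketch follows; the endpoint URL and port are assumptions about your deployment, not part of the config above.

```python
import json

# Valid model names are exactly the keys under `models:` in the config above.
CONFIGURED_MODELS = [
    "VibeThinker-1.5B",
    "Aquif-3.5-Max-42B-A3B",
    "Aquif-3.5-Max-42B-A3B-Coding-Q6_K_XL-KVQ8",
    "Aquif-3.5-Max-42B-A3B-Coding-Q6_K_XL-KVQ4",
    "Aquif-3.5-Max-42B-A3B-Coding-BF16-KVQ4",
    "MiniMax-M2-Q2_K-KV4",
    "Qwen3-VL-30B-A3B-Thinking",
]


def chat_request(model: str, prompt: str) -> str:
    """Build the JSON body for a /v1/chat/completions request.

    llama-swap picks which llama-server instance to launch based on the
    `model` field, so it must match a configured key exactly.
    """
    if model not in CONFIGURED_MODELS:
        raise ValueError(f"{model!r} is not defined in the llama-swap config")
    return json.dumps(
        {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }
    )


body = chat_request("VibeThinker-1.5B", "Hello")
# POST `body` to your llama-swap endpoint, e.g. (hypothetical host/port):
#   curl http://localhost:8080/v1/chat/completions \
#     -H "Content-Type: application/json" -d "$body"
```

Requesting a different configured model in a later call makes llama-swap stop the current llama-server process and launch the one for the new model, which is why only the `model` string changes on the client side.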