
Running llama.cpp on RX 480

Even though the RX 480 is not a very recent GPU, its 8 GB of VRAM are quite decent and make it possible to run LLMs much faster than on the CPU alone.

To get it running, three steps are necessary: installing Vulkan, building shaderc, and finally building llama.cpp with the Vulkan backend (assuming git, a C compiler, cmake, ninja, etc. are already installed on a sufficiently recent Ubuntu).

Step 1: Installing Vulkan

Run sudo apt install libvulkan-dev vulkan-tools glslang-tools.
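Before building anything, it is worth checking that the driver actually exposes the card to Vulkan. A quick sanity check using vulkaninfo from vulkan-tools (the exact device name string varies; on Mesa/RADV the RX 480 typically shows up as a Polaris device):

# List the GPUs visible to the Vulkan loader; the RX 480 should appear.
# On older vulkan-tools builds without --summary, the plain
# `vulkaninfo | grep deviceName` gives the same information.
vulkaninfo --summary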

Step 2: Building shaderc

To install shaderc, execute the following commands (and in case of problems, consult their documentation):

git clone https://github.com/google/shaderc
cd shaderc
git checkout known-good
./update_shaderc_sources.py
cd src/
./utils/git-sync-deps
mkdir build
export BUILD_DIR=$(pwd)/build
export SOURCE_DIR=$(pwd)
cd $BUILD_DIR
cmake -GNinja -DCMAKE_BUILD_TYPE=Release $SOURCE_DIR
ninja

To have the created binary available, execute export PATH=$MYPATH/shaderc/src/build/glslc:$PATH, where $MYPATH is the directory into which you cloned shaderc.
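To confirm that the shell now picks up the freshly built binary (a quick sanity check, not required by the build itself):

# Should print the glslc and SPIR-V tool versions; if the shell reports
# "command not found", the PATH export above did not take effect.
glslc --version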

Step 3: Build llama.cpp

git clone git@github.com:ggml-org/llama.cpp.git
cd llama.cpp
mkdir build
cd build
cmake .. -DGGML_VULKAN=ON
cmake --build . --config Release
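If the Vulkan backend was compiled in, recent llama.cpp builds can list the devices they will use (this flag may not exist in older checkouts):

# From within the build directory: show the devices llama.cpp detects.
# The RX 480 should be listed as a Vulkan device.
bin/llama-server --list-devices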

Afterwards, it is possible to run models:

  • A small quantized version of Mistral-Small-24B works (bin/llama-server --gpu-layers 35 -hf unsloth/Mistral-Small-24B-Instruct-2501-GGUF:Q2_K_L); on my machine, it has a high time to first token (TTFT, ~10 seconds), but relatively good tokens per second (TPS, 5-10)
  • DeepSeek R1 works (bin/llama-server --gpu-layers 60 -hf unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF:Q4_K_M) -- its chain-of-thought output also effectively makes the TTFT high, but TPS are between 5 and 10 (not great to work with, but acceptable)
  • Gemma 3 12B works quite nicely (bin/llama-server --gpu-layers 49 -hf unsloth/gemma-3-12b-it-GGUF:Q2_K_XL), with 15-20 TPS and a TTFT below 1 second - so this seems to be the best fit for the RX 480 (results might differ with a different CPU etc., but for me, this works quite nicely). Once a server is up, it can be queried over HTTP, as sketched below.
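llama-server exposes an OpenAI-compatible HTTP API, by default on port 8080. A minimal sketch of a chat request, assuming the server was started with one of the commands above and the default host/port were not changed:

# Send a single chat message to the local server and print the JSON reply.
# The "model" field is ignored by a single-model server but is kept for
# compatibility with OpenAI-style clients.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "local",
        "messages": [{"role": "user", "content": "Hello, who are you?"}]
      }'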