@robert-mcdermott
Created November 2, 2025 03:24
Docker vLLM server on DGX Spark with the local Hugging Face cache mounted
docker run -it --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
-p 8900:8000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
nvcr.io/nvidia/vllm:25.09-py3 \
vllm serve "Qwen/Qwen3-1.7B"
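
Once the container is up, the server should be reachable on host port 8900, since `-p 8900:8000` maps it to the container's port 8000. The sketch below is one way to check it, assuming the container runs on the same machine and that `vllm serve` is exposing its OpenAI-compatible HTTP API under `/v1`. The names `BASE_URL`, `MODEL`, `list_models`, and `chat` are illustrative helpers, not part of the original gist.

```shell
# Assumption: the container from the command above is running on this machine.
# Host port 8900 is forwarded to the container's port 8000 by `-p 8900:8000`.
BASE_URL="http://localhost:8900/v1"
MODEL="Qwen/Qwen3-1.7B"

# List the models the server reports as available.
list_models() {
  curl -s "$BASE_URL/models"
}

# Send one user message (first argument) and print the raw JSON response.
# The prompt is interpolated directly into the JSON body, so keep it free
# of double quotes and backslashes in this simple sketch.
chat() {
  curl -s "$BASE_URL/chat/completions" \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"$MODEL\", \"messages\": [{\"role\": \"user\", \"content\": \"$1\"}]}"
}
```

After pasting the block into a shell, `list_models` should return JSON that includes the model name, and `chat "Say hello in one sentence."` should return a completion. If the first call fails, the model may still be downloading or loading; the container logs show progress.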