Created November 18, 2025 07:58
Dockerfile for vLLM with CPU-only serving
FROM openeuler/vllm-cpu:0.9.1-oe2403lts

# Patch the cpu_worker.py to handle zero NUMA nodes
RUN sed -i 's/cpu_count_per_numa = cpu_count \/\/ numa_size/cpu_count_per_numa = cpu_count \/\/ numa_size if numa_size > 0 else cpu_count/g' \
    /workspace/vllm/vllm/worker/cpu_worker.py

ENV VLLM_TARGET_DEVICE=cpu \
    VLLM_CPU_KVCACHE_SPACE=1 \
    OMP_NUM_THREADS=2 \
    OPENBLAS_NUM_THREADS=1 \
    MKL_NUM_THREADS=1
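The `sed` patch in the Dockerfile is easier to follow when its effect is written out in Python, the language of the file it edits. The sketch below is illustrative only: `ORIGINAL_LINE` is a stand-in for the matching line in `cpu_worker.py` (the real file's surrounding code and indentation are not shown here), and `cpus_per_numa` is a hypothetical helper, not part of vLLM. It assumes the search text occurs literally in the file, which is what the `sed` expression also relies on.

```python
# Stand-in for the line that the sed command targets in cpu_worker.py.
# Only the expression itself is taken from the Dockerfile.
ORIGINAL_LINE = "cpu_count_per_numa = cpu_count // numa_size"

# Search and replacement text from the sed expression, without the
# backslash escaping that sed needs for the slashes.
SEARCH = "cpu_count_per_numa = cpu_count // numa_size"
REPLACEMENT = (
    "cpu_count_per_numa = cpu_count // numa_size "
    "if numa_size > 0 else cpu_count"
)

# The pattern contains no regex metacharacters, so a plain string
# replacement behaves the same way as the sed substitution.
patched_line = ORIGINAL_LINE.replace(SEARCH, REPLACEMENT)


def cpus_per_numa(cpu_count: int, numa_size: int) -> int:
    """Evaluate the patched expression (hypothetical helper, not vLLM API)."""
    # The conditional binds more loosely than //, so the division is
    # skipped entirely when numa_size is 0.
    return cpu_count // numa_size if numa_size > 0 else cpu_count


if __name__ == "__main__":
    print(patched_line)
    print(cpus_per_numa(8, 2))
    print(cpus_per_numa(8, 0))
```

With zero NUMA nodes reported, the unpatched expression would raise `ZeroDivisionError`; the patched one falls back to treating all CPUs as a single group.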