
Local LLM Stack for Windows + NVIDIA πŸ’»πŸ§ πŸš€

This is my previous Local LLM Stack environment, which I wanted to share. It's built for machines running Windows with an NVIDIA GPU and uses Docker Compose for containerization.

The stack provides a fully portable, accelerated local AI environment using:

  • Ollama: The runtime for pulling, serving, and managing local large language models (LLMs) using your NVIDIA GPU. πŸ“¦
  • Open WebUI: A feature-rich, self-hosted web interface to interact with the models served by Ollama. 🌐
  • Caddy: A powerful reverse proxy that manages HTTPS for the entire stack. πŸ”’
  • Watchtower: Configured for automatic updates of all services. πŸ”„

How to Use

  1. Drop & Go: Simply place the docker-compose.yml file and your Caddyfile into an empty directory.

  2. Start: Run the following command in that directory:

    docker compose up -d

    Docker Compose will automatically create the necessary data folders (ollama_data, openwebui_data, caddy_data, caddy_config) on your host machine.

The environment will start and be accessible at https://localhost:3000.
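
Once the stack is up, models still have to be pulled into Ollama. A minimal sketch, assuming the default project name llmstack from the compose file below and using llama3.2 purely as an example model name:

    # Pull a model into the Ollama container
    docker exec -it llmstack-ollama ollama pull llama3.2

    # Confirm it is available
    docker exec -it llmstack-ollama ollama list

Models can also be downloaded from within the Open WebUI interface once you are logged in.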

Key Features

  • GPU Acceleration: Configured to automatically utilize your NVIDIA GPU for all model inference (see the verification command after this list). ⚡
  • Portability: Uses local bind mounts (e.g., ./ollama_data) for all data, making the configuration independent of the Docker project name and easily transferable between machines. 🧳
  • Automatic HTTPS: Caddy is set up to provide a basic HTTPS endpoint. You will need to modify the included Caddyfile to configure your desired hostname or domain and manage the certificate trust; an example variant follows the included Caddyfile below. 🛠️
  • Auto-Updates: Watchtower is enabled on all core services to keep them up-to-date automatically. ✨
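
To confirm the GPU passthrough mentioned above actually works, you can run nvidia-smi inside the Ollama container (again assuming the default llmstack project name):

    # The GPU should be listed by the driver inside the container
    docker exec -it llmstack-ollama nvidia-smi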

Disclaimer: This is an old personal stack shared as-is. Additional hardening, security, and network configuration are required for production or public use. I accept no responsibility whatsoever for its use, misuse, or any consequences resulting from running this configuration. ⚠️

Caddyfile

# Replace localhost with hostname
localhost:3000 {
    reverse_proxy open-webui:8080
    tls internal
}
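
If you serve the stack under a real domain instead of localhost, the Caddyfile can be adapted so Caddy obtains a publicly trusted certificate automatically; a sketch, with llm.example.com standing in for your own hostname:

    llm.example.com {
        reverse_proxy open-webui:8080
    }

Dropping tls internal switches Caddy to its automatic HTTPS on ports 80/443, so the published ports in docker-compose.yml would need to be adjusted to match.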

docker-compose.yml

name: llmstack

x-logging:
  default: &default
    driver: "json-file"
    options:
      max-size: "10m"
      max-file: "3"

services:
  # LLM Runtime
  ollama:
    container_name: ${COMPOSE_PROJECT_NAME}-ollama
    image: ollama/ollama:latest
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility
      - CUDA_VISIBLE_DEVICES=0
      - LOG_LEVEL=debug
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
              count: all
    # --- CHANGED: Using local bind mount for model data ---
    volumes:
      - ./ollama_data:/root/.ollama
    networks:
      - llm_network
    labels:
      - "com.centurylinklabs.watchtower.enable=true"
    logging: *default
    restart: unless-stopped

  # Open Web UI
  open-webui:
    image: ghcr.io/open-webui/open-webui:cuda
    container_name: ${COMPOSE_PROJECT_NAME}-open-webui
    # --- CHANGED: Using local bind mount for Open WebUI data ---
    volumes:
      - ./openwebui_data:/app/backend/data
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    networks:
      - llm_network
    environment:
      - 'OLLAMA_BASE_URL=http://ollama:11434'
    depends_on:
      - ollama
    labels:
      - "com.centurylinklabs.watchtower.enable=true"
    logging: *default
    restart: unless-stopped

  # Reverse proxy
  caddy:
    container_name: ${COMPOSE_PROJECT_NAME}-caddy
    image: caddy:2.9-alpine
    ports:
      - "3000:3000"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile
      - ./caddy_data:/data
      - ./caddy_config:/config
    networks:
      - llm_network
    depends_on:
      - open-webui
    labels:
      - "com.centurylinklabs.watchtower.enable=true"
    logging: *default
    restart: unless-stopped

  # Auto update
  watchtower:
    container_name: ${COMPOSE_PROJECT_NAME}-watchtower
    image: containrrr/watchtower
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    command: >
      --cleanup=true
      --label-enable
      --interval=300
    networks:
      - llm_network
    labels:
      - "com.centurylinklabs.watchtower.enable=true"
    logging: *default
    restart: unless-stopped

networks:
  llm_network:
    driver: bridge
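
For day-to-day operation, the usual Docker Compose commands apply; a few examples, run from the directory containing docker-compose.yml:

    # Follow logs for all services
    docker compose logs -f

    # Pull newer images and recreate containers manually
    # (Watchtower also does this automatically every 300 seconds)
    docker compose pull
    docker compose up -d

    # Stop and remove the containers (bind-mounted data stays on disk)
    docker compose down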