This setup gives you a private, local ChatGPT-like interface with web search capabilities, built from three pieces:
- Ollama - Runs LLMs locally on your GPU
- Open WebUI - Clean web interface (like ChatGPT) to chat with your models
- SearxNG - Private meta-search engine that Open WebUI can use for web searches
Prerequisites:
- Docker & Docker Compose
- NVIDIA GPU with NVIDIA Container Toolkit installed
- At least 8GB VRAM (16GB+ recommended for larger models)
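One quick way to check how much VRAM you have before picking a model (requires the NVIDIA driver to be installed):

```bash
# Print GPU name and total VRAM
nvidia-smi --query-gpu=name,memory.total --format=csv
```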
- Install NVIDIA Container Toolkit (if you haven't already):

  ```bash
  # Ubuntu/Debian
  distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
  curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
  curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
  sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
  sudo systemctl restart docker
  ```
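  After restarting Docker, you can confirm the runtime sees the GPU; this is the same check used in the troubleshooting section below:

  ```bash
  # Should print the nvidia-smi GPU table from inside a container
  docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
  ```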
- Create the directory structure:

  ```bash
  mkdir -p ollama-setup/searxng
  cd ollama-setup
  ```
- Copy the files:
  - `docker-compose.yml` - Main compose file (a minimal sketch is included right below this list)
  - `searxng/settings.yml` - SearxNG configuration
  - `searxng/limiter.toml` - Rate limiting config (disabled for local use)
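  The compose file itself isn't reproduced in this guide, so the following is only a minimal sketch that matches the ports and container name used in the commands here; image tags, volume names, and paths are assumptions — defer to your actual `docker-compose.yml`.

  ```yaml
  # Hypothetical minimal docker-compose.yml (verify against your own file)
  services:
    ollama:
      image: ollama/ollama
      container_name: ollama   # matches the `docker exec -it ollama ...` commands below
      ports:
        - "11434:11434"
      volumes:
        - ollama:/root/.ollama
      deploy:
        resources:
          reservations:
            devices:
              - driver: nvidia
                count: all
                capabilities: [gpu]

    open-webui:
      image: ghcr.io/open-webui/open-webui:main
      ports:
        - "3000:8080"
      environment:
        - OLLAMA_BASE_URL=http://ollama:11434
      volumes:
        - open-webui:/app/backend/data
      depends_on:
        - ollama

    searxng:
      image: searxng/searxng
      ports:
        - "8080:8080"
      volumes:
        - ./searxng:/etc/searxng

  volumes:
    ollama:
    open-webui:
  ```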
- Start everything:

  ```bash
  docker-compose up -d
  ```
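  To confirm all three containers came up:

  ```bash
  # ollama, open-webui, and searxng should all show an "Up" status
  docker-compose ps
  ```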
- Access Open WebUI:
  - Open your browser to http://localhost:3000
  - Create an account (the first user automatically becomes the admin)
  - The interface will be empty until you pull a model
- Pull your first model:

  ```bash
  # Option 1: From the command line
  docker exec -it ollama ollama pull llama3.2

  # Option 2: From the Open WebUI interface
  # Go to Settings → Models → Pull a model from Ollama.com
  ```
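  Once a model is downloaded, you can also sanity-check it directly against the Ollama API on port 11434 (the model name here is just the example pulled above):

  ```bash
  # One-off completion via the Ollama REST API
  curl http://localhost:11434/api/generate -d '{
    "model": "llama3.2",
    "prompt": "Say hello in one short sentence.",
    "stream": false
  }'
  ```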
Recommended models:

Small (8GB VRAM):
- `llama3.2` - Fast, good quality
- `mistral` - Great performance/quality balance

Medium (16GB VRAM):
- `llama3.1:8b` - Excellent for most tasks
- `mixtral` - Very capable, good reasoning

Large (24GB+ VRAM):
- `llama3.1:70b` - Top-tier quality
- `qwen2.5:72b` - Excellent coding and reasoning
To use web search:
- In Open WebUI, start a chat
- Click the "+" button next to the message box
- Enable "Web Search"
- Your queries will now search the web using SearxNG
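How Open WebUI reaches SearxNG is configured on the open-webui service, typically through environment variables in docker-compose.yml. The variable names below are an assumption and have changed between Open WebUI versions, so verify them against the documentation for the version you run:

```yaml
# Hypothetical environment block for the open-webui service;
# variable names vary by Open WebUI version — verify before use.
environment:
  - ENABLE_RAG_WEB_SEARCH=true
  - RAG_WEB_SEARCH_ENGINE=searxng
  - SEARXNG_QUERY_URL=http://searxng:8080/search?q=<query>
```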
Useful commands:

```bash
# View logs
docker-compose logs -f

# List downloaded models
docker exec -it ollama ollama list

# Pull a new model
docker exec -it ollama ollama pull <model-name>

# Stop everything
docker-compose down

# Stop and remove all data
docker-compose down -v
```

Ports:
- 3000 - Open WebUI interface
- 11434 - Ollama API
- 8080 - SearxNG search engine
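To confirm SearxNG itself is responding, you can hit it directly:

```bash
# Should print 200 if SearxNG is serving search results
curl -s -o /dev/null -w "%{http_code}\n" "http://localhost:8080/search?q=test"
```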
Troubleshooting:

GPU not detected:

```bash
# Check if the NVIDIA runtime is available
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```

Container won't start:

```bash
# Check the failing service's logs
docker-compose logs <service-name>
```

Models running slow:
- Check GPU usage with `nvidia-smi`
- Larger models need more VRAM
- Try a smaller model or a lower-bit quantization (e.g., `llama3.1:8b` instead of `llama3.1:70b`)
Notes:
- All data is stored in Docker volumes and persists between restarts
- SearxNG is configured with rate limiting disabled for local use
- First model pull can take a while depending on your internet speed
- You can run multiple models; they are loaded into VRAM on demand
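Since everything persists in Docker volumes, you can list them to see exactly what `docker-compose down -v` would remove (volume names are prefixed with your compose project/directory name):

```bash
# Show the volumes holding Ollama models and Open WebUI data
docker volume ls
```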
Pros:
- Completely private - nothing leaves your machine
- No usage limits or costs
- Full control over models and settings
- Works offline (except web search)
Cons:
- Slower than hosted services when running on consumer GPUs
- Need to manage models yourself
- Limited by your VRAM for model size