Ollama, Open WebUI, and SearxNG Setup

Local LLM Setup with Ollama + Open WebUI + SearxNG

This setup gives you a private, local ChatGPT-like interface with web search capabilities.

What's Included

  • Ollama - Runs LLMs locally on your GPU
  • Open WebUI - Clean web interface (like ChatGPT) to chat with your models
  • SearxNG - Private meta-search engine that Open WebUI can use for web searches

Requirements

  • Docker & Docker Compose
  • NVIDIA GPU with NVIDIA Container Toolkit installed
  • At least 8GB VRAM (16GB+ recommended for larger models)

Quick Start

  1. Install NVIDIA Container Toolkit (if you haven't already):

    # Ubuntu/Debian
    distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
    curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
    curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
    sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
    sudo systemctl restart docker
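
    # Optional sanity check (an extra step, not from the original instructions):
    # this should print your GPU details from inside a container if the toolkit works
    docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi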
  2. Create the directory structure:

    mkdir -p ollama-setup/searxng
    cd ollama-setup
  3. Copy the files (their full contents appear further down in this gist):

    • docker-compose.yml - Main compose file
    • searxng/settings.yml - SearxNG configuration
    • searxng/limiter.toml - Rate limiting config (disabled for local use)
  4. Start everything:

    docker-compose up -d
  5. Access Open WebUI:

    • Open your browser to http://localhost:3000
    • Create an account (first user is auto-admin)
    • The interface will be empty until you pull a model
  6. Pull your first model:

    # Option 1: From the command line
    docker exec -it ollama ollama pull llama3.2
    
    # Option 2: From Open WebUI interface
    # Go to Settings → Models → Pull a model from Ollama.com
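
Once a model is pulled, you can quickly confirm that Ollama and Open WebUI are responding (a sanity-check sketch; adjust the ports if you changed them in docker-compose.yml):

# Ollama's API root replies with "Ollama is running"
curl http://localhost:11434

# Open WebUI should answer with HTTP 200
curl -I http://localhost:3000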

Recommended Models

Small (8GB VRAM):

  • llama3.2 - Fast, good quality
  • mistral - Great performance/quality balance

Medium (16GB VRAM):

  • llama3.1:8b - Excellent for most tasks
  • mixtral - Very capable, good reasoning

Large (24GB+ VRAM):

  • llama3.1:70b - Top-tier quality
  • qwen2.5:72b - Excellent coding and reasoning

Using Web Search

  1. In Open WebUI, start a chat
  2. Click the "+" button next to the message box
  3. Enable "Web Search"
  4. Your queries will now search the web using SearxNG
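
To confirm SearxNG is reachable in the JSON format Open WebUI relies on (the same endpoint configured as SEARXNG_QUERY_URL in docker-compose.yml), you can query it directly from the host. A quick check, assuming the default port mapping:

# Should return a JSON document containing a "results" array
curl "http://localhost:8080/search?q=open+source+llm&format=json"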

Common Commands

# View logs
docker-compose logs -f

# List downloaded models
docker exec -it ollama ollama list

# Pull a new model
docker exec -it ollama ollama pull <model-name>

# Stop everything
docker-compose down

# Stop and remove all data
docker-compose down -v
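
You can also talk to Ollama's HTTP API directly, which is handy for scripting outside of Open WebUI. A minimal example, assuming you've already pulled llama3.2:

# One-shot generation via the Ollama API (stream disabled for a single JSON reply)
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'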

Ports

  • 3000 - Open WebUI interface
  • 11434 - Ollama API
  • 8080 - SearxNG search engine
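
If one of these host ports is already in use, only the left-hand side of the mapping in docker-compose.yml needs to change. A hypothetical remap of Open WebUI from 3000 to 8081, for example:

open-webui:
  ports:
    - "8081:8080"   # host:container - then browse to http://localhost:8081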

Troubleshooting

GPU not detected:

# Check if NVIDIA runtime is available
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

Container won't start:

# Check logs
docker-compose logs <service-name>
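
If the logs don't point to an obvious cause, it can help to check each container's state and recreate just the failing service (standard docker-compose commands, nothing specific to this setup):

# Show container state and exit codes
docker-compose ps

# Recreate a single service after fixing its config
docker-compose up -d --force-recreate <service-name>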

Models running slow:

  • Check GPU usage: nvidia-smi
  • Larger models need more VRAM
  • Try a smaller model or a lower-precision quantization (e.g., llama3.1:8b instead of llama3.1:70b)
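
To see whether a model actually fits on the GPU, Ollama can report how much of each loaded model sits in VRAM versus system RAM (a quick check; output details may vary by Ollama version):

# Show loaded models and their GPU/CPU split
docker exec -it ollama ollama ps

# Watch GPU memory and utilization while the model responds
watch -n 1 nvidia-smi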

Notes

  • All data is stored in Docker volumes and persists between restarts
  • SearxNG is configured with rate limiting disabled for local use
  • First model pull can take a while depending on your internet speed
  • You can keep multiple models downloaded; each is loaded into VRAM on demand when used
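
Since the ollama volume holds every model you download, it's worth keeping an eye on its size with standard Docker and Ollama tooling (a quick housekeeping sketch, not part of the original setup):

# Show disk usage per volume (look for the "ollama" volume)
docker system df -v

# Remove a model you no longer need
docker exec -it ollama ollama rm <model-name>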

What's Different From ChatGPT?

Pros:

  • Completely private - nothing leaves your machine
  • No usage limits or costs
  • Full control over models and settings
  • Works offline (except web search)

Cons:

  • Slower on consumer GPUs
  • Need to manage models yourself
  • Limited by your VRAM for model size

docker-compose.yml

services:
  # Ollama - Runs LLMs locally using your GPU
  # Models are stored in a Docker volume and loaded into VRAM on-demand
  ollama:
    image: ollama/ollama
    container_name: ollama
    volumes:
      - ollama:/root/.ollama  # Persistent storage for downloaded models
    ports:
      - "11434:11434"  # API port - Open WebUI connects here
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all  # Use all available GPUs
              capabilities: [gpu]
    restart: unless-stopped

  # Open WebUI - ChatGPT-like interface for Ollama
  # First user to sign up becomes the admin
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    volumes:
      - open-webui:/app/backend/data  # Stores chats, settings, user data
    ports:
      - "3000:8080"  # Access at http://localhost:3000
    environment:
      # Web search configuration - uses SearxNG running below
      - ENABLE_RAG_WEB_SEARCH=true
      - RAG_WEB_SEARCH_ENGINE=searxng
      - RAG_WEB_SEARCH_RESULT_COUNT=5
      - RAG_WEB_SEARCH_CONCURRENT_REQUESTS=10
      - SEARXNG_QUERY_URL=http://searxng:8080/search?q=<query>&format=json
    extra_hosts:
      - "host.docker.internal:host-gateway"  # Allows connecting to host services
    restart: unless-stopped

  # SearxNG - Private meta-search engine
  # Aggregates results from multiple search engines
  # Config files in ./searxng/ directory
  searxng:
    image: searxng/searxng:latest
    container_name: searxng
    ports:
      - "8080:8080"  # Search endpoint, also has a web UI
    volumes:
      - ./searxng:/etc/searxng:rw  # Mount config directory
    restart: unless-stopped

# Docker volumes - persist data between container restarts
volumes:
  ollama:      # Stores downloaded models (can get large!)
  open-webui:  # Stores chat history, settings, user accounts

searxng/limiter.toml

# Disable rate limiting for local use
[botdetection.ip_limit]
link_token = false

[botdetection.ip_lists]
block_ip = []
pass_ip = []

searxng/settings.yml

use_default_settings: true

server:
  secret_key: "changeme123456789012345678901234"
  limiter: false
  image_proxy: true
  port: 8080
  bind_address: "0.0.0.0"

search:
  safe_search: 0
  autocomplete: ""
  default_lang: ""
  formats:
    - html
    - json
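
The secret_key above is a placeholder. If you ever expose SearxNG beyond localhost, replace it with a random value; one way to do that (a sketch, run from the ollama-setup directory):

# Substitute a freshly generated key into the SearxNG config
sed -i "s|changeme123456789012345678901234|$(openssl rand -hex 32)|" searxng/settings.yml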

#!/bin/bash
# Simple setup script for Ollama + Open WebUI + SearxNG

echo "🚀 Starting Ollama + Open WebUI + SearxNG setup..."
echo ""

# Check if Docker is running
if ! docker info > /dev/null 2>&1; then
    echo "❌ Docker is not running. Please start Docker first."
    exit 1
fi

# Check for NVIDIA GPU
echo "Checking for NVIDIA GPU..."
if ! docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi > /dev/null 2>&1; then
    echo "⚠️ Warning: Could not detect NVIDIA GPU or NVIDIA Container Toolkit."
    echo "   Make sure you have:"
    echo "   1. NVIDIA GPU installed"
    echo "   2. NVIDIA drivers installed"
    echo "   3. NVIDIA Container Toolkit installed"
    echo ""
    read -p "Continue anyway? (y/N) " -n 1 -r
    echo ""
    if [[ ! $REPLY =~ ^[Yy]$ ]]; then
        exit 1
    fi
else
    echo "✅ NVIDIA GPU detected"
fi

# Start services
echo ""
echo "Starting services..."
docker-compose up -d

# Wait for services to be ready
echo ""
echo "Waiting for services to start..."
sleep 5

# Check if services are running
if docker ps | grep -q "ollama"; then
    echo "✅ Ollama is running"
else
    echo "❌ Ollama failed to start"
fi

if docker ps | grep -q "open-webui"; then
    echo "✅ Open WebUI is running"
else
    echo "❌ Open WebUI failed to start"
fi

if docker ps | grep -q "searxng"; then
    echo "✅ SearxNG is running"
else
    echo "❌ SearxNG failed to start"
fi

echo ""
echo "📝 Next steps:"
echo ""
echo "1. Open your browser to http://localhost:3000"
echo "2. Create an account (first user becomes admin)"
echo "3. Pull a model:"
echo "   docker exec -it ollama ollama pull llama3.2"
echo ""
echo "4. Start chatting!"
echo ""
echo "📚 See README.md for more information and troubleshooting"