RAG refinement

Current approach: RAG with Open WebUI

RAG is a technique that enhances large language models (LLMs) by retrieving relevant information from a document base to improve response accuracy.

Based on the Open WebUI RAG Tutorial, my current approach to Retrieval-Augmented Generation (RAG) leverages Open WebUI’s built-in knowledge base feature to enhance LLM responses with context from my NAS files (Markdown, PDF, and ePub).

Documents are uploaded to a named knowledge base (e.g., “NAS Documents”) in Open WebUI, processed into chunks, and stored in a vector database (Chroma, the default). Using a custom model configured with this knowledge base, queries prefixed with # (e.g., #NAS Documents How do I configure Docker?) retrieve relevant document chunks via Chroma, which Ollama’s LLM (e.g., deepseek-r1:14b) uses to generate accurate, context-aware responses.
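
For reference, uploads can be scripted instead of dragged and dropped. The sketch below follows the upload/attach endpoints described in the Open WebUI API documentation; the base URL, API key, knowledge-base ID, and file path are placeholders, and the endpoint paths may differ between Open WebUI versions.

```python
import requests

# Assumptions: Open WebUI runs locally on port 3000 and the knowledge base
# "NAS Documents" already exists; endpoint paths follow the Open WebUI API
# docs and may differ between versions. The API key is a placeholder.
OPENWEBUI_URL = "http://localhost:3000"
API_KEY = "sk-..."  # generated under Settings > Account in Open WebUI
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

def upload_file(path: str) -> str:
    """Upload a document and return its file id."""
    with open(path, "rb") as f:
        resp = requests.post(
            f"{OPENWEBUI_URL}/api/v1/files/",
            headers=HEADERS,
            files={"file": f},
        )
    resp.raise_for_status()
    return resp.json()["id"]

def add_to_knowledge(file_id: str, knowledge_id: str) -> None:
    """Attach an uploaded file to an existing knowledge base."""
    resp = requests.post(
        f"{OPENWEBUI_URL}/api/v1/knowledge/{knowledge_id}/file/add",
        headers=HEADERS,
        json={"file_id": file_id},
    )
    resp.raise_for_status()

if __name__ == "__main__":
    fid = upload_file("/nas/docs/docker-setup.md")  # placeholder path
    add_to_knowledge(fid, knowledge_id="<knowledge-base-id>")
```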

Cons of Current RAG Approach

  • Manual Upload: Uploading files to Open WebUI’s knowledge base is manual, requiring drag-and-drop or API calls, which is time-consuming for large NAS collections compared to fully automated ingestion.
  • Query Syntax: Users must use the #knowledge_base_name syntax for RAG queries, which may confuse non-technical users or require training.
  • Chunking Limitations: Open WebUI’s default chunking may not be optimal for all document types (e.g., dense PDFs), potentially reducing retrieval accuracy.

Advanced approach: Setup overview

  • Document Processing with N8N: Using N8N to monitor the NAS, extract text from PDFs and ePubs, and prepare them for RAG. For PDFs, tools like Marker can help, while ePubs may need text extraction using ebooklib before loading.
  • Storage and Retrieval: Store document metadata in a local Supabase instance and use Qdrant for vector embeddings, enabling efficient similarity searches for RAG (a metadata-insert sketch follows this list).
  • RAG and Interaction: Configure Open WebUI to use Qdrant as the vector store and Ollama for running the LLM locally. This setup allows users to interact with the RAG system through a user-friendly interface, leveraging your document knowledge base.
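
As an illustration of the Storage and Retrieval step, the sketch below inserts parsed-document metadata into a self-hosted Supabase instance using the supabase-py client. The URL, key, table name (“documents”), and column names are assumptions for illustration, not something the setup above prescribes.

```python
from supabase import create_client

# Assumptions: a self-hosted Supabase instance reachable on localhost:8000
# with a "documents" table (columns: title, source_path, format, content_md).
# Table name, schema, and the service key are illustrative placeholders.
SUPABASE_URL = "http://localhost:8000"
SUPABASE_KEY = "<service-role-key>"

supabase = create_client(SUPABASE_URL, SUPABASE_KEY)

def store_document(title: str, source_path: str, fmt: str, markdown: str) -> dict:
    """Insert parsed document metadata and Markdown into Supabase."""
    result = (
        supabase.table("documents")
        .insert({
            "title": title,
            "source_path": source_path,
            "format": fmt,
            "content_md": markdown,
        })
        .execute()
    )
    return result.data[0]

if __name__ == "__main__":
    row = store_document(
        title="Docker Setup Notes",
        source_path="/nas/docs/docker-setup.pdf",  # placeholder path
        fmt="pdf",
        markdown="# Docker Setup\n...",
    )
    print("Stored document row:", row)
```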

Advanced approach: RAG and components

My future setup involves:

  • NAS Files: Primarily Markdown, PDF, and ePub documents, serving as the knowledge base for a Retrieval-Augmented Generation (RAG) system.
  • Marker (probably to be replaced by Docling): An open-source tool for converting PDFs to Markdown and JSON, optimized for complex documents like scientific papers. It uses deep learning for text, table, and layout extraction, with optional Ollama integration (e.g., llama3.2-vision:3b (lightweight), llava:13b (probably best accuracy), qwen2-vision:7b, bakllava:7b) to enhance accuracy for tables and math. Runs fully locally, making it ideal for offline RAG pipelines. (GitHub - VikParuchuri/marker)
  • ebooklib: A Python library for reading and parsing ePub files, extracting text and metadata from their HTML-based structure. Used to preprocess ePubs in the RAG pipeline, converting them to text or Markdown for further processing by tools like Marker or direct embedding (see the extraction sketch after this list). (GitHub - aerkalov/ebooklib)
  • N8N: A workflow automation tool to process and ingest files, orchestrating parsing (Marker for PDFs, ebooklib for ePubs), metadata storage (Supabase), and vector embedding (Qdrant). (cf. Docker Installation)
  • Supabase: A local PostgreSQL-based backend for storing document metadata and parsed Markdown, with the pgvector extension for potential secondary vector storage, though Qdrant is the primary vector database. (cf. Supabase Self-Hosting with Docker; also pgvector)
  • Qdrant: An open-source vector database for storing embeddings of document chunks, enabling similarity searches for RAG. Embeddings are generated using Ollama’s nomic-embed-text model. (cf. How to Get Started with Qdrant Locally)
  • Ollama: A tool for running LLMs and embedding models locally, such as llama3.2:3b, qwen2.5:14b, or others (e.g., deepseek-r1:14b, pending availability), for response generation and embeddings. Models will be tested to determine the best fit for accuracy and performance. (cf. ollama/ollama on Docker Hub)
  • Open WebUI: A self-hosted AI platform with built-in RAG support, designed to operate offline. It integrates with Qdrant for vector searches (configurable via environment variables, defaulting to Chroma) and Ollama for generation, allowing queries over the NAS knowledge base. (cf. RAG Features, Vector Store Configuration)
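
For the ebooklib step, the following sketch extracts the title and plain text from an ePub so it can be converted to Markdown or chunked directly. It assumes beautifulsoup4 (not mentioned above) for stripping the XHTML chapters; the file path is a placeholder.

```python
import ebooklib
from ebooklib import epub
from bs4 import BeautifulSoup  # pip install ebooklib beautifulsoup4

def epub_to_text(path: str) -> tuple[str, str]:
    """Extract the title and plain text from an ePub for downstream chunking.

    A minimal sketch: real ePubs may need smarter handling of footnotes,
    tables of contents, and per-chapter Markdown conversion.
    """
    book = epub.read_epub(path)
    titles = book.get_metadata("DC", "title")
    title = titles[0][0] if titles else path

    parts = []
    # ITEM_DOCUMENT items hold the XHTML chapters of the book.
    for item in book.get_items_of_type(ebooklib.ITEM_DOCUMENT):
        soup = BeautifulSoup(item.get_content(), "html.parser")
        parts.append(soup.get_text(separator="\n", strip=True))
    return title, "\n\n".join(parts)

if __name__ == "__main__":
    title, text = epub_to_text("/nas/books/example.epub")  # placeholder path
    print(title, len(text), "characters")
```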

Notes:

  • All components run locally via Docker, ensuring an offline setup.
  • Documents are parsed (PDFs via Marker, ePubs via ebooklib), chunked into ~512-character segments, embedded into Qdrant, and queried through Open WebUI with Ollama for context-aware responses (see the chunking and embedding sketch after this list).
  • Hardware considerations: Marker with --use_llm requires ~3GB VRAM; larger Ollama models (e.g., qwen2.5:14b) need 8-16GB RAM.
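
A minimal sketch of the chunk-embed-store step described above, assuming Ollama serves nomic-embed-text on its default port (11434) and Qdrant runs on :6333; the collection name “nas_documents” and the payload fields are illustrative, not fixed by the setup.

```python
import uuid
import requests
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

# Assumptions: Ollama on :11434 with nomic-embed-text pulled, Qdrant on :6333;
# collection name and payload fields are illustrative.
OLLAMA_URL = "http://localhost:11434"
COLLECTION = "nas_documents"

def chunk_text(text: str, size: int = 512) -> list[str]:
    """Naive fixed-size chunking (~512 characters, as in the notes above)."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text: str) -> list[float]:
    """Get an embedding vector from Ollama's /api/embeddings endpoint."""
    resp = requests.post(
        f"{OLLAMA_URL}/api/embeddings",
        json={"model": "nomic-embed-text", "prompt": text},
    )
    resp.raise_for_status()
    return resp.json()["embedding"]

def index_document(client: QdrantClient, title: str, text: str) -> None:
    """Chunk a parsed document, embed each chunk, and upsert into Qdrant."""
    points = [
        PointStruct(
            id=str(uuid.uuid4()),
            vector=embed(chunk),
            payload={"title": title, "chunk": chunk},
        )
        for chunk in chunk_text(text)
    ]
    client.upsert(collection_name=COLLECTION, points=points)

if __name__ == "__main__":
    client = QdrantClient(url="http://localhost:6333")
    if not client.collection_exists(COLLECTION):
        # nomic-embed-text produces 768-dimensional vectors.
        client.create_collection(
            collection_name=COLLECTION,
            vectors_config=VectorParams(size=768, distance=Distance.COSINE),
        )
    index_document(client, "Docker Setup Notes", "…parsed Markdown text…")
```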

Document Processing

Given that my files are mostly PDFs and ePubs, processing them is the first step. My research took the following tools into consideration:

| Feature | LlamaParse | PyPDF2 | Unstructured.io | Marker | ebooklib |
| --- | --- | --- | --- | --- | --- |
| File Types | PDF, ePub, .docx, .pptx, .rtf, .pages, etc. | PDF only | PDF, .docx, .txt, images, HTML, and more | PDF (ePubs with preprocessing) | ePub only |
| ePub Support | Native support, extracts text effectively | Not supported | Limited, may require preprocessing | Limited, requires text conversion (e.g., via ebooklib) | Native, extracts text and metadata |
| Output Format | Markdown, text, JSON (structured) | Raw text | Text, JSON with metadata | Markdown, JSON | Text, HTML (convertible to Markdown) |
| Complex Elements | Tables, images, equations, with context-aware parsing via LLM | Basic text, struggles with tables/images | Tables, images, but less precise for complex layouts | Tables, math, figures (enhanced with Ollama --use_llm) | Basic text, no tables/images |
| Custom Instructions | Natural language prompts for parsing (e.g., “summarize tables”) | None | Limited, relies on predefined rules | None, but Ollama integration allows some flexibility | None |
| RAG Integration | Built for RAG, integrates with vector stores like Qdrant | Manual post-processing needed | Good for RAG, but less seamless than LlamaParse | Excellent for RAG, Markdown output suits Qdrant/Open WebUI | Requires post-processing for RAG |
| Ease of Use | High, with API and Python SDK, but requires API key | Simple for basic PDFs, Python-based | Moderate, requires configuration for advanced features | Moderate, needs Python 3.10+, PyTorch setup | Simple, Python-based, but ePub-specific |
| Local Deployment | Cloud-based API, local via Docker possible but less common | Fully local | Fully local, open-source | Fully local, open-source | Fully local, open-source |
| Cost | Free tier (1,000 pages/day), then $0.003/page | Free, open-source | Free, open-source | Free, open-source | Free, open-source |
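
Tying the comparison together, a simple dispatcher can route PDFs to Marker and ePubs to ebooklib before ingestion. This is a sketch under assumptions: the `marker_single` entry point and `--output_dir` flag follow the Marker README and may differ between Marker versions, `epub_ingest` is a hypothetical module holding the ebooklib sketch above, and the NAS paths are placeholders.

```python
import subprocess
from pathlib import Path

from epub_ingest import epub_to_text  # hypothetical module: the ebooklib sketch above

# Placeholder NAS mount and output directory; adjust to your setup.
NAS_ROOT = Path("/nas/docs")
MARKDOWN_OUT = Path("/nas/processed")

def convert_pdf(pdf_path: Path) -> None:
    """Convert a PDF to Markdown with Marker's CLI.

    The `marker_single` entry point and `--output_dir` flag follow the Marker
    README; check `marker_single --help` for the flags in your installed version.
    """
    subprocess.run(
        ["marker_single", str(pdf_path), "--output_dir", str(MARKDOWN_OUT)],
        check=True,
    )

def process_nas() -> None:
    """Route each NAS file to the parser the comparison above selects."""
    for path in NAS_ROOT.rglob("*"):
        suffix = path.suffix.lower()
        if suffix == ".pdf":
            convert_pdf(path)
        elif suffix == ".epub":
            title, text = epub_to_text(str(path))
            (MARKDOWN_OUT / f"{path.stem}.md").write_text(f"# {title}\n\n{text}")
        # Markdown files need no conversion and can be ingested directly.

if __name__ == "__main__":
    process_nas()
```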