RAG refinement

Current approach: RAG with Open WebUI

RAG is a technique that enhances large language models (LLMs) by retrieving relevant information from a document base to improve response accuracy.

Based on the Open WebUI RAG Tutorial, my current approach to Retrieval-Augmented Generation (RAG) leverages Open WebUI’s built-in knowledge base feature to enhance LLM responses with context from my NAS files (Markdown, PDF, and ePub).

Documents are uploaded to a named knowledge base (e.g., “NAS Documents”) in Open WebUI, processed into chunks, and stored in a vector database (Chroma, the default). Using a custom model configured with this knowledge base, queries prefixed with # (e.g., #NAS Documents How do I configure Docker?) retrieve relevant document chunks via Chroma, which Ollama’s LLM (e.g., deepseek-r1:14b) uses to generate accurate, context-aware responses.
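
For reference, uploads can be scripted instead of dragged and dropped. The sketch below follows the upload/attach endpoints described in the Open WebUI API documentation; the base URL, API key, knowledge-base ID, and file path are placeholders, and the endpoint paths may differ between Open WebUI versions.

```python
import requests

# Assumptions: Open WebUI runs locally on port 3000 and the knowledge base
# "NAS Documents" already exists; endpoint paths follow the Open WebUI API
# docs and may differ between versions. The API key is a placeholder.
OPENWEBUI_URL = "http://localhost:3000"
API_KEY = "sk-..."  # generated under Settings > Account in Open WebUI
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

def upload_file(path: str) -> str:
    """Upload a document and return its file id."""
    with open(path, "rb") as f:
        resp = requests.post(
            f"{OPENWEBUI_URL}/api/v1/files/",
            headers=HEADERS,
            files={"file": f},
        )
    resp.raise_for_status()
    return resp.json()["id"]

def add_to_knowledge(file_id: str, knowledge_id: str) -> None:
    """Attach an uploaded file to an existing knowledge base."""
    resp = requests.post(
        f"{OPENWEBUI_URL}/api/v1/knowledge/{knowledge_id}/file/add",
        headers=HEADERS,
        json={"file_id": file_id},
    )
    resp.raise_for_status()

if __name__ == "__main__":
    fid = upload_file("/nas/docs/docker-setup.md")  # placeholder path
    add_to_knowledge(fid, knowledge_id="<knowledge-base-id>")
```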

Cons of Current RAG Approach

  • Manual Upload: Uploading files to Open WebUI’s knowledge base is manual, requiring drag-and-drop or API calls, which is time-consuming for large NAS collections compared to fully automated ingestion.
  • Query Syntax: Users must use the #knowledge_base_name syntax for RAG queries, which may confuse non-technical users or require training.
  • Chunking Limitations: Open WebUI’s default chunking may not be optimal for all document types (e.g., dense PDFs), potentially reducing retrieval accuracy.

Advanced approach: Setup overview

  • Document Processing with N8N: Using N8N to monitor the NAS, extract text from PDFs and ePubs, and prepare them for RAG. For PDFs, tools like Marker can help, while ePubs may need text extraction using ebooklib before loading.
  • Storage and Retrieval: Store document metadata in a local Supabase instance and use Qdrant for vector embeddings, enabling efficient similarity searches for RAG (a metadata-insert sketch follows this list).
  • RAG and Interaction: Configure Open WebUI to use Qdrant as the vector store and Ollama for running the LLM locally. This setup allows users to interact with the RAG system through a user-friendly interface, leveraging your document knowledge base.
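
As an illustration of the Storage and Retrieval step, the sketch below inserts parsed-document metadata into a self-hosted Supabase instance using the supabase-py client. The URL, key, table name (“documents”), and column names are assumptions for illustration, not something the setup above prescribes.

```python
from supabase import create_client

# Assumptions: a self-hosted Supabase instance reachable on localhost:8000
# with a "documents" table (columns: title, source_path, format, content_md).
# Table name, schema, and the service key are illustrative placeholders.
SUPABASE_URL = "http://localhost:8000"
SUPABASE_KEY = "<service-role-key>"

supabase = create_client(SUPABASE_URL, SUPABASE_KEY)

def store_document(title: str, source_path: str, fmt: str, markdown: str) -> dict:
    """Insert parsed document metadata and Markdown into Supabase."""
    result = (
        supabase.table("documents")
        .insert({
            "title": title,
            "source_path": source_path,
            "format": fmt,
            "content_md": markdown,
        })
        .execute()
    )
    return result.data[0]

if __name__ == "__main__":
    row = store_document(
        title="Docker Setup Notes",
        source_path="/nas/docs/docker-setup.pdf",  # placeholder path
        fmt="pdf",
        markdown="# Docker Setup\n...",
    )
    print("Stored document row:", row)
```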

Advanced approach: RAG and components

My future setup involves:

  • NAS Files: Primarily Markdown, PDF, and ePub documents, serving as the knowledge base for a Retrieval-Augmented Generation (RAG) system.
  • Marker (probably to be replaced by Docling): An open-source tool for converting PDFs to Markdown and JSON, optimized for complex documents like scientific papers. It uses deep learning for text, table, and layout extraction, with optional Ollama integration (e.g., llama3.2-vision:3b (lightweight), llava:13b (probably best accuracy), qwen2-vision:7b, bakllava:7b) to enhance accuracy for tables and math. Runs fully locally, making it ideal for offline RAG pipelines. (GitHub - VikParuchuri/marker)
  • ebooklib: A Python library for reading and parsing ePub files, extracting text and metadata from their HTML-based structure. Used to preprocess ePubs in the RAG pipeline, converting them to text or Markdown for further processing by tools like Marker or direct embedding (see the extraction sketch after this list). (GitHub - aerkalov/ebooklib)
  • N8N: A workflow automation tool to process and ingest files, orchestrating parsing (Marker for PDFs, ebooklib for ePubs), metadata storage (Supabase), and vector embedding (Qdrant). (cf. Docker Installation)
  • Supabase: A local PostgreSQL-based backend for storing document metadata and parsed Markdown, with the pgvector extension for potential secondary vector storage, though Qdrant is the primary vector database. (cf. Supabase Self-Hosting with Docker; also pgvector)
  • Qdrant: An open-source vector database for storing embeddings of document chunks, enabling similarity searches for RAG. Embeddings are generated using Ollama’s nomic-embed-text model. (cf. How to Get Started with Qdrant Locally)
  • Ollama: A tool for running LLMs and embedding models locally, such as llama3.2:3b, qwen2.5:14b, or others (e.g., deepseek-r1:14b, pending availability), for response generation and embeddings. Models will be tested to determine the best fit for accuracy and performance. (cf. ollama/ollama on Docker Hub)
  • Open WebUI: A self-hosted AI platform with built-in RAG support, designed to operate offline. It integrates with Qdrant for vector searches (configurable via environment variables, defaulting to Chroma) and Ollama for generation, allowing queries over the NAS knowledge base. (cf. RAG Features, Vector Store Configuration)
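
For the ebooklib step, the following sketch extracts the title and plain text from an ePub so it can be converted to Markdown or chunked directly. It assumes beautifulsoup4 (not mentioned above) for stripping the XHTML chapters; the file path is a placeholder.

```python
import ebooklib
from ebooklib import epub
from bs4 import BeautifulSoup  # pip install ebooklib beautifulsoup4

def epub_to_text(path: str) -> tuple[str, str]:
    """Extract the title and plain text from an ePub for downstream chunking.

    A minimal sketch: real ePubs may need smarter handling of footnotes,
    tables of contents, and per-chapter Markdown conversion.
    """
    book = epub.read_epub(path)
    titles = book.get_metadata("DC", "title")
    title = titles[0][0] if titles else path

    parts = []
    # ITEM_DOCUMENT items hold the XHTML chapters of the book.
    for item in book.get_items_of_type(ebooklib.ITEM_DOCUMENT):
        soup = BeautifulSoup(item.get_content(), "html.parser")
        parts.append(soup.get_text(separator="\n", strip=True))
    return title, "\n\n".join(parts)

if __name__ == "__main__":
    title, text = epub_to_text("/nas/books/example.epub")  # placeholder path
    print(title, len(text), "characters")
```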

Notes:

  • All components run locally via Docker, ensuring an offline setup.
  • Documents are parsed (PDFs via Marker, ePubs via ebooklib), chunked into ~512-character segments, embedded into Qdrant, and queried through Open WebUI with Ollama for context-aware responses (see the chunking and embedding sketch after this list).
  • Hardware considerations: Marker with --use_llm requires ~3GB VRAM; larger Ollama models (e.g., qwen2.5:14b) need 8-16GB RAM.
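
A minimal sketch of the chunk-embed-store step described above, assuming Ollama serves nomic-embed-text on its default port (11434) and Qdrant runs on :6333; the collection name “nas_documents” and the payload fields are illustrative, not fixed by the setup.

```python
import uuid
import requests
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

# Assumptions: Ollama on :11434 with nomic-embed-text pulled, Qdrant on :6333;
# collection name and payload fields are illustrative.
OLLAMA_URL = "http://localhost:11434"
COLLECTION = "nas_documents"

def chunk_text(text: str, size: int = 512) -> list[str]:
    """Naive fixed-size chunking (~512 characters, as in the notes above)."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text: str) -> list[float]:
    """Get an embedding vector from Ollama's /api/embeddings endpoint."""
    resp = requests.post(
        f"{OLLAMA_URL}/api/embeddings",
        json={"model": "nomic-embed-text", "prompt": text},
    )
    resp.raise_for_status()
    return resp.json()["embedding"]

def index_document(client: QdrantClient, title: str, text: str) -> None:
    """Chunk a parsed document, embed each chunk, and upsert into Qdrant."""
    points = [
        PointStruct(
            id=str(uuid.uuid4()),
            vector=embed(chunk),
            payload={"title": title, "chunk": chunk},
        )
        for chunk in chunk_text(text)
    ]
    client.upsert(collection_name=COLLECTION, points=points)

if __name__ == "__main__":
    client = QdrantClient(url="http://localhost:6333")
    if not client.collection_exists(COLLECTION):
        # nomic-embed-text produces 768-dimensional vectors.
        client.create_collection(
            collection_name=COLLECTION,
            vectors_config=VectorParams(size=768, distance=Distance.COSINE),
        )
    index_document(client, "Docker Setup Notes", "…parsed Markdown text…")
```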

Document Processing

Given that my files are mostly PDFs and ePubs, processing them is the first step. My research took the following tools into consideration:

| Feature | LlamaParse | PyPDF2 | Unstructured.io | Marker | ebooklib |
| --- | --- | --- | --- | --- | --- |
| File Types | PDF, ePub, .docx, .pptx, .rtf, .pages, etc. | PDF only | PDF, .docx, .txt, images, HTML, and more | PDF (ePubs with preprocessing) | ePub only |
| ePub Support | Native support, extracts text effectively | Not supported | Limited, may require preprocessing | Limited, requires text conversion (e.g., via ebooklib) | Native, extracts text and metadata |
| Output Format | Markdown, text, JSON (structured) | Raw text | Text, JSON with metadata | Markdown, JSON | Text, HTML (convertible to Markdown) |
| Complex Elements | Tables, images, equations, with context-aware parsing via LLM | Basic text, struggles with tables/images | Tables, images, but less precise for complex layouts | Tables, math, figures (enhanced with Ollama --use_llm) | Basic text, no tables/images |
| Custom Instructions | Natural language prompts for parsing (e.g., “summarize tables”) | None | Limited, relies on predefined rules | None, but Ollama integration allows some flexibility | None |
| RAG Integration | Built for RAG, integrates with vector stores like Qdrant | Manual post-processing needed | Good for RAG, but less seamless than LlamaParse | Excellent for RAG, Markdown output suits Qdrant/Open WebUI | Requires post-processing for RAG |
| Ease of Use | High, with API and Python SDK, but requires API key | Simple for basic PDFs, Python-based | Moderate, requires configuration for advanced features | Moderate, needs Python 3.10+, PyTorch setup | Simple, Python-based, but ePub-specific |
| Local Deployment | Cloud-based API, local via Docker possible but less common | Fully local | Fully local, open-source | Fully local, open-source | Fully local, open-source |
| Cost | Free tier (1,000 pages/day), then $0.003/page | Free, open-source | Free, open-source | Free, open-source | Free, open-source |
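
Tying the comparison together, a simple dispatcher can route PDFs to Marker and ePubs to ebooklib before ingestion. This is a sketch under assumptions: the `marker_single` entry point and `--output_dir` flag follow the Marker README and may differ between Marker versions, `epub_ingest` is a hypothetical module holding the ebooklib sketch above, and the NAS paths are placeholders.

```python
import subprocess
from pathlib import Path

from epub_ingest import epub_to_text  # hypothetical module: the ebooklib sketch above

# Placeholder NAS mount and output directory; adjust to your setup.
NAS_ROOT = Path("/nas/docs")
MARKDOWN_OUT = Path("/nas/processed")

def convert_pdf(pdf_path: Path) -> None:
    """Convert a PDF to Markdown with Marker's CLI.

    The `marker_single` entry point and `--output_dir` flag follow the Marker
    README; check `marker_single --help` for the flags in your installed version.
    """
    subprocess.run(
        ["marker_single", str(pdf_path), "--output_dir", str(MARKDOWN_OUT)],
        check=True,
    )

def process_nas() -> None:
    """Route each NAS file to the parser the comparison above selects."""
    for path in NAS_ROOT.rglob("*"):
        suffix = path.suffix.lower()
        if suffix == ".pdf":
            convert_pdf(path)
        elif suffix == ".epub":
            title, text = epub_to_text(str(path))
            (MARKDOWN_OUT / f"{path.stem}.md").write_text(f"# {title}\n\n{text}")
        # Markdown files need no conversion and can be ingested directly.

if __name__ == "__main__":
    process_nas()
```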