In this milestone, I focused on finalizing the core RAG functionality by implementing the previously discussed features.
I developed a Gradio interface to interact with the chatbot and visually inspect the results, including the sources used to answer each query. These sources are passed through a reranking model to ensure that only the most relevant ones are selected before being sent to the LLM.
- Developed using Gradio for interactive exploration of BeagleBoard documentation.
- Markdown-rendered answer area, supporting:
- Clean formatting (bold text, bullet points, inline code)
- Clickable links to documentation files
- Embedded images and references
The user can ask questions like "How to blink an LED using BeagleBoard?", and the system responds with an answer reformulated from the most relevant documentation snippets.
- Right panel labeled Sources & References displays:
- File name, path, and clickable GitHub link
- Scoring metrics: Composite, Rerank, and Content Quality
- Formatted content preview for readability
Note: Reranking model implementation details are in retrieval.py.
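As a rough illustration of the reranking step, here is a minimal sketch using sentence-transformers' CrossEncoder; the model name below is a common public reranker used as a stand-in, since the actual model and scoring logic live in retrieval.py.

```python
from sentence_transformers import CrossEncoder

# Stand-in cross-encoder; the production model is configured in retrieval.py.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, docs: list[str], top_k: int = 5) -> list[tuple[str, float]]:
    """Score each (query, doc) pair and keep the top_k most relevant docs."""
    scores = reranker.predict([(query, doc) for doc in docs])
    ranked = sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]
```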
The src directory contains the core logic for the QA system:
- main.py: Entry point
- gradio_app.py: Gradio UI
- qa_system.py: Manages the end-to-end QA pipeline
- retrieval.py: Document retrieval with scoring
- search_vectorstore.py: Searches the Milvus vector store
- github_direct_ingester.py: Pulls data from GitHub repos
- graph_qa.py: Prototype for graph-based QA
- router.py: Routes requests
- config.py: Stores configs and model params
To streamline integration with OpenBeagle:
- Integrated with GitLab CI for continuous deployment.
- The pipeline (WIP) will add automated testing and linting to:
- Maintain code quality
- Catch errors early
- Ensure deployment consistency
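A hypothetical .gitlab-ci.yml sketch of where the pipeline is heading; the stage names, images, and tools (ruff, pytest) are assumptions, not the final configuration:

```yaml
stages:
  - lint
  - test

lint:
  stage: lint
  image: python:3.11
  script:
    - pip install ruff
    - ruff check src/   # catch style and lint errors early

test:
  stage: test
  image: python:3.11
  script:
    - pip install -r requirements.txt pytest
    - pytest tests/     # keep deployments consistent with passing tests
```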
- Chose Milvus for the vector store, based on benchmarking with VectorDBBench
- Prioritized scalability, performance, and ecosystem support
- Used Chonkie, whose CHOMP pipeline provides modular, semantic chunking (see the ingestion sketch after this list)
- Improves granularity and chunk retrieval accuracy
- Proposing a Graph-RAG approach:
- Uses structured relationships (e.g., diagrams ↔ code ↔ documentation)
- Inspired by GRAG Paper
- Scripts in development to benchmark impact
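As referenced above, a minimal ingestion sketch combining Chonkie and Milvus. Chonkie's constructor defaults, the collection name, the input file, and the use of Milvus Lite are all assumptions; the real pipeline lives in github_direct_ingester.py and search_vectorstore.py.

```python
from chonkie import SemanticChunker
from pymilvus import MilvusClient
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("BAAI/bge-large-en")  # 1024-dim embeddings
chunker = SemanticChunker()               # CHOMP pipeline, defaults assumed
client = MilvusClient("./beagle_rag.db")  # local Milvus Lite file (assumed)

client.create_collection(collection_name="beagle_docs", dimension=1024)

doc_text = open("docs/blink_led.md").read()  # hypothetical input file
chunks = [c.text for c in chunker.chunk(doc_text)]
vectors = embedder.encode(chunks)

client.insert(
    collection_name="beagle_docs",
    data=[
        {"id": i, "vector": vectors[i].tolist(), "text": chunks[i]}
        for i in range(len(chunks))
    ],
)
```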
- Primary: BAAI/bge-large-en
- Reranker candidate: all-MiniLM-L6-v2
- Sourced from BeagleBoard GitHub repos (code, diagrams, documentation)
- Indexing metadata to enable semantically rich queries
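Continuing the ingestion sketch above, querying the collection could look like this; the output fields and result layout follow pymilvus's MilvusClient API:

```python
query = "How to blink an LED using BeagleBoard?"
query_vec = embedder.encode([query])[0].tolist()

hits = client.search(
    collection_name="beagle_docs",
    data=[query_vec],
    limit=5,
    output_fields=["text"],
)
for hit in hits[0]:
    print(hit["distance"], hit["entity"]["text"][:80])
```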
- 24B params, instruction-tuned, quantized
- Handles 128K tokens
- Ideal for local deployment (e.g., 4090 GPU)
- Strong multilingual and code performance
- Developed by Microsoft
- Excels in reasoning and competitive programming
- Compact with performance matching larger models
- Optimized for code gen and repair
- Supports 131K context length
- Memory-efficient with 4-bit quantization
- Dual-mode reasoning
- Agentic task support
- Ideal for long-form processing and tool-based interaction
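All of the candidates above are meant to run locally with 4-bit quantization, which in transformers looks roughly like this; the model id is a placeholder, not a specific candidate:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "<candidate-model-id>"  # placeholder for whichever candidate is chosen
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb, device_map="auto"  # fits a single 4090
)

prompt = "How do I blink an LED on a BeagleBone Black?"
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
print(tok.decode(out[0], skip_special_tokens=True))
```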
- Model size
- MMLU, MBPP, EvalPlus, MATH
- Inference speed (tokens/s)
- Google Colab Pro: Good for prototyping and tuning
- HF Inference Endpoints: Considered for hosting (higher cost)
- Unsloth: For efficient 4-bit fine-tuning
- GitHub scraper + API (sketched after this list)
- Markdown, code, and PDF processing
- External sources:
- eLinux.org
- Datasheets
- Community forums (if allowed)
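The scraper referenced above could start from the GitHub REST API's tree endpoint; the repo and branch below are examples, and github_direct_ingester.py may work differently:

```python
import requests

OWNER, REPO, BRANCH = "beagleboard", "docs.beagleboard.io", "main"  # example repo

# List every file in the branch, then keep the Markdown docs.
tree = requests.get(
    f"https://api.github.com/repos/{OWNER}/{REPO}/git/trees/{BRANCH}",
    params={"recursive": "1"},
    timeout=30,
).json()
md_paths = [e["path"] for e in tree["tree"] if e["path"].endswith(".md")]

for path in md_paths[:3]:  # fetch a few files via the raw endpoint
    raw = requests.get(
        f"https://raw.githubusercontent.com/{OWNER}/{REPO}/{BRANCH}/{path}",
        timeout=30,
    )
    print(path, len(raw.text), "chars")
```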
- QA pair generation via:
- Manual annotation
- LLM-based synthetic prompts (verified)
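A sketch of the LLM-based option; the client, model name, and prompt are placeholders, and every generated pair would still be manually verified:

```python
from openai import OpenAI

llm = OpenAI()  # any OpenAI-compatible endpoint would work here

PROMPT = (
    "From the documentation excerpt below, write one question a BeagleBoard "
    "user might ask and a concise answer grounded only in the excerpt.\n\n"
    "Excerpt:\n{excerpt}\n\nFormat:\nQ: ...\nA: ..."
)

def synth_qa(excerpt: str, model: str = "gpt-4o-mini") -> str:
    """Return one 'Q: ... A: ...' pair for later human verification."""
    resp = llm.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT.format(excerpt=excerpt)}],
    )
    return resp.choices[0].message.content
```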
- Evaluates reasoning, planning, and tool usage
Tools considered:
- DeepEval
- Opik
- JudgeLM
- AgentBench
- Perplexity
- BLEU / ROUGE
- F1 Score
- BERTScore
- Exact Match (EM)
- Latency / Throughput
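For reference, Exact Match and token-level F1 (SQuAD-style) are simple enough to implement directly:

```python
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(pred: str, gold: str) -> float:
    return float(normalize(pred) == normalize(gold))

def token_f1(pred: str, gold: str) -> float:
    p, g = normalize(pred).split(), normalize(gold).split()
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)
```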
Working on improving the RAG architecture using advanced retrieval methods:
- Graph RAG (GRAG): Leverages structured entity relationships
- Contextual Semantic Search: Uses semantic embeddings + cross-encoders
- Dense Passage Retrieval (DPR): Efficient dual-encoder retrieval
These approaches aim to outperform traditional RAG by improving the contextual relevance and accuracy of responses, especially on the BeagleBoard dataset.
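For the DPR direction, here is a dual-encoder sketch using the reference checkpoints shipped with transformers; whether these or fine-tuned encoders end up in the pipeline is still open:

```python
import torch
from transformers import (
    DPRContextEncoder, DPRContextEncoderTokenizer,
    DPRQuestionEncoder, DPRQuestionEncoderTokenizer,
)

q_tok = DPRQuestionEncoderTokenizer.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
q_enc = DPRQuestionEncoder.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
c_tok = DPRContextEncoderTokenizer.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
c_enc = DPRContextEncoder.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")

passages = ["Toggle the LED via /sys/class/leds/ triggers.", "U-Boot overlay notes."]
q_emb = q_enc(**q_tok("How do I blink an LED?", return_tensors="pt")).pooler_output
c_emb = c_enc(**c_tok(passages, return_tensors="pt", padding=True)).pooler_output

scores = (q_emb @ c_emb.T).squeeze(0)       # dot-product relevance scores
print(passages[int(torch.argmax(scores))])  # best-matching passage
```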
➡️ Demo will be recorded in an introductory video.