sourangshupal/prod_rag_stack.md

Created January 30, 2026 05:06

Production RAG Stack

Document Processing

Docling / Unstructured / PyMuPDF / LlamaParse / Azure Document Intelligence

Chunking + Metadata

LangChain / LlamaIndex / Chonkie / Docling chunkers
GLiNER for metadata extraction
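
As a rough illustration of what the chunking step produces, here is a minimal sketch in plain Python, not the API of any library listed above. It splits text into overlapping word windows and attaches metadata to each chunk. The `Chunk` class, the `chunk_words` function, and the metadata keys are hypothetical names chosen for this example; a real pipeline would likely use one of the listed chunkers and fill metadata from a model such as GLiNER.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical container: chunk text plus metadata used for filtering."""
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_words(text: str, source: str, size: int = 200, overlap: int = 40) -> list[Chunk]:
    """Split text into overlapping word windows, tagging each with metadata."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap  # how far each window advances
    chunks: list[Chunk] = []
    for index, start in enumerate(range(0, len(words), step)):
        window = words[start:start + size]
        chunks.append(Chunk(
            text=" ".join(window),
            metadata={"source": source, "chunk_index": index, "start_word": start},
        ))
        if start + size >= len(words):
            break  # this window already reaches the end of the text
    return chunks
```

The overlap keeps sentences that straddle a boundary retrievable from either neighbouring chunk, at the cost of some duplicated tokens in the index.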

Embeddings

BGE-M3 / Voyage / Cohere Embed v3
text-embedding-3-small / text-embedding-3-large (OpenAI)

Vector Database

Milvus / Zilliz (HNSW / IVF indexes)
Qdrant / Weaviate
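
For orientation, the sketch below shows the exact nearest-neighbour search that index types such as HNSW and IVF approximate. It is a brute-force baseline in plain Python, not how any of the listed databases is implemented. The function names and the toy vectors are assumptions made for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, vectors, k=3):
    """Return the ids of the k stored vectors most similar to the query.

    `vectors` maps a document id to its embedding. Every stored vector is
    scored, so the cost grows linearly with collection size; ANN indexes
    trade a little recall to avoid this full scan.
    """
    ranked = sorted(vectors, key=lambda doc_id: cosine(query, vectors[doc_id]), reverse=True)
    return ranked[:k]
```

A brute-force scan like this can also serve as ground truth when measuring the recall of an approximate index on a sample of queries.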

Hybrid Retrieval

BM25 (sparse)
SPLADE++ (learned sparse)
Dense embeddings
Late interaction (ColBERT)
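
The list above names the retrievers but not how their results are merged. One common option is reciprocal rank fusion (RRF), sketched below; this is an assumption, since the source does not specify a fusion method. The function name is hypothetical, and `k=60` is the constant commonly used with RRF rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    where rank starts at 1. Only ranks are used, so sparse and dense
    scores never need to be put on a common scale.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


sparse_hits = ["a", "b", "c"]  # e.g. from BM25
dense_hits = ["b", "c", "d"]   # e.g. from an embedding index
fused = reciprocal_rank_fusion([sparse_hits, dense_hits])
```

Documents that appear in both lists ("b" and "c" here) rise above documents found by only one retriever.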

Reranking

BGE-reranker-v2
Cohere rerank
ColBERTv2
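
Rerankers like these usually sit in a second stage: a fast retriever returns a candidate set, then a slower model rescores each (query, document) pair. The sketch below shows only that control flow. `overlap_score` is a toy stand-in invented for this example and is not how any of the listed rerankers scores text; in practice `score_fn` would wrap a call to one of them.

```python
def rerank(query, candidates, score_fn, top_n=3):
    """Rescore candidates with score_fn(query, doc) and keep the best top_n."""
    scored = [(score_fn(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_n]]


def overlap_score(query, doc):
    """Toy scorer: fraction of query tokens that also occur in the document."""
    query_tokens = set(query.lower().split())
    doc_tokens = set(doc.lower().split())
    return len(query_tokens & doc_tokens) / len(query_tokens) if query_tokens else 0.0


candidates = [
    "how to cook rice",
    "backup a vector database with snapshots",
    "vector search basics",
]
best = rerank("vector database backup", candidates, overlap_score, top_n=2)
```

Because the second stage scores every candidate, the candidate set is normally kept small (tens to low hundreds of documents) to bound latency.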

LLM Serving

vLLM / Ollama
TGI / OpenAI API

Orchestration

LangChain / LlamaIndex
Haystack / DSPy

Production Ops

Eval: RAGAS, DeepEval, Opik
Logging: LangSmith, Phoenix, W&B
RBAC: FastAPI + Auth0/Cognito
Backups: Vector DB snapshots + S3
Deploy: AWS (ECS/Lambda) / Modal
Monitoring: Prometheus + Grafana
