@sourangshupal
Created January 30, 2026 05:06

Production RAG Stack

Document Processing

  • Docling / Unstructured / PyMuPDF / LlamaParse / Azure Document Intelligence

Chunking + Metadata

  • LangChain / LlamaIndex / Chonkie / Docling chunkers
  • GLiNER for metadata extraction
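
As a rough illustration of what the chunking step produces, here is a minimal sketch of fixed-size chunking with overlap, where each chunk carries metadata. It is not the API of any chunker listed above; `Chunk`, `chunk_text`, and the metadata keys `source`, `chunk_index`, and `start_char` are hypothetical names chosen for this example. Extracted fields (for instance entities from GLiNER) would be merged into the same metadata dict.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_text(text: str, source: str, size: int = 200, overlap: int = 50) -> list[Chunk]:
    """Split text into character windows of `size`, sharing `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and 0 <= overlap < size")
    step = size - overlap
    chunks: list[Chunk] = []
    for index, start in enumerate(range(0, len(text), step)):
        piece = text[start:start + size]
        # Hypothetical metadata keys; a real pipeline would add page, section, entities, etc.
        chunks.append(Chunk(piece, {"source": source, "chunk_index": index, "start_char": start}))
        if start + size >= len(text):
            break  # the last window already reached the end of the text
    return chunks
```

Production chunkers usually split on sentence, heading, or token boundaries rather than raw characters; the overlap idea carries over unchanged.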

Embeddings

  • BGE-M3 / Voyage / Cohere Embed v3
  • text-embedding-3 (OpenAI)

Vector Database

  • Milvus / Zilliz (HNSW / IVF indexes)
  • Qdrant / Weaviate
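
HNSW and IVF are approximate nearest-neighbour indexes: they trade a little recall for much faster search than scanning every vector. The sketch below shows the exact search they approximate, using cosine similarity over an in-memory dict. It is a baseline for intuition and for sanity-checking an index on a small sample, not how any of these databases work internally; `cosine` and `exact_top_k` are hypothetical helper names.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def exact_top_k(query: list[float], vectors: dict[str, list[float]], k: int = 2) -> list[str]:
    """Brute-force search: score every stored vector, return the k best ids."""
    ranked = sorted(vectors.items(), key=lambda kv: (-cosine(query, kv[1]), kv[0]))
    return [doc_id for doc_id, _ in ranked[:k]]
```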

Hybrid Retrieval

  • BM25 (sparse)
  • SPLADE++ (learned sparse)
  • Dense embeddings
  • Late interaction (ColBERT)
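
The retrievers above each return their own ranked list, so a hybrid setup needs a way to merge them. One common option is reciprocal rank fusion (RRF), sketched below. The gist does not say which fusion method it has in mind, so treat RRF as an assumption; `reciprocal_rank_fusion` is a hypothetical name and `k=60` is just the constant commonly used with RRF.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of doc ids; each list adds 1 / (k + rank) to a doc's score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_hits = ["d1", "d2", "d3"]   # sparse ranking (made-up ids)
dense_hits = ["d3", "d1", "d4"]  # dense ranking (made-up ids)
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
# fused == ["d1", "d3", "d2", "d4"]
```

RRF only uses ranks, so it avoids having to normalise BM25 scores against cosine similarities. Several of the listed vector databases also offer built-in hybrid search, which may make hand-rolled fusion unnecessary.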

Reranking

  • BGE-reranker-v2
  • Cohere Rerank
  • ColBERTv2
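
Reranking is a second stage: the retriever returns a wide candidate set, then a stronger but slower model rescores each (query, document) pair and only the best few go to the LLM. The sketch below shows that shape with a pluggable scoring function. `rerank` and `token_overlap` are hypothetical names, and the token-overlap scorer is a toy stand-in for a real reranker call, which would replace `score_fn`.

```python
from typing import Callable, Sequence


def rerank(
    query: str,
    candidates: Sequence[str],
    score_fn: Callable[[str, str], float],
    top_k: int = 3,
) -> list[str]:
    """Rescore every candidate against the query and keep the top_k."""
    scored = [(score_fn(query, doc), position, doc) for position, doc in enumerate(candidates)]
    # Sort by score descending; original retrieval order breaks ties.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [doc for _, _, doc in scored[:top_k]]


def token_overlap(query: str, doc: str) -> float:
    """Toy scorer: fraction of query tokens that appear in the document."""
    query_tokens = set(query.lower().split())
    doc_tokens = set(doc.lower().split())
    return len(query_tokens & doc_tokens) / len(query_tokens) if query_tokens else 0.0
```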

LLM Serving

  • vLLM / Ollama
  • TGI / OpenAI API

Orchestration

  • LangChain / LlamaIndex
  • Haystack / DSPy

Production Ops

  • Eval: RAGAS, DeepEval, Opik
  • Logging: LangSmith, Phoenix, W&B
  • RBAC: FastAPI + Auth0/Cognito
  • Backups: Vector DB snapshots + S3
  • Deploy: AWS (ECS/Lambda) / Modal
  • Monitoring: Prometheus + Grafana
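
For the eval item, a simple retrieval baseline is useful alongside a framework. The sketch below computes recall@k from a small hand-labelled set. It does not use the RAGAS, DeepEval, or Opik APIs; `recall_at_k` and `mean_recall_at_k` are hypothetical helpers, and the labelled relevant ids are assumed to come from your own annotation.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant doc ids that appear in the top-k retrieved ids."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_recall_at_k(runs: list[tuple[list[str], set[str]]], k: int) -> float:
    """Average recall@k over (retrieved, relevant) pairs, one pair per test query."""
    if not runs:
        return 0.0
    return sum(recall_at_k(retrieved, relevant, k) for retrieved, relevant in runs) / len(runs)
```

Tracking this number across changes to chunking, embeddings, or reranking gives a cheap regression signal before running heavier LLM-judged metrics.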