Date: November 6, 2025
Role: Head of Engineering
Context: RAG, MCP, and LLM expertise
Build an AI-based project document generator and parser that can:
- Generate project documents from scratch
- Parse uploaded documents and provide clear explanations
- Review documents for completeness and consistency
Target document types:
- Technical specifications
- Proposals
- Project documents for review
VM Specifications (Ubuntu 20.04.6 LTS):
- CPU: 6-Core AMD EPYC (Zen architecture)
- RAM: 16 GB
- Storage: 400 GB (390 GB available)
- Network: Internet connected
- Virtualization: QEMU/KVM
- Container Platform: Docker (already running multiple containers)
Development Environment:
- MacBook Pro M1, 16GB RAM
┌─────────────────────────────────────┐
│   FastAPI Server (Port 8000)        │
│   Document API + Review Engine      │
└──────────────┬──────────────────────┘
               │
      ┌────────┴────────┐
      │                 │
┌─────▼──────┐    ┌─────▼─────┐
│   Ollama   │    │  Qdrant   │
│(Port 11434)│    │(Port 6333)│
│            │    │           │
│ - mistral  │    │ Vector DB │
│ - nomic    │    │           │
└────────────┘    └───────────┘
1. LLM Layer: Ollama (Self-hosted)
- Models: mistral:7b-instruct or deepseek-coder:6.7b
- Embeddings: nomic-embed-text
- Memory Footprint: ~8GB RAM
- Reasoning: Self-hosted for privacy, client document compliance
2. Vector Database: Qdrant
- Deployment: Docker container
- Memory: 512MB-1GB for POC, scales to 2GB+
- Collections:
  - technical_specs: Technical specification embeddings
  - proposals: Proposal documents
  - templates: High-quality reference examples
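A minimal sketch of creating these three collections over Qdrant's REST API. It assumes the default localhost port from the compose file and 768-dimensional vectors (the output size of nomic-embed-text); adjust if a different embedding model is used.

```python
# Sketch: create the three POC collections via Qdrant's REST API.
# Assumes Qdrant is reachable at localhost:6333 and that
# nomic-embed-text produces 768-dimensional embeddings.
import json
import urllib.request

COLLECTIONS = ["technical_specs", "proposals", "templates"]

def collection_config(dim: int = 768) -> dict:
    # Cosine distance is the usual choice for normalized text embeddings.
    return {"vectors": {"size": dim, "distance": "Cosine"}}

def create_collections(base_url: str = "http://localhost:6333") -> None:
    for name in COLLECTIONS:
        req = urllib.request.Request(
            f"{base_url}/collections/{name}",
            data=json.dumps(collection_config()).encode(),
            headers={"Content-Type": "application/json"},
            method="PUT",
        )
        urllib.request.urlopen(req)  # raises on HTTP errors

if __name__ == "__main__":
    create_collections()
```

Per-client data isolation (see the security notes below) would mean prefixing collection names per project rather than sharing these three.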
3. Document Processing
- Parser: Unstructured.io (handles PDF/DOCX/MD without format assumptions)
- Alternative: LlamaParse (better for complex layouts)
- Orchestration: LangChain or FastAPI + direct Ollama calls
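A sketch of the parsing step, assuming the `unstructured` package from the dependency list. The title-based grouping helper is a simplification for illustration; real documents will need the adaptive handling discussed later.

```python
# Sketch: parse a document with Unstructured.io and group content
# under detected titles. The grouping heuristic is illustrative only.
def group_by_titles(elements):
    """Group (category, text) pairs into {title: [paragraphs]}."""
    sections, current = {}, "Preamble"
    for category, text in elements:
        if category == "Title":
            current = text
            sections.setdefault(current, [])
        else:
            sections.setdefault(current, []).append(text)
    return sections

def parse_file(path: str):
    # Deferred import so the pure helper above has no dependencies.
    from unstructured.partition.auto import partition
    elements = partition(filename=path)  # auto-detects PDF/DOCX/MD
    return group_by_titles((el.category, el.text) for el in elements)
```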
4. API Layer
- Framework: FastAPI
- Port: 8000
- Features: Document upload, generation, review endpoints
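A sketch of the FastAPI skeleton with upload and review endpoints. The route paths and the allowed-extension check are assumptions for illustration, not a fixed contract.

```python
# Sketch of the API layer. Endpoint names are placeholders.
ALLOWED_SUFFIXES = {".pdf", ".docx", ".md"}

def is_supported(filename: str) -> bool:
    return "." in filename and filename[filename.rfind("."):].lower() in ALLOWED_SUFFIXES

def build_app():
    # Deferred import keeps the helper above testable without FastAPI.
    from fastapi import FastAPI, HTTPException, UploadFile

    app = FastAPI(title="Document AI POC")

    @app.post("/documents/upload")
    async def upload(file: UploadFile):
        if not is_supported(file.filename or ""):
            raise HTTPException(status_code=415, detail="Unsupported format")
        return {"filename": file.filename, "status": "queued"}

    @app.post("/documents/{doc_id}/review")
    async def review(doc_id: str):
        # Placeholder until the review pipeline is wired in.
        return {"doc_id": doc_id, "status": "pending"}

    return app
```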
Required Sections:
- Project Overview/Introduction
- Requirements (Functional/Non-functional)
- Architecture/Technical Design
- Timeline/Milestones
- RACI Matrix (deferred to future phase)
Technical Components Coverage:
- All mentioned technologies are addressed
- Architecture decisions are documented
- Dependencies are identified
- Cross-reference validation between sections
- Technical terminology consistency
- Version/date consistency
- Identify timeline commitments
- Extract deliverables
- Map dependencies
Stage 1: Document → Parse → LLM Structure Extraction → Validate Checklist
Purpose:
- Identify document sections
- Check completeness against required sections
- Extract metadata (page ranges, section hierarchy)
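A minimal sketch of this extraction step against Ollama's /api/generate endpoint. The JSON-salvaging helper is an assumption (models often wrap JSON in prose), and the prompt path matches prompts/structure.txt in the project layout below.

```python
# Sketch: run the structure-extraction prompt through Ollama and
# pull the JSON object out of the model's free-text reply.
import json
import urllib.request

def extract_json(reply: str) -> dict:
    # Simplification: take the outermost brace-delimited span.
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object in model reply")
    return json.loads(reply[start : end + 1])

def extract_structure(document_text: str, host: str = "http://localhost:11434") -> dict:
    prompt = open("prompts/structure.txt").read().format(document_text=document_text)
    body = json.dumps({"model": "mistral:7b-instruct", "prompt": prompt, "stream": False})
    req = urllib.request.Request(f"{host}/api/generate", data=body.encode(),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        reply = json.loads(resp.read())["response"]
    return extract_json(reply)
```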
Output:
{
"sections": [
{
"name": "Requirements",
"present": true,
"page_range": "5-12",
"completeness_score": 0.85
},
{
"name": "RACI Matrix",
"present": false,
"completeness_score": 0.0
}
]
}

Stage 2: Identified Sections → Chunk → Embed → Semantic Search → Technical Validation
Purpose:
- Deep technical content analysis
- Consistency checking across document
- Compare against similar historical documents (when available)
Goal: Validate architecture with one client document
Steps:
- Set up Docker infrastructure
- Parse single document
- Extract structure
- Generate review report
- Iterate on prompts
Success Criteria:
- Successfully parse document structure
- Identify missing sections
- Generate actionable review comments
Phase 2 (Future):
- Ingest historical documents
- Build RAG knowledge base
- Comparative analysis capabilities
version: '3.8'
services:
ollama:
image: ollama/ollama:latest
container_name: doc_ai_ollama
ports:
- "11434:11434"
volumes:
- ollama_data:/root/.ollama
restart: unless-stopped
qdrant:
image: qdrant/qdrant:latest
container_name: doc_ai_qdrant
ports:
- "6333:6333"
- "6334:6334"
volumes:
- qdrant_data:/qdrant/storage
restart: unless-stopped
api:
build: ./app
container_name: doc_ai_api
ports:
- "8000:8000"
environment:
- OLLAMA_HOST=http://ollama:11434
- QDRANT_HOST=qdrant
- QDRANT_PORT=6333
volumes:
- ./app:/app
- uploads:/app/uploads
depends_on:
- ollama
- qdrant
restart: unless-stopped
volumes:
ollama_data:
qdrant_data:
uploads:

project/
├── docker-compose.yml
├── app/
│ ├── Dockerfile
│ ├── requirements.txt
│ ├── main.py # FastAPI application
│ ├── services/
│ │ ├── parser.py # Document parsing logic
│ │ ├── structure_extractor.py # Section detection & classification
│ │ ├── embedder.py # Embedding service (Ollama)
│ │ └── reviewer.py # Review orchestration
│ ├── prompts/
│ │ ├── structure.txt # Structure extraction prompt
│ │ └── review.txt # Review prompt templates
│ └── models/
│ └── schemas.py # Pydantic models
└── README.md
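The Pydantic models in schemas.py might look like the following, mirroring the Stage 1 output JSON; field names follow that example, the rest is a sketch.

```python
# Sketch of models/schemas.py (Pydantic v2 per requirements.txt).
from typing import List, Optional

from pydantic import BaseModel, Field

class Section(BaseModel):
    name: str
    present: bool
    page_range: Optional[str] = None       # e.g. "5-12"
    content_summary: Optional[str] = None
    completeness_score: float = Field(ge=0.0, le=1.0)

class StructureReport(BaseModel):
    sections: List[Section]
```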
fastapi==0.104.1
uvicorn[standard]==0.24.0
python-multipart==0.0.6
unstructured[pdf]==0.11.0
qdrant-client==1.7.0
langchain==0.1.0
langchain-community==0.0.10
ollama==0.1.6
pydantic==2.5.0
python-docx==1.1.0
PyPDF2==3.0.1

User Prompt
↓
Retrieve Similar Specs (RAG)
↓
Extract Common Structure/Patterns
↓
LLM Generation with Context
↓
Format with Templates
↓
Return Generated Document
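The generation flow above can be sketched as follows; the prompt wording is an assumption, and the retrieved chunks would come from the semantic-search step shown later.

```python
# Sketch: fold retrieved reference chunks into the generation prompt,
# then call Ollama's /api/generate endpoint.
import json
import urllib.request

def build_generation_prompt(user_request: str, similar_chunks: list) -> str:
    context = "\n---\n".join(similar_chunks) or "(no similar documents found)"
    return (
        "You are drafting a technical project document.\n"
        f"Reference material from similar documents:\n{context}\n\n"
        f"Request: {user_request}\n"
        "Produce a complete document with all required sections."
    )

def generate(user_request, similar_chunks, host="http://localhost:11434"):
    body = json.dumps({
        "model": "mistral:7b-instruct",
        "prompt": build_generation_prompt(user_request, similar_chunks),
        "stream": False,
    })
    req = urllib.request.Request(f"{host}/api/generate", data=body.encode(),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```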
Upload PDF/DOCX
↓
Parse Structure (Unstructured.io)
↓
Extract Sections with Metadata
↓
Embed Chunks (nomic-embed-text)
↓
Store in Qdrant
↓
Semantic Search for Similar Docs
↓
LLM Explains with Comparative Context
↓
Return Explanation Report
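The embed-and-search steps of this flow, sketched against the two services' REST APIs (Ollama's /api/embeddings and Qdrant's points/search); default ports match the compose file.

```python
# Sketch: embed a chunk with nomic-embed-text via Ollama, then query
# Qdrant for the nearest stored chunks.
import json
import urllib.request

def _post(url: str, payload: dict) -> dict:
    req = urllib.request.Request(url, data=json.dumps(payload).encode(),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def search_payload(vector: list, limit: int = 5) -> dict:
    return {"vector": vector, "limit": limit, "with_payload": True}

def find_similar(text: str, collection: str = "technical_specs",
                 ollama="http://localhost:11434", qdrant="http://localhost:6333"):
    emb = _post(f"{ollama}/api/embeddings",
                {"model": "nomic-embed-text", "prompt": text})["embedding"]
    hits = _post(f"{qdrant}/collections/{collection}/points/search",
                 search_payload(emb))
    return hits["result"]
```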
Document + Review Criteria
↓
Extract Key Sections (Structure Extractor)
↓
Validate Completeness Checklist
↓
Retrieve Best Practices (RAG)
↓
Technical Consistency Check
↓
Identify Gaps/Inconsistencies
↓
Generate Structured Review Report
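The completeness-checklist step of the review flow is pure logic and can be sketched directly; the 0.5 weakness threshold is an arbitrary illustrative choice.

```python
# Sketch: compare extracted sections against the required checklist.
REQUIRED_SECTIONS = [
    "Project Overview/Introduction",
    "Requirements (Functional/Non-functional)",
    "Architecture/Technical Design",
    "Timeline/Milestones",
]

def check_completeness(found_sections: dict, threshold: float = 0.5) -> dict:
    """found_sections maps section name -> completeness score (0.0-1.0)."""
    missing = [s for s in REQUIRED_SECTIONS if s not in found_sections]
    weak = [s for s, score in found_sections.items()
            if s in REQUIRED_SECTIONS and score < threshold]
    return {"missing": missing, "weak": weak,
            "complete": not missing and not weak}
```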
Extract and classify sections from this technical document:
Required Sections:
- Project Overview/Introduction
- Requirements (Functional/Non-functional)
- Architecture/Technical Design
- Timeline/Milestones
For each section found, return:
{
"name": "section name",
"present": true/false,
"page_range": "start-end",
"content_summary": "brief summary",
"completeness_score": 0.0-1.0
}
Document Content:
{document_text}
Review this technical specification for:
1. Completeness:
- All required sections present
- Technical components adequately covered
- Timeline/milestones clearly defined
2. Consistency:
- Technical terminology usage
- Cross-references validate
- Version/date consistency
3. Technical Accuracy:
- Architecture decisions justified
- Technology choices appropriate
- Dependencies identified
Document Sections:
{structured_sections}
Provide structured feedback with:
- Missing elements
- Inconsistencies found
- Recommendations
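Rendering this review prompt is a simple template fill; the path matches prompts/review.txt in the project layout, and the fallback template here is illustrative.

```python
# Sketch: fill the review prompt template with the extractor's output.
import json

FALLBACK_TEMPLATE = "Review these sections:\n{structured_sections}"

def render_review_prompt(sections: list, template_path: str = "prompts/review.txt") -> str:
    try:
        template = open(template_path).read()
    except OSError:
        template = FALLBACK_TEMPLATE  # keep the pipeline running in dev
    return template.format(structured_sections=json.dumps(sections, indent=2))
```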
Privacy & Security:
- Self-hosted LLM: All processing on-premises
- No external API calls: Client documents never leave infrastructure
- Data isolation: Each client project in separate Qdrant collection
- Access control: (To be implemented in production)
- Encrypted uploads (HTTPS)
- Temporary storage only during processing
- Option to purge after review
- Audit logging for compliance
Ollama (Mistral 7B): ~8GB
Qdrant: ~1-2GB
FastAPI + Workers: ~2GB
System + Docker: ~3GB
Buffer: ~2GB
------------------------
Total: ~16GB
- Ollama Models: ~4-8GB per model
- Qdrant Collections: Scales with document volume
- POC Estimate: 10-20GB for single document testing
1. Infrastructure Setup:
- Deploy docker-compose stack on VM
- Pull Ollama models (mistral:7b-instruct, nomic-embed-text)
- Verify Qdrant connectivity
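The infrastructure setup can be sketched as shell commands on the VM (container name from the compose file; run from the directory containing docker-compose.yml):

```shell
# Bring up the stack, pull the models, and verify Qdrant answers.
docker compose up -d
docker exec doc_ai_ollama ollama pull mistral:7b-instruct
docker exec doc_ai_ollama ollama pull nomic-embed-text
curl -s http://localhost:6333/collections
```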
2. Code Implementation:
- Build FastAPI skeleton
- Implement document parser
- Create structure extractor service
3. Prompt Engineering:
- Develop structure extraction prompts
- Create review prompt templates
- Test with sample document
Choose one of the following to proceed:
Option A: Full Implementation
- Complete parser → structure extractor → RAG → review pipeline
- Estimated time: 2-3 days
- Best for: Complete POC validation
Option B: Critical Components First
- Structure extraction + review prompts only
- Estimated time: 1 day
- Best for: Quick validation of concept
Option C: Infrastructure + Iterative
- Setup stack, then iterate on logic incrementally
- Estimated time: Ongoing
- Best for: Learning and refinement
- Q: Do historical documents follow any patterns?
- A: No specific format - adaptive parsing required
- Current: Single document POC
- Future: Multi-document repository with comparative analysis
- Status: Deferred to future phase
- Reason: Complexity of table extraction and inference
- Future: Will need to decide between:
- RACI table extraction (if provided)
- RACI inference (if generated from content)
- Resource Efficient: Fits comfortably in 16GB RAM
- Privacy Compliant: Fully self-hosted, no external dependencies
- Offline Capable: All inference on-premises
- MCP-Ready: Can expose tools via MCP servers in future iterations
- Scalable: Can swap Ollama for API calls if cloud deployment needed
- Docker-based: Consistent development across Mac M1 → VM deployment
Local (Mac M1)             Remote (VM)
      ↓                         ↓
Docker Compose             Docker Compose
      ↓                         ↓
Hot Reload Dev      →      Deploy to Production
      ↓                         ↓
Test Locally               Client Documents
- Successfully parse uploaded document
- Extract document structure with 80%+ accuracy
- Identify missing sections
- Generate actionable review comments
- Process document in <2 minutes
- Process 10+ documents in repository
- Comparative analysis across documents
- API response time <30s for review
- Accuracy validation against manual reviews
- Decision: Two-stage pipeline (structure extraction → RAG validation)
- Rationale: No standardized format requires structure detection before technical analysis
- Deferred: RACI matrix extraction (complexity vs POC scope)
- Agreed: Start with single document POC before scaling
- Self-hosted over API: Privacy/compliance requirement
- Ollama over OpenAI: On-premises, cost control, data sovereignty
- Structure-first approach: Necessary for completeness validation without standard format
- Docker Compose: Deployment consistency, already familiar infrastructure
Project Lead: Rannie (Head of Engineering)
Expertise: RAG, MCP, LLM, Laravel/PHP, DevOps, System Architecture
Current Focus: Malta GPG integration, ETL pipelines, multi-PHP setups
Document Status: Architecture Planning Complete - Awaiting Implementation Decision