AI Project Document Generator & Parser - Architecture Discussion

Date: November 6, 2025
Role: Head of Engineering
Context: RAG, MCP, and LLM expertise

Project Overview

Goal

Build an AI-based project document generator and parser that can:

Generate project documents from scratch
Parse uploaded documents and provide clear explanations
Review documents for completeness and consistency

Target Document Types

Technical specifications
Proposals
Project documents for review

Constraints & Infrastructure

VM Specifications (Ubuntu 20.04.6 LTS):

CPU: 6-Core AMD EPYC (Zen architecture)
RAM: 16 GB
Storage: 400 GB (390 GB available)
Network: Internet connected
Virtualization: QEMU/KVM
Container Platform: Docker (already running multiple containers)

Development Environment:

MacBook Pro M1, 16GB RAM

Architecture Design

Core Stack Recommendation

┌─────────────────────────────────────┐
│      FastAPI Server (Port 8000)     │
│    Document API + Review Engine     │
└──────────────┬──────────────────────┘
               │
      ┌────────┴────────┐
      │                 │
┌─────▼──────┐   ┌─────▼─────┐
│   Ollama   │   │  Qdrant   │
│ (Port 11434)│   │(Port 6333)│
│            │   │           │
│ - mistral  │   │ Vector DB │
│ - nomic    │   │           │
└────────────┘   └───────────┘

Technology Stack

1. LLM Layer: Ollama (Self-hosted)

Model: mistral:7b-instruct or deepseek-coder:6.7b
Embeddings: nomic-embed-text
Memory Footprint: ~8GB RAM
Reasoning: Self-hosted for privacy, client document compliance

2. Vector Database: Qdrant

Deployment: Docker container
Memory: 512MB-1GB for POC, scales to 2GB+
Collections:
- technical_specs - Technical specification embeddings
- proposals - Proposal documents
- templates - High-quality reference examples

3. Document Processing

Parser: Unstructured.io (handles PDF/DOCX/MD without format assumptions)
Alternative: LlamaParse (better for complex layouts)
Orchestration: LangChain or FastAPI + direct Ollama calls

4. API Layer

Framework: FastAPI
Port: 8000
Features: Document upload, generation, review endpoints

Review Workflow Requirements

Completeness Checks

Required Sections:
- Project Overview/Introduction
- Requirements (Functional/Non-functional)
- Architecture/Technical Design
- Timeline/Milestones
- ~~RACI Matrix~~ (Deferred to future phase)
Technical Components Coverage:
- All mentioned technologies are addressed
- Architecture decisions are documented
- Dependencies are identified

Consistency Validation

Cross-reference validation between sections
Technical terminology consistency
Version/date consistency

Milestone Extraction

Identify timeline commitments
Extract deliverables
Map dependencies

Implementation Strategy

Two-Stage Pipeline

Stage 1: Structure Extraction & Validation

Document → Parse → LLM Structure Extraction → Validate Checklist

Purpose:

Identify document sections
Check completeness against required sections
Extract metadata (page ranges, section hierarchy)

Output:

{
  "sections": [
    {
      "name": "Requirements",
      "present": true,
      "page_range": "5-12",
      "completeness_score": 0.85
    },
    {
      "name": "RACI Matrix",
      "present": false,
      "completeness_score": 0.0
    }
  ]
}

Stage 2: RAG for Technical Verification

Identified Sections → Chunk → Embed → Semantic Search → Technical Validation

Purpose:

Deep technical content analysis
Consistency checking across document
Compare against similar historical documents (when available)

POC Approach

Phase 1: Single Document Validation

Goal: Validate architecture with one client document

Steps:

Set up Docker infrastructure
Parse single document
Extract structure
Generate review report
Iterate on prompts

Success Criteria:

Successfully parse document structure
Identify missing sections
Generate actionable review comments

Phase 2: Scale to Document Repository (Future)

Ingest historical documents
Build RAG knowledge base
Comparative analysis capabilities

Docker Compose Configuration

version: '3.8'

services:
  ollama:
    image: ollama/ollama:latest
    container_name: doc_ai_ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    restart: unless-stopped

  qdrant:
    image: qdrant/qdrant:latest
    container_name: doc_ai_qdrant
    ports:
      - "6333:6333"
      - "6334:6334"
    volumes:
      - qdrant_data:/qdrant/storage
    restart: unless-stopped

  api:
    build: ./app
    container_name: doc_ai_api
    ports:
      - "8000:8000"
    environment:
      - OLLAMA_HOST=http://ollama:11434
      - QDRANT_HOST=qdrant
      - QDRANT_PORT=6333
    volumes:
      - ./app:/app
      - uploads:/app/uploads
    depends_on:
      - ollama
      - qdrant
    restart: unless-stopped

volumes:
  ollama_data:
  qdrant_data:
  uploads:

Project Structure

project/
├── docker-compose.yml
├── app/
│   ├── Dockerfile
│   ├── requirements.txt
│   ├── main.py                      # FastAPI application
│   ├── services/
│   │   ├── parser.py                # Document parsing logic
│   │   ├── structure_extractor.py   # Section detection & classification
│   │   ├── embedder.py              # Embedding service (Ollama)
│   │   └── reviewer.py              # Review orchestration
│   ├── prompts/
│   │   ├── structure.txt            # Structure extraction prompt
│   │   └── review.txt               # Review prompt templates
│   └── models/
│       └── schemas.py               # Pydantic models
└── README.md

Key Dependencies

fastapi==0.104.1
uvicorn[standard]==0.24.0
python-multipart==0.0.6
unstructured[pdf]==0.11.0
qdrant-client==1.7.0
langchain==0.1.0
langchain-community==0.0.10
ollama==0.1.6
pydantic==2.5.0
python-docx==1.1.0
PyPDF2==3.0.1

Use Case Flows

Use Case 1: Generate New Document

User Prompt
  ↓
Retrieve Similar Specs (RAG)
  ↓
Extract Common Structure/Patterns
  ↓
LLM Generation with Context
  ↓
Format with Templates
  ↓
Return Generated Document

Use Case 2: Upload & Explain

Upload PDF/DOCX
  ↓
Parse Structure (Unstructured.io)
  ↓
Extract Sections with Metadata
  ↓
Embed Chunks (nomic-embed-text)
  ↓
Store in Qdrant
  ↓
Semantic Search for Similar Docs
  ↓
LLM Explains with Comparative Context
  ↓
Return Explanation Report

Use Case 3: Review Document

Document + Review Criteria
  ↓
Extract Key Sections (Structure Extractor)
  ↓
Validate Completeness Checklist
  ↓
Retrieve Best Practices (RAG)
  ↓
Technical Consistency Check
  ↓
Identify Gaps/Inconsistencies
  ↓
Generate Structured Review Report

Prompt Engineering Strategy

Structure Extraction Prompt Template

Extract and classify sections from this technical document:

Required Sections:
- Project Overview/Introduction
- Requirements (Functional/Non-functional)
- Architecture/Technical Design
- Timeline/Milestones

For each section found, return:
{
  "name": "section name",
  "present": true/false,
  "page_range": "start-end",
  "content_summary": "brief summary",
  "completeness_score": 0.0-1.0
}

Document Content:
{document_text}

Review Prompt Template

Review this technical specification for:

1. Completeness:
   - All required sections present
   - Technical components adequately covered
   - Timeline/milestones clearly defined

2. Consistency:
   - Technical terminology usage
   - Cross-references validate
   - Version/date consistency

3. Technical Accuracy:
   - Architecture decisions justified
   - Technology choices appropriate
   - Dependencies identified

Document Sections:
{structured_sections}

Provide structured feedback with:
- Missing elements
- Inconsistencies found
- Recommendations

Privacy & Compliance Considerations

Security Requirements

Self-hosted LLM: All processing on-premises
No external API calls: Client documents never leave infrastructure
Data isolation: Each client project in separate Qdrant collection
Access control: (To be implemented in production)

Document Handling

Encrypted uploads (HTTPS)
Temporary storage only during processing
Option to purge after review
Audit logging for compliance

Resource Allocation

Memory Budget (16GB VM)

Ollama (Mistral 7B):     ~8GB
Qdrant:                  ~1-2GB
FastAPI + Workers:       ~2GB
System + Docker:         ~3GB
Buffer:                  ~2GB
------------------------
Total:                   ~16GB

Storage Requirements

Ollama Models: ~4-8GB per model
Qdrant Collections: Scales with document volume
POC Estimate: 10-20GB for single document testing

Next Steps

Immediate Actions

Infrastructure Setup:
- Deploy docker-compose stack on VM
- Pull Ollama models (mistral:7b-instruct, nomic-embed-text)
- Verify Qdrant connectivity
Code Implementation:
- Build FastAPI skeleton
- Implement document parser
- Create structure extractor service
Prompt Engineering:
- Develop structure extraction prompts
- Create review prompt templates
- Test with sample document

Decision Point: Implementation Approach

Choose one of the following to proceed:

Option A: Full Implementation

Complete parser → structure extractor → RAG → review pipeline
Estimated time: 2-3 days
Best for: Complete POC validation

Option B: Critical Components First

Structure extraction + review prompts only
Estimated time: 1 day
Best for: Quick validation of concept

Option C: Infrastructure + Iterative

Setup stack, then iterate on logic incrementally
Estimated time: Ongoing
Best for: Learning and refinement

Open Questions & Considerations

Document Structure

Q: Do historical documents follow any patterns?
A: No specific format - adaptive parsing required

Scale Considerations

Current: Single document POC
Future: Multi-document repository with comparative analysis

RACI Integration

Status: Deferred to future phase
Reason: Complexity of table extraction and inference
Future: Will need to decide between:
- RACI table extraction (if provided)
- RACI inference (if generated from content)

Architecture Benefits

Why This Stack?

Resource Efficient: Fits comfortably in 16GB RAM
Privacy Compliant: Fully self-hosted, no external dependencies
Offline Capable: All inference on-premises
MCP-Ready: Can expose tools via MCP servers in future iterations
Scalable: Can swap Ollama for API calls if cloud deployment needed
Docker-based: Consistent development across Mac M1 → VM deployment

Development Workflow

Local (Mac M1)              Remote (VM)
    ↓                           ↓
Docker Compose          Docker Compose
    ↓                           ↓
Hot Reload Dev    →     Deploy to Production
    ↓                           ↓
Test Locally            Client Documents

Success Metrics

POC Validation

Successfully parse uploaded document
Extract document structure with 80%+ accuracy
Identify missing sections
Generate actionable review comments
Process document in <2 minutes

Production Readiness (Future)

Process 10+ documents in repository
Comparative analysis across documents
API response time <30s for review
Accuracy validation against manual reviews

Resources & References

Ollama Models

Vector Database

Document Processing

Notes & Decisions Log

2025-11-06 - Initial Architecture Discussion

Decision: Two-stage pipeline (structure extraction → RAG validation)
Rationale: No standardized format requires structure detection before technical analysis
Deferred: RACI matrix extraction (complexity vs POC scope)
Agreed: Start with single document POC before scaling

Key Architectural Choices

Self-hosted over API: Privacy/compliance requirement
Ollama over OpenAI: On-premises, cost control, data sovereignty
Structure-first approach: Necessary for completeness validation without standard format
Docker Compose: Deployment consistency, already familiar infrastructure

Contact & Collaboration

Project Lead: Rannie (Head of Engineering)
Expertise: RAG, MCP, LLM, Laravel/PHP, DevOps, System Architecture
Current Focus: Malta GPG integration, ETL pipelines, multi-PHP setups

Document Status: Architecture Planning Complete - Awaiting Implementation Decision

einnar82/claude.md