AI coding tools like Cursor, Claude Code, Windsurf, and Visual Studio Code (with Copilot) boost productivity by combining a user's prompt (e.g., "Refactor this function to use async/await and improve readability") with context drawn from the codebase. At their core, they rely on Retrieval-Augmented Generation (RAG) or similar mechanisms to fetch relevant code snippets, documentation, or patterns from a vector database or other indexed store. That retrieved context is then injected into the prompt sent to a Large Language Model (LLM) such as Claude 3.5 Sonnet or GPT-4o.
The general flow is:
- Indexing/Ingestion: the codebase is chunked (e.g., by functions or AST nodes) and each chunk is embedded into a vector (see the indexing sketch after this list).
- Retrieval: the user's prompt is embedded and queried against the vector DB for similar chunks (see the retrieval sketch below).
- Augmentation: retrieved context (e.g., related files, git history) is injected into the LLM prompt (see the prompt-assembly sketch below).
- Generation: the LLM produces refactored code, which the tool applies (with diffs for user review).
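To make the indexing step concrete, here is a minimal sketch that chunks a Python file by function using the stdlib `ast` module. The `embed()` function is a toy hashing-trick stand-in for a real embedding model, and the names (`DIM`, `chunk_by_function`, `build_index`) are illustrative, not any tool's actual internals:

```python
import ast
import hashlib
import numpy as np

DIM = 256  # toy embedding dimension; real systems use a learned model

def embed(text: str) -> np.ndarray:
    """Toy hashing-trick embedding: hash each token into a fixed-size
    bag-of-words vector, L2-normalized so that a plain dot product
    later equals cosine similarity."""
    vec = np.zeros(DIM)
    for token in text.split():
        vec[int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def chunk_by_function(source: str) -> list[dict]:
    """Produce one chunk per (async) function definition in a Python file."""
    tree = ast.parse(source)
    return [
        {"name": node.name, "text": ast.get_source_segment(source, node)}
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]

def build_index(source: str) -> list[dict]:
    """Attach a vector to each chunk; this list plays the role of the vector DB."""
    return [{**c, "vector": embed(c["text"])} for c in chunk_by_function(source)]
```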
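Retrieval then reduces to a nearest-neighbor search over the index. A sketch, reusing `embed()` and `build_index()` from above; because the toy vectors are normalized, the dot product is the cosine similarity:

```python
import numpy as np

def retrieve(index: list[dict], query_vector: np.ndarray, k: int = 3) -> list[dict]:
    """Return the k chunks most similar to the query, by cosine similarity."""
    return sorted(
        index,
        key=lambda c: float(np.dot(c["vector"], query_vector)),
        reverse=True,
    )[:k]

# Usage:
# hits = retrieve(build_index(source), embed("refactor this to async/await"))
```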
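For augmentation, the tool assembles the final prompt from the retrieved chunks. The template below is one plausible shape, not any tool's real system prompt; production tools also mix in open files, git history, or lint output:

```python
def build_prompt(user_request: str, retrieved: list[dict]) -> str:
    """Inject retrieved chunks as context ahead of the user's task."""
    context = "\n\n".join(
        f"# From `{c['name']}`:\n{c['text']}" for c in retrieved
    )
    return (
        "You are a coding assistant. Use this codebase context:\n\n"
        f"{context}\n\n"
        f"Task: {user_request}\n"
        "Return only the refactored code."
    )
```

The generation step is then a single LLM call with this prompt; the tool typically diffs the model's output against the original file before applying it.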