The Architecture of Modern AI Systems: Introducing the Model Context Protocol (MCP)

1. Foundations of AI and the Large Language Model

1.1 Defining the Core Components of Modern AI

Artificial Intelligence (AI) is the broad field of computer science dedicated to building machines that can perform tasks normally requiring human intelligence. Within this landscape, Generative AI and Large Language Models (LLMs) represent a significant leap forward.

A Large Language Model (LLM), such as Claude, GPT-4, or Llama, is a type of generative model trained on vast amounts of text data. It excels at reasoning, understanding context, generating creative text, summarizing, and translating.

However, LLMs possess inherent limitations:

  1. No Real-Time Knowledge: Their knowledge is static, limited to their last training cutoff date.
  2. Poor Execution: They are probability machines, not calculators. They struggle with precise arithmetic, complex code execution, or reliably accessing proprietary, real-time data.

1.2 The Evolution to LLMs with Tools (Function Calling)

To overcome these limitations, the paradigm shifted from a monolithic LLM to a Modular LLM System. This transition introduced the concept of Tool Use or Function Calling.

Tool Use is a critical mechanism where the LLM's primary function is not to answer the user's query directly, but to output a structured data format—typically a JSON object—that specifies which external function should be executed and with what arguments.

Concrete Example:

| User Query | LLM Internal Output (Tool Call) | External Tool Executed |
| --- | --- | --- |
| "What is the current stock price of Google?" | `{"tool_name": "StockTicker", "args": {"symbol": "GOOG"}}` | A Python script that calls a financial API. |
| "Analyze my quarterly budget spreadsheet." | `{"tool_name": "DataAnalyzer", "args": {"file_id": "Q4_Budget", "action": "summarize_spending"}}` | A local service that runs a statistical analysis library. |

2. Introducing AI Agents and the Model Context Protocol (MCP)

2.1 The Concept of AI Agents

An AI Agent elevates the concept of a "tool" by adding autonomy and intelligence. An Agent is an entity that:

  1. Observes its environment (the user query and system context).
  2. Plans a sequence of actions.
  3. Acts by calling external services or running code.
  4. Reflects on the results to achieve a specified goal.

Agents are essential for complex, multi-step tasks (like the sales analysis example in Section 4), where a single tool call is insufficient.

2.2 The Model Context Protocol (MCP)

As organizations began building hundreds of specialized tools and agents, a problem emerged: Interoperability. Different LLMs and different Agent platforms used proprietary formats, leading to integration complexity.

The Model Context Protocol (MCP) solves this by establishing a standardized, open protocol (often based on JSON-RPC) for defining, describing, registering, and invoking AI capabilities (tools/agents).

MCP Philosophy: MCP acts as the universal "language" spoken by the LLM, the AI Client, and all specialized Agents, enabling a seamless, orchestrated ecosystem. It shifts the AI architecture from a collection of point solutions to a true Multi-Capability Platform.
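
As a rough illustration of what a standardized exchange looks like, the Python dictionaries below mirror the shape of a JSON-RPC 2.0 `tools/call` request and its response. The exact field names inside `params` and `result` are assumptions made for this article's sales-analysis example, not a normative excerpt from the MCP specification.

```python
# Illustrative shape of an MCP-style tools/call exchange over JSON-RPC 2.0.
# Treat the payload field names as assumptions, not the normative spec.
request = {
    "jsonrpc": "2.0",
    "id": 42,
    "method": "tools/call",
    "params": {
        "name": "SalesDataAnalyzerAgent",
        "arguments": {"dataset_id": "Q4_2025_SALES", "action": "run_analysis"},
    },
}

response = {
    "jsonrpc": "2.0",
    "id": 42,
    "result": {
        "finding_1": "50% growth in 'Eco-Friendly' line",
        "finding_2": "Drop in 'Legacy' sales",
    },
}
```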

3. Core Components of an MCP-Based System

The MCP architecture distributes intelligence and execution across several specialized components. Understanding their roles is key to building a robust system.

| Component | Role in the MCP System | Concrete Example Functionality |
| --- | --- | --- |
| User | Initiates the request and consumes the final synthesized response. | Submits the query: "Analyze the uploaded sales dataset..." |
| AI Client (Host Application) | The session manager; the application-level logic that orchestrates the flow, handles memory, and prepares the final synthesis. It hosts the MCP Client. | Manages the conversation state and aggregates agent results. |
| MCP Client (Protocol Handler) | The specific library or module within the AI Client that implements the MCP JSON-RPC specification. It translates the LLM's function call into a network request and decodes the server's response. | Takes the LLM's structured call, resolves the endpoint, and formats the network request to the MCP Server. |
| LLM (Coordinator/Planner) | The central reasoning engine. It translates the user query into a structured multi-step plan, selects the appropriate Agent, and synthesizes the final output. | Reasons: "I need to call the Data Analyzer, then the Campaign Generator." |
| MCP Registry | The centralized catalog of all available MCP Servers, their network endpoints, and the technical schemas (function signatures) of their exposed capabilities. | Returns the URL https://agents.acme.com/data-analysis/v1 for the SalesDataAnalyzerAgent. |
| MCP Server | An external server or service that hosts one or more MCP Agents and exposes their capabilities via the MCP protocol. | The physical machine/container hosting the specialized Python code for statistical analysis. |
| MCP Agents | The specialized entities running on an MCP Server that execute specific domain tasks. | SalesDataAnalyzerAgent, MarketResearchAgent, CampaignIdeaGeneratorAgent. |
| Specialist LLMs | Models fine-tuned for niche tasks, often integrated by an MCP Agent as part of its internal process. | CreativeMarketingLLM used by the CampaignIdeaGeneratorAgent to draft campaign copy. |

4. End-to-End Workflow: Sales Analysis and Campaign Generation

4.1 Capability Management: Discovery vs. Runtime Resolution

The system’s ability to execute this task relies on two distinct phases of capability management.

| Feature | Phase 1: Capability Discovery (Initialization) | Phase 2: Runtime Agent Resolution |
| --- | --- | --- |
| Timing | Pre-session (when the AI Client starts up). | In-session (during the execution of the user's request). |
| Mechanism | AI Client queries the MCP Registry for all tool schemas. | The LLM performs a dynamic selection of one tool and generates the specific arguments. |
| Why it's Needed | Efficiency & Knowledge: provides the LLM with a complete, pre-indexed "menu" of capabilities before the query arrives. | Context & Execution: ensures the LLM selects the correct tool (e.g., SalesDataAnalyzer) and provides the precise, context-aware arguments (e.g., the specific dataset ID) needed for execution. |
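
A minimal sketch of the two phases follows, assuming hypothetical `registry_client` and `mcp_client` interfaces whose method names are invented purely for illustration.

```python
# Phase 1 (pre-session): fetch the complete "menu" of tool schemas once.
def discover_capabilities(registry_client) -> list[dict]:
    return registry_client.call("tools/list")

# Phase 2 (in-session): the LLM has already selected one tool; the host only
# resolves where that tool lives and forwards the request-specific arguments.
def resolve_and_call(registry_client, mcp_client, tool_call: dict) -> dict:
    endpoint = registry_client.call("registry/resolve", {"name": tool_call["agent"]})
    return mcp_client.invoke(endpoint, tool_call["method"], tool_call["args"])
```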

4.2 Step-by-Step Sequence of Interaction (Focus on MCP Client)

User Query: "Analyze the uploaded sales dataset and propose three marketing campaign ideas based on the findings."

| Step | Component(s) | Annotated Action | Description & Example Intermediate Output |
| --- | --- | --- | --- |
| 1.0 | User $\rightarrow$ AI Client | Query | User submits the natural language request (Dataset ID: Q4_2025_SALES). |
| 2.0 | AI Client $\rightarrow$ LLM (Coordinator) | Lookup (Context Injection) | AI Client sends the Query + the Tool Schemas to the LLM. |
| 3.0 | LLM (Coordinator) $\rightarrow$ AI Client | LLM Planning | LLM reasons: Plan: Call SalesDataAnalyzerAgent first. |
| 4.0 | LLM (Coordinator) $\rightarrow$ AI Client | Execution (Tool Call Request) | LLM generates the structured call: `tools/call(agent='SalesDataAnalyzerAgent', method='run_analysis', ...)` |
| 5.0 | AI Client $\rightarrow$ MCP Registry | Runtime Resolution Lookup | AI Client queries the Registry for the Agent's network endpoint. |
| 6.0 | MCP Registry $\rightarrow$ AI Client | Resolution Response | Registry returns the Endpoint URL. |
| 6.5 | AI Client $\rightarrow$ MCP Client | Protocol Assignment | AI Client delegates the structured call and the resolved URL to the dedicated protocol handler. |
| 7.0 | MCP Client $\rightarrow$ MCP Server (Analyzer Agent) | Execution | MCP Client formats and sends the MCP JSON-RPC Request. |
| 8.0 | MCP Server (Analyzer Agent) $\rightarrow$ MCP Client | Agent Output (Raw) | Server executes the task and returns the raw MCP JSON-RPC response. |
| 8.5 | MCP Client $\rightarrow$ AI Client | Protocol Decode | MCP Client validates the response and extracts the clean findings: `{"finding_1": "50% growth in 'Eco-Friendly' line", "finding_2": "Drop in 'Legacy' sales"}` |
| 9.0 | AI Client $\rightarrow$ LLM (Coordinator) | Aggregation | AI Client sends the clean findings back to the LLM. |
| 10.0 | LLM (Coordinator) $\rightarrow$ AI Client | LLM Re-Planning & New Tool Call | LLM reasons: New Plan: Use the CampaignIdeaGeneratorAgent. LLM generates `tools/call(agent='CampaignIdeaGeneratorAgent', ...)` |
| 11.0 | AI Client $\rightarrow$ MCP Client | Protocol Assignment | AI Client delegates the new call to the protocol handler. |
| 12.0 | MCP Client $\rightarrow$ MCP Server (Generator Agent) | Execution | MCP Client sends the MCP JSON-RPC Request to the Generator Agent. |
| 13.0 | MCP Server (Generator Agent) $\rightarrow$ MCP Client | Agent Output (Raw) | Agent returns the proposals: `{"campaign_1": "Eco-Champion Loyalty Program", ...}` |
| 14.0 | MCP Client $\rightarrow$ AI Client | Protocol Decode | MCP Client decodes the ideas and returns them to the Host. |
| 15.0 | AI Client $\rightarrow$ LLM (Coordinator) | Aggregation | AI Client sends the Campaign Ideas to the LLM. |
| 16.0 | LLM (Coordinator) $\rightarrow$ AI Client | Final Response Synthesis | LLM generates a coherent, natural language response. |
| 17.0 | AI Client $\rightarrow$ User | Response | The final, structured answer is presented to the User. |
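
Condensed into Python, the AI Client's orchestration loop for steps 1 through 17 might look like the sketch below. The `llm`, `registry`, and `mcp_client` objects and their method names are stand-ins assumed purely for illustration.

```python
# Hypothetical sketch of the AI Client (Host) loop covering steps 1-17.
def handle_request(user_query, llm, registry, mcp_client, tool_schemas) -> str:
    messages = [{"role": "user", "content": user_query}]               # Step 1
    while True:
        plan = llm.plan(messages, tools=tool_schemas)                  # Steps 2-4 / 10
        if plan["type"] == "final_answer":                             # Step 16
            return plan["content"]                                     # Step 17
        endpoint = registry.resolve(plan["agent"])                     # Steps 5-6
        raw = mcp_client.call(endpoint, plan["method"], plan["args"])  # Steps 6.5-8
        findings = mcp_client.decode(raw)                              # Steps 8.5 / 14
        messages.append({"role": "tool", "content": findings})         # Steps 9 / 15
```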

4.3 Visualizing the Communication Paths

```mermaid
sequenceDiagram
    participant U as User
    participant AC as AI Client (Host)
    participant CL as MCP Client (Protocol Handler)
    participant LLM as LLM (Coordinator/Planner)
    participant MR as MCP Registry
    participant MS_A as MCP Server (Analyzer Agent)
    participant MS_G as MCP Server (Generator Agent)
    participant SL as Specialist LLM (Creative Model)

    title MCP-Based AI Agent Orchestration (Explicit MCP Client)

    %% Initialization (Discovery - Pre-session)
    Note over AC, MR: Pre-session: Capability Discovery
    AC->>MR: tools/list (Initial Discovery)
    MR-->>AC: Tool Schemas
    AC->>LLM: Inject Tool Schemas into LLM Context

    %% Runtime (Execution - In-session)
    U->>AC: 1. User Query
    AC->>LLM: 2. Query + Tool Schemas

    Note over LLM: 3. LLM Planning: Select Agent (SalesAnalyzer)

    LLM->>AC: 4. Tool Call Request (Structured JSON)

    AC->>MR: 5. Runtime Resolution Lookup for Agent Endpoint
    MR-->>AC: 6. Returns Endpoint URL

    AC->>CL: 6.5 Delegate Call Request + Endpoint
    CL->>MS_A: 7. Execute Call (MCP JSON-RPC Request)

    Note over MS_A: Agent Executes Task

    MS_A-->>CL: 8. Agent Output (Raw MCP Response)
    CL-->>AC: 8.5 Decoded Agent Output (Clean Findings)
    AC->>LLM: 9. Aggregation (Analysis Results)

    Note over LLM: 10. LLM Re-Planning: Select Generator Agent

    LLM->>AC: 10. Tool Call Request (Structured JSON)

    AC->>CL: 11. Delegate Call Request + Endpoint (Resolution lookup omitted for brevity)
    CL->>MS_G: 12. Execute Call (MCP JSON-RPC Request)

    Note right of MS_G: Internal Sub-Task Delegation
    MS_G->>SL: Internal Call: creative_gen(findings)
    SL-->>MS_G: Specialized Creative Output

    MS_G-->>CL: 13. Agent Output (Raw MCP Response)
    CL-->>AC: 14. Decoded Agent Output (Campaign Ideas)

    AC->>LLM: 15. Aggregation (Campaign Ideas)

    Note over LLM: 16. Final Response Synthesis

    LLM-->>AC: 16. Final Synthesized Response
    AC-->>U: 17. Final Response Display
```


5. Advanced Agent Paradigms: Towards Autonomy and Statefulness

The initial MCP architecture provides a strong foundation for delegating tasks. The next evolution of the MCP framework moves beyond stateless tool-calling to introduce stateful, autonomous agents that possess memory, self-correction, and collaborative capabilities.

5.1 Agent State and Memory

A stateless agent receives an input, executes a task, and returns an output, forgetting the details immediately after. A stateful agent, however, maintains internal memory—a record of past actions, observations, and intermediate results—allowing it to handle multi-turn, complex goals.

Types of Agent Memory:

  1. Short-Term Memory (Context Buffer): The current session context, primarily managed by the Coordinator LLM and passed to agents as needed.
  2. Long-Term Memory (Knowledge Store): A persistent, external database (e.g., vector database, key-value store) that stores past successes, failures, and learned policies. This allows an agent to improve over time.

Concrete Example: The CodeReviewAgent

| Feature | Stateless Agent Approach (MCP v1) | Stateful Agent Approach (MCP v2) |
| --- | --- | --- |
| Interaction | User $\rightarrow$ LLM $\rightarrow$ Agent. Runs a single review on one file. | Agent remembers the files reviewed, the user's preferred coding style, and past feedback history. |
| Query | "Review file_A.py." | "Please review file_B.py. Recall the style guidance I gave you last week regarding docstrings." |
| Mechanism | Agent executes `tools/call(review, file_A)`. | Agent executes `tools/call(review, file_B, user_id=X)` $\rightarrow$ Agent queries its Long-Term Memory for user_id=X's past preferences $\rightarrow$ Agent applies the preferences to the review. |
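
A minimal sketch of the stateful behaviour is shown below, using an in-memory dict as a stand-in for a persistent long-term memory store (e.g., a vector database); the class and method names are illustrative.

```python
# Minimal sketch of the stateful CodeReviewAgent behaviour described above.
class CodeReviewAgent:
    def __init__(self):
        # Stand-in for a persistent long-term memory store: user_id -> preferences.
        self.long_term_memory: dict[str, list[str]] = {}

    def remember(self, user_id: str, preference: str) -> None:
        self.long_term_memory.setdefault(user_id, []).append(preference)

    def review(self, file_name: str, user_id: str) -> str:
        # Stateful step: recall this user's past style guidance before reviewing.
        preferences = self.long_term_memory.get(user_id, [])
        notes = "; ".join(preferences) if preferences else "no stored preferences"
        return f"Reviewed {file_name} applying: {notes}"

agent = CodeReviewAgent()
agent.remember("user_X", "Docstrings must follow the Google style.")
print(agent.review("file_B.py", user_id="user_X"))
```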

5.2 Self-Correction and Reflective Loops

The most significant advancement is the introduction of Self-Correction or Reflective Loops. In the basic MCP model, if an Agent's tool call fails (e.g., a function returns an error), the LLM must handle the failure. Advanced Agents can handle failures internally without reporting back to the LLM immediately.

The Self-Correction Workflow:

  1. Execution: The Agent attempts to execute its sub-task (e.g., generating code).
  2. Observation: The execution environment (e.g., a sandbox) reports an error (e.g., "Syntax Error").
  3. Reflection: The Agent uses an internal reasoning engine (often a smaller, specialized LLM) to analyze the error message and the code it produced.
  4. Re-Planning: The Agent generates a new plan or modified code to fix the error.
  5. Re-Execution: The Agent retries the task up to a set limit.

This process, contained within the Agent, dramatically reduces the load on the Coordinator LLM and improves system reliability.
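
The loop can be sketched as follows, assuming hypothetical `generate`, `execute_in_sandbox`, and `reflect` callables supplied by the agent's internal components.

```python
# Sketch of the reflective retry loop; the three callables are assumptions.
def run_with_self_correction(task, generate, execute_in_sandbox, reflect,
                             max_attempts: int = 3):
    feedback = None
    for _ in range(max_attempts):
        attempt = generate(task, feedback=feedback)   # 1. Execution / 4. Re-Planning
        error = execute_in_sandbox(attempt)           # 2. Observation
        if error is None:
            return attempt                            # Success: nothing to report back
        feedback = reflect(attempt, error)            # 3. Reflection (internal reasoner)
    raise RuntimeError("Retries exhausted; escalate the failure to the Coordinator LLM")
```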

6. Multi-Agent Systems (MAS) and Specialized Topologies

As tasks become more complex, the architecture evolves from a central LLM commanding individual agents to a Multi-Agent System (MAS) where agents communicate directly to solve the problem collaboratively.

6.1 Direct Agent-to-Agent Communication (A2A)

MCP extensions allow an Agent to perform a nested tools/call request directly to another Agent, bypassing the Coordinator LLM for specific sub-tasks.

Scenario: The SalesDataAnalyzerAgent needs a definitive product description for one of the product IDs it flagged.

  • A2A Workflow: The SalesDataAnalyzerAgent (Agent A) doesn't know the product details. It makes a direct call via its own internal MCP Client module:
    • Agent A $\rightarrow$ Agent B: tools/call(agent='ProductCatalogAgent', method='get_description', args={'product_id': 'ECO500'})
    • Agent B $\rightarrow$ Agent A: Returns the description.
    • Agent A continues its analysis and returns a richer result to the Coordinator LLM.

This is a Mesh Topology, where communication is decentralized and direct, maximizing efficiency for linked sub-tasks.
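
A rough sketch of the nested call from Agent A's side follows, assuming the agent embeds its own MCP client object; the call signature is illustrative, not a defined API.

```python
# Agent A enriches its own finding by calling Agent B directly over MCP,
# bypassing the Coordinator LLM. The client interface is an assumption.
def enrich_finding(product_id: str, internal_mcp_client) -> dict:
    description = internal_mcp_client.call(
        agent="ProductCatalogAgent",
        method="get_description",
        args={"product_id": product_id},
    )
    # Agent A folds Agent B's answer into its own analysis result.
    return {
        "product_id": product_id,
        "description": description,
        "finding": "50% growth in 'Eco-Friendly' line",
    }
```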

6.2 Hierarchical Agent Systems

For projects involving clear divisions of labor, a Hierarchical Topology is most effective, mimicking an organizational structure with a Manager and Workers.

Concrete Example: Marketing Strategy Development

In this architecture, a specialized Manager Agent is responsible for the overall outcome and delegates sub-tasks to a team of Worker Agents.

| Component | Role | Delegated Task |
| --- | --- | --- |
| Marketing Strategy Manager Agent (Manager) | Receives the full user query, breaks it down, manages project state, and synthesizes the final report. | Manages the project schedule; collates final inputs. |
| Copywriter Agent (Worker) | Specialized in persuasive language, adhering to brand guidelines. | Generate three compelling slogans for the campaign. |
| Budget Analyst Agent (Worker) | Specialized in financial modeling and constraint checking. | Calculate the ROI for the proposed loyalty program budget. |

Interaction Flow (Post-Analysis):

  1. LLM (Coordinator) makes the initial call to the Marketing Strategy Manager Agent.
  2. Manager Agent splits the task into two parallel sub-tasks.
  3. Worker Agents execute their tasks in parallel.
  4. Worker Agents return results directly to the Manager Agent.
  5. Manager Agent collates the slogans and the ROI, reviews for consistency, and synthesizes the final, complete report to send back to the Coordinator LLM.
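
The sketch below mimics that flow with plain Python functions standing in for the Worker Agents and a thread pool standing in for parallel MCP calls; all names and outputs are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-ins for the Worker Agents.
def copywriter_agent(brief: str) -> list[str]:
    return [f"Slogan {i} for: {brief}" for i in range(1, 4)]

def budget_analyst_agent(brief: str) -> dict:
    return {"program": brief, "estimated_roi": "2.4x"}

def marketing_strategy_manager(user_query: str) -> dict:
    brief = f"Loyalty campaign derived from: {user_query}"
    with ThreadPoolExecutor() as pool:                    # Steps 2-3: parallel delegation
        slogans = pool.submit(copywriter_agent, brief)
        roi = pool.submit(budget_analyst_agent, brief)
        results = (slogans.result(), roi.result())        # Step 4: workers report back
    # Step 5: the Manager collates and returns one report to the Coordinator LLM.
    return {"slogans": results[0], "budget_analysis": results[1]}

print(marketing_strategy_manager("Eco-Friendly line grew 50% last quarter"))
```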

7. Advanced Context and Knowledge Management

In complex MAS architectures, managing the context—the data, documents, and external knowledge relevant to the task—is paramount. The latest advancements focus on making Retrieval-Augmented Generation (RAG) more sophisticated and dynamic.

7.1 Multi-Hop and Fusion RAG

Standard RAG systems retrieve a single block of text relevant to the query. Modern systems employ Multi-Hop RAG and Fusion RAG.

  • Multi-Hop RAG: A system that recognizes a query requires information from multiple linked sources in a chain.

    • Query: "What is the typical customer profile for the product that uses the proprietary component K-7?"
    • Step 1: Retrieve document describing "component K-7" to identify the product ID (e.g., ECO500).
    • Step 2: Use the product ID to perform a second retrieval in a different database to find the "typical customer profile."
    • Step 3: The LLM synthesizes the final answer from the two distinct pieces of retrieved context.
  • Fusion RAG: A method that generates multiple parallel sub-queries from the user's original query, retrieves context for each sub-query, and then uses a Reranking mechanism to select only the most relevant passages before injecting them into the LLM's final prompt. This is crucial for filtering out noise and ensuring high-quality context.
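
To make the multi-hop chain concrete, the toy sketch below chains two lookups over in-memory dictionaries standing in for the two retrieval sources; the data and the simple parsing step are invented for illustration.

```python
# Two-hop retrieval chain: component doc -> product ID -> customer profile.
COMPONENT_DOCS = {"K-7": "Component K-7 is used exclusively in product ECO500."}
CUSTOMER_PROFILES = {"ECO500": "Urban, sustainability-focused buyers aged 25-40."}

def multi_hop_answer(component: str) -> str:
    doc = COMPONENT_DOCS[component]                  # Hop 1: find the product ID
    product_id = doc.split("product ")[-1].rstrip(".")
    profile = CUSTOMER_PROFILES[product_id]          # Hop 2: look up the profile
    # Final step: the LLM would synthesize from both retrieved passages.
    return f"{product_id} customers: {profile}"

print(multi_hop_answer("K-7"))
```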

7.2 Context Compression and Relevance Filtering

As the complexity of agent interactions increases, the total size of the context (all past turns, agent outputs, and RAG retrievals) can quickly exceed the LLM's Context Window limit.

Context Compression techniques dynamically summarize or prune the context before sending it to the LLM.

MCP's Role in Context Compression: The MCP protocol can include metadata flags that classify agent outputs (e.g., is_critical: true, relevance_score: 0.9). When the Coordinator LLM receives the analysis from the SalesDataAnalyzerAgent, the AI Client (Host) performs the following:

  1. Pruning: It removes the 50 pages of raw sales data from the prompt, only keeping the two key findings (the structured JSON from Step 8.5).
  2. Summarization: It summarizes the multi-turn conversation history into a concise summary.
  3. Injection: Only the compressed history, the key findings, and the next tool schemas are injected for the final synthesis, ensuring the LLM's context window remains manageable and focused.
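
A minimal sketch of the pruning step, keyed off the metadata flags mentioned above (the flag names are illustrative, not normative MCP fields):

```python
# Keep only agent outputs flagged as critical or highly relevant before
# building the final prompt; everything else is pruned from the context.
def prune_context(agent_outputs: list[dict], min_relevance: float = 0.8) -> list[dict]:
    return [
        out for out in agent_outputs
        if out.get("is_critical") or out.get("relevance_score", 0.0) >= min_relevance
    ]

agent_outputs = [
    {"content": "50 pages of raw sales rows", "relevance_score": 0.2},
    {"content": "finding_1: 50% growth in 'Eco-Friendly' line", "is_critical": True},
    {"content": "finding_2: Drop in 'Legacy' sales", "relevance_score": 0.9},
]
print(prune_context(agent_outputs))  # Raw rows are dropped; key findings survive.
```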

By integrating these advanced memory, autonomy, and context management paradigms atop the foundational MCP framework, AI systems move from simple tool users to sophisticated, reliable, and truly collaborative digital organizations.
