Architecture Design: Autonomous API Agent Platform
1. System Overview
The system is a "ReAct" (Reasoning + Acting) Agent platform. It ingests OpenAPI specifications (Swagger) from an Admin, converts them into "Tools," and allows an LLM to orchestrate HTTP requests to fulfill natural language user intents.
High-Level Context Diagram (Mermaid)
Code snippet
graph TD
    User((User)) -->|Natural Language| FE[Frontend UI]
    Admin((Admin)) -->|OpenAPI Specs / Prompts| FE
    FE -->|REST/WebSocket| Gateway[API Gateway / Orchestrator]
    subgraph "Agent Backend Core"
        Gateway --> Manager[Session Manager]
        Manager -->|Get Context| DB[(Database / Vector Store)]
        Manager -->|Reasoning| LLM["LLM Service<br/>(e.g., GPT-4o, Claude 3.5)"]
        Manager -->|Execute Tool| Executor[API Executor Engine]
        Executor -->|Validate & Call| ExtAPI[External REST APIs]
    end
    ExtAPI -->|JSON Response/Error| Executor
    Executor -->|Observation| Manager
    Manager -->|Recursion| LLM
2. Core Components
A. Data Model (The "Knowledge")
To make the agent work, we need to map API definitions to a structure the LLM understands.
1. Agent Configuration
System Prompt: The persona (e.g., "You are a DevOps assistant...").
Tool Registry: A collection of available API endpoints.
2. Tool Definition (The Schema)
Instead of raw code, we store tools as structured JSON, typically derived by parsing an uploaded OpenAPI (Swagger) file.
JSON
{
  "tool_name": "create_user",
  "description": "Creates a new user in the system. Use this when the user asks to sign someone up.",
  "method": "POST",
  "url_template": "https://api.example.com/users",
  "parameters_schema": {
    "type": "object",
    "properties": {
      "username": {"type": "string", "description": "The desired login name"},
      "role": {"type": "string", "enum": ["admin", "user"]}
    },
    "required": ["username"]
  },
  "auth_config": { "type": "bearer", "token_env_key": "API_KEY_1" }
}
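For illustration, a stored definition like the one above maps almost one-to-one onto the function-calling format most LLM providers expect. The helper below is a minimal sketch (the name to_llm_tool is hypothetical):
Python
def to_llm_tool(tool_def: dict) -> dict:
    # Expose only what the LLM needs for reasoning; auth_config and url_template
    # stay server-side for the Executor and are never shown to the model.
    return {
        "type": "function",
        "function": {
            "name": tool_def["tool_name"],
            "description": tool_def["description"],
            "parameters": tool_def["parameters_schema"],
        },
    }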
B. The Orchestrator (The "Brain")
This is the application logic that manages the conversation loop. It typically runs a State Machine.
State 1: Context Assembly: Fetch conversation history + relevant Tools.
State 2: Reasoning (LLM Call): Send User Input + Tools to LLM.
State 3: Routing (see the sketch after this list):
If LLM returns text -> Return to User.
If LLM returns tool_call -> Go to Executor.
State 4: Execution: Run the API call.
State 5: Observation: Append the API result (success OR error) to the chat history. Return to State 2.
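A minimal sketch of the State 3 routing decision, assuming an OpenAI-style response message (the names AgentState and route are illustrative):
Python
from enum import Enum, auto

class AgentState(Enum):
    CONTEXT_ASSEMBLY = auto()
    REASONING = auto()
    EXECUTION = auto()
    OBSERVATION = auto()
    DONE = auto()

def route(message) -> AgentState:
    # State 3: a tool_call sends us to the Executor; plain text ends the loop
    return AgentState.EXECUTION if message.tool_calls else AgentState.DONE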
C. The Executor Engine
This component is a "dumb" HTTP client with safety rails (a minimal sketch follows the list below).
Input: URL, Method, Headers, JSON Body (provided by LLM).
Logic:
Inject Authentication headers (from secure storage).
Execute HTTP Request.
Sanitization: Truncate massive JSON responses (to save tokens) before sending back to LLM.
Error Handling: Catch 4xx/5xx errors and return them as text to the Orchestrator, not as exceptions. This allows the LLM to read the error and try again.
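A minimal Executor sketch using the requests library. It assumes the tool definition schema from section 2.A; execute_tool and MAX_OBSERVATION_CHARS are illustrative names, and a real implementation would map arguments into path/query parameters per the schema instead of always sending a JSON body:
Python
import os
import requests

MAX_OBSERVATION_CHARS = 4000  # assumption: anything longer gets truncated

def execute_tool(tool_def: dict, args: dict) -> str:
    headers = {}
    auth = tool_def.get("auth_config", {})
    if auth.get("type") == "bearer":
        # Secret resolved at request time from secure storage, never from the prompt
        headers["Authorization"] = f"Bearer {os.environ[auth['token_env_key']]}"

    try:
        resp = requests.request(
            method=tool_def["method"],
            url=tool_def["url_template"],
            headers=headers,
            json=args,   # simplification: real code maps args per parameters_schema
            timeout=30,
        )
        if resp.status_code >= 400:
            # Errors are returned as text, not raised, so the LLM can read and recover
            return f"Error: {resp.status_code} {resp.reason} - {resp.text[:500]}"
        # Sanitization: protect the context window from huge payloads
        return resp.text[:MAX_OBSERVATION_CHARS]
    except requests.RequestException as e:
        return f"System Error: API Unreachable ({e})"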
3. Detailed Execution Flow (Sequence Diagram)
This diagram illustrates "Chaining" and "Error Recovery."
Scenario: User says "Find the user 'Bob' and delete him."
Requirement: Agent must GET /users?name=Bob, extract the ID, then DELETE /users/{ID}.
Code snippet
sequenceDiagram
    participant U as User
    participant O as Orchestrator
    participant L as LLM (Brain)
    participant E as API Executor
    participant X as External API

    U->>O: "Find Bob and delete him"
    loop Reasoning Loop
        O->>L: Prompt: History + Tools + "Find Bob and delete him"
        L->>O: Response: Call Tool `search_users({name: 'Bob'})`
        O->>E: Execute `GET /users?name=Bob`
        E->>X: HTTP GET Request
        X->>E: 200 OK `[{id: 101, name: "Bob"}]`
        E->>O: Observation: `Found: [{id: 101, name: "Bob"}]`
        Note right of O: Loop continues (Chaining)
        O->>L: Prompt: History + Observation + "What next?"
        L->>O: Response: Call Tool `delete_user({id: 101})`
        O->>E: Execute `DELETE /users/101`
        E->>X: HTTP DELETE
        X->>E: 403 Forbidden (Simulated Error)
        E->>O: Observation: `Error: 403 Forbidden`
        Note right of O: Loop continues (Error Handling)
        O->>L: Prompt: History + "Error: 403" + "What next?"
        L->>O: Response: Text "I found Bob (ID 101), but I don't have permission to delete him."
    end
    O->>U: Final Response
O->>U: Final Response
4. Implementation Guide for Engineers
Phase 1: The "Tool" Parser
You need a service that ingests standard definitions (a parsing sketch follows this list).
Input: URL to swagger.json or openapi.yaml.
Action: Parse the file. For every path/method combination, generate a JSON Schema description.
Storage: Save these schemas in your database linked to the specific Agent.
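A parsing sketch, assuming an already-loaded OpenAPI 3.x document as a dict. It handles only operation-level parameters (no requestBody or $ref resolution), and openapi_to_tools is an illustrative name:
Python
def openapi_to_tools(spec: dict, base_url: str) -> list[dict]:
    tools = []
    for path, path_item in spec.get("paths", {}).items():
        for method, op in path_item.items():
            if method.upper() not in {"GET", "POST", "PUT", "PATCH", "DELETE"}:
                continue  # skip path-level keys like "parameters" or "summary"
            properties, required = {}, []
            for param in op.get("parameters", []):
                properties[param["name"]] = {
                    "type": param.get("schema", {}).get("type", "string"),
                    "description": param.get("description", ""),
                }
                if param.get("required"):
                    required.append(param["name"])
            tools.append({
                "tool_name": op.get("operationId") or f"{method}_{path}".replace("/", "_"),
                "description": op.get("summary") or op.get("description", ""),
                "method": method.upper(),
                "url_template": base_url + path,
                "parameters_schema": {
                    "type": "object",
                    "properties": properties,
                    "required": required,
                },
            })
    return tools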
Phase 2: The Loop Logic (Pseudo-Code)
This is the core logic your backend engineer needs to write.
Python
import json

MAX_ITERATIONS = 5

def run_agent_loop(user_input, chat_history, available_tools):
    # `llm` is your provider client; `http_client` is the Executor from section 2.C
    messages = chat_history + [{"role": "user", "content": user_input}]

    for _ in range(MAX_ITERATIONS):
        # 1. Ask LLM what to do
        response = llm.chat_completion(
            messages=messages,
            tools=available_tools  # Function-calling definitions
        )
        message = response.choices[0].message
        messages.append(message)  # Update history state

        # 2. Check if LLM wants to run a tool
        if message.tool_calls:
            for tool_call in message.tool_calls:
                # 3. Decode arguments (JSON string -> dict)
                func_name = tool_call.function.name
                args = json.loads(tool_call.function.arguments)

                # 4. EXECUTE API (the "Acting" part)
                try:
                    api_result = http_client.request(func_name, args)
                except Exception as e:
                    api_result = f"Error executing request: {str(e)}"

                # 5. Feed result back to LLM (the "Observation")
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": json.dumps(api_result)
                })
        else:
            # LLM provided a final text answer
            return message.content

    return "Error: Agent got stuck in a loop (hit MAX_ITERATIONS)."
Phase 3: Error Handling Strategy
The agent must treat errors as data, not exceptions.
Network Level: If the external API is down (Connection Refused), return a system message: "System Error: API Unreachable". The LLM might say "I can't connect right now."
Application Level: If the API returns 400 Bad Request, return the body: {"error": "Invalid ID format"}. The LLM will read this and can self-correct: "Ah, I used the wrong ID format, let me try again..."
5. Security & Safety (Critical)
Human-in-the-Loop (HITL): For dangerous actions (POST/DELETE), the architecture should support an "Approval" state (see the sketch at the end of this section).
LLM: "I want to delete user 101."
System: Pauses. Sends UI prompt to User: "Agent wants to DELETE user 101. Allow?"
User: Clicks "Yes".
System: Resumes loop.
Output Sanitization: APIs can return massive JSON blobs (1MB+). This will crash your LLM context window. The Executor must summarize or truncate data (e.g., "Response too long, first 5 items: [...]").
Authentication Storage: Never store API keys in the prompt. Store them in a secure Vault (AWS Secrets Manager / HashiCorp Vault) and inject them inside the Executor code only at the moment of the request.
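A minimal sketch of the HITL gate wrapped around the Executor from section 2.C (ask_user stands in for whatever UI approval mechanism you build; the method list is an assumption about what counts as dangerous):
Python
DESTRUCTIVE_METHODS = {"POST", "PUT", "PATCH", "DELETE"}  # assumption

def maybe_execute(tool_def: dict, args: dict, ask_user) -> str:
    # Pause the loop and surface the proposed call to the human before acting
    if tool_def["method"].upper() in DESTRUCTIVE_METHODS:
        approved = ask_user(
            f"Agent wants to {tool_def['method']} {tool_def['url_template']} with {args}. Allow?"
        )
        if not approved:
            return "Action cancelled: the user denied approval."
    return execute_tool(tool_def, args)  # Executor sketch from section 2.C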
6. Recommended Tech Stack
LLM Model: OpenAI GPT-4o or Anthropic Claude 3.5 Sonnet (Claude is excellent at tool use and coding).
Backend: Python (FastAPI/LangGraph) or TypeScript (Node.js/LangChain.js).
Orchestration Framework:
LangGraph (Python): Highly recommended for this specific state-machine architecture. It handles the looping and state management natively.
Temporal.io: If you need high reliability for long-running agent tasks.