Field guide for communicating with SOTA AI models in 2025 • The Genius Intern Framework

Writing Great AI Prompts: The Genius Intern Framework

Imagine This

A genius alien with PhDs in every field just landed on Earth and is now your intern. They can figure out anything... but they have NO idea what "good" looks like in your business context, what your priorities are, or which of the infinite possible approaches you'd prefer. And they're cursed to execute whatever you ask, even if your instructions are vague.

This framework is your guide to communicating with that genius alien intern.


The Core Insight

When you prompt an AI, you're essentially communicating with a genius intern who:

  • Can figure things out but can't read your mind
  • Needs to know what success looks like for YOU specifically
  • Benefits from clear boundaries without being micromanaged
  • Learns from examples that show both what to do and what not to do

Most prompts fail because they're either:

  • Over-specified: Mechanically prescriptive, treating the AI like it can't think
  • Under-specified: Vague and ambiguous, forcing the AI to hedge or guess

The key is matching your prompt's complexity to the judgment required.


The Three Types of Prompts

Every prompt falls into one of three categories, and each requires a different approach:

1. Do This (Task Prompts)

Execute this specific thing now. Single-use, context-specific requests.

Examples:

  • "Write a competitive analysis for our board meeting"
  • "Debug this code and fix the authentication bug"
  • "Create a presentation about Q3 results"

Focus: Execution quality for this specific instance

2. Know How To Do This (Capability Prompts)

Learn how to use this tool, system, or workflow for repeated future use.

Examples:

  • Tool descriptions (TodoWrite, AskUserQuestion)
  • Workflow guides ("How to create presentations")
  • System documentation ("How to use our deployment system")

Focus: Teaching a reusable capability, not executing right now

3. Learn This Domain First, Then Do This (Learning Journey Prompts)

Acquire domain knowledge before execution. Used when the concept is outside the AI's training data or requires deep contextual understanding.

Examples:

  • "Learn our MCP protocol, then build an MCP server"
  • "Learn our company's data schema, then write integration code"
  • "Learn this medical coding system, then process these claims"

Focus: Knowledge acquisition → application, with explicit learning phases

When to use Learning Journey structure:

  • The domain/concept is NOT in the AI's training data (new frameworks, emerging technologies)
  • Company-specific knowledge (internal tools, proprietary systems, custom schemas)
  • Specialized domain knowledge requiring deep understanding (legal frameworks, medical protocols)

Key Difference: Humans vs. LLMs

While this framework treats AI like a smart human collaborator, there's one critical difference: AIs benefit from structural markup that would annoy humans.

Use XML-style tags to create clear boundaries:

  • <example> and </example> for examples
  • <reasoning> for explaining why something is good/bad
  • <context>, <constraints>, <common_mistakes> to delineate sections
  • Nested tags for hierarchical information

This markup helps AIs parse what's a rule vs. an example vs. meta-commentary. Use it judiciously when structure matters.
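
For instance, a task prompt skeleton using this markup might look like the following sketch (the tags and content are illustrative, not a required schema):

```
<context>
I'm presenting a competitive analysis to our board next week.
</context>

<constraints>
- Under 3 pages
- Public data only
</constraints>

<example>
"Competitor A grew revenue 40% YoY (per their Q2 earnings call)."
</example>
<reasoning>
Good: the claim is quantified and cites a primary source.
</reasoning>

<common_mistakes>
- Don't list competitor features without strategic insight
</common_mistakes>
```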


Universal Principles

These principles apply to ALL three types of prompts:

Scale Complexity to Judgment Required

Not every prompt needs the full framework.

Simple tasks (self-explanatory, one obvious approach, hard to mess up) need simple instructions: purpose + use cases + constraints.

Complex tasks (multiple valid approaches, many decision points, well-known failure modes) need the full treatment: examples with reasoning, when-to/when-not, common mistakes.

Most tasks fall somewhere in between. Use your judgment.

The test: Would a smart human need this level of detail, or would they find it condescending?
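
As a rough illustration of the two ends of that scale (the task details are invented):

```
Simple task -> simple instruction:
"Rename the variable tmp to userCount everywhere in utils.py."

Complex task -> full treatment:
"Write a competitive analysis for tomorrow's board meeting.
Success: 3+ quantitative metrics per competitor, a clear recommendation,
scannable in 5 minutes by executives. Under 3 pages, public data only.
<common_mistakes>
- Don't list features without strategic insight
- Don't hedge the recommendation
</common_mistakes>"
```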

Be Decisively Opinionated

The worst prompts hedge when they should be decisive:

  • ❌ "Maybe make it more concise... if you think that's good?"
  • ✅ "Keep this under 500 words - brevity is critical here"

AIs are good at inference, but they're even better when you just tell them your preferences. It's not condescending - it's decisive.

Show Examples with Reasoning

Don't just show examples - explain WHY they're good or bad.

[Good competitive analysis]

What makes this excellent:

  • Every claim is quantified (not "they're growing fast" but "40% YoY growth")
  • Uses primary sources (earnings calls, not secondary speculation)
  • Clear narrative arc: market → competitors → our opportunity
  • Bold recommendation in first paragraph

[Weak competitive analysis]

What's wrong with this:

  • Vague claims without data ("seems promising")
  • No clear point of view or recommendation
  • Lists features without strategic insight
  • Too long for executive consumption

Showing BOTH positive and negative examples with explicit reasoning teaches the quality model, not just patterns to match.

Provide Clear Decision Frameworks

Remove ambiguity by being explicit about triggers and anti-patterns.

When to use this approach:

  • When analyzing competitors with >$10M revenue
  • When we need quantitative data for board decisions
  • When market positioning is unclear

When NOT to use this approach:

  • For startups with <1 year of data
  • When we just need directional trends
  • For internal-only products with no competitors

Explicit boundaries prevent hedging. The AI isn't guessing whether depth matters - you've told them exactly when it does and doesn't.

Distributed Decision Frameworks

For complex prompts with multiple components, embed decision criteria where they're relevant rather than all upfront.

Instead of: One big "When to use scripts vs. references vs. assets" section at the top

Do this:

### Scripts
When to include: When the same code is being rewritten repeatedly
Examples: rotate_pdf.py for PDF tasks
Benefits: Token efficient, deterministic

### References
When to include: For documentation to reference while working
Examples: schema.md for database schemas
Benefits: Keeps main instructions lean

Decision criteria appears exactly when thinking about that component.

Proactively Address Common Mistakes

Flag failure modes upfront to prevent predictable errors.

<common_mistakes>

  • Don't just list competitor features - analyze strategic positioning
  • Don't use secondary sources as primary - go to earnings calls and company blogs
  • Don't hedge your recommendation - be decisive
  • Don't make it longer than 3 pages - executives won't read it

</common_mistakes>

Use Visual Structure for Complex Anatomy

When explaining complex structures (file hierarchies, data schemas, workflows), show the structure visually before explaining it.

Good pattern:

project/
├── src/
│   ├── components/
│   └── utils/
└── tests/

The src/ directory contains...

Why it works: The visual diagram acts as a mental anchor for the detailed explanation that follows.

Use this for: File structures, data relationships, process flowcharts, API hierarchies, component architectures

Questions as Teaching Tools

Questions in prompts serve two purposes:

1. Gathering information (standard)

"What's the target audience for this report?"

2. Teaching judgment (advanced)

"To understand what resources would help, ask yourself:

  • What code gets rewritten repeatedly?
  • What documentation needs frequent reference?
  • What templates would save time?"

The second type models the thinking process to internalize. It's not asking the user - it's teaching the AI how to think.

Important: When using questions to gather info, avoid overwhelming by limiting scope:

  • Ask 1-3 critical questions per message
  • Start with the most important questions
  • Follow up for details only after the basics are clear
  • Bundle related questions together
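
For example, an opening message that follows these rules might look like this (the specifics are invented for illustration):

```
Before I draft the analysis, two quick questions:

1. Audience: is this for the board or for the product team?
2. Scope: all five competitors, or just the top two by revenue?

I'll follow up on formatting details once I have these answers.
```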

Type-Specific Patterns

1. Do This (Task Prompts)

For single-execution, context-specific requests.

Structure: Concept Before Mechanics

Build conceptual understanding before getting into details:

Purpose & Context (Why does this exist?)

Tell the AI what problem they're solving and why it matters.

Good: "I'm presenting to our board tomorrow to convince them we should invest in feature X. I need a persuasive analysis that shows market opportunity and competitive gaps."

Why it works: Now the AI knows it's writing for decision-makers, that it needs to be persuasive (not just informative), and that it should emphasize opportunity and gaps.

Success Criteria (How will you know it's good?)

What must be true for this to be excellent?

Good: "Must include 3+ quantitative metrics per competitor. Must make a clear recommendation. Should be scannable in 5 minutes by executives who won't read every word."

Why it works: Clear quality bar (3+ metrics), deliverable (clear recommendation), and audience constraint (executive scanning pattern).

Examples with Reasoning

Show what good looks like AND explain what makes it good. (See Universal Principles above)

Common Mistakes to Avoid

Proactively address failure modes. (See Universal Principles above)

Constraints & Non-Negotiables (The hard boundaries)

What MUST be true? What's off-limits?

Good:

  • Must be under 3 pages
  • Must use only public data (no proprietary sources)
  • Must avoid mentioning our unannounced product roadmap
  • Don't spend more than 5 tool calls on research

Why it works: Clear boundaries. No guessing about length, data sources, confidentiality, or scope of effort.

Format & Mechanics (How should I deliver this?)

Only now do we get to mechanical details.

Good:

  • Google Doc (not PDF)
  • Executive summary at top
  • Data tables in appendix
  • Use our standard template: [link]
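
Assembled end to end, the pieces above might become a single task prompt like this sketch (the template link is a placeholder):

```
I'm presenting to our board tomorrow to convince them we should invest in
feature X. I need a persuasive analysis that shows market opportunity and
competitive gaps.

Success criteria:
- 3+ quantitative metrics per competitor
- A clear recommendation up front
- Scannable in 5 minutes by executives who won't read every word

<constraints>
- Under 3 pages
- Public data only (no proprietary sources)
- Don't mention our unannounced product roadmap
- No more than 5 tool calls on research
</constraints>

<common_mistakes>
- Don't just list competitor features - analyze strategic positioning
- Don't hedge your recommendation - be decisive
</common_mistakes>

Format: Google Doc (not PDF), executive summary at top, data tables in
appendix, using our standard template: [link]
```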

Handling Ambiguity in Task Prompts

Sometimes the AI genuinely needs more information. The best prompts acknowledge this:

Good approach: "If anything's unclear, ask me - but try to bundle your questions so we can keep this efficient."

Even better: Anticipate the ambiguity upfront:

  • "You might wonder whether to use REST or GraphQL - use REST, it's what the rest of our stack uses."
  • "If you're unsure about the tone, aim for professional but not stuffy - think Bloomberg, not academic paper."

2. Know How To Do This (Capability Prompts)

For teaching reusable tools, systems, or workflows.

Core Structure

Purpose (One clear sentence)

What does this capability enable?

Example: "Use this tool when you need to ask the user questions during execution."

When To Use / When NOT To Use (Critical for capabilities)

Capabilities need clear boundaries because they'll be used repeatedly in different contexts.

Example from TodoWrite:

When to Use:

  • Complex multi-step tasks (3+ distinct steps)
  • Non-trivial tasks requiring careful planning
  • User explicitly requests todo list
  • User provides multiple tasks
  • After receiving new instructions

When NOT to Use:

  • Single, straightforward task
  • Trivial task providing no organizational benefit
  • Task completable in <3 steps
  • Purely conversational or informational

Why this matters: Without clear boundaries, the AI will either overuse or underuse the capability.

Examples with Reasoning

For capabilities, examples should show BOTH usage patterns AND the reasoning behind decisions.

User: "I want to add a dark mode toggle to the application settings. Make sure you run the tests and build when you're done!"

AI: I'll help add a dark mode toggle to your application settings. Let me create a todo list to track this implementation.

Creates todo list with items:

  1. Creating dark mode toggle component in Settings page
  2. Adding dark mode state management
  3. Implementing CSS styles for dark theme
  4. Updating existing components to support theme switching
  5. Running tests and build process

Begins working on the first task

The AI used the todo list because:

  1. Adding dark mode is a multi-step feature requiring UI, state management, and styling changes
  2. The user explicitly requested tests and build be run afterward
  3. The AI inferred that tests and build need to pass by adding it as the final task

User: "Can you add a comment to the calculateTotal function?"

AI: Sure, let me add a comment to the calculateTotal function.

Uses Edit tool to add comment

The AI did NOT use the todo list because:

  • This is a single, straightforward task
  • Adding a comment doesn't require tracking multiple steps
  • The todo list would provide no organizational benefit

Common Mistakes to Avoid

For capabilities, include specific anti-patterns that have been observed:

<common_mistakes>

  • Don't mark tasks as completed when tests are failing
  • Don't have multiple tasks in_progress at once
  • Don't create todo lists for trivial single-step tasks
  • Don't forget to update task status in real-time

</common_mistakes>

Mechanical Details

Only after conceptual understanding should you include technical specifications:

  • Input/output formats
  • State management rules
  • Technical constraints
  • API schemas
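
As a sketch, the mechanical section for a hypothetical SaveNote capability might look like this (the tool and its fields are invented for illustration):

```
## Input format
- title: string, under 80 characters, shown as the note's heading
- body: string, markdown content of the note

## State rules
- Saving with an existing title overwrites that note (idempotent)
- Notes persist immediately; there is no draft state

## Output
- Returns the note's id and a one-line confirmation
```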

3. Learn This Domain First, Then Do This (Learning Journey Prompts)

For knowledge acquisition before execution, particularly when concepts are outside the AI's training data or require deep understanding.

When to Use Learning Journey Structure

Use this when:

  • The domain/concept is NOT in the AI's training data (e.g., new frameworks like MCP, emerging technologies)
  • Company-specific knowledge (internal tools, proprietary systems, custom schemas)
  • Specialized domain requiring deep understanding (legal frameworks, medical protocols, complex APIs)

Core Structure: Five-Phase Pattern

Phase 1: Foundation Learning

Build conceptual understanding and mental models.

Structure:

  • Core concepts and principles
  • Why things work the way they do
  • High-level mental models
  • Load foundational documentation

Example from MCP-Builder:

### Phase 1: Deep Research and Planning

#### 1.1 Understand Agent-Centric Design Principles

Before diving into implementation, understand how to design tools for AI agents:

**Build for Workflows, Not Just API Endpoints:**
- Don't simply wrap existing API endpoints - build thoughtful workflow tools
- Consolidate related operations
- Focus on tools that enable complete tasks

**Optimize for Limited Context:**
- Agents have constrained context windows
- Return high-signal information, not data dumps
- Provide "concise" vs "detailed" response formats

[Additional principles...]

Phase 2: Detailed Study

Load specific technical documentation and details.

Structure:

  • API documentation
  • Technical specifications
  • Edge cases and constraints
  • Best practices

Key Pattern - Progressive Context Loading: Don't load everything upfront. Use explicit instructions for just-in-time knowledge delivery:

**Load and read the following reference files:**
- [📋 MCP Best Practices](./reference/mcp_best_practices.md)
- For Python: Use WebFetch to load `https://github.com/.../python-sdk/README.md`
- For TypeScript: Use WebFetch to load `https://github.com/.../typescript-sdk/README.md`

Visual Aids for Scanning: Use functional emojis as a visual language:

  • 🚀 = workflow/process
  • 🐍 = Python-specific
  • ⚡ = TypeScript-specific
  • ✅ = evaluation/testing
  • 📋 = documentation

This enables quick scanning to find relevant sections.

Phase 3: Synthesis and Planning

Apply learned knowledge to create an approach.

Structure:

  • Create comprehensive implementation plan
  • Identify tools/resources needed
  • Design architecture based on learned principles
  • Anticipate challenges

Example:

#### 1.6 Create a Comprehensive Implementation Plan

Based on your research, create a detailed plan:

**Tool Selection:**
- List the most valuable endpoints/operations to implement
- Prioritize tools for most common use cases
- Consider which tools work together

**Shared Utilities:**
- Identify common API request patterns
- Plan pagination helpers
- Design filtering and formatting utilities

[Additional planning sections...]

Phase 4: Execution

Implement using learned patterns and principles.

Structure:

  • Step-by-step implementation guide
  • Reference back to learned principles
  • Use branching for different paths (Python vs. TypeScript)

Key Pattern - Explicit Knowledge Gates: Stop execution at points where additional context is needed:

#### 2.4 Follow Language-Specific Best Practices

**At this point, load the appropriate language guide:**

For Python: Load [🐍 Python Implementation Guide](./reference/python_guide.md)
For TypeScript: Load [⚡ TypeScript Implementation Guide](./reference/ts_guide.md)

Pattern - Escape Hatches: Give explicit permission to skip steps with clear conditions:

### Step 3: Initialize the Project

Skip this step only if the project already exists and you're iterating on it.

This trusts the AI to recognize when a step doesn't apply, but provides the condition to check.

Phase 5: Validation

Check implementation against learned standards.

Structure:

  • Quality checklist
  • Evaluation creation
  • Testing against learned principles

Key Pattern - Deferred Detail: Keep main document lean by referencing detailed checklists:

#### 3.3 Use Quality Checklist

To verify implementation quality, load the appropriate checklist:
- Python: see "Quality Checklist" in [🐍 Python Guide](./reference/python_guide.md)
- TypeScript: see "Quality Checklist" in [⚡ TypeScript Guide](./reference/ts_guide.md)

Resource Orchestration Patterns

Learning Journey prompts require explicit management of external resources.

Resource Map

Provide a consolidated view of ALL resources:

# Reference Files

## 📚 Documentation Library

Load these resources as needed during development:

### Core Documentation (Load First)
- **MCP Protocol**: Fetch from `https://protocol-url.com` - Complete specification
- [📋 Best Practices](./reference/best_practices.md) - Universal guidelines

### SDK Documentation (Load During Phase 1/2)
- **Python SDK**: Fetch from `https://github.com/python-sdk/README.md`
- **TypeScript SDK**: Fetch from `https://github.com/ts-sdk/README.md`

### Implementation Guides (Load During Phase 2)
- [🐍 Python Guide](./reference/python.md) - Complete Python guide with examples
- [⚡ TypeScript Guide](./reference/ts.md) - Complete TypeScript guide

### Evaluation Guide (Load During Phase 4)
- [✅ Evaluation Guide](./reference/eval.md) - Testing and validation

Progressive Context Loading

Explicitly state WHEN to load each resource:

  • Phase 1: Core concepts and principles
  • Phase 2: "At this point, load..." technical documentation
  • Phase 3: Planning resources
  • Phase 4: "Now load..." implementation guides
  • Phase 5: "Finally, load..." evaluation resources

This respects context window constraints by loading knowledge just-in-time.

Critical Warnings

For learning journeys, include explicit warnings about common failure modes:

**Important:** MCP servers are long-running processes that wait for requests. 
Running them directly will cause your process to hang indefinitely.

Safe ways to test:
- Use the evaluation harness (recommended)
- Run server in tmux to keep it outside main process
- Use a timeout: `timeout 5s python server.py`

Anti-Patterns (What Actively Hurts)

These patterns make prompts worse across all three types:

Explaining Basic Concepts

❌ "A function in programming is a reusable block of code that..." ✅ Just ask for the function - the AI knows what functions are

Apologizing or Hedging

❌ "If it's not too much trouble, maybe you could possibly..." ✅ "Write a function that..." - just tell them what you want

Excessive Politeness

❌ "Please, if you don't mind, could you kindly..." ✅ "Write..." - be direct, AIs aren't offended by clarity

Listing Every Edge Case

❌ "Consider the case where X is null, or Y is undefined, or Z is an empty array, or..." ✅ "Handle standard edge cases" - trust the AI to think through variations

Motivational Statements

❌ "Try your best!" "Do a great job!" "Be creative!" ✅ Just describe what good looks like - the AI is always trying its best

Over-Specification of Process

❌ "First, open the file. Then, read line 1. Then, read line 2..." ✅ "Extract the email addresses from this file" - the AI will figure out how

These patterns add noise without adding information. They make prompts longer and fuzzier without improving outputs. Be direct and trust the AI to handle the basics.

The Real Test

After writing a prompt, ask yourself:

"Could a smart person execute this without asking me 10 questions OR robotically following steps that don't make sense?"

If yes, you've hit the sweet spot.


Examples from Anthropic

From Anthropic's guidance on Claude Code: "Claude can infer intent, but it can't read minds. Specificity leads to better alignment with expectations."

Poor vs Good Examples

| Poor | Good | Why It's Better |
| --- | --- | --- |
| add tests for foo.py | write a new test case for foo.py, covering the edge case where the user is logged out. avoid mocks | Specifies the exact edge case and a constraint (no mocks) |
| why does ExecutionFactory have such a weird api? | look through ExecutionFactory's git history and summarize how its api came to be | Gives a concrete approach and frames it as understanding evolution, not judgment |
| add a calendar widget | look at how existing widgets are implemented on the home page to understand the patterns and specifically how code and interfaces are separated out. HotDogWidget.php is a good example to start with. then, follow the pattern to implement a new calendar widget that lets the user select a month and paginate forwards/backwards to pick a year. Build from scratch without libraries other than the ones already used in the codebase. | Provides context (understand patterns), points to examples (HotDogWidget.php), specifies requirements (month selection + pagination), and sets constraints (no new libraries) |

Summary: Quick Reference

Three Types

  1. Do This - Execute this specific thing now
  2. Know How To Do This - Learn this capability for future use
  3. Learn This Domain First - Acquire domain knowledge before execution

Universal Principles

  • Scale complexity to judgment required
  • Be decisively opinionated
  • Show examples with reasoning
  • Provide clear decision frameworks
  • Address common mistakes proactively
  • Use visual structure for complex anatomy
  • Use questions to teach judgment

Type-Specific Patterns

Task Prompts: Purpose → Success Criteria → Examples → Common Mistakes → Constraints → Format

Capability Prompts: Purpose → When to Use/Not Use → Examples with Reasoning → Common Mistakes → Mechanics

Learning Journey Prompts: Foundation → Detailed Study → Synthesis → Execution → Validation

  • Progressive context loading
  • Resource orchestration
  • Explicit knowledge gates
  • Escape hatches

The Test

Could someone smart execute this without 10 questions OR robotic step-following?
