@jmanhype
Created November 9, 2025 16:06
3-Layer Evaluation Stack for Claude Skills - Complete system for building, deploying, and measuring skills using real captured traffic

3-Layer Evaluation Stack for Skills

A complete system for building, deploying, and measuring Claude skills using real captured traffic.

Overview

This stack enables evaluation-driven development for Claude skills:

  1. Build skills from documentation (Skill Seeker)
  2. Expose skills via MCP middleware (Director)
  3. Capture real Claude Code API traffic (CC Trace)
  4. Analyze metrics to improve skills

Result: Data-driven skill optimization instead of guesswork.


Architecture

┌─────────────────────────────────────────────────────────────┐
│ Layer 1: Skill Seeker (Documentation → Skills)              │
│ - Scrapes any documentation website                          │
│ - Generates SKILL.md + reference files                       │
│ - Packages into .zip for Claude                              │
│ Location: /Users/speed/straughter/Skill_Seekers              │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│ Layer 2: Director (MCP Middleware)                          │
│ - Aggregates multiple MCP servers                           │
│ - Exposes unified interface to Claude                       │
│ - Running: http://localhost:3673/full-loop-demo/mcp         │
│ Location: /Users/speed/straughter/Director                  │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│ Layer 3: CC Trace (Traffic Capture)                         │
│ - mitmproxy on port 8080                                     │
│ - Captures all Claude Code API traffic                      │
│ - Stores in ~/claude-flows.mitm                             │
│ Location: /Users/speed/straughter/cc-trace                  │
└─────────────────────────────────────────────────────────────┘

Components

Layer 1: Skill Seeker

Purpose: Convert any documentation into Claude skills

Key Features:

  • Single-file scraper (doc_scraper.py)
  • llms.txt support (10x faster)
  • AI-powered enhancement
  • MCP integration (9 tools)

Example Usage:

# Scrape docs and build skill
cd /Users/speed/straughter/Skill_Seekers
python3 cli/doc_scraper.py --config configs/obsidian-plugin.json

# Package skill
python3 cli/package_skill.py output/obsidian-plugin/

# Upload to Claude
ANTHROPIC_API_KEY=$(cat ~/.anthropic_api_key) \
python3 cli/upload_skill.py output/obsidian-plugin.zip

Output: obsidian-plugin.zip (6.2 KB) ready for Claude


Layer 2: Director

Purpose: MCP middleware aggregating multiple servers

Key Features:

  • Proxy pattern for MCP servers
  • Streamable HTTP transport
  • tRPC API for management
  • Studio web interface

Running Instance:

# Check status
curl http://localhost:3673/full-loop-demo/mcp

# Expected output:
{
  "name": "full-loop-demo",
  "servers": {
    "skill-seeker": {
      "status": "CONNECTED",
      "lastHeartbeat": "2025-11-09T14:51:32.338Z",
      "tools": 11
    }
  }
}

MCP Servers Connected:

  • skill-seeker: 11 tools (generate_config, scrape_docs, package_skill, etc.)

Layer 3: CC Trace

Purpose: Capture Claude Code API traffic for analysis

Key Features:

  • mitmproxy for HTTPS interception
  • Binary flow format
  • Web UI on port 8081
  • Metric extraction scripts

Setup:

# Start mitmproxy
cd /Users/speed/straughter/cc-trace
mitmdump --listen-port 8080 -w ~/claude-flows.mitm --set confdir=~/.mitmproxy &

# Browse the capture in the web UI (-n starts no second proxy on port 8080, -r reads the file)
mitmweb --web-port 8081 -n -r ~/claude-flows.mitm --set confdir=~/.mitmproxy &

# Launch Claude Code through proxy
proxy_claude &

Analysis:

# Check capture size
ls -lh ~/claude-flows.mitm

# Count API requests
mitmdump -nr ~/claude-flows.mitm 2>&1 | grep "POST.*messages" | wc -l

# Extract metrics
mitmdump -nr ~/claude-flows.mitm -s /tmp/extract_metrics.py

Key Files & Locations

Configuration

~/.anthropic_api_key           # Persistent API key (600 permissions)
~/.claude/shell-snapshots/     # proxy_claude function
~/.mitmproxy/                  # mitmproxy certificates

Repositories

/Users/speed/straughter/Skill_Seekers/   # Layer 1: Skill builder
/Users/speed/straughter/Director/        # Layer 2: MCP middleware
/Users/speed/straughter/cc-trace/        # Layer 3: Traffic capture

Data

~/claude-flows.mitm                      # Captured traffic (binary)
/Users/speed/straughter/Skill_Seekers/output/  # Generated skills

Workflows

1. Create New Skill

cd /Users/speed/straughter/Skill_Seekers

# Option A: Use preset config
python3 cli/doc_scraper.py --config configs/react.json

# Option B: Interactive mode
python3 cli/doc_scraper.py --interactive

# Package and upload
python3 cli/package_skill.py output/react/
ANTHROPIC_API_KEY=$(cat ~/.anthropic_api_key) \
python3 cli/upload_skill.py output/react.zip
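
The three commands above can also be driven from one script. This is a minimal sketch, not part of Skill Seeker: `skill_commands` and `run_pipeline` are hypothetical helper names, and the sketch assumes the config, output directory, and zip file all share the skill name, as they do in the react example.

```python
import os
import subprocess
from pathlib import Path


def skill_commands(name: str) -> list[list[str]]:
    """Return the scrape, package, and upload commands for one skill."""
    return [
        ["python3", "cli/doc_scraper.py", "--config", f"configs/{name}.json"],
        ["python3", "cli/package_skill.py", f"output/{name}/"],
        ["python3", "cli/upload_skill.py", f"output/{name}.zip"],
    ]


def run_pipeline(name: str, repo: Path, key_file: Path) -> None:
    """Run each step inside the repo, stopping at the first failure."""
    # Assumption: passing the key to every step is harmless; only upload needs it.
    env = dict(os.environ, ANTHROPIC_API_KEY=key_file.read_text().strip())
    for cmd in skill_commands(name):
        subprocess.run(cmd, cwd=repo, env=env, check=True)
```

A call such as `run_pipeline("react", Path("/Users/speed/straughter/Skill_Seekers"), Path.home() / ".anthropic_api_key")` would then mirror Option A followed by packaging and upload.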

2. Capture Traffic

# Launch Claude Code through proxy
proxy_claude &

# Use Claude Code normally...

# Analyze captured traffic
ls -lh ~/claude-flows.mitm
mitmdump -nr ~/claude-flows.mitm -s /tmp/extract_metrics.py

3. MCP Integration

# In Claude Code, use natural language:
"List all available configs"
"Generate config for Tailwind at https://tailwindcss.com/docs"
"Package skill at output/react/"
"Upload skill output/react.zip"

Traffic Analysis

Metrics Extractable

  1. Token Usage

    • Input tokens per request
    • Output tokens per request
    • Cache creation/read tokens
  2. Skill References

    • System prompt inclusion
    • Reference file usage
    • Context size with/without skill
  3. API Patterns

    • Request frequency
    • Response times
    • Error rates
  4. Background Traffic

    • MCP server connections
    • Package manager lookups
    • Telemetry calls
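
For the token usage metrics, per-request counters can be summed across a capture. The sketch below assumes each response's `usage` object has already been collected into a list of dicts. The function name `total_usage` is hypothetical. The two cache field names are the ones the Messages API reports, and missing or null fields are treated as zero.

```python
USAGE_FIELDS = (
    "input_tokens",
    "output_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
)


def total_usage(usages):
    """Sum token counters across usage dicts; missing or null fields count as 0."""
    totals = {field: 0 for field in USAGE_FIELDS}
    for usage in usages:
        for field in USAGE_FIELDS:
            totals[field] += usage.get(field) or 0
    return totals
```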

Example Analysis Script

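#!/usr/bin/env python3
"""Extract metrics from a mitmproxy capture (load with mitmdump -s)."""
import json


def response(flow):
    """mitmproxy hook: called once per flow that has a response."""
    if 'api.anthropic.com/v1/messages' not in flow.request.pretty_url:
        return

    try:
        # Parse request and response bodies as JSON
        req = json.loads(flow.request.get_text() or '')
        resp = json.loads(flow.response.get_text() or '')
    except ValueError:
        # Empty, truncated, or non-JSON body (for example a streamed response)
        return
    if not isinstance(req, dict) or not isinstance(resp, dict):
        return

    print(f"Model: {req.get('model')}")
    print(f"Max tokens: {req.get('max_tokens')}")

    usage = resp.get('usage') or {}
    print(f"Input tokens: {usage.get('input_tokens')}")
    print(f"Output tokens: {usage.get('output_tokens')}")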

Save as /tmp/extract_metrics.py and run:

mitmdump -nr ~/claude-flows.mitm -s /tmp/extract_metrics.py
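
The script above treats each response body as a single JSON document. When a request sets `stream` to true, the Messages API returns server-sent events instead, with usage split between the `message_start` and `message_delta` events. Whether the captured Claude Code traffic is streamed is an assumption to verify against the capture. If it is, a helper along these lines could recover the counters (`usage_from_sse` is a hypothetical name):

```python
import json


def usage_from_sse(body: str) -> dict:
    """Collect usage counters from a streamed Messages API response body."""
    usage = {}
    for line in body.splitlines():
        if not line.startswith("data:"):
            continue  # skip "event:" lines and blank separators
        try:
            event = json.loads(line[len("data:"):].strip())
        except json.JSONDecodeError:
            continue
        if not isinstance(event, dict):
            continue
        if event.get("type") == "message_start":
            # Initial usage (input tokens) sits on the nested message object
            usage.update(event.get("message", {}).get("usage", {}))
        elif event.get("type") == "message_delta":
            # Later deltas carry the running output token count
            usage.update(event.get("usage", {}))
    return usage
```

Inside the `response` hook, this could be called as `usage_from_sse(flow.response.get_text() or "")` when the JSON parse fails.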

Current Status

Layer 1: Skill Seeker ✅

  • Status: Operational
  • Skills Built: obsidian-plugin (6.2 KB)
  • MCP Server: Connected to Director
  • Tools Exposed: 11

Layer 2: Director ✅

  • Status: Running (PID unknown)
  • Port: 3673
  • Playbook: full-loop-demo
  • Connected Servers: skill-seeker

Layer 3: CC Trace ✅

  • Status: Running
  • mitmproxy: PID 12745, port 8080
  • mitmweb: port 8081
  • Capture File: ~/claude-flows.mitm (359 KB)
  • Requests Captured: 2 API calls (health checks)

Troubleshooting

API Key Not Found

# Save persistently
echo "sk-ant-api03-..." > ~/.anthropic_api_key
chmod 600 ~/.anthropic_api_key

# Use in commands
ANTHROPIC_API_KEY=$(cat ~/.anthropic_api_key) python3 cli/upload_skill.py skill.zip

No Traffic Captured

# Check proxy is running
lsof -i :8080

# Check environment variables set by proxy_claude
echo $HTTP_PROXY
echo $HTTPS_PROXY

# Verify Claude Code launched through proxy_claude (not direct launch)
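
The environment check can be scripted as well. This sketch assumes `proxy_claude` sets `HTTP_PROXY` and `HTTPS_PROXY` to full URLs, scheme included, that point at the mitmproxy port. The function name `proxy_problems` is hypothetical.

```python
from urllib.parse import urlsplit


def proxy_problems(env, expected_port=8080):
    """Return a list of problems with the proxy variables in env."""
    problems = []
    for name in ("HTTP_PROXY", "HTTPS_PROXY"):
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
            continue
        # urlsplit only finds the port when the value includes a scheme
        port = urlsplit(value).port
        if port != expected_port:
            problems.append(f"{name} points at port {port}, expected {expected_port}")
    return problems
```

Calling `proxy_problems(os.environ)` from a process started by `proxy_claude` returns an empty list when both variables point at port 8080. Run from a plain shell, it will likely report both variables as unset, which is expected if `proxy_claude` sets them only for the process it launches.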

Director Not Responding

# Check if running
curl http://localhost:3673/full-loop-demo/mcp

# Restart if needed
cd /Users/speed/straughter/Director
# (restart command here - depends on how it was started)

MCP Server Disconnected

# Check server status via Director API
curl http://localhost:3673/full-loop-demo/mcp

# Look for "status": "CONNECTED" and recent lastHeartbeat
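
The same check can be automated once the status JSON has been parsed. This is a sketch under two assumptions: the payload has the shape shown in the expected output earlier, and a heartbeat older than 90 seconds (three missed 30-second heartbeats) counts as stale. Both the threshold and the name `stale_servers` are illustrative.

```python
from datetime import datetime, timedelta, timezone


def stale_servers(status, now=None, max_age=timedelta(seconds=90)):
    """Return names of servers that are not CONNECTED or have an old heartbeat."""
    now = now or datetime.now(timezone.utc)
    bad = []
    for name, info in status.get("servers", {}).items():
        # Older Python versions reject a trailing "Z" in fromisoformat
        beat = datetime.fromisoformat(info["lastHeartbeat"].replace("Z", "+00:00"))
        if info.get("status") != "CONNECTED" or now - beat > max_age:
            bad.append(name)
    return bad
```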

API Key Management

Storage: ~/.anthropic_api_key (600 permissions)

Key: sk-ant-api03-... (redacted; the real value belongs only in the file above)

Usage:

# Load and use
ANTHROPIC_API_KEY=$(cat ~/.anthropic_api_key) command

# Do not rely on export alone (the variable is lost when the shell session ends)
# Always load the key from the file
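
Scripts that need the key can read it from the same file and confirm the 600 permissions before using it. This is a minimal sketch for POSIX systems; `load_api_key` is a hypothetical helper, not part of any of the three tools.

```python
import stat
from pathlib import Path


def load_api_key(path: Path) -> str:
    """Read the key file, refusing files accessible by group or others."""
    mode = stat.S_IMODE(path.stat().st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise PermissionError(f"{path} has mode {mode:o}; expected 600")
    key = path.read_text().strip()
    if not key:
        raise ValueError(f"{path} is empty")
    return key
```

A typical call would be `load_api_key(Path.home() / ".anthropic_api_key")`.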

Performance Metrics

Skill Creation

  • Scraping: 15-45 minutes (first time)
  • Building: 1-3 minutes (cached data)
  • Enhancement: 30-60 seconds (local)
  • Packaging: 5-10 seconds
  • Upload: 2-3 seconds

Traffic Capture

  • Overhead: Minimal (<5% latency)
  • Storage: ~10-50 KB per request
  • Analysis: Real-time or batch

MCP Integration

  • Tool Response: <1 second
  • Connection: WebSocket (persistent)
  • Heartbeat: Every 30 seconds

Next Steps

Immediate

  1. Complete traffic capture testing with actual prompts
  2. Extract token usage metrics
  3. Compare skill vs no-skill responses

Short Term

  1. Automate A/B testing (with/without skills)
  2. Build metrics dashboard
  3. Create cost-benefit analysis
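
As a starting point for the A/B comparison, mean token counts can be compared between the two arms. The sketch assumes usage dicts have already been extracted for runs with and without the skill. `compare_runs` is a hypothetical name, and a difference in means alone says nothing about statistical significance.

```python
from statistics import mean


def compare_runs(with_skill, without_skill, field="output_tokens"):
    """Return the mean of one usage field for each arm and their difference."""
    # mean() raises StatisticsError if either arm is empty
    a = mean(u[field] for u in with_skill)
    b = mean(u[field] for u in without_skill)
    return {"with_skill": a, "without_skill": b, "delta": a - b}
```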

Long Term

  1. Multi-skill evaluation
  2. Automated skill optimization
  3. Continuous improvement pipeline

Resources

Documentation

  • Skill Seeker: /Users/speed/straughter/Skill_Seekers/README.md
  • Director: /Users/speed/straughter/Director/CLAUDE.md
  • CC Trace: /Users/speed/straughter/cc-trace/README.md

Phase 2 Docs

  • PHASE_2_EXECUTION_LOG.md (9.7 KB) - Complete technical log
  • PHASE_2_FINDINGS.md (9.3 KB) - Results and analysis
  • PHASE_2_INSTRUCTIONS.md (5.2 KB) - Step-by-step guide
  • PHASE_2_QUICK_REFERENCE.md (4.6 KB) - Quick reference

Tools


License & Attribution

This system combines:

Built by: straughter + Claude Code
Date: November 9, 2025
Purpose: Evaluation-driven skill development
