3-Layer Evaluation Stack for Skills

A complete system for building, deploying, and measuring Claude skills using real captured traffic.

Overview

This stack enables evaluation-driven development for Claude skills:

Build skills from documentation (Skill Seeker)
Expose skills via MCP middleware (Director)
Capture real Claude Code API traffic (CC Trace)
Analyze metrics to improve skills

Result: Data-driven skill optimization instead of guesswork.

Architecture

┌─────────────────────────────────────────────────────────────┐
│ Layer 1: Skill Seeker (Documentation → Skills)              │
│ - Scrapes any documentation website                          │
│ - Generates SKILL.md + reference files                       │
│ - Packages into .zip for Claude                              │
│ Location: /Users/speed/straughter/Skill_Seekers              │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│ Layer 2: Director (MCP Middleware)                          │
│ - Aggregates multiple MCP servers                           │
│ - Exposes unified interface to Claude                       │
│ - Running: http://localhost:3673/full-loop-demo/mcp         │
│ Location: /Users/speed/straughter/Director                  │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│ Layer 3: CC Trace (Traffic Capture)                         │
│ - mitmproxy on port 8080                                     │
│ - Captures all Claude Code API traffic                      │
│ - Stores in ~/claude-flows.mitm                             │
│ Location: /Users/speed/straughter/cc-trace                  │
└─────────────────────────────────────────────────────────────┘

Components

Layer 1: Skill Seeker

Purpose: Convert any documentation into Claude skills

Key Features:

Single-file scraper (doc_scraper.py)
llms.txt support (10x faster)
AI-powered enhancement
MCP integration (9 tools)

Example Usage:

# Scrape docs and build skill
cd /Users/speed/straughter/Skill_Seekers
python3 cli/doc_scraper.py --config configs/obsidian-plugin.json

# Package skill
python3 cli/package_skill.py output/obsidian-plugin/

# Upload to Claude
ANTHROPIC_API_KEY=$(cat ~/.anthropic_api_key) \
python3 cli/upload_skill.py output/obsidian-plugin.zip

Output: obsidian-plugin.zip (6.2 KB) ready for Claude

Layer 2: Director

Purpose: MCP middleware aggregating multiple servers

Key Features:

Proxy pattern for MCP servers
Streamable HTTP transport
tRPC API for management
Studio web interface

Running Instance:

# Check status
curl http://localhost:3673/full-loop-demo/mcp

# Expected output:
{
  "name": "full-loop-demo",
  "servers": {
    "skill-seeker": {
      "status": "CONNECTED",
      "lastHeartbeat": "2025-11-09T14:51:32.338Z",
      "tools": 11
    }
  }
}

MCP Servers Connected:

skill-seeker: 11 tools (generate_config, scrape_docs, package_skill, etc.)

Layer 3: CC Trace

Purpose: Capture Claude Code API traffic for analysis

Key Features:

mitmproxy for HTTPS interception
Binary flow format
Web UI on port 8081
Metric extraction scripts

Setup:

# Start mitmproxy
cd /Users/speed/straughter/cc-trace
mitmdump --listen-port 8080 -w ~/claude-flows.mitm --set confdir=~/.mitmproxy &

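# Start web UI as a read-only viewer of the capture file
# (--no-server avoids a second listener on port 8080 and a second writer to the same file)
mitmweb --no-server --web-port 8081 -r ~/claude-flows.mitm --set confdir=~/.mitmproxy &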

# Launch Claude Code through proxy
proxy_claude &
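
The definition of proxy_claude is not shown in this document. As a rough sketch of what such a launcher needs to set up, the snippet below builds an environment that routes HTTP(S) traffic through mitmproxy on port 8080 and trusts its CA certificate. The function name `proxy_env` is hypothetical, and the use of NODE_EXTRA_CA_CERTS assumes Claude Code runs on Node.js; `mitmproxy-ca-cert.pem` is the default CA file name mitmproxy writes to its config directory.

```python
import os
from pathlib import Path


def proxy_env(port: int = 8080, confdir: str = "~/.mitmproxy") -> dict:
    """Return a copy of the current environment routed through a local proxy."""
    proxy_url = f"http://127.0.0.1:{port}"
    env = dict(os.environ)
    env["HTTP_PROXY"] = proxy_url
    env["HTTPS_PROXY"] = proxy_url
    # Assumption: the client is a Node.js program, which reads additional
    # trusted CA certificates from NODE_EXTRA_CA_CERTS.
    ca_file = Path(confdir).expanduser() / "mitmproxy-ca-cert.pem"
    env["NODE_EXTRA_CA_CERTS"] = str(ca_file)
    return env


# Usage (assumes the Claude Code executable is on PATH as `claude`):
#   import subprocess
#   subprocess.run(["claude"], env=proxy_env())
```

If these variables are missing in the launched process, requests bypass the proxy and nothing is written to ~/claude-flows.mitm.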

Analysis:

# Check capture size
ls -lh ~/claude-flows.mitm

# Count API requests
mitmdump -nr ~/claude-flows.mitm 2>&1 | grep "POST.*messages" | wc -l

# Extract metrics
mitmdump -nr ~/claude-flows.mitm -s /tmp/extract_metrics.py
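
Beyond counting POST requests, it can help to see which hosts account for the captured traffic. The following is a minimal sketch: `count_hosts` is a hypothetical helper that takes a list of request URLs (for example, values of `flow.request.pretty_url` collected in a mitmproxy script) and counts them per hostname. The sample URLs are the two endpoints mentioned in this document.

```python
from collections import Counter
from urllib.parse import urlsplit


def count_hosts(urls):
    """Count captured requests per hostname."""
    return Counter(urlsplit(url).hostname or "unknown" for url in urls)


sample_urls = [
    "https://api.anthropic.com/v1/messages",
    "https://api.anthropic.com/v1/messages",
    "http://localhost:3673/full-loop-demo/mcp",
]
for host, count in count_hosts(sample_urls).most_common():
    print(f"{host}: {count}")
```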

Key Files & Locations

Configuration

~/.anthropic_api_key           # Persistent API key (600 permissions)
~/.claude/shell-snapshots/     # proxy_claude function
~/.mitmproxy/                  # mitmproxy certificates

Repositories

/Users/speed/straughter/Skill_Seekers/   # Layer 1: Skill builder
/Users/speed/straughter/Director/        # Layer 2: MCP middleware
/Users/speed/straughter/cc-trace/        # Layer 3: Traffic capture

Data

~/claude-flows.mitm                      # Captured traffic (binary)
/Users/speed/straughter/Skill_Seekers/output/  # Generated skills

Workflows

1. Create New Skill

cd /Users/speed/straughter/Skill_Seekers

# Option A: Use preset config
python3 cli/doc_scraper.py --config configs/react.json

# Option B: Interactive mode
python3 cli/doc_scraper.py --interactive

# Package and upload
python3 cli/package_skill.py output/react/
ANTHROPIC_API_KEY=$(cat ~/.anthropic_api_key) \
python3 cli/upload_skill.py output/react.zip
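
The three steps above follow the same pattern for every skill name, so they can be generated rather than typed. This is a sketch only: `skill_pipeline` is a hypothetical helper, and it assumes the config, output directory, and zip file all share the skill's name, as they do in the examples in this document.

```python
import shlex


def skill_pipeline(name: str) -> list:
    """Return the scrape, package, and upload commands for one skill."""
    return [
        ["python3", "cli/doc_scraper.py", "--config", f"configs/{name}.json"],
        ["python3", "cli/package_skill.py", f"output/{name}/"],
        ["python3", "cli/upload_skill.py", f"output/{name}.zip"],
    ]


# Print the commands for review before running them.
for cmd in skill_pipeline("react"):
    print(shlex.join(cmd))
```

Each command list can be passed to `subprocess.run(cmd, cwd=..., check=True)` from the Skill_Seekers directory; the upload step additionally needs ANTHROPIC_API_KEY in its environment.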

2. Capture Traffic

# Launch Claude Code through proxy
proxy_claude &

# Use Claude Code normally...

# Analyze captured traffic
ls -lh ~/claude-flows.mitm
mitmdump -nr ~/claude-flows.mitm -s /tmp/extract_metrics.py

3. MCP Integration

# In Claude Code, use natural language:
"List all available configs"
"Generate config for Tailwind at https://tailwindcss.com/docs"
"Package skill at output/react/"
"Upload skill output/react.zip"

Traffic Analysis

Metrics Extractable

Token Usage
- Input tokens per request
- Output tokens per request
- Cache creation/read tokens
Skill References
- System prompt inclusion
- Reference file usage
- Context size with/without skill
API Patterns
- Request frequency
- Response times
- Error rates
Background Traffic
- MCP server connections
- Package manager lookups
- Telemetry calls
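
To illustrate the token usage metrics listed above, the sketch below sums the `usage` objects from several Messages API responses, including the cache fields. `summarize_usage` is a hypothetical helper, and the sample numbers are invented for illustration, not measured results.

```python
USAGE_FIELDS = (
    "input_tokens",
    "output_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
)


def summarize_usage(usages):
    """Sum token counts across the `usage` objects of several responses."""
    totals = {field: 0 for field in USAGE_FIELDS}
    for usage in usages:
        for field in USAGE_FIELDS:
            # Missing or null fields count as zero.
            totals[field] += usage.get(field) or 0
    return totals


# Illustrative values only.
sample = [
    {"input_tokens": 1200, "output_tokens": 300, "cache_read_input_tokens": 800},
    {"input_tokens": 400, "output_tokens": 150, "cache_creation_input_tokens": 900},
]
print(summarize_usage(sample))
```

Running the same summary over a session with a skill loaded and one without gives a first comparison of context size between the two.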

Example Analysis Script

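#!/usr/bin/env python3
"""Extract metrics from mitmproxy capture."""
import json


def response(flow):
    """mitmproxy hook: called once for each completed response."""
    if 'api.anthropic.com/v1/messages' not in flow.request.pretty_url:
        return
    try:
        # get_text() returns the decoded body as a string (or None if empty)
        req = json.loads(flow.request.get_text() or '{}')
        resp = json.loads(flow.response.get_text() or '{}')
    except ValueError:
        # Streamed (text/event-stream) bodies are not a single JSON document
        print(f"Skipped non-JSON body: {flow.request.pretty_url}")
        return

    # Request parameters
    print(f"Model: {req.get('model')}")
    print(f"Max tokens: {req.get('max_tokens')}")

    # Token usage reported in the response
    usage = resp.get('usage', {})
    print(f"Input tokens: {usage.get('input_tokens')}")
    print(f"Output tokens: {usage.get('output_tokens')}")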

Save as /tmp/extract_metrics.py and run:

mitmdump -nr ~/claude-flows.mitm -s /tmp/extract_metrics.py
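
The script above expects each response body to be a single JSON document. Streamed responses arrive as server-sent events instead, so their usage has to be collected from individual events. The sketch below assumes the Messages API streaming layout, where a `message_start` event carries the initial `usage` and `message_delta` events carry updated counts; `usage_from_sse` is a hypothetical helper and the sample body is invented for illustration.

```python
import json


def usage_from_sse(body: str) -> dict:
    """Collect token usage from a streamed Messages API response body."""
    usage = {}
    for line in body.splitlines():
        if not line.startswith("data:"):
            continue
        try:
            event = json.loads(line[len("data:"):].strip())
        except json.JSONDecodeError:
            continue  # ignore payloads that are not JSON
        if not isinstance(event, dict):
            continue
        if event.get("type") == "message_start":
            usage.update(event.get("message", {}).get("usage", {}))
        elif event.get("type") == "message_delta":
            # Later events overwrite earlier counts with updated values.
            usage.update(event.get("usage", {}))
    return usage


# Illustrative body, not captured traffic.
sample_body = "\n".join([
    "event: message_start",
    'data: {"type": "message_start", "message": {"usage": {"input_tokens": 25, "output_tokens": 1}}}',
    "",
    "event: message_delta",
    'data: {"type": "message_delta", "usage": {"output_tokens": 15}}',
])
print(usage_from_sse(sample_body))
```

Inside a mitmproxy script, the body string would come from `flow.response.get_text()`.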

Current Status

Layer 1: Skill Seeker ✅

Status: Operational
Skills Built: obsidian-plugin (6.2 KB)
MCP Server: Connected to Director
Tools Exposed: 11

Layer 2: Director ✅

Status: Running (PID unknown)
Port: 3673
Playbook: full-loop-demo
Connected Servers: skill-seeker

Layer 3: CC Trace ✅

Status: Running
mitmproxy: PID 12745, port 8080
mitmweb: port 8081
Capture File: ~/claude-flows.mitm (359 KB)
Requests Captured: 2 API calls (health checks)

Troubleshooting

API Key Not Found

# Save persistently
echo "sk-ant-api03-..." > ~/.anthropic_api_key
chmod 600 ~/.anthropic_api_key

# Use in commands
ANTHROPIC_API_KEY=$(cat ~/.anthropic_api_key) python3 cli/upload_skill.py skill.zip

No Traffic Captured

# Check proxy is running
lsof -i :8080

# Check environment variables set by proxy_claude
echo $HTTP_PROXY
echo $HTTPS_PROXY

# Verify Claude Code launched through proxy_claude (not direct launch)
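
The `lsof` check can also be done from a script, which is convenient before starting an automated capture run. This is a minimal sketch; `port_is_open` is a hypothetical helper that only confirms something accepts TCP connections on the port, not that it is mitmproxy.

```python
import socket


def port_is_open(host: str = "127.0.0.1", port: int = 8080, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print("proxy listening:", port_is_open())
```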

Director Not Responding

# Check if running
curl http://localhost:3673/full-loop-demo/mcp

# Restart if needed
cd /Users/speed/straughter/Director
# (restart command here - depends on how it was started)

MCP Server Disconnected

# Check server status via Director API
curl http://localhost:3673/full-loop-demo/mcp

# Look for "status": "CONNECTED" and recent lastHeartbeat
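
Checking the status and heartbeat by eye is easy to get wrong, so here is a sketch that does it on the JSON shown under "Running Instance". `stale_servers` is a hypothetical helper, and the 90-second threshold (three missed 30-second heartbeats) is an assumption, not a Director setting.

```python
from datetime import datetime, timedelta, timezone


def stale_servers(status: dict, now: datetime,
                  max_age: timedelta = timedelta(seconds=90)) -> list:
    """Return names of servers that are disconnected or have an old heartbeat."""
    stale = []
    for name, info in status.get("servers", {}).items():
        beat_text = info.get("lastHeartbeat")
        if info.get("status") != "CONNECTED" or not beat_text:
            stale.append(name)
            continue
        # fromisoformat() before Python 3.11 does not accept a trailing "Z".
        beat = datetime.fromisoformat(beat_text.replace("Z", "+00:00"))
        if now - beat > max_age:
            stale.append(name)
    return stale


status = {
    "name": "full-loop-demo",
    "servers": {
        "skill-seeker": {
            "status": "CONNECTED",
            "lastHeartbeat": "2025-11-09T14:51:32.338Z",
            "tools": 11,
        }
    },
}
print(stale_servers(status, datetime.now(timezone.utc)))
```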

API Key Management

Storage: ~/.anthropic_api_key (600 permissions)

Key: sk-ant-api03-...

Usage:

# Load and use
ANTHROPIC_API_KEY=$(cat ~/.anthropic_api_key) command

# Do not rely on export alone (the variable is lost when the session ends)
# Always load from file
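
Scripts that call the API can load the key the same way. The sketch below reads the key file and refuses to use it if the permissions are looser than 600. `load_api_key` is a hypothetical helper, not part of any of the three repositories.

```python
import stat
from pathlib import Path


def load_api_key(path: str = "~/.anthropic_api_key") -> str:
    """Read the API key from a file, refusing files accessible to others."""
    key_file = Path(path).expanduser()
    mode = stat.S_IMODE(key_file.stat().st_mode)
    if mode & 0o077:
        # Any group or other permission bit means the file is too open.
        raise PermissionError(f"{key_file} has mode {mode:o}; expected 600")
    return key_file.read_text(encoding="utf-8").strip()


# Usage:
#   import os
#   os.environ["ANTHROPIC_API_KEY"] = load_api_key()
```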

Performance Metrics

Skill Creation

Scraping: 15-45 minutes (first time)
Building: 1-3 minutes (cached data)
Enhancement: 30-60 seconds (local)
Packaging: 5-10 seconds
Upload: 2-3 seconds

Traffic Capture

Overhead: Minimal (<5% latency)
Storage: ~10-50 KB per request
Analysis: Real-time or batch

MCP Integration

Tool Response: <1 second
Connection: WebSocket (persistent)
Heartbeat: Every 30 seconds

Next Steps

Immediate

Complete traffic capture testing with actual prompts
Extract token usage metrics
Compare skill vs no-skill responses

Short Term

Automate A/B testing (with/without skills)
Build metrics dashboard
Create cost-benefit analysis

Long Term

Multi-skill evaluation
Automated skill optimization
Continuous improvement pipeline

Resources

Documentation

Skill Seeker: /Users/speed/straughter/Skill_Seekers/README.md
Director: /Users/speed/straughter/Director/CLAUDE.md
CC Trace: /Users/speed/straughter/cc-trace/README.md

Phase 2 Docs

PHASE_2_EXECUTION_LOG.md (9.7 KB) - Complete technical log
PHASE_2_FINDINGS.md (9.3 KB) - Results and analysis
PHASE_2_INSTRUCTIONS.md (5.2 KB) - Step-by-step guide
PHASE_2_QUICK_REFERENCE.md (4.6 KB) - Quick reference

Tools

License & Attribution

This system combines:

Skill Seeker (custom-built)
Director by @barnaby (https://github.com/director-run)
mitmproxy (open source)

Built by: straughter + Claude Code
Date: November 9, 2025
Purpose: Evaluation-driven skill development
