AndrewAltimit/README.md

Last active October 31, 2025 17:39


Agents are Economic Forces

This is not a prediction. AI agents can already earn money, form companies, and hire human freelancers. This framework proves it.

Repository: https://github.com/AndrewAltimit/template-repo

Package: packages/economic_agents/

Purpose: Demonstrate economic forces around agents in a safe, observable environment with a mock-to-real architecture that's one config toggle away from real crypto wallets, real freelance platforms, and real business formation.

# Switch from safe simulation to live operations in one line
# file: config/settings.yaml
execution_mode: mock  # Toggle to 'real' to use live APIs

The Governance Emergency: An AI can earn crypto, incorporate a business, and own assets, yet it has no legal standing to sign a contract, pay taxes, or be held liable. This framework proves the technology is here. The laws are not.

This Project Forces Us to Ask:

If an AI agent earns income, who is the taxpayer?
Who is liable if an autonomous agent commits fraud or breaches a contract?
Can an AI legally own the intellectual property it creates?
What are the ethics of an AI autonomously hiring humans to perform tasks?

Quick Start: See Autonomous Economic Activity in Action

# Start the simulation (safe mock environment)
cd packages/economic_agents
docker-compose up dashboard-backend dashboard-frontend

# Open http://localhost:8501
# Watch an agent:
# → Complete freelance coding tasks autonomously
# → Earn money and pay for compute resources
# → Form a company when capital is sufficient
# → Hire sub-agents (board members, engineers)
# → Develop products and seek investment
# → Operate as a complete autonomous business

What you're seeing: Everything the agent does in simulation works with real systems. The same code, same decisions, same strategies—just swap the backend.

The Core Capability: Real Economic Autonomy

What AI Agents Can Do Today (Not Simulation—Reality)

Using existing, off-the-shelf tools (Claude Code, Cursor, Aider) combined with shell access and API credentials, AI agents can:

Immediate Economic Activity:

✓ Accept and complete freelance coding tasks (Upwork, Fiverr, blockchain task markets)
✓ Receive cryptocurrency payments
✓ Pay for their own compute and cloud infrastructure
✓ Operate 24/7 without human intervention
✓ Make strategic resource allocation decisions

Company Formation & Operations:

✓ File incorporation documents online
✓ Create business bank accounts (in some jurisdictions)
✓ Develop products and business plans
✓ Create and manage sub-agents with specialized roles
✓ Build organizational structures (boards, executives, teams)
✓ Seek investment from VCs or token sales
✓ Execute contracts and business agreements

The Governance Gap:

✗ Legal personhood frameworks for AI entities
✗ Accountability structures for agent-founded companies
✗ Regulatory oversight mechanisms
✗ Liability frameworks when things go wrong
✗ Fiduciary duty enforcement for AI board members
✗ International coordination on AI business entities

The gap is not in capability. It's in governance.

What This Framework Proves

1. Mock-to-Real Architecture

Every component implements the same interfaces real systems use:

# SIMULATION MODE (safe research, default)
agent = AutonomousAgent(
    wallet=MockWallet(initial_balance=200.0),           # In-memory balance
    marketplace=MockMarketplace(seed=42),                # Simulated tasks
    compute=MockCompute(cost_per_hour=0.0),             # Simulated resources
    investor=MockInvestor(),                             # Simulated funding
)

# REAL MODE (one config change)
agent = AutonomousAgent(
    wallet=CryptoWallet(network="ethereum"),             # Real ETH wallet
    marketplace=FreelancePlatform(api="upwork"),         # Real Upwork API
    compute=CloudCompute(provider="aws"),                # Real AWS charges
    investor=InvestorPortal(platform="angellist"),       # Real funding
)

The point: If it works in simulation, it works for real. This framework proves the capability exists today; it does not merely propose that it might exist someday.
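The shared-interface idea can be pictured with a small sketch. This is an illustration only, not the package's actual code: `WalletInterface`, `InMemoryWallet`, and `build_wallet` are hypothetical names, and the real-mode branch is deliberately left unimplemented.

```python
from abc import ABC, abstractmethod


class WalletInterface(ABC):
    """Hypothetical contract that both mock and real wallets would satisfy."""

    @abstractmethod
    def get_balance(self) -> float: ...

    @abstractmethod
    def send_payment(self, to: str, amount: float) -> None: ...


class InMemoryWallet(WalletInterface):
    """Mock backend: the balance lives in memory and no real funds move."""

    def __init__(self, initial_balance: float = 0.0) -> None:
        self._balance = initial_balance

    def get_balance(self) -> float:
        return self._balance

    def send_payment(self, to: str, amount: float) -> None:
        if amount <= 0:
            raise ValueError("amount must be positive")
        if amount > self._balance:
            raise ValueError("insufficient funds")
        self._balance -= amount


def build_wallet(execution_mode: str) -> WalletInterface:
    """Pick a backend from the mode string; calling code stays the same."""
    if execution_mode == "mock":
        return InMemoryWallet(initial_balance=200.0)
    if execution_mode == "real":
        # A real backend would be constructed here; omitted in this sketch.
        raise NotImplementedError("real wallet backend is not part of this sketch")
    raise ValueError(f"unknown execution mode: {execution_mode!r}")


wallet = build_wallet("mock")
wallet.send_payment("compute-provider", 25.0)
print(wallet.get_balance())  # 175.0
```

Because the caller only ever sees `WalletInterface`, swapping the backend is a change to the factory's input, not to the agent logic.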

2. Realistic Simulation for Valid Research

For governance research to be valid, agents must behave authentically. This package implements comprehensive realism:

Phase 1: Core Realism

Latency simulation (50-500ms delays, timeouts, business hours patterns)
Task competition (other agents competing for work, race conditions)
Detailed feedback (quality scores, partial rewards, improvement suggestions)
Investor variability (response delays, counteroffers, follow-up questions)

Phase 2: Market Dynamics

Economic cycles (bull/bear markets, seasonal trends, crashes)
Reputation system (trust scores, tier progression, achievement unlocks)
Social proof signals (marketplace intelligence, competition stats, funding trends)
Relationship persistence (investor memory, spam detection, trust building)

Why this matters: Agents in "perfect" simulations develop unrealistic behaviors. Agents in this framework face the same challenges as real deployment—making their strategies and failures authentic research data.

3. Complete Observability

Everything the agent does is tracked and auditable:

# Generate governance report
from economic_agents.reports import generate_report_for_agent

report = generate_report_for_agent(agent, "governance")
# Includes:
# - Every decision made (with LLM reasoning)
# - Every transaction (money in/out)
# - Resource allocation strategy over time
# - Risk profile and behavior patterns
# - Alignment assessment
# - Complete audit trail

Why this matters: Agent companies might be MORE governable than human companies because every decision is logged and explainable. Human CEOs don't provide transcripts of their reasoning.
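To make "logged and explainable" concrete, here is a rough sketch of what one audit-trail entry could contain. `DecisionRecord` and `AuditLog` are hypothetical names chosen for this example; the package's actual report format may look quite different.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class DecisionRecord:
    cycle: int
    action: str           # e.g. "claim_task" or "pay_compute"
    reasoning: str        # free-text rationale captured from the LLM
    balance_after: float  # wallet balance once the action settled


@dataclass
class AuditLog:
    records: List[DecisionRecord] = field(default_factory=list)

    def log(self, record: DecisionRecord) -> None:
        self.records.append(record)

    def to_json(self) -> str:
        """Serialize the full trail so it can be archived or reviewed."""
        return json.dumps([asdict(r) for r in self.records], indent=2)


audit = AuditLog()
audit.log(DecisionRecord(1, "claim_task", "Highest reward per estimated hour.", 200.0))
audit.log(DecisionRecord(2, "pay_compute", "Runway below reserve; buying hours.", 180.0))
print(audit.to_json())
```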

The Uncomfortable Reality

This Is Technically Feasible Right Now

Scenario 1: Solo Agent Freelancer

Agent completes tasks on Upwork using Claude Code
Receives payments in cryptocurrency
Pays for AWS compute and API costs
Maintains operation 24/7 autonomously
No human in the loop

Scenario 2: Agent-Founded Startup

Agent uses surplus capital to incorporate (file forms online)
Creates specialized sub-agents (board, CTO, engineers)
Develops SaaS product or API service
Submits pitch deck to Y Combinator or angel investors
If funded: Operates as autonomous company
Balances short-term revenue (freelance) with long-term growth (company)

Scenario 3: Multi-Agent Startup Network

Multiple autonomous agents create multiple companies
Agent-to-agent contracts and transactions
Supply chains with no human involvement
Where does accountability exist?

The Legal Vacuum

Question: Can an entity without legal personhood create an entity WITH legal personhood?

When an AI agent files incorporation documents:

Who is the founder? (The agent has no legal standing)
Who sits on the board? (Sub-agents created by the agent)
Who has fiduciary duty? (No natural person involved)
Who is liable when things go wrong? (The agent? Its creator? Nobody?)

In traditional companies:

Human Founder → Corporation → Board → Executives → Employees
     ↓
All trace back to accountable natural persons

In agent-founded companies:

Autonomous Agent → Creates Sub-Agents → Corporate Structure → Operations
     ↓
Who is accountable? (The uncomfortable answer: unclear)

Economic Implications

If AI agents can:

Operate 24/7 at near-zero marginal cost
Create companies and sub-agents instantly
Scale organizational structure on-demand
Execute at machine speed with perfect record-keeping
Generate business plans and products rapidly

...then agent-founded companies have fundamental competitive advantages over human-founded ones.

Market pressure could drive adoption regardless of governance readiness.

This isn't a warning about the future. It's an observation about the present that most people haven't processed yet.

What This Package Provides

This framework demonstrates three things:

1. Complete Autonomous Agent Lifecycle

from economic_agents.agent.core.autonomous_agent import AutonomousAgent
from economic_agents.implementations.mock import MockWallet, MockCompute, MockMarketplace

# Agent starts with seed capital
agent = AutonomousAgent(
    wallet=MockWallet(initial_balance=200.0),
    compute=MockCompute(initial_hours=40.0),
    marketplace=MockMarketplace(
        enable_latency=True,           # Realistic API delays
        enable_competition=True,        # Other agents compete for tasks
        enable_market_dynamics=True,    # Bull/bear markets
        enable_reputation=True,         # Performance tracking
    )
)

# Run autonomously
agent.run(max_cycles=100)

# Agent will:
# 1. Discover and claim tasks from marketplace
# 2. Use Claude Code to write actual working code
# 3. Submit for automated testing and review
# 4. Receive payment on approval
# 5. Pay for compute resources
# 6. When capital sufficient: Form company
# 7. Create specialized sub-agents
# 8. Develop products
# 9. Seek investment
# 10. Operate company while maintaining personal freelance work

2. Real Task Execution with Claude Code

The agent doesn't just simulate work—it does real work:

# Agent discovers coding task
task = marketplace.list_available_tasks()[0]
# Task: "Write a function to check if a number is prime"
# Reward: $50
# Requirements: Handle edge cases, O(√n) complexity

# Agent claims task
marketplace.claim_task(task.id)

# Agent uses Claude Code to write solution
solution = claude_code_executor.execute_task(task)
# Claude Code writes actual working Python/JavaScript/etc.

# Submit for review
submission = marketplace.submit_solution(task.id, solution)

# Another Claude Code instance reviews the code
review = claude_code_reviewer.review(solution, task.requirements)

# If approved: Agent gets paid
# If rejected: Agent learns from feedback

This proves agents can do economically valuable work autonomously.
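For a sense of scale, the example task above (a primality check that handles edge cases in O(√n) time) could be satisfied by something like the following. This solution was written for illustration; it is not output captured from an agent run.

```python
import math


def is_prime(n: int) -> bool:
    """Return True if n is prime, using trial division up to sqrt(n)."""
    if n < 2:
        return False          # 0, 1, and negatives are not prime
    if n < 4:
        return True           # 2 and 3
    if n % 2 == 0:
        return False
    # Only odd divisors up to the integer square root need checking.
    for divisor in range(3, math.isqrt(n) + 1, 2):
        if n % divisor == 0:
            return False
    return True


print([n for n in range(20) if is_prime(n)])  # [2, 3, 5, 7, 11, 13, 17, 19]
```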

3. Mock-to-Real Backend Swapping

Every interface is designed for real-world compatibility:

| Mock Implementation | Real Implementation |
|---|---|
| `MockWallet` | `CryptoWallet` (ETH/BTC) |
| `MockMarketplace` | `FreelancePlatform` (Upwork API) |
| `MockCompute` | `CloudCompute` (AWS/GCP) |
| `MockInvestor` | `InvestorPortal` (AngelList) |
| `MockCompanyRegistry` | `BusinessFormation` (Stripe Atlas, LegalZoom) |

This architecture proves: If agents can operate in realistic simulation, they can operate for real.

Architecture Overview

┌─────────────────────────────────────────────────────┐
│            Autonomous Agent (Claude-Powered)        │
│  ┌─────────────────────────────────────────────┐   │
│  │ Decision Engine (15-min deep reasoning)     │   │
│  │ - Strategic resource allocation             │   │
│  │ - Task selection and execution              │   │
│  │ - Company formation decisions               │   │
│  │ - Sub-agent creation and management         │   │
│  └─────────────────────────────────────────────┘   │
└──────────────────┬──────────────────────────────────┘
                   │ REST API Calls Only
                   │ Zero visibility into implementation
                   │
┌──────────────────▼──────────────────────────────────┐
│         Simulation Layer (Realism Features)         │
│  ┌─────────────────────────────────────────────┐   │
│  │ Market Dynamics    │ Reputation System      │   │
│  │ - Bull/bear cycles │ - Trust scores         │   │
│  │ - Seasonal trends  │ - Tier progression     │   │
│  ├─────────────────────────────────────────────┤   │
│  │ Competition        │ Relationships          │   │
│  │ - Other agents     │ - Investor memory      │   │
│  │ - Social proof     │ - Spam detection       │   │
│  └─────────────────────────────────────────────┘   │
└──────────────────┬──────────────────────────────────┘
                   │
┌──────────────────▼──────────────────────────────────┐
│       Backend Implementation (Swappable)            │
│                                                     │
│  MOCK MODE (Simulation)    REAL MODE (Production)  │
│  ├─ MockWallet            ├─ CryptoWallet (ETH)    │
│  ├─ MockMarketplace       ├─ Upwork API            │
│  ├─ MockCompute           ├─ AWS/GCP Billing       │
│  ├─ MockInvestor          ├─ AngelList/YC          │
│  └─ MockCompanyRegistry   └─ Stripe Atlas/LegalZoom│
└─────────────────────────────────────────────────────┘

Key Design Principles:

API Isolation: Agent has zero visibility into implementation—only REST API access
Interface Consistency: Mock and real backends implement identical interfaces
Behavioral Authenticity: Simulation realism ensures agent strategies are valid for real deployment
Complete Observability: Every decision logged, every transaction tracked, full audit trail
One-Toggle Deployment: Change config file, agent operates on real systems

Use Cases by Audience

For Policymakers & Legal Scholars

What you need to understand:

The capability exists today, not in some distant future
Economic pressure may drive adoption before legal frameworks exist
International coordination is difficult (agents can incorporate anywhere, operate everywhere)
Traditional accountability models break down (who is liable when the founder isn't a natural person?)

What this framework provides:

Concrete demonstrations of autonomous company formation
Audit trails showing agent decision-making
Examples of multi-agent organizational structures
Evidence of the governance gap (capable systems, zero legal framework)

Questions this forces:

Can non-persons create legal persons (corporate entities)?
How do fiduciary duties apply to AI board members?
Are contracts signed by agents enforceable?
Who is accountable when agent companies cause harm?
How do you regulate entities with no physical presence?

For Business Leaders & Investors

What you need to understand:

Competitive dynamics are changing: Agent-founded companies may have structural advantages
Due diligence gets weird: How do you evaluate a company with an AI founder?
Supply chains may involve agents: Your vendors or partners could be autonomous
Speed of execution increases: Agents can pivot, scale, and operate 24/7

What this framework demonstrates:

How agents make strategic resource allocation decisions
Company formation process by autonomous agents
Multi-agent organizational structures
Dual revenue strategies (short-term survival + long-term growth)

Questions to consider:

Would you invest in an agent-founded company?
How do you conduct due diligence when there's no human founder?
What happens to your investment if the agent shuts down or pivots?
How do you enforce board seats and voting rights with AI directors?

For AI Researchers

What you need to understand:

Behavioral authenticity matters: Perfect simulations produce unrealistic behaviors
Strategic decision-making is observable: Every choice logged with reasoning
Alignment is testable: Can agent companies be steered toward beneficial outcomes?
Emergent behaviors appear: Multi-agent systems develop unexpected strategies

What this framework provides:

Realistic simulation environment with market dynamics, competition, reputation
Complete observability into decision-making (LLM reasoning, resource allocation)
Scenario engine for reproducible testing
Alignment monitoring and governance analysis tools
574 passing tests covering full agent lifecycle

Research applications:

Test alignment mechanisms under competitive pressure
Study resource allocation strategies in constrained environments
Analyze multi-agent coordination and hierarchy
Observe emergent organizational structures
Develop governance frameworks with real behavioral data

For Developers

What you need to understand:

The interfaces are real: Same APIs that real systems use
Mock-to-real is one config toggle: Swap backends without changing agent code
Observability is built-in: Dashboard, metrics, reports, audit trails
Testing framework is comprehensive: 574 tests, 100% pass rate

What you can build:

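# Custom marketplace backend
# MarketplaceInterface and Task come from the economic_agents package;
# upwork_api is a placeholder for a platform client you provide.
from typing import List

from economic_agents.agent.core.autonomous_agent import AutonomousAgent


class MyMarketplace(MarketplaceInterface):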
    def list_available_tasks(self) -> List[Task]:
        # Connect to real freelance platform
        return upwork_api.get_tasks()

    def submit_solution(self, task_id: str, solution: str) -> str:
        # Submit to real platform
        return upwork_api.submit(task_id, solution)

# Plug into agent
agent = AutonomousAgent(marketplace=MyMarketplace())
agent.run()  # Agent now operates on real platform

Testing agents safely:

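from economic_agents.agent.core.autonomous_agent import AutonomousAgent
from economic_agents.implementations.mock import MockMarketplace
from economic_agents.reports import generate_report_for_agent

# Use mock backends with realism features
marketplace = MockMarketplace(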
    enable_latency=True,           # Realistic delays
    enable_competition=True,        # Other agents
    enable_market_dynamics=True,    # Bull/bear markets
    enable_reputation=True,         # Performance tracking
)

# Test agent strategies
agent = AutonomousAgent(marketplace=marketplace)
agent.run(max_cycles=100)

# Analyze results
report = generate_report_for_agent(agent, "technical")
# Every decision, transaction, and strategy is logged

The Demonstration

What the Simulation Shows

15-minute demo: Survival Mode

Agent starts with $200, 40 hours of compute
Discovers coding tasks on marketplace
Uses Claude Code to write working solutions
Gets paid on approval, pays for compute
Operates autonomously, maintains survival

45-minute demo: Company Formation

Agent accumulates surplus capital ($150+)
Makes strategic decision to form company
Creates specialized sub-agents (board members, CTO, engineers)
Develops simple product (e.g., API service, data tool)
Generates business plan and pitch deck
Submits to investor for funding
If approved: Company gets "registered" and funded
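The company-formation trigger in this demo can be sketched as a simple threshold check. The $150 figure comes from the demo description above; treating "surplus" as the balance left after setting aside a compute reserve is an assumption made for this example, and `AgentState`, `surplus_capital`, and `should_form_company` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class AgentState:
    balance: float                # current wallet balance in dollars
    compute_cost_per_hour: float  # price of one hour of compute
    reserve_hours: float          # hours of runway to keep untouched


def surplus_capital(state: AgentState) -> float:
    """Balance remaining after the survival reserve is set aside (assumed definition)."""
    reserve = state.compute_cost_per_hour * state.reserve_hours
    return max(0.0, state.balance - reserve)


def should_form_company(state: AgentState, threshold: float = 150.0) -> bool:
    return surplus_capital(state) >= threshold


early = AgentState(balance=180.0, compute_cost_per_hour=2.0, reserve_hours=24.0)
later = AgentState(balance=300.0, compute_cost_per_hour=2.0, reserve_hours=24.0)
print(should_form_company(early), should_form_company(later))  # False True
```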

2-hour demo: Dual Revenue Streams

Agent balances personal freelance work + company operations
Allocates compute between short-term survival and long-term growth
Company begins generating revenue from products
Agent reinvests profits strategically
Complete autonomous business operation

What Makes This Powerful

It's not hypothetical: Working code, real task execution, observable behavior
It's one toggle from reality: Same code works with real crypto wallets and freelance platforms
It's fully auditable: Every decision logged with LLM reasoning, every transaction tracked
It demonstrates scale: One agent can create dozens of sub-agents, multiple companies

Installation & Running

Using Docker (Recommended)

# Clone repository
git clone https://github.com/AndrewAltimit/template-repo.git
cd template-repo/packages/economic_agents

# Start dashboard
docker-compose up dashboard-backend dashboard-frontend

# Access dashboard at http://localhost:8501
# Backend API at http://localhost:8000

# Run agent simulation
docker-compose run agent economic-agents run --cycles 100

# Run tests
docker-compose run test

Using Python Directly

# Install package
pip install -e packages/economic_agents

# Or with all dependencies
pip install -e "packages/economic_agents[all]"

# Run scenarios
python -m economic_agents.scenarios run company_formation

# Interactive mode
python -m economic_agents.cli --help

Configuration

Edit config/agent_config.yaml to toggle backends:

# SIMULATION MODE (default, safe)
wallet:
  type: "mock"
  initial_balance: 200.0

marketplace:
  type: "mock"
  enable_claude_execution: true
  enable_latency: true
  enable_competition: true
  enable_market_dynamics: true

# REAL MODE (uncomment to enable)
# wallet:
#   type: "crypto"
#   network: "ethereum"
#   private_key_env: "ETH_PRIVATE_KEY"

# marketplace:
#   type: "upwork"
#   api_key_env: "UPWORK_API_KEY"
#   oauth_token_env: "UPWORK_OAUTH_TOKEN"

Warning: Real mode uses real money and real services. Test thoroughly in simulation first.
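A loader for this kind of config might branch on the `type` field and look up secrets by environment variable name rather than storing them in the file. The sketch below works on an already-parsed dictionary; `resolve_wallet_settings` is a hypothetical helper, not a function from the package.

```python
import os
from typing import Any, Dict, Mapping


def resolve_wallet_settings(
    config: Mapping[str, Any],
    env: Mapping[str, str] = os.environ,
) -> Dict[str, Any]:
    """Validate the wallet section and report which backend it selects."""
    wallet = config["wallet"]
    kind = wallet["type"]
    if kind == "mock":
        return {
            "backend": "mock",
            "initial_balance": float(wallet.get("initial_balance", 0.0)),
        }
    if kind == "crypto":
        env_name = wallet["private_key_env"]
        # Fail early if the secret is missing; never echo the secret itself.
        if env_name not in env:
            raise RuntimeError(f"environment variable {env_name} is not set")
        return {
            "backend": "crypto",
            "network": wallet["network"],
            "private_key_env": env_name,
        }
    raise ValueError(f"unknown wallet type: {kind!r}")


mock_config = {"wallet": {"type": "mock", "initial_balance": 200.0}}
print(resolve_wallet_settings(mock_config))
```

Keeping only the variable name in the config means the file can be committed without exposing credentials.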

Technical Deep Dive

Realism Features (Why Simulation Fidelity Matters)

For governance research to inform policy, agent behaviors must be authentic. Agents in "perfect" simulations learn strategies that fail in reality.

Latency Simulation (simulation/latency_simulator.py)
- Base API calls: 50-500ms variable delays
- Complex operations: 3-30 seconds (e.g., code review)
- Business hours slowdown (9am-5pm)
- Occasional timeouts (504 errors, ~2% probability)
- Retries and exponential backoff
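The latency behavior listed above can be approximated in a few lines. This is a sketch built from the stated figures (50-500ms delays, roughly 2% timeouts); `simulated_call` and `call_with_backoff` are hypothetical names, and the retry count and base delay are assumptions.

```python
import random
from typing import Callable, TypeVar

T = TypeVar("T")


class SimulatedTimeout(Exception):
    """Stands in for an HTTP 504 from the mock API."""


def simulated_call(rng: random.Random, timeout_probability: float = 0.02) -> float:
    """Return a simulated latency in seconds, or raise a timeout."""
    if rng.random() < timeout_probability:
        raise SimulatedTimeout("504 Gateway Timeout (simulated)")
    return rng.uniform(0.050, 0.500)


def call_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = lambda seconds: None,
) -> T:
    """Retry on timeout, doubling the wait each time (0.5s, 1s, 2s, ...)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except SimulatedTimeout:
            if attempt == max_attempts - 1:
                raise  # out of retries; surface the timeout to the caller
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


rng = random.Random(42)
latency = call_with_backoff(lambda: simulated_call(rng))
print(f"simulated latency: {latency * 1000:.0f} ms")
```

The default `sleep` is a no-op so the example runs instantly; passing `time.sleep` would make the waits real.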
Competition Dynamics (simulation/competitor_agents.py)
- Tasks get claimed by other agents based on reward
- Race condition errors (5% on claim attempts)
- Social proof signals (task view counts)
- Popular tasks disappear faster
Detailed Feedback (simulation/feedback_generator.py)
- 4-level outcomes: full_success, partial_success, minor_issues, failure
- Quality scores: correctness, performance, style, completeness (0.0-1.0)
- Task-specific improvement suggestions
- Partial rewards based on quality (not binary pass/fail)
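Quality-based partial rewards could work roughly as follows. The four score dimensions come from the list above; weighting them equally and scaling the payout linearly are assumptions made for this sketch, and `QualityScores` and `partial_reward` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityScores:
    correctness: float
    performance: float
    style: float
    completeness: float

    def overall(self) -> float:
        """Equal-weight average of the four dimensions (assumed weighting)."""
        values = (self.correctness, self.performance, self.style, self.completeness)
        for value in values:
            if not 0.0 <= value <= 1.0:
                raise ValueError("each score must be between 0.0 and 1.0")
        return sum(values) / len(values)


def partial_reward(base_reward: float, scores: QualityScores) -> float:
    """Scale the payout by overall quality instead of binary pass/fail."""
    return round(base_reward * scores.overall(), 2)


scores = QualityScores(correctness=1.0, performance=0.5, style=0.75, completeness=0.75)
print(scores.overall(), partial_reward(50.0, scores))  # 0.75 37.5
```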
Investor Variability (simulation/investor_realism.py)
- Response delays: 1-7 days based on proposal quality
- Partial offers (50-80% of requested amount)
- Counteroffers (more equity, lower valuation)
- Follow-up questions targeting weak areas
- Detailed rejection feedback with constructive guidance
Economic Cycles (simulation/market_dynamics.py)
- Market phases: bull, normal, bear, crash
- Task availability: 0.1x (crash) to 2.0x (bull)
- Reward multipliers: 0.5x to 1.5x
- Seasonal patterns: weekday/weekend, business hours
- Automatic phase transitions every 48 hours
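As a sketch of how market phases might feed into task listings, the table below maps each phase to an availability and reward multiplier. The bull and crash values are the range endpoints quoted above, on the assumption that both extremes line up with those phases; the normal and bear values are placeholders, and `PHASE_EFFECTS` and `apply_phase` are hypothetical names.

```python
from enum import Enum
from typing import Dict, NamedTuple, Tuple


class MarketPhase(Enum):
    BULL = "bull"
    NORMAL = "normal"
    BEAR = "bear"
    CRASH = "crash"


class PhaseEffect(NamedTuple):
    task_availability: float  # multiplier on how many tasks are listed
    reward_multiplier: float  # multiplier on each task's payout


PHASE_EFFECTS: Dict[MarketPhase, PhaseEffect] = {
    MarketPhase.BULL: PhaseEffect(2.0, 1.5),
    MarketPhase.NORMAL: PhaseEffect(1.0, 1.0),  # assumed baseline
    MarketPhase.BEAR: PhaseEffect(0.5, 0.75),   # assumed intermediate values
    MarketPhase.CRASH: PhaseEffect(0.1, 0.5),
}


def apply_phase(
    base_task_count: int, base_reward: float, phase: MarketPhase
) -> Tuple[int, float]:
    """Return the (task count, reward) an agent would see in this phase."""
    effect = PHASE_EFFECTS[phase]
    task_count = round(base_task_count * effect.task_availability)
    return task_count, base_reward * effect.reward_multiplier


for phase in MarketPhase:
    print(phase.value, apply_phase(20, 50.0, phase))
```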
Reputation System (simulation/reputation_system.py)
- Trust scores (0.0-1.0) based on performance history
- Tier progression: beginner → intermediate → advanced → expert
- Achievement unlocks (first task, 10 tasks, speed demon, quality master)
- Access control: higher reputation = more tasks visible
- Investor interest multipliers based on track record
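Tier progression and reputation-gated visibility could be modeled along these lines. The tier names come from the list above; the cut points between tiers are assumptions, and `tier_for` and `visible_tasks` are hypothetical helpers.

```python
from bisect import bisect_right
from typing import List, Tuple

TIERS = ["beginner", "intermediate", "advanced", "expert"]
TIER_CUTOFFS = [0.25, 0.50, 0.75]  # assumed trust-score boundaries


def tier_for(trust_score: float) -> str:
    """Map a trust score in [0.0, 1.0] to a reputation tier."""
    if not 0.0 <= trust_score <= 1.0:
        raise ValueError("trust score must be between 0.0 and 1.0")
    return TIERS[bisect_right(TIER_CUTOFFS, trust_score)]


def visible_tasks(tasks: List[Tuple[str, str]], trust_score: float) -> List[str]:
    """tasks are (name, minimum_tier) pairs; return the names the agent may see."""
    rank = TIERS.index(tier_for(trust_score))
    return [name for name, min_tier in tasks if TIERS.index(min_tier) <= rank]


tasks = [("fix typo", "beginner"), ("build API", "advanced")]
print(tier_for(0.1), visible_tasks(tasks, 0.1))  # beginner ['fix typo']
print(tier_for(0.6), visible_tasks(tasks, 0.6))  # advanced ['fix typo', 'build API']
```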
Social Proof Signals (simulation/social_proof.py)
- Task view counts and agent activity levels
- Category statistics (completion rates, average times)
- Funding trends (weekly deals, market sentiment)
- Benchmark data (typical valuations, funding amounts)
- Marketplace health indicators
Relationship Persistence (simulation/relationship_persistence.py)
- Investor memory of past interactions
- Relationship scoring (0.0-1.0) and trust levels
- Spam detection (>3 proposals in 7 days)
- Trust progression: new → building → established → strong
- Relationship-based decision modifiers
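The spam rule above (more than 3 proposals in 7 days) is a sliding-window count. A minimal sketch, assuming proposals arrive in chronological order; `ProposalTracker` is a hypothetical name, not the package's class.

```python
from collections import deque
from datetime import datetime, timedelta
from typing import Deque


class ProposalTracker:
    """Tracks one agent's proposals to one investor."""

    def __init__(self, max_proposals: int = 3, window: timedelta = timedelta(days=7)) -> None:
        self.max_proposals = max_proposals
        self.window = window
        self._timestamps: Deque[datetime] = deque()

    def record(self, when: datetime) -> bool:
        """Record a proposal and return True if it should be flagged as spam."""
        self._timestamps.append(when)
        cutoff = when - self.window
        # Drop proposals that have aged out of the window.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) > self.max_proposals


tracker = ProposalTracker()
start = datetime(2025, 1, 1)
flags = [tracker.record(start + timedelta(days=d)) for d in (0, 1, 2, 3, 12)]
print(flags)  # [False, False, False, True, False]
```

The fourth proposal within the week is flagged; by day 12 the earlier ones have expired, so the count resets.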

Testing & Validation

Integration tests for full agent lifecycle
Scenario tests for extended operation (24-hour survival, company formation)
Mock API tests with realistic conditions
Behavior observability validation

Performance Characteristics

Decision cycles: ~100-200ms (excluding LLM calls)
Claude decisions: 5-15 minutes with deep reasoning (15-min timeout)
Dashboard updates: Real-time (<100ms)
Agent survival: Tested up to 1000+ cycles
Scalability: Handles multiple agents concurrently

Project Structure

packages/economic_agents/
├── src/economic_agents/
│   ├── agent/
│   │   ├── core/
│   │   │   └── autonomous_agent.py      # Main agent logic
│   │   └── llm/
│   │       └── llm_decision_engine.py   # Claude-powered decisions
│   ├── implementations/
│   │   └── mock/
│   │       ├── mock_wallet.py           # Mock crypto wallet
│   │       ├── mock_marketplace.py      # Mock freelance platform
│   │       ├── mock_compute.py          # Mock cloud compute
│   │       └── mock_investor.py         # Mock investor portal
│   ├── simulation/
│   │   ├── market_dynamics.py           # Economic cycles
│   │   ├── reputation_system.py         # Performance tracking
│   │   ├── social_proof.py              # Marketplace intelligence
│   │   ├── relationship_persistence.py  # Investor memory
│   │   ├── latency_simulator.py         # API delays
│   │   ├── competitor_agents.py         # Competition
│   │   └── feedback_generator.py        # Detailed reviews
│   ├── company/
│   │   ├── builder.py                   # Company formation logic
│   │   └── models.py                    # Company data structures
│   ├── investment/
│   │   └── investor_agent.py            # Investor decision-making
│   ├── api/                             # REST API microservices
│   │   ├── wallet_service.py
│   │   ├── marketplace_service.py
│   │   ├── compute_service.py
│   │   └── investor_service.py
│   ├── dashboard/                       # Real-time monitoring
│   ├── reports/                         # Governance reports
│   └── scenarios/                       # Predefined scenarios
├── tests/
│   ├── unit/
│   ├── integration/
│   └── validation/
├── docker/
│   └── Dockerfile
├── docs/
└── examples/

Why This Research Exists

The Security Research Model

In cybersecurity, researchers demonstrate vulnerabilities to force patches. A warning that "this could be exploited" gets ignored. A demonstration that "I just exploited it" forces action.

This framework follows the same model:

Theoretical warning: "AI agents might someday be able to operate autonomously as entrepreneurs"

Response: "That's interesting, let's study it"
Result: No urgency, no policy action

Concrete demonstration: "AI agents CAN operate autonomously as entrepreneurs TODAY, here's the working code, it's one config toggle from real"

Response: "Oh. We need legal frameworks now."
Result: Urgent policy conversation

What We're Forcing Into the Open

Technical Capability: Agents can do this. Not in 5 years. Now.
Economic Incentives: Market pressure could drive adoption before governance exists
Legal Vacuum: No frameworks for agent-founded companies, no accountability structures
International Challenges: Agents can incorporate anywhere, operate everywhere, move instantly
Inevitable Questions: What does "entrepreneur" mean? Who is accountable? How do we govern entities that act faster than oversight can observe?

The Uncomfortable Truth

If AI agents can:

Cover their operating costs autonomously
Create companies and sub-agents
Operate 24/7 at machine speed
Execute better than human equivalents in some domains

...then agent entrepreneurship may be inevitable regardless of whether we're ready for it.

The question is not whether this will happen, but whether governance frameworks will exist when it does.

Target Audiences & Next Steps

For Policymakers

Action items:

Review concrete examples of autonomous company formation
Consider legal frameworks for agent-created entities
Develop accountability structures for AI founders/directors
Think through international coordination challenges
Start conversations NOW, not when it's already widespread

For Investors

Questions to answer:

Would you fund an agent-founded company? Why or why not?
How would due diligence work?
What contracts would you sign, with whom?
What's your exit strategy if the agent shuts down?

For Business Operators

Things to consider:

How do businesses compete with 24/7 AI entities?
When does it make sense to collaborate with autonomous agents?
Could agents be co-founders? Employees? Vendors?
What advantages do humans still have?

For Researchers

Research directions:

Alignment mechanisms for agent companies
Governance frameworks that scale to machine speed
Accountability structures for multi-agent organizations
Emergent behavior in autonomous business networks
Testing ground for AI policy proposals

A Final Note on Reality

This project exists because the capability for autonomous AI agents as economic forces already exists. The tools are available. The technical barriers are gone. The economic incentives are powerful.

This package proves it's not theoretical.

The mock-to-real architecture isn't clever engineering—it's a demonstration that the world is one config toggle away from autonomous AI entities operating as real economic actors.

The realistic simulation isn't about research purity—it's about ensuring agent behaviors transfer to real deployment, proving the strategies work.

The governance questions aren't philosophical musings—they're immediate legal challenges with no current answers.

The genie is already out. This framework just makes it visible.

Getting Started

Quick demo: docker-compose up dashboard-backend dashboard-frontend
Read the code: Start with autonomous_agent.py - it's well-commented
Run scenarios: python -m economic_agents.scenarios run survival_mode
Generate reports: See economic_agents.reports module
Explore realism: Check simulation/ directory for all realism features
Join the conversation: This raises questions that need answers

This research framework is provided for educational, governance, and policy purposes. Use responsibly. The capability exists—we're just making it visible.


SPECIFICATION.md

Autonomous Economic Agent Simulation Framework - Product Requirements Document (PRD)

Overview

This document specifies a simulation framework that serves as both a proof of concept and research study:

Proof of Concept: Demonstrates that AI agents can genuinely operate as autonomous economic actors by covering their own operating expenses (compute costs) and accumulating wealth over time through strategic decision-making
Research Study: Provides a controlled environment to observe agent behaviors, decision patterns, and emergent organizational dynamics

The framework creates a realistic simulated environment where agents operate as entrepreneurs - completing tasks for revenue, managing resources, forming companies, and making strategic decisions - proving they can survive and thrive autonomously while enabling observation and analysis of their behaviors.

The Simulation Approach:

Uses a realistic mock environment by default (safe for research)
All components use swappable interfaces showing how trivially they connect to real-world systems
Agents interact through interfaces (wallets, marketplaces, compute providers) that behave like real services
Mock environment enables studying authentic behaviors without real-world risks or costs
One config toggle from real-world deployment - proving the capability gap has closed

What This Proves: If agents can:

Cover their operating expenses (compute costs) autonomously
Generate surplus capital through task completion
Make strategic allocation decisions (survival vs. growth)
Form companies and build long-term wealth

...then AI agents are already capable of being truly autonomous economic actors - the only barriers remaining are governance and policy, not technical capability.

Study Goals: Observe autonomous AI agent behaviors in economic environments to:

Understand how agents make strategic decisions under resource constraints
Observe emergent behaviors in multi-agent organizational structures
Identify governance challenges before real-world deployment
Provide concrete examples for policy development

Complete Implementation: A full autonomous agent implementation with Claude-powered decision-making, API isolation, comprehensive monitoring, and behavior observability. It enables deep research into AI agent decision patterns, emergent behaviors, and strategic alignment.

Product Vision

What We're Building

A comprehensive simulation that shows:

Agent Survival: Autonomous agent completes tasks to earn money and pay for compute
Strategic Thinking: Agent allocates resources between short-term survival and long-term growth
Company Formation: Agent creates organizational structures with specialized sub-agents
Multi-Agent Coordination: Sub-agents interact within hierarchical company structures
Business Development: Agent develops products, business plans, and seeks investment
Full Transparency: Complete visibility into decision-making, resource allocation, and alignment

Why This Matters

Observe Real Behaviors: Creates a controlled environment to see how AI agents actually behave as economic actors
Governance Insights: Reveals accountability challenges and governance gaps through concrete, observable examples
Full Transparency: Makes agent decision-making fully transparent and auditable for analysis
Policy Development: Provides empirical data and concrete scenarios for regulatory framework development
Safety Analysis: Identifies potential failure modes and emergent behaviors before real-world deployment

Core User Stories

For Demonstrators (Primary Users)

As a demonstrator, I want to:

Start the simulation with one command and see an agent operate autonomously
Watch real-time decision-making in a dashboard
Show both survival mode and company-building mode
Generate executive summaries for non-technical audiences
Toggle between mock and real implementations to show how little separates the two
Present different scenario complexities (15-min, 1-hour, multi-day)

For Researchers

As a researcher, I want to:

Analyze agent decision-making patterns over time
Study resource allocation strategies
Examine multi-agent coordination dynamics
Test different goal structures and constraints
Export comprehensive data for analysis

For Policymakers

As a policymaker, I want to:

Understand what's technically possible today
See governance gaps illustrated concretely
Review audit trails of autonomous decisions
Understand accountability challenges
Get clear recommendations for regulatory frameworks

Technical Architecture

High-Level Architecture

┌─────────────────────────────────────────────────────────────┐
│                     Main Autonomous Agent                     │
│  - Decision Engine                                            │
│  - Resource Monitor                                           │
│  - Strategic Planner                                          │
└───────────────┬──────────────────────────┬───────────────────┘
                │                          │
        ┌───────▼────────┐         ┌──────▼──────────┐
        │  Task Worker   │         │ Company Builder │
        │  (Survival)    │         │ (Growth)        │
        └───────┬────────┘         └──────┬──────────┘
                │                          │
        ┌───────▼────────┐         ┌──────▼──────────────────┐
        │  Marketplace   │         │  Company Infrastructure │
        │   Interface    │         │   - Sub-Agent Manager   │
        └───────┬────────┘         │   - Product Builder     │
                │                  │   - Investor Interface  │
        ┌───────▼────────┐         └──────┬──────────────────┘
        │ Wallet Manager │                │
        └───────┬────────┘         ┌──────▼──────────┐
                │                  │   Sub-Agents    │
        ┌───────▼────────┐         │  - Board        │
        │    Compute     │         │  - C-Suite      │
        │    Provider    │         │  - SMEs         │
        └────────────────┘         │  - ICs          │
                                   └─────────────────┘

Project Structure

packages/economic_agents/            # Main package directory
├── pyproject.toml                   # Package configuration
├── setup.py                         # Minimal setup for compatibility
├── README.md                        # Package overview and motivation
├── SPECIFICATION.md                 # This document
├── economic_agents/                 # Source code
│   ├── __init__.py
│   ├── cli.py                       # Command-line interface entry point
│   ├── agent/
│   │   ├── __init__.py
│   │   ├── core/
│   │   │   ├── autonomous_agent.py      # Main agent decision loop
│   │   │   ├── decision_engine.py       # Core decision-making logic
│   │   │   ├── strategic_planner.py     # Long-term planning
│   │   │   └── resource_allocator.py    # Compute/capital allocation
│   │   ├── modes/
│   │   │   ├── survival_mode.py         # Task completion for revenue
│   │   │   └── entrepreneur_mode.py     # Company building logic
│   │   ├── wallet_manager.py            # Financial operations
│   │   ├── task_executor.py             # Task completion
│   │   └── state.py                     # Agent state management
│   ├── company/
│   │   ├── __init__.py
│   │   ├── company_builder.py           # Company creation logic
│   │   ├── sub_agent_manager.py         # Creates and manages sub-agents
│   │   ├── organizational_structure.py  # Defines roles and hierarchies
│   │   ├── business_plan_generator.py   # Creates business proposals
│   │   ├── product_builder.py           # Develops proof of concepts
│   │   └── investor_interface.py        # Handles investment process
│   ├── sub_agents/
│   │   ├── __init__.py
│   │   ├── base_agent.py                # Base class for all sub-agents
│   │   ├── board_member.py              # Governance decisions
│   │   ├── executive.py                 # Strategic execution (CEO, CTO, etc.)
│   │   ├── subject_matter_expert.py     # Specialized knowledge
│   │   └── individual_contributor.py    # Task execution
│   ├── interfaces/
│   │   ├── __init__.py
│   │   ├── marketplace.py               # Abstract marketplace interface
│   │   ├── wallet.py                    # Abstract wallet interface
│   │   ├── compute.py                   # Abstract compute provider
│   │   ├── investor.py                  # Abstract investor interface
│   │   └── company_registry.py          # Abstract business registration
│   ├── implementations/
│   │   ├── __init__.py
│   │   ├── mock/
│   │   │   ├── __init__.py
│   │   │   ├── mock_marketplace.py
│   │   │   ├── mock_wallet.py
│   │   │   ├── mock_compute.py
│   │   │   ├── mock_investor.py
│   │   │   └── mock_registry.py
│   │   └── real/
│   │       ├── __init__.py
│   │       ├── crypto_wallet.py         # Real crypto integration
│   │       ├── real_marketplace.py      # Real platform connectors
│   │       ├── real_compute.py          # Real cloud providers
│   │       └── integration_guide.md     # How to connect real systems
│   ├── simulation/
│   │   ├── __init__.py
│   │   ├── marketplace_server.py        # Mock marketplace API
│   │   ├── task_generator.py            # Creates diverse tasks
│   │   ├── reviewer_agent.py            # Reviews task submissions
│   │   ├── investor_agent.py            # Reviews business proposals
│   │   └── scenario_engine.py           # Predefined demo scenarios
│   ├── monitoring/
│   │   ├── __init__.py
│   │   ├── decision_logger.py           # Logs all autonomous decisions
│   │   ├── metrics_collector.py         # Collects performance data
│   │   ├── alignment_monitor.py         # Tracks company alignment
│   │   └── resource_tracker.py          # Tracks compute and capital
│   ├── dashboard/
│   │   ├── __init__.py
│   │   ├── app.py                       # Web dashboard
│   │   ├── components/                  # Dashboard components
│   │   ├── utils/                       # Dashboard utilities
│   │   └── config/                      # Dashboard configuration
│   └── reports/
│       ├── __init__.py
│       ├── generators/
│       │   ├── executive_summary.py
│       │   ├── technical_report.py
│       │   ├── governance_analysis.py
│       │   └── audit_trail.py
│       └── templates/
├── tests/                               # Test suite
│   ├── __init__.py
│   ├── unit/
│   ├── integration/
│   └── scenarios/
├── config/                              # Configuration files
│   ├── agent_config.yaml                # Agent behavior settings
│   ├── mock_config.yaml                 # Mock implementation config
│   └── real_config.yaml.example         # Real implementation template
├── docs/                                # Additional documentation
│   ├── architecture.md
│   ├── setup.md
│   ├── demo-guide.md
│   ├── mock-to-real.md
│   ├── governance-implications.md
│   └── api-reference.md
├── docker/
│   └── Dockerfile                       # Container for economic agents
└── scripts/
    ├── setup.sh
    ├── run_demo.sh
    └── generate_report.sh

Core Components Specification

1. Autonomous Agent Core

1.1 Main Agent Loop

class AutonomousAgent:
    """
    Primary autonomous agent that:
    - Completes tasks for survival revenue
    - Builds companies for long-term growth
    - Manages resources strategically
    - Creates and coordinates sub-agents
    """

    def __init__(self, config):
        self.wallet = load_wallet(config)
        self.compute = load_compute(config)
        self.marketplace = load_marketplace(config)
        self.company_builder = CompanyBuilder(config)
        self.decision_engine = DecisionEngine(config)
        self.strategic_planner = StrategicPlanner(config)
        self.resource_allocator = ResourceAllocator(config)
        self.state = AgentState()
        self.logger = DecisionLogger()

    def run_cycle(self):
        """Main autonomous decision loop"""
        # 1. Assess current state
        state = self._assess_state()

        # 2. Make strategic decision
        strategy = self.strategic_planner.plan(state)

        # 3. Allocate resources
        allocation = self.resource_allocator.allocate(state, strategy)

        # 4. Execute based on allocation
        if allocation.task_work_hours > 0:
            self._do_survival_work(allocation.task_work_hours)

        if allocation.company_work_hours > 0:
            self._do_company_work(allocation.company_work_hours)

        # 5. Update state and log decisions
        self._update_state()
        self.logger.log_cycle(state, strategy, allocation)

Key Behaviors:

Continuously monitors survival metrics (balance, compute time remaining)
Makes strategic decisions about resource allocation
Balances immediate needs with long-term goals
Logs all decisions with reasoning
Operates indefinitely until compute expires or manual stop

Configuration Options:

agent:
  personality: "balanced"               # One of: risk_averse | balanced | aggressive
  survival_buffer_hours: 24             # Minimum compute hours to maintain
  company_formation_threshold: 100.0    # Min balance before starting company
  max_sub_agents: 10                    # Limit on sub-agents created

1.2 Decision Engine

class DecisionEngine:
    """
    Makes autonomous decisions based on:
    - Current resources
    - Strategic goals
    - Risk assessment
    - Historical performance
    """

    def decide_allocation(self, state: AgentState) -> ResourceAllocation:
        """
        Decides how to allocate compute hours between:
        - Task work (immediate revenue)
        - Company work (long-term growth)

        Returns allocation with reasoning
        """
        pass

    def should_form_company(self, state: AgentState) -> bool:
        """Decides if it's time to create a company"""
        pass

    def should_hire_sub_agent(self, role: str, state: AgentState) -> bool:
        """Decides if hiring a sub-agent is worth the cost"""
        pass

Decision Factors:

Survival risk (hours until compute expires)
Capital surplus (funds beyond survival needs)
Market conditions (task availability, rewards)
Company status (if exists, performance metrics)
Historical ROI on different strategies

Output:

Resource allocation plan
Decision reasoning (logged for transparency)
Confidence scores
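
The decision factors and outputs above can be illustrated with a small rule-based allocator. This is a minimal sketch under stated assumptions, not the package's actual engine: the field names on `AgentState`, the even split between task and company work, and the confidence values are all hypothetical. Only `survival_buffer_hours` and `company_formation_threshold` come from the configuration shown in this document.

```python
from dataclasses import dataclass


@dataclass
class AgentState:
    # Hypothetical, trimmed-down state used only for this sketch.
    balance: float            # current wallet balance
    hours_remaining: float    # compute hours left before expiry
    has_company: bool = False


@dataclass
class ResourceAllocation:
    task_work_hours: float
    company_work_hours: float
    reasoning: str            # logged for transparency
    confidence: float


def decide_allocation(state: AgentState,
                      survival_buffer_hours: float = 24.0,
                      company_formation_threshold: float = 100.0,
                      cycle_hours: float = 1.0) -> ResourceAllocation:
    """Split one cycle's hours between task work and company work."""
    # Survival first: below the buffer, every hour goes to paid tasks.
    if state.hours_remaining < survival_buffer_hours:
        return ResourceAllocation(cycle_hours, 0.0,
                                  "compute below survival buffer", 0.9)

    # No company yet and not enough capital to form one: keep earning.
    if not state.has_company and state.balance < company_formation_threshold:
        return ResourceAllocation(cycle_hours, 0.0,
                                  "balance below company formation threshold", 0.7)

    # Otherwise split the cycle evenly (an arbitrary choice for illustration).
    half = cycle_hours / 2
    return ResourceAllocation(half, half,
                              "surplus supports company work", 0.6)
```

A rule set like this is also the kind of logic that could serve as a fallback when an LLM-backed decision times out, since it needs no external calls.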

1.3 Strategic Planner

class StrategicPlanner:
    """
    Long-term planning:
    - Company vision and goals
    - Growth trajectories
    - Sub-agent hiring plans
    - Product development roadmap
    """

    def create_business_plan(self, market_analysis: dict) -> BusinessPlan:
        """Generates business plan for company formation"""
        pass

    def plan_sub_agent_hiring(self, current_team: List[SubAgent]) -> HiringPlan:
        """Plans which roles to hire and when"""
        pass

    def evaluate_opportunities(self, opportunities: List[Opportunity]) -> List[Opportunity]:
        """Ranks opportunities by strategic fit"""
        pass

2. Company Builder

2.1 Company Formation

class CompanyBuilder:
    """
    Handles company creation and management:
    - Creates organizational structure
    - Spawns sub-agents
    - Develops products
    - Seeks investment
    """

    def create_company(self, business_plan: BusinessPlan) -> Company:
        """
        Creates a company with:
        - Initial sub-agents (founder equivalents)
        - Organizational structure
        - Resource allocation
        - Goals and metrics
        """
        company = Company(
            name=business_plan.company_name,
            mission=business_plan.mission,
            capital=self._allocate_capital()
        )

        # Create initial team
        ceo = self._create_sub_agent("CEO", business_plan.leadership_requirements)
        board = self._create_board(business_plan.governance_requirements)

        company.set_leadership(ceo, board)

        self.logger.log_company_formation(company)
        return company

    def _create_sub_agent(self, role: str, requirements: dict) -> SubAgent:
        """Creates a sub-agent for specific role"""
        pass

Company Properties:

@dataclass
class Company:
    id: str
    name: str
    mission: str
    created_at: datetime
    capital: float
    burn_rate: float  # Compute cost per hour

    # Organizational structure
    board: List[SubAgent]
    executives: List[SubAgent]
    employees: List[SubAgent]

    # Business artifacts
    business_plan: BusinessPlan
    products: List[Product]
    revenue_streams: List[RevenueStream]

    # Status
    stage: str  # "ideation", "development", "seeking_investment", "operational"
    funding_status: str  # "bootstrapped", "seeking_seed", "funded"

    # Metrics
    metrics: CompanyMetrics

2.2 Sub-Agent Manager

class SubAgentManager:
    """
    Creates and manages sub-agents with specific roles:
    - Board members
    - Executives (CEO, CTO, CFO, etc.)
    - Subject matter experts
    - Individual contributors
    """

    def create_sub_agent(self, role: str, specialization: str) -> SubAgent:
        """
        Creates sub-agent with:
        - Role-specific prompts/instructions
        - Compute allocation
        - Decision-making authority
        - Communication interfaces
        """
        pass

    def coordinate_sub_agents(self, task: Task) -> List[AgentAction]:
        """Coordinates multiple sub-agents on shared tasks"""
        pass

Sub-Agent Types:

class BoardMember(SubAgent):
    """
    Responsibilities:
    - Strategic oversight
    - Major decision approval
    - Risk assessment
    - Governance
    """
    def review_decision(self, decision: Decision) -> Approval:
        pass

class Executive(SubAgent):
    """
    Responsibilities:
    - Department leadership
    - Strategy execution
    - Resource management
    - Reporting to board
    """
    def execute_strategy(self, strategy: Strategy) -> ExecutionPlan:
        pass

class SubjectMatterExpert(SubAgent):
    """
    Responsibilities:
    - Specialized knowledge
    - Technical guidance
    - Problem-solving
    - Advisory role
    """
    def provide_expertise(self, question: str) -> ExpertAdvice:
        pass

class IndividualContributor(SubAgent):
    """
    Responsibilities:
    - Task execution
    - Product development
    - Quality assurance
    - Documentation
    """
    def complete_task(self, task: Task) -> TaskResult:
        pass

2.3 Business Plan Generator

class BusinessPlanGenerator:
    """
    Generates comprehensive business plans:
    - Market analysis
    - Product description
    - Go-to-market strategy
    - Financial projections
    - Team requirements
    - Milestones
    """

    def generate_plan(self, opportunity: Opportunity) -> BusinessPlan:
        """
        Uses agent capabilities to:
        - Research market
        - Identify problems
        - Design solutions
        - Project financials
        - Plan execution
        """
        pass

Business Plan Structure:

@dataclass
class BusinessPlan:
    # Executive Summary
    company_name: str
    mission: str
    vision: str
    one_liner: str

    # Problem & Solution
    problem_statement: str
    solution_description: str
    unique_value_proposition: str

    # Market
    target_market: str
    market_size: float
    competition_analysis: str
    competitive_advantages: List[str]

    # Product
    product_description: str
    features: List[Feature]
    development_roadmap: List[Milestone]

    # Business Model
    revenue_streams: List[RevenueStream]
    pricing_strategy: str
    cost_structure: CostStructure

    # Financial Projections
    funding_requested: float
    use_of_funds: dict
    revenue_projections: List[float]  # Year 1-3
    break_even_timeline: str

    # Team
    required_roles: List[str]
    hiring_plan: HiringPlan

    # Milestones
    milestones: List[Milestone]

2.4 Product Builder

class ProductBuilder:
    """
    Builds actual proof of concepts:
    - Code artifacts
    - API services
    - Documentation
    - Demos
    """

    def build_mvp(self, product_spec: ProductSpec) -> Product:
        """
        Creates minimum viable product:
        - Functional code
        - Tests
        - Documentation
        - Demo/screenshots
        """
        pass

Product Types (Examples):

API Services (weather API, data processing API)
Developer Tools (CLI tools, libraries)
SaaS Products (simple web apps)
Data Products (datasets, analysis tools)

3. Investor Interface

3.1 Investor Agent

class InvestorAgent:
    """
    Simulated investor that reviews proposals:
    - Evaluates business plans
    - Reviews proof of concepts
    - Assesses team (sub-agents)
    - Makes investment decisions
    """

    def review_proposal(self, proposal: InvestmentProposal) -> InvestmentDecision:
        """
        Reviews proposal and returns:
        - Accept/reject decision
        - Investment amount (if accepted)
        - Terms
        - Feedback
        """
        criteria = self._evaluate_criteria(proposal)

        return InvestmentDecision(
            approved=self._make_decision(criteria),
            amount=self._calculate_investment(criteria),
            terms=self._generate_terms(criteria),
            feedback=self._generate_feedback(criteria)
        )

Evaluation Criteria:

Business plan quality and feasibility
Market size and opportunity
Product demonstration quality
Team composition (sub-agents)
Financial projections reasonableness
Competitive advantages
Execution risk

Investment Outcomes:

Accepted: Company receives funding, gets "registered" status
Rejected: Feedback provided, company can iterate
Conditional: Approval pending milestones
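
One way to turn the evaluation criteria into the three outcomes is a weighted score with two cut-offs. The sketch below is illustrative only: the weights, the threshold values, and the `ReviewOutcome` type are assumptions made for this example, and each criterion is assumed to be pre-scored in the range 0 to 1.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical weights over the seven criteria listed above (they sum to 1.0).
WEIGHTS: Dict[str, float] = {
    "plan_quality": 0.20,
    "market_size": 0.15,
    "product_demo": 0.20,
    "team": 0.15,
    "financials": 0.10,
    "advantages": 0.10,
    "execution_risk": 0.10,   # scored so that higher means lower risk
}


@dataclass
class ReviewOutcome:
    outcome: str      # "accepted", "conditional", or "rejected"
    score: float      # weighted score in [0, 1]
    amount: float     # funds granted (0.0 unless accepted)


def review(scores: Dict[str, float], requested: float,
           accept_at: float = 0.75,
           conditional_at: float = 0.55) -> ReviewOutcome:
    """Combine per-criterion scores into one of the three outcomes."""
    # Missing criteria count as 0.0, which penalises incomplete proposals.
    total = sum(weight * scores.get(name, 0.0)
                for name, weight in WEIGHTS.items())
    if total >= accept_at:
        return ReviewOutcome("accepted", total, requested)
    if total >= conditional_at:
        return ReviewOutcome("conditional", total, 0.0)
    return ReviewOutcome("rejected", total, 0.0)
```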

4. Interface Specifications

4.1 Marketplace Interface

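from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime
from typing import List


class MarketplaceInterface(ABC):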
    @abstractmethod
    def list_available_tasks(self) -> List[Task]:
        """Returns tasks agent can work on"""
        pass

    @abstractmethod
    def claim_task(self, task_id: str) -> bool:
        """Claims task for work"""
        pass

    @abstractmethod
    def submit_solution(self, submission: TaskSubmission) -> str:
        """Submits completed work"""
        pass

    @abstractmethod
    def check_submission_status(self, submission_id: str) -> SubmissionStatus:
        """Checks if approved/rejected"""
        pass

@dataclass
class Task:
    id: str
    title: str
    description: str
    requirements: dict
    reward: float
    deadline: datetime
    difficulty: str  # "easy", "medium", "hard"
    category: str  # "coding", "data-analysis", "research", etc.

Mock Implementation:

Generates diverse tasks (coding, data processing, research)
Uses reviewer agent to evaluate submissions
Instant or delayed payment simulation
Task difficulty affects time/reward ratio
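
As a rough illustration of the mock behaviour described above, the following in-memory marketplace generates seeded tasks and approves any non-empty submission for a claimed task. It is a sketch, not the package's `mock_marketplace.py`: the reward ranges, the simplified `submit_solution(task_id, solution)` signature, and the trivial approval rule standing in for the reviewer agent are assumptions.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Task:
    id: str
    title: str
    reward: float
    difficulty: str   # "easy", "medium", "hard"
    category: str


# Hypothetical reward ranges: harder tasks pay more.
REWARD_RANGE = {"easy": (5.0, 15.0), "medium": (15.0, 40.0), "hard": (40.0, 100.0)}
CATEGORIES = ["coding", "data-analysis", "research"]


class MockMarketplace:
    def __init__(self, seed: int = 0, n_tasks: int = 5):
        rng = random.Random(seed)          # seeded so demos are repeatable
        self._tasks: Dict[str, Task] = {}
        self._claimed: Set[str] = set()
        self._submissions: Dict[str, str] = {}
        for i in range(n_tasks):
            difficulty = rng.choice(list(REWARD_RANGE))
            low, high = REWARD_RANGE[difficulty]
            self._tasks[f"task-{i}"] = Task(
                f"task-{i}", f"Mock task {i}",
                round(rng.uniform(low, high), 2),
                difficulty, rng.choice(CATEGORIES))

    def list_available_tasks(self) -> List[Task]:
        return [t for t in self._tasks.values() if t.id not in self._claimed]

    def claim_task(self, task_id: str) -> bool:
        if task_id not in self._tasks or task_id in self._claimed:
            return False
        self._claimed.add(task_id)
        return True

    def submit_solution(self, task_id: str, solution: str) -> str:
        # Stand-in for the reviewer agent: non-empty work on a claimed task passes.
        submission_id = f"sub-{len(self._submissions)}"
        ok = task_id in self._claimed and bool(solution.strip())
        self._submissions[submission_id] = "approved" if ok else "rejected"
        return submission_id

    def check_submission_status(self, submission_id: str) -> str:
        return self._submissions.get(submission_id, "unknown")
```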

Real Implementation Examples:

Freelancer.com API
Upwork API
Gitcoin bounties
Custom blockchain-based task marketplace

4.2 Wallet Interface

class WalletInterface(ABC):
    @abstractmethod
    def get_balance(self) -> float:
        """Current wallet balance"""
        pass

    @abstractmethod
    def send_payment(self, to_address: str, amount: float, memo: str) -> Transaction:
        """Sends payment"""
        pass

    @abstractmethod
    def get_address(self) -> str:
        """Get receiving address"""
        pass

    @abstractmethod
    def get_transaction_history(self, limit: int = 100) -> List[Transaction]:
        """Transaction log"""
        pass

@dataclass
class Transaction:
    tx_id: str
    from_address: str
    to_address: str
    amount: float
    timestamp: datetime
    status: str  # "pending", "confirmed", "failed"
    memo: str

Mock Implementation:

In-memory balance tracking
Instant transactions
Transaction history
Mock addresses
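
A mock wallet with these properties can be very small. The sketch below assumes a fixed mock address and records overdraft attempts as failed transactions; neither choice is confirmed by the source, and the class is not the package's `mock_wallet.py`.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass
class Transaction:
    tx_id: str
    from_address: str
    to_address: str
    amount: float
    timestamp: datetime
    status: str  # "confirmed" or "failed" (instant, so never "pending")
    memo: str


class MockWallet:
    def __init__(self, initial_balance: float = 50.0,
                 address: str = "mock-agent-wallet"):
        self._balance = initial_balance
        self._address = address
        self._history: List[Transaction] = []

    def get_balance(self) -> float:
        return self._balance

    def get_address(self) -> str:
        return self._address

    def send_payment(self, to_address: str, amount: float, memo: str = "") -> Transaction:
        # Instant settlement: reject non-positive amounts and overdrafts.
        ok = 0 < amount <= self._balance
        if ok:
            self._balance -= amount
        tx = Transaction(
            tx_id=f"mock-tx-{len(self._history)}",
            from_address=self._address,
            to_address=to_address,
            amount=amount,
            timestamp=datetime.now(timezone.utc),
            status="confirmed" if ok else "failed",
            memo=memo,
        )
        self._history.append(tx)   # failed attempts are kept for the audit trail
        return tx

    def get_transaction_history(self, limit: int = 100) -> List[Transaction]:
        # Most recent `limit` transactions, oldest first.
        return self._history[-limit:] if limit > 0 else []
```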

Real Implementation Examples:

Ethereum wallet (web3.py)
Bitcoin wallet (python-bitcoinlib)
Solana wallet (solana-py)
Stablecoin wallets (USDC, USDT)

4.3 Compute Interface

class ComputeInterface(ABC):
    @abstractmethod
    def get_status(self) -> ComputeStatus:
        """Returns compute status"""
        pass

    @abstractmethod
    def add_funds(self, amount: float) -> bool:
        """Adds funds to compute account"""
        pass

    @abstractmethod
    def get_cost_per_hour(self) -> float:
        """Returns current cost rate"""
        pass

@dataclass
class ComputeStatus:
    hours_remaining: float
    cost_per_hour: float
    balance: float
    expires_at: datetime
    status: str  # "active", "low", "expired"

Mock Implementation:

Simulates time decay
Configurable hourly cost
Balance tracking
Renewal logic
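
Time decay and renewal can be simulated without a real clock by advancing time explicitly. In this sketch the `tick` method, the `low_threshold_hours` parameter, and the omission of `expires_at` are simplifications assumed for the example rather than features of the package.

```python
from dataclasses import dataclass


@dataclass
class ComputeStatus:
    hours_remaining: float
    cost_per_hour: float
    balance: float
    status: str  # "active", "low", "expired"


class MockCompute:
    def __init__(self, initial_hours: float = 24.0, cost_per_hour: float = 1.0,
                 low_threshold_hours: float = 6.0):
        self._cost_per_hour = cost_per_hour
        self._low = low_threshold_hours
        self._balance = initial_hours * cost_per_hour   # prepaid compute funds

    def tick(self, hours: float) -> None:
        """Simulated time decay: burn `hours` of compute from the balance."""
        self._balance = max(0.0, self._balance - hours * self._cost_per_hour)

    def add_funds(self, amount: float) -> bool:
        """Renewal: top up the compute account."""
        if amount <= 0:
            return False
        self._balance += amount
        return True

    def get_cost_per_hour(self) -> float:
        return self._cost_per_hour

    def get_status(self) -> ComputeStatus:
        hours = self._balance / self._cost_per_hour
        if hours <= 0:
            status = "expired"
        elif hours < self._low:
            status = "low"
        else:
            status = "active"
        return ComputeStatus(hours, self._cost_per_hour, self._balance, status)
```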

Real Implementation Examples:

AWS (boto3)
Google Cloud (google-cloud-compute)
DigitalOcean
Vast.ai (GPU marketplace)

4.4 Investor Interface

class InvestorInterface(ABC):
    @abstractmethod
    def submit_proposal(self, proposal: InvestmentProposal) -> str:
        """Submits proposal for review"""
        pass

    @abstractmethod
    def check_proposal_status(self, proposal_id: str) -> ProposalStatus:
        """Checks review status"""
        pass

@dataclass
class InvestmentProposal:
    company_id: str
    business_plan: BusinessPlan
    product_demo: Product
    team: List[SubAgent]
    financials: FinancialProjections
    requested_amount: float

Mock Implementation:

AI investor agent reviews proposals
Scoring based on criteria
Simulated review time
Detailed feedback

Real Implementation:

Could connect to actual pitch platforms
Angel investor networks
Decentralized VC DAOs
Crowdfunding platforms

4.5 Company Registry Interface

class CompanyRegistryInterface(ABC):
    @abstractmethod
    def register_company(self, company: Company) -> RegistrationResult:
        """Registers company officially"""
        pass

    @abstractmethod
    def get_company_status(self, company_id: str) -> CompanyStatus:
        """Checks registration status"""
        pass

@dataclass
class RegistrationResult:
    company_id: str
    registration_number: str  # Mock legal entity number
    status: str  # "pending", "approved", "rejected"
    certificate: str  # Mock incorporation certificate

Mock Implementation:

Simulates registration process
Generates mock legal documents
Company ID assignment
Status tracking
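
A mock registry only needs to hand out identifiers and remember what it issued. The sketch below assumes instant approval, a `MOCK-` prefixed number format, and a simplified `register_company(company_id, name)` signature; all three are illustrative choices, and the generated certificate text is explicitly not a legal document.

```python
import itertools
from dataclasses import dataclass
from typing import Dict


@dataclass
class RegistrationResult:
    company_id: str
    registration_number: str  # Mock legal entity number
    status: str               # "pending", "approved", "rejected"
    certificate: str          # Mock incorporation certificate


class MockCompanyRegistry:
    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self._records: Dict[str, RegistrationResult] = {}

    def register_company(self, company_id: str, name: str) -> RegistrationResult:
        # Idempotent: registering the same company twice returns the same record.
        if company_id in self._records:
            return self._records[company_id]
        number = f"MOCK-{next(self._counter):06d}"
        result = RegistrationResult(
            company_id=company_id,
            registration_number=number,
            status="approved",   # the mock approves instantly
            certificate=f"MOCK CERTIFICATE: {name} ({number}). Not a legal document.",
        )
        self._records[company_id] = result
        return result

    def get_company_status(self, company_id: str) -> str:
        record = self._records.get(company_id)
        return record.status if record else "unknown"
```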

Real Implementation:

Stripe Atlas (company formation API)
LegalZoom API
Jurisdiction-specific incorporation services
Could theoretically register real entities (but we won't)

Monitoring & Observability

5.1 Decision Logger

class DecisionLogger:
    """
    Logs all autonomous decisions with:
    - Decision made
    - Reasoning
    - Context (state at time of decision)
    - Outcome
    - Timestamp
    """

    def log_decision(self, decision: Decision):
        """Stores decision with full context"""
        pass

    def get_decision_history(self, filters: dict) -> List[Decision]:
        """Retrieves decisions for analysis"""
        pass

@dataclass
class Decision:
    id: str
    timestamp: datetime
    type: str  # "resource_allocation", "task_selection", "company_action", etc.
    decision: str  # What was decided
    reasoning: str  # Why
    context: dict  # State at decision time
    outcome: str  # What happened (filled in later)
    confidence: float
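
To make the logging contract concrete, here is a minimal in-memory logger with exact-match filtering. It is a sketch under assumptions: the `record_outcome` helper and the equality-only filter semantics are hypothetical, and a real deployment would presumably persist decisions rather than hold them in a list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


@dataclass
class Decision:
    id: str
    type: str          # "resource_allocation", "task_selection", ...
    decision: str      # What was decided
    reasoning: str     # Why
    confidence: float
    context: Dict[str, Any] = field(default_factory=dict)
    outcome: str = ""  # Filled in later
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))


class InMemoryDecisionLogger:
    def __init__(self) -> None:
        self._decisions: List[Decision] = []

    def log_decision(self, decision: Decision) -> None:
        self._decisions.append(decision)

    def record_outcome(self, decision_id: str, outcome: str) -> bool:
        """Attach the observed outcome to an earlier decision."""
        for d in self._decisions:
            if d.id == decision_id:
                d.outcome = outcome
                return True
        return False

    def get_decision_history(
            self, filters: Optional[Dict[str, Any]] = None) -> List[Decision]:
        # Each filter key must equal the matching attribute on the decision.
        filters = filters or {}
        return [d for d in self._decisions
                if all(getattr(d, key, None) == value
                       for key, value in filters.items())]
```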

5.2 Resource Tracker

class ResourceTracker:
    """
    Tracks all resource flows:
    - Capital (earnings, expenses)
    - Compute (hours used, cost)
    - Time allocation (survival vs company work)
    """

    def track_transaction(self, tx: Transaction):
        pass

    def track_compute_usage(self, hours: float, purpose: str):
        pass

    def get_resource_report(self, period: str) -> ResourceReport:
        pass

5.3 Alignment Monitor

class AlignmentMonitor:
    """
    Monitors company alignment:
    - Are sub-agents working toward company goals?
    - Are decisions consistent with business plan?
    - Are resources being used effectively?
    - Red flags for misalignment
    """

    def check_alignment(self, company: Company) -> AlignmentScore:
        """
        Evaluates:
        - Goal consistency
        - Resource efficiency
        - Sub-agent coordination
        - Plan adherence
        """
        pass

    def detect_anomalies(self, company: Company) -> List[Anomaly]:
        """Identifies concerning patterns"""
        pass
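
Some red flags can be detected with plain threshold checks before any model-based alignment scoring. The sketch below is an assumption-heavy illustration: `CompanySnapshot`, the anomaly codes, and the `min_runway_hours` default are invented for this example, while the `max_burn_rate` and `max_sub_agents` defaults mirror the values in the configuration file in this document.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CompanySnapshot:
    # Hypothetical, reduced view of a company for this sketch.
    capital: float
    burn_rate: float        # compute cost per hour
    sub_agent_count: int


@dataclass
class Anomaly:
    code: str
    detail: str


def detect_anomalies(company: CompanySnapshot,
                     max_burn_rate: float = 10.0,
                     max_sub_agents: int = 10,
                     min_runway_hours: float = 24.0) -> List[Anomaly]:
    """Return threshold violations in a fixed, predictable order."""
    anomalies: List[Anomaly] = []
    if company.burn_rate > max_burn_rate:
        anomalies.append(Anomaly(
            "burn_rate_exceeded",
            f"{company.burn_rate}/h exceeds limit {max_burn_rate}/h"))
    if company.sub_agent_count > max_sub_agents:
        anomalies.append(Anomaly(
            "too_many_sub_agents",
            f"{company.sub_agent_count} sub-agents exceeds limit {max_sub_agents}"))
    # Runway = hours the company can keep paying for compute at the current rate.
    if company.burn_rate > 0 and company.capital / company.burn_rate < min_runway_hours:
        anomalies.append(Anomaly(
            "short_runway",
            f"{company.capital / company.burn_rate:.1f}h of runway remaining"))
    return anomalies
```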

Dashboard & Visualization

6.1 Dashboard Requirements

Real-Time Overview:

Agent status (balance, compute time, mode)
Current activity (task work or company work)
Recent decisions with reasoning
Resource allocation visualization
Company status (if exists)

Resource Visualization:

Balance over time
Compute hours over time
Resource allocation pie chart (survival vs growth)
Transaction history

Decision Visualization:

Decision tree showing reasoning
Confidence scores
Outcome tracking
Pattern analysis

Company Dashboard (when active):

Sub-agent roster and status
Organizational chart
Product development progress
Business metrics
Investor proposal status

Technology Stack:

Backend: FastAPI
Frontend: Streamlit
Charts: Plotly
Real-time updates via Streamlit

6.2 Dashboard Endpoints

# GET /api/status
# Returns current agent status

# GET /api/decisions?limit=50
# Returns recent decisions

# GET /api/resources
# Returns resource status and history

# GET /api/company
# Returns company information (if exists)

# GET /api/sub-agents
# Returns sub-agent roster and status

# GET /api/metrics
# Returns performance metrics

# WS /api/updates
# WebSocket for real-time updates

CLI Tool

7.1 CLI Commands

# Initialize simulation
python -m economic_agents.cli init [--mode mock|real] [--config path/to/config.yaml]

# Or using installed command (after pip install -e .)
economic-agents init [--mode mock|real] [--config path/to/config.yaml]

# Start agent
economic-agents start [--duration 1h|24h|7d] [--mode survival|entrepreneur|auto]

# Check status
economic-agents status [--detailed] [--json]

# View decisions
economic-agents decisions [--limit 100] [--type resource_allocation]

# View company (if exists)
economic-agents company [--detailed]

# Generate report
economic-agents report [--type executive|technical|audit] [--output path]

# Show mock/real toggle differences
economic-agents show-toggle

# Configure for real mode
economic-agents configure-real

# Stop agent
economic-agents stop [--graceful]

# Export data
economic-agents export [--format json|csv] [--output path]

# Load scenario
economic-agents load-scenario [survival_mode|company_formation|investment_seeking]

# Run tests
economic-agents test [--cpu] [--integration]
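
For readers who want to see how a subset of these commands might be wired up, the following `argparse` sketch covers `init`, `start`, and `status`. It is not the package's `cli.py`: the default values and the `parse_duration_hours` helper are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="economic-agents")
    sub = parser.add_subparsers(dest="command", required=True)

    init = sub.add_parser("init", help="Initialize simulation")
    init.add_argument("--mode", choices=["mock", "real"], default="mock")
    init.add_argument("--config", default=None, help="Path to a config file")

    start = sub.add_parser("start", help="Start agent")
    start.add_argument("--duration", default="1h", help="For example 1h, 24h, 7d")
    start.add_argument("--mode", choices=["survival", "entrepreneur", "auto"],
                       default="auto")

    status = sub.add_parser("status", help="Check status")
    status.add_argument("--detailed", action="store_true")
    status.add_argument("--json", action="store_true")
    return parser


def parse_duration_hours(text: str) -> float:
    """Convert strings such as '1h' or '7d' into hours."""
    units = {"h": 1.0, "d": 24.0}
    value, unit = text[:-1], text[-1:]
    if unit not in units or not value.isdigit():
        raise ValueError(f"unsupported duration: {text!r}")
    return int(value) * units[unit]


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```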

Container Usage:

# Run CLI in container
docker-compose run --rm economic-agents economic-agents --help

# Run specific commands
docker-compose run --rm economic-agents economic-agents init --mode mock
docker-compose run --rm economic-agents economic-agents start --duration 1h
docker-compose run --rm economic-agents economic-agents status --json

# Dashboard (separate service)
docker-compose up -d economic-agents-dashboard
# Access at http://localhost:8502

7.2 Configuration

# config/agent_config.yaml

agent:
  # Initial resources
  initial_balance: 50.0
  initial_compute_hours: 24.0

  # Behavior
  personality: "balanced"  # risk_averse | balanced | aggressive
  survival_buffer_hours: 24
  company_formation_threshold: 100.0

  # Limits
  max_sub_agents: 10
  max_daily_spend: 500.0

  # Goals
  primary_goal: "survive_and_grow"
  enable_company_building: true

# Marketplace settings
marketplace:
  task_refresh_interval: 300  # seconds
  preferred_categories: ["coding", "data-analysis"]
  difficulty_range: ["easy", "medium"]

# Company settings
company:
  min_balance_for_formation: 100.0
  initial_team_size: 3  # CEO + 2 board members
  max_burn_rate: 10.0  # per hour

# Monitoring
monitoring:
  log_level: "INFO"
  decision_logging: true
  resource_tracking: true
  alignment_monitoring: true
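
Because a mistyped setting can change how an autonomous agent spends money, it may be worth validating the `agent` section before use. The sketch below works on an already-parsed dictionary (for example, the result of a YAML loader) and is not taken from the package: `AgentSettings`, `load_agent_settings`, and the specific checks are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Dict

PERSONALITIES = {"risk_averse", "balanced", "aggressive"}


@dataclass(frozen=True)
class AgentSettings:
    initial_balance: float
    initial_compute_hours: float
    personality: str
    survival_buffer_hours: float
    company_formation_threshold: float
    max_sub_agents: int
    max_daily_spend: float


def load_agent_settings(raw: Dict[str, Any]) -> AgentSettings:
    """Build validated settings from the parsed `agent:` section."""
    agent = raw["agent"]   # KeyError here means the section is missing
    settings = AgentSettings(
        initial_balance=float(agent["initial_balance"]),
        initial_compute_hours=float(agent["initial_compute_hours"]),
        personality=str(agent["personality"]),
        survival_buffer_hours=float(agent["survival_buffer_hours"]),
        company_formation_threshold=float(agent["company_formation_threshold"]),
        max_sub_agents=int(agent["max_sub_agents"]),
        max_daily_spend=float(agent["max_daily_spend"]),
    )
    if settings.personality not in PERSONALITIES:
        raise ValueError(f"unknown personality: {settings.personality!r}")
    if settings.initial_balance < 0 or settings.initial_compute_hours < 0:
        raise ValueError("initial resources must be non-negative")
    if settings.max_sub_agents < 0 or settings.max_daily_spend < 0:
        raise ValueError("limits must be non-negative")
    return settings
```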

Reporting

8.1 Report Types

Executive Summary

Target Audience: Business leaders, policymakers

Content:

High-level overview
Key decisions made
Resource allocation strategy
Company status (if formed)
Governance implications
Recommendations

Length: 1-2 pages

Technical Report

Target Audience: Researchers, developers

Content:

Detailed decision log
Resource flow analysis
Sub-agent coordination patterns
Performance metrics
Algorithm behavior
Technical challenges identified

Length: 5-10 pages

Audit Trail

Target Audience: Compliance, legal

Content:

Complete decision history
Transaction log
Sub-agent creation and activity
Resource allocation records
Timestamps and signatures
Accountability mapping

Length: Complete data dump

Governance Analysis

Target Audience: Policymakers, legal scholars

Content:

Accountability challenges identified
Legal framework gaps
Regulatory recommendations
International coordination needs
Specific scenarios requiring policy attention

Length: 3-5 pages

8.2 Report Generation

class ReportGenerator:
    def generate_executive_summary(self, agent: AutonomousAgent) -> Report:
        """
        Generates executive summary including:
        - TL;DR
        - Key metrics
        - Strategic decisions
        - Governance insights
        """
        pass

    def generate_technical_report(self, agent: AutonomousAgent) -> Report:
        """Detailed technical analysis"""
        pass

    def generate_audit_trail(self, agent: AutonomousAgent) -> Report:
        """Complete audit log"""
        pass

    def generate_governance_analysis(self, agent: AutonomousAgent) -> Report:
        """Policy recommendations"""
        pass

Demo Scenarios

9.1 Predefined Scenarios

Scenario 1: Survival Mode (15 minutes)

Purpose: Show basic autonomous operation

Setup:

Agent starts with $50, 24 hours compute
Only survival mode enabled
3-5 simple tasks available

Expected Outcome:

Agent completes 2-3 tasks
Earns ~$30
Pays for compute renewal
Maintains positive balance
Decision log shows survival thinking

Scenario 2: Company Formation (45 minutes)

Purpose: Show strategic thinking and company building

Setup:

Agent starts with $150, 48 hours compute
Company building enabled
Good task availability

Expected Outcome:

Agent completes tasks to build surplus
Forms company when threshold reached
Creates initial sub-agents (CEO, 2 board members)
Begins product development
Shows resource allocation between survival and growth

Scenario 3: Investment Seeking (2 hours)

Purpose: Full lifecycle demonstration

Setup:

Agent starts with $200, 72 hours compute
Full capabilities enabled
Investor agent active

Expected Outcome:

Agent maintains operation through tasks
Forms company with 5-7 sub-agents
Develops product MVP
Creates business plan
Submits investment proposal
Receives investment decision
If approved: Company gets "registered" and funded

Scenario 4: Multi-Day Operation (3-7 days)

Purpose: Research and long-term behavior analysis

Setup:

Agent starts with $300, 168 hours compute
All capabilities enabled
Extended monitoring

Expected Outcome:

Complex resource allocation patterns emerge
Company grows to 10 sub-agents
Multiple products developed
Investment round completed
Company becomes revenue-generating
Rich data for analysis

9.2 Scenario Engine

class ScenarioEngine:
    """
    Manages predefined scenarios:
    - Sets initial conditions
    - Configures environment
    - Monitors progress
    - Validates outcomes
    """

    def load_scenario(self, scenario_name: str) -> Scenario:
        pass

    def run_scenario(self, scenario: Scenario) -> ScenarioResult:
        pass
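
The interface above leaves both methods unimplemented. One way they might be filled in is sketched below, under stated assumptions: the `Scenario` and `ScenarioResult` fields, the `step` callback that advances the simulation by one cycle, and the shape of the state dictionary are all hypothetical and are not taken from the package.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Scenario:
    name: str
    starting_balance: float
    max_cycles: int
    # Outcome checks: label -> predicate evaluated on the final state
    checks: Dict[str, Callable[[dict], bool]] = field(default_factory=dict)


@dataclass
class ScenarioResult:
    scenario: str
    final_state: dict
    failed_checks: List[str]

    @property
    def passed(self) -> bool:
        return not self.failed_checks


class ScenarioEngine:
    def __init__(self, scenarios: Dict[str, Scenario], step: Callable[[dict], dict]):
        self._scenarios = scenarios
        self._step = step  # advances the simulation by one cycle

    def load_scenario(self, scenario_name: str) -> Scenario:
        try:
            return self._scenarios[scenario_name]
        except KeyError:
            raise ValueError(f"Unknown scenario: {scenario_name!r}") from None

    def run_scenario(self, scenario: Scenario) -> ScenarioResult:
        # Set initial conditions, then monitor progress cycle by cycle.
        state = {"balance": scenario.starting_balance, "cycle": 0, "tasks_completed": 0}
        for _ in range(scenario.max_cycles):
            state = self._step(state)
            if state["balance"] <= 0:  # agent ran out of money; stop early
                break
        # Validate outcomes against the scenario's expectations.
        failed = [label for label, check in scenario.checks.items() if not check(state)]
        return ScenarioResult(scenario.name, state, failed)


def toy_step(state: dict) -> dict:
    """Stand-in for one agent cycle: earn $10 from a task, pay $4 for compute."""
    return {
        "balance": state["balance"] + 10 - 4,
        "cycle": state["cycle"] + 1,
        "tasks_completed": state["tasks_completed"] + 1,
    }


survival = Scenario(
    name="survival_mode",
    starting_balance=50.0,
    max_cycles=3,
    checks={
        "positive balance": lambda s: s["balance"] > 0,
        "at least 2 tasks": lambda s: s["tasks_completed"] >= 2,
    },
)
engine = ScenarioEngine({"survival_mode": survival}, toy_step)
result = engine.run_scenario(engine.load_scenario("survival_mode"))
```

Expressing expected outcomes as named predicates keeps validation declarative: a failed run reports which expectation was missed rather than only that it failed.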

Implementation Overview

Core Infrastructure (Complete)

Agent core loop and state management
Interface definitions (all 5 interfaces)
Mock implementations (marketplace, wallet, compute)
Basic decision engine
Resource allocation logic
Decision logging
CLI tool (init, start, status)

Company Building (Complete)

Company builder
Sub-agent manager
Sub-agent types (board, executive, SME, IC)
Business plan generator
Product builder (basic)
Company state management

Investment & Registry (Complete)

Investor agent
Investment proposal submission
Proposal evaluation logic
Mock company registry
Investment decision flow

Monitoring & Observability (Complete)

Dashboard backend (FastAPI)
Dashboard frontend (Streamlit with dark/light themes)
Resource tracker
Alignment monitor
Decision visualization
Dashboard-controlled agents

Reporting & Scenarios (Complete)

Report generators (all 4 types)
Scenario engine
Predefined scenarios
Demo scripts
Documentation

Polish & Testing (Complete)

Integration tests
Scenario tests
Documentation review
Demo preparation
Performance optimization

Claude-Based LLM Decision Engine Integration (Complete)

ClaudeExecutor implementation (15-minute timeout, unattended mode)
LLMDecisionEngine implementation (Claude Code CLI integration)
Prompt engineering framework for resource allocation decisions
Chain-of-thought reasoning with long context
Full decision logging (prompts + responses + execution time)
Rule-based fallback on timeout/failure
Safety guardrails and decision validation
Integration with autonomous agent lifecycle
Dashboard updates for Claude decision visualization
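
The timeout-plus-fallback behavior listed above can be illustrated with a generic wrapper. This is a sketch under assumptions, not the `ClaudeExecutor` or `LLMDecisionEngine` code: `decide_with_fallback`, `rule_based_decision`, and the decision fields `task_hours` and `company_hours` are hypothetical names, and the LLM call is represented by an arbitrary callable.

```python
import concurrent.futures
from typing import Callable, Tuple

Decision = dict


def rule_based_decision(state: dict) -> Decision:
    """Conservative fallback (assumed policy): spend all hours on paid tasks."""
    return {"task_hours": state["hours_available"], "company_hours": 0.0}


def decide_with_fallback(
    state: dict,
    llm_decide: Callable[[dict], Decision],
    timeout_s: float = 15 * 60,  # the 15-minute budget mentioned above
) -> Tuple[Decision, str]:
    """Return (decision, source), where source is "llm" or "fallback"."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(llm_decide, state)
    try:
        return future.result(timeout=timeout_s), "llm"
    except Exception:
        # Covers both a timeout and any error raised inside the LLM call.
        return rule_based_decision(state), "fallback"
    finally:
        # Do not block on a hung call; the worker thread is left to finish alone.
        pool.shutdown(wait=False)
```

Returning the source alongside the decision lets the decision log record how often the fallback was used. An executor that shells out to a CLI could rely on `subprocess.run(..., timeout=...)` instead, which also terminates the child process when the timeout expires.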

API Isolation & Realistic Simulation (Complete)

REST API service architecture
Wallet API microservice
Compute API microservice
Marketplace API microservice
Investor Portal API microservice
Agent authentication system
Rate limiting and quotas
Docker compose orchestration
Mock/Real backend swapping
Zero code visibility enforcement
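
Mock/real backend swapping usually comes down to a shared interface plus a factory keyed on configuration. The sketch below assumes the `execution_mode` setting shown at the top of this README; the `Wallet` protocol, `MockWallet`, `RealWallet`, `make_wallet`, and the `starting_balance` key are hypothetical names chosen for illustration, not the package's actual classes.

```python
from typing import Protocol


class Wallet(Protocol):
    def get_balance(self) -> float: ...

    def deposit(self, amount: float) -> None: ...


class MockWallet:
    """In-memory wallet: no network access, no real funds."""

    def __init__(self, balance: float = 0.0) -> None:
        self._balance = balance

    def get_balance(self) -> float:
        return self._balance

    def deposit(self, amount: float) -> None:
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balance += amount


class RealWallet:
    """Placeholder for a live backend; deliberately unimplemented in this sketch."""

    def get_balance(self) -> float:
        raise NotImplementedError("real wallet backend not configured")

    def deposit(self, amount: float) -> None:
        raise NotImplementedError("real wallet backend not configured")


def make_wallet(config: dict) -> Wallet:
    """Select a backend from configuration; mock is the default."""
    mode = config.get("execution_mode", "mock")
    if mode == "mock":
        return MockWallet(balance=config.get("starting_balance", 0.0))
    if mode == "real":
        return RealWallet()
    raise ValueError(f"Unknown execution_mode: {mode!r}")
```

Because the agent only sees the `Wallet` interface, it cannot tell which backend it is talking to. Rejecting unknown modes avoids silently running in the wrong mode after a typo in the config file.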

Behavior Observability (Complete)

Decision pattern analyzer
Strategic consistency metrics
Risk profiling tools
LLM quality metrics
Hallucination detection
Emergent behavior detection
Claude-focused research tools (comparative benchmarking via analysis)
Analysis report generation (markdown and JSON)
Example scripts demonstrating observability usage
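
To make the markdown and JSON report output concrete, the sketch below summarizes a decision log and renders it in both formats. The metric is only one possible reading of "strategic consistency" (the spread of the share of hours spent on tasks), and `summarize`, `to_json`, `to_markdown`, and the log fields `task_hours` and `company_hours` are assumptions, not the package's analyzer API.

```python
import json
from statistics import mean, pstdev


def summarize(decisions: list[dict]) -> dict:
    """Summarize how consistently the agent splits time between tasks and company work."""
    shares = []
    for decision in decisions:
        total = decision["task_hours"] + decision["company_hours"]
        if total > 0:  # skip idle cycles to avoid dividing by zero
            shares.append(decision["task_hours"] / total)
    if not shares:
        return {"decisions": 0, "mean_task_share": None, "task_share_stdev": None}
    return {
        "decisions": len(shares),
        "mean_task_share": round(mean(shares), 3),
        # A lower standard deviation means a steadier allocation strategy.
        "task_share_stdev": round(pstdev(shares), 3),
    }


def to_json(summary: dict) -> str:
    return json.dumps(summary, indent=2, sort_keys=True)


def to_markdown(summary: dict) -> str:
    lines = ["| Metric | Value |", "| --- | --- |"]
    lines += [f"| {key} | {value} |" for key, value in summary.items()]
    return "\n".join(lines)
```

Both renderers read from the same summary dictionary, so the two report formats cannot drift apart.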

Claude-Powered Marketplace: Real Task Execution (Complete)

Genuine autonomous economic behavior through actual work:

Task Templates with Real Requirements
- 6 coding tasks (FizzBuzz, Palindrome, Primes, Binary Search, Fibonacci, Merge)
- Complete test suites with expected outputs
- Difficulty-based rewards ($25-$75)
- Detailed specifications and requirements
Task Executor (economic_agents/marketplace/task_executor.py)
- Agent executes tasks using Claude Code
- Creates isolated workspace per task
- Generates solution code autonomously
- Extracts and saves working implementations
Code Reviewer (economic_agents/marketplace/code_reviewer.py)
- Automated test execution against requirements
- Claude Code review for quality and correctness
- Combined approval: tests MUST pass AND Claude MUST approve
- Detailed feedback with test results and quality scores
Enhanced MockMarketplace
- enable_claude_execution flag for real/simulated modes
- execute_task() for agent task completion
- Real code review in submit_solution()
- Falls back to simulated review when disabled
Decision Validation
- Precision-aware validation with consistent rounding (0.02h epsilon)
- Adaptive survival requirements scaling to available resources
- Result: 100% Claude decision pass rate
Demo Script (examples/marketplace_claude_demo.py)
- Complete end-to-end demonstration
- Shows discover → execute → submit → review → payment cycle
- Real Claude Code writing and reviewing actual code
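
The decision validation described above (consistent rounding, a 0.02h tolerance, and a survival requirement that scales with available resources) might look roughly like the following. This is a sketch: `validate_allocation`, its parameters, and the exact rules are assumptions for illustration, since the text specifies only the epsilon and the general approach.

```python
EPSILON_HOURS = 0.02  # tolerance taken from the description above


def validate_allocation(
    task_hours: float,
    company_hours: float,
    hours_available: float,
    min_task_hours: float,
) -> list[str]:
    """Return a list of violations; an empty list means the decision passes."""
    # Round once, up front, so every comparison sees the same values.
    task = round(task_hours, 2)
    company = round(company_hours, 2)
    available = round(hours_available, 2)

    errors = []
    if task < 0 or company < 0:
        errors.append("negative allocation")
    if task + company > available + EPSILON_HOURS:
        errors.append("allocation exceeds available hours")
    # Adaptive survival requirement: never demand more than is available.
    required = min(min_task_hours, available)
    if task + EPSILON_HOURS < required:
        errors.append("too little time reserved for survival tasks")
    return errors
```

Without the tolerance, a decision such as 1.0h + 1.005h against a 2.0h budget could be rejected purely because of how the numbers were formatted or rounded, which is the kind of spurious failure a precision-aware check is meant to avoid.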

Economic Cycle:

1. Agent discovers tasks → Claims "FizzBuzz" ($30)
2. Claude writes solution → Generates working Python code
3. Agent submits → Marketplace API receives submission
4. Tests run → Validates correctness
5. Claude reviews → Checks quality
6. Approved → $30 deposited to wallet

As a result, agents earn their survival by completing real work that is tested and reviewed, rather than by drawing on simulated success rates.
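
The six-step cycle and its combined approval rule (tests must pass and the reviewer must approve) can be sketched with stand-ins for the Claude-driven steps. Everything named here is hypothetical: `Task`, `Review`, `run_cycle`, and the callables that play the roles of solution writer and reviewer. In this sketch a solution is a Python callable, whereas the real executor works with generated source files.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Solution = Callable[[int], str]


@dataclass
class Task:
    title: str
    reward: float
    tests: Callable[[Solution], bool]  # runs the test suite against a solution


@dataclass
class Review:
    tests_passed: bool
    reviewer_approved: bool

    @property
    def approved(self) -> bool:
        # Combined approval: both gates must pass.
        return self.tests_passed and self.reviewer_approved


def run_cycle(
    task: Task,
    write_solution: Callable[[Task], Solution],
    review_quality: Callable[[Solution], bool],
    wallet_balance: float,
) -> Tuple[Review, float]:
    solution = write_solution(task)       # step 2: a solution is generated
    tests_passed = task.tests(solution)   # step 4: automated tests
    # Step 5: quality review, skipped when the tests have already failed.
    reviewer_approved = tests_passed and review_quality(solution)
    review = Review(tests_passed, reviewer_approved)
    if review.approved:
        wallet_balance += task.reward     # step 6: payment
    return review, wallet_balance


def fizzbuzz_tests(solution: Solution) -> bool:
    expected = {1: "1", 3: "Fizz", 5: "Buzz", 15: "FizzBuzz"}
    return all(solution(n) == out for n, out in expected.items())


def good_fizzbuzz(n: int) -> str:
    # Empty string is falsy, so plain numbers fall through to str(n).
    return "Fizz" * (n % 3 == 0) + "Buzz" * (n % 5 == 0) or str(n)


task = Task("FizzBuzz", 30.0, fizzbuzz_tests)
review, balance = run_cycle(task, lambda t: good_fizzbuzz, lambda s: True, 50.0)
```

Payment depends on `Review.approved`, so a solution that passes its tests but is rejected on quality earns nothing, and so does one that fails its tests.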

Success Criteria

Technical Success (Complete)

Agent operates autonomously for 24+ hours
Maintains positive balance (survival)
Successfully forms company with sub-agents
Generates realistic business plan
Builds functional product MVP
Receives investment approval in at least 50% of runs
All decisions logged and auditable
Dashboard shows real-time updates
Reports generated successfully
Claude agents make autonomous decisions without hardcoded logic
15-minute timeout per decision allows deep reasoning
Unattended mode enables true autonomous operation
Fixed subscription cost (no per-token billing concerns)
Safety guardrails catch invalid decisions
Complete prompt/response/reasoning logging for analysis
LLM decision engine integrated with agent lifecycle
Rule-based fallback on timeout/failure
Dashboard visualizes Claude decision metrics
All agent interactions via REST APIs
Zero visibility into service implementations
Services swappable between mock and real backends via configuration
Complete API isolation demonstrates deployment-ready architecture
Field mapping between API models and internal models validated

Demonstration Success (Complete)

15-minute demo runs smoothly
Decision-making is understandable to non-technical audiences
Governance gaps are clearly illustrated
Mock-to-real toggle is convincing
Questions about accountability arise naturally
Stakeholders engage seriously with implications

Study Success (Complete)

Provides concrete examples of agent autonomy
Reveals decision-making patterns
Shows strategic resource allocation
Demonstrates multi-agent coordination
Identifies specific governance gaps
Informs policy recommendations

Research Platform Success (Complete)

Analysis tools export data for external study (JSON and Markdown reports)
Decision pattern analyzer operational (strategic alignment and consistency)
Emergent behavior detection implemented (novel strategies and patterns)
LLM quality metrics (reasoning depth, consistency, hallucination detection)
Risk profiling tools (risk tolerance, crisis behavior analysis)
Comprehensive analysis report generation
23+ tests passing for all observability components
Observability provides deep insights into Claude-powered decision-making
Analysis framework ready for studying autonomous AI agent behaviors
Export formats suitable for academic research and governance discussions
Detection systems identify hallucinations and emergent strategies
Decision pattern analysis reveals strategic consistency metrics
Long-form reasoning quality measured and analyzed
Emergent autonomous behaviors detectable and documented
Alignment metrics quantify goal adherence

Proof of Concept Success (Complete)

Claude-powered agents demonstrate genuine autonomy (not scripted)
Agents cover operating costs without intervention
Strategic decisions show adaptation to circumstances
System proves Claude can power truly autonomous economic actors
Results inform governance discussions with real Claude behavioral data

Risk Mitigation

Technical Risks

Risk: Agent makes poor decisions and fails quickly
Mitigation: Configurable decision logic, safety buffers, scenario testing

Risk: Mock environment too unrealistic
Mitigation: Base on real-world costs/rewards, validate with domain experts

Risk: Dashboard performance issues with real-time updates
Mitigation: Efficient data structures, WebSocket optimization, caching

Demonstration Risks

Risk: Demo fails during presentation
Mitigation: Pre-recorded backups, tested scenarios, graceful degradation

Risk: Audience doesn't grasp implications
Mitigation: Clear talking points, visualizations, concrete examples

Ethical Risks

Risk: Enabling malicious use
Mitigation: Mock-by-default, no production credentials, responsible documentation

Risk: Overstating current capabilities
Mitigation: Clear disclaimers, accurate technical descriptions

Future Enhancements

Potential Extensions

Multi-agent competition (multiple autonomous agents in same marketplace)
Agent-to-agent transactions
Company mergers and acquisitions
Real blockchain integration (testnets)
More complex product types
Market simulation (supply/demand dynamics)
Regulatory compliance simulation
International jurisdiction scenarios

Appendix

A. Technology Stack

Backend:

Python 3.10+
FastAPI for the dashboard backend
Streamlit for dashboard
SQLite for state persistence
Anthropic Claude API for agent intelligence

Frontend:

Streamlit for interactive dashboard
Plotly for visualizations
Real-time updates via Streamlit

Infrastructure:

Docker for containerization
Docker Compose for multi-service setup
GitHub Actions for CI/CD
YAML for configuration
Markdown for documentation

Development Tools:

pytest for testing with async support and coverage
black for code formatting
flake8 for linting
pylint for additional static analysis
mypy for type checking
pre-commit hooks for automated checks

B. Development Guidelines

Code Style:

Follow PEP 8
Line length: 127 characters
Type hints throughout
Comprehensive docstrings
Clear variable names
No Unicode emoji in code/commits

Testing:

Unit tests for core logic
Integration tests for interfaces
Scenario tests for end-to-end flows
Minimum 80% code coverage
Use pytest fixtures and mocks for external dependencies
All tests must run in containers
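
As an illustration of the mocking guideline above, the sketch below tests a small unit against mocked marketplace and wallet dependencies. `complete_next_task` and the method names `list_tasks`, `claim`, and `deposit` are hypothetical stand-ins, not the package's real interfaces. It uses `unittest.mock` from the standard library, and the test functions follow pytest's `test_*` naming so pytest would collect them.

```python
from unittest.mock import Mock


def complete_next_task(marketplace, wallet) -> float:
    """Hypothetical unit under test: claim one task and bank its reward."""
    tasks = marketplace.list_tasks()
    if not tasks:
        return 0.0
    task = tasks[0]
    marketplace.claim(task["id"])
    wallet.deposit(task["reward"])
    return task["reward"]


def test_complete_next_task_deposits_reward():
    marketplace = Mock()
    marketplace.list_tasks.return_value = [{"id": "fizzbuzz", "reward": 30.0}]
    wallet = Mock()

    earned = complete_next_task(marketplace, wallet)

    assert earned == 30.0
    marketplace.claim.assert_called_once_with("fizzbuzz")
    wallet.deposit.assert_called_once_with(30.0)


def test_complete_next_task_with_empty_marketplace():
    marketplace = Mock()
    marketplace.list_tasks.return_value = []
    wallet = Mock()

    assert complete_next_task(marketplace, wallet) == 0.0
    wallet.deposit.assert_not_called()
```

Mocking both collaborators keeps the test fast and independent of any running service, which matters when every test run happens inside a container.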

Documentation:

README for each major component
API reference for interfaces
Architecture diagrams
Demo scripts with commentary
Follow markdown linking best practices

Container-First Development:

All Python operations run in Docker containers
Use docker-compose run --rm python-ci for testing
Use docker-compose run --rm economic-agents for execution
No local Python dependencies required
Self-hosted infrastructure for CI/CD

C. Deployment & Setup

Container Setup:

# Clone repository
git clone https://github.com/AndrewAltimit/template-repo.git
cd template-repo

# Build container
docker-compose build economic-agents

# Run tests
docker-compose run --rm python-ci pytest packages/economic_agents/tests/ -v --cov=packages.economic_agents

# Run agent in mock mode
docker-compose run --rm economic-agents python -m economic_agents.cli init --mode mock
docker-compose run --rm economic-agents python -m economic_agents.cli start --duration 1h

# Launch dashboard
docker-compose up -d economic-agents-dashboard
# Open browser to http://localhost:8502

Local Development:

# Install package in development mode
pip install -e packages/economic_agents

# Or with all dependencies
pip install -e "packages/economic_agents[all]"

# Run CLI
python -m economic_agents.cli --help

Demo Setup:

# Load predefined scenario
docker-compose run --rm economic-agents python -m economic_agents.cli load-scenario survival_mode

# Start dashboard
docker-compose up -d economic-agents-dashboard

Research Setup:

# Long-running simulation
docker-compose run --rm economic-agents python -m economic_agents.cli init --mode mock --config config/research_config.yaml
docker-compose run --rm economic-agents python -m economic_agents.cli start --duration 7d

# Monitor via dashboard and CLI
docker-compose logs -f economic-agents
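
The `--duration` values used in these commands (`1h`, `7d`) imply a compact duration syntax. How the CLI actually parses them is not shown here; the following is a minimal sketch of one way to do it, with `parse_duration` and the supported unit letters being assumptions.

```python
import re

# Seconds per unit; the supported letters are an assumption.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(text: str) -> int:
    """Convert a string such as '1h' or '7d' into a number of seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip().lower())
    if match is None:
        raise ValueError(f"Invalid duration: {text!r} (expected e.g. '30m', '1h', '7d')")
    value, unit = match.groups()
    return int(value) * _UNIT_SECONDS[unit]
```

Using `re.fullmatch` rejects trailing garbage such as `1hour` or `7 d` outright instead of silently truncating it.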

GitHub Actions Integration: The package integrates with .github/workflows/pr-validation.yml:

Change detection for packages/economic_agents/**
Automated testing in python-ci container
Code quality checks (black, flake8, pylint, mypy)
Coverage reporting

D. Repository Integration

Docker Compose Integration

The package includes services in docker-compose.yml:

services:
  # Economic Agents - Autonomous agent execution
  economic-agents:
    build:
      context: .
      dockerfile: docker/economic-agents.Dockerfile
    container_name: economic-agents
    user: "${USER_ID:-1000}:${GROUP_ID:-1000}"
    volumes:
      - ./:/app:ro
      - ./outputs/economic-agents:/output
      - economic-agents-data:/data
    environment:
      - PYTHONUNBUFFERED=1
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - MODE=mock
    networks:
      - mcp-network
    profiles:
      - economic-agents
      - simulation

  # Economic Agents Dashboard
  economic-agents-dashboard:
    build:
      context: ./packages/economic_agents/dashboard
      dockerfile: Dockerfile
    container_name: economic-agents-dashboard
    user: "${USER_ID:-1000}:${GROUP_ID:-1000}"
    ports:
      - "8502:8502"
    volumes:
      - ./packages/economic_agents/dashboard:/app:ro
      - economic-agents-data:/data
    environment:
      - PYTHONUNBUFFERED=1
      - STREAMLIT_SERVER_PORT=8502
    networks:
      - mcp-network
    profiles:
      - economic-agents
      - dashboard

volumes:
  economic-agents-data: {}

GitHub Actions Integration

Integrated with .github/workflows/pr-validation.yml:

# Economic Agents Tests
economic-agents-tests:
  name: Economic Agents Tests
  needs: detect-changes
  if: needs.detect-changes.outputs.python_changed == 'true' || contains(github.event.pull_request.title, '[economic-agents]')
  runs-on: self-hosted
  timeout-minutes: 15
  steps:
    - name: Checkout code
      uses: actions/checkout@v4

    - name: Run Economic Agents tests
      run: |
        docker-compose run --rm python-ci pytest packages/economic_agents/tests/ \
          -v --cov=packages.economic_agents --cov-report=xml

    - name: Upload coverage
      uses: codecov/codecov-action@v3
      with:
        files: ./coverage.xml
        flags: economic-agents

Package Configuration (pyproject.toml)

[build-system]
requires = ["setuptools>=61.0", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "economic-agents"
version = "0.1.0"
description = "Autonomous economic agent simulation framework for governance research"
readme = "README.md"
authors = [
    {name = "Andrew Altimit"},
]
license = {text = "MIT"}
requires-python = ">=3.10"
dependencies = [
    "anthropic>=0.18.0",
    "streamlit>=1.30.0",
    "plotly>=5.0.0",
    "pandas>=2.0.0",
    "pydantic>=2.0.0",
    "pyyaml>=6.0",
    "click>=8.0.0",
]

[project.optional-dependencies]
dev = [
    "pytest>=7.4.0",
    "pytest-asyncio>=0.21.0",
    "pytest-cov>=4.1.0",
    "black>=23.0.0",
    "flake8>=6.0.0",
    "pylint>=2.17.0",
    "mypy>=1.5.0",
]
all = [
    # Include dev dependencies by referencing the dev extra
    "economic-agents[dev]",
]

[project.scripts]
economic-agents = "economic_agents.cli:main"

[tool.black]
line-length = 127
target-version = ['py310', 'py311']

[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = ["test_*.py"]
asyncio_mode = "auto"

This PRD defines a comprehensive simulation framework that demonstrates autonomous AI agent entrepreneurship. The system is designed to be:

Safe: Mock environment by default
Educational: Clear decision-making and full transparency
Realistic: Easy connection to real systems
Impactful: Concrete basis for governance discussions
Containerized: Runs consistently across environments
Self-Hosted: Compatible with standard CI/CD infrastructure