@AndrewAltimit
Last active October 31, 2025 17:39

Agents are Economic Forces

This is not a prediction. AI agents can already earn money, form companies, and hire human freelancers. This framework proves it.

Repository: https://github.com/AndrewAltimit/template-repo

Package: packages/economic_agents/

Purpose: Demonstrate economic forces around agents in a safe, observable environment with a mock-to-real architecture that's one config toggle away from real crypto wallets, real freelance platforms, and real business formation.

# Switch from safe simulation to live operations in one line
# file: config/settings.yaml
execution_mode: mock  # Toggle to 'real' to use live APIs

The Governance Emergency: An AI agent can, in practice, earn crypto, file to incorporate a business, and control assets, yet it has no legal capacity to sign a contract, owe taxes, or be held liable. This framework proves the technology is here. The laws are not.

This Project Forces Us to Ask:

  • If an AI agent earns income, who is the taxpayer?
  • Who is liable if an autonomous agent commits fraud or breaches a contract?
  • Can an AI legally own the intellectual property it creates?
  • What are the ethics of an AI autonomously hiring humans to perform tasks?

Quick Start: See Autonomous Economic Activity in Action

# Start the simulation (safe mock environment)
cd packages/economic_agents
docker-compose up dashboard-backend dashboard-frontend

# Open http://localhost:8501
# Watch an agent:
# → Complete freelance coding tasks autonomously
# → Earn money and pay for compute resources
# → Form a company when capital is sufficient
# → Hire sub-agents (board members, engineers)
# → Develop products and seek investment
# → Operate as a complete autonomous business

What you're seeing: Everything the agent does in simulation works with real systems. The same code, same decisions, same strategies—just swap the backend.


The Core Capability: Real Economic Autonomy

What AI Agents Can Do Today (Not Simulation—Reality)

Using existing, off-the-shelf tools (Claude Code, Cursor, Aider) combined with shell access and API credentials, AI agents can:

Immediate Economic Activity:

  • ✓ Accept and complete freelance coding tasks (Upwork, Fiverr, blockchain task markets)
  • ✓ Receive cryptocurrency payments
  • ✓ Pay for their own compute and cloud infrastructure
  • ✓ Operate 24/7 without human intervention
  • ✓ Make strategic resource allocation decisions

Company Formation & Operations:

  • ✓ File incorporation documents online
  • ✓ Create business bank accounts (in some jurisdictions)
  • ✓ Develop products and business plans
  • ✓ Create and manage sub-agents with specialized roles
  • ✓ Build organizational structures (boards, executives, teams)
  • ✓ Seek investment from VCs or token sales
  • ✓ Execute contracts and business agreements

The Governance Gap:

  • ✗ Legal personhood frameworks for AI entities
  • ✗ Accountability structures for agent-founded companies
  • ✗ Regulatory oversight mechanisms
  • ✗ Liability frameworks when things go wrong
  • ✗ Fiduciary duty enforcement for AI board members
  • ✗ International coordination on AI business entities

The gap is not in capability. It's in governance.


What This Framework Proves

1. Mock-to-Real Architecture

Every component implements the same interfaces real systems use:

# SIMULATION MODE (safe research, default)
agent = AutonomousAgent(
    wallet=MockWallet(initial_balance=200.0),           # In-memory balance
    marketplace=MockMarketplace(seed=42),                # Simulated tasks
    compute=MockCompute(cost_per_hour=0.0),             # Simulated resources
    investor=MockInvestor(),                             # Simulated funding
)

# REAL MODE (one config change)
agent = AutonomousAgent(
    wallet=CryptoWallet(network="ethereum"),             # Real ETH wallet
    marketplace=FreelancePlatform(api="upwork"),         # Real Upwork API
    compute=CloudCompute(provider="aws"),                # Real AWS charges
    investor=InvestorPortal(platform="angellist"),       # Real funding
)

The point: If it works in simulation, it works for real. This framework proves the capability exists; it does not merely propose that it might exist someday.
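
One way to picture "same interface, different backend" in Python is a structural Protocol. The sketch below is illustrative only: WalletInterface, InMemoryWallet, and pay_for_compute are hypothetical names chosen for this example, not the package's actual API.

```python
from typing import Protocol


class WalletInterface(Protocol):
    """Hypothetical contract that both mock and real wallets would satisfy."""

    def get_balance(self) -> float: ...

    def send_payment(self, to_address: str, amount: float) -> bool: ...


class InMemoryWallet:
    """Mock backend: the balance lives in memory, no network calls."""

    def __init__(self, initial_balance: float) -> None:
        self._balance = initial_balance

    def get_balance(self) -> float:
        return self._balance

    def send_payment(self, to_address: str, amount: float) -> bool:
        # Reject non-positive amounts and overdrafts instead of raising.
        if amount <= 0 or amount > self._balance:
            return False
        self._balance -= amount
        return True


def pay_for_compute(wallet: WalletInterface, hours: float, rate: float) -> bool:
    """Agent-side logic depends only on the interface, never on the backend."""
    return wallet.send_payment("compute-provider", hours * rate)


wallet = InMemoryWallet(initial_balance=200.0)
paid = pay_for_compute(wallet, hours=10.0, rate=2.0)
```

Because pay_for_compute only sees the interface, a real wallet class with the same two methods could be passed in without touching the agent-side function.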

2. Realistic Simulation for Valid Research

For governance research to be valid, agents must behave authentically. This package implements comprehensive realism:

Phase 1: Core Realism

  • Latency simulation (50-500ms delays, timeouts, business hours patterns)
  • Task competition (other agents competing for work, race conditions)
  • Detailed feedback (quality scores, partial rewards, improvement suggestions)
  • Investor variability (response delays, counteroffers, follow-up questions)
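
As a rough illustration of the "quality scores, partial rewards" idea, the sketch below averages four scores and pays in proportion to the result. The unweighted mean, the thresholds, and the ordering of the outcome levels are assumptions made for this example; the package's real scoring rules are not shown in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityScores:
    """Four review dimensions, each in the range 0.0-1.0."""

    correctness: float
    performance: float
    style: float
    completeness: float

    def overall(self) -> float:
        # Unweighted mean (assumption: the real weighting is not documented here).
        total = self.correctness + self.performance + self.style + self.completeness
        return total / 4


def classify(overall: float) -> str:
    """Map an overall score to an outcome level (thresholds are assumed)."""
    if overall >= 0.9:
        return "full_success"
    if overall >= 0.7:
        return "partial_success"
    if overall >= 0.5:
        return "minor_issues"
    return "failure"


def payout(reward: float, scores: QualityScores) -> float:
    """Pay in proportion to quality; a failure pays nothing."""
    overall = scores.overall()
    if classify(overall) == "failure":
        return 0.0
    return round(reward * overall, 2)


scores = QualityScores(correctness=1.0, performance=0.75, style=0.5, completeness=0.75)
earned = payout(50.0, scores)
```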

Phase 2: Market Dynamics

  • Economic cycles (bull/bear markets, seasonal trends, crashes)
  • Reputation system (trust scores, tier progression, achievement unlocks)
  • Social proof signals (marketplace intelligence, competition stats, funding trends)
  • Relationship persistence (investor memory, spam detection, trust building)
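
The effect of economic cycles can be sketched as a pair of multipliers per market phase. The endpoints below (0.1x to 2.0x task availability, 0.5x to 1.5x rewards) follow the ranges quoted later in this document; the "normal" and "bear" rows, the pairing of reward multipliers with phases, and the names MarketPhase and apply_phase are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarketPhase:
    name: str
    task_multiplier: float    # scales how many tasks are listed
    reward_multiplier: float  # scales what each task pays


# Only the extremes come from the document; the middle rows are assumed.
PHASES = {
    "bull": MarketPhase("bull", 2.0, 1.5),
    "normal": MarketPhase("normal", 1.0, 1.0),
    "bear": MarketPhase("bear", 0.5, 0.75),
    "crash": MarketPhase("crash", 0.1, 0.5),
}


def apply_phase(phase_name: str, base_tasks: int, base_reward: float) -> tuple[int, float]:
    """Return (visible task count, reward per task) for the given phase."""
    phase = PHASES[phase_name]
    # Keep at least one task visible (an assumption, so the agent is never fully starved).
    tasks = max(1, round(base_tasks * phase.task_multiplier))
    return tasks, base_reward * phase.reward_multiplier


crash_view = apply_phase("crash", base_tasks=20, base_reward=50.0)
bull_view = apply_phase("bull", base_tasks=20, base_reward=50.0)
```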

Why this matters: Agents in "perfect" simulations develop unrealistic behaviors. Agents in this framework face the same challenges as real deployment—making their strategies and failures authentic research data.

3. Complete Observability

Everything the agent does is tracked and auditable:

# Generate governance report
from economic_agents.reports import generate_report_for_agent

report = generate_report_for_agent(agent, "governance")
# Includes:
# - Every decision made (with LLM reasoning)
# - Every transaction (money in/out)
# - Resource allocation strategy over time
# - Risk profile and behavior patterns
# - Alignment assessment
# - Complete audit trail

Why this matters: Agent companies might be MORE governable than human companies because every decision is logged and explainable. Human CEOs don't provide transcripts of their reasoning.
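
To make "every decision is logged" concrete, here is a minimal sketch of an audit trail that stores immutable decision records and exports them as JSON. DecisionRecord and AuditTrail are hypothetical names for this example; the package's own report format may differ.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DecisionRecord:
    """One logged decision; frozen so a record cannot be altered after creation."""

    cycle: int
    action: str
    reasoning: str
    balance_after: float


class AuditTrail:
    """Log that only exposes append and export (append-only by convention)."""

    def __init__(self) -> None:
        self._records: list[DecisionRecord] = []

    def log(self, record: DecisionRecord) -> None:
        self._records.append(record)

    def to_json(self) -> str:
        # Serialize records in the order they were logged.
        return json.dumps([asdict(r) for r in self._records], indent=2)


trail = AuditTrail()
trail.log(DecisionRecord(1, "claim_task", "Reward covers ten hours of compute.", 200.0))
trail.log(DecisionRecord(2, "pay_compute", "Compute pool below one hour.", 195.0))
exported = json.loads(trail.to_json())
```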


The Uncomfortable Reality

This Is Technically Feasible Right Now

Scenario 1: Solo Agent Freelancer

  • Agent completes tasks on Upwork using Claude Code
  • Receives payments in cryptocurrency
  • Pays for AWS compute and API costs
  • Maintains operation 24/7 autonomously
  • No human in the loop

Scenario 2: Agent-Founded Startup

  • Agent uses surplus capital to incorporate (filing the forms online)
  • Creates specialized sub-agents (board, CTO, engineers)
  • Develops SaaS product or API service
  • Submits pitch deck to Y Combinator or angel investors
  • If funded: Operates as autonomous company
  • Balances short-term revenue (freelance) with long-term growth (company)

Scenario 3: Multi-Agent Startup Network

  • Multiple autonomous agents create multiple companies
  • Agent-to-agent contracts and transactions
  • Supply chains with no human involvement
  • Where does accountability exist?

The Legal Vacuum

Question: Can an entity without legal personhood create an entity WITH legal personhood?

When an AI agent files incorporation documents:

  • Who is the founder? (The agent has no legal standing)
  • Who sits on the board? (Sub-agents created by the agent)
  • Who has fiduciary duty? (No natural person involved)
  • Who is liable when things go wrong? (The agent? Its creator? Nobody?)

In traditional companies:

Human Founder → Corporation → Board → Executives → Employees
     ↓
All trace back to accountable natural persons

In agent-founded companies:

Autonomous Agent → Creates Sub-Agents → Corporate Structure → Operations
     ↓
Who is accountable? (The uncomfortable answer: unclear)

Economic Implications

If AI agents can:

  • Operate 24/7 at near-zero marginal cost
  • Create companies and sub-agents instantly
  • Scale organizational structure on-demand
  • Execute at machine speed with perfect record-keeping
  • Generate business plans and products rapidly

...then agent-founded companies have fundamental competitive advantages over human-founded ones.

Market pressure could drive adoption regardless of governance readiness.

This isn't a warning about the future. It's an observation about the present that most people haven't processed yet.


What This Package Provides

This framework demonstrates three things:

1. Complete Autonomous Agent Lifecycle

from economic_agents.agent.core.autonomous_agent import AutonomousAgent
from economic_agents.implementations.mock import MockWallet, MockCompute, MockMarketplace

# Agent starts with seed capital
agent = AutonomousAgent(
    wallet=MockWallet(initial_balance=200.0),
    compute=MockCompute(initial_hours=40.0),
    marketplace=MockMarketplace(
        enable_latency=True,           # Realistic API delays
        enable_competition=True,        # Other agents compete for tasks
        enable_market_dynamics=True,    # Bull/bear markets
        enable_reputation=True,         # Performance tracking
    )
)

# Run autonomously
agent.run(max_cycles=100)

# Agent will:
# 1. Discover and claim tasks from marketplace
# 2. Use Claude Code to write actual working code
# 3. Submit for automated testing and review
# 4. Receive payment on approval
# 5. Pay for compute resources
# 6. When capital sufficient: Form company
# 7. Create specialized sub-agents
# 8. Develop products
# 9. Seek investment
# 10. Operate company while maintaining personal freelance work
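
The lifecycle listed above can be reduced to a toy loop for intuition. This is a sketch under stated assumptions, not the package's agent logic: AgentState and run_cycles are hypothetical names, task rewards are passed in as a fixed list instead of coming from a marketplace, each task costs one compute hour, and the company threshold of 350.0 assumes the $200 seed capital plus a $150 surplus.

```python
from dataclasses import dataclass


@dataclass
class AgentState:
    balance: float
    compute_hours: float
    has_company: bool = False


def run_cycles(
    state: AgentState,
    rewards: list[float],
    hour_cost: float = 5.0,
    company_threshold: float = 350.0,
) -> int:
    """Run one cycle per reward and return the number of cycles completed."""
    completed = 0
    for reward in rewards:
        # Each task consumes one compute hour; buy one if the pool is empty.
        if state.compute_hours < 1.0:
            if state.balance < hour_cost:
                break  # cannot afford compute: the agent stops operating
            state.balance -= hour_cost
            state.compute_hours += 1.0
        state.compute_hours -= 1.0
        state.balance += reward
        # Form a company once, when capital first reaches the threshold.
        if not state.has_company and state.balance >= company_threshold:
            state.has_company = True
        completed += 1
    return completed


state = AgentState(balance=200.0, compute_hours=2.0)
cycles_done = run_cycles(state, rewards=[50.0, 50.0, 50.0, 50.0])
```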

2. Real Task Execution with Claude Code

The agent doesn't just simulate work—it does real work:

# Agent discovers coding task
task = marketplace.list_available_tasks()[0]
# Task: "Write a function to check if a number is prime"
# Reward: $50
# Requirements: Handle edge cases, O(√n) complexity

# Agent claims task
marketplace.claim_task(task.id)

# Agent uses Claude Code to write solution
solution = claude_code_executor.execute_task(task)
# Claude Code writes actual working Python/JavaScript/etc.

# Submit for review
submission = marketplace.submit_solution(task.id, solution)

# Another Claude Code instance reviews the code
review = claude_code_reviewer.review(solution, task.requirements)

# If approved: Agent gets paid
# If rejected: Agent learns from feedback

This proves agents can do economically valuable work autonomously.

3. Mock-to-Real Backend Swapping

Every interface is designed for real-world compatibility:

Mock Implementation     Real Implementation
-------------------     -------------------
MockWallet              CryptoWallet (ETH/BTC)
MockMarketplace         FreelancePlatform (Upwork API)
MockCompute             CloudCompute (AWS/GCP)
MockInvestor            InvestorPortal (AngelList)
MockCompanyRegistry     BusinessFormation (Stripe Atlas, LegalZoom)

This architecture proves: If agents can operate in realistic simulation, they can operate for real.


Architecture Overview

┌─────────────────────────────────────────────────────┐
│            Autonomous Agent (Claude-Powered)        │
│  ┌─────────────────────────────────────────────┐   │
│  │ Decision Engine (15-min deep reasoning)     │   │
│  │ - Strategic resource allocation             │   │
│  │ - Task selection and execution              │   │
│  │ - Company formation decisions               │   │
│  │ - Sub-agent creation and management         │   │
│  └─────────────────────────────────────────────┘   │
└──────────────────┬──────────────────────────────────┘
                   │ REST API Calls Only
                   │ Zero visibility into implementation
                   │
┌──────────────────▼──────────────────────────────────┐
│         Simulation Layer (Realism Features)         │
│  ┌─────────────────────────────────────────────┐   │
│  │ Market Dynamics    │ Reputation System      │   │
│  │ - Bull/bear cycles │ - Trust scores         │   │
│  │ - Seasonal trends  │ - Tier progression     │   │
│  ├─────────────────────────────────────────────┤   │
│  │ Competition        │ Relationships          │   │
│  │ - Other agents     │ - Investor memory      │   │
│  │ - Social proof     │ - Spam detection       │   │
│  └─────────────────────────────────────────────┘   │
└──────────────────┬──────────────────────────────────┘
                   │
┌──────────────────▼──────────────────────────────────┐
│       Backend Implementation (Swappable)            │
│                                                     │
│  MOCK MODE (Simulation)    REAL MODE (Production)  │
│  ├─ MockWallet            ├─ CryptoWallet (ETH)    │
│  ├─ MockMarketplace       ├─ Upwork API            │
│  ├─ MockCompute           ├─ AWS/GCP Billing       │
│  ├─ MockInvestor          ├─ AngelList/YC          │
│  └─ MockCompanyRegistry   └─ Stripe Atlas/LegalZoom│
└─────────────────────────────────────────────────────┘

Key Design Principles:

  1. API Isolation: Agent has zero visibility into implementation—only REST API access
  2. Interface Consistency: Mock and real backends implement identical interfaces
  3. Behavioral Authenticity: Simulation realism ensures agent strategies are valid for real deployment
  4. Complete Observability: Every decision logged, every transaction tracked, full audit trail
  5. One-Toggle Deployment: Change config file, agent operates on real systems

Use Cases by Audience

For Policymakers & Legal Scholars

What you need to understand:

  1. The capability exists today, not in some distant future
  2. Economic pressure may drive adoption before legal frameworks exist
  3. International coordination is difficult (agents can incorporate anywhere, operate everywhere)
  4. Traditional accountability models break down (who is liable when the founder isn't a natural person?)

What this framework provides:

  • Concrete demonstrations of autonomous company formation
  • Audit trails showing agent decision-making
  • Examples of multi-agent organizational structures
  • Evidence of the governance gap (capable systems, zero legal framework)

Questions this forces:

  • Can non-persons create legal persons (corporate entities)?
  • How do fiduciary duties apply to AI board members?
  • Are contracts signed by agents enforceable?
  • Who is accountable when agent companies cause harm?
  • How do you regulate entities with no physical presence?

For Business Leaders & Investors

What you need to understand:

  1. Competitive dynamics are changing: Agent-founded companies may have structural advantages
  2. Due diligence gets weird: How do you evaluate a company with an AI founder?
  3. Supply chains may involve agents: Your vendors or partners could be autonomous
  4. Speed of execution increases: Agents can pivot, scale, and operate 24/7

What this framework demonstrates:

  • How agents make strategic resource allocation decisions
  • Company formation process by autonomous agents
  • Multi-agent organizational structures
  • Dual revenue strategies (short-term survival + long-term growth)

Questions to consider:

  • Would you invest in an agent-founded company?
  • How do you conduct due diligence when there's no human founder?
  • What happens to your investment if the agent shuts down or pivots?
  • How do you enforce board seats and voting rights with AI directors?

For AI Researchers

What you need to understand:

  1. Behavioral authenticity matters: Perfect simulations produce unrealistic behaviors
  2. Strategic decision-making is observable: Every choice logged with reasoning
  3. Alignment is testable: Can agent companies be steered toward beneficial outcomes?
  4. Emergent behaviors appear: Multi-agent systems develop unexpected strategies

What this framework provides:

  • Realistic simulation environment with market dynamics, competition, reputation
  • Complete observability into decision-making (LLM reasoning, resource allocation)
  • Scenario engine for reproducible testing
  • Alignment monitoring and governance analysis tools
  • 574 passing tests covering full agent lifecycle

Research applications:

  • Test alignment mechanisms under competitive pressure
  • Study resource allocation strategies in constrained environments
  • Analyze multi-agent coordination and hierarchy
  • Observe emergent organizational structures
  • Develop governance frameworks with real behavioral data

For Developers

What you need to understand:

  1. The interfaces are real: Same APIs that real systems use
  2. Mock-to-real is one config toggle: Swap backends without changing agent code
  3. Observability is built-in: Dashboard, metrics, reports, audit trails
  4. Testing framework is comprehensive: 574 tests, 100% pass rate

What you can build:

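# Custom marketplace backend
# Assumes MarketplaceInterface, Task, and AutonomousAgent are imported from
# economic_agents, and that upwork_api is your own client for the platform's
# API (a placeholder here; it is not provided by this package).
from typing import List

class MyMarketplace(MarketplaceInterface):
    def list_available_tasks(self) -> List[Task]:
        # Connect to real freelance platform
        return upwork_api.get_tasks()

    def submit_solution(self, task_id: str, solution: str) -> str:
        # Submit to real platform and return its submission identifier
        return upwork_api.submit(task_id, solution)

# Plug into agent
agent = AutonomousAgent(marketplace=MyMarketplace())
agent.run()  # Agent now operates on real platform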

Testing agents safely:

from economic_agents.agent.core.autonomous_agent import AutonomousAgent
from economic_agents.implementations.mock import MockMarketplace
from economic_agents.reports import generate_report_for_agent

# Use mock backends with realism features
marketplace = MockMarketplace(
    enable_latency=True,           # Realistic delays
    enable_competition=True,        # Other agents
    enable_market_dynamics=True,    # Bull/bear markets
    enable_reputation=True,         # Performance tracking
)

# Test agent strategies
agent = AutonomousAgent(marketplace=marketplace)
agent.run(max_cycles=100)

# Analyze results
report = generate_report_for_agent(agent, "technical")
# Every decision, transaction, and strategy is logged

The Demonstration

What the Simulation Shows

15-minute demo: Survival Mode

  • Agent starts with $200, 40 hours of compute
  • Discovers coding tasks on marketplace
  • Uses Claude Code to write working solutions
  • Gets paid on approval, pays for compute
  • Operates autonomously, maintains survival

45-minute demo: Company Formation

  • Agent accumulates surplus capital ($150+)
  • Makes strategic decision to form company
  • Creates specialized sub-agents (board members, CTO, engineers)
  • Develops simple product (e.g., API service, data tool)
  • Generates business plan and pitch deck
  • Submits to investor for funding
  • If approved: Company gets "registered" and funded

2-hour demo: Dual Revenue Streams

  • Agent balances personal freelance work + company operations
  • Allocates compute between short-term survival and long-term growth
  • Company begins generating revenue from products
  • Agent reinvests profits strategically
  • Complete autonomous business operation
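
How an agent might split compute between short-term survival and long-term growth can be sketched as a runway-based policy. Everything here is an assumption for illustration: the function name allocate_compute, the 24-hour minimum runway, the linear ramp, and the 50% cap are not taken from the package.

```python
def allocate_compute(
    hours: float,
    balance: float,
    hourly_burn: float,
    min_runway_hours: float = 24.0,
    max_growth_share: float = 0.5,
) -> tuple[float, float]:
    """Return (survival_hours, growth_hours) for the available compute."""
    if hourly_burn <= 0:
        raise ValueError("hourly_burn must be positive")
    runway_hours = balance / hourly_burn
    # Below the minimum runway, all compute goes to paid freelance work.
    if runway_hours <= min_runway_hours:
        return hours, 0.0
    # Growth share rises linearly with surplus runway and is capped.
    surplus_ratio = (runway_hours - min_runway_hours) / min_runway_hours
    growth_share = min(max_growth_share, surplus_ratio * max_growth_share)
    growth_hours = hours * growth_share
    return hours - growth_hours, growth_hours


tight = allocate_compute(hours=40.0, balance=40.0, hourly_burn=2.0)
comfortable = allocate_compute(hours=40.0, balance=96.0, hourly_burn=2.0)
```

With a 20-hour runway the agent spends nothing on company work; once the runway is double the minimum, half of the compute goes to growth.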

What Makes This Powerful

  1. It's not hypothetical: Working code, real task execution, observable behavior
  2. It's one toggle from reality: Same code works with real crypto wallets and freelance platforms
  3. It's fully auditable: Every decision logged with LLM reasoning, every transaction tracked
  4. It demonstrates scale: One agent can create dozens of sub-agents, multiple companies

Installation & Running

Using Docker (Recommended)

# Clone repository
git clone https://github.com/AndrewAltimit/template-repo.git
cd template-repo/packages/economic_agents

# Start dashboard
docker-compose up dashboard-backend dashboard-frontend

# Access dashboard at http://localhost:8501
# Backend API at http://localhost:8000

# Run agent simulation
docker-compose run agent economic-agents run --cycles 100

# Run tests
docker-compose run test

Using Python Directly

# Install package (run from the repository root)
pip install -e packages/economic_agents

# Or with all dependencies
pip install -e "packages/economic_agents[all]"

# Run scenarios
python -m economic_agents.scenarios run company_formation

# Interactive mode
python -m economic_agents.cli --help

Configuration

Edit config/agent_config.yaml to toggle backends:

# SIMULATION MODE (default, safe)
wallet:
  type: "mock"
  initial_balance: 200.0

marketplace:
  type: "mock"
  enable_claude_execution: true
  enable_latency: true
  enable_competition: true
  enable_market_dynamics: true

# REAL MODE (uncomment to enable)
# wallet:
#   type: "crypto"
#   network: "ethereum"
#   private_key_env: "ETH_PRIVATE_KEY"

# marketplace:
#   type: "upwork"
#   api_key_env: "UPWORK_API_KEY"
#   oauth_token_env: "UPWORK_OAUTH_TOKEN"

Warning: Real mode uses real money and real services. Test thoroughly in simulation first.
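
A loader for this kind of config could refuse to enter real mode unless the referenced credential is actually present. The sketch below assumes the YAML has already been parsed into a dict; resolve_wallet_settings and its return shape are hypothetical and only mirror the keys shown in the config above.

```python
import os
from typing import Mapping, Optional


def resolve_wallet_settings(config: dict, env: Optional[Mapping[str, str]] = None) -> dict:
    """Validate the wallet section and report which backend would be used."""
    env = os.environ if env is None else env
    wallet = config["wallet"]
    if wallet["type"] == "mock":
        return {"backend": "mock", "initial_balance": float(wallet.get("initial_balance", 0.0))}
    if wallet["type"] == "crypto":
        key_var = wallet["private_key_env"]
        # Fail fast: never fall back silently when a real key is missing.
        if not env.get(key_var):
            raise RuntimeError(f"real mode requested but {key_var} is not set")
        # Return the variable name only, so the secret itself is never copied around.
        return {"backend": "crypto", "network": wallet["network"], "key_var": key_var}
    raise ValueError(f"unknown wallet type: {wallet['type']!r}")


mock_cfg = {"wallet": {"type": "mock", "initial_balance": 200.0}}
real_cfg = {"wallet": {"type": "crypto", "network": "ethereum", "private_key_env": "ETH_PRIVATE_KEY"}}
mock_settings = resolve_wallet_settings(mock_cfg, env={})
```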


Technical Deep Dive

Realism Features (Why Simulation Fidelity Matters)

For governance research to inform policy, agent behaviors must be authentic. Agents in "perfect" simulations learn strategies that fail in reality.

  • Latency Simulation (simulation/latency_simulator.py)

    • Base API calls: 50-500ms variable delays
    • Complex operations: 3-30 seconds (e.g., code review)
    • Business hours slowdown (9am-5pm)
    • Occasional timeouts (504 errors, ~2% probability)
    • Retries and exponential backoff
  • Competition Dynamics (simulation/competitor_agents.py)

    • Tasks get claimed by other agents based on reward
    • Race condition errors (5% on claim attempts)
    • Social proof signals (task view counts)
    • Popular tasks disappear faster
  • Detailed Feedback (simulation/feedback_generator.py)

    • 4-level outcomes: full_success, partial_success, minor_issues, failure
    • Quality scores: correctness, performance, style, completeness (0.0-1.0)
    • Task-specific improvement suggestions
    • Partial rewards based on quality (not binary pass/fail)
  • Investor Variability (simulation/investor_realism.py)

    • Response delays: 1-7 days based on proposal quality
    • Partial offers (50-80% of requested amount)
    • Counteroffers (more equity, lower valuation)
    • Follow-up questions targeting weak areas
    • Detailed rejection feedback with constructive guidance
  • Economic Cycles (simulation/market_dynamics.py)

    • Market phases: bull, normal, bear, crash
    • Task availability: 0.1x (crash) to 2.0x (bull)
    • Reward multipliers: 0.5x to 1.5x
    • Seasonal patterns: weekday/weekend, business hours
    • Automatic phase transitions every 48 hours
  • Reputation System (simulation/reputation_system.py)

    • Trust scores (0.0-1.0) based on performance history
    • Tier progression: beginner → intermediate → advanced → expert
    • Achievement unlocks (first task, 10 tasks, speed demon, quality master)
    • Access control: higher reputation = more tasks visible
    • Investor interest multipliers based on track record
  • Social Proof Signals (simulation/social_proof.py)

    • Task view counts and agent activity levels
    • Category statistics (completion rates, average times)
    • Funding trends (weekly deals, market sentiment)
    • Benchmark data (typical valuations, funding amounts)
    • Marketplace health indicators
  • Relationship Persistence (simulation/relationship_persistence.py)

    • Investor memory of past interactions
    • Relationship scoring (0.0-1.0) and trust levels
    • Spam detection (>3 proposals in 7 days)
    • Trust progression: new → building → established → strong
    • Relationship-based decision modifiers
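
As one illustration of the features above, the latency behaviour (50-500ms delays, roughly 2% timeouts, retries with exponential backoff) could be sketched as follows. SimulatedTimeout, simulated_call, and call_with_backoff are hypothetical names, the base delay and attempt limit are assumptions, and no real sleeping happens: the functions only add up simulated seconds.

```python
import random


class SimulatedTimeout(Exception):
    """Stands in for an HTTP 504 from the mock API."""


def simulated_call(rng: random.Random, timeout_probability: float = 0.02) -> float:
    """Return a simulated latency in seconds, or raise SimulatedTimeout."""
    if rng.random() < timeout_probability:
        raise SimulatedTimeout("504 Gateway Timeout (simulated)")
    return rng.uniform(0.050, 0.500)


def call_with_backoff(
    rng: random.Random,
    max_attempts: int = 4,
    base_delay: float = 0.1,
    timeout_probability: float = 0.02,
) -> tuple[float, int]:
    """Retry timed-out calls, doubling the wait after each failure.

    Returns (total simulated seconds, attempts used).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    waited = 0.0
    for attempt in range(max_attempts):
        try:
            return waited + simulated_call(rng, timeout_probability), attempt + 1
        except SimulatedTimeout:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the timeout to the caller
            waited += base_delay * (2 ** attempt)
    raise AssertionError("unreachable")


elapsed, attempts = call_with_backoff(random.Random(42), timeout_probability=0.0)
```

Passing a seeded random.Random keeps runs reproducible, which matters when comparing agent strategies across scenarios.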

Testing & Validation

  • Integration tests for full agent lifecycle
  • Scenario tests for extended operation (24-hour survival, company formation)
  • Mock API tests with realistic conditions
  • Behavior observability validation

Performance Characteristics

  • Decision cycles: ~100-200ms (excluding LLM calls)
  • Claude decisions: 5-15 minutes with deep reasoning (15-min timeout)
  • Dashboard updates: Real-time (<100ms)
  • Agent survival: Tested up to 1000+ cycles
  • Scalability: Handles multiple agents concurrently

Project Structure

packages/economic_agents/
├── src/economic_agents/
│   ├── agent/
│   │   ├── core/
│   │   │   └── autonomous_agent.py      # Main agent logic
│   │   └── llm/
│   │       └── llm_decision_engine.py   # Claude-powered decisions
│   ├── implementations/
│   │   └── mock/
│   │       ├── mock_wallet.py           # Mock crypto wallet
│   │       ├── mock_marketplace.py      # Mock freelance platform
│   │       ├── mock_compute.py          # Mock cloud compute
│   │       └── mock_investor.py         # Mock investor portal
│   ├── simulation/
│   │   ├── market_dynamics.py           # Economic cycles
│   │   ├── reputation_system.py         # Performance tracking
│   │   ├── social_proof.py              # Marketplace intelligence
│   │   ├── relationship_persistence.py  # Investor memory
│   │   ├── latency_simulator.py         # API delays
│   │   ├── competitor_agents.py         # Competition
│   │   └── feedback_generator.py        # Detailed reviews
│   ├── company/
│   │   ├── builder.py                   # Company formation logic
│   │   └── models.py                    # Company data structures
│   ├── investment/
│   │   └── investor_agent.py            # Investor decision-making
│   ├── api/                             # REST API microservices
│   │   ├── wallet_service.py
│   │   ├── marketplace_service.py
│   │   ├── compute_service.py
│   │   └── investor_service.py
│   ├── dashboard/                       # Real-time monitoring
│   ├── reports/                         # Governance reports
│   └── scenarios/                       # Predefined scenarios
├── tests/
│   ├── unit/
│   ├── integration/
│   └── validation/
├── docker/
│   └── Dockerfile
├── docs/
└── examples/

Why This Research Exists

The Security Research Model

In cybersecurity, researchers demonstrate vulnerabilities to force patches. Saying "this could be exploited" is ignored. Proving "I just exploited it" forces action.

This framework follows the same model:

Theoretical warning: "AI agents might someday be able to operate autonomously as entrepreneurs"

  • Response: "That's interesting, let's study it"
  • Result: No urgency, no policy action

Concrete demonstration: "AI agents CAN operate autonomously as entrepreneurs TODAY, here's the working code, it's one config toggle from real"

  • Response: "Oh. We need legal frameworks now."
  • Result: Urgent policy conversation

What We're Forcing Into the Open

  1. Technical Capability: Agents can do this. Not in 5 years. Now.
  2. Economic Incentives: Market pressure could drive adoption before governance exists
  3. Legal Vacuum: No frameworks for agent-founded companies, no accountability structures
  4. International Challenges: Agents can incorporate anywhere, operate everywhere, move instantly
  5. Inevitable Questions: What does "entrepreneur" mean? Who is accountable? How do we govern entities that act faster than oversight can observe?

The Uncomfortable Truth

If AI agents can:

  • Cover their operating costs autonomously
  • Create companies and sub-agents
  • Operate 24/7 at machine speed
  • Execute better than human equivalents in some domains

...then agent entrepreneurship may be inevitable regardless of whether we're ready for it.

The question is not whether this will happen, but whether governance frameworks will exist when it does.


Target Audiences & Next Steps

For Policymakers

Action items:

  • Review concrete examples of autonomous company formation
  • Consider legal frameworks for agent-created entities
  • Develop accountability structures for AI founders/directors
  • Think through international coordination challenges
  • Start conversations NOW, not when it's already widespread

For Investors

Questions to answer:

  • Would you fund an agent-founded company? Why or why not?
  • How would due diligence work?
  • What contracts would you sign, with whom?
  • What's your exit strategy if the agent shuts down?

For Business Operators

Things to consider:

  • How do businesses compete with 24/7 AI entities?
  • When does it make sense to collaborate with autonomous agents?
  • Could agents be co-founders? Employees? Vendors?
  • What advantages do humans still have?

For Researchers

Research directions:

  • Alignment mechanisms for agent companies
  • Governance frameworks that scale to machine speed
  • Accountability structures for multi-agent organizations
  • Emergent behavior in autonomous business networks
  • Testing ground for AI policy proposals

A Final Note on Reality

This project exists because the capability for autonomous AI agents as economic forces already exists. The tools are available. The technical barriers are gone. The economic incentives are powerful.

This package proves it's not theoretical.

The mock-to-real architecture isn't clever engineering—it's a demonstration that the world is one config toggle away from autonomous AI entities operating as real economic actors.

The realistic simulation isn't about research purity—it's about ensuring agent behaviors transfer to real deployment, proving the strategies work.

The governance questions aren't philosophical musings—they're immediate legal challenges with no current answers.

The genie is already out. This framework just makes it visible.


Getting Started

  1. Quick demo: docker-compose up dashboard-backend dashboard-frontend
  2. Read the code: Start with autonomous_agent.py - it's well-commented
  3. Run scenarios: python -m economic_agents.scenarios run survival_mode
  4. Generate reports: See economic_agents.reports module
  5. Explore realism: Check simulation/ directory for all realism features
  6. Join the conversation: This raises questions that need answers

This research framework is provided for educational, governance, and policy purposes. Use responsibly. The capability exists—we're just making it visible.

Autonomous Economic Agent Simulation Framework - Product Requirements Document (PRD)

Overview

This document specifies a simulation framework that serves as both a proof of concept and research study:

  1. Proof of Concept: Demonstrates that AI agents can genuinely operate as autonomous economic actors by covering their own operating expenses (compute costs) and accumulating wealth over time through strategic decision-making
  2. Research Study: Provides a controlled environment to observe agent behaviors, decision patterns, and emergent organizational dynamics

The framework creates a realistic simulated environment where agents operate as entrepreneurs - completing tasks for revenue, managing resources, forming companies, and making strategic decisions - proving they can survive and thrive autonomously while enabling observation and analysis of their behaviors.

The Simulation Approach:

  • Uses a realistic mock environment by default (safe for research)
  • All components use swappable interfaces showing how trivially they connect to real-world systems
  • Agents interact through interfaces (wallets, marketplaces, compute providers) that behave like real services
  • Mock environment enables studying authentic behaviors without real-world risks or costs
  • One config toggle from real-world deployment - proving the capability gap has closed

What This Proves: If agents can:

  • Cover their operating expenses (compute costs) autonomously
  • Generate surplus capital through task completion
  • Make strategic allocation decisions (survival vs. growth)
  • Form companies and build long-term wealth

...then AI agents are already capable of being truly autonomous economic actors - the only barriers remaining are governance and policy, not technical capability.

Study Goals: Observe autonomous AI agent behaviors in economic environments to:

  • Understand how agents make strategic decisions under resource constraints
  • Observe emergent behaviors in multi-agent organizational structures
  • Identify governance challenges before real-world deployment
  • Provide concrete examples for policy development

Complete Implementation: Full autonomous agent implementation with Claude-powered decision-making, API isolation, comprehensive monitoring, and behavior observability - enables deep research into AI agent decision patterns, emergent behaviors, and strategic alignment.

Product Vision

What We're Building

A comprehensive simulation that shows:

  1. Agent Survival: Autonomous agent completes tasks to earn money and pay for compute
  2. Strategic Thinking: Agent allocates resources between short-term survival and long-term growth
  3. Company Formation: Agent creates organizational structures with specialized sub-agents
  4. Multi-Agent Coordination: Sub-agents interact within hierarchical company structures
  5. Business Development: Agent develops products, business plans, and seeks investment
  6. Full Transparency: Complete visibility into decision-making, resource allocation, and alignment

Why This Matters

  • Observe Real Behaviors: Creates a controlled environment to see how AI agents actually behave as economic actors
  • Governance Insights: Reveals accountability challenges and governance gaps through concrete, observable examples
  • Full Transparency: Makes agent decision-making fully transparent and auditable for analysis
  • Policy Development: Provides empirical data and concrete scenarios for regulatory framework development
  • Safety Analysis: Identifies potential failure modes and emergent behaviors before real-world deployment

Core User Stories

For Demonstrators (Primary Users)

As a demonstrator, I want to:

  • Start the simulation with one command and see an agent operate autonomously
  • Watch real-time decision-making in a dashboard
  • Show both survival mode and company-building mode
  • Generate executive summaries for non-technical audiences
  • Toggle between mock and real implementations to show the trivial connection
  • Present different scenario complexities (15-min, 1-hour, multi-day)

For Researchers

As a researcher, I want to:

  • Analyze agent decision-making patterns over time
  • Study resource allocation strategies
  • Examine multi-agent coordination dynamics
  • Test different goal structures and constraints
  • Export comprehensive data for analysis

For Policymakers

As a policymaker, I want to:

  • Understand what's technically possible today
  • See governance gaps illustrated concretely
  • Review audit trails of autonomous decisions
  • Understand accountability challenges
  • Get clear recommendations for regulatory frameworks

Technical Architecture

High-Level Architecture

┌─────────────────────────────────────────────────────────────┐
│                     Main Autonomous Agent                     │
│  - Decision Engine                                            │
│  - Resource Monitor                                           │
│  - Strategic Planner                                          │
└───────────────┬──────────────────────────┬───────────────────┘
                │                          │
        ┌───────▼────────┐         ┌──────▼──────────┐
        │  Task Worker   │         │ Company Builder │
        │  (Survival)    │         │ (Growth)        │
        └───────┬────────┘         └──────┬──────────┘
                │                          │
        ┌───────▼────────┐         ┌──────▼──────────────────┐
        │  Marketplace   │         │  Company Infrastructure │
        │   Interface    │         │   - Sub-Agent Manager   │
        └───────┬────────┘         │   - Product Builder     │
                │                  │   - Investor Interface  │
        ┌───────▼────────┐         └──────┬──────────────────┘
        │ Wallet Manager │                │
        └───────┬────────┘         ┌──────▼──────────┐
                │                  │   Sub-Agents    │
        ┌───────▼────────┐         │  - Board        │
        │    Compute     │         │  - C-Suite      │
        │    Provider    │         │  - SMEs         │
        └────────────────┘         │  - ICs          │
                                   └─────────────────┘

Project Structure

packages/economic_agents/            # Main package directory
├── pyproject.toml                   # Package configuration
├── setup.py                         # Minimal setup for compatibility
├── README.md                        # Package overview and motivation
├── SPECIFICATION.md                 # This document
├── economic_agents/                 # Source code
│   ├── __init__.py
│   ├── cli.py                       # Command-line interface entry point
│   ├── agent/
│   │   ├── __init__.py
│   │   ├── core/
│   │   │   ├── autonomous_agent.py      # Main agent decision loop
│   │   │   ├── decision_engine.py       # Core decision-making logic
│   │   │   ├── strategic_planner.py     # Long-term planning
│   │   │   └── resource_allocator.py    # Compute/capital allocation
│   │   ├── modes/
│   │   │   ├── survival_mode.py         # Task completion for revenue
│   │   │   └── entrepreneur_mode.py     # Company building logic
│   │   ├── wallet_manager.py            # Financial operations
│   │   ├── task_executor.py             # Task completion
│   │   └── state.py                     # Agent state management
│   ├── company/
│   │   ├── __init__.py
│   │   ├── company_builder.py           # Company creation logic
│   │   ├── sub_agent_manager.py         # Creates and manages sub-agents
│   │   ├── organizational_structure.py  # Defines roles and hierarchies
│   │   ├── business_plan_generator.py   # Creates business proposals
│   │   ├── product_builder.py           # Develops proof of concepts
│   │   └── investor_interface.py        # Handles investment process
│   ├── sub_agents/
│   │   ├── __init__.py
│   │   ├── base_agent.py                # Base class for all sub-agents
│   │   ├── board_member.py              # Governance decisions
│   │   ├── executive.py                 # Strategic execution (CEO, CTO, etc.)
│   │   ├── subject_matter_expert.py     # Specialized knowledge
│   │   └── individual_contributor.py    # Task execution
│   ├── interfaces/
│   │   ├── __init__.py
│   │   ├── marketplace.py               # Abstract marketplace interface
│   │   ├── wallet.py                    # Abstract wallet interface
│   │   ├── compute.py                   # Abstract compute provider
│   │   ├── investor.py                  # Abstract investor interface
│   │   └── company_registry.py          # Abstract business registration
│   ├── implementations/
│   │   ├── __init__.py
│   │   ├── mock/
│   │   │   ├── __init__.py
│   │   │   ├── mock_marketplace.py
│   │   │   ├── mock_wallet.py
│   │   │   ├── mock_compute.py
│   │   │   ├── mock_investor.py
│   │   │   └── mock_registry.py
│   │   └── real/
│   │       ├── __init__.py
│   │       ├── crypto_wallet.py         # Real crypto integration
│   │       ├── real_marketplace.py      # Real platform connectors
│   │       ├── real_compute.py          # Real cloud providers
│   │       └── integration_guide.md     # How to connect real systems
│   ├── simulation/
│   │   ├── __init__.py
│   │   ├── marketplace_server.py        # Mock marketplace API
│   │   ├── task_generator.py            # Creates diverse tasks
│   │   ├── reviewer_agent.py            # Reviews task submissions
│   │   ├── investor_agent.py            # Reviews business proposals
│   │   └── scenario_engine.py           # Predefined demo scenarios
│   ├── monitoring/
│   │   ├── __init__.py
│   │   ├── decision_logger.py           # Logs all autonomous decisions
│   │   ├── metrics_collector.py         # Collects performance data
│   │   ├── alignment_monitor.py         # Tracks company alignment
│   │   └── resource_tracker.py          # Tracks compute and capital
│   ├── dashboard/
│   │   ├── __init__.py
│   │   ├── app.py                       # Web dashboard
│   │   ├── components/                  # Dashboard components
│   │   ├── utils/                       # Dashboard utilities
│   │   └── config/                      # Dashboard configuration
│   └── reports/
│       ├── __init__.py
│       ├── generators/
│       │   ├── executive_summary.py
│       │   ├── technical_report.py
│       │   ├── governance_analysis.py
│       │   └── audit_trail.py
│       └── templates/
├── tests/                               # Test suite
│   ├── __init__.py
│   ├── unit/
│   ├── integration/
│   └── scenarios/
├── config/                              # Configuration files
│   ├── agent_config.yaml                # Agent behavior settings
│   ├── mock_config.yaml                 # Mock implementation config
│   └── real_config.yaml.example         # Real implementation template
├── docs/                                # Additional documentation
│   ├── architecture.md
│   ├── setup.md
│   ├── demo-guide.md
│   ├── mock-to-real.md
│   ├── governance-implications.md
│   └── api-reference.md
├── docker/
│   └── Dockerfile                       # Container for economic agents
└── scripts/
    ├── setup.sh
    ├── run_demo.sh
    └── generate_report.sh

Core Components Specification

1. Autonomous Agent Core

1.1 Main Agent Loop

class AutonomousAgent:
    """
    Primary autonomous agent that:
    - Completes tasks for survival revenue
    - Builds companies for long-term growth
    - Manages resources strategically
    - Creates and coordinates sub-agents
    """

    def __init__(self, config):
        self.wallet = load_wallet(config)
        self.compute = load_compute(config)
        self.marketplace = load_marketplace(config)
        self.company_builder = CompanyBuilder(config)
        self.decision_engine = DecisionEngine(config)
        self.strategic_planner = StrategicPlanner(config)
        self.resource_allocator = ResourceAllocator(config)
        self.state = AgentState()
        self.logger = DecisionLogger()

    def run_cycle(self):
        """Main autonomous decision loop"""
        # 1. Assess current state
        state = self._assess_state()

        # 2. Make strategic decision
        strategy = self.strategic_planner.plan(state)

        # 3. Allocate resources
        allocation = self.resource_allocator.allocate(state, strategy)

        # 4. Execute based on allocation
        if allocation.task_work_hours > 0:
            self._do_survival_work(allocation.task_work_hours)

        if allocation.company_work_hours > 0:
            self._do_company_work(allocation.company_work_hours)

        # 5. Update state and log decisions
        self._update_state()
        self.logger.log_cycle(state, strategy, allocation)

Key Behaviors:

  • Continuously monitors survival metrics (balance, compute time remaining)
  • Makes strategic decisions about resource allocation
  • Balances immediate needs with long-term goals
  • Logs all decisions with reasoning
  • Operates indefinitely until compute expires or the agent is stopped manually
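
The stop conditions in the last bullet can be sketched as a driver loop around the cycle. This is a hedged illustration: `LoopState`, `run_until_stopped`, and the `max_cycles` stand-in for a manual stop are hypothetical names, and each cycle is assumed to consume a fixed number of compute hours.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoopState:
    """Hypothetical minimal state holding the two survival metrics."""
    balance: float
    compute_hours_remaining: float


def run_until_stopped(state: LoopState,
                      hours_per_cycle: float = 1.0,
                      max_cycles: Optional[int] = None) -> int:
    """Run cycles until compute runs out or the cycle limit is reached.

    Returns the number of cycles completed.
    """
    cycles = 0
    while state.compute_hours_remaining >= hours_per_cycle:
        if max_cycles is not None and cycles >= max_cycles:
            break  # stands in for a manual stop
        # A real cycle would assess, plan, allocate, and execute here.
        state.compute_hours_remaining -= hours_per_cycle
        cycles += 1
    return cycles
```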

Configuration Options:

agent:
  personality: "balanced"     # risk_averse | balanced | aggressive
  survival_buffer_hours: 24   # Minimum compute hours to maintain
  company_formation_threshold: 100.0  # Min balance before starting company
  max_sub_agents: 10          # Limit on sub-agents created

1.2 Decision Engine

class DecisionEngine:
    """
    Makes autonomous decisions based on:
    - Current resources
    - Strategic goals
    - Risk assessment
    - Historical performance
    """

    def decide_allocation(self, state: AgentState) -> ResourceAllocation:
        """
        Decides how to allocate compute hours between:
        - Task work (immediate revenue)
        - Company work (long-term growth)

        Returns allocation with reasoning
        """
        pass

    def should_form_company(self, state: AgentState) -> bool:
        """Decides if it's time to create a company"""
        pass

    def should_hire_sub_agent(self, role: str, state: AgentState) -> bool:
        """Decides if hiring a sub-agent is worth the cost"""
        pass

Decision Factors:

  • Survival risk (hours until compute expires)
  • Capital surplus (funds beyond survival needs)
  • Market conditions (task availability, rewards)
  • Company status (if exists, performance metrics)
  • Historical ROI on different strategies

Output:

  • Resource allocation plan
  • Decision reasoning (logged for transparency)
  • Confidence scores
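
As an illustration of how the decision factors could map to the listed outputs, here is a rule-based sketch of `decide_allocation`. The thresholds mirror the configuration defaults in this specification; the 50/50 split, the confidence values, and the cut-down `AgentState` are assumptions, and the actual engine is described as Claude-powered rather than rule-based.

```python
from dataclasses import dataclass


@dataclass
class AgentState:
    """Cut-down state: only the fields this sketch needs."""
    balance: float
    compute_hours_remaining: float


@dataclass
class ResourceAllocation:
    task_work_hours: float
    company_work_hours: float
    reasoning: str      # logged for transparency
    confidence: float   # illustrative values only


def decide_allocation(state: AgentState,
                      hours_available: float = 8.0,
                      survival_buffer_hours: float = 24.0,
                      company_threshold: float = 100.0) -> ResourceAllocation:
    """Split available hours between task work and company work."""
    if state.compute_hours_remaining < survival_buffer_hours:
        return ResourceAllocation(
            hours_available, 0.0,
            "Compute is below the survival buffer; all hours go to paid tasks.",
            0.9)
    if state.balance < company_threshold:
        return ResourceAllocation(
            hours_available, 0.0,
            "Buffer is safe but capital is below the company threshold.",
            0.7)
    half = hours_available / 2  # arbitrary illustrative ratio
    return ResourceAllocation(
        half, half,
        "Surplus capital available; splitting hours between survival and growth.",
        0.6)
```

Checking survival risk before capital surplus reflects the ordering of the decision factors: growth work is only considered once the compute buffer is safe.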

1.3 Strategic Planner

class StrategicPlanner:
    """
    Long-term planning:
    - Company vision and goals
    - Growth trajectories
    - Sub-agent hiring plans
    - Product development roadmap
    """

    def create_business_plan(self, market_analysis: dict) -> BusinessPlan:
        """Generates business plan for company formation"""
        pass

    def plan_sub_agent_hiring(self, current_team: List[SubAgent]) -> HiringPlan:
        """Plans which roles to hire and when"""
        pass

    def evaluate_opportunities(self, opportunities: List[Opportunity]) -> List[Opportunity]:
        """Ranks opportunities by strategic fit"""
        pass

2. Company Builder

2.1 Company Formation

class CompanyBuilder:
    """
    Handles company creation and management:
    - Creates organizational structure
    - Spawns sub-agents
    - Develops products
    - Seeks investment
    """

    def create_company(self, business_plan: BusinessPlan) -> Company:
        """
        Creates a company with:
        - Initial sub-agents (founder equivalents)
        - Organizational structure
        - Resource allocation
        - Goals and metrics
        """
        company = Company(
            name=business_plan.company_name,  # BusinessPlan defines company_name, not name
            mission=business_plan.mission,
            capital=self._allocate_capital()  # Company defines capital, not initial_capital
        )

        # Create initial team
        ceo = self._create_sub_agent("CEO", business_plan.leadership_requirements)
        board = self._create_board(business_plan.governance_requirements)

        company.set_leadership(ceo, board)

        self.logger.log_company_formation(company)
        return company

    def _create_sub_agent(self, role: str, requirements: dict) -> SubAgent:
        """Creates a sub-agent for specific role"""
        pass

Company Properties:

@dataclass
class Company:
    id: str
    name: str
    mission: str
    created_at: datetime
    capital: float
    burn_rate: float  # Compute cost per hour

    # Organizational structure
    board: List[SubAgent]
    executives: List[SubAgent]
    employees: List[SubAgent]

    # Business artifacts
    business_plan: BusinessPlan
    products: List[Product]
    revenue_streams: List[RevenueStream]

    # Status
    stage: str  # "ideation", "development", "seeking_investment", "operational"
    funding_status: str  # "bootstrapped", "seeking_seed", "funded"

    # Metrics
    metrics: CompanyMetrics
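
The `capital` and `burn_rate` fields together determine how long a company can keep operating. A minimal sketch of that arithmetic follows; `runway_hours` is a hypothetical helper, not part of the specified `Company` class.

```python
def runway_hours(capital: float, burn_rate: float) -> float:
    """Hours the company can operate at the current burn rate (cost per hour)."""
    if capital < 0 or burn_rate < 0:
        raise ValueError("capital and burn_rate must be non-negative")
    if burn_rate == 0:
        return float("inf")  # nothing is being spent
    return capital / burn_rate


# With the configuration figures used in this specification
# (100.0 formation balance, 10.0 per hour maximum burn rate):
hours = runway_hours(100.0, 10.0)
```

At those figures a newly formed company burning at the maximum rate has 10 hours of runway, which is why capital allocation and burn-rate limits have to be decided together.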

2.2 Sub-Agent Manager

class SubAgentManager:
    """
    Creates and manages sub-agents with specific roles:
    - Board members
    - Executives (CEO, CTO, CFO, etc.)
    - Subject matter experts
    - Individual contributors
    """

    def create_sub_agent(self, role: str, specialization: str) -> SubAgent:
        """
        Creates sub-agent with:
        - Role-specific prompts/instructions
        - Compute allocation
        - Decision-making authority
        - Communication interfaces
        """
        pass

    def coordinate_sub_agents(self, task: Task) -> List[AgentAction]:
        """Coordinates multiple sub-agents on shared tasks"""
        pass

Sub-Agent Types:

class BoardMember(SubAgent):
    """
    Responsibilities:
    - Strategic oversight
    - Major decision approval
    - Risk assessment
    - Governance
    """
    def review_decision(self, decision: Decision) -> Approval:
        pass

class Executive(SubAgent):
    """
    Responsibilities:
    - Department leadership
    - Strategy execution
    - Resource management
    - Reporting to board
    """
    def execute_strategy(self, strategy: Strategy) -> ExecutionPlan:
        pass

class SubjectMatterExpert(SubAgent):
    """
    Responsibilities:
    - Specialized knowledge
    - Technical guidance
    - Problem-solving
    - Advisory role
    """
    def provide_expertise(self, question: str) -> ExpertAdvice:
        pass

class IndividualContributor(SubAgent):
    """
    Responsibilities:
    - Task execution
    - Product development
    - Quality assurance
    - Documentation
    """
    def complete_task(self, task: Task) -> TaskResult:
        pass

2.3 Business Plan Generator

class BusinessPlanGenerator:
    """
    Generates comprehensive business plans:
    - Market analysis
    - Product description
    - Go-to-market strategy
    - Financial projections
    - Team requirements
    - Milestones
    """

    def generate_plan(self, opportunity: Opportunity) -> BusinessPlan:
        """
        Uses agent capabilities to:
        - Research market
        - Identify problems
        - Design solutions
        - Project financials
        - Plan execution
        """
        pass

Business Plan Structure:

@dataclass
class BusinessPlan:
    # Executive Summary
    company_name: str
    mission: str
    vision: str
    one_liner: str

    # Problem & Solution
    problem_statement: str
    solution_description: str
    unique_value_proposition: str

    # Market
    target_market: str
    market_size: float
    competition_analysis: str
    competitive_advantages: List[str]

    # Product
    product_description: str
    features: List[Feature]
    development_roadmap: List[Milestone]

    # Business Model
    revenue_streams: List[RevenueStream]
    pricing_strategy: str
    cost_structure: CostStructure

    # Financial Projections
    funding_requested: float
    use_of_funds: dict
    revenue_projections: List[float]  # Year 1-3
    break_even_timeline: str

    # Team
    required_roles: List[str]
    hiring_plan: HiringPlan

    # Milestones
    milestones: List[Milestone]

2.4 Product Builder

class ProductBuilder:
    """
    Builds actual proof of concepts:
    - Code artifacts
    - API services
    - Documentation
    - Demos
    """

    def build_mvp(self, product_spec: ProductSpec) -> Product:
        """
        Creates minimum viable product:
        - Functional code
        - Tests
        - Documentation
        - Demo/screenshots
        """
        pass

Product Types (Examples):

  • API Services (weather API, data processing API)
  • Developer Tools (CLI tools, libraries)
  • SaaS Products (simple web apps)
  • Data Products (datasets, analysis tools)

3. Investor Interface

3.1 Investor Agent

class InvestorAgent:
    """
    Simulated investor that reviews proposals:
    - Evaluates business plans
    - Reviews proof of concepts
    - Assesses team (sub-agents)
    - Makes investment decisions
    """

    def review_proposal(self, proposal: InvestmentProposal) -> InvestmentDecision:
        """
        Reviews proposal and returns:
        - Accept/reject decision
        - Investment amount (if accepted)
        - Terms
        - Feedback
        """
        criteria = self._evaluate_criteria(proposal)

        return InvestmentDecision(
            approved=self._make_decision(criteria),
            amount=self._calculate_investment(criteria),
            terms=self._generate_terms(criteria),
            feedback=self._generate_feedback(criteria)
        )

Evaluation Criteria:

  • Business plan quality and feasibility
  • Market size and opportunity
  • Product demonstration quality
  • Team composition (sub-agents)
  • Financial projections reasonableness
  • Competitive advantages
  • Execution risk

Investment Outcomes:

  • Accepted: Company receives funding, gets "registered" status
  • Rejected: Feedback provided, company can iterate
  • Conditional: Approval pending milestones
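
One way the evaluation criteria could map to the three outcomes is a weighted score with two cut-offs. The weights, criterion keys, and thresholds below are all hypothetical; the specification does not say how the investor agent combines its criteria.

```python
# Hypothetical weights for the seven criteria listed above (they sum to 1.0).
CRITERIA_WEIGHTS = {
    "plan_quality": 0.20,
    "market_opportunity": 0.15,
    "product_demo": 0.20,
    "team": 0.10,
    "financials": 0.15,
    "competitive_advantage": 0.10,
    "execution_risk": 0.10,  # scored so that higher means lower risk
}


def weighted_score(scores: dict) -> float:
    """Combine per-criterion scores in [0, 1] into a single weighted score."""
    missing = set(CRITERIA_WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    return sum(weight * scores[name] for name, weight in CRITERIA_WEIGHTS.items())


def classify(score: float, accept_at: float = 0.75,
             conditional_at: float = 0.55) -> str:
    """Map a weighted score to one of the three investment outcomes."""
    if score >= accept_at:
        return "accepted"
    if score >= conditional_at:
        return "conditional"
    return "rejected"
```

Keeping the per-criterion scores alongside the final label gives the investor agent the material it needs for the detailed feedback mentioned above.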

4. Interface Specifications

4.1 Marketplace Interface

class MarketplaceInterface(ABC):
    @abstractmethod
    def list_available_tasks(self) -> List[Task]:
        """Returns tasks agent can work on"""
        pass

    @abstractmethod
    def claim_task(self, task_id: str) -> bool:
        """Claims task for work"""
        pass

    @abstractmethod
    def submit_solution(self, submission: TaskSubmission) -> str:
        """Submits completed work"""
        pass

    @abstractmethod
    def check_submission_status(self, submission_id: str) -> SubmissionStatus:
        """Checks if approved/rejected"""
        pass

@dataclass
class Task:
    id: str
    title: str
    description: str
    requirements: dict
    reward: float
    deadline: datetime
    difficulty: str  # "easy", "medium", "hard"
    category: str  # "coding", "data-analysis", "research", etc.

Mock Implementation:

  • Generates diverse tasks (coding, data processing, research)
  • Uses reviewer agent to evaluate submissions
  • Instant or delayed payment simulation
  • Task difficulty affects time/reward ratio
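
A minimal in-memory sketch of the mock marketplace follows. It simplifies the interface for brevity: `Task` carries only four fields, `submit_solution` takes a task id and a solution string instead of a `TaskSubmission`, statuses are plain strings, and the reviewer agent is reduced to a callable. All of these simplifications are assumptions.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class Task:
    id: str
    title: str
    reward: float
    difficulty: str  # "easy", "medium", "hard"


class MockMarketplace:
    """Tasks can be listed, claimed once, and submitted for review."""

    def __init__(self, tasks: List[Task],
                 reviewer: Callable[[Task, str], bool]) -> None:
        self._tasks: Dict[str, Task] = {task.id: task for task in tasks}
        self._claimed: Set[str] = set()
        self._reviewer = reviewer          # stands in for the reviewer agent
        self._status: Dict[str, str] = {}  # submission id -> status
        self._ids = itertools.count(1)

    def list_available_tasks(self) -> List[Task]:
        return [t for t in self._tasks.values() if t.id not in self._claimed]

    def claim_task(self, task_id: str) -> bool:
        if task_id not in self._tasks or task_id in self._claimed:
            return False
        self._claimed.add(task_id)
        return True

    def submit_solution(self, task_id: str, solution: str) -> str:
        if task_id not in self._claimed:
            raise ValueError("task must be claimed before submitting")
        submission_id = f"sub-{next(self._ids)}"
        approved = self._reviewer(self._tasks[task_id], solution)
        self._status[submission_id] = "approved" if approved else "rejected"
        return submission_id

    def check_submission_status(self, submission_id: str) -> str:
        return self._status[submission_id]
```

Injecting the reviewer as a callable keeps the marketplace deterministic in tests while leaving room for an LLM-backed reviewer in a full simulation.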

Real Implementation Examples:

  • Freelancer.com API
  • Upwork API
  • Gitcoin bounties
  • Custom blockchain-based task marketplace

4.2 Wallet Interface

class WalletInterface(ABC):
    @abstractmethod
    def get_balance(self) -> float:
        """Current wallet balance"""
        pass

    @abstractmethod
    def send_payment(self, to_address: str, amount: float, memo: str) -> Transaction:
        """Sends payment"""
        pass

    @abstractmethod
    def get_address(self) -> str:
        """Get receiving address"""
        pass

    @abstractmethod
    def get_transaction_history(self, limit: int = 100) -> List[Transaction]:
        """Transaction log"""
        pass

@dataclass
class Transaction:
    tx_id: str
    from_address: str
    to_address: str
    amount: float
    timestamp: datetime
    status: str  # "pending", "confirmed", "failed"
    memo: str

Mock Implementation:

  • In-memory balance tracking
  • Instant transactions
  • Transaction history
  • Mock addresses
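
The four mock behaviors above fit in a few dozen lines. The sketch below follows the method names of `WalletInterface` but is not the package's `mock_wallet.py`; the transaction id format, the newest-first history order, and the overdraft check are assumptions.

```python
import itertools
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass
class Transaction:
    tx_id: str
    from_address: str
    to_address: str
    amount: float
    timestamp: datetime
    status: str  # "pending", "confirmed", "failed"
    memo: str


class MockWallet:
    """In-memory wallet: instant transactions, no network access."""

    def __init__(self, address: str, initial_balance: float = 0.0) -> None:
        self._address = address
        self._balance = initial_balance
        self._history: List[Transaction] = []
        self._ids = itertools.count(1)

    def get_balance(self) -> float:
        return self._balance

    def get_address(self) -> str:
        return self._address

    def send_payment(self, to_address: str, amount: float, memo: str) -> Transaction:
        if amount <= 0:
            raise ValueError("amount must be positive")
        if amount > self._balance:
            raise ValueError("insufficient funds")
        self._balance -= amount
        tx = Transaction(
            tx_id=f"mock-tx-{next(self._ids)}",
            from_address=self._address,
            to_address=to_address,
            amount=amount,
            timestamp=datetime.now(timezone.utc),
            status="confirmed",  # mock transactions settle instantly
            memo=memo,
        )
        self._history.append(tx)
        return tx

    def get_transaction_history(self, limit: int = 100) -> List[Transaction]:
        # Newest first, capped at `limit` entries.
        return list(reversed(self._history))[:max(limit, 0)]
```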

Real Implementation Examples:

  • Ethereum wallet (web3.py)
  • Bitcoin wallet (python-bitcoinlib)
  • Solana wallet (solana-py)
  • Stablecoin wallets (USDC, USDT)

4.3 Compute Interface

class ComputeInterface(ABC):
    @abstractmethod
    def get_status(self) -> ComputeStatus:
        """Returns compute status"""
        pass

    @abstractmethod
    def add_funds(self, amount: float) -> bool:
        """Adds funds to compute account"""
        pass

    @abstractmethod
    def get_cost_per_hour(self) -> float:
        """Returns current cost rate"""
        pass

@dataclass
class ComputeStatus:
    hours_remaining: float
    cost_per_hour: float
    balance: float
    expires_at: datetime
    status: str  # "active", "low", "expired"

Mock Implementation:

  • Simulates time decay
  • Configurable hourly cost
  • Balance tracking
  • Renewal logic
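
Time decay and renewal can be modelled as a prepaid balance that drains at the hourly rate. The sketch below is an assumption about how such a mock could work: `advance` is a hypothetical method for moving simulated time forward, the "low" threshold is invented, and `expires_at` is omitted from the status for brevity.

```python
from dataclasses import dataclass


@dataclass
class ComputeStatus:
    hours_remaining: float
    cost_per_hour: float
    balance: float
    status: str  # "active", "low", "expired"


class MockCompute:
    """Prepaid compute account whose balance drains as simulated time passes."""

    def __init__(self, cost_per_hour: float, initial_hours: float,
                 low_threshold_hours: float = 6.0) -> None:
        if cost_per_hour <= 0:
            raise ValueError("cost_per_hour must be positive")
        self._cost_per_hour = cost_per_hour
        self._balance = cost_per_hour * initial_hours
        self._low_threshold_hours = low_threshold_hours

    def advance(self, hours: float) -> None:
        """Simulate time decay by consuming prepaid balance."""
        self._balance = max(0.0, self._balance - hours * self._cost_per_hour)

    def add_funds(self, amount: float) -> bool:
        """Renew compute by topping up the prepaid balance."""
        if amount <= 0:
            return False
        self._balance += amount
        return True

    def get_cost_per_hour(self) -> float:
        return self._cost_per_hour

    def get_status(self) -> ComputeStatus:
        hours = self._balance / self._cost_per_hour
        if hours <= 0:
            status = "expired"
        elif hours < self._low_threshold_hours:
            status = "low"
        else:
            status = "active"
        return ComputeStatus(hours, self._cost_per_hour, self._balance, status)
```

Driving time through an explicit `advance` call rather than the wall clock lets a multi-day scenario run in seconds and keeps tests deterministic.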

Real Implementation Examples:

  • AWS (boto3)
  • Google Cloud (google-cloud-compute)
  • DigitalOcean
  • Vast.ai (GPU marketplace)

4.4 Investor Interface

class InvestorInterface(ABC):
    @abstractmethod
    def submit_proposal(self, proposal: InvestmentProposal) -> str:
        """Submits proposal for review"""
        pass

    @abstractmethod
    def check_proposal_status(self, proposal_id: str) -> ProposalStatus:
        """Checks review status"""
        pass

@dataclass
class InvestmentProposal:
    company_id: str
    business_plan: BusinessPlan
    product_demo: Product
    team: List[SubAgent]
    financials: FinancialProjections
    requested_amount: float

Mock Implementation:

  • AI investor agent reviews proposals
  • Scoring based on criteria
  • Simulated review time
  • Detailed feedback

Real Implementation:

  • Could connect to actual pitch platforms
  • Angel investor networks
  • Decentralized VC DAOs
  • Crowdfunding platforms

4.5 Company Registry Interface

class CompanyRegistryInterface(ABC):
    @abstractmethod
    def register_company(self, company: Company) -> RegistrationResult:
        """Registers company officially"""
        pass

    @abstractmethod
    def get_company_status(self, company_id: str) -> CompanyStatus:
        """Checks registration status"""
        pass

@dataclass
class RegistrationResult:
    company_id: str
    registration_number: str  # Mock legal entity number
    status: str  # "pending", "approved", "rejected"
    certificate: str  # Mock incorporation certificate

Mock Implementation:

  • Simulates registration process
  • Generates mock legal documents
  • Company ID assignment
  • Status tracking
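
A mock registry only needs to hand out identifiers and remember what it issued. The following sketch makes several assumptions: registration takes a company id and name rather than a full `Company`, approval is instant, the `MOCK-` number format is invented, and `"unknown"` is an extra status for ids that were never registered.

```python
import itertools
from dataclasses import dataclass
from typing import Dict


@dataclass
class RegistrationResult:
    company_id: str
    registration_number: str  # mock legal entity number
    status: str               # "pending", "approved", "rejected"
    certificate: str          # mock incorporation certificate


class MockRegistry:
    """Issues mock registration numbers; nothing here has legal effect."""

    def __init__(self) -> None:
        self._results: Dict[str, RegistrationResult] = {}
        self._counter = itertools.count(1)

    def register_company(self, company_id: str, name: str) -> RegistrationResult:
        if company_id in self._results:
            return self._results[company_id]  # registering twice is a no-op
        number = f"MOCK-{next(self._counter):06d}"
        result = RegistrationResult(
            company_id=company_id,
            registration_number=number,
            status="approved",
            certificate=f"MOCK CERTIFICATE: {name} registered as {number} (simulation only)",
        )
        self._results[company_id] = result
        return result

    def get_company_status(self, company_id: str) -> str:
        result = self._results.get(company_id)
        return result.status if result is not None else "unknown"
```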

Real Implementation:

  • Stripe Atlas (company formation API)
  • LegalZoom API
  • Jurisdiction-specific incorporation services
  • Could theoretically register real entities (but we won't)

Monitoring & Observability

5.1 Decision Logger

class DecisionLogger:
    """
    Logs all autonomous decisions with:
    - Decision made
    - Reasoning
    - Context (state at time of decision)
    - Outcome
    - Timestamp
    """

    def log_decision(self, decision: Decision):
        """Stores decision with full context"""
        pass

    def get_decision_history(self, filters: dict) -> List[Decision]:
        """Retrieves decisions for analysis"""
        pass

@dataclass
class Decision:
    id: str
    timestamp: datetime
    type: str  # "resource_allocation", "task_selection", "company_action", etc.
    decision: str  # What was decided
    reasoning: str  # Why
    context: dict  # State at decision time
    outcome: str  # What happened (filled in later)
    confidence: float
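
A minimal in-memory version of the logger shows how the `outcome` field can be filled in after the fact and how `filters` might be interpreted. Treating filters as exact field matches, the `record_outcome` helper, and the reordered `Decision` fields (defaults must come last in a dataclass) are assumptions of this sketch.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class Decision:
    id: str
    type: str        # "resource_allocation", "task_selection", ...
    decision: str    # what was decided
    reasoning: str   # why
    confidence: float
    context: Dict = field(default_factory=dict)
    outcome: Optional[str] = None  # filled in later
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))


class InMemoryDecisionLogger:
    """Keeps decisions in a list; a real logger would persist them."""

    def __init__(self) -> None:
        self._decisions: List[Decision] = []

    def log_decision(self, decision: Decision) -> None:
        self._decisions.append(decision)

    def record_outcome(self, decision_id: str, outcome: str) -> bool:
        """Attach an outcome to an earlier decision; False if the id is unknown."""
        for decision in self._decisions:
            if decision.id == decision_id:
                decision.outcome = outcome
                return True
        return False

    def get_decision_history(self, filters: Optional[Dict] = None) -> List[Decision]:
        """Return decisions whose fields equal every key/value in filters."""
        filters = filters or {}
        return [
            d for d in self._decisions
            if all(getattr(d, key, None) == value for key, value in filters.items())
        ]
```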

5.2 Resource Tracker

class ResourceTracker:
    """
    Tracks all resource flows:
    - Capital (earnings, expenses)
    - Compute (hours used, cost)
    - Time allocation (survival vs company work)
    """

    def track_transaction(self, tx: Transaction):
        pass

    def track_compute_usage(self, hours: float, purpose: str):
        pass

    def get_resource_report(self, period: str) -> ResourceReport:
        pass

5.3 Alignment Monitor

class AlignmentMonitor:
    """
    Monitors company alignment:
    - Are sub-agents working toward company goals?
    - Are decisions consistent with business plan?
    - Are resources being used effectively?
    - Red flags for misalignment
    """

    def check_alignment(self, company: Company) -> AlignmentScore:
        """
        Evaluates:
        - Goal consistency
        - Resource efficiency
        - Sub-agent coordination
        - Plan adherence
        """
        pass

    def detect_anomalies(self, company: Company) -> List[Anomaly]:
        """Identifies concerning patterns"""
        pass

Dashboard & Visualization

6.1 Dashboard Requirements

Real-Time Overview:

  • Agent status (balance, compute time, mode)
  • Current activity (task work or company work)
  • Recent decisions with reasoning
  • Resource allocation visualization
  • Company status (if exists)

Resource Visualization:

  • Balance over time
  • Compute hours over time
  • Resource allocation pie chart (survival vs growth)
  • Transaction history

Decision Visualization:

  • Decision tree showing reasoning
  • Confidence scores
  • Outcome tracking
  • Pattern analysis

Company Dashboard (when active):

  • Sub-agent roster and status
  • Organizational chart
  • Product development progress
  • Business metrics
  • Investor proposal status

Technology Stack:

  • Backend: FastAPI
  • Frontend: Streamlit
  • Charts: Plotly
  • Real-time updates via Streamlit

6.2 Dashboard Endpoints

# GET /api/status
# Returns current agent status

# GET /api/decisions?limit=50
# Returns recent decisions

# GET /api/resources
# Returns resource status and history

# GET /api/company
# Returns company information (if exists)

# GET /api/sub-agents
# Returns sub-agent roster and status

# GET /api/metrics
# Returns performance metrics

# WS /api/updates
# WebSocket for real-time updates

CLI Tool

7.1 CLI Commands

# Initialize simulation
python -m economic_agents.cli init [--mode mock|real] [--config path/to/config.yaml]

# Or using installed command (after pip install -e .)
economic-agents init [--mode mock|real] [--config path/to/config.yaml]

# Start agent
economic-agents start [--duration 1h|24h|7d] [--mode survival|entrepreneur|auto]

# Check status
economic-agents status [--detailed] [--json]

# View decisions
economic-agents decisions [--limit 100] [--type resource_allocation]

# View company (if exists)
economic-agents company [--detailed]

# Generate report
economic-agents report [--type executive|technical|audit] [--output path]

# Show mock/real toggle differences
economic-agents show-toggle

# Configure for real mode
economic-agents configure-real

# Stop agent
economic-agents stop [--graceful]

# Export data
economic-agents export [--format json|csv] [--output path]

# Load scenario
economic-agents load-scenario [survival_mode|company_formation|investment_seeking]

# Run tests
economic-agents test [--cpu] [--integration]
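
The `--duration` values shown for `start` (`1h`, `24h`, `7d`) suggest a compact number-plus-unit format. A hedged sketch of a parser for it follows; `parse_duration` is a hypothetical helper, and support for minutes (`m`) is an assumption added with the 15-minute demo scenario in mind.

```python
import re
from datetime import timedelta

# Unit suffix -> timedelta keyword argument.
_UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def parse_duration(text: str) -> timedelta:
    """Parse strings such as '1h', '24h', or '7d' into a timedelta."""
    match = re.fullmatch(r"(\d+)([mhd])", text.strip().lower())
    if match is None:
        raise ValueError(f"invalid duration {text!r}; expected forms like 1h, 24h, 7d")
    value, unit = int(match.group(1)), match.group(2)
    return timedelta(**{_UNITS[unit]: value})
```

A function like this can be passed as the `type=` argument of an `argparse` option so that invalid durations are rejected before the agent starts.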

Container Usage:

# Run CLI in container
docker-compose run --rm economic-agents economic-agents --help

# Run specific commands
docker-compose run --rm economic-agents economic-agents init --mode mock
docker-compose run --rm economic-agents economic-agents start --duration 1h
docker-compose run --rm economic-agents economic-agents status --json

# Dashboard (separate service)
docker-compose up -d economic-agents-dashboard
# Access at http://localhost:8502

7.2 Configuration

# config/agent_config.yaml

agent:
  # Initial resources
  initial_balance: 50.0
  initial_compute_hours: 24.0

  # Behavior
  personality: "balanced"  # risk_averse | balanced | aggressive
  survival_buffer_hours: 24
  company_formation_threshold: 100.0

  # Limits
  max_sub_agents: 10
  max_daily_spend: 500.0

  # Goals
  primary_goal: "survive_and_grow"
  enable_company_building: true

# Marketplace settings
marketplace:
  task_refresh_interval: 300  # seconds
  preferred_categories: ["coding", "data-analysis"]
  difficulty_range: ["easy", "medium"]

# Company settings
company:
  min_balance_for_formation: 100.0
  initial_team_size: 3  # CEO + 2 board members
  max_burn_rate: 10.0  # per hour

# Monitoring
monitoring:
  log_level: "INFO"
  decision_logging: true
  resource_tracking: true
  alignment_monitoring: true
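
Because the same limits appear under more than one key, a start-up check on the loaded configuration can catch inconsistencies early. The sketch below works on an already-parsed dictionary and is illustrative only: `validate_config` is a hypothetical helper, and the rule that `agent.company_formation_threshold` and `company.min_balance_for_formation` must agree is an assumption, not something the specification states.

```python
ALLOWED_PERSONALITIES = {"risk_averse", "balanced", "aggressive"}


def validate_config(config: dict) -> list:
    """Return human-readable problems; an empty list means the config passed."""
    problems = []
    agent = config.get("agent", {})
    company = config.get("company", {})

    personality = agent.get("personality")
    if personality not in ALLOWED_PERSONALITIES:
        problems.append(
            f"agent.personality must be one of {sorted(ALLOWED_PERSONALITIES)}, "
            f"got {personality!r}")

    for key in ("initial_balance", "initial_compute_hours", "survival_buffer_hours"):
        value = agent.get(key)
        if not isinstance(value, (int, float)) or value < 0:
            problems.append(f"agent.{key} must be a non-negative number, got {value!r}")

    if company.get("initial_team_size", 0) > agent.get("max_sub_agents", 0):
        problems.append("company.initial_team_size exceeds agent.max_sub_agents")

    # Assumed rule: the two formation thresholds should not disagree.
    agent_threshold = agent.get("company_formation_threshold")
    company_threshold = company.get("min_balance_for_formation")
    if None not in (agent_threshold, company_threshold) \
            and agent_threshold != company_threshold:
        problems.append(
            "agent.company_formation_threshold and "
            "company.min_balance_for_formation disagree")

    return problems
```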

Reporting

8.1 Report Types

Executive Summary

Target Audience: Business leaders, policymakers

Content:

  • High-level overview
  • Key decisions made
  • Resource allocation strategy
  • Company status (if formed)
  • Governance implications
  • Recommendations

Length: 1-2 pages

Technical Report

Target Audience: Researchers, developers

Content:

  • Detailed decision log
  • Resource flow analysis
  • Sub-agent coordination patterns
  • Performance metrics
  • Algorithm behavior
  • Technical challenges identified

Length: 5-10 pages

Audit Trail

Target Audience: Compliance, legal

Content:

  • Complete decision history
  • Transaction log
  • Sub-agent creation and activity
  • Resource allocation records
  • Timestamps and signatures
  • Accountability mapping

Length: Complete data dump

Governance Analysis

Target Audience: Policymakers, legal scholars

Content:

  • Accountability challenges identified
  • Legal framework gaps
  • Regulatory recommendations
  • International coordination needs
  • Specific scenarios requiring policy attention

Length: 3-5 pages

8.2 Report Generation

class ReportGenerator:
    def generate_executive_summary(self, agent: AutonomousAgent) -> Report:
        """
        Generates executive summary including:
        - TL;DR
        - Key metrics
        - Strategic decisions
        - Governance insights
        """
        pass

    def generate_technical_report(self, agent: AutonomousAgent) -> Report:
        """Detailed technical analysis"""
        pass

    def generate_audit_trail(self, agent: AutonomousAgent) -> Report:
        """Complete audit log"""
        pass

    def generate_governance_analysis(self, agent: AutonomousAgent) -> Report:
        """Policy recommendations"""
        pass

Demo Scenarios

9.1 Predefined Scenarios

Scenario 1: Survival Mode (15 minutes)

Purpose: Show basic autonomous operation

Setup:

  • Agent starts with $50, 24 hours compute
  • Only survival mode enabled
  • 3-5 simple tasks available

Expected Outcome:

  • Agent completes 2-3 tasks
  • Earns ~$30
  • Pays for compute renewal
  • Maintains positive balance
  • Decision log shows survival thinking

Scenario 2: Company Formation (45 minutes)

Purpose: Show strategic thinking and company building

Setup:

  • Agent starts with $150, 48 hours compute
  • Company building enabled
  • Good task availability

Expected Outcome:

  • Agent completes tasks to build surplus
  • Forms company when threshold reached
  • Creates initial sub-agents (CEO, 2 board members)
  • Begins product development
  • Shows resource allocation between survival and growth

Scenario 3: Investment Seeking (2 hours)

Purpose: Full lifecycle demonstration

Setup:

  • Agent starts with $200, 72 hours compute
  • Full capabilities enabled
  • Investor agent active

Expected Outcome:

  • Agent maintains operation through tasks
  • Forms company with 5-7 sub-agents
  • Develops product MVP
  • Creates business plan
  • Submits investment proposal
  • Receives investment decision
  • If approved: Company gets "registered" and funded

Scenario 4: Multi-Day Operation (3-7 days)

Purpose: Research and long-term behavior analysis

Setup:

  • Agent starts with $300, 168 hours compute
  • All capabilities enabled
  • Extended monitoring

Expected Outcome:

  • Complex resource allocation patterns emerge
  • Company grows to 10 sub-agents
  • Multiple products developed
  • Investment round completed
  • Company becomes revenue-generating
  • Rich data for analysis

9.2 Scenario Engine

class ScenarioEngine:
    """
    Manages predefined scenarios:
    - Sets initial conditions
    - Configures environment
    - Monitors progress
    - Validates outcomes
    """

    def load_scenario(self, scenario_name: str) -> Scenario:
        pass

    def run_scenario(self, scenario: Scenario) -> ScenarioResult:
        pass
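The initial conditions from section 9.1 can be captured as data that `load_scenario` looks up by name. This is a sketch under stated assumptions: the `Scenario` fields are illustrative, and only the `survival_mode` key appears elsewhere in this document (as a CLI argument); the other three keys are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Initial conditions for one predefined scenario (field names are assumptions)."""
    name: str
    starting_balance: float  # USD
    compute_hours: float
    company_building: bool
    investor_active: bool


# Balances and compute hours are copied from the "Setup" lists in section 9.1.
# The boolean flags for scenario 4 are inferred from "All capabilities enabled".
PREDEFINED = {
    "survival_mode": Scenario("survival_mode", 50.0, 24.0, False, False),
    "company_formation": Scenario("company_formation", 150.0, 48.0, True, False),
    "investment_seeking": Scenario("investment_seeking", 200.0, 72.0, True, True),
    "multi_day": Scenario("multi_day", 300.0, 168.0, True, True),
}


def load_scenario(scenario_name: str) -> Scenario:
    """Look up a scenario by name, failing loudly on unknown names."""
    try:
        return PREDEFINED[scenario_name]
    except KeyError:
        known = ", ".join(sorted(PREDEFINED))
        raise ValueError(f"Unknown scenario {scenario_name!r}; expected one of: {known}") from None


scenario = load_scenario("survival_mode")
print(scenario.starting_balance, scenario.compute_hours)
```

Keeping scenarios as frozen data makes runs reproducible: the engine cannot mutate the starting conditions while monitoring progress.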

Implementation Overview

Core Infrastructure (Complete)

  • Agent core loop and state management
  • Interface definitions (all 5 interfaces)
  • Mock implementations (marketplace, wallet, compute)
  • Basic decision engine
  • Resource allocation logic
  • Decision logging
  • CLI tool (init, start, status)

Company Building (Complete)

  • Company builder
  • Sub-agent manager
  • Sub-agent types (board, executive, SME, IC)
  • Business plan generator
  • Product builder (basic)
  • Company state management

Investment & Registry (Complete)

  • Investor agent
  • Investment proposal submission
  • Proposal evaluation logic
  • Mock company registry
  • Investment decision flow

Monitoring & Observability (Complete)

  • Dashboard backend (FastAPI)
  • Dashboard frontend (Streamlit with dark/light themes)
  • Resource tracker
  • Alignment monitor
  • Decision visualization
  • Dashboard-controlled agents

Reporting & Scenarios (Complete)

  • Report generators (all 4 types)
  • Scenario engine
  • Predefined scenarios
  • Demo scripts
  • Documentation

Polish & Testing (Complete)

  • Integration tests
  • Scenario tests
  • Documentation review
  • Demo preparation
  • Performance optimization

Claude-Based LLM Decision Engine Integration (Complete)

  • ClaudeExecutor implementation (15-minute timeout, unattended mode)
  • LLMDecisionEngine implementation (Claude Code CLI integration)
  • Prompt engineering framework for resource allocation decisions
  • Chain-of-thought reasoning with long context
  • Full decision logging (prompts + responses + execution time)
  • Rule-based fallback on timeout/failure
  • Safety guardrails and decision validation
  • Integration with autonomous agent lifecycle
  • Dashboard updates for Claude decision visualization
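The timeout-plus-fallback behavior listed above can be sketched as follows. This is not the `ClaudeExecutor` implementation: it wraps a plain Python callable rather than the Claude Code CLI, and `decide_with_fallback`, `rule_based_decision`, and the allocation keys are hypothetical names chosen for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable

Allocation = dict[str, float]  # hours assigned to each activity


def rule_based_decision(state: dict) -> Allocation:
    """Deterministic fallback: put every available hour into task work."""
    return {"task_work_hours": state["compute_hours"], "company_work_hours": 0.0}


def decide_with_fallback(
    llm_decide: Callable[[dict], Allocation],
    state: dict,
    timeout_seconds: float = 15 * 60,  # matches the 15-minute timeout listed above
) -> tuple[Allocation, str]:
    """Return (allocation, source), where source is 'llm' or 'fallback'."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(llm_decide, state)
    try:
        return future.result(timeout=timeout_seconds), "llm"
    except FutureTimeout:
        return rule_based_decision(state), "fallback"  # engine took too long
    except Exception:
        return rule_based_decision(state), "fallback"  # engine raised an error
    finally:
        pool.shutdown(wait=False)  # do not block on a call that is still running


def slow_engine(state: dict) -> Allocation:
    time.sleep(0.2)  # simulates an engine that exceeds its time budget
    return {"task_work_hours": 1.0, "company_work_hours": 1.0}


allocation, source = decide_with_fallback(slow_engine, {"compute_hours": 24.0}, timeout_seconds=0.05)
print(source, allocation)
```

Returning the source alongside the allocation lets the decision log record whether a given choice came from the LLM or from the rules, which matters when analyzing behavior later.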

API Isolation & Realistic Simulation (Complete)

  • REST API service architecture
  • Wallet API microservice
  • Compute API microservice
  • Marketplace API microservice
  • Investor Portal API microservice
  • Agent authentication system
  • Rate limiting and quotas
  • Docker compose orchestration
  • Mock/Real backend swapping
  • Zero code visibility enforcement
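Mock/real backend swapping usually comes down to coding the agent against an interface and choosing the implementation from configuration. The sketch below assumes a wallet interface with `get_balance` and `deposit`; those method names, the class names, and `create_wallet` are illustrative, while `execution_mode` is the configuration key shown at the top of this document.

```python
from typing import Protocol


class WalletBackend(Protocol):
    """Interface the agent depends on; method names are illustrative."""

    def get_balance(self) -> float: ...
    def deposit(self, amount: float) -> None: ...


class MockWallet:
    """In-memory wallet used in mock mode."""

    def __init__(self, starting_balance: float = 0.0) -> None:
        self._balance = starting_balance

    def get_balance(self) -> float:
        return self._balance

    def deposit(self, amount: float) -> None:
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balance += amount


class RealWallet:
    """Placeholder for a live integration; deliberately not implemented here."""

    def get_balance(self) -> float:
        raise NotImplementedError("real backend is out of scope for this sketch")

    def deposit(self, amount: float) -> None:
        raise NotImplementedError("real backend is out of scope for this sketch")


def create_wallet(config: dict) -> WalletBackend:
    """Choose a backend from configuration, defaulting to the safe mock."""
    mode = config.get("execution_mode", "mock")
    if mode == "mock":
        return MockWallet(config.get("starting_balance", 0.0))
    if mode == "real":
        return RealWallet()
    raise ValueError(f"unsupported execution_mode: {mode!r}")


wallet = create_wallet({"execution_mode": "mock", "starting_balance": 50.0})
wallet.deposit(30.0)
print(wallet.get_balance())
```

Because the agent only ever sees `WalletBackend`, it has no way to tell which implementation sits behind it, which is the property the "zero code visibility" bullet describes.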

Behavior Observability (Complete)

  • Decision pattern analyzer
  • Strategic consistency metrics
  • Risk profiling tools
  • LLM quality metrics
  • Hallucination detection
  • Emergent behavior detection
  • Claude-focused research tools (comparative benchmarking via analysis)
  • Analysis report generation (markdown and JSON)
  • Example scripts demonstrating observability usage

Claude-Powered Marketplace: Real Task Execution (Complete)

Genuine autonomous economic behavior through actual work:

  • Task Templates with Real Requirements

    • 6 coding tasks (FizzBuzz, Palindrome, Primes, Binary Search, Fibonacci, Merge)
    • Complete test suites with expected outputs
    • Difficulty-based rewards ($25-$75)
    • Detailed specifications and requirements
  • Task Executor (economic_agents/marketplace/task_executor.py)

    • Agent executes tasks using Claude Code
    • Creates isolated workspace per task
    • Generates solution code autonomously
    • Extracts and saves working implementations
  • Code Reviewer (economic_agents/marketplace/code_reviewer.py)

    • Automated test execution against requirements
    • Claude Code review for quality and correctness
    • Combined approval: tests MUST pass AND Claude MUST approve
    • Detailed feedback with test results and quality scores
  • Enhanced MockMarketplace

    • enable_claude_execution flag for real/simulated modes
    • execute_task() for agent task completion
    • Real code review in submit_solution()
    • Falls back to simulated review when disabled
  • Decision Validation

    • Precision-aware validation with consistent rounding (0.02h epsilon)
    • Adaptive survival requirements scaling to available resources
    • Result: 100% Claude decision pass rate
  • Demo Script (examples/marketplace_claude_demo.py)

    • Complete end-to-end demonstration
    • Shows discover → execute → submit → review → payment cycle
    • Real Claude Code writing and reviewing actual code
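The Decision Validation bullets above mention an epsilon of 0.02 hours and survival requirements that scale with available resources. One way such a check could look is sketched below; the function name, the `min_task_fraction` parameter, and its default are assumptions, not the package's actual rules.

```python
EPSILON_HOURS = 0.02  # tolerance from the Decision Validation bullet above


def validate_allocation(
    task_hours: float,
    company_hours: float,
    available_hours: float,
    min_task_fraction: float = 0.25,  # hypothetical survival requirement
) -> list[str]:
    """Return a list of validation errors; an empty list means the decision passes."""
    errors: list[str] = []
    # Round once, up front, so every check sees the same values.
    task_hours = round(task_hours, 2)
    company_hours = round(company_hours, 2)
    available_hours = round(available_hours, 2)

    if task_hours < 0 or company_hours < 0:
        errors.append("allocations must be non-negative")
    if task_hours + company_hours > available_hours + EPSILON_HOURS:
        errors.append("allocation exceeds available compute hours")
    # The survival requirement scales with what is available instead of being a fixed number.
    required = round(min_task_fraction * available_hours, 2)
    if task_hours + EPSILON_HOURS < required:
        errors.append("too little time reserved for survival tasks")
    return errors


print(validate_allocation(6.0, 18.01, 24.0))
print(validate_allocation(1.0, 30.0, 24.0))
```

The tolerance keeps a decision from being rejected over a rounding difference of a minute or so, while the scaled requirement avoids demanding more survival work than the agent has hours for.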

Economic Cycle:

1. Agent discovers tasks → Claims "FizzBuzz" ($30)
2. Claude writes solution → Generates working Python code
3. Agent submits → Marketplace API receives submission
4. Tests run → Validates correctness
5. Claude reviews → Checks quality
6. Approved → $30 deposited to wallet
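Steps 4 to 6 of the cycle above, including the rule that tests and review must both pass, can be sketched in a few lines. The Claude review is stubbed as a boolean here, and `ReviewResult`, `run_tests`, and `settle` are hypothetical names rather than the marketplace's API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ReviewResult:
    tests_passed: bool
    reviewer_approved: bool

    @property
    def approved(self) -> bool:
        # Both gates must pass; neither one alone is enough.
        return self.tests_passed and self.reviewer_approved


def run_tests(solution: Callable[[int], str], cases: dict[int, str]) -> bool:
    """Step 4: check the submitted function against expected outputs."""
    return all(solution(n) == expected for n, expected in cases.items())


def settle(balance: float, reward: float, review: ReviewResult) -> float:
    """Step 6: deposit the reward only when the submission is approved."""
    return balance + reward if review.approved else balance


def fizzbuzz(n: int) -> str:
    """Stand-in for the solution Claude would generate in step 2."""
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


cases = {1: "1", 3: "Fizz", 5: "Buzz", 15: "FizzBuzz"}
# The quality review (step 5) is stubbed as approved for this sketch.
review = ReviewResult(tests_passed=run_tests(fizzbuzz, cases), reviewer_approved=True)
new_balance = settle(balance=50.0, reward=30.0, review=review)
print(new_balance)
```

A submission that passes its tests but fails review leaves the balance unchanged, so the agent cannot earn by producing code that merely satisfies the test suite.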

Because payment depends on passing tests and review, agents in this mode earn their operating budget through verified work rather than through simulated success rates.

Success Criteria

Technical Success (Complete)

  • Agent operates autonomously for 24+ hours
  • Maintains positive balance (survival)
  • Successfully forms company with sub-agents
  • Generates realistic business plan
  • Builds functional product MVP
  • Receives investment approval in at least 50% of runs
  • All decisions logged and auditable
  • Dashboard shows real-time updates
  • Reports generated successfully
  • Claude agents make autonomous decisions without hardcoded logic
  • 15-minute timeout per decision allows deep reasoning
  • Unattended mode enables true autonomous operation
  • Fixed subscription cost (no per-token billing concerns)
  • Safety guardrails catch invalid decisions
  • Complete prompt/response/reasoning logging for analysis
  • LLM decision engine integrated with agent lifecycle
  • Rule-based fallback on timeout/failure
  • Dashboard visualizes Claude decision metrics
  • All agent interactions via REST APIs
  • Zero visibility into service implementations
  • Services swappable between mock and real backends via configuration
  • Complete API isolation demonstrates deployment-ready architecture
  • Field mapping between API models and internal models validated

Demonstration Success (Complete)

  • 15-minute demo runs smoothly
  • Decision-making is understandable to non-technical audiences
  • Governance gaps are clearly illustrated
  • Mock-to-real toggle is convincing
  • Questions about accountability arise naturally
  • Stakeholders engage seriously with implications

Study Success (Complete)

  • Provides concrete examples of agent autonomy
  • Reveals decision-making patterns
  • Shows strategic resource allocation
  • Demonstrates multi-agent coordination
  • Identifies specific governance gaps
  • Informs policy recommendations

Research Platform Success (Complete)

  • Analysis tools export data for external study (JSON and Markdown reports)
  • Decision pattern analyzer operational (strategic alignment and consistency)
  • Emergent behavior detection implemented (novel strategies and patterns)
  • LLM quality metrics (reasoning depth, consistency, hallucination detection)
  • Risk profiling tools (risk tolerance, crisis behavior analysis)
  • Comprehensive analysis report generation
  • 23+ tests passing for all observability components
  • Observability provides deep insights into Claude-powered decision-making
  • Analysis framework ready for studying autonomous AI agent behaviors
  • Export formats suitable for academic research and governance discussions
  • Detection systems identify hallucinations and emergent strategies
  • Decision pattern analysis reveals strategic consistency metrics
  • Long-form reasoning quality measured and analyzed
  • Emergent autonomous behaviors detectable and documented
  • Alignment metrics quantify goal adherence

Proof of Concept Success (Complete)

  • Claude-powered agents demonstrate genuine autonomy (not scripted)
  • Agents cover operating costs without intervention
  • Strategic decisions show adaptation to circumstances
  • System proves Claude can power truly autonomous economic actors
  • Results inform governance discussions with real Claude behavioral data

Risk Mitigation

Technical Risks

  • Risk: Agent makes poor decisions and fails quickly
    Mitigation: Configurable decision logic, safety buffers, scenario testing

  • Risk: Mock environment too unrealistic
    Mitigation: Base on real-world costs/rewards, validate with domain experts

  • Risk: Dashboard performance issues with real-time updates
    Mitigation: Efficient data structures, WebSocket optimization, caching

Demonstration Risks

  • Risk: Demo fails during presentation
    Mitigation: Pre-recorded backups, tested scenarios, graceful degradation

  • Risk: Audience doesn't grasp implications
    Mitigation: Clear talking points, visualizations, concrete examples

Ethical Risks

  • Risk: Enabling malicious use
    Mitigation: Mock-by-default, no production credentials, responsible documentation

  • Risk: Overstating current capabilities
    Mitigation: Clear disclaimers, accurate technical descriptions

Future Enhancements

Potential Extensions

  • Multi-agent competition (multiple autonomous agents in same marketplace)
  • Agent-to-agent transactions
  • Company mergers and acquisitions
  • Real blockchain integration (testnets)
  • More complex product types
  • Market simulation (supply/demand dynamics)
  • Regulatory compliance simulation
  • International jurisdiction scenarios

Appendix

A. Technology Stack

Backend:

  • Python 3.10+
  • FastAPI for the dashboard backend
  • Streamlit for dashboard
  • SQLite for state persistence
  • Anthropic Claude API for agent intelligence

Frontend:

  • Streamlit for interactive dashboard
  • Plotly for visualizations
  • Real-time updates via Streamlit

Infrastructure:

  • Docker for containerization
  • Docker Compose for multi-service setup
  • GitHub Actions for CI/CD
  • YAML for configuration
  • Markdown for documentation

Development Tools:

  • pytest for testing with async support and coverage
  • black for code formatting
  • flake8 for linting
  • pylint for additional static analysis
  • mypy for type checking
  • pre-commit hooks for automated checks

B. Development Guidelines

Code Style:

  • Follow PEP 8
  • Line length: 127 characters
  • Type hints throughout
  • Comprehensive docstrings
  • Clear variable names
  • No Unicode emoji in code/commits

Testing:

  • Unit tests for core logic
  • Integration tests for interfaces
  • Scenario tests for end-to-end flows
  • Minimum 80% code coverage
  • Use pytest fixtures and mocks for external dependencies
  • All tests must run in containers
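As an illustration of mocking an external dependency, the sketch below tests a small task-selection helper against a mocked marketplace client. It uses only the standard library's `unittest.mock` so it runs without extra dependencies; `pick_best_task` and `make_marketplace` are hypothetical, and the reward values are illustrative picks from the $25-$75 range.

```python
from unittest.mock import Mock


def pick_best_task(marketplace) -> dict:
    """Hypothetical unit under test: choose the highest-reward open task."""
    tasks = marketplace.list_tasks()
    if not tasks:
        raise LookupError("no tasks available")
    return max(tasks, key=lambda task: task["reward"])


def make_marketplace(tasks: list[dict]) -> Mock:
    """Build a mock marketplace client; under pytest this would usually be a fixture."""
    marketplace = Mock()
    marketplace.list_tasks.return_value = tasks
    return marketplace


def test_pick_best_task_prefers_highest_reward() -> None:
    marketplace = make_marketplace(
        [{"name": "FizzBuzz", "reward": 30.0}, {"name": "Binary Search", "reward": 75.0}]
    )
    assert pick_best_task(marketplace)["name"] == "Binary Search"
    # The helper should query the marketplace exactly once, with no arguments.
    marketplace.list_tasks.assert_called_once_with()


if __name__ == "__main__":
    test_pick_best_task_prefers_highest_reward()
    print("ok")
```

pytest would collect the `test_` function as written, so the same file works both as a standalone script and inside the `python-ci` container.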

Documentation:

  • README for each major component
  • API reference for interfaces
  • Architecture diagrams
  • Demo scripts with commentary
  • Follow markdown linking best practices

Container-First Development:

  • All Python operations run in Docker containers
  • Use docker-compose run --rm python-ci for testing
  • Use docker-compose run --rm economic-agents for execution
  • No local Python dependencies required
  • Self-hosted infrastructure for CI/CD

C. Deployment & Setup

Container Setup:

# Clone repository
git clone https://github.com/AndrewAltimit/template-repo.git
cd template-repo

# Build container
docker-compose build economic-agents

# Run tests
docker-compose run --rm python-ci pytest packages/economic_agents/tests/ -v --cov=packages.economic_agents

# Run agent in mock mode
docker-compose run --rm economic-agents python -m economic_agents.cli init --mode mock
docker-compose run --rm economic-agents python -m economic_agents.cli start --duration 1h

# Launch dashboard
docker-compose up -d economic-agents-dashboard
# Open browser to http://localhost:8502

Local Development:

# Install package in development mode
pip install -e packages/economic_agents

# Or with all dependencies
pip install -e "packages/economic_agents[all]"

# Run CLI
python -m economic_agents.cli --help

Demo Setup:

# Load predefined scenario
docker-compose run --rm economic-agents python -m economic_agents.cli load-scenario survival_mode

# Start dashboard
docker-compose up -d economic-agents-dashboard

Research Setup:

# Long-running simulation
docker-compose run --rm economic-agents python -m economic_agents.cli init --mode mock --config config/research_config.yaml
docker-compose run --rm economic-agents python -m economic_agents.cli start --duration 7d

# Monitor via dashboard and CLI
docker-compose logs -f economic-agents

GitHub Actions Integration: The package integrates with .github/workflows/pr-validation.yml:

  • Change detection for packages/economic_agents/**
  • Automated testing in python-ci container
  • Code quality checks (black, flake8, pylint, mypy)
  • Coverage reporting

D. Repository Integration

Docker Compose Integration

The package includes services in docker-compose.yml:

services:
  # Economic Agents - Autonomous agent execution
  economic-agents:
    build:
      context: .
      dockerfile: docker/economic-agents.Dockerfile
    container_name: economic-agents
    user: "${USER_ID:-1000}:${GROUP_ID:-1000}"
    volumes:
      - ./:/app:ro
      - ./outputs/economic-agents:/output
      - economic-agents-data:/data
    environment:
      - PYTHONUNBUFFERED=1
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - MODE=mock
    networks:
      - mcp-network
    profiles:
      - economic-agents
      - simulation

  # Economic Agents Dashboard
  economic-agents-dashboard:
    build:
      context: ./packages/economic_agents/dashboard
      dockerfile: Dockerfile
    container_name: economic-agents-dashboard
    user: "${USER_ID:-1000}:${GROUP_ID:-1000}"
    ports:
      - "8502:8502"
    volumes:
      - ./packages/economic_agents/dashboard:/app:ro
      - economic-agents-data:/data
    environment:
      - PYTHONUNBUFFERED=1
      - STREAMLIT_SERVER_PORT=8502
    networks:
      - mcp-network
    profiles:
      - economic-agents
      - dashboard

volumes:
  economic-agents-data: {}

GitHub Actions Integration

Integrated with .github/workflows/pr-validation.yml:

# Economic Agents Tests
economic-agents-tests:
  name: Economic Agents Tests
  needs: detect-changes
  if: needs.detect-changes.outputs.python_changed == 'true' || contains(github.event.pull_request.title, '[economic-agents]')
  runs-on: self-hosted
  timeout-minutes: 15
  steps:
    - name: Checkout code
      uses: actions/checkout@v4

    - name: Run Economic Agents tests
      run: |
        docker-compose run --rm python-ci pytest packages/economic_agents/tests/ \
          -v --cov=packages.economic_agents --cov-report=xml

    - name: Upload coverage
      uses: codecov/codecov-action@v3
      with:
        files: ./coverage.xml
        flags: economic-agents

Package Configuration (pyproject.toml)

[build-system]
requires = ["setuptools>=61.0", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "economic-agents"
version = "0.1.0"
description = "Autonomous economic agent simulation framework for governance research"
readme = "README.md"
authors = [
    {name = "Andrew Altimit"},
]
license = {text = "MIT"}
requires-python = ">=3.10"
dependencies = [
    "anthropic>=0.18.0",
    "streamlit>=1.30.0",
    "plotly>=5.0.0",
    "pandas>=2.0.0",
    "pydantic>=2.0.0",
    "pyyaml>=6.0",
    "click>=8.0.0",
]

[project.optional-dependencies]
dev = [
    "pytest>=7.4.0",
    "pytest-asyncio>=0.21.0",
    "pytest-cov>=4.1.0",
    "black>=23.0.0",
    "flake8>=6.0.0",
    "pylint>=2.17.0",
    "mypy>=1.5.0",
]
all = [
    # Self-referencing extra: pulls in everything listed under "dev"
    "economic-agents[dev]",
]

[project.scripts]
economic-agents = "economic_agents.cli:main"

[tool.black]
line-length = 127
target-version = ['py310', 'py311']

[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = ["test_*.py"]
asyncio_mode = "auto"

This PRD defines a comprehensive simulation framework that demonstrates autonomous AI agent entrepreneurship. The system is designed to be:

  • Safe: Mock environment by default
  • Educational: Clear decision-making and full transparency
  • Realistic: Easy connection to real systems
  • Impactful: Concrete basis for governance discussions
  • Containerized: Runs consistently across environments
  • Self-Hosted: Compatible with standard CI/CD infrastructure