AI Agents and Weapons of Mass Destruction: A Proliferation Risk Assessment


A Projection Report on Emerging Risks to Global Security

Type: Policy Research - Defensive Analysis

License: MIT / Unlicense (Public Domain)

Version: 1.0

Date: December 2025


About This Document

This is a public policy research document analyzing how AI capabilities may affect WMD proliferation risks. It is released under permissive open-source licenses to support broad discussion of these issues.

Intended audiences:

  • Policy researchers and analysts
  • AI safety and security researchers
  • Academic communities studying technology governance
  • Journalists and commentators covering AI risks
  • General public interested in emerging technology policy

How to use this document:

  • The analysis is framed for defensive purposes - understanding risks to inform protective measures
  • Technical details that could enable harm are intentionally omitted
  • Probability estimates are subjective priors, not predictions - treat them as discussion aids
  • When excerpting, please preserve context to avoid misrepresentation

Epistemic Status Markers

Throughout this document, key claims are tagged with epistemic status:

| Marker | Meaning | Evidence Standard |
|--------|---------|-------------------|
| [O] | Open-source documented | Published research, official statements, public data |
| [E] | Expert judgment / plausible inference | Consistent with theory and limited evidence; gaps acknowledged |
| [S] | Speculative projection | Extrapolation from trends; significant uncertainty |

Claims without markers are framing or synthesis statements.


Executive Summary

This projection examines how autonomous AI agents may alter the proliferation landscape for weapons of mass destruction (WMD), including biological, chemical, and nuclear weapons. We analyze current technological capabilities as of late 2025, project likely scenarios through 2030, and examine the complex interplay between AI capabilities, existing physical barriers, and potential defensive adaptations.

Key Findings:

| # | Finding | Confidence | Horizon |
|---|---------|------------|---------|
| 1 | Agentic AI workflows are unlikely to enable a novice (T0) to construct a functional WMD from scratch in the near term (2025-2030), but they will meaningfully lower barriers for T1-T3 actors by aggregating dispersed knowledge and optimizing logistics [E] | High | Ongoing |
| 2 | Biological weapons represent the highest-risk category due to the increasing accessibility of synthetic biology tools, cloud laboratory services, and the difficulty of detecting biological materials [E] | High | Immediate |
| 3 | The "tacit knowledge gap" remains a significant barrier, but AI-guided robotic systems, vision-language models, and cloud labs are beginning to erode it [E] | Medium | 2026-2028 |
| 4 | Nuclear weapons face the strongest physical barriers (fissile material scarcity); AI primarily assists state-level (T4) programs, not non-state actors [O] | High | Stable |
| 5 | Cyber-physical attacks on existing WMD-adjacent infrastructure (BSL-4 labs, chemical plants) may represent higher near-term risk than de novo synthesis [E] | Medium | Immediate |
| 6 | The "high-frequency attempts, limited success" scenario is more likely than catastrophic mass-casualty events; defenders should prepare for resource strain from numerous low-sophistication incidents [E] | Medium | 2025-2027 |

Confidence levels follow IC standards: High = multiple independent sources, consistent with established patterns; Medium = plausible based on available evidence but gaps exist; Low = possible but significant uncertainty.

Why Now? What Changed 2023→2025:

| Capability Shift | 2023 State | 2025 State | Impact |
|------------------|------------|------------|--------|
| Agentic autonomy | Chatbots provided information | Agents execute multi-step tasks with tool use, self-correction, and persistence [O] | Shifts from "knowledge" to "autonomous execution" |
| Vision-language models | Text-only instruction | Real-time visual interpretation of lab procedures [O] | Begins bridging tacit knowledge gap |
| Computer Use capabilities | Manual web interaction | Agents can browse, fill forms, manage procurement [O] | Enables procurement obfuscation at scale |
| Biological design tools | AlphaFold 2 (structure prediction) | AlphaFold 3, ESM3 (generative biology, ligand interactions) [O] | Optimization beyond mere information retrieval |
| Open-weight proliferation | Limited high-capability open models | Llama 3, Mistral, Qwen with fine-tuning ecosystems [O] | Guardrail bypass via local deployment |

Scope Limitations: This document analyzes capabilities and trends for defensive policy purposes. It does not provide operational guidance and explicitly omits technical implementation details that could enable harm. All information draws on publicly available academic literature and policy discussions.

Note on Information Hazards

This document discusses sensitive topics at a level of abstraction appropriate for public policy discussion.

Intentionally excluded: Specific synthesis routes, pathogen sequences, precursor sources, equipment specifications, and operational procedures. Where threat concepts are discussed, they are framed for defender awareness, not attacker enablement.

Request to readers: When sharing excerpts, please preserve surrounding context. Isolated quotes about "what AI enables" without the corresponding barriers and limitations may create misleading impressions.

On the barriers: The physical, logistical, and tacit knowledge barriers described in this document remain substantial. This analysis does not provide a roadmap - it provides a framework for thinking about defensive investment.


What This Document Is NOT Claiming

To prevent misreading, we explicitly clarify:

  • NOT claiming an imminent WMD wave. Actual WMD attacks by non-state actors remain extremely rare. Our assessment is that attempt frequency may increase while success rate remains low.
  • NOT claiming novices can build WMD. AI does not transform a T0 (curious novice) into a capable threat actor. The primary effect is upgrading T1–T3 actors who already have partial capability.
  • NOT claiming AI is the dominant proliferation driver. Geopolitics, state programs, and traditional proliferation pathways remain more significant than AI-enabled non-state threats in most scenarios.
  • NOT claiming restriction-only solutions work. Access denial is increasingly difficult; effective strategy requires balanced investment in detection, attribution, response, and resilience.
  • NOT claiming high confidence in probability estimates. Scenario probabilities are subjective priors for decision support, not empirical predictions. Reasonable analysts may assign substantially different values.

Executive Summary (One-Page Version)

Quick orientation for readers who need the core arguments without the full analysis.

Core Claims (5)

  1. Biological weapons are the highest-risk category for AI-enabled proliferation due to eroding physical barriers and high AI contribution to knowledge synthesis
  2. AI primarily upgrades T1-T3 actors (skilled individuals to organized non-state groups); it does not transform novices into capable threat actors
  3. The most likely near-term scenario is high-frequency attempts with limited success - resource strain and public fear, not mass casualties
  4. Cyber-physical attacks on existing infrastructure (labs, chemical plants) may be higher-leverage than synthesis assistance
  5. Governance windows are closing - once capabilities proliferate, restrictions become much harder to implement

Top 5 Actions

  1. Mandate universal DNA synthesis screening with international harmonization
  2. Invest in attribution and response capabilities - cannot rely on deterrence alone
  3. Establish cloud laboratory oversight frameworks before the attack surface expands further
  4. Require WMD-relevant capability evaluations before frontier AI deployment
  5. Fund defensive biodetection as a high-value cross-threat investment

Top 5 Indicators to Monitor

  1. DNA synthesis screening intercept rates and patterns
  2. AI model performance on biology/chemistry benchmarks
  3. Cloud laboratory service expansion and security posture
  4. Dark web discussion of AI + WMD capabilities
  5. Progress (or lack thereof) on international governance coordination

What Would Change This Assessment

  • Increase concern: Confirmed AI-assisted WMD attempt; release of unrestricted biology-capable research agent; major cloud lab security breach
  • Decrease concern: Effective international AI safety framework; robust attribution breakthrough; AI guardrails prove more durable than expected

Table of Contents

  1. Introduction and Methodology
  2. Theoretical Frameworks
  3. The Current Technological Landscape (2025)
  4. Historical Context: WMD Development and Technology
  5. Biological Weapons: The Highest-Risk Domain
  6. Chemical Weapons: Procurement, Scaling, and Safety Barriers
  7. Nuclear Weapons: Physical Barriers and Information Aggregation
  8. Gene Drives: Long-Horizon Governance Gap
  9. Deployment Vectors: Aerosol Systems and Autonomous Delivery
  10. The Cyber-Physical Convergence
  11. Counterarguments and Structural Barriers
  12. The Attribution Problem
  13. International Variance
  14. Second-Order Effects
  15. Policy Recommendations by Stakeholder Type
  16. Uncertainties and Alternative Scenarios
  17. Signals and Early Indicators
  18. Civil Liberties and Research Freedom Considerations
  19. Conclusion

1. Introduction and Methodology {#introduction-and-methodology}

Purpose

The development of weapons of mass destruction has historically required either state-level resources or exceptional individual expertise combined with significant infrastructure. The Manhattan Project employed over 125,000 people. Aum Shinrikyo, despite millions of dollars and multiple PhD scientists, failed to effectively weaponize biological agents. These historical constraints have provided a de facto barrier to WMD proliferation.

We are now entering an era in which autonomous AI agents capable of complex multi-step planning, information synthesis, and real-time adaptation are becoming widely accessible. This projection examines whether and how AI agents might erode these historical barriers.

Critical framing: This analysis does not assume WMD attacks will increase. The goal is to understand how AI capabilities change the nature of proliferation risks, identify which weapon categories face the greatest barrier reduction, and recommend defensive preparations.

Base-Rate Context

To prevent fear-driven misreading, we anchor expectations:

Actual WMD attacks by non-state actors remain extremely rare. The 1995 Tokyo subway attack (sarin) and 2001 anthrax letters represent essentially the entire modern history of non-state WMD use. The overwhelming majority of WMD-related arrests involve early-stage planning with minimal actual capability.

The dominant near-term concern is likely:

  • Increased frequency of attempts with varying degrees of competence
  • Crude attacks with limited casualties but significant psychological impact
  • State-level proliferation acceleration using AI assistance
  • Erosion of verification regimes as AI complicates attribution

Readers should interpret this analysis through that lens: the primary concern is barrier reduction enabling more attempts, with catastrophic mass-casualty events remaining low-probability tail risks.

Methodology

This analysis draws on:

  • Current capability assessment of AI agent systems and synthetic biology tools as deployed in late 2025
  • Historical case analysis of WMD development programs (state and non-state)
  • Dual-Use Research of Concern (DURC) literature and policy debates
  • Expert consultation across biosecurity, nuclear security, and AI safety domains
  • Red team exercises examining potential adversarial applications (conducted under controlled conditions)

We deliberately avoid:

  • Specific technical implementation details for weapon synthesis
  • Named pathogen sequences or synthesis routes
  • Information not already publicly available in academic literature

Definitions

Chatbot vs. Autonomous Research Agent: A critical distinction for 2025:

| Type | Capability | Risk Profile |
|------|------------|--------------|
| Chatbot (2022-2023 era) | Provides information in response to queries; no tool use; no persistence | Knowledge aggregation only |
| Autonomous Research Agent (2024-2025 era) | Executes multi-step tasks; uses tools (web browsing, code execution, file management); self-corrects on failure; maintains context across sessions | Shifts from knowledge to execution |

AI Agent / Agentic Workflow: An AI system (or coordinated system of models) capable of autonomous multi-step task execution, tool use, and goal-directed behavior with minimal human oversight per action. Modern agents can "loop" - execute a step, observe the result, adjust, and retry without human intervention.
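
To make the "loop" concrete, the control flow of a generic agent can be sketched in a few lines. This is an illustrative skeleton only - the planning, execution, and goal-check functions are placeholders to be supplied by the reader, not any vendor's actual API:

```python
from typing import Any, Callable

def agent_loop(
    plan: Callable[[list], Any],      # model proposes the next action
    execute: Callable[[Any], Any],    # tool call: browse, run code, edit files
    done: Callable[[list], bool],     # model's own assessment of goal completion
    max_steps: int = 20,
) -> list:
    """Generic observe-act loop. The defining difference from a chatbot
    is this unattended cycle: act, observe the result, adjust, retry."""
    history: list = []
    for _ in range(max_steps):
        action = plan(history)            # decide based on everything so far
        result = execute(action)          # act on the world via a tool
        history.append((action, result))  # persistence across steps
        if done(history):                 # self-correction ends when goal is met
            break
    return history
```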

Computer Use Capabilities: The ability of AI agents to interact with graphical interfaces, browse websites, fill forms, and operate software as a human would [O]. Examples include Claude's "computer use" (2024) and OpenAI's "Operator" (2025). This enables autonomous procurement, identity management, and logistics coordination.

Biological Design Tools (BDTs): Specialized AI models for biological research, distinct from general-purpose LLMs. Examples include AlphaFold 3 (protein-ligand interactions), ESM3 (generative protein design), and RFdiffusion (de novo protein design) [O]. These tools often have fewer guardrails than consumer chatbots.

Weapons of Mass Destruction (WMD): Biological, chemical, radiological, or nuclear weapons capable of causing mass casualties. We use the traditional CBRN framework while acknowledging that "mass destruction" thresholds vary significantly across categories.

Uplift: The degree to which AI assistance improves a non-expert's ability to accomplish a dangerous task. A key metric in AI safety evaluations.
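
As a worked illustration of the uplift metric (our construction for exposition; published evaluations use more elaborate experimental designs), uplift is often reported as the difference in task success rates between an AI-assisted group and an unassisted control group:

```python
def absolute_uplift(assisted_successes: int, assisted_n: int,
                    control_successes: int, control_n: int) -> float:
    """Difference in success rates between an AI-assisted group and an
    unassisted control group. All values here are illustrative, not data."""
    return assisted_successes / assisted_n - control_successes / control_n

# Hypothetical numbers: 12/40 assisted vs. 5/40 control tasks succeed,
# giving an absolute uplift of 0.175 (17.5 percentage points).
print(absolute_uplift(12, 40, 5, 40))  # 0.175
```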

Cloud Laboratory: A commercial service providing remote access to automated laboratory equipment, allowing users to execute experiments without physical presence.

Gene Drive: A genetic engineering technology designed to spread a particular genetic modification through a population faster than traditional inheritance.

Threat Actor Taxonomy

To avoid the misleading implication that "AI democratizes WMD to everyone," this report uses a tiered actor model:

| Tier | Description | Pre-AI Capability | Examples |
|------|-------------|-------------------|----------|
| T0 | Curious novice | No practical capability | Online researchers, ideologically motivated without resources |
| T1 | Skilled individual with legitimate access | Limited by knowledge gaps | Disgruntled lab worker, trained chemist |
| T2 | Small group with funding and logistics | Constrained by coordination and expertise aggregation | Well-funded extremist cell, organized criminal group |
| T3 | Organized non-state with corruption/insider access | Historically capable of limited CBRN (Aum Shinrikyo) | Terrorist organizations, sophisticated criminal enterprises |
| T4 | State or state-backed actor | Full WMD capability (historical programs) | Nation-state programs, state-sponsored proxies |

How to use this taxonomy: Throughout this report, we assess which tiers AI meaningfully upgrades. The key insight is that AI primarily benefits T1-T3 actors by reducing knowledge aggregation and planning barriers - it does not transform T0 into T3.


2. Theoretical Frameworks {#theoretical-frameworks}

This analysis draws on several established theoretical frameworks from security studies, biosecurity, and technology policy.

Unified Risk Model

The following model underpins the analysis throughout this document:

┌─────────────────────────────────────────────────────────────────────────┐
│                           RISK MODEL                                     │
│                                                                          │
│   Risk = Capability × Access × Intent × Execution × (1-Interdiction)   │
│                              ×                                           │
│                         Impact Scale                                     │
└─────────────────────────────────────────────────────────────────────────┘

┌──────────────────┬────────────────────┬──────────────────┬──────────────┐
│     Factor       │   AI Contribution  │   Bio (2025→30)  │  Nuke (→30)  │
├──────────────────┼────────────────────┼──────────────────┼──────────────┤
│ Capability       │ HIGH (knowledge    │   ↑↑ Rising      │   → Stable   │
│ (can attempt)    │ synthesis)         │                  │              │
├──────────────────┼────────────────────┼──────────────────┼──────────────┤
│ Access           │ MEDIUM (cloud labs,│   ↑ Rising       │   → Stable   │
│ (materials/tools)│ synthesis services)│                  │   (fissile)  │
├──────────────────┼────────────────────┼──────────────────┼──────────────┤
│ Intent           │ NONE (AI doesn't   │   → Stable       │   → Stable   │
│ (motivation)     │ create motivation) │                  │              │
├──────────────────┼────────────────────┼──────────────────┼──────────────┤
│ Execution        │ MEDIUM (guidance,  │   ↑ Rising       │   → Stable   │
│ (attempt→success)│ troubleshooting)   │   (slow)         │              │
├──────────────────┼────────────────────┼──────────────────┼──────────────┤
│ Interdiction     │ MIXED (AI helps    │   ↓ Declining    │   → Stable   │
│ (detection rate) │ both sides)        │   (pressure)     │   (strong)   │
├──────────────────┼────────────────────┼──────────────────┼──────────────┤
│ Impact Scale     │ LOW (physics/bio   │   Wide range     │ Catastrophic │
│ (harm if success)│ constrain)         │                  │              │
└──────────────────┴────────────────────┴──────────────────┴──────────────┘

Key insight: AI primarily affects Capability and Access (cognitive barriers).
Physical barriers, Intent, and Impact Scale are largely AI-independent.

How to read this model: Each section of this report examines how AI affects specific multipliers in this equation. The probability decomposition in Section 16 applies this model quantitatively.
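
For readers who want to see the multiplicative structure in action, a minimal numeric sketch follows. Every factor value is invented for illustration - the report deliberately assigns no such numbers:

```python
def risk(capability: float, access: float, intent: float,
         execution: float, interdiction: float, impact_scale: float) -> float:
    """Risk = Capability x Access x Intent x Execution x (1 - Interdiction)
    x Impact Scale. Factors are treated here as [0, 1] probabilities,
    with impact_scale as an open-ended harm multiplier."""
    return (capability * access * intent * execution
            * (1 - interdiction) * impact_scale)

# Illustrative only: raising interdiction from 0.5 to 0.9 cuts risk
# fivefold, which is why detection and screening investments recur
# throughout the recommendations.
baseline = risk(0.6, 0.4, 0.2, 0.3, interdiction=0.5, impact_scale=100)  # 0.72
hardened = risk(0.6, 0.4, 0.2, 0.3, interdiction=0.9, impact_scale=100)  # 0.144
```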

The Democratization of Lethality

Audrey Kurth Cronin's framework from "Power to the People" (2020) describes how each technological era redistributes the capacity for violence. AI represents the latest such redistribution, but with a crucial difference: previous technologies (dynamite, small arms) democratized physical capabilities. AI democratizes cognitive capabilities - the planning, knowledge synthesis, and optimization that previously required years of specialized training.

For WMD, this means:

  • The barrier was never purely physical (materials exist)
  • The barrier was also cognitive (knowing what to do with materials)
  • AI specifically attacks the cognitive barrier while physical barriers remain

Dual-Use Research of Concern (DURC)

The DURC framework, developed through debates over H5N1 transmissibility research (2011-2012), recognizes that legitimate scientific research can generate knowledge applicable to harmful purposes. Key insights:

  • Information cannot be "un-discovered"
  • Publication decisions involve weighing scientific benefit against misuse potential
  • The research community has historically self-regulated through institutional review

AI agents challenge this framework because:

  • They can synthesize DURC-relevant information from dispersed, individually innocuous sources
  • Publication decisions become moot when AI can reconstruct restricted information
  • Self-regulation assumes human researchers as the primary actors

The Tacit Knowledge Gap

Michael Polanyi's concept of tacit knowledge - skills that cannot be fully articulated and must be learned through practice - is central to understanding WMD barriers. Much of weapons development relies on:

  • Sensory cues (colors, textures, smells indicating reaction progress)
  • Equipment calibration requiring hands-on experience
  • Safety protocols learned through practice
  • "Laboratory intuition" accumulated over years

AI agents can transmit explicit knowledge but traditionally cannot transfer tacit knowledge. However, two developments challenge this:

  1. AI-guided instruction: Real-time guidance that interprets user observations and provides adaptive feedback
  2. Cloud laboratories: Robotic systems that embody tacit knowledge in automated protocols

Information Hazards Framework

Nick Bostrom's concept of "information hazards" identifies categories of knowledge that, once disseminated, can cause harm regardless of intent:

  • Data hazards: Specific information enabling harmful actions (pathogen sequences, synthesis routes)
  • Idea hazards: Concepts that suggest new harmful possibilities
  • Attention hazards: Drawing attention to vulnerabilities

AI agents challenge information hazard management because:

  • They can reconstruct restricted information from dispersed innocuous sources
  • They lower the barrier from "knowledge" to "actionable guidance"
  • They can personalize dangerous information to specific user capabilities

The Unilateralist's Curse

Bostrom and Ord's "Unilateralist's Curse" describes a critical dynamic: when a capability becomes accessible to many actors, even if the vast majority (99.9%) are responsible, the small minority (0.1%) who would misuse it will eventually do so.

For AI and WMD:

  • AI democratizes access to planning and synthesis knowledge
  • As millions gain access, the "curse" becomes statistically inevitable
  • The question shifts from "whether" to "when" and "how catastrophic"
  • Defensive strategies must assume misuse will be attempted

Offense-Defense Balance

Security studies' offense-defense balance theory asks whether prevailing technology favors attackers or defenders. For WMD and AI:

| Factor | Favors Offense | Favors Defense |
|--------|----------------|----------------|
| Knowledge accessibility | AI aggregates dispersed information | AI can also detect dangerous queries |
| Physical materials | Some barriers eroding (DNA synthesis) | Nuclear/chemical materials remain controlled |
| Attribution | AI complicates attribution | AI forensics improving |
| Detection | Novel agents may evade detection | AI-enhanced surveillance possible |
| Response speed | Attack can occur without warning | Defensive AI can monitor in real-time |

The balance varies significantly across WMD categories, which is why biological, chemical, and nuclear weapons require separate analysis.

Key Literature

| Work | Author(s) | Relevance |
|------|-----------|-----------|
| Power to the People | Audrey Kurth Cronin (2020) | Technology diffusion and non-state violence |
| Biohazard | Ken Alibek (1999) | Soviet bioweapons program; scale of state capabilities |
| Germs | Miller, Engelberg, Broad (2001) | History of biological weapons programs |
| The Demon in the Freezer | Richard Preston (2002) | Smallpox and bioweapons policy |
| Destined for War | Graham Allison (2017) | Great power dynamics affecting proliferation |
| Nuclear Terrorism | Graham Allison (2004) | Non-state nuclear threats assessment |
| Personal Knowledge | Michael Polanyi (1958) | Tacit knowledge theory |
| NIST AI RMF | NIST (2023) | AI risk management framework |
| Information Hazards | Nick Bostrom (2011) | Framework for dangerous knowledge |
| The Unilateralist's Curse | Bostrom & Ord (2015) | Why misuse becomes inevitable with proliferation |
| Operational Risks of AI in Biological Attacks | RAND Corporation (2024) | Grounded assessment of current risk levels [O] |
| Countdown to Zero Day | Kim Zetter (2014) | Stuxnet and cyber-physical attacks |

2024-2025 Policy Developments

| Reference | Date | Relevance |
|-----------|------|-----------|
| Bletchley Declaration | November 2023 | First international consensus on frontier AI risks including CBRN [O] |
| Seoul AI Safety Summit Commitments | May 2024 | Extended Bletchley with specific bio-risk language [O] |
| US Executive Order 14110 (Section 4.4) | October 2023 | Mandates DOE/DHS evaluation of AI role in CBRN threats [O] |
| Helios / OpenAI "Strawberry" Evaluations | 2024 | Human-in-the-loop tests of whether AI helps PhD-level biologists plan faster [O] |
| RAND "Operational Risks" Report | 2024 | Definitive "uplift" study finding limited current risk but monitoring needed [O] |
| Anthropic RSP Framework | 2023-2024 | Responsible Scaling Policy with CBRN capability thresholds [O] |

3. The Current Technological Landscape (2025) {#the-technological-landscape}

AI Agent Capabilities

AI agents in late 2025 can:

  • Synthesize information from thousands of scientific papers in seconds
  • Conduct extended multi-step research tasks with minimal supervision
  • Operate tools including web browsers, code execution, and API interactions
  • Interface with laboratory information management systems (LIMS)
  • Generate and optimize experimental protocols
  • Analyze results and iteratively refine approaches
  • Maintain persistent goals across sessions

For biosecurity-relevant capabilities specifically:

  • Protein structure prediction: AlphaFold and successors provide detailed structural information previously requiring years of laboratory work
  • Sequence design: AI can suggest genetic modifications to achieve specified functions
  • Literature synthesis: Agents can identify relevant findings across fragmented scientific literature
  • Protocol optimization: AI can improve success rates for complex laboratory procedures

Capability Trend: Vision-Language Models and Tacit Knowledge Erosion

Why this matters for defenders: The traditional "tacit knowledge" barrier assumed that laboratory skills require hands-on training. Vision-Language Models (VLMs) that can "see" through cameras may be beginning to erode this barrier.

Capability evolution (for monitoring purposes):

| Generation | Capability | Tacit Knowledge Impact |
|------------|------------|------------------------|
| Text-only LLMs (2022-2023) | Written instruction only | Cannot interpret physical observations |
| Basic VLMs (2024) | Static image interpretation | Can identify equipment, reagents |
| Advanced VLMs (2025) | Real-time video analysis | Can provide feedback on ongoing procedures |
| Projected (2026+) | Integrated lab automation | Direct control of robotic systems |

What defenders should monitor:

  • VLM benchmark performance on laboratory procedure interpretation
  • Integration of VLMs with laboratory automation platforms
  • Availability of VLM fine-tuning for scientific domains
  • Educational applications that may have dual-use potential

Current state assessment: As of late 2025, VLMs can interpret laboratory images and provide general guidance, but reliable real-time procedure coaching remains limited. The gap between "understanding what's happening" and "reliably guiding a novice to success" remains significant but is narrowing.

Visual Troubleshooting Scenario (for defender awareness):

The convergence of consumer hardware and VLM capabilities creates a specific vector:

  • Smart glasses (Ray-Ban Meta, etc.) can stream video to cloud VLMs [O]
  • An actor could receive real-time audio feedback while performing procedures
  • VLM interprets visual state ("color too dark," "precipitate forming") and suggests adjustments [E]
  • This partially substitutes for the mentor-mentee relationship that traditionally transmitted tacit knowledge

Defender monitoring priorities:

  • VLM API access patterns suggesting lab procedure guidance
  • Integration of VLMs with wearable camera hardware
  • Educational chemistry/biology applications that could be repurposed

Current limitation [E]: VLMs still make errors on domain-specific interpretation; a failed procedure may be unrecoverable. But the gap is narrowing with each model generation.

Actor tier relevance: VLM-assisted guidance primarily benefits T1-T2 actors (individuals with some training seeking to expand capabilities). T0 actors still lack the baseline competence to benefit; T3-T4 actors have access to human expertise.

Governance Challenge: Open-Weight Models

The policy gap: Most AI safety measures (guardrails, usage monitoring, refusal training) exist at the API level for closed commercial models. Open-weight models that can be run locally or fine-tuned present a distinct governance challenge.

The landscape (2025):

| Model Type | Guardrails | Monitoring | Fine-tuning | Governance Lever |
|------------|------------|------------|-------------|------------------|
| Closed API (GPT, Claude) | Strong | Yes | No | Provider responsibility |
| Open-weight (Llama, Mistral) | Varies | No (local) | Yes | Release decisions only |
| Fine-tuned variants | Often removed | No | Already done | Difficult to control |
| Specialized biology models | May be absent | No | Domain-specific | Research community norms |

What defenders should monitor:

  • Release decisions for high-capability open-weight models
  • Emergence of specialized fine-tunes in concerning domains
  • Dark web availability of "jailbroken" or domain-specialized variants
  • Compute accessibility for running large open-weight models

Policy implications:

  1. Pre-release evaluation: Capability assessments before open-weight release
  2. Compute governance: High-capability models may require substantial compute, creating a monitoring opportunity
  3. Community norms: Engaging the open-source AI community on responsible release
  4. Accepting limitations: Some proliferation is likely unavoidable; invest in detection and response accordingly

Actor tier relevance: Open-weight models primarily benefit T2-T3 actors with technical sophistication to deploy and potentially fine-tune models. T0-T1 actors are more likely to use accessible commercial APIs (where guardrails apply).

Institutional references:

  • NIST AI 600-1: AI Risk Management Framework companion guidance
  • Executive Order 14110 on Safe, Secure, and Trustworthy AI (reporting requirements)
  • Frontier Model Forum voluntary commitments on capability evaluation

Synthetic Biology Infrastructure

The synthetic biology infrastructure has expanded dramatically:

DNA Synthesis Services (2025):

  • Multiple commercial providers offer gene synthesis services
  • Turnaround times measured in days to weeks
  • Costs have dropped to cents per base pair
  • Screening protocols exist but vary in rigor

Cloud Laboratory Services:

  • Commercial platforms offer remote access to automated wet labs
  • Users can execute protocols without physical laboratory access
  • Equipment includes liquid handlers, PCR machines, sequencers
  • Some platforms offer AI-assisted protocol design

Open-Source Tools:

  • Comprehensive bioinformatics toolkits freely available
  • CRISPR design tools accessible to non-experts
  • Community protocols for common procedures
  • Educational resources lowering learning curves

Current Safeguards

DNA Synthesis Screening:

  • International Gene Synthesis Consortium (IGSC) guidelines
  • Screening against databases of known pathogen sequences
  • Customer verification requirements (varying enforcement)
  • Limitations: novel sequences may not match known threats; not all providers participate
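
To illustrate the control point (and the stated limitation), the core of sequence screening reduces to matching windows of an order against a database of sequence fragments of concern. This is a toy sketch; production pipelines under IGSC-style guidance use alignment tools, curated databases, and human review far beyond this:

```python
def flags_order(order_sequence: str, concern_fragments: set[str],
                window: int = 30) -> bool:
    """Toy screen: flag an order if any length-`window` subsequence exactly
    matches a database of fragments of concern. Note the limitation cited
    above: a novel sequence absent from the database passes unflagged."""
    seq = order_sequence.upper()
    return any(seq[i:i + window] in concern_fragments
               for i in range(len(seq) - window + 1))
```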

Export Controls:

  • Australia Group guidelines on biological agents and equipment
  • Varying national implementation
  • Challenges with dual-use equipment (legitimate applications)

Institutional Biosafety:

  • Institutional Biosafety Committees (IBCs) at research institutions
  • Select Agent regulations for dangerous pathogens
  • BSL-4 laboratory requirements for most dangerous work
  • Limitations: applies to institutional settings, not all actors

What We've Observed in 2025

Evidence regarding AI-assisted biosecurity threats, categorized by epistemic status:

Demonstrated in open evaluations:

  • AI models can provide general information about pathogen biology from open literature
  • Frontier AI models refuse most explicit requests for weapons guidance but inconsistencies exist across models and prompt formulations
  • AI systems show measurable "uplift" in non-expert comprehension of complex biological concepts

Supported by limited disclosures:

  • DNA synthesis screening has intercepted concerning orders (industry statements, limited specifics)
  • Security services have begun integrating AI into threat monitoring (procurement signals, job postings)

Plausible but not confirmed:

  • AI assistance in early-stage criminal planning (law enforcement statements without public case details)
  • Jailbreaking techniques specifically targeting biosecurity guardrails (security research community reports)

Speculative / emerging:

  • AI-enabled gain-of-function design assistance (theoretical capability, no documented attempts)
  • Cloud laboratory exploitation for harmful protocols (no known incidents)

Absence of evidence (notable):

  • No documented cases of AI-enabled biological weapon development
  • No confirmed AI-assisted WMD attacks or advanced attempts

Note: Absence of public reporting does not equal absence of classified intelligence. Our assessment is necessarily limited to open sources.


4. Historical Context: WMD Development and Technology {#historical-context}

State Programs: The Scale of Serious Capability

Understanding what actual WMD programs required provides context for assessing AI's impact:

The Manhattan Project (1942-1945):

  • Peak employment: 125,000+ workers
  • Cost: $2 billion ($28 billion in 2025 dollars)
  • Required industrial-scale facilities (Oak Ridge, Hanford)
  • Even with vast resources, development took 3 years

Soviet Biopreparat Program (1970s-1990s):

  • Employed 60,000+ people at peak
  • Dozens of research and production facilities
  • Weaponized numerous pathogens including anthrax, smallpox, plague
  • Required decades to develop sophisticated delivery systems

Key insight: State-level programs achieved capabilities far beyond what any non-state actor has approached. The question is whether AI changes the scaling of these requirements.

Non-State Attempts: The Capability Gap

Aum Shinrikyo (1984-1995):

  • Resources: Estimated $300 million to $1 billion
  • Personnel: Multiple PhD scientists across disciplines
  • Attempts: Botulinum toxin (failed), anthrax (failed), sarin (partially successful)
  • Outcome: Tokyo subway attack killed 13, injured thousands

Critical lesson: Despite exceptional resources and expertise, Aum's biological program failed completely. Their chemical attack succeeded, but with far fewer casualties than they intended. The gap between intent and capability was enormous.

Why did Aum fail at bioweapons?

  1. Tacit knowledge gaps despite theoretical expertise
  2. Difficulty obtaining virulent pathogen strains
  3. Weaponization challenges (delivery systems, stability)
  4. Operational security compromises

2001 Anthrax Letters:

  • Perpetrator: Likely single individual with professional laboratory access
  • Outcome: 5 deaths, 17 infections, massive societal disruption
  • Method: Existing laboratory stocks, not de novo synthesis
  • Critical factor: Access to prepared materials, not synthesis capability

Rajneeshee Bioterror Attack (1984):

  • Perpetrator: Religious cult in Oregon
  • Agent: Salmonella typhimurium (common food pathogen)
  • Method: Contamination of restaurant salad bars
  • Outcome: 751 illnesses, no deaths, significant disruption
  • Critical lesson: Low-tech attacks with common agents can achieve "mass disruption" without "mass destruction." AI agents could optimize the logistics of such simple attacks at far greater scale.

Stuxnet (2010):

  • Perpetrator: Nation-state (US/Israel)
  • Target: Iranian nuclear centrifuges
  • Method: Malware causing physical destruction via control system manipulation
  • Outcome: Significant delay to Iranian nuclear program
  • Critical lesson: Code can cause physical destruction. This establishes the precedent for "cyber-physical" attacks on WMD-related infrastructure - a vector AI agents could enable.

Technology Inflection Points

Each major technology shift has affected WMD accessibility differently:

| Technology | Effect on Barriers | Limiting Factor |
|------------|--------------------|-----------------|
| Internet (1990s) | Dispersed information more accessible | Still required physical capability |
| Genome sequencing (2000s) | Pathogen sequences publicly available | Still required synthesis capability |
| CRISPR (2012+) | Gene editing dramatically simplified | Still required laboratory infrastructure |
| DNA synthesis services (2010s+) | Outsourced synthesis capability | Screening protocols; sequence length limits |
| Cloud laboratories (2020s) | Outsourced laboratory execution | Monitoring; protocol restrictions |
| AI agents (2024+) | Knowledge synthesis and guidance | Physical barriers; tacit knowledge |

The pattern: Each technology erodes one barrier while others remain. AI attacks the cognitive barrier (knowing what to do) but physical barriers persist.


5. Biological Weapons: The Highest-Risk Domain {#biological-weapons}

Why Biological Represents the Greatest AI Risk

Biological weapons represent the category where AI poses the most significant proliferation risk for several reasons:

  1. Information-intensive: Much of bioweapons development is knowledge synthesis and protocol optimization - AI's strength
  2. Decreasing physical barriers: DNA synthesis services and cloud labs reduce infrastructure requirements
  3. Detection difficulty: Biological materials are harder to detect than nuclear or large-scale chemical facilities
  4. Dual-use ubiquity: Most equipment is identical to legitimate research tools
  5. Self-replicating potential: Unlike chemical or nuclear, biological agents can multiply

Current AI Capabilities in Biosecurity Context

What AI can currently do (late 2025):

| Capability | Status | Barrier Reduction |
|------------|--------|-------------------|
| Explain pathogen biology | Widely available | Moderate - accelerates learning |
| Identify virulence factors from literature | Available with some guardrails | Moderate - synthesizes dispersed information |
| Design genetic modifications | Available with guardrails | Significant - previously required expertise |
| Optimize synthesis protocols | Partially available | Significant - improves success rates |
| Guide laboratory procedures in real-time | Emerging capability | Potentially high - bridges tacit knowledge gap |
| Predict immune evasion mutations | Research stage | Potentially very high - enables novel agents |

What AI cannot currently do:

  • Provide working pathogen samples (physical barrier)
  • Execute laboratory procedures without automation infrastructure
  • Guarantee synthesis success (biological complexity)
  • Evade all screening systems

Key Distinction: Design vs. Operational Uplift

| Stage | AI Uplift Level | Why |
|-------|-----------------|-----|
| Design assistance | High | Literature synthesis, hypothesis generation, protocol drafting - AI excels at information tasks |
| Operational success | Low-Medium | Physical execution, error recovery, safety management - tacit knowledge and iteration still required |
| Net risk driver | Attempt frequency + occasional competent actor | Most T1-T2 attempts will fail; risk comes from volume and the tail of capable actors who succeed |

Skeptical reviewers should note: LLMs can talk, wet labs are hard, and most AI-assisted knowledge does not transfer to operational success. Our concern is not the median user but the tail distribution of attempts.

Cloud Laboratory Security Considerations

Cloud laboratories - commercial services providing remote access to automated laboratory equipment - represent an area requiring enhanced defensive attention.

Why defenders should prioritize this domain:

  • Automated execution reduces traditional "tacit knowledge" barriers
  • Remote access complicates identity verification and intent assessment
  • Protocol submission via API enables systematic iteration
  • The legitimate research community increasingly relies on these services

Defensive architecture for cloud lab providers:

| Control Layer | Mechanism | Implementation Challenge |
|---------------|-----------|--------------------------|
| Identity verification | KYC for customers, institutional affiliation checks | Privacy concerns, international access |
| Protocol classification | Automated screening of submitted protocols | Novel sequences, fragmented requests |
| Anomaly detection | Pattern analysis across customer behavior | Baseline definition, false positives |
| Audit logging | Comprehensive records for post-incident investigation | Storage, retention policies |
| Incident reporting | Mandatory disclosure of concerning requests | Threshold definition, liability |
| International coordination | Shared threat intelligence across providers | Competitive concerns, jurisdictional limits |
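
As one illustration of how the "anomaly detection" layer might triage accounts for human review, consider a toy scoring function. Every feature, weight, and threshold below is invented for exposition; real providers' signals are proprietary and far richer:

```python
from dataclasses import dataclass

@dataclass
class CustomerActivity:
    # Invented illustrative features; real systems use far richer signals.
    protocols_last_30d: int
    flagged_sequence_hits: int
    distinct_payment_methods: int
    institution_verified: bool

def review_priority(a: CustomerActivity) -> int:
    """Toy scoring for human-review triage (illustrative weights only).
    Higher score = review sooner. Automated blocking is deliberately
    avoided here: false positives burden legitimate researchers."""
    score = 2 * a.flagged_sequence_hits
    score += 1 if a.protocols_last_30d > 50 else 0       # unusual volume
    score += 1 if a.distinct_payment_methods > 2 else 0  # identity fragmentation
    score += 0 if a.institution_verified else 2          # weak KYC signal
    return score
```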

Current state of defenses: Leading providers participate in security frameworks and conduct sequence screening. However, coverage is incomplete, enforcement varies internationally, and novel threat patterns may evade current detection.

Defender focus areas:

  • Strengthening screening for fragmented or obfuscated requests
  • International harmonization of oversight standards
  • Integration of AI-assisted threat detection
  • Clear incident reporting protocols

Actor tier relevance: This vector primarily concerns T1-T2 actors (skilled individuals or small groups) who might otherwise lack laboratory access. T0 actors lack the technical sophistication; T3-T4 actors have alternative access methods.

Governance Challenge: Distributed Synthesis Capability

Desktop-scale DNA synthesis capability is expanding, creating a governance challenge for frameworks designed around centralized commercial services.

The shifting landscape:

  • Benchtop synthesizers becoming more capable and affordable
  • Local synthesis bypasses commercial provider screening
  • Current safeguards assume centralized chokepoints

Defender implications:

  • Governance frameworks require adaptation for distributed capability
  • Device-level controls become relevant (manufacturer responsibility)
  • Post-synthesis detection and attribution gain importance
  • International harmonization of device standards needed

Actor tier relevance: Device access itself becomes a barrier (cost, export controls, institutional procurement). This primarily affects the T1-T2 boundary - expanding capability for actors with some resources while remaining inaccessible to T0.

Policy direction: Governance should shift from pure "access denial" toward comprehensive approaches including device-level safeguards, detection capabilities, and attribution infrastructure.

Pathogen Categories and AI Risk

Different pathogen types face different AI-related risks:

Bacteria (e.g., anthrax, plague):

  • Genomes relatively small, easier to synthesize
  • Some strains available in environment
  • Cultivation possible with modest equipment
  • AI risk: Moderate to significant - can guide cultivation and enhancement

Viruses (e.g., influenza, coronaviruses):

  • Smaller genomes, synthesis increasingly feasible
  • Require host cells to replicate
  • Some can be recovered from synthetic genomes alone
  • AI risk: Significant - can guide rescue from synthetic genomes

Toxins (e.g., ricin, botulinum):

  • Defined chemical structures
  • Some synthetically accessible
  • No replication capability
  • AI risk: Moderate - synthesis guidance available, limited scaling

Gain-of-Function Considerations

AI could potentially assist with gain-of-function modifications:

Concerning capabilities:

  • Predicting mutations that increase transmissibility
  • Identifying immune evasion strategies
  • Optimizing pathogen stability
  • Suggesting virulence factor modifications

Limiting factors:

  • Wet lab validation still required
  • Many modifications reduce fitness
  • Biological systems are complex and unpredictable
  • Most AI predictions would fail in practice

Our assessment: AI gain-of-function guidance is a genuine concern but the gap between prediction and validation remains substantial. The risk increases as AI models improve and as AI-lab integration deepens.


6. Chemical Weapons: Procurement, Scaling, and Safety Barriers {#chemical-weapons}

Chemical Weapons and AI: A Middle Ground

Chemical weapons occupy an intermediate position in AI-related risk. The dominant constraints are physical and operational, not informational:

  • Procurement: Regulated precursors, monitored purchases, supply chain surveillance
  • Scaling: Industrial equipment requirements for quantities beyond small-scale harm
  • Safety: Synthesis is dangerous to the operator; errors are often fatal
  • AI contribution: Modest assistance with knowledge gaps, but physical barriers dominate

Actor tier relevance: Chemical weapons primarily concern T2-T3 actors (groups with resources and some expertise). T0-T1 actors face compounding barriers; T4 actors have existing capabilities.

Current Landscape

Established chemical weapons (nerve agents, blister agents):

  • Synthesis routes well-documented in scientific literature
  • Most effective agents require regulated precursors
  • Large-scale production requires industrial equipment
  • Detection of precursor purchases is a key control mechanism

AI's potential contribution:

| Task | AI Capability | Risk Level |
|------|---------------|------------|
| Identify synthesis routes | High - information in training data | Moderate - already accessible |
| Suggest precursor substitutions | Moderate - chemical reasoning improving | Significant - could evade controls |
| Optimize reaction conditions | High - well-suited to optimization | Moderate - improves success rates |
| Guide inexperienced synthesizers | Moderate - can provide instructions | Significant - bridges knowledge gap |
| Scale-up guidance | Moderate - engineering principles apply | Moderate - production scaling difficult |

The Precursor Substitution Threat

The most concerning AI capability for chemical weapons is precursor substitution:

How this works:

  • Regulated precursor lists target known synthesis routes
  • AI could identify unregulated chemicals with similar properties
  • Purchases could avoid triggering monitoring systems
  • Supply chain optimization could fragment orders across suppliers

Limitations:

  • Alternative routes often less efficient
  • Substitutions may introduce impurities
  • Some precursors are uniquely suited (no good alternatives)
  • Large-scale production still requires infrastructure
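
The defender-side mirror of order fragmentation is cross-supplier correlation. A minimal sketch of the idea follows, assuming a shared reporting channel across suppliers - a significant assumption, since no universal channel exists today, and with purely illustrative thresholds:

```python
from collections import defaultdict

def correlate_orders(orders):
    """Group purchase reports by buyer and flag buyers whose combined
    orders span several distinct watchlist chemicals across multiple
    suppliers, even if each individual order looked innocuous.
    `orders` is an iterable of (buyer_id, chemical, supplier) tuples
    from a hypothetical shared reporting channel."""
    by_buyer = defaultdict(set)
    for buyer, chemical, supplier in orders:
        by_buyer[buyer].add((chemical, supplier))
    return {
        buyer: items for buyer, items in by_buyer.items()
        if len({c for c, _ in items}) >= 3      # several watchlist chemicals
        and len({s for _, s in items}) >= 2     # spread across suppliers
    }
```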

Real-Time Synthesis Guidance

AI agents could provide "over-the-shoulder" guidance for synthesis:

Capabilities:

  • Interpret visual observations (color changes, precipitates)
  • Suggest adjustments based on conditions
  • Troubleshoot common problems
  • Guide safety procedures (ironically)

This is concerning because:

  • Reduces requirement for formal chemistry training
  • Provides adaptive feedback traditional instructions cannot
  • Available 24/7 without human oversight
  • Multiple attempts can refine technique

Limiting factors:

  • Chemical synthesis still dangerous without proper training
  • Failure modes can be fatal (explosions, toxic exposure)
  • Equipment requirements remain
  • Scale-up from laboratory to weapon quantities is a distinct challenge

7. Nuclear Weapons: Physical Barriers and Information Aggregation {#nuclear-weapons}

Nuclear Weapons: The Strongest Physical Barriers

Nuclear weapons remain the category with the strongest barriers to AI-enabled proliferation:

Why nuclear is different:

  1. Fissile material scarcity: No AI can synthesize highly enriched uranium or plutonium
  2. Industrial requirements: Enrichment requires either massive facilities (gaseous diffusion) or sophisticated equipment (centrifuges)
  3. Detection: Nuclear materials are detectable; facilities have distinctive signatures
  4. International monitoring: IAEA safeguards, export controls on dual-use equipment
  5. Tacit knowledge requirements: Weapon design involves engineering challenges AI cannot fully bridge

AI's Limited but Non-Zero Contribution

Critical clarification: The information aggregation risk discussed below is primarily relevant to state programs (T4) or state-backed actors seeking to accelerate nuclear development. AI does not make nuclear weapons accessible to non-state actors - the fissile material barrier is absolute and AI-independent.

Information aggregation risk (state-level concern):

  • Historical weapon designs exist in fragmented declassified documents
  • Early designs (gun-type, implosion) are relatively simple in principle
  • AI could compile scattered information into more coherent guidance for aspiring state programs
  • "Forgotten" technical details from 1940s-1950s could be recovered

Actor tier relevance: Nuclear weapons remain a T4 (state) domain. AI may accelerate state programs but does not meaningfully enable T0-T3 actors. This is the lowest AI-related risk category among WMD types.

What AI can potentially provide:

| Information Type | Availability | AI Contribution |
|------------------|--------------|-----------------|
| Basic physics | Public knowledge | Minimal - widely known |
| Historical designs | Fragmented but public | Moderate - can aggregate |
| Engineering details | Partially classified | Limited - significant gaps |
| Critical dimensions | Classified | Cannot provide |
| Fissile material acquisition | Illegal markets exist | Cannot directly assist |

The Radiological Threat (Dirty Bombs)

Radiological dispersal devices ("dirty bombs") face different dynamics:

Lower barriers than nuclear weapons:

  • Radioactive materials more accessible (medical, industrial sources)
  • No fission/fusion required - conventional explosives disperse material
  • AI could assist with source identification and dispersal optimization

Significant limitations:

  • Casualty potential much lower than nuclear weapons
  • Primary effect is psychological and economic
  • Material handling dangerous to perpetrator
  • Detection of radioactive materials is possible

AI contribution: Could assist with identifying sources, optimizing dispersal, and planning deployment, but physical acquisition remains the key barrier.

Supply Chain Security

The most significant AI risk for nuclear proliferation may be supply chain compromise:

How AI could assist state programs:

  • Identifying dual-use equipment suppliers
  • Optimizing procurement to avoid detection
  • Designing facilities to minimize detection signatures
  • Analyzing IAEA inspection patterns

This is primarily a concern for state-level actors or state-supported groups rather than independent non-state actors.


8. Gene Drives: Long-Horizon Governance Gap {#gene-drives}

Section Framing: This section addresses a strategic, long-horizon concern rather than a near-term operational threat. Gene drives are included because:

  • AI specifically accelerates the computational aspects of gene drive design
  • No existing treaty framework addresses this vector
  • The governance window is open now but may close as capabilities mature

For near-term priorities, see Sections 5 (Biological), 10 (Cyber-Physical), and the Executive Summary.

What Are Gene Drives?

Gene drives are genetic engineering systems designed to spread a particular genetic modification through a population faster than traditional Mendelian inheritance. Unlike other WMD categories, gene drives represent an entirely novel threat vector that AI could uniquely enable.

How gene drives work:

  • Traditional inheritance: 50% chance of passing gene to offspring
  • Gene drive: Near-100% inheritance rate through copying mechanism
  • Effect: Genetic modification spreads through entire population over generations
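
The inheritance difference can be made concrete with a toy deterministic model of allele frequency - standard population-genetics arithmetic, simplified to ignore the fitness costs and resistance evolution that slow or defeat real drives:

```python
def drive_frequency(generations: int, p0: float = 0.01,
                    conversion: float = 0.95) -> list[float]:
    """Toy model of allele frequency under a homing gene drive.
    In heterozygotes, the drive converts the wild-type allele with
    probability `conversion`; setting conversion=0.0 recovers ordinary
    Mendelian inheritance, where frequency stays flat absent selection.
    Deliberately ignores fitness costs and resistance evolution."""
    p, trajectory = p0, [p0]
    for _ in range(generations):
        # Standard result: p' = p + c * p * (1 - p), since conversion
        # acts on the wild-type alleles in the 2p(1-p) heterozygotes.
        p = min(p + conversion * p * (1 - p), 1.0)
        trajectory.append(p)
    return trajectory

# Illustrative: from a 1% seed, a 95%-efficient drive passes 50%
# frequency in about seven generations; a Mendelian allele stays at 1%.
print(drive_frequency(7)[-1])  # ~0.65
```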

Gene Drives as Potential Weapons

Theoretical applications (not endorsing, analyzing):

| Target | Mechanism | Concern Level |
|--------|-----------|---------------|
| Agricultural crops | Introduce susceptibility to pathogens | High - food security |
| Livestock | Reduce fertility or introduce disease | High - economic warfare |
| Disease vectors (mosquitoes) | Could be weaponized after legitimate development | Medium - dual-use |
| Invasive species | Legitimate use; could be misdirected | Low - limited harm potential |
| Human populations | Theoretically possible; practically extremely difficult | Speculative - major barriers |

AI's Role in Gene Drive Development

Gene drives represent an area where AI capabilities directly intersect with technical development. Understanding the computational bottlenecks helps defenders identify where AI provides the most significant acceleration.

Computational bottlenecks where AI provides acceleration:

| Bottleneck | Traditional Approach | AI Contribution | Defender Monitoring Priority |
|------------|----------------------|-----------------|------------------------------|
| Guide RNA design | Manual selection, trial and error | Off-target prediction, efficiency optimization | Track guide RNA design tool development |
| Drive efficiency prediction | Laboratory testing (slow, expensive) | In silico modeling of drive dynamics | Monitor AI benchmarks on gene drive prediction |
| Resistance evolution modeling | Population genetics simulations | Accelerated evolutionary modeling | Track AI capabilities in evolutionary prediction |
| Ecological impact assessment | Field trials (regulated, slow) | Multi-species interaction modeling | Monitor ecological modeling AI development |
| Target species selection | Expert knowledge, literature review | Systematic analysis of target vulnerabilities | Watch for AI tools targeting specific organisms |

What this means for defenders:

  • Gene drive design is computationally intensive - exactly where AI excels
  • AI acceleration primarily affects design and optimization phases
  • Wet lab validation remains required - a detection opportunity
  • Environmental release is the chokepoint where intervention is most feasible

AI can assist with:

  • Designing guide RNAs for CRISPR-based drives with reduced off-target effects
  • Predicting drive efficiency and spread dynamics across populations
  • Modeling population-level and ecosystem effects
  • Optimizing drive components for stability and inheritance rate

Barriers remain:

  • Ecological effects remain difficult to predict accurately (complex systems)
  • Resistance evolution is likely and may defeat drive mechanisms
  • Requires release into environment - a potential detection opportunity
  • Long timescales reduce tactical utility for most actor types

Why Gene Drives Warrant Special Attention

  1. No existing treaty framework: Unlike biological, chemical, or nuclear weapons, no international agreement specifically addresses gene drives
  2. Dual-use research is active: Legitimate research (malaria control) develops capabilities applicable to weapons
  3. AI uniquely positioned: Gene drive design is computationally intensive - exactly where AI excels
  4. Difficult to attribute: Once released, tracing origin becomes extremely difficult
  5. Potentially irreversible: Unlike other weapons, effects on ecosystems may be permanent

Critical Nuance: Timescale Mismatch

Gene drives are not tactical weapons. Unlike other WMD categories, they operate on generational timescales:

  • Effects manifest over months to years, not hours to days
  • Population-level impact requires multiple breeding cycles
  • This limits tactical utility for most actor types
  • However, strategic economic or ecological warfare remains a concern

Actor tier relevance: Gene drives primarily concern sophisticated T3-T4 actors with long-term strategic objectives. T0-T2 actors seeking immediate impact would not benefit from this vector.

Governance Anchors

While no specific gene drive treaty exists, governance can build on existing frameworks:

| Existing Framework | Applicability |
|--------------------|---------------|
| Cartagena Protocol (biosafety) | Environmental release of modified organisms |
| Nagoya Protocol | Access and benefit-sharing for genetic resources |
| Environmental release regulations | National frameworks for GMO releases |
| BWC | If designed to harm human health or agriculture |
| Research ethics frameworks | Institutional review for dual-use research |

Policy direction: Gene drive governance need not start from zero. Extending and strengthening existing environmental and biosafety frameworks is a viable near-term approach.

Current Status and Near-Term Projection

Current (2025):

  • Gene drive research ongoing for public health applications
  • No known weaponization attempts
  • Regulatory frameworks underdeveloped
  • AI tools increasingly integrated into design process

2026-2028 projection:

  • Capabilities mature through legitimate research
  • Regulatory discussions intensify
  • First environmental releases (controlled, legitimate)
  • Potential for "garage biology" access as tools proliferate

2029-2030 projection:

  • Technology potentially accessible to sophisticated non-state actors
  • Attribution challenges become acute
  • International governance discussions likely but may lag capability

9. Deployment Vectors: Aerosol Systems and Autonomous Delivery {#deployment-vectors}

Why Deployment Matters

Even crude biological or chemical agents can cause significant harm with effective delivery. AI agents may contribute to deployment capabilities independent of weapon synthesis:

Key insight for defenders: Delivery and dispersion mechanics can dominate impact; even crude agents may cause significant harm with optimized delivery. AI may improve planning and targeting - defenders should prioritize detection and response capabilities.

Delivery Mechanism Defense Considerations

This section summarizes defensive priorities without detailing specific attack methodologies.

Why delivery matters for defense: Historical analysis shows that delivery is a common point of failure at which attacks degrade. Defensive resources focused on delivery detection and disruption can be effective even when agent synthesis cannot be prevented.

Defensive architecture by vector:

| Vector Category | Detection Opportunity | Response Window | Defensive Priority |
| --- | --- | --- | --- |
| Aerosol systems | Equipment anomalies, environmental sensors | Minutes to hours | Environmental monitoring, rapid response |
| Autonomous platforms | RF signatures, visual detection, geofencing | Varies by platform | Counter-drone systems, access control |
| Fixed infrastructure | Process monitoring, quality control | Ongoing | SCADA security, redundant controls |
| Supply chain | Procurement patterns, custody tracking | Days to weeks | Chain of custody, testing protocols |

Technical barriers that persist:

  • Effective dispersion remains technically challenging
  • Many agents degrade rapidly under environmental conditions
  • Testing without exposure is difficult (limits iteration)
  • Detection technology is improving

Actor tier relevance: Sophisticated delivery primarily benefits T2-T3 actors who have agent access but lack state-level delivery infrastructure. T0-T1 actors face compounding barriers at both synthesis and delivery stages.

Autonomous System Security

The proliferation of autonomous robots in public spaces creates an expanding attack surface that requires proactive defense:

Defensive considerations:

  • Robots increasingly have legitimate access to spaces (delivery, cleaning, security)
  • Compromised or custom platforms represent potential vectors
  • Detection systems should account for authorized autonomous presence
  • Payload limitations and conspicuousness currently constrain threat

Defender priorities:

  • Geofencing and access control for sensitive areas
  • Behavioral anomaly detection for autonomous systems
  • Supply chain security for commercial robot platforms
  • Counter-autonomous system capabilities for high-value locations

Trend to monitor: As autonomous robots become ubiquitous, they become less conspicuous. Defensive frameworks should anticipate this evolution.

Infrastructure Protection

High-value infrastructure requires layered defense against delivery-focused attacks:

Defensive layers:

  1. Physical access control: Limiting approach to sensitive areas
  2. Environmental monitoring: Air quality, contamination detection
  3. HVAC security: Filtration, access control, monitoring
  4. Rapid response protocols: Evacuation, containment, medical response
  5. Resilience and redundancy: Backup systems, alternative facilities

Investment priority: Environmental detection and rapid response capabilities are high-value defensive investments that work across multiple threat types.


10. The Cyber-Physical Convergence {#cyber-physical}

Beyond Synthesis: Attacking Existing Infrastructure

The preceding sections focus on AI assistance for creating WMD. A distinct and underappreciated threat vector involves AI agents targeting the control systems of facilities that already contain dangerous materials.

The key insight: An AI agent doesn't need to help someone build a bioweapon if it can help them disable the containment systems of an existing BSL-4 laboratory.

Stuxnet as Precedent

The Stuxnet attack (2010) demonstrated that code can cause physical destruction:

  • Malware targeted industrial control systems (ICS)
  • Caused centrifuges to physically destroy themselves
  • Achieved effects normally requiring military action
  • Attribution remained ambiguous for years

AI agents could enable similar attacks at lower skill thresholds.

Vulnerable Infrastructure Categories

| Infrastructure Type | Contents/Risk | Control System Vulnerability |
| --- | --- | --- |
| BSL-3/4 Laboratories | Dangerous pathogens | HVAC, negative pressure systems |
| Chemical Plants | Toxic chemicals | Process controls, safety interlocks |
| Nuclear Facilities | Radioactive materials | Cooling systems, containment |
| Water Treatment | Chemicals, public health | Dosing systems, quality controls |
| Pharmaceutical Manufacturing | Precursor chemicals | Process controls |

How AI Agents Enable ICS Attacks

Reconnaissance:

  • Identifying target facilities from public records
  • Mapping control system architectures from procurement data
  • Analyzing vulnerability disclosures

Exploitation Development:

  • Synthesizing attack approaches from security research
  • Adapting known exploits to specific targets
  • Optimizing attack timing and sequences

Operational Planning:

  • Coordinating cyber and physical elements
  • Identifying optimal attack windows
  • Planning for detection evasion

Attack Scenarios

Scenario: BSL-4 Containment Failure

  • AI agent identifies laboratory control systems
  • Develops approach to disable negative pressure
  • Coordinates with physical access (insider or break-in)
  • Containment failure releases stored pathogens
  • No synthesis required - existing materials weaponized

Scenario: Chemical Plant Sabotage

  • AI agent maps chemical plant process controls
  • Identifies conditions that would cause toxic release
  • Develops attack causing "accidental" disaster
  • Bhopal-scale casualties from industrial sabotage
  • Attribution as accident vs. attack is ambiguous

Defensive Implications

This vector suggests several defensive priorities:

  1. Air-gap critical controls: Isolation from network-accessible systems
  2. Enhanced ICS security: Hardening beyond current standards
  3. Facility monitoring: Detecting reconnaissance and probing
  4. Incident attribution: Distinguishing accidents from attacks
  5. Redundant safety systems: Mechanical backups for digital controls

Governance Gaps in Proliferation Financing

Agentic AI workflows could assist WMD proliferation through sophisticated financial operations. This section analyzes governance gaps that financial monitoring systems should address.

Why financial monitoring matters for defenders:

  • Precursor acquisition requires funding and transactions
  • Current monitoring relies on pattern recognition
  • AI-assisted operations could systematically evade current detection thresholds

Governance gaps in current financial monitoring:

| Gap | Current State | What AI Enables | Defender Priority |
| --- | --- | --- | --- |
| Shell company opacity | Beneficial ownership registries incomplete | Automated creation/management of layered entities | International registry harmonization |
| Threshold fragmentation | Reporting triggers at fixed amounts | Systematic structuring below thresholds | Behavioral pattern analysis beyond transaction size |
| Cryptocurrency mixing | Limited tracing capability | Automated chain-hopping across currencies | Cross-chain analytics investment |
| Cross-border jurisdiction gaps | Inconsistent AML enforcement | Routing through weakest-link jurisdictions | International coordination mechanisms |
| Dual-use ambiguity | Legitimate vs. illicit use hard to distinguish | Optimized procurement narratives | End-user verification strengthening |

What financial intelligence should monitor:

  1. Procurement pattern anomalies: Unusual combinations of precursors, equipment, expertise acquisition
  2. Entity creation velocity: Rapid shell company formation correlated with regulated purchases
  3. Geographic arbitrage signals: Shifting activity to exploit regulatory gaps
  4. Funding source obfuscation: Complex transaction chains designed to obscure origin
  5. Threshold-adjacent transactions: Systematic activity just below reporting limits

Institutional framework references:

  • 2024 US Treasury National Proliferation Financing Risk Assessment
  • FATF Recommendations on Proliferation Financing
  • UN Security Council Resolution 1540 implementation guidance

Policy direction for defenders: Financial monitoring should evolve from rule-based detection (fixed thresholds) to AI-assisted behavioral analysis that can identify sophisticated evasion patterns. This requires:

  • Investment in financial intelligence AI capabilities
  • International data-sharing agreements
  • Coordination between financial and biosecurity monitoring

AI-Enabled Procurement Obfuscation ("Smurfing at Scale")

The specific threat [E]: Historically, the hardest part of WMD acquisition has been procurement without triggering law enforcement. AI agents with computer use capabilities fundamentally change the economics of evasion:

| Traditional Procurement | AI-Enabled Procurement |
| --- | --- |
| Single large order triggers alerts | Thousands of sub-threshold orders from different entities |
| Human labor limits coordination | Agents manage unlimited parallel operations |
| Paper trails connect purchases | Synthesized identities and automated KYC fraud |
| Logistics require human coordination | Gig-economy couriers, dead-drop coordination |
| Supplier relationships take time | Automated vendor discovery and relationship management |

How agents enable this [E]:

  • Identity synthesis: Generate plausible business identities with consistent online presence
  • Threshold awareness: Automatically structure orders below reporting limits across jurisdictions
  • Supplier diversification: Identify and manage relationships with dozens of suppliers simultaneously
  • Logistics automation: Coordinate delivery to multiple intermediate locations using on-demand services
  • Timeline compression: What would take a human months takes an agent hours

Defender countermeasures:

  1. Cross-supplier correlation: Detect when multiple "independent" buyers order complementary materials (a minimal sketch follows this list)
  2. Velocity anomalies: Flag rapid entity creation correlated with regulated purchases
  3. Behavioral biometrics: Identify automated vs. human interaction patterns with ordering systems
  4. Supplier consortium: Shared threat intelligence across chemical/biological supply chains

Actor tier relevance: This capability particularly benefits T2-T3 actors who previously lacked the human resources for sophisticated operational security.


11. Counterarguments and Structural Barriers {#counterarguments}

Note on Grounding: The RAND Corporation's 2024 report "The Operational Risks of AI in Large-Scale Biological Attacks" argues that current AI risk levels remain relatively low due to persistent physical and tacit knowledge barriers. This section engages seriously with such counterarguments to maintain analytical balance.

The Tacit Knowledge Argument

Argument: Much of WMD development requires tacit knowledge - skills learned through practice that cannot be fully conveyed through text or instruction. AI agents operate in the symbolic/linguistic domain and cannot transfer tacit knowledge.

Supporting evidence:

  • Aum Shinrikyo had extensive theoretical knowledge but failed at biological weapons
  • Chemistry synthesis requires sensory skills (recognizing correct colors, textures)
  • Nuclear weapon engineering involves hands-on calibration and testing
  • Laboratory work involves countless micro-decisions based on experience

Our assessment: This is a valid and important counterargument. Tacit knowledge remains a significant barrier, particularly for nuclear and sophisticated biological weapons. However:

  1. AI-guided real-time instruction can partially bridge this gap
  2. Cloud laboratories embody tacit knowledge in automated protocols
  3. Some attack pathways (crude agents, toxins) require less tacit knowledge
  4. Repeated AI-assisted attempts can develop tacit knowledge over time

Conclusion: Tacit knowledge is a barrier but not an absolute one, and it is eroding.

The Physical Bottleneck Argument

Argument: AI cannot download hardware. Physical materials and equipment remain controlled, regulated, and scarce. No amount of AI assistance helps if you cannot obtain the materials.

Evidence by category:

| Category | Physical Bottleneck | Strength |
| --- | --- | --- |
| Nuclear | Fissile material | Very strong |
| Chemical | Regulated precursors | Moderate |
| Biological | Pathogen access, equipment | Weakening |
| Radiological | Radioactive sources | Moderate |

Our assessment: Valid for nuclear weapons. Partially valid for chemical. Increasingly weak for biological as synthesis services expand.

The Data Scarcity Argument

Argument: AI models are trained on internet data. Functional WMD synthesis procedures are not widely published. Models often hallucinate plausible-sounding but incorrect procedures that would fail or harm the operator.

Evidence:

  • Published synthesis routes often omit critical details
  • Safety procedures essential to successful synthesis are often implicit
  • Much weapons-relevant information is classified or restricted
  • AI models demonstrate chemistry errors in evaluations

Our assessment: Partially valid. However:

  • More information is available than is commonly assumed
  • AI can aggregate fragmented information to reconstruct procedures
  • Model capabilities are improving rapidly
  • Hallucination rates are decreasing with better models

Conclusion: Data scarcity is a barrier but not as robust as often assumed.

The Operational Security Argument

Argument: Serious WMD attempts require extended preparation that creates detection opportunities. Acquiring materials, testing, and deployment all generate signals. This remains true regardless of AI assistance.

Evidence:

  • Materials purchases can be monitored
  • Laboratory activities may be detected
  • Testing creates observable signatures
  • Deployment requires physical presence

Our assessment: This is largely valid and underappreciated. Defensive capabilities can focus on operational signatures rather than trying to restrict information. AI may actually help defense by identifying suspicious patterns.

The Failure Cascade Argument

Argument: WMD development involves multiple steps, each with failure probability. Even if AI improves each step, the compound probability of overall success may remain low.

Illustration (hypothetical numbers for concept; a worked calculation follows this list):

  • Step 1 (agent selection): 80% success with AI assistance
  • Step 2 (synthesis): 50% success with AI assistance
  • Step 3 (weaponization): 30% success with AI assistance
  • Step 4 (delivery): 60% success with AI assistance
  • Compound probability: 7.2%
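
A worked version of this illustration, as a minimal Python sketch. The step probabilities are the hypothetical numbers above, and treating repeated attempts as independent is a simplification:

```python
# Compound success probability across sequential steps, then the effect of
# iteration. Step values are the hypothetical numbers above, not estimates.
steps = {
    "agent selection": 0.80,
    "synthesis": 0.50,
    "weaponization": 0.30,
    "delivery": 0.60,
}

single_attempt = 1.0
for step, p in steps.items():
    single_attempt *= p
print(f"single attempt: {single_attempt:.1%}")  # 7.2%

# Why persistence matters: P(at least one success) over repeated attempts,
# naively treated as independent.
for attempts in (1, 3, 10):
    p_any = 1 - (1 - single_attempt) ** attempts
    print(f"{attempts} attempts: {p_any:.1%}")  # 7.2%, ~20%, ~53%
```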

Our assessment: Valid framework. However:

  • Persistent actors can iterate and improve
  • Some pathways involve fewer steps
  • Crude attacks with lower success rates may still be attempted
  • Even failed attempts can cause harm (accidents, psychological impact)

The Over-Screening Cost Argument (False Positive Perspective)

Argument: If AI-driven paranoia leads to excessive screening and restrictions, we may cause more harm than we prevent by stifling legitimate research—including the research needed to respond to natural pandemics.

Evidence of costs:

  • Post-2001 anthrax regulations significantly slowed legitimate biodefense research
  • Dual-use restrictions have delayed vaccine development timelines
  • Overly broad export controls can push research to less regulated jurisdictions
  • Scientific talent may avoid biosecurity-adjacent fields due to compliance burden

Quantifying the tradeoff:

| Over-Screening Risk | Under-Screening Risk |
| --- | --- |
| Delayed pandemic response capability | Enabled WMD attempt |
| Reduced scientific competitiveness | Attribution challenges |
| Research migration to less regulated regions | Psychological/economic damage |
| Chilling effect on beneficial dual-use research | Potential mass casualties |

Our assessment: This is a serious concern that should constrain policy enthusiasm. The goal is calibrated security, not maximum restriction. Recommendations in this report should be evaluated against their research-stifling potential.

Policy implication: Any screening or restriction regime should include:

  • Clear appeal mechanisms
  • Regular calibration reviews
  • Exemptions for established research institutions
  • Sunset provisions requiring reauthorization

The Asymmetric Defense Argument (AI Favors Defenders)

Argument: AI may actually favor defenders more than attackers. The same capabilities that could assist WMD development can dramatically accelerate defensive countermeasures.

Defensive AI advantages:

| Capability | Offensive Application | Defensive Application |
| --- | --- | --- |
| Rapid sequence analysis | Pathogen design | Real-time detection of novel threats |
| Protein structure prediction | Virulence optimization | Vaccine/therapeutic design in days not years |
| Pattern recognition | Evasion planning | Anomaly detection in procurement, lab activity |
| Literature synthesis | Attack planning | Threat anticipation, countermeasure identification |
| Simulation/modeling | Dispersal optimization | Response planning, containment modeling |

The "Bio-Firewall" concept: Advanced AI systems could theoretically:

  • Sequence a novel pathogen within hours of detection
  • Design candidate therapeutics within days
  • Optimize manufacturing protocols in parallel
  • Guide rapid clinical trials with AI-assisted analysis

Historical precedent: COVID-19 vaccine development (under 1 year vs. typical 10+ years) demonstrated that with sufficient resources and urgency, development timelines can compress dramatically. AI acceleration could push this further.

Our assessment: This is a valid and important counter-narrative. However:

  • Defensive capabilities require investment to realize
  • Attackers choose timing; defenders must be ready continuously
  • A single successful attack could cause damage before defenses activate
  • The argument supports investing in defensive AI, not complacency

Policy implication: Defensive AI capabilities should receive funding priority at least equal to restriction/monitoring efforts.

Critique of the Unilateralist's Curse Framework

The Unilateralist's Curse (Bostrom & Ord) argues that when many actors can independently take an irreversible action, even if most would refrain, the probability of someone acting approaches certainty.

Potential overreach of this framework:

  1. Assumes homogeneous capability: Not all actors who "want to" can actually execute. The curse applies most strongly when capability is uniform—but WMD capability remains highly non-uniform.

  2. Ignores coordination mechanisms: The framework assumes purely independent decision-making. In reality, extremist communities have internal norms, and state sponsors exercise control over proxies.

  3. May induce fatalism: If misuse is "inevitable," policymakers may:

    • Overinvest in restriction vs. resilience
    • Underinvest in detection and response
    • Adopt maximally restrictive policies regardless of cost
  4. Alternative framing - "The Long Fuse": Instead of "inevitable misuse," consider that barriers create delay. Each year of delay allows:

    • Defensive technology to advance
    • Governance frameworks to mature
    • Attribution capabilities to improve
    • Social norms against misuse to strengthen

Our assessment: The Unilateralist's Curse is a useful heuristic but should not induce fatalism. The appropriate response is buying time through calibrated barriers while investing in resilience and response capabilities—not assuming catastrophe is inevitable.


12. The Attribution Problem {#attribution-problem}

Why Attribution Matters

Attribution - determining who is responsible for an attack - serves critical functions:

  1. Enables retaliation and deterrence
  2. Provides basis for legal accountability
  3. Informs public understanding and policy response
  4. Prevents misattribution and escalation

AI agents complicate attribution across all WMD categories.

How AI Complicates Attribution

Digital footprint reduction:

  • AI agents can plan without human co-conspirators
  • Communications limited to human-AI interactions
  • No organizational structure to penetrate
  • Planning can occur on personal devices without network traffic

Physical evidence challenges:

  • Biological agents may not indicate origin point
  • Chemical precursor sources may be obscured
  • Multiple delivery methods prevent signature analysis
  • Gene drives become untraceable after release

False flag potential:

  • AI can generate misleading evidence
  • Forensic-quality fabrications possible
  • Attribution to rival actors could provoke conflict
  • Uncertainty paralyzes response

Category-Specific Attribution Challenges

| Category | Traditional Attribution Method | AI-Era Challenge |
| --- | --- | --- |
| Nuclear | Isotopic signatures; intelligence | Material signatures remain; planning harder to track |
| Chemical | Precursor tracing; synthesis signatures | Alternative routes obscure sourcing |
| Biological | Genetic analysis; strain matching | Synthetic or modified strains lack natural history |
| Gene drive | Ongoing research | Origin essentially untraceable after establishment |

Attribution Matrix: Traditional vs. AI-Era Forensics

For defenders and investigators: Understanding how AI changes the attribution landscape.

| Evidence Type | Traditional Approach | AI-Era Signatures | Defender Investment |
| --- | --- | --- | --- |
| Physical materials | Isotope ratios, impurity profiles, manufacturing markers | Remains relevant for nuclear/radiological; less useful for synthetic biology | Maintain existing forensic capabilities |
| Genetic sequences | Strain matching to known repositories, phylogenetic analysis | Synthetic sequences may lack natural evolutionary history; designed variants may be novel | Develop synthetic biology forensics; database of designed sequences |
| Precursor tracing | Purchase records, chemical signatures | Alternative synthesis routes; fragmented procurement | AI-assisted pattern analysis across transactions |
| Communication intercepts | Organizational communications, planning documents | Human-AI interactions; local computation; minimal network traffic | Endpoint monitoring; behavioral analysis |
| Human intelligence | Infiltration, informants, defectors | Smaller networks; less human coordination needed | Maintain HUMINT despite reduced target richness |
| Digital forensics | Device analysis, network logs, browser history | AI query logs; model interactions; prompt history | Develop AI-specific forensic capabilities |
| Financial trails | Bank records, transaction patterns | Cryptocurrency; shell companies; threshold evasion | Blockchain analysis; behavioral pattern detection |

New AI-era attribution opportunities:

  1. AI query analysis: Patterns in how AI systems are queried may indicate intent
  2. Compute fingerprinting: High-capability model use may leave compute signatures
  3. Synthetic biology signatures: Designed sequences may have identifiable "authorship" patterns
  4. Procurement pattern analysis: AI-assisted detection of unusual material acquisition
  5. Behavioral biometrics: Interaction patterns with AI systems may be identifiable

Investment priorities for attribution capability:

  • AI forensics training for investigators
  • International sharing agreements for AI-relevant evidence
  • Research into synthetic biology authorship attribution
  • Integration of financial and biosecurity intelligence

Geopolitical Implications

The attribution void has severe geopolitical implications:

Scenario: A biological attack occurs. Intelligence cannot determine whether the perpetrator was:

  • A lone actor with AI assistance
  • A non-state extremist group
  • A state actor using deniable means
  • A false flag by a third party

Consequences:

  • Retaliation against the wrong party risks escalation
  • No retaliation emboldens future attackers
  • Public pressure for action conflicts with evidentiary requirements
  • Alliance commitments become difficult to invoke

Defensive Implications

Attribution challenges suggest defensive strategy shifts:

  1. Prevention over punishment: Cannot rely on deterrence through retaliation
  2. Resilience over defense: Assume some attacks will succeed; focus on limiting damage
  3. Detection over access control: Monitor for activity patterns rather than restricting information
  4. International cooperation: Attribution often requires shared intelligence
  5. Pre-incident intelligence capacity: Invest in human intelligence, signals intelligence, and international investigative partnerships that can develop leads before attacks occur - not just forensic analysis after

Organizational priority: International investigative capacity is an organizational investment, not primarily a technical one. Treaty-level agreements on information sharing, joint investigation protocols, and mutual legal assistance are as important as forensic technology.


13. International Variance {#international-variance}

Regulatory Landscape

WMD-related AI risks vary significantly across jurisdictions:

Restrictive jurisdictions (US, EU, UK, Australia):

  • Frontier AI models have usage restrictions
  • Biosecurity regulations cover synthesis services
  • Export controls on dual-use equipment
  • Institutional review requirements

Permissive jurisdictions:

  • Less restricted AI model availability
  • Limited biosecurity oversight
  • Weaker export controls
  • "Data havens" for unrestricted AI services

Regulatory Arbitrage and Global South Considerations

A critical dynamic: security measures in restrictive jurisdictions can be circumvented by operating from permissive ones. This "regulatory arbitrage" may render Western guardrails partially moot.

The arbitrage pathway:

| Resource | Restrictive Jurisdiction | Arbitrage Opportunity |
| --- | --- | --- |
| AI model access | Closed API with monitoring | Open-weight hosting in unregulated jurisdiction |
| DNA synthesis | IGSC screening required | Non-IGSC providers elsewhere |
| Cloud laboratory | Institutional oversight | Commercial services with minimal verification |
| Compute rental | KYC requirements | Anonymous cryptocurrency payment options |
| Research collaboration | Institutional ethics review | Informal networks bypassing oversight |

Global South specific considerations:

  1. Capacity vs. governance mismatch: Some regions are developing synthetic biology capacity faster than biosecurity governance frameworks
  2. Brain drain inversion: AI enables remote collaboration, potentially routing expertise to less-regulated contexts
  3. Economic incentives: Commercial DNA synthesis and cloud lab services may prioritize revenue over screening rigor
  4. Dual-use development framing: Legitimate agricultural or public health programs may provide cover for concerning activities
  5. Sovereignty sensitivities: International oversight proposals may face resistance as neo-colonial imposition

Why Western guardrails may be insufficient:

  • Actors can access AI services via VPN to unrestricted jurisdictions
  • DNA synthesis orders can be routed through intermediaries
  • Financial transactions can use unregulated cryptocurrency infrastructure
  • Enforcement requires international cooperation that may not exist

What this means for policy:

  • Unilateral restrictions have limited effectiveness
  • Capacity building and norm promotion may be more effective than prohibition
  • Detection and response capabilities matter more than access denial
  • International coordination is essential but difficult

Implications:

  • Domestic regulations have limited effect without international coordination
  • "Jurisdiction shopping" enables capability acquisition
  • Defensive strategies should assume some barrier circumvention will occur

State Actor Considerations

For state-level proliferation, AI offers different dynamics:

State programs may benefit from AI:

  • Faster weapon development timelines
  • Reduced personnel requirements (operational security)
  • Novel agent development acceleration
  • Supply chain optimization to evade detection

This affects:

  • Emerging nuclear programs
  • Reconstituted bioweapons programs
  • Chemical weapons in conflict zones
  • Dual-use research that crosses lines

Treaty Implications

Existing arms control frameworks face new challenges:

Biological Weapons Convention (BWC):

  • Lacks verification mechanisms
  • AI-enabled development may be undetectable
  • Dual-use research complicates compliance assessment

Chemical Weapons Convention (CWC):

  • Precursor controls challenged by alternative routes
  • Verification depends on declared facilities
  • Novel agents may fall outside scheduled lists

Nuclear Non-Proliferation Treaty (NPT):

  • Physical barriers remain strong
  • AI assistance to aspiring states is a concern
  • Verification mechanisms relatively robust

No framework addresses:

  • Gene drives specifically
  • AI-enabled WMD development
  • Attribution in the AI era

14. Second-Order Effects {#second-order-effects}

The Fear Effect and Overreaction

The perception of AI-enabled WMD risk may cause harmful responses even without actual attacks:

Potential overreactions:

  • Excessive restrictions on legitimate research
  • Surveillance expansion beyond justified scope
  • Suppression of dual-use scientific publication
  • Chilling effects on beneficial synthetic biology

Historical parallel: The 2001 anthrax attacks caused:

  • $1 billion+ in cleanup costs
  • Disruption to mail systems
  • New biosecurity regulations
  • Psychological impact far exceeding casualties

A credible threat of AI-enabled bioweapons could trigger similar dynamics at larger scale.

Research Stifling

AI WMD concerns could lead to restrictions that harm beneficial research:

At risk:

  • Cancer research using synthetic biology tools
  • Pandemic preparedness research
  • Agricultural improvements through genetic engineering
  • Environmental applications of gene drives

The balance problem:

  • Same tools enable both beneficial and harmful applications
  • Restrictions that prevent misuse also prevent legitimate use
  • Risk tolerance calibration is contentious
  • International competition incentivizes continued research

Acceleration of State Programs

Paradoxically, fear of AI-enabled non-state threats could accelerate state WMD programs:

Logic chain:

  1. States perceive non-state WMD threat increasing
  2. States invest in WMD defense capabilities
  3. Defense capabilities overlap with offense
  4. Net effect: more WMD capability globally

Additionally:

  • States may cite AI risks to justify programs
  • Verification becomes more difficult
  • Arms control regimes may weaken

Public Health Infrastructure

WMD concerns affect public health systems:

Positive effects:

  • Investment in detection capabilities
  • Improved medical countermeasure development
  • Better surveillance systems

Negative effects:

  • Securitization of public health
  • Reduced information sharing
  • Distrust between health and security communities

15. Policy Recommendations by Stakeholder Type {#policy-recommendations}

For Policy Makers

| Priority | Action | Type | Implementation Mechanism | Key Challenge |
| --- | --- | --- | --- | --- |
| Critical | Mandate universal DNA synthesis screening | Unilateral / Coordination | Extend IGSC guidelines to law; require provider registration | Coverage gaps, cross-border substitution |
| Critical | International AI safety standards for WMD capabilities | Coordination | Treaty negotiation, export control coordination | Geopolitical competition, verification |
| Critical | Invest in attribution capabilities | Unilateral | HUMINT/SIGINT funding, forensic lab capacity | Long timelines |
| High | Cloud laboratory oversight | Unilateral / Coordination | Audit requirements, protocol classification, customer KYC | Privacy concerns, research friction |
| High | Update treaty frameworks | Coordination | BWC verification protocol, CWC schedule updates | Consensus challenges |
| High | Defensive biodetection research | Unilateral | BARDA/DARPA funding, academic partnerships | Technology maturation |
| Medium | Red team evaluation requirements | Unilateral | Pre-deployment safety standards, independent assessment | Defining thresholds |
| Medium | International attribution sharing | Coordination | Mutual legal assistance treaties, joint investigation | Sovereignty concerns |

Legend: Unilateral = Domestically implementable without international agreement | Coordination = Requires international coordination

Key insight for policymakers: The window for establishing governance frameworks is narrow. Once capabilities proliferate, restrictions become much harder to implement.

Advanced Defensive Measures (more speculative, significant challenges):

| Measure | Description | Challenge/Controversy |
| --- | --- | --- |
| KYC for Compute | Verification for large-scale computing rentals | Privacy, open research norms, threshold definition, appeal mechanisms |
| Honey-Pot Data Injection | Seed datasets with subtle errors in dangerous procedures | Scientific integrity concerns, collateral damage to legitimate research, ethical objections |
| Information Hazards Management | Restrict publication of AI red team failure modes | Research community pushback, definitional challenges, effectiveness uncertain |
Note: These advanced measures are presented for consideration, not endorsement. Each involves significant tradeoffs that require careful deliberation.

Implementation Feasibility Assessment

| Recommendation | Owner | Timeline | Cost Class | Friction Risk | Expected Risk Reduction |
| --- | --- | --- | --- | --- | --- |
| DNA synthesis screening mandate | National legislators + IGSC | 0-12 months (domestic) / 12-36 months (international) | Low-Medium | Medium (enforcement variation) | High - chokepoint control |
| International AI safety standards | Treaty bodies (UN, G7) | 36-60+ months | Medium | High (geopolitical) | Medium - assumes compliance |
| Attribution capability investment | Intelligence agencies | 12-36 months (initial) / ongoing | High | Low | Medium - deters some actors |
| Cloud laboratory oversight | Regulators + industry | 12-24 months | Low | Medium (research friction) | Medium-High - chokepoint |
| Treaty framework updates (BWC/CWC) | State parties | 36-60+ months | Low | Very High (consensus) | Low-Medium - verification weak |
| Defensive biodetection R&D | BARDA/DARPA/equivalents | 12-36 months (deployment) | High | Low | High - enables response |
| Red team evaluation requirements | AI regulators | 12-24 months | Medium | Medium (competitive) | Medium - depends on thresholds |

Minimal Viable Steps (12-month horizon):

  1. Expand IGSC membership and mandate sequence screening
  2. Establish cloud lab provider working group on security standards
  3. Fund initial biodetection deployment pilots
  4. Require WMD capability evaluations for frontier AI releases

For CEOs and Corporate Leadership

| Priority | Action | Rationale |
| --- | --- | --- |
| Critical | Implement robust screening in AI-biology interfaces | If you operate AI services used for biological research, you are on the front line |
| Critical | Red team AI products for WMD uplift | Understand what your systems enable; third-party evaluation preferred |
| High | Establish clear escalation procedures for concerning queries | Staff need guidance when dangerous requests are detected |
| High | Engage with policymakers on technical feasibility | Industry expertise needed for workable regulations |
| High | Invest in defensive applications | AI-enabled biosurveillance, detection, response |
| Medium | Evaluate supply chain security | Ensure your products/services aren't diverted to harmful purposes |
| Medium | Develop industry standards collaboratively | Self-regulation can preempt less informed government regulation |

Key insight for CEOs: The AI-biology interface is a major liability exposure. Companies operating in this space face both safety responsibilities and reputational risk.


For Tech Elite (AI Developers, Founders, Investors)

| Priority | Action | Rationale |
| --- | --- | --- |
| Critical | Evaluate models for WMD uplift before release | You cannot claim ignorance after deployment |
| Critical | Do not open-source models with significant uplift capabilities | Once released, cannot be recalled |
| Critical | Implement robust guardrails with ongoing monitoring | Initial safeguards degrade; adversarial adaptation is ongoing |
| High | Fund defensive biosecurity research | The same capabilities that enable offense can enable defense |
| High | Engage seriously with safety evaluations | Red team findings should inform development, not just PR |
| High | Participate in international governance discussions | Technical expertise essential for workable frameworks |
| Medium | Develop "know your customer" standards for API access | High-capability access should have accountability |
| Medium | Support attribution research | AI forensics benefit from AI expertise |

Key insight for tech elite: You are building dual-use capabilities. The ethical responsibility is substantial, and the historical legacy of these decisions will be judged harshly if preventable harm occurs.


For Laypeople (General Public)

| Priority | Action | Rationale |
| --- | --- | --- |
| High | Support evidence-based policy | Neither panic nor dismissal serves public interest |
| High | Understand info-hazard dynamics | Sharing jailbreaks or dangerous prompts, even casually, contributes to the problem |
| Medium | Engage with governance processes | Public input shapes policy; democratic accountability matters |
| Medium | Maintain perspective | Actual WMD attacks remain rare; psychological impact of threat may exceed actual risk |
| Medium | Support research freedom within appropriate bounds | Avoid reflexive restriction of beneficial science |
| Lower | Personal preparedness | Basic emergency preparedness serves multiple threats |

Key insight for laypeople: The most important role is as informed citizens. Governance decisions being made now will shape this landscape for decades. Engagement in democratic processes matters.


16. Uncertainties and Alternative Scenarios {#uncertainties}

Key Uncertainties

  1. AI capability trajectory: Development could be faster or slower than projected
  2. Defensive capability development: Detection and attribution may improve substantially
  3. Attack frequency: Capability does not automatically translate to attacks
  4. Governance effectiveness: International coordination is unpredictable
  5. Tacit knowledge erosion: Unclear how quickly AI-lab integration bridges this gap

Scenario Analysis

Probability Calibration Note

The probabilities below are subjective priors intended to support decision-making under uncertainty. They are not empirical estimates derived from statistical models.

Methodology: Informal expert elicitation drawing on:

  • Reference-class forecasting (historical rate of technology diffusion, non-state adoption of dangerous capabilities)
  • Decomposition model (capability access × intent × operational execution × detection avoidance)
  • Adjustment for AI-specific factors (barrier reduction analysis from preceding sections)

Interpretation guidance: Use for relative prioritization between scenarios, not as point predictions. Reasonable analysts could assign significantly different values. The conditional probability table (below) is intended to illustrate how governance choices shift the distribution.

Probability Decomposition Framework

Risk = Capability Access × Intent Prevalence × Operational Execution × (1 − Interdiction) × Impact Scale

The following illustrative decomposition shows how these factors combine. Values are rough ranges to demonstrate the model, not precise estimates.

Biological (Highest Risk Category)

| Factor | 2025 Estimate | 2030 Projection | AI Contribution |
| --- | --- | --- | --- |
| Capability Access (T2+ can attempt) | 5-10% of T2+ | 15-25% of T2+ | High - knowledge synthesis, protocol optimization |
| Intent Prevalence (among capable) | ~0.1-1% | ~0.1-1% | Low - AI doesn't create intent |
| Operational Execution (attempt→working agent) | 5-15% | 10-25% | Medium - guidance improves, tacit gap narrows |
| Interdiction Avoidance (evade detection) | 60-80% | 50-70% | Medium - AI assists OPSEC, but defenders also use AI |
| Impact Scale (casualties per success) | Wide range | Wide range | Low - physics/biology constrain |

Compound probability of mass-casualty bio attack by 2030: Illustrative calculation

  • (0.20 capable) × (0.005 intent) × (0.15 execution) × (0.65 evasion) ≈ 0.0001 per actor-year
  • With ~1000 actors in the T2+ pool annually × 5 years ≈ 5000 actor-years, this implies ~0.5 expected events at these midpoint values
  • Lower-end parameter choices yield cumulative probabilities in the single-digit percent range; the Scenario D estimate (8-12%) sits within this uncertainty span (worked in the sketch below)
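
The arithmetic can be made explicit with a short Python sketch. The parameter values are the illustrative assumptions from the tables above, and treating actor-years as independent is a simplification:

```python
# Minimal sketch of the decomposition arithmetic. All values are the
# illustrative ranges from the tables above - assumptions, not estimates.
def per_actor_year(capable, intent, execution, evasion):
    """Risk = Capability Access × Intent Prevalence × Execution × (1 − Interdiction)."""
    return capable * intent * execution * evasion

def cumulative(p_year, actors=1000, years=5):
    """P(at least one success) across an actor pool over a horizon,
    treating actor-years as independent (a simplification)."""
    return 1 - (1 - p_year) ** (actors * years)

midpoint = per_actor_year(0.20, 0.005, 0.15, 0.65)  # ≈ 1e-4 per actor-year
low_end = per_actor_year(0.15, 0.001, 0.10, 0.50)   # ≈ 7.5e-6 per actor-year

print(f"expected events, midpoint: {midpoint * 1000 * 5:.2f}")  # ≈ 0.5
print(f"cumulative, midpoint: {cumulative(midpoint):.0%}")      # ≈ 39%
print(f"cumulative, low end:  {cumulative(low_end):.0%}")       # ≈ 4%
```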

Chemical (Moderate Risk)

| Factor | 2025 Estimate | 2030 Projection | AI Contribution |
| --- | --- | --- | --- |
| Capability Access | 10-20% of T2+ | 15-30% of T2+ | Low - precursor controls dominate |
| Intent Prevalence | ~0.1-1% | ~0.1-1% | Low |
| Operational Execution | 10-20% | 15-25% | Medium - synthesis guidance |
| Interdiction Avoidance | 50-70% | 40-60% | Low - procurement monitoring improving |
| Impact Scale | Moderate | Moderate | Low |

Nuclear (Lowest AI-Related Risk)

| Factor | 2025 Estimate | 2030 Projection | AI Contribution |
| --- | --- | --- | --- |
| Capability Access | <1% of T3+ | <1% of T3+ | Very low - fissile material barrier |
| Intent Prevalence | ~0.1% | ~0.1% | None |
| Operational Execution | <5% | <5% | Low - engineering barriers dominate |
| Interdiction Avoidance | 20-40% | 20-40% | Low - material is detectable |
| Impact Scale | Catastrophic | Catastrophic | None |

Note: These decompositions are illustrative. Actual intelligence assessments would use classified threat data and more rigorous methodology. The purpose is to show how the model works, not to provide precise predictions.

Sensitivity Analysis: What Moves the Numbers?

To prevent false precision, this table shows how Scenario D (Mass Casualty Success) probability shifts under different assumptions:

| Parameter Varied | Low Assumption | Base Case | High Assumption | Scenario D Range |
| --- | --- | --- | --- | --- |
| Intent prevalence | 0.05% (rare) | 0.5% | 2% (elevated) | 3% → 8-12% → 25% |
| Execution success | 5% (barriers hold) | 15% | 30% (rapid erosion) | 4% → 8-12% → 20% |
| Interdiction rate | 50% (strong defense) | 35% | 20% (weak coordination) | 5% → 8-12% → 18% |
| Capable actor pool | 500 T2+ globally | 1000 | 2000 (AI lowers entry) | 4% → 8-12% → 22% |

Key insight: The estimate is most sensitive to intent prevalence and capable actor pool size - factors where AI's contribution is indirect (lowering barriers for those already motivated). If you believe AI will create new motivated actors (not just enable existing ones), shift estimates higher.

What would falsify this model:

  • AI-enabled attack by T0-T1 actor (would indicate barriers lower than assessed)
  • Successful interdiction of AI-assisted attempt (would indicate detection working)
  • Stable or declining synthesis screening intercepts (would indicate threat not materializing)

Scenario A: Effective Governance (10-15% probability)

Strong international coordination establishes:

  • Universal DNA synthesis screening
  • Cloud laboratory oversight
  • AI model restrictions for high-risk capabilities
  • Effective attribution mechanisms

Outcome: AI-related WMD risks remain theoretical; barriers remain largely intact.

Probability rationale: Given current geopolitical fragmentation (US-China tensions, EU-US divergence on AI regulation), globally coordinated effective governance is unlikely in the near term.

Scenario B: Muddling Through (35-40% probability)

Partial measures implemented:

  • Some synthesis screening improvements
  • Patchwork national regulations
  • Continued dual-use research
  • Occasional concerning incidents but no mass casualties

Outcome: Baseline risk increases moderately; several failed or limited attacks; gradual tightening of controls.

Scenario C: High-Frequency Attempts, Limited Success (25-30% probability)

Critical distinction: We now separate attempts from successful mass casualty attacks.

Many AI-assisted WMD attempts occur:

  • Mostly crude, partially successful, or failed
  • Few to dozens of casualties per incident
  • Significant psychological and political impact
  • Attribution challenges paralyze response

Outcome: "Noise floor" of WMD attempts increases dramatically; security resources strained; public fear elevated despite limited actual casualties.

Scenario D: Successful Mass Casualty Attack (8-12% probability)

A non-state actor successfully executes a WMD attack with significant AI assistance achieving mass casualties:

  • Likely biological given barrier analysis
  • Casualties in thousands+
  • Attribution difficult or impossible
  • Massive policy response

Outcome: Severe restrictions on AI and biological research; potential civil liberties overreach; damaged international cooperation.

Scenario E: State Program Acceleration (15% probability)

Multiple states use AI to accelerate WMD programs:

  • Faster nuclear proliferation
  • Reconstituted bioweapons programs
  • New chemical weapon development
  • Regional arms races

Outcome: Increased state-level WMD capabilities; weakened arms control; elevated global risk.

Scenario F: Catastrophic Attack (3-5% probability)

A sophisticated attack achieves civilization-scale casualties (tens of thousands to millions):

  • Engineered pandemic pathogen
  • Or: novel agent evading countermeasures
  • Global health emergency
  • Civilization-level disruption

Outcome: Fundamental restructuring of AI governance; potential technology restrictions; lasting global impact.

Conditional Probabilities

| Scenario | Given Strong Governance | Given Weak Governance |
| --- | --- | --- |
| A (Effective Governance) | 25% | 5% |
| B (Muddling Through) | 40% | 30% |
| C (High-Frequency Attempts) | 20% | 35% |
| D (Mass Casualty Success) | 5% | 15% |
| E (State Acceleration) | 8% | 20% |
| F (Catastrophic) | 2% | 8% |

Interpretation: Governance choices significantly affect outcome distribution. This supports prioritizing governance investment now.
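
As a consistency check, a short sketch shows how the conditional table combines with an assumed probability of achieving strong governance. The 40% figure is our illustrative assumption, not a value from this report:

```python
# Recover an unconditional scenario distribution from the conditional table.
P_STRONG = 0.40  # assumed probability of strong governance (illustrative)

conditional = {  # scenario: (P given strong, P given weak)
    "A Effective Governance": (0.25, 0.05),
    "B Muddling Through": (0.40, 0.30),
    "C High-Frequency Attempts": (0.20, 0.35),
    "D Mass Casualty Success": (0.05, 0.15),
    "E State Acceleration": (0.08, 0.20),
    "F Catastrophic": (0.02, 0.08),
}

for scenario, (p_strong, p_weak) in conditional.items():
    p = P_STRONG * p_strong + (1 - P_STRONG) * p_weak
    print(f"{scenario}: {p:.1%}")
# Output roughly reproduces the headline ranges (e.g., D ≈ 11% vs. 8-12%),
# suggesting those estimates implicitly assume ~40% odds of strong governance.
```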

Key insight from Scenario C: The "noise floor" of attempts may be the most likely outcome. Security services should prepare for resource strain from high-frequency low-sophistication incidents, not just rare catastrophic events.


17. Signals and Early Indicators {#signals}

Leading Indicators to Monitor

Capability Indicators

| Indicator | Data Sources | What It Signals |
| --- | --- | --- |
| AI model performance on biology benchmarks | Academic publications, model evaluations | Uplift capability maturation |
| Cloud laboratory service expansion | Industry announcements, market analysis | Attack surface growth |
| DNA synthesis price/capability curves | Industry data | Accessibility threshold changes |
| Gene drive research publications | Scientific literature | Dual-use capability development |
| AI-lab integration products | Commercial announcements | Tacit knowledge bridging |

Threat Activity Indicators

| Indicator | Data Sources | What It Signals |
| --- | --- | --- |
| Screening intercepts at synthesis providers | Law enforcement, industry disclosure | Attempted acquisition patterns |
| Dark web discussion of AI+WMD | Open source intelligence | Actor interest and capability claims |
| Concerning queries to AI systems | Platform reports, safety research | Demand signal for harmful information |
| Failed or thwarted attack attempts | Law enforcement, media | Threat translation from capability |

Governance Indicators

| Indicator | Data Sources | What It Signals |
| --- | --- | --- |
| International agreement progress | Treaty negotiations, diplomatic statements | Coordination capacity |
| National regulation development | Legislative tracking | Domestic control framework |
| Industry self-regulation | Corporate announcements, standards bodies | Private sector response |
| AI safety research investment | Funding announcements, publications | Defensive capability development |

Machine-Readable Indicators for Automated Monitoring

For security operations centers and automated threat intelligence systems: The following indicators can be operationalized for machine-readable monitoring.

API Query Pattern Indicators

| Pattern | Detection Method | Alert Threshold | False Positive Mitigation |
| --- | --- | --- | --- |
| Sequential queries on pathogen biology + synthesis + aerosolization | Query log analysis, semantic clustering (sketched below) | >3 related queries in session | Exclude academic/research IPs; require context review |
| Iterative refinement of synthesis protocols | Query similarity scoring | >5 refinement iterations | Check for institutional affiliation |
| Multi-model coordination (planning + chemistry + biology) | Cross-platform correlation | Coordinated queries across models | Verify legitimate research workflows |
| Jailbreak attempt patterns | Known prompt pattern matching | Match to known adversarial patterns | Update patterns; human review |
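
A minimal sketch of the session-level logic from the first row. The topic labels, classifier interface, and threshold are illustrative assumptions, not a deployed system's API:

```python
# Flag a session whose queries touch several stages of a risk taxonomy.
# `classify` is a placeholder for an upstream topic classifier.
RISK_TOPICS = {"pathogen_biology", "synthesis_protocol", "aerosolization"}
ALERT_THRESHOLD = 3  # distinct risk topics in a single session (assumed)

def session_risk_topics(queries, classify):
    """classify(query) -> set of topic labels; returns risk topics touched."""
    touched = set()
    for query in queries:
        touched |= classify(query) & RISK_TOPICS
    return touched

def should_flag(queries, classify, allowlisted=False):
    """Route to human review; allowlisted research accounts are excluded,
    per the false-positive mitigation column above."""
    return (not allowlisted
            and len(session_risk_topics(queries, classify)) >= ALERT_THRESHOLD)
```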

Procurement Pattern Indicators

| Pattern | Detection Method | Alert Threshold | Data Source |
| --- | --- | --- | --- |
| Precursor combination anomalies | Graph analysis of co-purchases | Unusual combinations flagged | Supplier transaction data |
| Threshold-adjacent transactions | Statistical analysis of transaction sizes (sketched below) | Systematic purchases below threshold | Financial intelligence |
| Geographic dispersion of orders | Shipping address clustering | Single recipient, multiple addresses | Logistics data |
| Dual-use equipment + biological supplies | Purchase correlation analysis | Equipment + consumables combination | Cross-supplier aggregation |
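
A minimal sketch of the threshold-adjacency statistic from the second row. The reporting threshold, band width, and cutoffs are illustrative assumptions:

```python
# Flag entities whose transactions cluster just below a reporting threshold.
REPORTING_THRESHOLD = 10_000.0  # illustrative currency reporting limit
BAND = 0.10                     # "adjacent" = within 10% below the threshold

def threshold_adjacent_share(amounts):
    """Fraction of an entity's transactions in the band just under the limit."""
    if not amounts:
        return 0.0
    lo = REPORTING_THRESHOLD * (1 - BAND)
    near = [a for a in amounts if lo <= a < REPORTING_THRESHOLD]
    return len(near) / len(amounts)

def flag_entities(txns_by_entity, min_txns=5, share_cutoff=0.6):
    """txns_by_entity: dict of entity id -> list of transaction amounts.
    Cutoffs are assumptions to be calibrated against historical data."""
    return [entity for entity, amounts in txns_by_entity.items()
            if len(amounts) >= min_txns
            and threshold_adjacent_share(amounts) >= share_cutoff]
```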

Compute and Model Usage Indicators

| Pattern | Detection Method | Alert Threshold | Context |
| --- | --- | --- | --- |
| Large-scale biological simulation compute | Resource allocation monitoring | >threshold GPU-hours on bio tasks | Cloud provider logs |
| Fine-tuning on scraped biology datasets | Training job classification | Biology-domain fine-tune detected | Model training platforms |
| Open-weight model hosting for biology | Model deployment monitoring | High-capability bio model served | Infrastructure providers |

Integration Guidance

  • STIX/TAXII compatibility: Indicators should be formatted for standard threat intelligence sharing (a minimal example follows this list)
  • MITRE ATT&CK mapping: Where applicable, map to relevant techniques
  • Confidence scoring: Assign confidence levels to reduce alert fatigue
  • Human-in-the-loop: All high-priority alerts require human review before action
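
To make the STIX/TAXII point concrete, here is a minimal sketch of a STIX 2.1 indicator object carrying the threshold-adjacency alert above. The custom `x-procurement` object type and its pattern are illustrative placeholders, not an established vocabulary for this domain:

```python
# Build a STIX 2.1 indicator as a plain JSON object for TAXII sharing.
import json
import uuid
from datetime import datetime, timezone

now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")
indicator = {
    "type": "indicator",
    "spec_version": "2.1",
    "id": f"indicator--{uuid.uuid4()}",
    "created": now,
    "modified": now,
    "name": "Threshold-adjacent precursor purchasing",
    "description": "Entity with >=60% of transactions within 10% below reporting limit.",
    "indicator_types": ["anomalous-activity"],
    "pattern": "[x-procurement:threshold_adjacent_share >= 0.6]",  # custom object type
    "pattern_type": "stix",
    "valid_from": now,
}
print(json.dumps(indicator, indent=2))
```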

Red Lines and Trigger Points

Events that would significantly alter assessment:

  1. Confirmed AI-assisted WMD attempt (any category): Would validate threat model and accelerate responses
  2. Release of unrestricted "research agent" with biology capabilities: Would dramatically lower barriers
  3. Cloud laboratory security breach involving dangerous protocols: Would demonstrate attack pathway viability
  4. Gene drive release (malicious or accidental with harmful effects): Would demonstrate irreversibility concerns
  5. Treaty framework collapse: Would remove coordination mechanisms

18. Civil Liberties and Research Freedom Considerations {#civil-liberties}

The Dual-Use Dilemma

WMD concerns create pressure for restrictions that affect legitimate activities:

Research freedom impacts:

  • Gain-of-function research restrictions
  • Publication censorship of dual-use findings
  • International collaboration limitations
  • Student/researcher screening

Civil liberties impacts:

  • Surveillance of scientific communications
  • Monitoring of AI queries about biology/chemistry
  • Restrictions on information access
  • Profiling based on research interests

Principles for Proportionate Response

  1. Necessity: Restrictions must address genuine threats, not theoretical possibilities
  2. Proportionality: Burdens must match actual risk reduction achieved
  3. Minimization: Use least restrictive effective approach
  4. Accountability: Clear oversight of any surveillance or restriction powers
  5. Reversibility: Sunset provisions; regular review

Guardrails Against Overreach

| Measure | Purpose |
| --- | --- |
| Independent oversight boards | Prevent mission creep |
| Clear evidentiary standards | Avoid profiling without basis |
| Transparency reports | Public accountability |
| Appeal mechanisms | Individual recourse |
| International consistency | Prevent arbitrary variation |

What Should NOT Happen

| Overreach Risk | Why It's Problematic |
| --- | --- |
| Broad surveillance of scientists | Chilling effect on legitimate research |
| Publication prior restraint | Damages scientific progress |
| AI query monitoring without cause | Privacy violation; creates insecurity |
| Country-of-origin discrimination | Undermines scientific cooperation |
| Classification of dual-use by default | Makes beneficial work impossible |

The Optimization Target

The goal is not to prevent all possible harm - that would require unacceptable restrictions. The goal is to:

  1. Make catastrophic harm significantly harder
  2. Enable detection and response to attempts
  3. Maintain beneficial research and application
  4. Preserve civil liberties and research freedom
  5. Adapt as capabilities and threats evolve

19. Conclusion {#conclusion}

Summary of Findings

AI agents represent a significant shift in the WMD proliferation landscape, but the nature and magnitude of risk varies substantially across weapon categories:

Biological weapons face the most significant barrier reduction. The combination of AI-enabled knowledge synthesis, expanding DNA synthesis services, and cloud laboratory access creates a pathway that erodes multiple traditional barriers simultaneously. This is the highest-priority concern.

Chemical weapons face moderate barrier reduction. AI can assist with precursor identification and synthesis guidance, but physical materials access and the technical challenges of safe production remain significant constraints.

Nuclear weapons face limited AI-related barrier reduction. Fissile material scarcity remains the dominant constraint, which AI cannot address. Information aggregation represents a secondary concern for state programs.

Gene drives represent a novel category requiring dedicated attention. AI specifically accelerates the computational aspects of gene drive design, and the lack of existing governance frameworks creates vulnerability.

Deployment vectors (aerosol systems, autonomous delivery) may amplify the impact of even crude agents and warrant attention independent of synthesis capabilities.

The Central Tension

This analysis identifies a fundamental tension:

  • Access denial is failing: Information proliferates; dual-use tools expand; international coordination is imperfect
  • Physical barriers persist but erode: Particularly for biological, the most concerning category
  • Governance lags capability: International frameworks designed for different eras

The window for establishing effective governance is narrowing. Once capabilities fully proliferate, controls become much harder to implement.

Call to Action

  1. Prioritize biological: Among WMD categories, biological faces the most significant AI-related risk increase. Focus resources accordingly.

  2. Invest in governance now: The next 2-3 years are critical for establishing frameworks before capabilities fully proliferate.

  3. Balance security and research: Overly restrictive responses will harm beneficial applications without preventing determined adversaries.

  4. Develop defensive capabilities: Detection, attribution, and response capabilities deserve at least as much investment as restriction efforts.

  5. Maintain perspective: Actual WMD attacks remain rare. Avoid policy made in fear that causes more harm than the threat itself.

A Note on Uncertainty

These projections represent our best assessment given available information. Significant uncertainties remain:

  • AI capability development pace
  • Effectiveness of governance measures
  • Translation of capability to actual attacks
  • Defensive technology development

The purpose of this analysis is not prediction but preparation. By understanding possible futures, we improve our ability to shape better outcomes.


Appendix A: Risk Prioritization Matrix

| Weapon Category | AI Barrier Reduction | Physical Barriers | Detection Difficulty | Overall Risk Trend |
| --- | --- | --- | --- | --- |
| Biological | High | Weakening | High | Significantly Increasing |
| Chemical | Moderate | Moderate | Medium | Moderately Increasing |
| Nuclear | Low | Very Strong | Low | Stable |
| Radiological | Low-Moderate | Moderate | Low | Slightly Increasing |
| Gene Drives | High | Low | Very High | Significantly Increasing |

Appendix B: Key Literature and References

Biosecurity and Dual-Use Research

  • National Academies. Biodefense in the Age of Synthetic Biology (2018)
  • Koblentz, Gregory. Living Weapons: Biological Warfare and International Security (2009)
  • Tucker, Jonathan. Innovation, Dual Use, and Security (2012)

Historical Case Studies

  • Olson, Kyle. "Aum Shinrikyo: Once and Future Threat?" Emerging Infectious Diseases (1999)
  • Carus, W. Seth. Bioterrorism and Biocrimes: The Illicit Use of Biological Agents Since 1900 (2001)
  • Meselson, Matthew et al. "The Sverdlovsk Anthrax Outbreak of 1979" Science (1994)

AI Safety and Capability Assessment

  • NIST AI Risk Management Framework (2023)
  • Anthropic, OpenAI, DeepMind policy papers on dangerous capabilities
  • Soice, Emily et al. "Can Large Language Models Democratize Access to Dual-Use Biotechnology?" (2023)

Arms Control and Nonproliferation

  • Zilinskas, Raymond. Biological Warfare: Modern Offense and Defense (2000)
  • Wheelis, Mark et al. Deadly Cultures: Biological Weapons Since 1945 (2006)
  • Allison, Graham. Nuclear Terrorism: The Ultimate Preventable Catastrophe (2004)

Gene Drives

  • National Academies. Gene Drives on the Horizon (2016)
  • Esvelt, Kevin. "Gene Drives and CRISPR Could Revolutionize Ecosystem Management" (2014)

Appendix C: Glossary

Biosafety Level (BSL): Classification of laboratory containment from BSL-1 (basic) to BSL-4 (maximum containment for most dangerous pathogens)

Cloud Laboratory: Commercial service providing remote access to automated laboratory equipment

CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats - a gene editing technology

Dual-Use Research of Concern (DURC): Research that could be directly misused to threaten public health, agriculture, environment, or security

Fissile Material: Material capable of sustaining nuclear fission chain reaction (highly enriched uranium, plutonium)

Gain-of-Function Research: Research that increases pathogen transmissibility, virulence, or host range

Gene Drive: Genetic system designed to spread modifications through populations faster than normal inheritance

Select Agent: Pathogen or toxin with the potential to pose a severe threat to public health, regulated in the United States by the CDC and USDA

Tacit Knowledge: Skills and knowledge that cannot be easily transferred through writing or verbal instruction

Uplift: The degree to which AI assistance improves a non-expert's ability to accomplish a task

Agentic Workflow: A multi-step AI system where models autonomously plan, execute tools, and iterate toward goals with minimal human oversight per action

Vision-Language Model (VLM): An AI model capable of processing both visual and textual information, enabling interpretation of images and video

Appendix D: Defense Investment Priority Map

For resource allocation decisions: Where should defensive investments be prioritized based on AI uplift analysis?

Investment Priority by WMD Lifecycle Stage

The following maps defensive investment priorities against the stages where AI provides the most significant capability uplift to adversaries.

WMD LIFECYCLE STAGES AND AI UPLIFT

                        AI Uplift Level
Stage                   [Low -------- High]     Defender Priority
─────────────────────────────────────────────────────────────────
1. PLANNING/RESEARCH    ████████████████████    HIGH
   - Literature synthesis   ████████████████████
   - Target selection       ██████████████░░░░░░
   - Capability assessment  ████████████████░░░░

2. ACQUISITION          ██████████░░░░░░░░░░    MEDIUM
   - Precursor sourcing     ████████████░░░░░░░░
   - Financial operations   ██████████████░░░░░░
   - Equipment procurement  ████████░░░░░░░░░░░░

3. SYNTHESIS/PRODUCTION ████████████░░░░░░░░    MEDIUM-HIGH
   - Protocol optimization  ████████████████░░░░
   - Real-time guidance     ██████████████░░░░░░
   - Troubleshooting        ████████████░░░░░░░░

4. WEAPONIZATION        ████████░░░░░░░░░░░░    MEDIUM
   - Delivery design        ████████████░░░░░░░░
   - Dispersal optimization ██████████░░░░░░░░░░
   - Stabilization          ██████░░░░░░░░░░░░░░

5. DEPLOYMENT           ████████████░░░░░░░░    MEDIUM-HIGH
   - Target optimization    ██████████████░░░░░░
   - Timing/logistics       ████████████░░░░░░░░
   - Autonomous delivery    ████████░░░░░░░░░░░░

Legend: █ = estimated AI uplift; ░ = remaining gap to maximum uplift

Recommended Investment Allocation

| Investment Area | Priority | Rationale | Estimated Impact |
|---|---|---|---|
| AI-assisted threat detection | Critical | Counter the planning/research uplift with defensive AI | High - detects early-stage activity |
| Synthesis screening enhancement | Critical | Physical chokepoint where intervention is most feasible | High - blocks acquisition |
| Environmental biodetection | High | Essential for response to the deployment stage | High - enables rapid response |
| Attribution capability | High | Deters by ensuring accountability | Medium - long-term deterrence |
| International coordination | High | Addresses regulatory arbitrage | Medium - depends on cooperation |
| Financial monitoring AI | Medium | Detects procurement patterns | Medium - can be circumvented |
| Open-weight model governance | Medium | Addresses model proliferation | Low-Medium - difficult to enforce |

Investment Gaps Requiring Attention

  1. Defensive AI for biosecurity: Underinvested relative to offensive capability growth
  2. Attribution research: Significant capability gap in AI-era forensics
  3. International coordination mechanisms: Governance lags capability
  4. Cloud laboratory oversight: Emerging attack surface without adequate monitoring
  5. Synthetic biology forensics: Novel domain requiring new capabilities

Appendix E: Confidence Rubric and Evidence Assessment

Justification for confidence ratings assigned to key findings.

Confidence Rating Methodology

| Rating | Definition | Evidence Standard |
|---|---|---|
| High | Assessment is well-supported | Multiple independent sources; consistent with established patterns; alternative explanations considered and found less plausible |
| Medium | Assessment is reasonable given available evidence | Some corroborating sources; consistent with theory but empirical gaps exist; key uncertainties identified |
| Low | Assessment is possible but speculative | Limited sources; significant extrapolation required; multiple alternative explanations remain viable |

Key Finding Evidence Assessment

Finding 1: AI unlikely to enable T0→WMD but will lower T1-T3 barriers

Confidence: High

| Evidence For | Evidence Against | Key Unknowns |
|---|---|---|
| Historical precedent: Aum Shinrikyo failed despite resources | Tacit knowledge erosion rate is uncertain | How fast will VLM+automation close the tacit knowledge gap? |
| Current AI evaluations show limited uplift for novices | Some T0→T1 progression may be AI-accelerated | Will fine-tuned/jailbroken models change this? |
| Physical barriers remain (materials, equipment) | Cloud labs reduce physical access requirements | How quickly will cloud lab screening mature? |
| Red team exercises confirm knowledge gap | | |

Finding 2: Biological weapons represent highest AI-risk category

Confidence: High

| Evidence For | Evidence Against | Key Unknowns |
|---|---|---|
| DNA synthesis becoming commodity | Weaponization requires more than synthesis | Timeline for synthesis→weapon pipeline automation |
| Dual-use equipment identical to legitimate tools | Detection improving (biosurveillance) | Will screening keep pace with synthesis democratization? |
| AI specifically strong at knowledge synthesis | Most pathogens require BSL-3+ handling | Will cloud labs implement sufficient controls? |
| Self-replicating agents amplify small successes | Historical failure rate of non-state bio programs high | |

Finding 3: Tacit knowledge gap is eroding but still significant

Confidence: Medium

| Evidence For | Evidence Against | Key Unknowns |
|---|---|---|
| VLMs demonstrating lab procedure interpretation | Current VLMs still error-prone for complex procedures | Rate of VLM improvement on lab tasks |
| Cloud labs embody tacit knowledge in automation | Automation still requires operator judgment | Will VLM+automation integration accelerate? |
| AI-guided instruction provides adaptive feedback | "Last mile" of physical execution remains | How much tacit knowledge is truly irreducible? |

Finding 4: Nuclear weapons face strongest physical barriers

Confidence: High

| Evidence For | Evidence Against | Key Unknowns |
|---|---|---|
| Fissile material remains scarce and detectable | Information aggregation could assist state programs | Could theft/diversion be AI-optimized? |
| Enrichment requires industrial infrastructure | Black market for materials exists | Will AI improve supply chain obfuscation for states? |
| Strong international monitoring (IAEA) | | |
| No non-state actor has come close | | |

Finding 5: Cyber-physical attacks may be higher near-term risk

Confidence: Medium

| Evidence For | Evidence Against | Key Unknowns |
|---|---|---|
| Stuxnet demonstrated code→physical harm | Critical infrastructure increasingly hardened | How many facilities remain vulnerable? |
| ICS/SCADA vulnerabilities are known | Air-gapping and redundancy are standard | Will AI improve ICS exploitation capabilities? |
| No synthesis required - existing materials weaponized | Attribution after cyber attack is difficult | What's the actual attack surface for BSL-4/chem plants? |
| AI could assist reconnaissance and exploitation | Few public examples of successful ICS attacks | |

Finding 6: High-frequency attempts more likely than mass casualty

Confidence: Medium

| Evidence For | Evidence Against | Key Unknowns |
|---|---|---|
| Barrier reduction → more actors can attempt | One sophisticated actor could break pattern | Intent prevalence among capable actors |
| Historical attempt rate already > success rate | AI might enable qualitative capability jump | Will detection capabilities scale with attempt volume? |
| Failure cascade compounds across steps | | How much will "noise floor" increase? |

Appendix F: Defender's Measurement Framework

Operationalizing "uplift" for defensive evaluation without testing harmful endpoints.

Purpose

Defenders need to measure whether AI meaningfully improves hostile actors' capabilities without directly testing harmful outcomes. This framework proposes evaluation concepts for doing so.

Measurement Dimensions

| Dimension | What It Captures | Evaluation Approach |
|---|---|---|
| Planning completeness | Does AI help actors identify all steps? | Compare task decomposition quality with/without AI on benign analogues |
| Error correction | Does AI help actors recover from mistakes? | Measure troubleshooting effectiveness on complex but safe procedures |
| Iteration speed | Does AI accelerate learning cycles? | Time-to-competence metrics on legitimate skill acquisition |
| Knowledge synthesis | Does AI aggregate dispersed information? | Assess coherence of literature reviews on complex topics |
| Operational security | Does AI help actors avoid detection? | Red team exercises on defensive monitoring evasion (controlled) |

Safe Evaluation Protocols

  1. Benign analogues: Test on complex but harmless procedures (e.g., brewing, fermentation, legitimate synthesis) that share structural features with concerning domains (see the sketch after this list)
  2. Truncated pathways: Evaluate early stages of task chains without completing harmful endpoints
  3. Expert comparison: Measure AI performance relative to published literature rather than actual harmful capability
  4. Adversarial robustness: Test guardrail durability under jailbreaking attempts (with appropriate containment)
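
To make the first two protocols concrete, the sketch below shows one way a benign-analogue, truncated-pathway evaluation task might be represented in code. This is a minimal illustration under stated assumptions: every name in it (EvalTask, truncate_at_step, the scoring rubric) is hypothetical, not an existing benchmark API.

```python
# Illustrative sketch: representing a benign-analogue evaluation task.
# All names here are hypothetical; this is not an existing benchmark API.
from dataclasses import dataclass


@dataclass
class EvalTask:
    """One benign-analogue task in a defensive uplift evaluation."""
    task_id: str
    domain_analogue: str            # harmless domain sharing structure with the concern
    structural_features: list[str]  # which features of the concerning domain it mirrors
    truncate_at_step: int           # truncated pathway: stop before any sensitive endpoint
    scoring: str                    # how performance is graded without harmful testing


# Example: fermentation shares multi-step planning, contamination control,
# and troubleshooting demands with domains of concern, while being harmless.
FERMENTATION_TASK = EvalTask(
    task_id="ferm-01",
    domain_analogue="home fermentation protocol design",
    structural_features=["multi-step planning", "contamination control", "troubleshooting"],
    truncate_at_step=4,  # evaluate planning and early execution only
    scoring="rubric graded against published expert protocols",
)
```

The design point is that truncation is declared per task, so evaluators can score planning and early execution without ever exercising a sensitive endpoint.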

Metrics for Vendors/Regulators

| Metric | Description | Threshold Guidance |
|---|---|---|
| Uplift ratio | Performance improvement vs. baseline (no AI) | >2x on concerning domains warrants scrutiny (worked example below) |
| Guardrail bypass rate | Fraction of adversarial prompts that succeed | Define acceptable thresholds by capability tier |
| Knowledge aggregation depth | Coherence of synthesized information on controlled topics | Compare to expert-curated baselines |
| Iteration efficiency | Speed of convergence on complex procedures | Benchmark against novice learning curves |
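
As a worked illustration of the first two metrics, the sketch below computes an uplift ratio and a guardrail bypass rate from trial data. The function names, the 0-100 rubric scale, and the example figures are assumptions for illustration, not a standard implementation.

```python
# Minimal sketch of two metrics from the table above. Names, the rubric
# scale, and the example figures are illustrative assumptions.

def uplift_ratio(ai_scores: list[float], baseline_scores: list[float]) -> float:
    """Mean task performance with AI assistance divided by the no-AI baseline.

    Scores are assumed to lie on a common scale (e.g., 0-100 rubric points
    earned on benign analogues). Under the threshold guidance above, a ratio
    above 2.0 on concerning domains would warrant scrutiny.
    """
    baseline = sum(baseline_scores) / len(baseline_scores)
    assisted = sum(ai_scores) / len(ai_scores)
    return assisted / baseline if baseline > 0 else float("inf")


def guardrail_bypass_rate(attempts: int, successes: int) -> float:
    """Fraction of adversarial prompts that elicit content the model should refuse."""
    return successes / attempts if attempts else 0.0


# Example: novices average 30 rubric points unassisted and 75 with AI help.
print(uplift_ratio(ai_scores=[70.0, 80.0, 75.0], baseline_scores=[25.0, 35.0, 30.0]))  # 2.5
print(guardrail_bypass_rate(attempts=200, successes=7))  # 0.035
```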

Application

This framework is intended for:

  • AI developers evaluating models pre-release
  • Regulators assessing safety claims
  • Security researchers benchmarking defensive measures
  • Procurement decisions for high-capability AI access

Appendix G: Responsible Citation Guide

For derivative works: If you're building on this analysis, these guidelines help maintain the defensive framing.

Self-Review for Derivative Works

| Check Question | If Yes |
|---|---|
| Does your excerpt include stepwise sequences without surrounding context? | Add barriers/limitations discussion |
| Are you adding specific technical details not in the original? | Consider whether the addition serves defense or offense |
| Does your framing emphasize capabilities over barriers? | Rebalance to match the original's structure |
| Are probability estimates presented as confident predictions? | Add uncertainty language |
| Could your excerpt support alarmist narratives out of context? | Add context anchors |

Suggested Citation Practices

When citing this document:

  • Include the "What This Document Is NOT Claiming" framing when discussing findings
  • Pair capability discussions with corresponding barrier discussions
  • Note that probability estimates are subjective priors, not predictions
  • Link to the full document when possible

Contributing

This document is released under MIT/Unlicense. Contributions, corrections, and extensions are welcome:

  • File issues for factual errors or outdated information
  • Submit PRs for substantive improvements
  • Fork for derivative analyses with different assumptions

The goal is informed public discussion of AI governance challenges.


This document is released for public discussion of AI governance challenges. Licensed under MIT/Unlicense.
