@knowsuchagency
Last active March 11, 2026 18:03
Comparison: Agent Browser Protocol vs Vercel agent-browser

Two browser automation solutions purpose-built for AI agents, both born in January 2026, taking radically different architectural approaches.

Overview

| | Agent Browser Protocol (ABP) | Vercel agent-browser |
|---|---|---|
| Repo | theredsix/agent-browser-protocol | vercel-labs/agent-browser |
| Approach | Chromium fork with engine-level integration | Rust CLI wrapping a Playwright/Node.js daemon |
| Stars | 54 | ~21,000 |
| Age | ~2 months (Jan 2026) | ~2 months (Jan 2026) |
| Latest | v0.1.6 | v0.17.1 |
| License | BSD 3-Clause | Apache 2.0 |
| Language | C++ (browser internals) | Rust (CLI) + TypeScript (daemon) |
| Contributors | 1 | ~30 (1 primary) |

Architecture

ABP embeds an HTTP server directly inside a Chromium fork. It has direct access to browser internals (Browser, TabStripModel, DevTools agent) and dispatches on the UI thread. This is not a CDP/Puppeteer wrapper: it is engine-level integration spread across 30+ C++ source files under /chrome/browser/abp/.

agent-browser uses a conventional layered architecture: a fast Rust CLI communicates over IPC with a persistent Node.js daemon that wraps Playwright. An experimental pure-Rust daemon using CDP directly exists but is limited. The daemon auto-starts and persists between calls, avoiding browser startup costs.

Verdict: ABP is architecturally bolder and more tightly integrated. agent-browser is more pragmatic and maintainable.


Core Design Philosophy

ABP: Synchronous, deterministic actions. Every action (click, type, navigate) returns an atomic JSON response containing before/after screenshots, scroll state, event log, timing, and cursor position. JavaScript execution and virtual time freeze between agent steps, so the page cannot change while the agent is deciding what to do next. This removes the race conditions that come from acting on a stale view of the page.
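
To make the "atomic JSON response" concrete, here is a minimal sketch of how a client might model one. The field names (`screenshot_before`, `elapsed_ms`, and so on) are assumptions chosen for illustration; the comparison lists what a response contains but not ABP's actual schema.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ActionResult:
    """Hypothetical shape of one atomic action response (field names are assumed)."""
    screenshot_before: str                      # e.g. a base64-encoded image
    screenshot_after: str
    scroll: dict                                # scroll state after the action
    events: list = field(default_factory=list)  # event log for this step
    elapsed_ms: float = 0.0                     # timing
    cursor: tuple = (0, 0)                      # cursor position (x, y)


def parse_action_result(raw: str) -> ActionResult:
    """Decode one JSON response string into an ActionResult."""
    data = json.loads(raw)
    return ActionResult(
        screenshot_before=data["screenshot_before"],
        screenshot_after=data["screenshot_after"],
        scroll=data["scroll"],
        events=data.get("events", []),
        elapsed_ms=float(data.get("elapsed_ms", 0.0)),
        cursor=tuple(data.get("cursor", (0, 0))),
    )
```

Because everything about a step arrives in one object, an agent loop can reason about "what changed" without issuing follow-up queries that might observe a different page state.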

agent-browser: Accessibility-tree-first interaction. The snapshot command produces an accessibility tree with element refs (@e1, @e2). Agents identify elements by these refs rather than coordinates or CSS selectors, then issue commands like click @e2. This avoids the fragility of pixel-coordinate clicking.
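
The ref-based workflow can be sketched as: take a snapshot, find the ref for the element you want, then build a `click` command with it. The snapshot line format parsed below (`- button "Submit" [ref=e2]`) is an assumption for illustration and may not match the tool's real output; the helper names are hypothetical, and the code only builds the command rather than running it.

```python
import re

# Assumed snapshot line format, for illustration only: - button "Submit" [ref=e2]
REF_LINE = re.compile(r'-\s+(?P<role>\w+)\s+"(?P<name>[^"]*)"\s+\[ref=(?P<ref>e\d+)\]')


def parse_snapshot(text: str) -> dict:
    """Map (role, name) to a ref string, e.g. ("button", "Submit") -> "@e2"."""
    refs = {}
    for line in text.splitlines():
        match = REF_LINE.search(line)
        if match:
            refs[(match["role"], match["name"])] = "@" + match["ref"]
    return refs


def click_command(refs: dict, role: str, name: str) -> list:
    """Build the argv for `agent-browser click <ref>` without executing it."""
    return ["agent-browser", "click", refs[(role, name)]]
```

The point of the indirection is that the agent picks elements by role and name, so a layout shift that moves the button does not invalidate the action the way a pixel coordinate would.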

Verdict: Different but complementary philosophies. ABP optimizes for determinism and visual grounding (screenshots + element markup). agent-browser optimizes for semantic grounding (accessibility tree + refs). ABP's approach is closer to how VLMs "see" pages; agent-browser's is closer to how text-based LLMs reason about structure.


Key Feature Differences

| Feature | ABP | agent-browser |
|---|---|---|
| JS/time freeze between actions | Yes (engine-level) | No |
| Accessibility tree snapshots | No | Yes (primary workflow) |
| Element bounding boxes on screenshots | Yes (compositor-level) | Yes (--annotate) |
| Session recording to SQLite | Yes (built-in training data pipeline) | No |
| Auth vault (LLM never sees passwords) | No | Yes |
| Domain allowlists / action policies | No | Yes |
| Network interception | Yes | Yes |
| Multi-tab support | Yes | Yes |
| iOS Simulator support | No | Yes (Appium) |
| Cloud browser providers | No | Yes (Browserbase, Browser Use, Kernel) |
| Serverless deployment | No | Yes (Vercel Sandbox, AWS Lambda) |
| Streaming viewport | No | Yes (WebSocket JPEG) |
| CDP connect to existing browser | No (it is the browser) | Yes |
| Headed/headless | Both | Both |
| MCP server | Yes (embedded C++ + npm) | Via skill files |
| Encrypted state storage | No | Yes (AES-256-GCM) |

AI Agent Integration

ABP exposes a REST API on localhost:8222 with 50+ endpoints. It also has a built-in MCP server accessible via npx -y agent-browser-protocol --mcp. Integrates with Claude Code, Claude Desktop, and Codex CLI directly.
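
As a rough sketch of what "programmatic use" of a localhost REST API looks like from a client, the snippet below builds a JSON POST against port 8222. Only the host and port come from the text above; the endpoint path and payload fields are placeholders, since the actual endpoint names are not listed here.

```python
import json
import urllib.request

ABP_BASE = "http://localhost:8222"  # host and port as stated in the comparison


def build_action_request(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST to an ABP endpoint.

    `path` is a placeholder: consult the ABP docs for real endpoint names.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        ABP_BASE + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Separating request construction from sending keeps the client easy to test without a running browser.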

agent-browser is CLI-first — every operation is a single shell command (agent-browser click @e2). This makes it trivially usable by any AI agent that can execute shell commands. Skill files are available for Claude Code, Codex, Cursor, Gemini CLI, Copilot, Goose, and others via npx skills add.

Verdict: agent-browser has broader AI tool integration. ABP's REST API is more flexible for programmatic use cases. Both work well with Claude Code.


Performance & Reliability

ABP claims ~100ms overhead per action and 90.53% on the Online Mind2Web benchmark. The JS freeze mechanism makes interactions deterministic — no flaky waits or race conditions.
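
Why freezing virtual time yields determinism can be shown with a toy model. This is not ABP's implementation, just a small simulation under the assumption that page timers are driven by a virtual clock that advances only while an agent action runs. All class and method names are hypothetical.

```python
import heapq


class VirtualClock:
    """Toy virtual clock: page timers fire only while an agent action is running."""

    def __init__(self):
        self.now_ms = 0
        self._timers = []  # min-heap of (due_ms, sequence, callback)
        self._seq = 0      # tie-breaker so equal due times fire in insertion order

    def set_timeout(self, delay_ms, callback):
        """Schedule `callback` to fire `delay_ms` of virtual time from now."""
        heapq.heappush(self._timers, (self.now_ms + delay_ms, self._seq, callback))
        self._seq += 1

    def run_action(self, action, budget_ms):
        """Run `action`, then advance virtual time by `budget_ms`, firing due timers in order."""
        action()
        deadline = self.now_ms + budget_ms
        while self._timers and self._timers[0][0] <= deadline:
            due, _, callback = heapq.heappop(self._timers)
            self.now_ms = due
            callback()
        self.now_ms = deadline  # the clock stays frozen here until the next action
```

In this model, how long the agent spends thinking between `run_action` calls has no effect on which timers have fired, so the same action sequence always sees the same page state. With wall-clock time, a slow model call could let a popup timer fire before the next click.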

agent-browser has sub-millisecond CLI parsing overhead, but the actual automation runs through Playwright with its default 25s timeout and standard waiting mechanisms. No determinism guarantees beyond Playwright's built-in auto-waiting.

Verdict: ABP has a structural advantage in determinism thanks to JS/virtual-time freezing. agent-browser relies on Playwright's (good but imperfect) auto-waiting.


Security

ABP blocks real system input by default during agent operation and listens on localhost only. Beyond that, its security feature set is minimal.

agent-browser has significantly more security features: auth vault (passwords never exposed to LLM), domain allowlists, action policies, confirmation gates, content boundary markers, output length limits, and AES-256-GCM encrypted state storage.
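
To illustrate what a domain allowlist check involves in general, here is a minimal sketch. It is not agent-browser's implementation; the function name and matching rules (exact host or subdomain, http/https only) are assumptions.

```python
from urllib.parse import urlsplit


def is_allowed(url: str, allowlist: set) -> bool:
    """Return True if the URL's host equals, or is a subdomain of, an allowed domain."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False  # reject file:, data:, javascript:, and similar schemes
    host = (parts.hostname or "").lower().rstrip(".")
    # Match on a leading dot so "notexample.com" does not pass for "example.com".
    return any(host == domain or host.endswith("." + domain) for domain in allowlist)
```

The subtle part is matching on the parsed hostname rather than the raw string: a naive substring test would accept `https://example.com.evil.test/` for an allowlist containing `example.com`.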

Verdict: agent-browser is clearly more security-conscious for production deployments where you're giving an AI agent browser access.


Maintenance & Sustainability

ABP is a 51 GB Chromium fork. Keeping it synced with upstream Chromium security patches is an enormous burden for a single developer. Building from source takes 4-6 hours. This is the project's biggest risk.

agent-browser builds on Playwright (well-maintained by Microsoft) and standard Node.js tooling. Contributing is straightforward. However, it's a Vercel Labs project — experimental, with no guarantee of long-term support.

Verdict: ABP has higher technical risk (Chromium fork maintenance). agent-browser has organizational risk (Vercel Labs may deprioritize it). Neither is a safe long-term bet yet.


Summary

| Dimension | Winner |
|---|---|
| Architectural innovation | ABP |
| Determinism & reliability | ABP |
| Training data pipeline | ABP |
| Security for production | agent-browser |
| Breadth of features | agent-browser |
| Ease of adoption | agent-browser |
| AI tool ecosystem integration | agent-browser |
| Community traction | agent-browser (roughly 400x more stars) |
| Maintainability | agent-browser |
| Deployment flexibility | agent-browser |

Bottom Line

ABP is a technically impressive, research-oriented project that solves the browser-agent impedance mismatch at the deepest possible level (engine internals). Its JS freeze and session recording features are unique. However, it carries enormous maintenance risk as a Chromium fork by a single developer.

agent-browser is the more practical, production-oriented choice with broader features, better security, wider ecosystem integration, and a more sustainable architecture. Its accessibility-tree-first approach is well-suited to text-based LLMs, though it lacks ABP's determinism guarantees.

  • For research and VLM fine-tuning: ABP's session recording and deterministic actions are compelling.
  • For shipping AI agents in production: agent-browser is the safer bet today.