@cgcardona
Created March 6, 2026 16:20

The Agent Shell: MCP as the bash of the LLM Era

A speculative brain dump. The seed of an idea captured before it gets buried under shipping pressure. Origin: a tangential conversation while building AgentCeption, March 2026.


The Spark

The conversation started with a throwaway question: is MCP more "native" to an agent than HTTP or syscalls?

The answer is yes — but not for the reason you'd first guess. It's not about the wire protocol. It's about the semantic layer. MCP sits closer to intent than HTTP does. A language model can read an MCP tool schema and know what to call without any translation step. It's a typed, discoverable, intent-legible interface — and that's a fundamentally different thing from a REST endpoint or a raw syscall.
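For instance, a minimal MCP tool declaration (a hypothetical tool, but the name/description/inputSchema shape is MCP's) is legible to a model as-is — no docs, no client library, no translation step:

```json
{
  "name": "read_file",
  "description": "Read a UTF-8 text file from the agent's allowed mount.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "path": { "type": "string", "description": "Absolute path to the file" }
    },
    "required": ["path"]
  }
}
```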

Which led to the real question: what would an OS look like if it were designed for agents instead of humans?

And then: you don't need a new OS. You need a new shell.


The Stack

Here's the standard kernel-to-userspace stack, annotated for where the agent fits:

┌─────────────────────────────────────────────┐
│            Agent / LLM runtime              │  ← user space (top)
├─────────────────────────────────────────────┤
│         Capability Broker (MCP surface)     │  ← the interesting layer
├─────────────────────────────────────────────┤
│         Standard OS services                │
│   (libc, POSIX API, file descriptors,       │
│    sockets, signals, env vars)              │
├─────────────────────────────────────────────┤
│         Kernel boundary                     │
│   (syscall table: read, write, open,        │
│    fork, execve, mmap, ioctl…)              │
├─────────────────────────────────────────────┤
│         Hardware                            │  ← bottom
└─────────────────────────────────────────────┘

The Capability Broker lives between POSIX and the agent runtime — solidly in user space, acting as a gatekeeper to OS services below it. The agent never calls libc or makes syscalls directly. It only sees MCP tools. The broker translates those tool calls downward into real OS operations, enforcing policy at the translation boundary.


Why "Shell" Is the Right Analogy

The shell wasn't just a convenience wrapper on top of Unix. It defined how humans interacted with the OS. It shaped what felt possible, what felt natural, what you'd even think to try. The entire Unix philosophy of composable small tools — cat, grep, sort, head — emerged partly because the shell made composition easy to express. The interface shaped the culture of what got built.

The capability broker does the same thing for agents:

  • If you design the MCP surface well, it shapes what the agent reaches for.
  • Narrow, well-named tools with clear schemas are the agent equivalent of the Unix tool philosophy — do one thing, take typed input, return typed output, compose cleanly.
  • A bloated tool with ten optional parameters is the agent equivalent of a God-object — technically capable of everything, but hostile to reason about.

The shell abstraction is also honest about what it is: it doesn't pretend to be the kernel. It exposes a curated, composable surface that happens to map onto kernel capabilities. The broker should be the same — it doesn't expose raw OS power, it exposes scoped, auditable, typed operations that happen to be implemented via OS calls underneath.


What the Broker Does at Each Layer Boundary

Agent calls:  read_file(path="/etc/config.toml")
                        │
              Broker checks:
                - Is this path in the agent's allowed mount?
                - Does this run_id have read capability on this path?
                - Log the access attempt with timestamp + agent identity
                        │
              Broker calls:  open() + read()  [POSIX]
                        │
              Kernel:        sys_openat() + sys_read()
                        │
              Returns typed result back to agent

Three things happen at the broker boundary that don't happen with raw OS access:

  1. Authorization — capability check before the OS call is made
  2. Audit — every tool call is logged with agent identity, run_id, timestamp, and result
  3. Semantic enrichment — the result can be shaped for the agent's consumption (e.g., binary files described rather than dumped, errors explained in natural language)
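The three boundary steps can be sketched in a few lines. This is a minimal illustration, not AgentCeption's API — the class shape, field names, and mount-list check are all assumptions:

```python
import os
import time

class CapabilityError(Exception):
    """Raised when the authorization check at the broker boundary fails."""

class Broker:
    """Minimal sketch of the broker boundary: authorize, audit, enrich."""

    def __init__(self, agent_id, run_id, allowed_mounts):
        self.agent_id = agent_id
        self.run_id = run_id
        self.allowed_mounts = [os.path.realpath(m) for m in allowed_mounts]
        self.audit_log = []

    def read_file(self, path):
        real = os.path.realpath(path)
        # 1. Authorization: capability check before any OS call is made
        allowed = any(real == m or real.startswith(m + os.sep)
                      for m in self.allowed_mounts)
        # 2. Audit: log every attempt with identity, run_id, timestamp, outcome
        self.audit_log.append({"agent": self.agent_id, "run_id": self.run_id,
                               "tool": "read_file", "path": real,
                               "allowed": allowed, "ts": time.time()})
        if not allowed:
            raise CapabilityError(f"no read capability on {real}")
        with open(real, "rb") as f:   # open()/read() -> sys_openat()/sys_read()
            data = f.read()
        # 3. Semantic enrichment: describe binaries instead of dumping them
        try:
            return {"kind": "text", "content": data.decode("utf-8")}
        except UnicodeDecodeError:
            return {"kind": "binary", "description": f"{len(data)} bytes, not valid UTF-8"}
```

Note that the denied call still produces an audit entry — the log records attempts, not just successes.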

The Session State Problem — and Its Solution

A shell session isn't stateless. It has:

  • A working directory (cwd)
  • Environment variables (inherited, exported, scoped)
  • A command history
  • Job state (foreground/background processes)

An LLM agent session today carries all of this implicitly in its context window — which means it re-states it on every call, burns tokens, and loses it when the context is trimmed.

The broker should own session state explicitly:

Shell concept           Agent broker equivalent
─────────────────────   ───────────────────────────────────────────
cwd                     Active scope: repo, branch, run_id
Environment variables   Active capabilities, secrets, config values
Command history         Tool call log for this session
Job control             Async tool calls + status polling

This is already half-working in AgentCeption: agent_run_id functions as the working directory of the agent's shell session. Every MCP report call carries it, and the broker (AgentCeption's backend) uses it to scope all state. The idea just needs to be made explicit and first-class.
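Making it first-class could be as simple as a broker-owned record mirroring the table above. A sketch — the field names are assumptions for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class AgentSession:
    """Broker-owned session state: the agent's equivalent of a shell session."""
    run_id: str                                        # cwd analogue: active scope
    capabilities: set = field(default_factory=set)     # env-var analogue
    history: list = field(default_factory=list)        # command-history analogue
    jobs: dict = field(default_factory=dict)           # job-control: call id -> status

    def record(self, tool, args, summary):
        """Append a tool call to history so the agent never re-states it in context."""
        self.history.append({"tool": tool, "args": args, "summary": summary})
```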


The Pipe Problem — The Hardest Part

Unix's most powerful idea isn't the shell. It's pipes. ls | grep foo | sort | head -10 is a dataflow graph expressed in one line. Each tool is stateless; composition is the source of power.

Agents today do this sequentially — one tool call at a time, LLM holding intermediate state in context. That's expensive (tokens), lossy (context trimming), and serial (no parallelism).

The deep question for an agent shell: what is the pipe?

Some candidate answers:

Option A: Explicit dataflow notation in the tool call

The agent expresses a pipeline in one call:

{
  "pipeline": [
    { "tool": "list_issues", "args": { "label": "ac-workflow", "state": "open" } },
    { "tool": "filter", "args": { "field": "labels", "exclude": ["blocked", "agent:wip"] } },
    { "tool": "for_each", "subtool": "get_issue", "bind": "number" }
  ]
}

The broker executes the graph, parallelizing where possible, and returns a typed result. The LLM never sees intermediate state.

Pro: Massively more efficient. Con: Forces the LLM to plan ahead; breaks the incremental reasoning loop that LLMs are actually good at.
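The broker-side interpreter for such a pipeline could look like the following sketch, assuming a plain function registry (`tools`) and the step shapes from the JSON above — a hypothetical design, not an existing API:

```python
from concurrent.futures import ThreadPoolExecutor

def run_pipeline(pipeline, tools):
    """Execute a declared pipeline inside the broker; intermediate results
    never enter the LLM's context. `tools` maps names to plain functions."""
    data = None
    for step in pipeline:
        if step["tool"] == "for_each":
            # Fan out: apply the subtool to each item in parallel, join in order
            sub, key = tools[step["subtool"]], step["bind"]
            with ThreadPoolExecutor() as pool:
                data = list(pool.map(lambda item: sub(item[key]), data))
        elif data is None:
            data = tools[step["tool"]](**step.get("args", {}))        # source step
        else:
            data = tools[step["tool"]](data, **step.get("args", {}))  # transform step
    return data
```

Only the final joined result crosses back to the agent; everything between steps stays broker-side.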

Option B: Reactive tool results with embedded continuations

Each tool result includes a suggested_next field — a typed hint about what the broker thinks the agent should call next, given the result shape.

{
  "result": [...issues...],
  "suggested_next": [
    { "tool": "get_issue", "description": "Read each issue in detail", "bind": "number" }
  ]
}

The agent can follow or ignore. The broker is guiding without controlling.

Pro: Preserves agent autonomy. Con: Doesn't help with parallelism.

Option C: The broker executes fan-out automatically

When the broker sees a tool that returns a list, and the next tool call operates on a single item of that list, it fans out automatically — spawning parallel executions and joining results.

This is basically MapReduce, but baked into the shell layer.

Pro: Transparent to the agent; works with existing tool designs. Con: The broker needs to infer intent from the sequence of calls, which is hard to do reliably.
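One rough heuristic for that inference (all names illustrative — this is a sketch of the idea, not a reliable implementation):

```python
from concurrent.futures import ThreadPoolExecutor

def call_with_auto_fanout(tool_fn, arg_name, prev_result, bind_field):
    """Option C sketch: if the previous tool returned a list of records that
    all carry `bind_field`, fan the next call out over the list in parallel.
    Otherwise fall through to a single call."""
    if (isinstance(prev_result, list)
            and all(isinstance(i, dict) and bind_field in i for i in prev_result)):
        with ThreadPoolExecutor() as pool:  # spawn parallel executions, join in order
            return list(pool.map(lambda i: tool_fn(**{arg_name: i[bind_field]}),
                                 prev_result))
    return tool_fn(**{arg_name: prev_result})
```

The fragility is visible even here: the heuristic guesses that a matching field name means "map over the list", which is exactly the kind of intent inference that fails in edge cases.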

The honest answer

None of these is obviously right. Pipes were elegant because the data model was simple — byte streams. Agent tool results are rich structured types. Composing them requires either a type system (hard) or a smart intermediary (the broker, doing inference). This is a genuinely unsolved design problem.


Prior Art Worth Studying

The idea isn't fully novel. Several systems have approached this from different angles:

Capsicum (FreeBSD)

Capability-based security at the kernel level. Processes can only access file descriptors they were explicitly handed at fork time — they can't open new ones without going through a broker. The enforcement is at the kernel boundary, not a user-space intermediary.

Lesson: Capability control at the lowest possible layer prevents capability escalation. The broker should enforce at translation time, not at call time.

Landlock (Linux 5.13+)

Unprivileged sandboxing via a declarative ruleset. A process defines its own sandbox before executing, and the kernel enforces it. No root required.

Lesson: Self-declared capability limits are more auditable than externally imposed ones. An agent that declares its own scope ("I only need read access to /etc/config.toml") is easier to reason about than one that's given a capability set by an external administrator.

WASI (WebAssembly System Interface)

The closest existing system to what's being described here. Wasm modules can't touch the OS directly — they call WASI functions (fd_read, path_open, sock_recv), and the runtime decides whether to fulfill them. The module is fully sandboxed; capabilities are granted explicitly at instantiation.

Lesson: This is exactly the broker pattern, implemented for a different kind of "agent" (Wasm modules). Swap "WASI functions" for "MCP tools" and "Wasm module" for "LLM agent" and you have the architecture. The MCP version has richer, higher-level semantics and natural-language-legible schemas.

Plan 9 from Bell Labs

"Everything is a file server." Every resource — processes, network connections, graphics, even the CPU — is exposed as a filesystem that can be mounted and accessed via standard file operations. Composition happens through namespace manipulation.

Lesson: A uniform interface across all resource types makes composition natural. MCP's tool model is the agent equivalent — a uniform call convention across all capabilities.

Nix / NixOS

Declarative, reproducible system configuration. Every capability a process has is derived from its closure in the Nix store — there's no mutable global state.

Lesson: Reproducibility matters. If an agent's capability set is declared and sealed at spawn time, you can replay, audit, and diff runs. This is directly applicable to agent orchestration systems like AgentCeption.


What This Means for AgentCeption Specifically

AgentCeption is already most of the way to a working prototype of this pattern:

Broker concept        AgentCeption implementation
───────────────────   ─────────────────────────────────────────────────
Session identity      run_id (scopes all state)
Capability surface    MCP tools (user-agentception server)
Audit log             build_report_step, build_report_decision, etc.
Authorization         Implicit (tools don't check yet — future work)
Fan-out / MapReduce   Engineering coordinator spawning parallel engineers
Shell session         The .agent-task file (working directory + env vars)
History               SSE event log per run

The .agent-task file is particularly interesting in this frame. It's the agent's equivalent of a shell's initialized environment — a flat key-value store that defines the agent's scope, identity, and capabilities before it starts executing. It's a session initialization file. A .bashrc for the agent.
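In that frame, a hypothetical .agent-task file might look like this — the actual keys AgentCeption uses may differ; these are purely illustrative:

```
agent_run_id=run_7f3a2c
role=engineer
scope=issue:42
capability.fs.read=/srv/repo
capability.net.auth_scope=github
```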

The gaps:

  1. No capability enforcement — the MCP tools don't currently check whether the calling run_id is allowed to call them. Any agent can call any tool. This is fine for a single-user system; it becomes a problem at scale or with untrusted agents.
  2. No pipe primitive — fan-out is done by spawning child Task agents, which is expensive (one LLM context per fan-out). A native broker-level fan-out would be faster.
  3. No session state management — the .agent-task file is static; there's no mechanism for the broker to update ambient state as the session progresses.

The MVP of an Agent Shell

If you were going to build this as a standalone system — call it agsh for the sake of naming — the minimal viable version looks like:

agsh/
├── broker/
│   ├── server.py          # MCP server (the shell process)
│   ├── session.py         # Session state (cwd, env, history)
│   ├── capabilities.py    # Capability declarations + enforcement
│   └── audit.py           # Structured audit log
├── tools/
│   ├── fs.py              # Filesystem tools (read_file, write_file, list_dir, …)
│   ├── process.py         # Process tools (exec_command, kill_process, …)
│   ├── network.py         # Network tools (http_get, http_post, open_socket, …)
│   ├── secrets.py         # Secret access (get_secret — never returns raw value, only injects)
│   └── meta.py            # Shell meta-tools (get_capabilities, get_session, set_cwd, …)
├── runtime/
│   ├── agent.py           # Agent loop (call tool → get result → decide next)
│   └── sandbox.py         # OS-level sandbox setup (Landlock/Capsicum/seccomp)
└── agsh.py                # Entry point: agsh run --role engineer --scope issue:42

The key design principle: the agent never gets credentials, only capabilities. It never sees an API key — it calls http_get(url=..., auth_scope="github") and the broker injects the credential into the outbound request. The agent can't exfiltrate secrets because it never has them.
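The injection side of that principle can be sketched as follows. The `auth_scope` parameter and the header shape are assumptions, not a real broker API:

```python
import urllib.request

class SecretBroker:
    """The agent names a scope; the broker holds the secret and injects it
    into the outbound request. The raw token never crosses the tool boundary."""

    def __init__(self, secrets):
        self._secrets = secrets  # scope -> token, visible only inside the broker

    def build_request(self, url, auth_scope=None):
        req = urllib.request.Request(url)
        if auth_scope is not None:
            token = self._secrets[auth_scope]   # injected here, never returned
            req.add_header("Authorization", f"Bearer {token}")
        return req

    def http_get(self, url, auth_scope=None):
        # The tool result is status + body; the credential stays broker-side.
        with urllib.request.urlopen(self.build_request(url, auth_scope)) as resp:
            return resp.status, resp.read()
```

Because the tool result contains only the response, there is no code path by which the agent can read the token back out.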


The Deeper Philosophical Point

The shell wasn't just a technical artifact. It was a legibility layer — it made the OS legible to humans. The human didn't need to know how fork and execve worked to run a program. The shell provided a vocabulary that matched how humans thought about work.

MCP is the beginning of a legibility layer for agents. The vocabulary of tools, schemas, and typed results is a language designed for the way LLMs process information — not the way CPUs do. The broker's job is to make the OS legible in that vocabulary.

The really interesting long-run question: does the agent's legibility layer converge with the human's? Or do they diverge — optimizing for different cognitive architectures? A shell that's perfect for an LLM might be completely illegible to a human, and vice versa.

Or maybe the right answer is a shell that's legible to both — where a human can inspect what the agent did, understand the audit log, and intervene when needed. That's actually what AgentCeption's build board is, in embryonic form: a human-legible view into what the agent shell session produced.


Open Questions (for future exploration)

  1. What is the type system for tool results? Pipes work because byte streams are universal. Tool results are rich structured types — they need a type system to compose. What does that type system look like?

  2. How does capability inheritance work across spawned children? When an agent spawns a child (via Task), does the child inherit the parent's capabilities? A strict subset? None? This is the fork-and-exec capability model question, applied to agents.

  3. What does sudo look like for agents? Capability escalation — when an agent needs a capability it wasn't granted — requires a mechanism for requesting it and a human (or senior agent) to approve. What's the UX for that?

  4. Can the broker be the orchestrator? In AgentCeption, orchestration is done by manager agents (CTO → coordinator → engineer). Could the broker itself handle fan-out, removing the need for manager-tier agents entirely? The broker would read a capability declaration and execute the tree directly.

  5. What's the agent equivalent of cron? Scheduled, triggered, and reactive execution — the broker as an event loop that fires agents in response to OS events, time, or external signals.

  6. How do you diff two agent shell sessions? If reproducibility is a goal (à la Nix), you need to be able to compare two runs: same capabilities, same tool calls, same results? Or did something diverge? This is the foundation of agent testing.


TL;DR (for future-me skimming this)

  • MCP is to agents what the shell is to humans — a legibility layer between the agent and the OS.
  • The Capability Broker sits between POSIX and the LLM runtime, translating MCP tool calls into OS operations with authorization, auditing, and semantic enrichment at the boundary.
  • The broker should own session state (scope, capabilities, history) so the agent doesn't have to re-state it on every call.
  • Pipes are the unsolved hard part — composing tool results without burning LLM context on intermediate state.
  • AgentCeption is already a working prototype of this pattern. The .agent-task file is the .bashrc. The run_id is the session. The MCP tools are the shell builtins. The build board is the terminal window.
  • The design principle that matters most: agents get capabilities, not credentials. The broker injects. The agent never holds secrets.
  • Prior art: Capsicum, Landlock, WASI, Plan 9, Nix. All of them solved a version of this problem for a different kind of agent. The LLM version inherits their lessons.

Captured: March 2026. Return to this when AgentCeption ships and the team's projects are unblocked. The seed is here.
