sqzr — Cedar-Governed LLM Agent Orchestrator Architecture Plan

sqzr — Cedar-Governed LLM Agent Orchestrator

Executive Summary

sqzr (pronounced "squeezer") is a daemon-based LLM agent orchestration server written in Go. It manages multiple concurrent agent sessions across model providers, enforces fine-grained access control via Cedar policy, and sandboxes all tool execution using macOS seatbelt profiles compiled from those same Cedar policies. Sessions are long-lived, filesystem-backed, and interconnected — they can read each other's context for cross-pollination. A gRPC streaming API enables both local TUI clients and remote session control.


1. Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                        sqzrd (daemon)                       │
│                                                             │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐                  │
│  │ Session A │  │ Session B │  │ Session C │  ...            │
│  │ (worker1) │  │ (w1, w2) │  │ (worker1) │                 │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘                  │
│       │              │              │                        │
│  ┌────┴──────────────┴──────────────┴────┐                  │
│  │          Session Manager              │                  │
│  │  - context store (filesystem)         │                  │
│  │  - kanban task board                  │                  │
│  │  - cross-session read access          │                  │
│  └───────────────┬───────────────────────┘                  │
│                  │                                           │
│  ┌───────────────┴───────────────────────┐                  │
│  │          Policy Engine (Cedar)         │                  │
│  │  - tool authorization                 │                  │
│  │  - sandbox profile compilation        │                  │
│  │  - HTTP allowlisting                  │                  │
│  └───────────────┬───────────────────────┘                  │
│                  │                                           │
│  ┌───────────────┴───────────────────────┐                  │
│  │         Provider Abstraction           │                  │
│  │  - Anthropic, OpenAI, Gemini, etc.    │                  │
│  │  - streaming, tool calling            │                  │
│  │  - token counting & budget            │                  │
│  └───────────────────────────────────────┘                  │
│                                                             │
│  ┌───────────────────────────────────────┐                  │
│  │         gRPC Streaming API            │                  │
│  └───────────────┬───────────────────────┘                  │
└──────────────────┼──────────────────────────────────────────┘
                   │
         ┌─────────┴─────────┐
         │   sqzr (TUI CLI)  │
         └───────────────────┘

Key Components

| Component | Responsibility |
|-----------|----------------|
| sqzrd | Parent daemon. Owns all sessions, workers, and the policy engine. |
| Session | A logical conversation with one or more workers. Owns a filesystem-backed context store and a kanban task board. |
| Worker | A single LLM API connection executing within a session. Multiple workers can be attached to one session. |
| Policy Engine | Evaluates Cedar policies for every tool call, file access, and HTTP request. Compiles seatbelt profiles. |
| Sandbox | macOS seatbelt profile applied to tool execution subprocesses. Generated from Cedar policy. |
| Provider | Abstraction over LLM APIs (Anthropic, OpenAI, Google, etc.). Handles streaming, tool schemas, token counting. |
| gRPC API | Bidirectional streaming interface for TUI clients and remote control. |
| sqzr | TUI client binary. Connects to sqzrd via gRPC. |

2. Context Management Strategy

2.1 The 250K Hard Ceiling

LLM quality degrades as context grows. sqzr enforces a hard ceiling of 250K tokens per active API session. This is not a soft compaction target — when context approaches the limit, the API session is killed and restarted with a fresh context window populated from the filesystem-backed store.

2.2 Filesystem-Backed Context Store

Rather than trying to summarize or compact context in-flight (which loses information), sqzr persists all context to the filesystem in structured form. This serves as the durable source of truth.

Session directory structure:

~/.sqzr/sessions/<session-id>/
├── meta.json                  # Session metadata, model, created_at, etc.
├── context/
│   ├── turns/                 # Complete turn history (JSONL, append-only)
│   │   ├── 0001.jsonl
│   │   ├── 0002.jsonl
│   │   └── ...
│   ├── decisions/             # Individual tagged decision files
│   │   ├── dec-001-cedar-over-opa.md
│   │   └── ...
│   ├── requirements/          # Individual tagged requirement files (with validators)
│   │   ├── req-001-seatbelt-deny-network.md
│   │   └── ...
│   ├── user-context.md        # Accumulated user-provided context
│   └── summary.md             # Auto-generated session summary (updated periodically)
├── kanban/
│   ├── TODO/
│   ├── IN-PROGRESS/
│   ├── BLOCKED/
│   ├── REVIEW/
│   └── DONE/
├── artifacts/                 # Files produced by the session
└── shared/                    # Symlink or mount point for cross-session reads

All artifacts in decisions/, requirements/, and kanban/ use tagged frontmatter (see Section 4 for the tagging system).

2.3 Context Reload Strategy

When a session must restart (approaching 250K, explicit reset, or error recovery), the new API session is seeded with:

  1. System prompt (from Cedar-governed template)
  2. Session summary (summary.md — auto-maintained)
  3. Tag-relevant decisions & requirements (matching the active task's tags, plus anything tagged always — see Section 4.7)
  4. Active kanban tasks (IN-PROGRESS and BLOCKED items)
  5. Recent turns (as many as fit within a configurable budget, e.g. 50K tokens)

This gives the model enough continuity to resume work without carrying the full history. The full history remains on disk for retrieval if needed.
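
The "recent turns" step can be sketched as a newest-first walk that stops at the budget. This is a minimal illustration, not the planned implementation: the Turn type, its fields, and SelectRecentTurns are hypothetical names, and token counts are assumed to be already known per turn.

```go
package main

// Turn is a hypothetical record for one conversation turn. Tokens is the
// turn's token count as reported or estimated by the provider.
type Turn struct {
	ID     int
	Tokens int
}

// SelectRecentTurns walks the history from newest to oldest and keeps turns
// until adding the next one would exceed the budget. The result is a suffix
// of the history, so it stays in chronological order for replay.
func SelectRecentTurns(history []Turn, budget int) []Turn {
	used := 0
	start := len(history)
	for i := len(history) - 1; i >= 0; i-- {
		if used+history[i].Tokens > budget {
			break
		}
		used += history[i].Tokens
		start = i
	}
	return history[start:]
}
```

Stopping at the first turn that does not fit keeps the selection contiguous, so the model never sees a history with a hole in the middle.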

2.4 Decision Context

The agent is instructed to write a tagged decision file under context/decisions/ (see Section 4.3) whenever it:

  • Chooses between alternatives
  • Commits to an approach
  • Encounters a constraint or blocker
  • Receives user direction that changes course

Decisions that match the active task's tags, plus any tagged always, are included in context reloads, giving the model persistent awareness of why things are the way they are.

2.5 Token Budget Tracking

Each worker tracks token usage from the provider's usage response. When input_tokens + output_tokens for the active session exceeds a configurable threshold (default: 200K tokens), sqzr:

  1. Triggers a summary generation (using a separate, short-context call)
  2. Persists the summary to summary.md
  3. Marks the current API session for restart
  4. On the next turn, starts a fresh API session with the reload strategy above

The 200K threshold leaves headroom — the model sees ~200K of accumulated context, summary generation uses a fresh small context, and the restart resumes at whatever the reload budget is.
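
A minimal sketch of the threshold check, under stated assumptions: the Usage and Budget types and their field names are hypothetical, and the latest call's input_tokens is assumed to already cover the whole conversation so far, so the context size after a call is that call's input plus output.

```go
package main

// Usage mirrors the token counts a provider reports for one API call.
// The field names are assumptions, not taken from a specific SDK.
type Usage struct {
	InputTokens  int
	OutputTokens int
}

// Budget holds the restart threshold and hard ceiling for one API session.
type Budget struct {
	Threshold int // restart trigger (the configurable threshold)
	Ceiling   int // hard limit from Section 2.1
	last      Usage
}

// Record stores the usage from the most recent provider response.
func (b *Budget) Record(u Usage) { b.last = u }

// Total is the context size after the latest call.
func (b *Budget) Total() int { return b.last.InputTokens + b.last.OutputTokens }

// NeedsRestart reports whether the session has exceeded the threshold and
// should be summarized and marked for restart.
func (b *Budget) NeedsRestart() bool { return b.Total() > b.Threshold }

// Headroom is the number of tokens left before the hard ceiling, never negative.
func (b *Budget) Headroom() int {
	if left := b.Ceiling - b.Total(); left > 0 {
		return left
	}
	return 0
}
```

A worker would call Record after every response and check NeedsRestart before starting the next turn.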


3. Planning & Kanban Task System

3.1 Planning Workflow

When the agent receives a complex task, it produces:

  1. Plan document (kanban/PLAN.md) — high-level approach, alternatives considered, architecture decisions, open questions
  2. Task files — individual files in kanban/TODO/, one per discrete unit of work

3.2 Task File Format

---
id: task-001
title: Implement Cedar policy parser
priority: high
depends_on: []
blocked_by: ""
assigned_worker: worker-1
prompt: |
  Parse Cedar policy files from ~/.sqzr/policies/ and compile them into
  a PolicySet. Use the cedar-go library. Handle syntax errors gracefully
  and report them to the session log.
---

## Acceptance Criteria
- Cedar files are parsed on daemon startup and on SIGHUP
- Parse errors are reported but don't crash the daemon
- PolicySet is accessible to the authorization engine

3.3 Kanban State Machine

TODO ──→ IN-PROGRESS ──→ REVIEW ──→ DONE
  │           │
  │           └──→ BLOCKED ──→ IN-PROGRESS
  │                   │
  └───────────────────┘ (re-prioritized)

  • TODO: Ready to be picked up. Contains the prompt needed to complete it.
  • IN-PROGRESS: A worker is actively executing. The task file is moved (or symlinked) and annotated with the worker ID.
  • BLOCKED: Cannot proceed. The blocked_by field explains why. The daemon alerts the user via the gRPC stream when a task enters BLOCKED.
  • REVIEW: Work is done but needs user verification.
  • DONE: Completed and verified.
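
The state machine above can be expressed as a transition table. This is a sketch with hypothetical names (State, CanMove). It reads the "re-prioritized" edge as BLOCKED going back to TODO, which is an interpretation of the diagram rather than something the text states.

```go
package main

// State is a kanban column; the values match the directory names.
type State string

const (
	Todo       State = "TODO"
	InProgress State = "IN-PROGRESS"
	Blocked    State = "BLOCKED"
	Review     State = "REVIEW"
	Done       State = "DONE"
)

// transitions lists the moves drawn in the diagram. DONE has no outgoing
// edges. BLOCKED -> TODO is the assumed reading of "re-prioritized".
var transitions = map[State][]State{
	Todo:       {InProgress},
	InProgress: {Review, Blocked},
	Blocked:    {InProgress, Todo},
	Review:     {Done},
}

// CanMove reports whether a task may move directly from one column to another.
func CanMove(from, to State) bool {
	for _, next := range transitions[from] {
		if next == to {
			return true
		}
	}
	return false
}
```

Checking CanMove before renaming a task file between column directories would keep the on-disk board consistent with the diagram.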

3.4 Blocked Task Alerts

When a task moves to BLOCKED, sqzrd sends a TaskBlocked event on all connected gRPC streams for that session. The TUI displays this prominently. Common block reasons:

  • User feedback required (ambiguous requirement, design choice)
  • Policy denial (Cedar forbids a needed action — user must update policy)
  • External dependency (waiting on API access, credentials, etc.)
  • Inter-task dependency (another task must complete first)

4. Tagged Context Artifacts & Requirements

4.1 The Problem

Without tags, context reload (Section 2.3) would have to load everything: all decisions and all active tasks. But most of the time a worker only needs context relevant to what it's working on. Requirements (things the system must do) are also a distinct artifact type that a decisions-and-tasks model lacks: they're normative and testable, unlike decisions (which are historical rationale) or tasks (which are units of work).

We need:

  1. A way to tag all context artifacts so relevant context can be loaded selectively
  2. Requirements as a first-class artifact type with embedded validation criteria
  3. A mechanism to run validators that check whether requirements are satisfied

4.2 Tags

Tags are simple string labels applied to any context artifact (requirement, decision, task). They serve two purposes:

  • Selective context loading: When a worker picks up a task tagged sandbox, the context loader pulls in decisions and requirements also tagged sandbox, rather than loading everything.
  • Cross-cutting concerns: A requirement tagged both security and sandbox is surfaced whenever either tag is relevant.

Tags are freeform strings but we establish conventions:

| Convention | Examples | Purpose |
|------------|----------|---------|
| Subsystem | policy, sandbox, session, provider, kanban, proxy, grpc, tui | Major architectural components |
| Cross-cutting | security, performance, ux, reliability | Concerns that span subsystems |
| Feature | http-tool, cross-session, multi-worker | Specific feature areas |

There is no tag registry — tags are created by use. The system tracks which tags exist (by scanning artifact frontmatter) for autocomplete and reporting.

4.3 Tagged Artifact Format

All context artifacts (decisions, requirements, tasks) share a common frontmatter envelope:

---
kind: requirement | decision | task
id: req-001
title: Seatbelt profile must deny all network except localhost
tags: [sandbox, security]
created: 2026-03-11T10:00:00Z
# ... kind-specific fields follow
---

Decisions move from a monolithic decisions.md to individual files:

~/.sqzr/sessions/<session-id>/context/
├── decisions/
│   ├── dec-001-cedar-over-opa.md
│   ├── dec-002-restart-over-compact.md
│   └── ...
├── requirements/
│   ├── req-001-seatbelt-deny-network.md
│   ├── req-002-token-ceiling.md
│   └── ...
└── ...

This replaces the flat decisions.md file. The trade-off is more files, but each is individually tagged, addressable, and loadable.

4.4 Requirements

A requirement is a tagged artifact with additional fields for validation:

---
kind: requirement
id: req-001
title: Seatbelt profile must deny all network except localhost
tags: [sandbox, security, seatbelt]
created: 2026-03-11T10:00:00Z
status: active          # draft | active | satisfied | waived | obsolete
satisfies: []           # links to parent requirements (traceability)
---

## Description

All generated seatbelt profiles MUST include `(deny default)` and MUST NOT
contain any `(allow network-outbound ...)` rule that permits non-localhost
destinations. The only permitted network rule is:
`(allow network-outbound (remote ip "localhost:*"))`.

## Validation

```yaml
validators:
  - type: grep
    description: "Deny default is present in all generated profiles"
    command: "grep -r '(deny default)' {{artifacts_dir}}/sandbox/"
    expect: exit_code_0

  - type: grep_absent
    description: "No non-localhost network allow rules"
    command: "grep -r 'allow network-outbound' {{artifacts_dir}}/sandbox/ | grep -v localhost"
    expect: exit_code_nonzero

  - type: test
    description: "Sandbox integration test passes"
    command: "go test ./internal/sandbox/ -run TestProfileDenyNetwork -v"
    expect: exit_code_0
```

## Rationale

Defense in depth. Even if Cedar evaluation has a bug, the kernel-level seatbelt profile must independently enforce network isolation.


4.5 Validator Types

Validators are intentionally simple — they're shell commands with expected
outcomes. This keeps them language-agnostic and composable.

| Type | What it checks | `expect` |
|------|---------------|----------|
| `grep` | Pattern is present in output | `exit_code_0` |
| `grep_absent` | Pattern is NOT present | `exit_code_nonzero` |
| `test` | Test suite passes | `exit_code_0` |
| `command` | Arbitrary command | `exit_code_0`, `output_contains:`, `output_matches:` |
| `cedar` | Cedar policy evaluates as expected | `allow` or `deny` (with a test request) |
| `lint` | Static analysis check | `exit_code_0` |
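
One way to evaluate the `expect` column is a small pure function over the command's exit code and captured output. This is a sketch, not the planned runner: CheckExpect is a hypothetical name, the text after `output_contains:` and `output_matches:` is assumed to be the literal substring or regular expression, and the `cedar` type's `allow`/`deny` expectations are left out because they need a policy evaluation rather than a shell result.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// CheckExpect reports whether a finished command satisfies an expectation
// string from the validator table. Unknown expectations are an error so that
// a typo in a requirement file cannot silently pass.
func CheckExpect(expect string, exitCode int, output string) (bool, error) {
	switch {
	case expect == "exit_code_0":
		return exitCode == 0, nil
	case expect == "exit_code_nonzero":
		return exitCode != 0, nil
	case strings.HasPrefix(expect, "output_contains:"):
		want := strings.TrimPrefix(expect, "output_contains:")
		return strings.Contains(output, want), nil
	case strings.HasPrefix(expect, "output_matches:"):
		re, err := regexp.Compile(strings.TrimPrefix(expect, "output_matches:"))
		if err != nil {
			return false, fmt.Errorf("bad pattern in %q: %w", expect, err)
		}
		return re.MatchString(output), nil
	}
	return false, fmt.Errorf("unknown expectation %q", expect)
}
```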

Template variables available in validator commands:

| Variable | Expands to |
|----------|-----------|
| `{{workspace}}` | Project workspace root |
| `{{artifacts_dir}}` | Session artifacts directory |
| `{{session_dir}}` | Session root directory |
| `{{requirement_id}}` | The requirement's own ID |
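
Expanding these variables can be a plain string substitution. The sketch below assumes a hypothetical ExpandCommand helper. It does no shell quoting, so values containing spaces or shell metacharacters would need escaping before the command reaches a shell.

```go
package main

import "strings"

// ExpandCommand replaces each {{name}} placeholder in cmd with the matching
// value from vars. Placeholders with no entry in vars are left untouched so
// that a missing variable is visible in the resulting command.
func ExpandCommand(cmd string, vars map[string]string) string {
	pairs := make([]string, 0, len(vars)*2)
	for name, value := range vars {
		pairs = append(pairs, "{{"+name+"}}", value)
	}
	return strings.NewReplacer(pairs...).Replace(cmd)
}
```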

4.6 Validation Runner

The validation runner is a daemon subsystem that can be triggered:

  • On demand: User requests validation via TUI or gRPC (ValidateRequirements RPC)
  • On task completion: When a task moves to REVIEW, validators for all requirements sharing any of the task's tags are run automatically
  • On session restart: Optionally, as a "sanity check" before resuming work
  • Periodically: Configurable interval for continuous validation

ValidateRequirements(tags: ["sandbox"])
  │
  ├─→ Find all requirements with matching tags
  │     req-001 (tags: [sandbox, security])
  │     req-003 (tags: [sandbox, seatbelt])
  │
  ├─→ For each requirement, run validators
  │     req-001: validator 1 ✓, validator 2 ✓, validator 3 ✓ → PASS
  │     req-003: validator 1 ✓, validator 2 ✗ → FAIL
  │
  └─→ Report results
        - Update requirement status
        - If FAIL: create BLOCKED task or alert user
        - Log to session audit trail

Validators run sandboxed (same seatbelt profile as tool execution) to prevent validation commands from having more access than the agent itself.

4.7 Tag-Based Context Loading

The context reload strategy (Section 2.3) is updated to be tag-aware:

Before (load everything):

  1. System prompt
  2. Session summary
  3. All decisions
  4. All active tasks
  5. Recent turns

After (tag-scoped loading):

  1. System prompt
  2. Session summary
  3. Decisions matching the active task's tags (plus any tagged always)
  4. Requirements matching the active task's tags (descriptions only, not validators)
  5. Active kanban tasks (all, since they're small)
  6. Recent turns

When a worker picks up task task-014 tagged [sandbox, cedar], the context loader:

  1. Scans context/decisions/ for files tagged sandbox OR cedar OR always
  2. Scans context/requirements/ for files tagged sandbox OR cedar OR always
  3. Includes these as synthetic context messages before the recent turns

This keeps context focused and relevant rather than dumping everything.
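
The OR-matching described above can be sketched as a set lookup. The Artifact type and SelectByTags are hypothetical names, and the artifacts are assumed to have been parsed from frontmatter already.

```go
package main

// Artifact is a hypothetical in-memory view of one decision or requirement
// file after its frontmatter has been parsed.
type Artifact struct {
	ID   string
	Tags []string
}

// SelectByTags returns the artifacts that share at least one tag with the
// active task, plus anything tagged "always". Input order is preserved.
func SelectByTags(artifacts []Artifact, taskTags []string) []Artifact {
	want := map[string]bool{"always": true}
	for _, t := range taskTags {
		want[t] = true
	}
	var out []Artifact
	for _, a := range artifacts {
		for _, t := range a.Tags {
			if want[t] {
				out = append(out, a)
				break // one matching tag is enough
			}
		}
	}
	return out
}
```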

4.8 The always Tag

Some decisions and requirements are always relevant regardless of what the worker is doing. Tag these with always:

tags: [security, always]

Examples: "we chose Go", "the 250K token ceiling", "Cedar governs all tool access". These are loaded into every context reload.

4.9 Tag Propagation

Tasks inherit tags from the requirements they address. When planning produces tasks from requirements, the task's tags field is seeded from the requirement:

# requirement
kind: requirement
id: req-001
tags: [sandbox, security]

# task generated from req-001
kind: task
id: task-007
tags: [sandbox, security]      # inherited from req-001
validates: [req-001]           # explicit link back

This ensures that when a worker picks up task-007, it automatically gets the sandbox and security context.
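
If a task addresses more than one requirement, one plausible rule is to seed its tags with the union of theirs. The text only describes the single-requirement case, so the union rule and the MergeTags helper below are assumptions.

```go
package main

// MergeTags returns the union of the given tag lists, keeping first-seen
// order and dropping duplicates.
func MergeTags(lists ...[]string) []string {
	seen := make(map[string]bool)
	var out []string
	for _, tags := range lists {
		for _, t := range tags {
			if !seen[t] {
				seen[t] = true
				out = append(out, t)
			}
		}
	}
	return out
}
```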

4.10 Requirement Lifecycle

DRAFT ──→ ACTIVE ──→ SATISFIED
  │          │           │
  │          └──→ WAIVED │  (user decides it's not needed)
  │                      │
  └──────────────────────┴──→ OBSOLETE  (superseded or removed)

  • DRAFT: Written but not yet agreed upon. Not validated.
  • ACTIVE: Agreed. Validators are run. Failures create alerts.
  • SATISFIED: All validators pass consistently. Still checked.
  • WAIVED: Explicitly skipped (with rationale in the file).
  • OBSOLETE: No longer relevant. Kept for history.

5. Cedar Policy Engine

5.1 Policy Model

Cedar naturally maps to agent authorization:

  • Principal: Worker::"worker-1" or Session::"session-abc"
  • Action: Action::"tool/bash", Action::"tool/write", Action::"file/read", Action::"http/GET"
  • Resource: File::"/path/to/file", URL::"https://api.example.com/v1", Directory::"/workspace"

5.2 Entity Hierarchy

Session::"session-abc"
  ├── Worker::"worker-1"
  └── Worker::"worker-2"

Directory::"/workspace"
  ├── File::"/workspace/main.go"
  ├── File::"/workspace/go.mod"
  └── Directory::"/workspace/internal"
       └── File::"/workspace/internal/foo.go"

ToolSet::"core"
  ├── Action::"tool/read"
  ├── Action::"tool/write"
  ├── Action::"tool/edit"
  ├── Action::"tool/bash"
  ├── Action::"tool/grep"
  ├── Action::"tool/find"
  └── Action::"tool/ls"

ToolSet::"network"
  ├── Action::"tool/http"
  └── Action::"tool/fetch"

5.3 Example Policies

// Default: allow core tools within the workspace directory
permit (
    principal is Worker,
    action in ToolSet::"core",
    resource in Directory::"/workspace"
);

// Deny access to secrets
forbid (
    principal,
    action,
    resource in Directory::"/Users/you/.ssh"
);

forbid (
    principal,
    action,
    resource in Directory::"/Users/you/.aws"
);

// Allow HTTP tool only through the proxy (leash)
permit (
    principal is Worker,
    action == Action::"tool/http",
    resource
)
when { context.via_proxy == true };

// Allow network tools for specific sessions
permit (
    principal in Session::"research-session",
    action in ToolSet::"network",
    resource
)
when { resource.host == "api.github.com" };

// Block writes to go.sum (only go mod tidy should touch it)
forbid (
    principal,
    action == Action::"tool/write",
    resource == File::"/workspace/go.sum"
);

5.4 Cedar → Seatbelt Compilation

The policy engine compiles Cedar policies into macOS seatbelt profiles. This is a one-way transformation at session/worker startup:

Cedar policy → Intermediate representation → SBPL profile

The compiler walks Cedar policies and extracts:

  1. File paths from permit/forbid rules where resource is File:: or Directory::

    • permit ... resource in Directory::"/workspace" → (allow file-read* (subpath "/workspace"))
    • forbid ... resource in Directory::"/Users/you/.ssh" → (deny file-read* (subpath "/Users/you/.ssh"))

  2. Network rules from policies involving Action::"tool/http" or ToolSet::"network"

    • Default: (deny network*) then (allow network-outbound (remote ip "localhost:*"))
    • If policies allow specific hosts, configure the localhost proxy accordingly

  3. Process execution from Action::"tool/bash" policies

    • Allowed executables list → (allow process-exec (literal "/usr/bin/..."))

The seatbelt profile is the enforcement floor. Cedar is also checked at the application layer before tool execution. The seatbelt profile prevents escapes — even if there's a bug in the Go authorization code, the kernel won't allow forbidden operations.

Generated seatbelt profile template:

(version 1)
(deny default)

;; System libraries (always needed)
(allow file-read* (subpath "/usr/lib") (subpath "/System") (subpath "/Library"))
(allow file-read* (subpath "/dev/null") (subpath "/dev/urandom"))

;; Workspace access (from Cedar)
(allow file-read* (subpath (param "WORKSPACE")))
(allow file-write* (subpath (param "WORKSPACE")))

;; Temp directory
(allow file-write* (subpath (param "TMPDIR")))

;; Denied paths (from Cedar forbid rules)
(deny file-read* (subpath (param "SECRETS_DIR_0")))

;; Network: localhost only (proxy handles allowlisting)
(allow network-outbound (remote ip "localhost:*"))

;; Process execution (from Cedar)
(allow process-exec (literal "/bin/sh"))
(allow process-exec (literal "/usr/bin/env"))
(allow process-fork)
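
The last stage of the pipeline, from intermediate representation to SBPL text, might look like the sketch below. ProfileIR and Rules are hypothetical names, and the extraction of paths from Cedar policies is not shown. Paths are quoted with Go's %q, which is assumed to be acceptable SBPL string syntax for plain ASCII paths.

```go
package main

import "fmt"

// ProfileIR is a hypothetical intermediate representation produced by
// walking the Cedar policy set.
type ProfileIR struct {
	ReadPaths  []string // from permit rules on File:: and Directory:: resources
	WritePaths []string // from permit rules that allow writes
	DenyPaths  []string // from forbid rules
	ExecPaths  []string // executables allowed for tool/bash
}

// Rules renders the IR as SBPL rule lines. Forbid-derived deny rules are
// emitted after the allows, mirroring the order of the template above.
func Rules(ir ProfileIR) []string {
	rules := []string{"(version 1)", "(deny default)"}
	for _, p := range ir.ReadPaths {
		rules = append(rules, fmt.Sprintf("(allow file-read* (subpath %q))", p))
	}
	for _, p := range ir.WritePaths {
		rules = append(rules, fmt.Sprintf("(allow file-write* (subpath %q))", p))
	}
	for _, p := range ir.DenyPaths {
		rules = append(rules, fmt.Sprintf("(deny file-read* (subpath %q))", p))
	}
	// Network stays localhost-only; the proxy handles host allowlisting.
	rules = append(rules, `(allow network-outbound (remote ip "localhost:*"))`)
	for _, p := range ir.ExecPaths {
		rules = append(rules, fmt.Sprintf("(allow process-exec (literal %q))", p))
	}
	return rules
}
```

Joining the returned lines with newlines gives a profile that can be written to disk and passed to sandbox-exec.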

5.5 Runtime Authorization Flow

Worker requests tool execution
  │
  ├─→ Cedar Authorize(policySet, entities, request)
  │     │
  │     ├─→ DENY → return error to LLM, log denial
  │     │
  │     └─→ ALLOW → proceed to sandbox execution
  │
  └─→ sandbox-exec -f <compiled-profile> <command>
        │
        └─→ kernel enforces seatbelt (defense in depth)
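
The two-stage flow can be sketched as a function that checks policy first and only then builds the sandbox-exec command line. The Authorizer interface is a hypothetical seam standing in for the Cedar engine, and AllowList exists only to make the example self-contained.

```go
package main

import "errors"

// Authorizer is a hypothetical seam over the Cedar policy engine.
type Authorizer interface {
	Allowed(principal, action, resource string) bool
}

// ErrDenied is returned to the caller (and ultimately the LLM) on a denial.
var ErrDenied = errors.New("denied by policy")

// PlanExecution performs the application-layer check and, if it passes,
// returns the argv that runs the command under the compiled profile.
func PlanExecution(a Authorizer, principal, action, resource, profilePath string, command []string) ([]string, error) {
	if !a.Allowed(principal, action, resource) {
		return nil, ErrDenied
	}
	argv := append([]string{"sandbox-exec", "-f", profilePath}, command...)
	return argv, nil
}

// AllowList is a trivial Authorizer keyed by action, for illustration only.
type AllowList map[string]bool

// Allowed ignores principal and resource and looks up the action alone.
func (l AllowList) Allowed(principal, action, resource string) bool { return l[action] }
```

The returned slice can be handed to os/exec as exec.Command(argv[0], argv[1:]...).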

6. Tool System

6.1 Core Tools (pi-mono inspired, minimal set)

| Tool | Description | Cedar Action |
|------|-------------|--------------|
| read | Read file contents (with line limits) | Action::"tool/read" |
| write | Create or overwrite a file | Action::"tool/write" |
| edit | Apply a targeted string replacement | Action::"tool/edit" |
| bash | Execute a shell command (sandboxed) | Action::"tool/bash" |
| grep | Search file contents with regex | Action::"tool/grep" |
| find | Find files by glob pattern | Action::"tool/find" |
| ls | List directory contents | Action::"tool/ls" |

These 7 tools are the default set, matching pi-mono's philosophy. They're sufficient for most coding tasks.

6.2 Extended Tools (Cedar-gated)

Additional tools can be enabled via Cedar policy:

| Tool | Description | Cedar Action |
|------|-------------|--------------|
| http | Make HTTP requests (via proxy) | Action::"tool/http" |
| task | Create/update kanban tasks | Action::"tool/task" |
| session_read | Read another session's context | Action::"tool/session_read" |
| plan | Write/update the plan document | Action::"tool/plan" |

The tool registry checks Cedar before exposing tools in the LLM's tool schema. If a policy doesn't permit Action::"tool/http" for the current worker, that tool simply isn't included in the API call.
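
Filtering the tool schema before an API call could be as small as the sketch below. The ToolDef fields shown here and the permitted callback are hypothetical. In the daemon the callback would wrap a Cedar authorization check for the current worker.

```go
package main

// ToolDef is a hypothetical tool schema entry. Action holds the Cedar action
// ID that gates the tool, for example "tool/http".
type ToolDef struct {
	Name   string
	Action string
}

// ExposedTools returns only the tools whose action the policy permits, so a
// denied tool never appears in the request sent to the model.
func ExposedTools(all []ToolDef, permitted func(action string) bool) []ToolDef {
	var out []ToolDef
	for _, t := range all {
		if permitted(t.Action) {
			out = append(out, t)
		}
	}
	return out
}
```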

6.3 HTTP Tool & Leash Integration

The HTTP tool is designed to never expose secrets to the agent.

Architecture options (see Section 11.1 for detailed trade-offs):

Recommended: Built-in proxy sidecar inspired by leash

Rather than depending on the full leash container setup, sqzrd runs a lightweight HTTP proxy on localhost that:

  1. Receives requests from the sandboxed agent (which can only reach localhost)
  2. Checks Cedar policy for the target URL/method
  3. Injects authorization headers from a credential store the agent cannot access
  4. Forwards the request
  5. Returns the response (optionally filtering sensitive headers)
  6. Logs the request for audit

Credential store:

~/.sqzr/credentials/
├── github.json      # {"header": "Authorization", "value": "Bearer ghp_..."}
├── openai.json      # {"header": "Authorization", "value": "Bearer sk-..."}
└── internal-api.json

The agent sees the http tool but never sees credentials. Cedar policies control which URLs the agent can access:

permit (
    principal is Worker,
    action == Action::"tool/http",
    resource
)
when { resource.host == "api.github.com" && ["GET"].contains(context.method) };

The proxy maps resource.host to the appropriate credential file.
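
Credential injection on the proxy side might look like this sketch. The Credential struct follows the JSON shape shown in the credential store above. Keying the store by request host and the Inject helper are assumptions, since the mapping from host to credential file is not specified.

```go
package main

import "net/http"

// Credential matches the JSON shape of the files in the credential store.
type Credential struct {
	Header string `json:"header"`
	Value  string `json:"value"`
}

// Inject sets the credential header for host on an outbound header set and
// reports whether a credential was found. The agent's own request never
// carries the secret; only the proxy calls this.
func Inject(h http.Header, host string, store map[string]Credential) bool {
	cred, ok := store[host]
	if !ok {
		return false
	}
	h.Set(cred.Header, cred.Value)
	return true
}
```

In a proxy handler this would run on the outbound request's Header just before forwarding, after the Cedar check has passed.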


7. Session & Worker Model

7.1 Sessions

A session is a logical unit of work. It has:

  • A unique ID
  • A filesystem-backed context store (Section 2.2)
  • A kanban task board (Section 3)
  • One or more attached workers
  • A Cedar policy scope (which policies apply)

Sessions are created explicitly (via gRPC or TUI) and persist across daemon restarts.

7.2 Workers

A worker is a single LLM API connection within a session. Workers:

  • Execute against a specific model/provider
  • Share the session's context store (read-write)
  • Share the session's kanban board
  • Are independently sandboxed (each gets its own seatbelt profile)
  • Can be added/removed from sessions dynamically

Multi-worker patterns:

  • Parallel execution: Multiple workers tackle different tasks from the kanban board simultaneously
  • Specialist workers: One worker uses Claude for reasoning, another uses GPT-4 for code generation
  • Review worker: A dedicated worker reviews artifacts produced by others

7.3 Cross-Session Access

Sessions can read (but not write to) other sessions' context stores. This is implemented via:

  1. The session_read tool, which is Cedar-gated
  2. A read-only view of the target session's context/ and kanban/ directories
  3. Entity hierarchy: Session::"target" in Workspace::"default" allows policies to scope cross-session access

// Allow research sessions to read from any session in the workspace
permit (
    principal in Session::"research",
    action == Action::"tool/session_read",
    resource in Workspace::"default"
);

7.4 Session Lifecycle

CREATE ──→ ACTIVE ──→ SUSPENDED ──→ ACTIVE (resume)
              │                        │
              └──→ ARCHIVED ←──────────┘

  • ACTIVE: Has at least one worker. Accepting commands.
  • SUSPENDED: No workers. Context is on disk. Can resume instantly.
  • ARCHIVED: Explicitly closed. Read-only. Kept for reference.

8. Provider Abstraction

8.1 Interface

type Provider interface {
    // Stream sends a request and returns a channel of streaming events.
    Stream(ctx context.Context, req *ChatRequest) (<-chan StreamEvent, error)

    // Models returns available models for this provider.
    Models() []ModelInfo

    // CountTokens estimates token count for messages.
    CountTokens(messages []Message) (int, error)
}

type ChatRequest struct {
    Model       string
    Messages    []Message
    Tools       []ToolDef
    System      string
    MaxTokens   int
    Temperature float64
    // Provider-specific options
    Extra       map[string]any
}

type StreamEvent struct {
    Type    EventType // TextDelta, ToolCall, Usage, Done, Error
    Text    string
    Tool    *ToolCallEvent
    Usage   *UsageEvent
    Error   error
}
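
A caller of Stream would typically drain the channel until Done or Error. The sketch below redeclares a trimmed StreamEvent so it compiles on its own. The EventType constants follow the comment in the struct above, while the UsageEvent fields and the Drain helper are assumptions.

```go
package main

import (
	"errors"
	"strings"
)

// EventType enumerates the stream event kinds named in the interface sketch.
type EventType int

const (
	TextDelta EventType = iota
	ToolCall
	Usage
	Done
	Error
)

// UsageEvent carries token counts; the field names are assumptions.
type UsageEvent struct {
	InputTokens  int
	OutputTokens int
}

// StreamEvent is a trimmed copy of the struct above (tool calls omitted).
type StreamEvent struct {
	Type  EventType
	Text  string
	Usage *UsageEvent
	Error error
}

// Drain reads events until Done, Error, or channel close. It returns the
// concatenated text, the last usage report seen, and any stream error.
func Drain(events <-chan StreamEvent) (string, *UsageEvent, error) {
	var text strings.Builder
	var usage *UsageEvent
	for ev := range events {
		switch ev.Type {
		case TextDelta:
			text.WriteString(ev.Text)
		case Usage:
			usage = ev.Usage
		case Error:
			if ev.Error == nil {
				return text.String(), usage, errors.New("stream error without detail")
			}
			return text.String(), usage, ev.Error
		case Done:
			return text.String(), usage, nil
		}
	}
	return text.String(), usage, nil
}
```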

8.2 Supported Providers (initial)

| Provider | Notes |
|----------|-------|
| Anthropic | Primary. Messages API with streaming. Extended thinking. |
| OpenAI | Chat completions API. Responses API for newer models. |
| Google Gemini | Gemini API with streaming. |

Additional providers can be added by implementing the Provider interface. The system is not provider-specific in its core — tool definitions, message formats, and streaming events are normalized.

8.3 Token Counting

Each provider implements CountTokens using their respective tokenizer (or estimation heuristic). The session manager uses this to:

  • Track cumulative context size
  • Trigger context reloads at the 200K threshold
  • Select how many recent turns to include in a reload
  • Enforce the 250K hard ceiling

9. gRPC API & TUI

9.1 gRPC Service Definition

service Sqzr {
    // Session management
    rpc CreateSession(CreateSessionRequest) returns (Session);
    rpc ListSessions(ListSessionsRequest) returns (ListSessionsResponse);
    rpc GetSession(GetSessionRequest) returns (Session);
    rpc ArchiveSession(ArchiveSessionRequest) returns (Session);

    // Worker management
    rpc AttachWorker(AttachWorkerRequest) returns (Worker);
    rpc DetachWorker(DetachWorkerRequest) returns (Empty);

    // Interactive streaming — bidirectional
    rpc Connect(stream ClientEvent) returns (stream ServerEvent);

    // Task management
    rpc ListTasks(ListTasksRequest) returns (ListTasksResponse);
    rpc UpdateTask(UpdateTaskRequest) returns (Task);

    // Policy management
    rpc ReloadPolicies(Empty) returns (ReloadPoliciesResponse);

    // Requirements & validation
    rpc ValidateRequirements(ValidateRequest) returns (stream ValidationEvent);
    rpc ListRequirements(ListRequirementsRequest) returns (ListRequirementsResponse);
}

9.2 Bidirectional Streaming (Connect)

The Connect RPC is the primary interaction channel:

ClientEvent types:

  • UserMessage — user sends a message to a session
  • Abort — cancel the current generation
  • TaskAction — move a task, update a task
  • SessionSwitch — change which session the client is viewing

ServerEvent types:

  • TextDelta — streaming text from the model
  • ToolCall — tool invocation (name, args)
  • ToolResult — tool execution result
  • TaskBlocked — a task has been blocked (alert!)
  • TaskUpdate — task state change
  • SessionUpdate — session metadata change
  • ValidationResult — requirement validation pass/fail
  • RequirementFailed — a requirement's validators failed (alert!)
  • Error — error event

9.3 TUI Client

The TUI client (sqzr binary) connects to sqzrd via gRPC and provides:

  • Session list with status indicators
  • Active session view with streaming output
  • Kanban board view (TODO/IN-PROGRESS/BLOCKED/REVIEW/DONE)
  • Task detail view with block reason
  • Policy violation notifications
  • Multi-session switching (tabs or split)

Library choice: bubbletea — mature Go TUI framework, good for streaming content.


10. Daemon Architecture

10.1 Process Model

sqzrd (parent daemon)
  ├── gRPC server (goroutine)
  ├── policy engine (goroutine, watches policy files)
  ├── proxy server (goroutine, localhost HTTP proxy)
  ├── session manager
  │     ├── session-A
  │     │     ├── worker-1 (goroutine, LLM stream)
  │     │     └── worker-2 (goroutine, LLM stream)
  │     └── session-B
  │           └── worker-1 (goroutine, LLM stream)
  └── tool executor
        └── sandbox-exec child processes (forked per tool call)

10.2 Startup

  1. Load Cedar policies from ~/.sqzr/policies/*.cedar
  2. Compile seatbelt profile templates
  3. Start the HTTP proxy on a random localhost port
  4. Start the gRPC server on a configured address (default: unix:///tmp/sqzr.sock)
  5. Restore any ACTIVE or SUSPENDED sessions from disk
  6. Begin accepting connections

10.3 Signals

  • SIGHUP: Reload Cedar policies and recompile seatbelt profiles
  • SIGTERM/SIGINT: Graceful shutdown — suspend all sessions, stop workers
  • SIGUSR1: Dump session state to stderr (debugging)

11. Design Decisions & Alternatives

11.1 HTTP Tool: Leash vs. Built-in Proxy

Option A: Depend on Leash directly

  • Pros: Full-featured, battle-tested, container isolation, web UI
  • Cons: Requires Docker/Podman, heavy dependency, redundant with our seatbelt sandboxing, adds container orchestration complexity

Option B: Built-in lightweight proxy (Recommended)

  • Pros: Single binary, no container dependency, Cedar-native policy, simpler deployment, credential injection fits naturally into our auth model
  • Cons: Less isolation than a container boundary, must implement ourselves

Option C: Build on leash as a library

  • Pros: Reuse leash's proxy logic without the container model
  • Cons: Leash is Go but tightly coupled to its container model, would require significant forking

Decision: Option B. sqzr already has seatbelt for process isolation and Cedar for policy. A built-in proxy keeps the architecture simple and self-contained. We can revisit container-based isolation later if needed (e.g., Linux support where seatbelt isn't available).

11.2 Context Management: Compaction vs. Restart

Option A: In-flight compaction (pi-mono style)

  • Summarize older context, keep recent turns
  • Pros: No visible interruption, gradual degradation
  • Cons: Lossy, summary quality varies, still accumulates drift

Option B: Hard restart with filesystem reload (Recommended)

  • Kill session at threshold, restart with curated context from disk
  • Pros: Predictable quality, clean context window, all context preserved on disk, simple to reason about
  • Cons: Visible restart (brief pause), requires good summary generation

Decision: Option B. The filesystem-backed approach means nothing is ever lost. The model gets a clean, high-quality context window every time. The restart is a feature, not a bug — it prevents the degradation that compaction only delays.
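
The restart trigger can be sketched as a simple counter checked after each turn. The 250K ceiling comes from this plan. The `tokenTracker` type, its methods, and the 30K curated-context size are illustrative assumptions.

```go
package main

import "fmt"

// hardCeiling is the 250K-token limit named in this plan.
const hardCeiling = 250_000

// tokenTracker is a hypothetical per-session token counter.
type tokenTracker struct {
	used int
}

// record adds one turn's usage and reports whether the ceiling was reached.
func (t *tokenTracker) record(turnTokens int) bool {
	t.used += turnTokens
	return t.used >= hardCeiling
}

// restart models the reload from disk: the fresh context window holds only
// the curated context (summary plus selected artifacts).
func (t *tokenTracker) restart(curatedTokens int) {
	t.used = curatedTokens
}

func main() {
	tracker := &tokenTracker{}
	for _, turn := range []int{120_000, 90_000, 45_000} {
		if tracker.record(turn) {
			fmt.Println("ceiling reached at", tracker.used, "tokens, restarting session")
			tracker.restart(30_000) // assumed size of the curated context
		}
	}
	fmt.Println("tokens in current window:", tracker.used)
}
```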

11.3 Multi-Session Communication: Sub-agents vs. Shared Filesystem

Option A: Sub-agent spawning (tool that creates new sessions)

  • Pros: Familiar pattern, explicit delegation
  • Cons: Complex lifecycle management, hard to share context, unclear ownership

Option B: Shared filesystem with read-only cross-access (Recommended)

  • Pros: Simple, auditable, Cedar-governed, no new protocol needed
  • Cons: Eventually consistent (a reader may see stale context until the owning session's next write lands), no real-time notification between sessions

Option C: Message passing between sessions

  • Pros: Real-time, explicit communication
  • Cons: Complex protocol, needs its own Cedar policy layer, over-engineering

Decision: Option B. Sessions write to their own context store. Other sessions read from it via the session_read tool. This is simple, auditable, and naturally governed by Cedar. If real-time coordination is needed later, we can add a lightweight event bus.

11.4 Worker Model: Goroutines vs. Processes

Option A: Workers as goroutines (Recommended)

  • Pros: Cheap, fast communication, shared memory for session state
  • Cons: A panic in one worker could crash the daemon

Option B: Workers as child processes

  • Pros: Process isolation, crash containment
  • Cons: Complex IPC, expensive, over-engineering for this use case

Decision: Option A. Use goroutines with panic recovery. Tool execution (the risky part) is already isolated in sandbox-exec child processes.
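
A sketch of the panic-recovery wrapper this decision implies is shown below. `runWorker` is a hypothetical helper. The key constraint is that `recover` only works inside a deferred function in the same goroutine that panicked, so each worker goroutine needs its own deferred recovery.

```go
package main

import "fmt"

// runWorker runs fn in its own goroutine and converts a panic into an error
// delivered on the returned channel. The channel is closed when the worker
// exits, so a clean exit yields a nil error on receive.
func runWorker(name string, fn func()) <-chan error {
	errc := make(chan error, 1) // buffered so the send below never blocks
	go func() {
		defer close(errc) // runs last (deferred calls run in LIFO order)
		defer func() {
			if r := recover(); r != nil {
				errc <- fmt.Errorf("worker %s panicked: %v", name, r)
			}
		}()
		fn()
	}()
	return errc
}

func main() {
	// A panicking worker is contained and reported instead of crashing the process.
	err := <-runWorker("worker-1", func() { panic("provider stream corrupted") })
	fmt.Println("worker-1 result:", err)

	// A well-behaved worker reports no error.
	err = <-runWorker("worker-2", func() {})
	fmt.Println("worker-2 result:", err)
}
```

The session manager could use the returned error to mark the worker as failed and decide whether to restart it, which lines up with the "Error recovery (worker crash, provider timeout)" item in Phase 6.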

11.5 TUI Protocol: gRPC vs. Unix Socket Custom Protocol

Option A: gRPC with bidirectional streaming (Recommended)

  • Pros: Well-defined, supports remote access, good Go libraries, protobuf schema, works over TCP or Unix socket
  • Cons: Slightly heavier than a custom protocol

Option B: Custom JSON-lines over Unix socket

  • Pros: Simpler, no protobuf dependency
  • Cons: Must define our own framing, no codegen, harder to extend

Decision: Option A. gRPC gives us remote access for free and the protobuf schema serves as documentation. The overhead is expected to be negligible for this workload.


12. Cedar-to-Tool-Schema Compilation

An interesting consequence of Cedar-governed tools: the set of tools exposed to the LLM in each API call is dynamic and policy-dependent.

At the start of each turn:

  1. Enumerate all registered tools
  2. For each tool, construct a Cedar request: principal=<worker>, action=<tool-action>, resource=<workspace>
  3. Evaluate against the policy set
  4. Include in the LLM's tool schema only those tools whose decision is Allow

This means:

  • A restrictive policy automatically reduces the tool set
  • Different workers in the same session can have different tools
  • Policy changes take effect on the next turn without restarts
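
The per-turn filtering described in this section can be sketched as follows. To keep the example self-contained, the Cedar engine is replaced by a hypothetical `Authorizer` interface and a toy `allowTable` policy. The tool and action names are also made up. The toy policy denies anything not explicitly listed, mirroring Cedar's default-deny behavior.

```go
package main

import "fmt"

// Authorizer is a hypothetical stand-in for the Cedar policy engine.
type Authorizer interface {
	Allowed(principal, action, resource string) bool
}

// Tool pairs a tool name with the action that guards it.
type Tool struct {
	Name   string
	Action string
}

// visibleTools returns the subset of registry that the worker may use in
// the given workspace. It is meant to be called at the start of each turn.
func visibleTools(az Authorizer, worker, workspace string, registry []Tool) []Tool {
	var out []Tool
	for _, tool := range registry {
		if az.Allowed(worker, tool.Action, workspace) {
			out = append(out, tool)
		}
	}
	return out
}

// allowTable is a toy policy: principal -> set of permitted actions.
// Missing entries evaluate to false, so the default is deny.
type allowTable map[string]map[string]bool

func (a allowTable) Allowed(principal, action, _ string) bool {
	return a[principal][action]
}

func main() {
	registry := []Tool{{"read", "tool:read"}, {"bash", "tool:bash"}, {"http", "tool:http"}}
	policy := allowTable{
		"worker-1": {"tool:read": true, "tool:bash": true},
		"worker-2": {"tool:read": true},
	}
	for _, w := range []string{"worker-1", "worker-2"} {
		fmt.Println(w, "sees", visibleTools(policy, w, "workspace", registry))
	}
}
```

Because the filter runs on every turn against whatever policy set is current, a reloaded policy would change the visible tools on the next turn without any restart, as the list above describes.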

13. Directory Layout (Go Project)

sqzr/
├── cmd/
│   ├── sqzrd/          # Daemon entry point
│   └── sqzr/           # TUI client entry point
├── internal/
│   ├── daemon/          # Daemon lifecycle, signal handling
│   ├── session/         # Session & worker management
│   ├── context/         # Filesystem context store, summaries
│   ├── kanban/          # Task board implementation
│   ├── artifacts/       # Tagged artifact system (decisions, requirements, tasks)
│   ├── validate/        # Requirement validation runner
│   ├── policy/          # Cedar policy engine, seatbelt compiler
│   ├── sandbox/         # Seatbelt profile generation & execution
│   ├── provider/        # LLM provider abstraction
│   │   ├── anthropic/
│   │   ├── openai/
│   │   └── google/
│   ├── tools/           # Tool implementations
│   │   ├── read.go
│   │   ├── write.go
│   │   ├── edit.go
│   │   ├── bash.go
│   │   ├── grep.go
│   │   ├── find.go
│   │   ├── ls.go
│   │   ├── http.go
│   │   ├── task.go
│   │   └── session_read.go
│   ├── proxy/           # HTTP proxy for credential injection
│   └── tui/             # TUI components (bubbletea)
├── proto/
│   └── sqzr/v1/         # Protobuf definitions
├── policies/            # Default Cedar policies
│   ├── default.cedar
│   └── examples/
├── go.mod
└── go.sum

14. Implementation Phases

Phase 1: Foundation

  • Go module setup, basic project structure
  • Provider abstraction with Anthropic support
  • Core tool implementations (read, write, edit, bash, grep, find, ls)
  • Basic agent loop: system prompt → user message → stream → tool calls → loop
  • Tagged artifact system (frontmatter parsing, tag index)
  • Filesystem context store (turns, decisions/, requirements/)
  • Token tracking and 250K hard ceiling with restart

Phase 2: Policy & Sandbox

  • Cedar policy engine integration (using cedar-go)
  • Tool authorization (Cedar check before every tool call)
  • Seatbelt profile compilation from Cedar
  • Sandboxed bash execution via sandbox-exec
  • Dynamic tool schema based on policy

Phase 3: Sessions, Kanban & Requirements

  • Session manager with create/list/archive
  • Kanban task board (filesystem-backed, tagged)
  • Requirements as first-class artifacts with validators
  • Validation runner (on-demand, on-task-completion, periodic)
  • Planning workflow (plan document + task generation with tag inheritance)
  • Tag-based context loading on session restart
  • Blocked task detection and alerting

Phase 4: gRPC & TUI

  • Protobuf definitions (including ValidateRequirements RPC)
  • gRPC server in sqzrd
  • Bidirectional streaming (Connect RPC)
  • TUI client with bubbletea
  • Session switching, kanban view, streaming output
  • Validation results display and requirement failure alerts

Phase 5: Multi-Worker & Cross-Session

  • Multiple workers per session
  • Cross-session read access (session_read tool)
  • HTTP proxy with credential injection
  • HTTP tool with Cedar-governed URL allowlisting

Phase 6: Polish & Hardening

  • SIGHUP policy reload
  • Graceful shutdown and session persistence
  • Audit logging
  • Error recovery (worker crash, provider timeout)
  • Documentation and default policy examples

15. Open Questions

  1. Linux support: Seatbelt is macOS-only. For Linux, the equivalent would be seccomp-bpf + mount namespaces (or just containers). Should we abstract the sandbox interface now, or defer Linux support?

  2. Token counting accuracy: Provider tokenizers differ. Should we use exact tokenizer libraries (tiktoken for OpenAI, Anthropic's tokenizer) or a rough heuristic (chars/4)?

  3. Summary generation model: Should the summary be generated by the same model/provider as the session, or a cheaper/faster model? Using a cheaper model saves cost but might miss nuance.

  4. Kanban task assignment: When multiple workers are available, how are tasks assigned? Options: round-robin, explicit assignment, workers self-select from TODO queue.

  5. Cross-session write: Currently read-only. Are there cases where sessions should be able to write to each other's context? (E.g., a coordinator session assigning tasks to worker sessions.)

  6. Policy hot-reload scope: When policies are reloaded via SIGHUP, should active workers be immediately re-evaluated (potentially losing tool access mid-turn), or should changes only apply to new turns?

  7. Tag cardinality: Should there be a recommended max number of tags per artifact? Too many tags dilute the signal; too few lose the benefit of selective loading. Practical experience will inform this.

  8. Requirement scope: Are requirements session-scoped (live inside a session's context store) or workspace-scoped (shared across all sessions)? Session-scoped is simpler but means duplicate requirements across sessions working on the same codebase. Workspace-scoped is more natural but needs its own directory and access control.

  9. Validator trust: Validators are shell commands. Should they run in the same seatbelt sandbox as agent tool calls, or a less restrictive one? They need to read build artifacts and run tests, which may require more access than the agent's write sandbox.

  10. Tag-based context budget: When loading tag-relevant decisions and requirements, how much of the token budget should they consume? Fixed allocation (e.g., 20K tokens) vs. proportional (e.g., 10% of budget)?
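
For reference, the rough heuristic mentioned in question 2 (chars/4) is a one-liner. The sketch below counts Unicode code points rather than bytes and rounds up. Both choices are assumptions, and `estimateTokens` is a hypothetical name. Whether this is accurate enough against real provider tokenizers is exactly what the question leaves open.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// estimateTokens approximates a token count as ceil(characters / 4).
func estimateTokens(s string) int {
	n := utf8.RuneCountInString(s)
	return (n + 3) / 4
}

func main() {
	msg := "Summarize the decisions tagged #auth."
	fmt.Println("estimated tokens:", estimateTokens(msg))
}
```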


16. Security Model Summary

| Layer | Mechanism | What it prevents |
|---|---|---|
| Application | Cedar policy evaluation | Unauthorized tool use, file access, HTTP requests |
| Kernel | macOS seatbelt (sandbox-exec) | Sandbox escapes, direct syscall bypasses |
| Network | Localhost-only + proxy | Direct outbound connections, credential exposure |
| Credential | Proxy-injected auth, separate store | Agent seeing API keys or tokens |
| Context | 250K ceiling + restart | Model degradation, prompt injection accumulation |
| Cross-session | Read-only + Cedar-gated | Unauthorized cross-session data access |

Defense in depth: even if the Cedar check has a bug, seatbelt still blocks the operation at the kernel level for any sandboxed tool process. Even if seatbelt is misconfigured, the proxy keeps credentials out of the agent's reach. Each layer independently limits blast radius.
