sqzr (pronounced "squeezer") is a daemon-based LLM agent orchestration server written in Go. It manages multiple concurrent agent sessions across model providers, enforces fine-grained access control via Cedar policy, and sandboxes all tool execution using macOS seatbelt profiles compiled from those same Cedar policies. Sessions are long-lived, filesystem-backed, and interconnected — they can read each other's context for cross-pollination. A gRPC streaming API enables both local TUI clients and remote session control.
┌─────────────────────────────────────────────────────────────┐
│ sqzrd (daemon) │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Session A │ │ Session B │ │ Session C │ ... │
│ │ (worker1) │ │ (w1, w2) │ │ (worker1) │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
│ │ │ │ │
│ ┌────┴──────────────┴──────────────┴────┐ │
│ │ Session Manager │ │
│ │ - context store (filesystem) │ │
│ │ - kanban task board │ │
│ │ - cross-session read access │ │
│ └───────────────┬───────────────────────┘ │
│ │ │
│ ┌───────────────┴───────────────────────┐ │
│ │ Policy Engine (Cedar) │ │
│ │ - tool authorization │ │
│ │ - sandbox profile compilation │ │
│ │ - HTTP allowlisting │ │
│ └───────────────┬───────────────────────┘ │
│ │ │
│ ┌───────────────┴───────────────────────┐ │
│ │ Provider Abstraction │ │
│ │ - Anthropic, OpenAI, Gemini, etc. │ │
│ │ - streaming, tool calling │ │
│ │ - token counting & budget │ │
│ └───────────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────┐ │
│ │ gRPC Streaming API │ │
│ └───────────────┬───────────────────────┘ │
└──────────────────┼──────────────────────────────────────────┘
│
┌─────────┴─────────┐
│ sqzr (TUI CLI) │
└───────────────────┘
| Component | Responsibility |
|---|---|
| sqzrd | Parent daemon. Owns all sessions, workers, and the policy engine. |
| Session | A logical conversation with one or more workers. Owns a filesystem-backed context store and a kanban task board. |
| Worker | A single LLM API connection executing within a session. Multiple workers can be attached to one session. |
| Policy Engine | Evaluates Cedar policies for every tool call, file access, and HTTP request. Compiles seatbelt profiles. |
| Sandbox | macOS seatbelt profile applied to tool execution subprocesses. Generated from Cedar policy. |
| Provider | Abstraction over LLM APIs (Anthropic, OpenAI, Google, etc.). Handles streaming, tool schemas, token counting. |
| gRPC API | Bidirectional streaming interface for TUI clients and remote control. |
| sqzr | TUI client binary. Connects to sqzrd via gRPC. |
LLM quality degrades as context grows. sqzr enforces a hard ceiling of 250K tokens per active API session. This is not a soft compaction target — when context approaches the limit, the API session is killed and restarted with a fresh context window populated from the filesystem-backed store.
Rather than trying to summarize or compact context in-flight (which loses information), sqzr persists all context to the filesystem in structured form. This serves as the durable source of truth.
Session directory structure:
~/.sqzr/sessions/<session-id>/
├── meta.json # Session metadata, model, created_at, etc.
├── context/
│ ├── turns/ # Complete turn history (JSONL, append-only)
│ │ ├── 0001.jsonl
│ │ ├── 0002.jsonl
│ │ └── ...
│ ├── decisions/ # Individual tagged decision files
│ │ ├── dec-001-cedar-over-opa.md
│ │ └── ...
│ ├── requirements/ # Individual tagged requirement files (with validators)
│ │ ├── req-001-seatbelt-deny-network.md
│ │ └── ...
│ ├── user-context.md # Accumulated user-provided context
│ └── summary.md # Auto-generated session summary (updated periodically)
├── kanban/
│ ├── TODO/
│ ├── IN-PROGRESS/
│ ├── BLOCKED/
│ ├── REVIEW/
│ └── DONE/
├── artifacts/ # Files produced by the session
└── shared/ # Symlink or mount point for cross-session reads
All artifacts in decisions/, requirements/, and kanban/ use tagged
frontmatter (see Section 4 for the tagging system).
When a session must restart (approaching 250K, explicit reset, or error recovery), the new API session is seeded with:
- System prompt (from Cedar-governed template)
- Session summary (`summary.md` — auto-maintained)
- Tag-relevant decisions & requirements (matching the active task's tags, plus anything tagged `always` — see Section 4.7)
- Active kanban tasks (IN-PROGRESS and BLOCKED items)
- Recent turns (as many as fit within a configurable budget, e.g. 50K tokens)
This gives the model enough continuity to resume work without carrying the full history. The full history remains on disk for retrieval if needed.
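The turn-selection step of a reload can be sketched as below. `Turn`, its `Tokens` field, and `recentTurnsWithinBudget` are illustrative names, not sqzr's actual API: the idea is simply to walk the history backwards until the configurable budget is exhausted, then keep the surviving suffix in chronological order.

```go
package main

import "fmt"

// Turn is a hypothetical stand-in for one persisted conversation turn.
type Turn struct {
	ID     int
	Tokens int // precomputed token count for the serialized turn
}

// recentTurnsWithinBudget walks the turn history backwards and returns the
// most recent turns whose combined token count fits the reload budget,
// preserving chronological order in the result.
func recentTurnsWithinBudget(history []Turn, budget int) []Turn {
	used := 0
	start := len(history)
	for i := len(history) - 1; i >= 0; i-- {
		if used+history[i].Tokens > budget {
			break
		}
		used += history[i].Tokens
		start = i
	}
	return history[start:]
}

func main() {
	history := []Turn{{1, 30000}, {2, 20000}, {3, 15000}, {4, 10000}}
	for _, t := range recentTurnsWithinBudget(history, 50000) {
		fmt.Println(t.ID) // turns 2, 3, 4 fit the 50K budget; turn 1 does not
	}
}
```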
The agent is instructed to write to decisions.md whenever it:
- Chooses between alternatives
- Commits to an approach
- Encounters a constraint or blocker
- Receives user direction that changes course
This file is always included in context reloads, giving the model persistent awareness of why things are the way they are.
Each worker tracks token usage from the provider's usage response. When
input_tokens + output_tokens for the active session exceeds a configurable
threshold (default: 200K input tokens), sqzr:
- Triggers a summary generation (using a separate, short-context call)
- Persists the summary to `summary.md`
- Marks the current API session for restart
- On the next turn, starts a fresh API session with the reload strategy above
The 200K threshold leaves headroom: the model sees at most ~200K of accumulated context, summary generation runs in its own small context, and the restarted session begins at whatever size the reload budget allows.
When the agent receives a complex task, it produces:
- Plan document (`kanban/PLAN.md`) — high-level approach, alternatives considered, architecture decisions, open questions
- Task files — individual files in `kanban/TODO/`, one per discrete unit of work
---
id: task-001
title: Implement Cedar policy parser
priority: high
depends_on: []
blocked_by: ""
assigned_worker: worker-1
prompt: |
Parse Cedar policy files from ~/.sqzr/policies/ and compile them into
a PolicySet. Use the cedar-go library. Handle syntax errors gracefully
and report them to the session log.
---
## Acceptance Criteria
- Cedar files are parsed on daemon startup and on SIGHUP
- Parse errors are reported but don't crash the daemon
- PolicySet is accessible to the authorization engine

TODO ──→ IN-PROGRESS ──→ REVIEW ──→ DONE
  │           │
  │           └──→ BLOCKED ──→ IN-PROGRESS
  │                   │
  └───────────────────┘  (re-prioritized)
- TODO: Ready to be picked up. Contains the prompt needed to complete it.
- IN-PROGRESS: A worker is actively executing. The task file is moved (or symlinked) and annotated with the worker ID.
- BLOCKED: Cannot proceed. The `blocked_by` field explains why. The daemon alerts the user via the gRPC stream when a task enters BLOCKED.
- REVIEW: Work is done but needs user verification.
- DONE: Completed and verified.
When a task moves to BLOCKED, sqzrd sends a TaskBlocked event on all
connected gRPC streams for that session. The TUI displays this prominently.
Common block reasons:
- User feedback required (ambiguous requirement, design choice)
- Policy denial (Cedar forbids a needed action — user must update policy)
- External dependency (waiting on API access, credentials, etc.)
- Inter-task dependency (another task must complete first)
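The fan-out to connected streams can be sketched with per-client channels. This is illustrative plumbing, not sqzrd's actual implementation; the buffered-channel-with-drop policy is one reasonable way to keep a slow client from stalling the daemon.

```go
package main

import (
	"fmt"
	"sync"
)

// TaskBlocked is the event payload pushed to every connected client stream.
type TaskBlocked struct {
	SessionID, TaskID, Reason string
}

// broadcaster fans TaskBlocked events out to per-client channels, the way
// the daemon would push them onto each gRPC stream.
type broadcaster struct {
	mu   sync.Mutex
	subs map[int]chan TaskBlocked
	next int
}

func newBroadcaster() *broadcaster {
	return &broadcaster{subs: map[int]chan TaskBlocked{}}
}

func (b *broadcaster) subscribe() (int, <-chan TaskBlocked) {
	b.mu.Lock()
	defer b.mu.Unlock()
	id := b.next
	b.next++
	ch := make(chan TaskBlocked, 16) // buffered so one slow client can't stall others
	b.subs[id] = ch
	return id, ch
}

func (b *broadcaster) publish(ev TaskBlocked) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for _, ch := range b.subs {
		select {
		case ch <- ev:
		default: // drop rather than block the daemon
		}
	}
}

func main() {
	b := newBroadcaster()
	_, ch := b.subscribe()
	b.publish(TaskBlocked{"session-abc", "task-014", "user feedback required"})
	fmt.Println((<-ch).Reason) // user feedback required
}
```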
Context reload (Section 2.3) currently loads everything — all decisions, all active tasks. But most of the time a worker only needs context relevant to what it's working on. Meanwhile, requirements (things the system must do) are a distinct artifact type that's currently absent: they're normative and testable, unlike decisions (which are historical rationale) or tasks (which are units of work).
We need:
- A way to tag all context artifacts so relevant context can be loaded selectively
- Requirements as a first-class artifact type with embedded validation criteria
- A mechanism to run validators that check whether requirements are satisfied
Tags are simple string labels applied to any context artifact (requirement, decision, task). They serve two purposes:
- Selective context loading: When a worker picks up a task tagged `sandbox`, the context loader pulls in decisions and requirements also tagged `sandbox`, rather than loading everything.
- Cross-cutting concerns: A requirement tagged both `security` and `sandbox` is surfaced whenever either tag is relevant.
Tags are freeform strings but we establish conventions:
| Convention | Examples | Purpose |
|---|---|---|
| Subsystem | `policy`, `sandbox`, `session`, `provider`, `kanban`, `proxy`, `grpc`, `tui` | Major architectural components |
| Cross-cutting | `security`, `performance`, `ux`, `reliability` | Concerns that span subsystems |
| Feature | `http-tool`, `cross-session`, `multi-worker` | Specific feature areas |
There is no tag registry — tags are created by use. The system tracks which tags exist (by scanning artifact frontmatter) for autocomplete and reporting.
All context artifacts (decisions, requirements, tasks) share a common frontmatter envelope:
---
kind: requirement | decision | task
id: req-001
title: Seatbelt profile must deny all network except localhost
tags: [sandbox, security]
created: 2026-03-11T10:00:00Z
# ... kind-specific fields follow
---

Decisions move from a monolithic `decisions.md` to individual files:
~/.sqzr/sessions/<session-id>/context/
├── decisions/
│ ├── dec-001-cedar-over-opa.md
│ ├── dec-002-restart-over-compact.md
│ └── ...
├── requirements/
│ ├── req-001-seatbelt-deny-network.md
│ ├── req-002-token-ceiling.md
│ └── ...
└── ...
This replaces the flat decisions.md file. The trade-off is more files, but
each is individually tagged, addressable, and loadable.
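The shared envelope makes artifact scanning cheap. A deliberately minimal frontmatter parser (key: value lines only; a real implementation would use a YAML library, and the names here are illustrative):

```go
package main

import (
	"fmt"
	"strings"
)

// Artifact holds the shared frontmatter envelope fields.
type Artifact struct {
	Kind, ID, Title string
	Tags            []string
}

// parseFrontmatter extracts the envelope from a "---" delimited header.
func parseFrontmatter(doc string) (Artifact, error) {
	var a Artifact
	rest, ok := strings.CutPrefix(doc, "---\n")
	if !ok {
		return a, fmt.Errorf("missing frontmatter")
	}
	header, _, ok := strings.Cut(rest, "\n---")
	if !ok {
		return a, fmt.Errorf("unterminated frontmatter")
	}
	for _, line := range strings.Split(header, "\n") {
		key, val, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		val = strings.TrimSpace(val)
		switch strings.TrimSpace(key) {
		case "kind":
			a.Kind = val
		case "id":
			a.ID = val
		case "title":
			a.Title = val
		case "tags":
			for _, t := range strings.Split(strings.Trim(val, "[]"), ",") {
				a.Tags = append(a.Tags, strings.TrimSpace(t))
			}
		}
	}
	return a, nil
}

func main() {
	doc := "---\nkind: requirement\nid: req-001\ntags: [sandbox, security]\n---\n## Description\n"
	a, _ := parseFrontmatter(doc)
	fmt.Println(a.Kind, a.ID, a.Tags) // requirement req-001 [sandbox security]
}
```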
A requirement is a tagged artifact with additional fields for validation:
---
kind: requirement
id: req-001
title: Seatbelt profile must deny all network except localhost
tags: [sandbox, security, seatbelt]
created: 2026-03-11T10:00:00Z
status: active # active | satisfied | waived | obsolete
satisfies: [] # links to parent requirements (traceability)
---
## Description
All generated seatbelt profiles MUST include `(deny default)` and MUST NOT
contain any `(allow network-outbound ...)` rule that permits non-localhost
destinations. The only permitted network rule is:
`(allow network-outbound (remote ip "localhost:*"))`.
## Validation
```yaml
validators:
  - type: grep
    description: "Deny default is present in all generated profiles"
    command: "grep -r '(deny default)' {{artifacts_dir}}/sandbox/"
    expect: exit_code_0
  - type: grep_absent
    description: "No non-localhost network allow rules"
    command: "grep -rP 'allow network-outbound(?!.*localhost)' {{artifacts_dir}}/sandbox/"
    expect: exit_code_nonzero
  - type: test
    description: "Sandbox integration test passes"
    command: "go test ./internal/sandbox/ -run TestProfileDenyNetwork -v"
    expect: exit_code_0
```

Defense in depth. Even if Cedar evaluation has a bug, the kernel-level seatbelt profile must independently enforce network isolation.
### 4.5 Validator Types
Validators are intentionally simple — they're shell commands with expected
outcomes. This keeps them language-agnostic and composable.
| Type | What it checks | `expect` |
|------|---------------|----------|
| `grep` | Pattern is present in output | `exit_code_0` |
| `grep_absent` | Pattern is NOT present | `exit_code_nonzero` |
| `test` | Test suite passes | `exit_code_0` |
| `command` | Arbitrary command | `exit_code_0`, `output_contains:`, `output_matches:` |
| `cedar` | Cedar policy evaluates as expected | `allow` or `deny` (with a test request) |
| `lint` | Static analysis check | `exit_code_0` |
Template variables available in validator commands:
| Variable | Expands to |
|----------|-----------|
| `{{workspace}}` | Project workspace root |
| `{{artifacts_dir}}` | Session artifacts directory |
| `{{session_dir}}` | Session root directory |
| `{{requirement_id}}` | The requirement's own ID |
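Template expansion is plain string substitution. A sketch (the function name is illustrative; the variable set matches the table above):

```go
package main

import (
	"fmt"
	"strings"
)

// expandVars substitutes the {{...}} template variables available to
// validator commands.
func expandVars(command string, vars map[string]string) string {
	pairs := make([]string, 0, len(vars)*2)
	for k, v := range vars {
		pairs = append(pairs, "{{"+k+"}}", v)
	}
	return strings.NewReplacer(pairs...).Replace(command)
}

func main() {
	cmd := "grep -r '(deny default)' {{artifacts_dir}}/sandbox/"
	fmt.Println(expandVars(cmd, map[string]string{
		"workspace":      "/workspace",
		"artifacts_dir":  "/workspace/.artifacts", // illustrative paths
		"session_dir":    "/tmp/session",
		"requirement_id": "req-001",
	}))
}
```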
### 4.6 Validation Runner
The validation runner is a daemon subsystem that can be triggered:
- **On demand**: User requests validation via TUI or gRPC (`ValidateRequirements` RPC)
- **On task completion**: When a task moves to REVIEW, validators for all
requirements sharing any of the task's tags are run automatically
- **On session restart**: Optionally, as a "sanity check" before resuming work
- **Periodically**: Configurable interval for continuous validation
ValidateRequirements(tags: ["sandbox"])
  │
  ├─→ Find all requirements with matching tags
  │     req-001 (tags: [sandbox, security])
  │     req-003 (tags: [sandbox, seatbelt])
  │
  ├─→ For each requirement, run validators
  │     req-001: validator 1 ✓, validator 2 ✓, validator 3 ✓ → PASS
  │     req-003: validator 1 ✓, validator 2 ✗ → FAIL
  │
  └─→ Report results
        - Update requirement status
        - If FAIL: create BLOCKED task or alert user
        - Log to session audit trail
Validators run **sandboxed** (same seatbelt profile as tool execution) to
prevent validation commands from having more access than the agent itself.
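The core of the runner is executing a command and interpreting its exit code against the `expect` field. A sketch under assumptions: `Validator` and `runValidator` are illustrative names, the sandbox wrapping is omitted, and only the two exit-code expectations are handled.

```go
package main

import (
	"fmt"
	"os/exec"
)

// Validator mirrors the YAML schema above.
type Validator struct {
	Type, Description, Command, Expect string
}

// runValidator executes the command through the shell and maps the exit
// code onto the expect field. Real validators would run under the same
// seatbelt profile as tool execution.
func runValidator(v Validator) bool {
	err := exec.Command("sh", "-c", v.Command).Run()
	switch v.Expect {
	case "exit_code_0":
		return err == nil
	case "exit_code_nonzero":
		return err != nil
	default:
		return false // unknown expectation: fail closed
	}
}

func main() {
	ok := runValidator(Validator{Type: "grep", Command: "true", Expect: "exit_code_0"})
	fmt.Println(ok) // true
}
```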
### 4.7 Tag-Based Context Loading
The context reload strategy (Section 2.3) is updated to be tag-aware:
**Before (load everything):**
1. System prompt
2. Session summary
3. All decisions
4. All active tasks
5. Recent turns
**After (tag-scoped loading):**
1. System prompt
2. Session summary
3. **Decisions matching the active task's tags** (+ any tagged `always`)
4. **Requirements matching the active task's tags** (descriptions only, not validators)
5. Active kanban tasks (all, since they're small)
6. Recent turns
When a worker picks up task `task-014` tagged `[sandbox, cedar]`, the context
loader:
1. Scans `context/decisions/` for files tagged `sandbox` OR `cedar` OR `always`
2. Scans `context/requirements/` for files tagged `sandbox` OR `cedar` OR `always`
3. Includes these as synthetic context messages before the recent turns
This keeps context **focused and relevant** rather than dumping everything.
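The matching rule itself is small: an artifact is loaded if it shares any tag with the task, or carries `always`. A sketch with illustrative names:

```go
package main

import "fmt"

// relevant reports whether an artifact should be loaded for a task:
// it matches if the artifact shares any tag with the task, or carries
// the special "always" tag (Section 4.8).
func relevant(taskTags, artifactTags []string) bool {
	want := map[string]bool{"always": true}
	for _, t := range taskTags {
		want[t] = true
	}
	for _, t := range artifactTags {
		if want[t] {
			return true
		}
	}
	return false
}

func main() {
	taskTags := []string{"sandbox", "cedar"}
	fmt.Println(relevant(taskTags, []string{"sandbox", "security"})) // true: shares "sandbox"
	fmt.Println(relevant(taskTags, []string{"security", "always"}))  // true: tagged "always"
	fmt.Println(relevant(taskTags, []string{"tui"}))                 // false
}
```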
### 4.8 The `always` Tag
Some decisions and requirements are always relevant regardless of what the
worker is doing. Tag these with `always`:
```yaml
tags: [security, always]
```

Examples: "we chose Go", "the 250K token ceiling", "Cedar governs all tool access". These are loaded into every context reload.
Tasks inherit tags from the requirements they address. When planning produces
tasks from requirements, the task's tags field is seeded from the
requirement:
# requirement
kind: requirement
id: req-001
tags: [sandbox, security]
# task generated from req-001
kind: task
id: task-007
tags: [sandbox, security] # inherited from req-001
validates: [req-001]   # explicit link back

This ensures that when a worker picks up task-007, it automatically gets
the sandbox and security context.
DRAFT ──→ ACTIVE ──→ SATISFIED
  │          │            │
  │          └──→ WAIVED  │  (user decides it's not needed)
  │                       │
  └───────────────────────┴──→ OBSOLETE (superseded or removed)
- DRAFT: Written but not yet agreed upon. Not validated.
- ACTIVE: Agreed. Validators are run. Failures create alerts.
- SATISFIED: All validators pass consistently. Still checked.
- WAIVED: Explicitly skipped (with rationale in the file).
- OBSOLETE: No longer relevant. Kept for history.
Cedar naturally maps to agent authorization:
- Principal: `Worker::"worker-1"` or `Session::"session-abc"`
- Action: `Action::"tool/bash"`, `Action::"tool/write"`, `Action::"file/read"`, `Action::"http/GET"`
- Resource: `File::"/path/to/file"`, `URL::"https://api.example.com/v1"`, `Directory::"/workspace"`
Session::"session-abc"
├── Worker::"worker-1"
└── Worker::"worker-2"
Directory::"/workspace"
├── File::"/workspace/main.go"
├── File::"/workspace/go.mod"
└── Directory::"/workspace/internal"
└── File::"/workspace/internal/foo.go"
ToolSet::"core"
├── Action::"tool/read"
├── Action::"tool/write"
├── Action::"tool/edit"
├── Action::"tool/bash"
├── Action::"tool/grep"
├── Action::"tool/find"
└── Action::"tool/ls"
ToolSet::"network"
├── Action::"tool/http"
└── Action::"tool/fetch"
// Default: allow core tools within the workspace directory
permit (
principal is Worker,
action in ToolSet::"core",
resource in Directory::"/workspace"
);
// Deny access to secrets
forbid (
principal,
action,
resource in Directory::"/Users/you/.ssh"
);
forbid (
principal,
action,
resource in Directory::"/Users/you/.aws"
);
// Allow HTTP tool only through the proxy (leash)
permit (
principal is Worker,
action == Action::"tool/http",
resource
)
when { context.via_proxy == true };
// Allow network tools for specific sessions
permit (
  principal in Session::"research-session",
  action in ToolSet::"network",
  resource
)
when { resource.host == "api.github.com" };
// Block writes to go.sum (only go mod tidy should touch it)
forbid (
principal,
action == Action::"tool/write",
resource == File::"/workspace/go.sum"
);
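At the application layer, every tool call passes through an authorization gate before execution. The sketch below uses a toy stand-in for the real Cedar evaluator (cedar-go): `Request`, `PolicySet`, and `authorizeToolCall` are illustrative names, and the decision logic only models two behaviors from the policies above (forbid wins over permit, and core tools are permitted inside the workspace).

```go
package main

import (
	"fmt"
	"strings"
)

// Request mirrors Cedar's principal/action/resource triple.
type Request struct {
	Principal, Action, Resource string
}

// PolicySet is a toy model, not Cedar evaluation: forbid rules override
// permits, and access is otherwise scoped to the workspace root.
type PolicySet struct {
	ForbiddenPrefixes []string
	WorkspaceRoot     string
}

func (ps PolicySet) IsAuthorized(r Request) bool {
	for _, p := range ps.ForbiddenPrefixes {
		if strings.HasPrefix(r.Resource, p) {
			return false // forbid overrides permit, as in Cedar
		}
	}
	return strings.HasPrefix(r.Resource, ps.WorkspaceRoot)
}

// authorizeToolCall is the application-layer gate run before every tool
// call; the seatbelt sandbox independently enforces the same rules.
func authorizeToolCall(ps PolicySet, worker, action, resource string) error {
	req := Request{Principal: `Worker::"` + worker + `"`, Action: action, Resource: resource}
	if !ps.IsAuthorized(req) {
		return fmt.Errorf("policy denied %s on %s for %s", action, resource, req.Principal)
	}
	return nil
}

func main() {
	ps := PolicySet{
		ForbiddenPrefixes: []string{"/Users/you/.ssh", "/Users/you/.aws"},
		WorkspaceRoot:     "/workspace",
	}
	fmt.Println(authorizeToolCall(ps, "worker-1", "tool/read", "/workspace/main.go"))
	fmt.Println(authorizeToolCall(ps, "worker-1", "tool/read", "/Users/you/.ssh/id_rsa"))
}
```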
The policy engine compiles Cedar policies into macOS seatbelt profiles. This is a one-way transformation at session/worker startup:
Cedar policy → Intermediate representation → SBPL profile
The compiler walks Cedar policies and extracts:
- File paths from `permit`/`forbid` rules where resource is `File::` or `Directory::`
  - `permit ... resource in Directory::"/workspace"` → `(allow file-read* (subpath "/workspace"))`
  - `forbid ... resource in Directory::"/Users/you/.ssh"` → `(deny file-read* (subpath "/Users/you/.ssh"))`
- Network rules from policies involving `Action::"tool/http"` or `ToolSet::"network"`
  - Default: `(deny network*)` then `(allow network-outbound (remote ip "localhost:*"))`
  - If policies allow specific hosts, configure the localhost proxy accordingly
- Process execution from `Action::"tool/bash"` policies
  - Allowed executables list → `(allow process-exec (literal "/usr/bin/..."))`
The seatbelt profile is the enforcement floor. Cedar is also checked at the application layer before tool execution. The seatbelt profile prevents escapes — even if there's a bug in the Go authorization code, the kernel won't allow forbidden operations.
Generated seatbelt profile template:
(version 1)
(deny default)
;; System libraries (always needed)
(allow file-read* (subpath "/usr/lib") (subpath "/System") (subpath "/Library"))
(allow file-read* (subpath "/dev/null") (subpath "/dev/urandom"))
;; Workspace access (from Cedar)
(allow file-read* (subpath (param "WORKSPACE")))
(allow file-write* (subpath (param "WORKSPACE")))
;; Temp directory
(allow file-write* (subpath (param "TMPDIR")))
;; Denied paths (from Cedar forbid rules)
(deny file-read* (subpath (param "SECRETS_DIR_0")))
;; Network: localhost only (proxy handles allowlisting)
(allow network-outbound (remote ip "localhost:*"))
;; Process execution (from Cedar)
(allow process-exec (literal "/bin/sh"))
(allow process-exec (literal "/usr/bin/env"))
(allow process-fork)

Worker requests tool execution
  │
  ├─→ Cedar Authorize(policySet, entities, request)
  │     │
  │     ├─→ DENY → return error to LLM, log denial
  │     │
  │     └─→ ALLOW → proceed to sandbox execution
  │
  └─→ sandbox-exec -f <compiled-profile> <command>
        │
        └─→ kernel enforces seatbelt (defense in depth)
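The sandboxed execution step wraps the tool's argv in `sandbox-exec`, binding profile parameters like `(param "WORKSPACE")` with `-D`. A sketch (function name and paths are illustrative; actually running the command requires macOS, so this only constructs it):

```go
package main

import (
	"fmt"
	"os/exec"
)

// sandboxedCommand wraps a tool command in macOS sandbox-exec with the
// compiled profile, passing the workspace path as a profile parameter.
func sandboxedCommand(profilePath, workspace string, argv []string) *exec.Cmd {
	args := []string{
		"-f", profilePath,
		"-D", "WORKSPACE=" + workspace, // bound to (param "WORKSPACE") in the profile
	}
	args = append(args, argv...)
	return exec.Command("sandbox-exec", args...)
}

func main() {
	cmd := sandboxedCommand("/tmp/worker1.sb", "/workspace", []string{"/bin/sh", "-c", "ls"})
	fmt.Println(cmd.Args)
}
```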
| Tool | Description | Cedar Action |
|---|---|---|
| read | Read file contents (with line limits) | Action::"tool/read" |
| write | Create or overwrite a file | Action::"tool/write" |
| edit | Apply a targeted string replacement | Action::"tool/edit" |
| bash | Execute a shell command (sandboxed) | Action::"tool/bash" |
| grep | Search file contents with regex | Action::"tool/grep" |
| find | Find files by glob pattern | Action::"tool/find" |
| ls | List directory contents | Action::"tool/ls" |
These 7 tools are the default set, matching pi-mono's philosophy. They're sufficient for most coding tasks.
Additional tools can be enabled via Cedar policy:
| Tool | Description | Cedar Action |
|---|---|---|
| http | Make HTTP requests (via proxy) | Action::"tool/http" |
| task | Create/update kanban tasks | Action::"tool/task" |
| session_read | Read another session's context | Action::"tool/session_read" |
| plan | Write/update the plan document | Action::"tool/plan" |
The tool registry checks Cedar before exposing tools in the LLM's tool
schema. If a policy doesn't permit Action::"tool/http" for the current
worker, that tool simply isn't included in the API call.
The HTTP tool is designed to never expose secrets to the agent.
Architecture options (see Section 10.1 for detailed trade-offs):
Recommended: Built-in proxy sidecar inspired by leash
Rather than depending on the full leash container setup, sqzrd runs a lightweight HTTP proxy on localhost that:
- Receives requests from the sandboxed agent (which can only reach localhost)
- Checks Cedar policy for the target URL/method
- Injects authorization headers from a credential store the agent cannot access
- Forwards the request
- Returns the response (optionally filtering sensitive headers)
- Logs the request for audit
Credential store:
~/.sqzr/credentials/
├── github.json # {"header": "Authorization", "value": "Bearer ghp_..."}
├── openai.json # {"header": "Authorization", "value": "Bearer sk-..."}
└── internal-api.json
The agent sees the http tool but never sees credentials. Cedar policies
control which URLs the agent can access:
permit (
principal is Worker,
action == Action::"tool/http",
resource
)
when { resource.host == "api.github.com" && context.method == "GET" };
The proxy maps resource.host to the appropriate credential file.
A session is a logical unit of work. It has:
- A unique ID
- A filesystem-backed context store (Section 2.2)
- A kanban task board (Section 3)
- One or more attached workers
- A Cedar policy scope (which policies apply)
Sessions are created explicitly (via gRPC or TUI) and persist across daemon restarts.
A worker is a single LLM API connection within a session. Workers:
- Execute against a specific model/provider
- Share the session's context store (read-write)
- Share the session's kanban board
- Are independently sandboxed (each gets its own seatbelt profile)
- Can be added/removed from sessions dynamically
Multi-worker patterns:
- Parallel execution: Multiple workers tackle different tasks from the kanban board simultaneously
- Specialist workers: One worker uses Claude for reasoning, another uses GPT-4 for code generation
- Review worker: A dedicated worker reviews artifacts produced by others
Sessions can read (but not write to) other sessions' context stores. This is implemented via:
- The `session_read` tool, which is Cedar-gated
- A read-only view of the target session's `context/` and `kanban/` directories
- Entity hierarchy: `Session::"target" in Workspace::"default"` allows policies to scope cross-session access
// Allow research sessions to read from any session in the workspace
permit (
principal in Session::"research",
action == Action::"tool/session_read",
resource in Workspace::"default"
);
CREATE ──→ ACTIVE ──→ SUSPENDED ──→ ACTIVE (resume)
             │            │
             └──→ ARCHIVED ←──────┘
- ACTIVE: Has at least one worker. Accepting commands.
- SUSPENDED: No workers. Context is on disk. Can resume instantly.
- ARCHIVED: Explicitly closed. Read-only. Kept for reference.
type Provider interface {
// Stream sends a request and returns a channel of streaming events.
Stream(ctx context.Context, req *ChatRequest) (<-chan StreamEvent, error)
// Models returns available models for this provider.
Models() []ModelInfo
// CountTokens estimates token count for messages.
CountTokens(messages []Message) (int, error)
}
type ChatRequest struct {
Model string
Messages []Message
Tools []ToolDef
System string
MaxTokens int
Temperature float64
// Provider-specific options
Extra map[string]any
}
type StreamEvent struct {
Type EventType // TextDelta, ToolCall, Usage, Done, Error
Text string
Tool *ToolCallEvent
Usage *UsageEvent
Error error
}

| Provider | Notes |
|---|---|
| Anthropic | Primary. Messages API with streaming. Extended thinking. |
| OpenAI | Chat completions API. Responses API for newer models. |
| Google Gemini | Gemini API with streaming. |
Additional providers can be added by implementing the Provider interface.
The system is not provider-specific in its core — tool definitions, message
formats, and streaming events are normalized.
Each provider implements CountTokens using its respective tokenizer
(or estimation heuristic). The session manager uses this to:
- Track cumulative context size
- Trigger context reloads at the 200K threshold
- Select how many recent turns to include in a reload
- Enforce the 250K hard ceiling
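When a provider has no local tokenizer, a character-based heuristic (roughly four characters per token for English text) is a common fallback. A sketch; `Message` and `estimateTokens` are illustrative, and a real provider would substitute an exact tokenizer.

```go
package main

import "fmt"

// Message matches the provider abstraction's message type in spirit.
type Message struct {
	Role, Content string
}

// estimateTokens is a fallback CountTokens heuristic: ~4 characters per
// token, summed over role and content of every message.
func estimateTokens(messages []Message) int {
	total := 0
	for _, m := range messages {
		total += (len(m.Content) + len(m.Role)) / 4
	}
	return total
}

func main() {
	msgs := []Message{
		{Role: "user", Content: "Parse Cedar policy files from ~/.sqzr/policies/"},
	}
	fmt.Println(estimateTokens(msgs))
}
```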
service Sqzr {
// Session management
rpc CreateSession(CreateSessionRequest) returns (Session);
rpc ListSessions(ListSessionsRequest) returns (ListSessionsResponse);
rpc GetSession(GetSessionRequest) returns (Session);
rpc ArchiveSession(ArchiveSessionRequest) returns (Session);
// Worker management
rpc AttachWorker(AttachWorkerRequest) returns (Worker);
rpc DetachWorker(DetachWorkerRequest) returns (Empty);
// Interactive streaming — bidirectional
rpc Connect(stream ClientEvent) returns (stream ServerEvent);
// Task management
rpc ListTasks(ListTasksRequest) returns (ListTasksResponse);
rpc UpdateTask(UpdateTaskRequest) returns (Task);
// Policy management
rpc ReloadPolicies(Empty) returns (ReloadPoliciesResponse);
// Requirements & validation
rpc ValidateRequirements(ValidateRequest) returns (stream ValidationEvent);
rpc ListRequirements(ListRequirementsRequest) returns (ListRequirementsResponse);
}

The Connect RPC is the primary interaction channel:
ClientEvent types:
- `UserMessage` — user sends a message to a session
- `Abort` — cancel the current generation
- `TaskAction` — move a task, update a task
- `SessionSwitch` — change which session the client is viewing
ServerEvent types:
- `TextDelta` — streaming text from the model
- `ToolCall` — tool invocation (name, args)
- `ToolResult` — tool execution result
- `TaskBlocked` — a task has been blocked (alert!)
- `TaskUpdate` — task state change
- `SessionUpdate` — session metadata change
- `ValidationResult` — requirement validation pass/fail
- `RequirementFailed` — a requirement's validators failed (alert!)
- `Error` — error event
The TUI client (sqzr binary) connects to sqzrd via gRPC and provides:
- Session list with status indicators
- Active session view with streaming output
- Kanban board view (TODO/IN-PROGRESS/BLOCKED/REVIEW/DONE)
- Task detail view with block reason
- Policy violation notifications
- Multi-session switching (tabs or split)
Library choice: bubbletea — mature Go TUI framework, good for streaming content.
sqzrd (parent daemon)
├── gRPC server (goroutine)
├── policy engine (goroutine, watches policy files)
├── proxy server (goroutine, localhost HTTP proxy)
├── session manager
│ ├── session-A
│ │ ├── worker-1 (goroutine, LLM stream)
│ │ └── worker-2 (goroutine, LLM stream)
│ └── session-B
│ └── worker-1 (goroutine, LLM stream)
└── tool executor
└── sandbox-exec child processes (forked per tool call)
- Load Cedar policies from `~/.sqzr/policies/*.cedar`
- Compile seatbelt profile templates
- Start the HTTP proxy on a random localhost port
- Start the gRPC server on a configured address (default: `unix:///tmp/sqzr.sock`)
- Restore any ACTIVE or SUSPENDED sessions from disk
- Begin accepting connections
- SIGHUP: Reload Cedar policies and recompile seatbelt profiles
- SIGTERM/SIGINT: Graceful shutdown — suspend all sessions, stop workers
- SIGUSR1: Dump session state to stderr (debugging)
Option A: Depend on Leash directly
- Pros: Full-featured, battle-tested, container isolation, web UI
- Cons: Requires Docker/Podman, heavy dependency, redundant with our seatbelt sandboxing, adds container orchestration complexity
Option B: Built-in lightweight proxy (Recommended)
- Pros: Single binary, no container dependency, Cedar-native policy, simpler deployment, credential injection fits naturally into our auth model
- Cons: Less isolation than a container boundary, must implement ourselves
Option C: Build on leash as a library
- Pros: Reuse leash's proxy logic without the container model
- Cons: Leash is Go but tightly coupled to its container model, would require significant forking
Decision: Option B. sqzr already has seatbelt for process isolation and Cedar for policy. A built-in proxy keeps the architecture simple and self-contained. We can revisit container-based isolation later if needed (e.g., Linux support where seatbelt isn't available).
Option A: In-flight compaction (pi-mono style)
- Summarize older context, keep recent turns
- Pros: No visible interruption, gradual degradation
- Cons: Lossy, summary quality varies, still accumulates drift
Option B: Hard restart with filesystem reload (Recommended)
- Kill session at threshold, restart with curated context from disk
- Pros: Predictable quality, clean context window, all context preserved on disk, simple to reason about
- Cons: Visible restart (brief pause), requires good summary generation
Decision: Option B. The filesystem-backed approach means nothing is ever lost. The model gets a clean, high-quality context window every time. The restart is a feature, not a bug — it prevents the degradation that compaction only delays.
Option A: Sub-agent spawning (tool that creates new sessions)
- Pros: Familiar pattern, explicit delegation
- Cons: Complex lifecycle management, hard to share context, unclear ownership
Option B: Shared filesystem with read-only cross-access (Recommended)
- Pros: Simple, auditable, Cedar-governed, no new protocol needed
- Cons: Eventually consistent (file writes aren't instant), no real-time notification between sessions
Option C: Message passing between sessions
- Pros: Real-time, explicit communication
- Cons: Complex protocol, needs its own Cedar policy layer, over-engineering
Decision: Option B. Sessions write to their own context store. Other
sessions read from it via the session_read tool. This is simple, auditable,
and naturally governed by Cedar. If real-time coordination is needed later,
we can add a lightweight event bus.
Option A: Workers as goroutines (Recommended)
- Pros: Cheap, fast communication, shared memory for session state
- Cons: A panic in one worker could crash the daemon
Option B: Workers as child processes
- Pros: Process isolation, crash containment
- Cons: Complex IPC, expensive, over-engineering for this use case
Decision: Option A. Use goroutines with panic recovery. Tool execution (the risky part) is already isolated in sandbox-exec child processes.
Option A: gRPC with bidirectional streaming (Recommended)
- Pros: Well-defined, supports remote access, good Go libraries, protobuf schema, works over TCP or Unix socket
- Cons: Slightly heavier than a custom protocol
Option B: Custom JSON-lines over Unix socket
- Pros: Simpler, no protobuf dependency
- Cons: Must define our own framing, no codegen, harder to extend
Decision: Option A. gRPC gives us remote access for free and the protobuf schema serves as documentation. The overhead is negligible.
An interesting consequence of Cedar-governed tools: the set of tools exposed to the LLM in each API call is dynamic and policy-dependent.
At the start of each turn:
- Enumerate all registered tools
- For each tool, construct a Cedar request: `principal=<worker>, action=<tool-action>, resource=<workspace>`
- Evaluate against the policy set
- Only include tools that are `Allow` in the LLM's tool schema
This means:
- A restrictive policy automatically reduces the tool set
- Different workers in the same session can have different tools
- Policy changes take effect on the next turn without restarts
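The per-turn filter can be sketched as below. `ToolDef`, `exposedTools`, and the injected `authorize` callback are illustrative; the callback stands in for the Cedar evaluation described above.

```go
package main

import "fmt"

// ToolDef is the schema entry sent to the provider.
type ToolDef struct {
	Name   string
	Action string // e.g. `Action::"tool/http"`
}

// exposedTools returns only the tools the policy allows for this worker,
// so denied tools never appear in the LLM's tool schema at all.
func exposedTools(all []ToolDef, worker string, authorize func(worker, action string) bool) []ToolDef {
	var out []ToolDef
	for _, t := range all {
		if authorize(worker, t.Action) {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	all := []ToolDef{
		{"read", `Action::"tool/read"`},
		{"bash", `Action::"tool/bash"`},
		{"http", `Action::"tool/http"`},
	}
	// Toy policy: core tools allowed, network tools denied.
	authorize := func(worker, action string) bool { return action != `Action::"tool/http"` }
	for _, t := range exposedTools(all, "worker-1", authorize) {
		fmt.Println(t.Name) // read, bash
	}
}
```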
sqzr/
├── cmd/
│ ├── sqzrd/ # Daemon entry point
│ └── sqzr/ # TUI client entry point
├── internal/
│ ├── daemon/ # Daemon lifecycle, signal handling
│ ├── session/ # Session & worker management
│ ├── context/ # Filesystem context store, summaries
│ ├── kanban/ # Task board implementation
│ ├── artifacts/ # Tagged artifact system (decisions, requirements, tasks)
│ ├── validate/ # Requirement validation runner
│ ├── policy/ # Cedar policy engine, seatbelt compiler
│ ├── sandbox/ # Seatbelt profile generation & execution
│ ├── provider/ # LLM provider abstraction
│ │ ├── anthropic/
│ │ ├── openai/
│ │ └── google/
│ ├── tools/ # Tool implementations
│ │ ├── read.go
│ │ ├── write.go
│ │ ├── edit.go
│ │ ├── bash.go
│ │ ├── grep.go
│ │ ├── find.go
│ │ ├── ls.go
│ │ ├── http.go
│ │ ├── task.go
│ │ └── session_read.go
│ ├── proxy/ # HTTP proxy for credential injection
│ └── tui/ # TUI components (bubbletea)
├── proto/
│ └── sqzr/v1/ # Protobuf definitions
├── policies/ # Default Cedar policies
│ ├── default.cedar
│ └── examples/
├── go.mod
└── go.sum
- Go module setup, basic project structure
- Provider abstraction with Anthropic support
- Core tool implementations (read, write, edit, bash, grep, find, ls)
- Basic agent loop: system prompt → user message → stream → tool calls → loop
- Tagged artifact system (frontmatter parsing, tag index)
- Filesystem context store (turns, decisions/, requirements/)
- Token tracking and 250K hard ceiling with restart
- Cedar policy engine integration (using cedar-go)
- Tool authorization (Cedar check before every tool call)
- Seatbelt profile compilation from Cedar
- Sandboxed bash execution via sandbox-exec
- Dynamic tool schema based on policy
- Session manager with create/list/archive
- Kanban task board (filesystem-backed, tagged)
- Requirements as first-class artifacts with validators
- Validation runner (on-demand, on-task-completion, periodic)
- Planning workflow (plan document + task generation with tag inheritance)
- Tag-based context loading on session restart
- Blocked task detection and alerting
- Protobuf definitions (including ValidateRequirements RPC)
- gRPC server in sqzrd
- Bidirectional streaming (Connect RPC)
- TUI client with bubbletea
- Session switching, kanban view, streaming output
- Validation results display and requirement failure alerts
- Multiple workers per session
- Cross-session read access (session_read tool)
- HTTP proxy with credential injection
- HTTP tool with Cedar-governed URL allowlisting
- SIGHUP policy reload
- Graceful shutdown and session persistence
- Audit logging
- Error recovery (worker crash, provider timeout)
- Documentation and default policy examples
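The sandboxed bash item above reduces to wrapping the agent's shell command in `sandbox-exec` with the compiled seatbelt profile. A sketch of the command construction (the helper name and profile path are illustrative; actually running the command is macOS-only):

```go
package main

import (
	"fmt"
	"os/exec"
)

// sandboxedBash builds the command that runs an agent's shell script
// inside the seatbelt profile compiled from the session's Cedar policy.
func sandboxedBash(profilePath, script string) *exec.Cmd {
	// Equivalent to: sandbox-exec -f <profile> /bin/bash -c <script>
	return exec.Command("sandbox-exec", "-f", profilePath, "/bin/bash", "-c", script)
}

func main() {
	cmd := sandboxedBash("/tmp/session-a.sb", "go test ./...")
	fmt.Println(cmd.Args)
	// [sandbox-exec -f /tmp/session-a.sb /bin/bash -c go test ./...]
}
```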
- Linux support: Seatbelt is macOS-only. For Linux, the equivalent would be seccomp-bpf + mount namespaces (or just containers). Should we abstract the sandbox interface now, or defer Linux support?
- Token counting accuracy: Provider tokenizers differ. Should we use exact tokenizer libraries (tiktoken for OpenAI, Anthropic's tokenizer) or a rough heuristic (chars/4)?
- Summary generation model: Should the summary be generated by the same model/provider as the session, or a cheaper/faster model? Using a cheaper model saves cost but might miss nuance.
- Kanban task assignment: When multiple workers are available, how are tasks assigned? Options: round-robin, explicit assignment, workers self-select from the TODO queue.
- Cross-session write: Currently read-only. Are there cases where sessions should be able to write to each other's context? (E.g., a coordinator session assigning tasks to worker sessions.)
- Policy hot-reload scope: When policies are reloaded via SIGHUP, should active workers be immediately re-evaluated (potentially losing tool access mid-turn), or should changes only apply to new turns?
- Tag cardinality: Should there be a recommended maximum number of tags per artifact? Too many tags dilute the signal; too few lose the benefit of selective loading. Practical experience will inform this.
- Requirement scope: Are requirements session-scoped (living inside a session's context store) or workspace-scoped (shared across all sessions)? Session-scoped is simpler but means duplicate requirements across sessions working on the same codebase. Workspace-scoped is more natural but needs its own directory and access control.
- Validator trust: Validators are shell commands. Should they run in the same seatbelt sandbox as agent tool calls, or a less restrictive one? They need to read build artifacts and run tests, which may require more access than the agent's write sandbox.
- Tag-based context budget: When loading tag-relevant decisions and requirements, how much of the token budget should they consume? Fixed allocation (e.g., 20K tokens) vs. proportional (e.g., 10% of budget)?
| Layer | Mechanism | What it prevents |
|---|---|---|
| Application | Cedar policy evaluation | Unauthorized tool use, file access, HTTP requests |
| Kernel | macOS seatbelt (sandbox-exec) | Sandbox escapes, direct syscall bypasses |
| Network | Localhost-only + proxy | Direct outbound connections, credential exposure |
| Credential | Proxy-injected auth, separate store | Agent seeing API keys or tokens |
| Context | 250K ceiling + restart | Model degradation, prompt injection accumulation |
| Cross-session | Read-only + Cedar-gated | Unauthorized cross-session data access |
Defense in depth: even if the Cedar check has a bug, seatbelt prevents the operation at the kernel level. Even if seatbelt is misconfigured, the proxy prevents credential exposure. Each layer independently limits blast radius.