Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Select an option

  • Save drewstone/3c891031ccc652c882b3aecd58614973 to your computer and use it in GitHub Desktop.

Select an option

Save drewstone/3c891031ccc652c882b3aecd58614973 to your computer and use it in GitHub Desktop.

PR Review Provenance β€” tangle-network/blueprint-agent#1486

Generated by pr-reviewer v0.5.0

Plan

{
  "coverage_gaps": [
    "No separate track for log removal or TypeScript nullability cleanup because those are mechanical and fully covered by the bootstrap and harness tracks.",
    "No per-attack decomposition inside the security test file because the meaningful risk clusters are authorization contract correctness and event-collection reliability, not each individual test block.",
    "No standalone application-feature track beyond the modules needed to validate the tests, because the diff does not change product logic directly."
  ],
  "recommended_provider_mix": "Use a strongest reasoning model on `orchestrator-session-authz-contract` because it requires reconstructing the real security contract across tests and gateway code. A standard reasoning model is enough for `websocket-collector-flake-surface` and `token-endpoint-suite-bootstrap`, where the work is mainly async semantics and setup/failure-mode analysis. A fast provider is sufficient for `turbo-typecheck-build-graph` because the task is mostly dependency-graph inspection and cycle detection.",
  "risks": [
    "The tests may now assert the intended security model rather than the current shipped behavior, so review must distinguish 'good failing test' from 'incorrect oracle'.",
    "Both security files depend on timing-sensitive integration infrastructure; a race in event collection can look like a product bug or mask one.",
    "The Turbo graph change affects every package transitively, so a small config diff can break unrelated pipelines outside the changed test area."
  ],
  "summary": "This change mostly hardens integration and security tests from 'documented concern' to 'must fail/must reject' assertions, plus it changes Turbo so every `typecheck` now pulls upstream `build` tasks. The real review risk is not the amount of test churn; it is whether the new assertions actually match orchestrator/session invariants, whether the rewritten WebSocket collector can produce false positives or flaky failures, and whether the Turbo graph change introduces build-order regressions or cycles.",
  "tracks": [
    {
      "evidence_targets": [
        "Different `userId` with the same `sessionId` must be rejected at gateway connect time, not merely isolated after connect",
        "Session mapping registration must bind `sessionId` to the original user/container and reject overwrite attempts",
        "JWT claim usage for `sub` and `sid` must match the tests' assumptions",
        "The chat-isolation test's marker-based contamination check must line up with how backlog/replay events are actually delivered"
      ],
      "goal": "Verify that the strengthened security assertions reflect the actual session authorization contract and will catch real regressions rather than encode a mistaken assumption.",
      "scope": [
        "apps/web/tests/integration/security/session-security-attacks.test.ts",
        "apps/web/app/routes/api.sessions.$id.token.ts",
        "apps/web/app/lib/session-token.ts",
        "apps/web/app/lib/services/devcontainer/devcontainer-websocket-adapter.ts",
        "orchestrator SessionGateway and session-mapping code"
      ],
      "should_use_subagents": true,
      "suggested_provider": "reasoning-strong",
      "track_id": "orchestrator-session-authz-contract"
    },
    {
      "evidence_targets": [
        "Single-settle behavior across `onerror`, `onclose`, hard timeout, and manual close",
        "Whether `connected` can be reported true for sessions that never authenticated but happened to close with code 1000 or emitted a non-auth event",
        "Safety of clearing `collectTimer` before assignment and of closing the socket inside competing callbacks",
        "Assumption that producing an event before connection guarantees that only one chat can observe it during the collection window"
      ],
      "goal": "Review the `connectAndCollectEvents` helper rewrite for race conditions, timing bugs, and semantic shifts that could make the new security assertions flaky or misleading.",
      "scope": [
        "apps/web/tests/integration/security/session-security-attacks.test.ts"
      ],
      "should_use_subagents": false,
      "suggested_provider": "reasoning",
      "track_id": "websocket-collector-flake-surface"
    },
    {
      "evidence_targets": [
        "Whether `ensureProductRegistered()` is expected to be mandatory in all supported CI/local integration environments",
        "Whether failing `beforeAll` is preferable to silently skipping token tests when product registration is unavailable",
        "Any remaining code paths that assume `tokenIssuer` or `productCredentials` can be absent",
        "Cleanup and test isolation behavior when setup now throws earlier"
      ],
      "goal": "Verify that removing the defensive null/skip path in the real token endpoint integration suite is the right failure mode and does not introduce brittle setup behavior.",
      "scope": [
        "apps/web/tests/integration/flows/token-endpoint-real.test.ts",
        "product registration and token issuer test helpers used by this suite"
      ],
      "should_use_subagents": false,
      "suggested_provider": "balanced",
      "track_id": "token-endpoint-suite-bootstrap"
    },
    {
      "evidence_targets": [
        "Whether any package has `build` depending on `typecheck`, creating a cycle once `typecheck` depends on `^build`",
        "Whether upstream builds are actually required for accurate downstream typechecking in this repo",
        "Impact on CI/runtime for commands that previously used pure typecheck paths",
        "Whether this change will surface unrelated build failures in workflows that only intended to validate types"
      ],
      "goal": "Confirm that adding `^build` as a `typecheck` dependency is intentional, cycle-free, and worth the extra graph coupling.",
      "scope": [
        "turbo.json",
        "workspace package `build` and `typecheck` tasks that participate in Turbo dependency resolution"
      ],
      "should_use_subagents": false,
      "suggested_provider": "fast",
      "track_id": "turbo-typecheck-build-graph"
    }
  ]
}

01-orchestrator-session-authz-contract

claude β€” 4 findings

Scope: apps/web/tests/integration/security/session-security-attacks.test.ts, apps/web/app/routes/api.sessions.$id.token.ts, apps/web/app/lib/session-token.ts, apps/web/app/lib/services/devcontainer/devcontainer-websocket-adapter.ts, orchestrator SessionGateway and session-mapping code

  • πŸ”΄ [high] Attack 2 test helper doesn't bind session mapping to userId β€” assertion may encode mistaken assumption (apps/web/tests/integration/security/session-security-attacks.test.ts:116)

    The test asserts that the orchestrator rejects an attacker's WebSocket connection because 'the session mapping was registered for the victim's userId' (line 319-321). But the test's local registerSessionMapping helper (lines 116-131) sends POST /session/:id/mapping with body { sidecarSessionId, sidecarId } β€” no userId is included.

The production registerSessionMapping in client.ts (lines 758-795) uses PUT (not POST) and passes requesterUserId via the x-user-id header. The test helper omits this entirely.

This means either: (a) the orchestrator has no userId to compare against, so the test will fail at runtime (wrong HTTP method or no user binding), catching a real gap but for the wrong reason; or (b) the orchestrator happens to reject the connection for a different reason (e.g., it doesn't validate userId at all and the test is aspirational). Either way, the test's stated security contract ('registered for the victim's userId') is not actually established by the test setup.

Test helper (line 122): `POST /session/${agentSessionId}/mapping` with body `{ sidecarSessionId, sidecarId: containerId }` β€” no userId.
Production code (client.ts line 770-774): `PUT /session/${agentSessionId}/mapping` with header `x-user-id: requesterUserId`.
Assertion at line 322: `expect(attackResult.connected).toBe(false)` β€” assumes userId-based rejection that was never set up.
  • 🟠 [medium] Chat isolation contamination check (Attack 4) can pass vacuously β€” no positive assertion that Chat A received the marker (apps/web/tests/integration/security/session-security-attacks.test.ts:456)

    The test triggers executeSidecarCommand(container.containerId, ...) before connecting Chat A and Chat B, then asserts Chat B's events don't contain the marker string. But it never asserts that Chat A DID receive the marker. If the echo command's stdout doesn't flow through either WebSocket session's event stream (because executeSidecarCommand targets the container-level sidecar, not Chat A's specific sidecar session), then both sessions get zero marker events and the contamination check passes trivially.

Without asserting the positive case (resultA.events should contain the marker), this test cannot distinguish 'isolation works' from 'the marker was never delivered to anyone'.

Line 445: `executeSidecarCommand(container.containerId, ...)` β€” routes to container, not to Chat A's sidecar session `sidecarA`.
Lines 456-458: Only checks negative case: `const contaminated = chatBEventStrings.filter(s => s.includes('MESSAGE_FOR_CHAT_A_ONLY')); expect(contaminated).toHaveLength(0);`
No assertion that `resultA.events` contains the marker.
  • 🟠 [medium] Attack 6 (session mapping overwrite) test also lacks userId binding β€” tests idempotency, not authorization (apps/web/tests/integration/security/session-security-attacks.test.ts:521)

    Attack 6 tests that a second registerSessionMapping call for the same agentSessionId returns false (line 528: expect(attackerMapped).toBe(false)). Like Attack 2, the test's helper doesn't send userId, so this tests whether the orchestrator rejects any overwrite of an existing mapping (idempotency guard), not whether it rejects overwrites from a different user specifically. The comment on line 524-526 says 'events meant for the victim would route to the attacker's sidecar' β€” but if the orchestrator simply allows overwrite from the same user (e.g., session reconnect), the attack wouldn't be caught.

    Line 521-525: `registerSessionMapping(victimSessionId, attackerSidecar!, container.containerId)` β€” no userId passed.
    

Production code passes requesterUserId via x-user-id header for authorization. The test checks mapping-level rejection but the real security contract is user-level rejection.


- 🟑 **[low]** Attack 5 privilege escalation test no longer exercises the actual vulnerability (`apps/web/tests/integration/security/session-security-attacks.test.ts:471`)
> The old test intentionally exposed a real issue: calling `tokenIssuer.issue({ tier: 'enterprise' })` for a free user produces an enterprise-TTL token. The new test issues with `tier: 'free'` and confirms the TTL is 15 minutes β€” this only verifies the issuer's happy path. The comment at lines 476-478 acknowledges the real vulnerability is at the route level, which is good documentation, but the test no longer serves as a regression canary for the privilege escalation vector it's named after. The test title says 'Free tier token gets free tier TTL' which is honest, but it's filed under 'ATTACK 5: Privilege Escalation' which is misleading.

Line 471: title says 'πŸ”΄ MUST ENFORCE: Free tier token gets free tier TTL' but the attack section is 'Privilege Escalation'. Lines 476-478 acknowledge: 'The real privilege-escalation vulnerability is at the route level... That requires a route-level integration test against /api/sessions/:id/token.' The issuer is tested to do exactly what it's told β€” this is a unit test of correct behavior, not a security attack test.


### codex β€” 3 findings
Scope: `apps/web/tests/integration/security/session-security-attacks.test.ts, apps/web/app/routes/api.sessions.$id.token.ts, apps/web/app/lib/session-token.ts, apps/web/app/lib/services/devcontainer/devcontainer-websocket-adapter.ts, orchestrator SessionGateway and session-mapping code`

- 🟠 **[medium]** The new IDOR check bypasses the real auth boundary and assumes an unverified gateway-side `sub` check (`apps/web/tests/integration/security/session-security-attacks.test.ts:307`)
> The production contract visible in this repo is: a user must first pass `/api/sessions/:id/token`, which looks up the DB session and returns 404 when `session.userId !== userId`. This test does not exercise that path. It mints both victim and attacker JWTs directly with `ProductTokenIssuer.issue(...)` and then requires the gateway to reject the attacker at connect time because the JWT `sub` differs. I could not find any local evidence that the gateway/server consumes `sub` that way; the only real integration coverage nearby proves `valid token + missing mapping => 4004`, not `valid token + wrong user => rejected`. As written, this test will fail unless there is an orchestrator change outside this checkout, and even then it no longer reflects the web app's actual authorization contract.

The test forges the attacker token locally: const attackerToken = tokenIssuer.issue({ userId: 'attacker-user-a', sessionId: victimSessionId, tier: 'paid' }); and then asserts expect(attackResult.connected).toBe(false); at lines 307-323. But the real ownership check lives in apps/web/src/routes/api.sessions.$id.token.ts:80-94, where the route returns 404 if session.userId !== userId before issuing any token.


- 🟠 **[medium]** The overwrite-rejection assertion contradicts the existing wake/remap contract for session mappings (`apps/web/tests/integration/security/session-security-attacks.test.ts:521`)
> This test treats any second registration of the same `agentSessionId` as a security failure. But another integration test in the repo explicitly models wake flow by registering the same `agentSessionId` twice with different `sidecarSessionId` values and expects the second call to update the mapping successfully. Unless the orchestrator now distinguishes same-owner remaps from hijacks, these two tests assert opposite contracts. That means this new check can fail on behavior the rest of the suite already documents as required for reprovision/wake recovery.

New assertion: const attackerMapped = await registerSessionMapping(victimSessionId, attackerSidecar!, container.containerId); ... expect(attackerMapped).toBe(false); at lines 521-530. Existing wake-flow test: const newResult = await registerSessionMapping(agentSessionId, newSidecarSessionId, containerId); ... expect(newResult.sidecarSessionId).toBe(newSidecarSessionId); in apps/web/tests/integration/flows/session-ownership-isolation.test.ts:548-559.


- 🟠 **[medium]** The marker-based contamination check is detached from any session-scoped backlog/replay path (`apps/web/tests/integration/security/session-security-attacks.test.ts:442`)
> The test assumes that running `executeSidecarCommand(container.containerId, ...)` before either WebSocket connects will create a Chat A-specific event that should later appear in A's stream and not B's. The local code does not support that assumption. `executeSidecarCommand` is container-scoped and posts only `{ command, timeout, cwd? }` to `/sidecars/:id/terminals/commands`; it does not take `agentSessionId` or `sidecarSessionId`. Separately, pre-connection recovery is handled by an explicit sidecar messages fetch or `client.replay(0)` for existing sessions, not by a plain initial `subscribe` message. So the echoed marker is not guaranteed to be associated with Chat A or replayed to either connection. This check can therefore pass vacuously while still missing a real backlog-misrouting regression.

The test injects the marker with await executeSidecarCommand(container.containerId, \echo "${marker}"`, process.env);before connecting (lines 442-449), then only sends{"type":"subscribe","channels":["*"]}on connect (lines 77-78). Butapps/web/src/lib/.server/orchestrator/terminal.ts:49-69showsexecuteSidecarCommandis container-only, andapps/web/src/ui-provider/DevContainerEventsProvider.tsx:1515-1545plus2207-2227show backlog recovery uses/api/sidecar/.../messages?agentSessionId=...andclient.replay(0)` only for existing sessions, not automatic replay on initial subscribe.


## 02-websocket-collector-flake-surface

### claude β€” 2 findings
Scope: `apps/web/tests/integration/security/session-security-attacks.test.ts`

- 🟠 **[medium]** Attack 4 cross-chat test passes vacuously β€” never asserts marker was received by Chat A (`apps/web/tests/integration/security/session-security-attacks.test.ts:458`)
> The cross-chat contamination test (Attack 4) calls `executeSidecarCommand(container.containerId, ...)` to produce a marker event, then asserts Chat B's events don't contain the marker. But it never asserts Chat A's events DO contain the marker. `executeSidecarCommand` targets the container via HTTP (`/sidecars/:id/terminals/commands`), not a specific sidecar session's WebSocket stream. If the terminal output doesn't route to any WebSocket session β€” which is likely since the command is executed via HTTP, not through a session β€” then both chats receive zero events containing the marker, the `contaminated` array is empty, and the test passes while proving nothing about session isolation. The test needs a positive control: `expect(resultA.events.some(e => JSON.stringify(e).includes('MESSAGE_FOR_CHAT_A_ONLY'))).toBe(true)` to confirm the marker actually entered the event stream.

const chatBEventStrings = resultB.events.map((e) => JSON.stringify(e)); const contaminated = chatBEventStrings.filter((s) => s.includes('MESSAGE_FOR_CHAT_A_ONLY')); expect(contaminated).toHaveLength(0); // No assertion that resultA.events contains the marker β€” test passes with 0 events in both streams


- 🟑 **[low]** onclose `connected` heuristic can report true for unauthenticated connections closing with code 1000 (`apps/web/tests/integration/security/session-security-attacks.test.ts:96`)
> Line 96: `connected: events.length > 0 || event.code === 1000`. If the orchestrator (or an intermediate proxy) closes the WebSocket with code 1000 after rejecting authentication, `connected` would be `true` even though the session was never authenticated. Security tests for Attacks 1 and 2 assert `expect(attackResult.connected).toBe(false)` β€” these assertions would silently stop catching auth bypass regressions if the orchestrator's close behavior changes to use 1000 for rejected connections. This is not a regression (the old code had the same heuristic in onclose), but the new settle pattern now makes onclose's result more reliably delivered (no more race with the collect timer's `resolve`), which slightly increases the surface where this matters.

ws.onclose = (event) => { settle({ connected: events.length > 0 || event.code === 1000, events, error: event.code !== 1000 ? closed: ${event.code} ${event.reason} : undefined, }); };


### codex β€” 2 findings
Scope: `apps/web/tests/integration/security/session-security-attacks.test.ts`

- 🟠 **[medium]** `connected` now reports success for clean shutdowns and arbitrary frames, not authenticated sessions (`apps/web/tests/integration/security/session-security-attacks.test.ts:94`)
> The rewritten helper no longer uses an authentication-specific signal. In `onclose` it marks `connected` true whenever the close code is `1000`, and in the collection timer it calls `ws.close()` and immediately settles `{ connected: true }` before waiting for the close event. The new IDOR test then treats `attackResult.connected === false` as proof the attacker was rejected and `victimResult.connected === true` as proof the victim was accepted. That is not what this helper measures anymore: a session that never reaches `connection.established` can still be reported as connected if the server closes normally or emits any non-auth frame. A neighboring integration test already documents that the gateway can accept the upgrade and then reject the session with a close code like `4004`, so these assertions need a close code or explicit connection event, not this boolean.

ws.onclose = (event) => { settle({ connected: events.length > 0 || event.code === 1000, ... }) } and collectTimer = setTimeout(() => { ws.close(); settle({ connected: true, events }); }, timeoutMs - 500); are used by expect(attackResult.connected).toBe(false) / expect(victimResult.connected).toBe(true) at lines 322-325. Compare with apps/web/tests/integration/backend/token-lifecycle.test.ts:354-364, which explicitly treats a post-upgrade 4004 close as the expected rejection path.


- 🟠 **[medium]** Cross-chat contamination case can pass without proving Chat A owned or even received the marker (`apps/web/tests/integration/security/session-security-attacks.test.ts:445`)
> The new contamination test assumes that emitting output before the websockets connect makes that output belong to Chat A and available for replay during the collection window. The code does not establish either property. It sends the marker with `executeSidecarCommand(container.containerId, ...)`, which is container-scoped rather than session-scoped, then only asserts that Chat B did not see the marker. There is no assertion that Chat A saw it. If pre-connection terminal output is dropped for both sessions, or if this command is not attributable to `chatA` at all, the test still passes while proving nothing about session isolation.

The test does await executeSidecarCommand(container.containerId, echo "${marker}", process.env); before Promise.all([connectAndCollectEvents(tokenA.token, 5000), connectAndCollectEvents(tokenB.token, 5000)]), then only checks resultB.events for contamination at lines 457-460. The helper it calls is container-only: executeSidecarCommand(sidecarId: string, command: string, contextEnv = process.env, timeoutMs = 30000, cwd?: string, requesterUserId?: string) in apps/web/src/lib/.server/orchestrator/terminal.ts:49-56, so no sidecarSessionId ties the marker to Chat A.


## 03-token-endpoint-suite-bootstrap

### claude β€” 2 findings
Scope: `apps/web/tests/integration/flows/token-endpoint-real.test.ts, product registration and token issuer test helpers used by this suite`

- 🟠 **[medium]** Stale null guard on productCredentials silently skips one test (`apps/web/tests/integration/flows/token-endpoint-real.test.ts:278`)
> The 'should reject token with wrong signing secret' test still has `if (!productCredentials) { console.warn(...); return; }` at line 278. With the new non-nullable types, this guard is dead code on the happy path β€” but more importantly, it's inconsistent with the rest of the suite where the equivalent guards were removed. If the intent is 'fail loud when setup fails', this test is the one exception that would still silently pass. It should be removed for consistency with the rest of the PR.

if (!productCredentials) { console.warn('Product credentials not available - skipping'); return; }


- 🟑 **[low]** Unnecessary non-null assertion on tokenIssuer (`apps/web/tests/integration/flows/token-endpoint-real.test.ts:467`)
> `tokenIssuer!.issue(...)` at line 467 uses a non-null assertion that was needed when the type was `ProductTokenIssuer | null`. Now that the type is `ProductTokenIssuer` (non-nullable), the `!` is unnecessary. Not a bug, but it's a lint smell and contradicts the purpose of the type change.

tokenIssuer!.issue({


### codex β€” 1 findings
Scope: `apps/web/tests/integration/flows/token-endpoint-real.test.ts, product registration and token issuer test helpers used by this suite`

- 🟠 **[medium]** Fail-fast bootstrap now depends on the wrong admin-key env var (`apps/web/tests/integration/flows/token-endpoint-real.test.ts:39`)
> `beforeAll` now aborts the entire suite if `ensureProductRegistered()` cannot recreate the product, but this file still builds its admin auth header from `process.env.ADMIN_API_KEY` only. The repo's orchestrator tooling uses `ORCHESTRATOR_ADMIN_API_KEY` as the canonical variable (`scripts/register_orchestrator_product.sh`, `devscripts/commands/orchestrator.ts`), and another integration suite in this repo already had to fall back to `process.env.ORCHESTRATOR_ADMIN_API_KEY || process.env.ADMIN_API_KEY` for the same reason. In an environment where the admin port requires auth and only `ORCHESTRATOR_ADMIN_API_KEY` is set, this suite now sends no auth header, `POST /products` fails, and `beforeAll` throws before any tests run. The old null/skip path masked that mismatch; the new failure mode turns it into a hard regression.

const ADMIN_AUTH_HEADERS: Record<string, string> = process.env.ADMIN_API_KEY ? { Authorization: Bearer ${process.env.ADMIN_API_KEY} } : {}; together with productCredentials = await ensureProductRegistered(); in beforeAll means missing ADMIN_API_KEY now fails the whole suite. By contrast, scripts/register_orchestrator_product.sh reads ORCHESTRATOR_ADMIN_API_KEY, devscripts/commands/orchestrator.ts reads process.env.ORCHESTRATOR_ADMIN_API_KEY, and session-ownership-isolation.test.ts uses process.env.ORCHESTRATOR_ADMIN_API_KEY || process.env.ADMIN_API_KEY.


## 04-turbo-typecheck-build-graph

### claude β€” 0 findings
Scope: `turbo.json, workspace package `build` and `typecheck` tasks that participate in Turbo dependency resolution`

No findings from this reviewer.

### codex β€” 2 findings
Scope: `turbo.json, workspace package `build` and `typecheck` tasks that participate in Turbo dependency resolution`

- 🟠 **[medium]** `typecheck` now fails on unrelated upstream build steps, not just missing type artifacts (`turbo.json:22`)
> `turbo.json` now makes every `typecheck` depend on `^build`, so `apps/web` will build all direct workspace dependencies before running `react-router typegen && tsc --noEmit`. That is broader than what downstream type resolution needs. In particular, `apps/web` declares `@blueprint-agent/context-lego` and `@blueprint/litellm-sdk` as workspace deps, but the checked TypeScript sources do not import them: the only `context-lego` reference in `apps/web` is `vite.config.ts`, and that file is explicitly excluded from `apps/web/tsconfig.json`. `@blueprint-agent/context-lego`'s `build` script also runs a non-type step (`npm run copy-data`). After this change, any failure in that packaging step will turn a pure typecheck job red even though `apps/web`'s TypeScript graph is unchanged. That makes `typecheck` a less reliable signal for 'does the type graph compile?' and surfaces unrelated build failures in workflows that previously only validated types.

turbo.json:22 => "dependsOn": ["^build", "^typecheck"]; apps/web/package.json:48-50 declares "@blueprint-agent/context-lego", "@blueprint-agent/stream-adapter", and "@blueprint/litellm-sdk" as workspace deps; apps/web/tsconfig.json:6 excludes vite.config.ts and :33 path-maps "@blueprint-agent/stream-adapter" to local source; the only @blueprint-agent/context-lego usage under apps/web is vite.config.ts:284; packages/context-lego/package.json:8-9 => "build": "tsc && npm run copy-data".


- 🟑 **[low]** Upstream `typecheck` now does redundant double `tsc` work on every downstream run (`turbo.json:22`)
> For several workspace packages, `build` and `typecheck` are the same compiler pass with emit toggled (`tsc` vs `tsc --noEmit`). With `typecheck` depending on both `^build` and `^typecheck`, a downstream check like `apps/web#typecheck` now schedules both tasks for `@blueprint/plans`, `@blueprint/message-parser`, and `@blueprint/litellm-sdk`. There is no correctness benefit from running both when the reason for adding `^build` is just to materialize `dist/*.d.ts`; it only increases CI time for commands that used to be the fast, type-only path.

turbo.json:22 => "dependsOn": ["^build", "^typecheck"]; packages/plans/package.json:20-23 => "build": "tsc -p tsconfig.json", "typecheck": "tsc -p tsconfig.json --noEmit"; packages/message-parser/package.json:18-22 => "build": "tsc", "typecheck": "tsc --noEmit"; packages/litellm-sdk/package.json:18-22 => "build": "tsc", "typecheck": "tsc --noEmit".


## Validation (dedup + verification)
```json
{
"command": [],
"result": {
  "accepted_findings": [
    {
      "body": "The production contract visible in this repo is: a user must first pass `/api/sessions/:id/token`, which looks up the DB session and returns 404 when `session.userId !== userId`. This test does not exercise that path. It mints both victim and attacker JWTs directly with `ProductTokenIssuer.issue(...)` and then requires the gateway to reject the attacker at connect time because the JWT `sub` differs. I could not find any local evidence that the gateway/server consumes `sub` that way; the only real integration coverage nearby proves `valid token + missing mapping => 4004`, not `valid token + wrong user => rejected`. As written, this test will fail unless there is an orchestrator change outside this checkout, and even then it no longer reflects the web app's actual authorization contract.",
      "category": "testing",
      "confidence": "medium",
      "evidence": "The test forges the attacker token locally: `const attackerToken = tokenIssuer.issue({ userId: 'attacker-user-a', sessionId: victimSessionId, tier: 'paid' });` and then asserts `expect(attackResult.connected).toBe(false);` at lines 307-323. But the real ownership check lives in `apps/web/src/routes/api.sessions.$id.token.ts:80-94`, where the route returns 404 if `session.userId !== userId` before issuing any token.",
      "file": "apps/web/tests/integration/security/session-security-attacks.test.ts",
      "line": 307,
      "severity": "medium",
      "title": "The new IDOR check bypasses the real auth boundary and assumes an unverified gateway-side `sub` check"
    },
    {
      "body": "This test treats any second registration of the same `agentSessionId` as a security failure. But another integration test in the repo explicitly models wake flow by registering the same `agentSessionId` twice with different `sidecarSessionId` values and expects the second call to update the mapping successfully. Unless the orchestrator now distinguishes same-owner remaps from hijacks, these two tests assert opposite contracts. That means this new check can fail on behavior the rest of the suite already documents as required for reprovision/wake recovery.",
      "category": "regression",
      "confidence": "high",
      "evidence": "New assertion: `const attackerMapped = await registerSessionMapping(victimSessionId, attackerSidecar!, container.containerId); ... expect(attackerMapped).toBe(false);` at lines 521-530. Existing wake-flow test: `const newResult = await registerSessionMapping(agentSessionId, newSidecarSessionId, containerId); ... expect(newResult.sidecarSessionId).toBe(newSidecarSessionId);` in `apps/web/tests/integration/flows/session-ownership-isolation.test.ts:548-559`.",
      "file": "apps/web/tests/integration/security/session-security-attacks.test.ts",
      "line": 521,
      "severity": "medium",
      "title": "The overwrite-rejection assertion contradicts the existing wake/remap contract for session mappings"
    },
    {
      "body": "The test assumes that running `executeSidecarCommand(container.containerId, ...)` before either WebSocket connects will create a Chat A-specific event that should later appear in A's stream and not B's. The local code does not support that assumption. `executeSidecarCommand` is container-scoped and posts only `{ command, timeout, cwd? }` to `/sidecars/:id/terminals/commands`; it does not take `agentSessionId` or `sidecarSessionId`. Separately, pre-connection recovery is handled by an explicit sidecar messages fetch or `client.replay(0)` for existing sessions, not by a plain initial `subscribe` message. So the echoed marker is not guaranteed to be associated with Chat A or replayed to either connection. This check can therefore pass vacuously while still missing a real backlog-misrouting regression.",
      "category": "testing",
      "confidence": "high",
      "evidence": "The test injects the marker with `await executeSidecarCommand(container.containerId, \\`echo \"${marker}\"\\`, process.env);` before connecting (lines 442-449), then only sends `{\"type\":\"subscribe\",\"channels\":[\"*\"]}` on connect (lines 77-78). But `apps/web/src/lib/.server/orchestrator/terminal.ts:49-69` shows `executeSidecarCommand` is container-only, and `apps/web/src/ui-provider/DevContainerEventsProvider.tsx:1515-1545` plus `2207-2227` show backlog recovery uses `/api/sidecar/.../messages?agentSessionId=...` and `client.replay(0)` only for existing sessions, not automatic replay on initial subscribe.",
      "file": "apps/web/tests/integration/security/session-security-attacks.test.ts",
      "line": 442,
      "severity": "medium",
      "title": "The marker-based contamination check is detached from any session-scoped backlog/replay path"
    },
    {
      "body": "The test asserts that the orchestrator rejects an attacker's WebSocket connection because 'the session mapping was registered for the victim's userId' (line 319-321). But the test's local `registerSessionMapping` helper (lines 116-131) sends `POST /session/:id/mapping` with body `{ sidecarSessionId, sidecarId }` \u2014 no userId is included.\n\nThe production `registerSessionMapping` in `client.ts` (lines 758-795) uses `PUT` (not POST) and passes `requesterUserId` via the `x-user-id` header. The test helper omits this entirely.\n\nThis means either: (a) the orchestrator has no userId to compare against, so the test will fail at runtime (wrong HTTP method or no user binding), catching a real gap but for the wrong reason; or (b) the orchestrator happens to reject the connection for a different reason (e.g., it doesn't validate userId at all and the test is aspirational). Either way, the test's stated security contract ('registered for the victim's userId') is not actually established by the test setup.",
      "category": "testing",
      "confidence": "high",
      "evidence": "Test helper (line 122): `POST /session/${agentSessionId}/mapping` with body `{ sidecarSessionId, sidecarId: containerId }` \u2014 no userId.\nProduction code (client.ts line 770-774): `PUT /session/${agentSessionId}/mapping` with header `x-user-id: requesterUserId`.\nAssertion at line 322: `expect(attackResult.connected).toBe(false)` \u2014 assumes userId-based rejection that was never set up.",
      "file": "apps/web/tests/integration/security/session-security-attacks.test.ts",
      "line": 116,
      "severity": "high",
      "title": "Attack 2 test helper doesn't bind session mapping to userId \u2014 assertion may encode mistaken assumption"
    },
    {
      "body": "The test triggers `executeSidecarCommand(container.containerId, ...)` before connecting Chat A and Chat B, then asserts Chat B's events don't contain the marker string. But it never asserts that Chat A DID receive the marker. If the echo command's stdout doesn't flow through either WebSocket session's event stream (because `executeSidecarCommand` targets the container-level sidecar, not Chat A's specific sidecar session), then both sessions get zero marker events and the contamination check passes trivially.\n\nWithout asserting the positive case (`resultA.events` should contain the marker), this test cannot distinguish 'isolation works' from 'the marker was never delivered to anyone'.",
      "category": "testing",
      "confidence": "medium",
      "evidence": "Line 445: `executeSidecarCommand(container.containerId, ...)` \u2014 routes to container, not to Chat A's sidecar session `sidecarA`.\nLines 456-458: Only checks negative case: `const contaminated = chatBEventStrings.filter(s => s.includes('MESSAGE_FOR_CHAT_A_ONLY')); expect(contaminated).toHaveLength(0);`\nNo assertion that `resultA.events` contains the marker.",
      "file": "apps/web/tests/integration/security/session-security-attacks.test.ts",
      "line": 456,
      "severity": "medium",
      "title": "Chat isolation contamination check (Attack 4) can pass vacuously \u2014 no positive assertion that Chat A received the marker"
    },
    {
      "body": "Attack 6 tests that a second `registerSessionMapping` call for the same agentSessionId returns false (line 528: `expect(attackerMapped).toBe(false)`). Like Attack 2, the test's helper doesn't send userId, so this tests whether the orchestrator rejects any overwrite of an existing mapping (idempotency guard), not whether it rejects overwrites from a different user specifically. The comment on line 524-526 says 'events meant for the victim would route to the attacker's sidecar' \u2014 but if the orchestrator simply allows overwrite from the same user (e.g., session reconnect), the attack wouldn't be caught.",
      "category": "testing",
      "confidence": "high",
      "evidence": "Line 521-525: `registerSessionMapping(victimSessionId, attackerSidecar!, container.containerId)` \u2014 no userId passed.\nProduction code passes `requesterUserId` via `x-user-id` header for authorization.\nThe test checks mapping-level rejection but the real security contract is user-level rejection.",
      "file": "apps/web/tests/integration/security/session-security-attacks.test.ts",
      "line": 521,
      "severity": "medium",
      "title": "Attack 6 (session mapping overwrite) test also lacks userId binding \u2014 tests idempotency, not authorization"
    },
    {
      "body": "The old test intentionally exposed a real issue: calling `tokenIssuer.issue({ tier: 'enterprise' })` for a free user produces an enterprise-TTL token. The new test issues with `tier: 'free'` and confirms the TTL is 15 minutes \u2014 this only verifies the issuer's happy path. The comment at lines 476-478 acknowledges the real vulnerability is at the route level, which is good documentation, but the test no longer serves as a regression canary for the privilege escalation vector it's named after. The test title says 'Free tier token gets free tier TTL' which is honest, but it's filed under 'ATTACK 5: Privilege Escalation' which is misleading.",
      "category": "testing",
      "confidence": "high",
      "evidence": "Line 471: title says '\ud83d\udd34 MUST ENFORCE: Free tier token gets free tier TTL' but the attack section is 'Privilege Escalation'.\nLines 476-478 acknowledge: 'The real privilege-escalation vulnerability is at the route level... That requires a route-level integration test against /api/sessions/:id/token.'\nThe issuer is tested to do exactly what it's told \u2014 this is a unit test of correct behavior, not a security attack test.",
      "file": "apps/web/tests/integration/security/session-security-attacks.test.ts",
      "line": 471,
      "severity": "low",
      "title": "Attack 5 privilege escalation test no longer exercises the actual vulnerability"
    },
    {
      "body": "The rewritten helper no longer uses an authentication-specific signal. In `onclose` it marks `connected` true whenever the close code is `1000`, and in the collection timer it calls `ws.close()` and immediately settles `{ connected: true }` before waiting for the close event. The new IDOR test then treats `attackResult.connected === false` as proof the attacker was rejected and `victimResult.connected === true` as proof the victim was accepted. That is not what this helper measures anymore: a session that never reaches `connection.established` can still be reported as connected if the server closes normally or emits any non-auth frame. A neighboring integration test already documents that the gateway can accept the upgrade and then reject the session with a close code like `4004`, so these assertions need a close code or explicit connection event, not this boolean.",
      "category": "testing",
      "confidence": "high",
      "evidence": "`ws.onclose = (event) => { settle({ connected: events.length > 0 || event.code === 1000, ... }) }` and `collectTimer = setTimeout(() => { ws.close(); settle({ connected: true, events }); }, timeoutMs - 500);` are used by `expect(attackResult.connected).toBe(false)` / `expect(victimResult.connected).toBe(true)` at lines 322-325. Compare with `apps/web/tests/integration/backend/token-lifecycle.test.ts:354-364`, which explicitly treats a post-upgrade `4004` close as the expected rejection path.",
      "file": "apps/web/tests/integration/security/session-security-attacks.test.ts",
      "line": 94,
      "severity": "medium",
      "title": "`connected` now reports success for clean shutdowns and arbitrary frames, not authenticated sessions"
    },
    {
      "body": "The new contamination test assumes that emitting output before the websockets connect makes that output belong to Chat A and available for replay during the collection window. The code does not establish either property. It sends the marker with `executeSidecarCommand(container.containerId, ...)`, which is container-scoped rather than session-scoped, then only asserts that Chat B did not see the marker. There is no assertion that Chat A saw it. If pre-connection terminal output is dropped for both sessions, or if this command is not attributable to `chatA` at all, the test still passes while proving nothing about session isolation.",
      "category": "testing",
      "confidence": "high",
      "evidence": "The test does `await executeSidecarCommand(container.containerId, `echo \"${marker}\"`, process.env);` before `Promise.all([connectAndCollectEvents(tokenA.token, 5000), connectAndCollectEvents(tokenB.token, 5000)])`, then only checks `resultB.events` for contamination at lines 457-460. The helper it calls is container-only: `executeSidecarCommand(sidecarId: string, command: string, contextEnv = process.env, timeoutMs = 30000, cwd?: string, requesterUserId?: string)` in `apps/web/src/lib/.server/orchestrator/terminal.ts:49-56`, so no `sidecarSessionId` ties the marker to Chat A.",
      "file": "apps/web/tests/integration/security/session-security-attacks.test.ts",
      "line": 445,
      "severity": "medium",
      "title": "Cross-chat contamination case can pass without proving Chat A owned or even received the marker"
    },
    {
      "body": "The cross-chat contamination test (Attack 4) calls `executeSidecarCommand(container.containerId, ...)` to produce a marker event, then asserts Chat B's events don't contain the marker. But it never asserts Chat A's events DO contain the marker. `executeSidecarCommand` targets the container via HTTP (`/sidecars/:id/terminals/commands`), not a specific sidecar session's WebSocket stream. If the terminal output doesn't route to any WebSocket session \u2014 which is likely since the command is executed via HTTP, not through a session \u2014 then both chats receive zero events containing the marker, the `contaminated` array is empty, and the test passes while proving nothing about session isolation. The test needs a positive control: `expect(resultA.events.some(e => JSON.stringify(e).includes('MESSAGE_FOR_CHAT_A_ONLY'))).toBe(true)` to confirm the marker actually entered the event stream.",
      "category": "testing",
      "confidence": "high",
      "evidence": "const chatBEventStrings = resultB.events.map((e) => JSON.stringify(e));\nconst contaminated = chatBEventStrings.filter((s) => s.includes('MESSAGE_FOR_CHAT_A_ONLY'));\nexpect(contaminated).toHaveLength(0);\n// No assertion that resultA.events contains the marker \u2014 test passes with 0 events in both streams",
      "file": "apps/web/tests/integration/security/session-security-attacks.test.ts",
      "line": 458,
      "severity": "medium",
      "title": "Attack 4 cross-chat test passes vacuously \u2014 never asserts marker was received by Chat A"
    },
    {
      "body": "Line 96: `connected: events.length > 0 || event.code === 1000`. If the orchestrator (or an intermediate proxy) closes the WebSocket with code 1000 after rejecting authentication, `connected` would be `true` even though the session was never authenticated. Security tests for Attacks 1 and 2 assert `expect(attackResult.connected).toBe(false)` \u2014 these assertions would silently stop catching auth bypass regressions if the orchestrator's close behavior changes to use 1000 for rejected connections. This is not a regression (the old code had the same heuristic in onclose), but the new settle pattern now makes onclose's result more reliably delivered (no more race with the collect timer's `resolve`), which slightly increases the surface where this matters.",
      "category": "testing",
      "confidence": "medium",
      "evidence": "ws.onclose = (event) => {\n  settle({\n    connected: events.length > 0 || event.code === 1000,\n    events,\n    error: event.code !== 1000 ? `closed: ${event.code} ${event.reason}` : undefined,\n  });\n};",
      "file": "apps/web/tests/integration/security/session-security-attacks.test.ts",
      "line": 96,
      "severity": "low",
      "title": "onclose `connected` heuristic can report true for unauthenticated connections closing with code 1000"
    },
    {
      "body": "`beforeAll` now aborts the entire suite if `ensureProductRegistered()` cannot recreate the product, but this file still builds its admin auth header from `process.env.ADMIN_API_KEY` only. The repo's orchestrator tooling uses `ORCHESTRATOR_ADMIN_API_KEY` as the canonical variable (`scripts/register_orchestrator_product.sh`, `devscripts/commands/orchestrator.ts`), and another integration suite in this repo already had to fall back to `process.env.ORCHESTRATOR_ADMIN_API_KEY || process.env.ADMIN_API_KEY` for the same reason. In an environment where the admin port requires auth and only `ORCHESTRATOR_ADMIN_API_KEY` is set, this suite now sends no auth header, `POST /products` fails, and `beforeAll` throws before any tests run. The old null/skip path masked that mismatch; the new failure mode turns it into a hard regression.",
      "category": "regression",
      "confidence": "high",
      "evidence": "`const ADMIN_AUTH_HEADERS: Record<string, string> = process.env.ADMIN_API_KEY ? { Authorization: `Bearer ${process.env.ADMIN_API_KEY}` } : {};` together with `productCredentials = await ensureProductRegistered();` in `beforeAll` means missing `ADMIN_API_KEY` now fails the whole suite. By contrast, `scripts/register_orchestrator_product.sh` reads `ORCHESTRATOR_ADMIN_API_KEY`, `devscripts/commands/orchestrator.ts` reads `process.env.ORCHESTRATOR_ADMIN_API_KEY`, and `session-ownership-isolation.test.ts` uses `process.env.ORCHESTRATOR_ADMIN_API_KEY || process.env.ADMIN_API_KEY`.",
      "file": "apps/web/tests/integration/flows/token-endpoint-real.test.ts",
      "line": 39,
      "severity": "medium",
      "title": "Fail-fast bootstrap now depends on the wrong admin-key env var"
    },
    {
      "body": "The 'should reject token with wrong signing secret' test still has `if (!productCredentials) { console.warn(...); return; }` at line 278. With the new non-nullable types, this guard is dead code on the happy path \u2014 but more importantly, it's inconsistent with the rest of the suite where the equivalent guards were removed. If the intent is 'fail loud when setup fails', this test is the one exception that would still silently pass. It should be removed for consistency with the rest of the PR.",
      "category": "correctness",
      "confidence": "high",
      "evidence": "if (!productCredentials) {\n  console.warn('Product credentials not available - skipping');\n  return;\n}",
      "file": "apps/web/tests/integration/flows/token-endpoint-real.test.ts",
      "line": 278,
      "severity": "medium",
      "title": "Stale null guard on productCredentials silently skips one test"
    },
    {
      "body": "`tokenIssuer!.issue(...)` at line 467 uses a non-null assertion that was needed when the type was `ProductTokenIssuer | null`. Now that the type is `ProductTokenIssuer` (non-nullable), the `!` is unnecessary. Not a bug, but it's a lint smell and contradicts the purpose of the type change.",
      "category": "correctness",
      "confidence": "high",
      "evidence": "tokenIssuer!.issue({",
      "file": "apps/web/tests/integration/flows/token-endpoint-real.test.ts",
      "line": 467,
      "severity": "low",
      "title": "Unnecessary non-null assertion on tokenIssuer"
    },
    {
      "body": "`turbo.json` now makes every `typecheck` depend on `^build`, so `apps/web` will build all direct workspace dependencies before running `react-router typegen && tsc --noEmit`. That is broader than what downstream type resolution needs. In particular, `apps/web` declares `@blueprint-agent/context-lego` and `@blueprint/litellm-sdk` as workspace deps, but the checked TypeScript sources do not import them: the only `context-lego` reference in `apps/web` is `vite.config.ts`, and that file is explicitly excluded from `apps/web/tsconfig.json`. `@blueprint-agent/context-lego`'s `build` script also runs a non-type step (`npm run copy-data`). After this change, any failure in that packaging step will turn a pure typecheck job red even though `apps/web`'s TypeScript graph is unchanged. That makes `typecheck` a less reliable signal for 'does the type graph compile?' and surfaces unrelated build failures in workflows that previously only validated types.",
      "category": "operational",
      "confidence": "high",
      "evidence": "turbo.json:22 => \"dependsOn\": [\"^build\", \"^typecheck\"]; apps/web/package.json:48-50 declares \"@blueprint-agent/context-lego\", \"@blueprint-agent/stream-adapter\", and \"@blueprint/litellm-sdk\" as workspace deps; apps/web/tsconfig.json:6 excludes vite.config.ts and :33 path-maps \"@blueprint-agent/stream-adapter\" to local source; the only `@blueprint-agent/context-lego` usage under apps/web is vite.config.ts:284; packages/context-lego/package.json:8-9 => \"build\": \"tsc && npm run copy-data\".",
      "file": "turbo.json",
      "line": 22,
      "severity": "medium",
      "title": "`typecheck` now fails on unrelated upstream build steps, not just missing type artifacts"
    },
    {
      "body": "For several workspace packages, `build` and `typecheck` are the same compiler pass with emit toggled (`tsc` vs `tsc --noEmit`). With `typecheck` depending on both `^build` and `^typecheck`, a downstream check like `apps/web#typecheck` now schedules both tasks for `@blueprint/plans`, `@blueprint/message-parser`, and `@blueprint/litellm-sdk`. There is no correctness benefit from running both when the reason for adding `^build` is just to materialize `dist/*.d.ts`; it only increases CI time for commands that used to be the fast, type-only path.",
      "category": "operational",
      "confidence": "high",
      "evidence": "turbo.json:22 => \"dependsOn\": [\"^build\", \"^typecheck\"]; packages/plans/package.json:20-23 => \"build\": \"tsc -p tsconfig.json\", \"typecheck\": \"tsc -p tsconfig.json --noEmit\"; packages/message-parser/package.json:18-22 => \"build\": \"tsc\", \"typecheck\": \"tsc --noEmit\"; packages/litellm-sdk/package.json:18-22 => \"build\": \"tsc\", \"typecheck\": \"tsc --noEmit\".",
      "file": "turbo.json",
      "line": 22,
      "severity": "low",
      "title": "Upstream `typecheck` now does redundant double `tsc` work on every downstream run"
    }
  ],
  "coverage_gaps": [],
  "coverage_score": 100,
  "evidence_score": 100,
  "recommendation": "needs-work",
  "rejected_findings": [],
  "signal_score": 100,
  "status": "fail"
}
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment