Three variations, each covering a different angle. All draw from the same product (Workflow DevKit) and the same proof points, but target different reader motivations.
Format: Deep-dive (1200-1500 words)
Primary keyword: durable AI streaming
Audience: Teams shipping AI chat features who have hit streaming reliability problems
AI streaming breaks in ordinary ways. A tab reloads. A laptop sleeps. A network drops for three seconds. The model is still generating, but the user sees a stalled response and assumes the product is broken.
This is the gap between a demo and a production AI feature. And it is where most teams start writing glue code.
The default pattern is straightforward: open an HTTP connection, stream tokens from the model, render them in the UI. Three lines of code with the AI SDK and you have a working chat experience.
It works until the feature matters. Then the edge cases arrive. A dropped connection kills the in-progress response. A retry starts duplicate work on the server. A long-running agent call outlives the serverless function timeout. The team adds custom retry logic, reconnection handlers, and a separate observability system. Before long, the reliability layer is bigger than the feature it protects.
For AI products, that tradeoff compounds. Streaming is what makes the experience feel responsive. Brittleness underneath is what erodes user trust.
Workflow DevKit uses event sourcing to make workflow execution durable. Every step in a workflow produces events (step_created, step_started, step_completed, step_failed) that are persisted to an append-only log. When a workflow is interrupted and replayed, the framework reads the event log and skips already-completed steps, resuming from the exact point of interruption.
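The replay mechanic can be modeled in a few lines. This is a toy sketch of the idea, not Workflow DevKit's internals: a step checks an append-only log before executing, so a replayed workflow returns persisted results instead of redoing work.

```typescript
// Toy event log and replay loop (illustration only, not the framework's code).
type StepEvent = {
  type: "step_completed";
  stepName: string;
  result: unknown;
};

// The append-only log; the real framework persists this across interruptions.
const eventLog: StepEvent[] = [];
let executions = 0; // counts real (non-replayed) step executions

// Run a step: if the log already holds its result, return it without re-executing.
function runStep<T>(stepName: string, fn: () => T): T {
  const prior = eventLog.find((e) => e.stepName === stepName);
  if (prior) return prior.result as T; // replay: skip completed work
  executions++;
  const result = fn();
  eventLog.push({ type: "step_completed", stepName, result });
  return result;
}

function workflow(): number {
  const a = runStep("fetch", () => 2);
  const b = runStep("transform", () => a * 10);
  return b;
}

// First run executes both steps; a "replay" after interruption executes none.
const first = workflow();
const replayed = workflow();
```

The second call returns the same result without touching either step, which is exactly the property that makes interruption-and-resume safe.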
This architecture extends to streaming. Streams in Workflow DevKit are not ephemeral byte pipes tied to a single HTTP connection. They are backed by persistent storage — Redis on Vercel, filesystem locally — and identified by a run ID. The stream exists independently of the client connection. If the browser disappears, the workflow keeps running. When the client comes back, it reconnects to the same stream using the run ID and a startIndex parameter that tells the server where to resume.
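A simplified model of that idea: chunks are appended to a buffer keyed by run ID, and a reconnecting client reads from whatever index it left off at. The real storage (Redis or filesystem) and API differ; this only illustrates why the stream outlives the connection.

```typescript
// Toy durable-stream buffer keyed by run ID (illustration, not the library).
const streams = new Map<string, string[]>();

// The workflow keeps writing even if no client is connected.
function writeChunk(runId: string, chunk: string): void {
  const buf = streams.get(runId) ?? [];
  buf.push(chunk);
  streams.set(runId, buf);
}

// Reconnecting clients pass the index of the next chunk they need.
function readFrom(runId: string, startIndex = 0): string[] {
  return (streams.get(runId) ?? []).slice(startIndex);
}

writeChunk("wrun_1", "Hello");
writeChunk("wrun_1", ", ");
writeChunk("wrun_1", "world");

const initial = readFrom("wrun_1");    // full stream from the beginning
const resumed = readFrom("wrun_1", 2); // only the chunks the client missed
```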
The migration from ephemeral to durable streaming is three changes:
1. Move generation into a workflow function using DurableAgent.
DurableAgent replaces the AI SDK's standard Agent. It executes each LLM call as a durable workflow step with automatic retries (3 by default, configurable via step.maxRetries). Results are persisted to the event log, so completed calls are never repeated on replay.
```typescript
import { DurableAgent } from "@workflow/ai/agent";
import { getWritable } from "workflow";
import type { ModelMessage, UIMessageChunk } from "ai";

export async function chatWorkflow(messages: ModelMessage[]) {
  "use workflow";

  const agent = new DurableAgent({
    model: "anthropic/claude-haiku-4.5",
    system: "You are a helpful assistant.",
  });

  await agent.stream({
    messages,
    writable: getWritable<UIMessageChunk>(),
  });
}
```

The `"use workflow"` directive tells the build system to transform this function for durable execution. The `getWritable()` call returns a persistent stream attached to the workflow run — not to the HTTP response.
2. Return the run ID so the client can reconnect.
The API route starts the workflow and passes the run ID back in a response header:
```typescript
const run = await start(chatWorkflow, [modelMessages]);

return createUIMessageStreamResponse({
  stream: run.readable,
  headers: {
    "x-workflow-run-id": run.runId,
  },
});
```

A second endpoint handles reconnection by looking up the existing run and returning its stream from a specific position:
```typescript
const run = getRun(id);
const stream = run.getReadable({ startIndex });
```

3. Use WorkflowChatTransport on the client.
WorkflowChatTransport is a drop-in replacement for the AI SDK's default transport. It stores the run ID, detects when a stream is interrupted (no "finish" chunk received), and automatically reconnects using the startIndex of the last chunk the client received.
```typescript
const { messages, sendMessage } = useChat({
  transport: new WorkflowChatTransport({
    api: "/api/chat",
    onChatSendMessage: (response) => {
      const runId = response.headers.get("x-workflow-run-id");
      if (runId) localStorage.setItem("active-run-id", runId);
    },
    onChatEnd: () => localStorage.removeItem("active-run-id"),
  }),
});
```

After these three changes, the user can refresh the page mid-stream and the response picks up where it left off. The workflow does not restart. No data is lost.
The migration adds more than reconnection. Because every step produces events, Workflow DevKit includes built-in observability. The Web UI (npx workflow inspect runs --web) shows the full step trace for every run: status, duration, retry count, input, output, and errors. The CLI exposes the same data for scripting.
Retries are automatic. Steps retry up to 3 times by default. For external API calls that need backoff, RetryableError accepts a retryAfter duration. For errors that should not be retried, FatalError skips the retry queue and fails the step immediately. getStepMetadata() exposes the current attempt number for custom backoff logic.
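The retry semantics can be sketched with stand-in error classes. The real `RetryableError` and `FatalError` come from Workflow DevKit; these local versions only mimic the behavior described above for illustration.

```typescript
// Stand-ins for the framework's error types (illustration only).
class FatalError extends Error {}
class RetryableError extends Error {}

// Run a function up to maxRetries times, mimicking the step retry loop.
function runWithRetries<T>(fn: (attempt: number) => T, maxRetries = 3): T {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return fn(attempt);
    } catch (err) {
      if (err instanceof FatalError) throw err; // skip the retry queue entirely
      lastError = err; // retryable: try again
    }
  }
  throw lastError; // attempts exhausted
}

// A flaky step that succeeds on the third attempt.
let attempts = 0;
const value = runWithRetries((attempt) => {
  attempts++;
  if (attempt < 3) throw new RetryableError("transient failure");
  return "ok";
});
```

A `FatalError` thrown inside the function propagates on the first attempt; a `RetryableError` is swallowed until the attempt budget runs out.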
For agent workflows that need human input — confirmations, approvals, content review — hooks pause execution and wait for an external signal before continuing. The createWebhook() helper generates a URL that the UI or an external service can POST to, resuming the workflow with a typed payload validated against a Zod schema.
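Conceptually, a hook is a promise the workflow awaits until an external signal resolves it. The sketch below is a toy model of that pause-and-resume shape; `createWebhook()` in Workflow DevKit wraps the same idea with a real URL and Zod validation.

```typescript
// Toy hook registry: a pending promise per hook ID (illustration only).
type Resolver<T> = (payload: T) => void;
const pendingHooks = new Map<string, Resolver<any>>();

// The workflow awaits this promise; execution pauses here.
function createHook<T>(id: string): Promise<T> {
  return new Promise<T>((resolve) => pendingHooks.set(id, resolve));
}

// What a POST to the webhook URL would trigger on the server.
function deliver<T>(id: string, payload: T): void {
  const resolve = pendingHooks.get(id);
  if (resolve) {
    pendingHooks.delete(id);
    resolve(payload);
  }
}

async function approvalWorkflow(): Promise<string> {
  const approval = createHook<{ approved: boolean }>("hook_1");
  deliver("hook_1", { approved: true }); // an external service responds
  const { approved } = await approval;   // the workflow resumes here
  return approved ? "published" : "rejected";
}
```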
Workflow DevKit runs locally with zero configuration. The Local World stores workflow data as JSON files in .workflow-data/, uses an in-memory queue, and serves the full Web UI and step debugger against local runs. No cloud account, no external services, no environment variables required.
This matters because long-running systems are hard to reason about when the only feedback loop is production. The step debugger shows execution state that standard console logs miss: steps that completed but returned unexpected data, steps that were retried and succeeded on the third attempt, streams that were written to but never closed.
The same code runs on the Local World during development and the Vercel World in production. The Vercel World uses Redis-backed streams, distributed queuing, and OIDC authentication, but the workflow code does not change. The "world" abstraction handles the infrastructure differences.
Workflow DevKit gives AI teams durable streaming, reconnectable clients, automatic retries, observability, and local debugging as part of a single migration.
- Workflow DevKit documentation
- Building Durable AI Agents guide
- Resumable Streams guide
- Flight Booking example app
Format: Thought leadership (800-1200 words)
Primary keyword: local workflow development
Audience: Engineering leads evaluating workflow/orchestration tools for AI features
There is a pattern in workflow tooling that keeps repeating. A team adopts a system for durable execution. The happy path works in a hosted dashboard. Then they try to debug a failure, and the only option is to deploy, wait for the error to surface, read the logs, and guess.
The debugging experience for long-running systems should not be worse than the debugging experience for a React component.
The hidden cost of remote-only workflows
Most orchestration tools work like this: you define workflows in code, deploy them to a managed service, and inspect runs through a web dashboard after the fact. Development means writing code locally, deploying it, triggering a run, and checking the dashboard to see what happened. The feedback loop is minutes, not seconds.
This creates a specific kind of technical debt. Teams stop writing small, focused workflows because the iteration cost is too high. Failures get debugged by reading logs rather than inspecting state. Edge cases in long-running flows go untested because reproducing them requires a deployed environment.
For AI workloads, the problem is sharper. An agent workflow might make 5-10 LLM calls in sequence, each with tool invocations that hit external APIs. A silent failure in step 7 of 10 is nearly impossible to catch without step-level visibility. And step-level visibility that only exists in production is step-level visibility you will not use during development.
Workflow DevKit runs the same execution model locally and in production. The framework uses "worlds" to abstract the infrastructure layer. The Local World stores events as JSON files in .workflow-data/, runs a queue in memory, and serves the full Web UI on your machine. The Vercel World uses Redis-backed streams, distributed queuing, and managed infrastructure. The workflow code is identical.
Local development includes:
Step-level debugging. The Web UI (npx workflow inspect runs --web) shows every run with its full step trace. Each step displays its status, duration, retry count, and the data it returned. If a step completed but returned unexpected data, you see it immediately. If a step was retried three times before succeeding, you see each attempt.
Stream inspection. For AI streaming workflows, the Web UI shows stream chunks as they are written. You can verify that the expected output was produced without adding logging to your code.
Event log visibility. Every workflow produces an append-only event log. Events follow a consistent format: run_created, step_started, step_completed, hook_received, and so on. Entity IDs use a 4-character prefix plus ULID (wrun_, step_, hook_), so events are lexicographically sortable by creation time.
Full retry simulation. Steps retry automatically (3 times by default). Locally, you can watch a step fail, retry, and eventually succeed or exhaust its attempts. This makes it possible to test error handling paths without deploying.
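The entity ID format above is what makes the event log cheap to order. A fixed-width, time-encoded suffix means plain string comparison agrees with creation time. Real ULIDs use Crockford base32; this stand-in uses zero-padded base36 purely to illustrate the sorting property.

```typescript
// Toy prefix-plus-sortable-timestamp ID (illustration; not a real ULID).
function makeId(prefix: string, timestampMs: number): string {
  const time = timestampMs.toString(36).padStart(10, "0"); // fixed width
  return `${prefix}${time}`;
}

const earlier = makeId("step_", 1_700_000_000_000);
const later = makeId("step_", 1_700_000_000_500);

// Plain lexicographic sort recovers creation order.
const sorted = [later, earlier].sort();
```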
The framework uses event sourcing internally. Every state change in a workflow — starting a step, completing it, receiving a hook payload — produces an event that is persisted to the log. When a workflow needs to resume after interruption, the framework replays the event log, skipping completed steps and resuming from the point of interruption.
This is why local and production behavior match. The event log format is the same. The replay logic is the same. The difference is where the events are stored (filesystem vs Redis) and how the queue is managed (in-memory vs distributed). The workflow code itself is not aware of the difference.
Workflow functions carry a "use workflow" directive and run in a sandboxed environment that enforces determinism — a requirement for reliable replay. Step functions carry "use step" and run with full Node.js access. Parameters pass by value between the two contexts, so mutations in steps do not affect workflow state.
This separation means that workflow orchestration logic is predictable and replayable, while step execution logic has full access to the runtime. The framework handles the boundary.
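The pass-by-value boundary can be shown in miniature. This sketch uses `structuredClone` as a stand-in for whatever serialization the framework actually performs; the point is only that a step receives a copy, so its mutations cannot leak back into workflow state.

```typescript
// Minimal illustration of the workflow/step boundary (not the framework's code).
type Params = { items: string[] };

// Hand the step a deep copy; the step never sees the original object.
function runStepIsolated<T, R>(params: T, step: (p: T) => R): R {
  return step(structuredClone(params));
}

const workflowState: Params = { items: ["a"] };

const count = runStepIsolated(workflowState, (p) => {
  p.items.push("b"); // mutation inside the step
  return p.items.length;
});
```

The step observes two items, but the workflow's copy still has one.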
AI agent workflows are inherently multi-step. A DurableAgent might make an LLM call, invoke a tool, wait for user approval via a hook, make another LLM call, and stream the result. Each of those operations is a step with its own retry behavior and observable state.
When you can inspect all of this locally — before deploying, before production traffic, before the first user hits the edge case — you build confidence in the system faster. You catch the step that silently returns an empty result. You see the retry that masks a flaky API. You verify that the stream reconnection path works by refreshing the browser during a local run.
Workflow tooling should shorten the path from "something is wrong" to "I can see exactly what happened." That path should not require a deploy.
Format: Tutorial (1000-1500 words)
Primary keyword: resumable AI streams
Audience: Developers building AI chat features who want to add reconnection to an existing app
You built an AI chat feature. It streams responses. Users like the experience. Then someone reports that refreshing the page loses the in-progress response. Someone else reports that switching networks on mobile kills it mid-sentence. A third report: serverless function timeout at 30 seconds, long agent responses cut off.
These are not bugs in your code. They are the default behavior of ephemeral streaming. And they are fixable without a rewrite.
```
Client ──HTTP connection──▶ Server (streaming tokens)
   │
   connection breaks
   │
Client ──new request──▶ Server (starts from scratch)
```
The response is tied to the HTTP connection. When the connection breaks, the response is gone. The server may still be generating, but the client has no way to reconnect to the in-progress work.
```
Client ──HTTP connection──▶ Server (starts workflow run)
   │                           │
   connection breaks           run continues
   │                           │
Client ──reconnect via runId──▶ Server (resumes stream)
```
The response is tied to a workflow run, not to the HTTP connection. The run has a persistent stream backed by durable storage. If the connection breaks, the client reconnects using the run ID and picks up from the last chunk it received.
Take your existing AI route handler and move the generation into a workflow function. DurableAgent from @workflow/ai/agent replaces the AI SDK's standard Agent — it runs each LLM call as a durable step with automatic retries.
```typescript
import { DurableAgent } from "@workflow/ai/agent";
import { getWritable } from "workflow";
import type { ModelMessage, UIMessageChunk } from "ai";

export async function chatWorkflow(messages: ModelMessage[]) {
  "use workflow";

  const writable = getWritable<UIMessageChunk>();

  const agent = new DurableAgent({
    model: "anthropic/claude-haiku-4.5",
    system: "You are a helpful assistant.",
  });

  await agent.stream({ messages, writable });
}
```

`getWritable()` returns a stream attached to the workflow run, not the HTTP response. This stream persists independently — on the Vercel World it is backed by Redis, locally it is stored in the filesystem.
Update your API route to start the workflow and pass the run ID back to the client:
```typescript
import { convertToModelMessages, createUIMessageStreamResponse } from "ai";
import { start } from "workflow/api";
import { chatWorkflow } from "@/workflows/chat/workflow";
import type { UIMessage } from "ai";

export async function POST(req: Request) {
  const { messages }: { messages: UIMessage[] } = await req.json();
  const modelMessages = convertToModelMessages(messages);

  const run = await start(chatWorkflow, [modelMessages]);

  return createUIMessageStreamResponse({
    stream: run.readable,
    headers: { "x-workflow-run-id": run.runId },
  });
}
```

Then add a reconnection endpoint. This is the URL the client will hit after an interruption to resume the stream:
```typescript
import { createUIMessageStreamResponse } from "ai";
import { getRun } from "workflow/api";

export async function GET(
  request: Request,
  { params }: { params: Promise<{ id: string }> }
) {
  const { id } = await params;
  const { searchParams } = new URL(request.url);
  const startIndexParam = searchParams.get("startIndex");
  const startIndex = startIndexParam
    ? parseInt(startIndexParam, 10)
    : undefined;

  const run = getRun(id);
  const stream = run.getReadable({ startIndex });

  return createUIMessageStreamResponse({ stream });
}
```

The `startIndex` parameter is key. It tells the server to skip chunks the client already received and stream only what was missed. No duplicate data.
WorkflowChatTransport is a drop-in replacement for the AI SDK's default transport. It stores the run ID from the initial response, detects interruptions (no "finish" chunk received), and automatically reconnects through the reconnection endpoint.
```tsx
"use client";

import { useChat } from "@ai-sdk/react";
import { WorkflowChatTransport } from "@workflow/ai";
import { useMemo } from "react";

export default function ChatPage() {
  const activeRunId = useMemo(() => {
    if (typeof window === "undefined") return;
    return localStorage.getItem("active-run-id") ?? undefined;
  }, []);

  const { messages, sendMessage } = useChat({
    resume: Boolean(activeRunId),
    transport: new WorkflowChatTransport({
      api: "/api/chat",
      onChatSendMessage: (response) => {
        const runId = response.headers.get("x-workflow-run-id");
        if (runId) localStorage.setItem("active-run-id", runId);
      },
      onChatEnd: () => localStorage.removeItem("active-run-id"),
      prepareReconnectToStreamRequest: ({ api, ...rest }) => {
        const runId = localStorage.getItem("active-run-id");
        if (!runId) throw new Error("No active run ID");
        return { ...rest, api: `/api/chat/${encodeURIComponent(runId)}/stream` };
      },
      maxConsecutiveErrors: 3,
    }),
  });

  return (
    <div>
      {messages.map((m) => (
        <div key={m.id}>
          <strong>{m.role}:</strong>{" "}
          {m.parts.map((part, i) =>
            part.type === "text" ? <span key={i}>{part.text}</span> : null
          )}
        </div>
      ))}
    </div>
  );
}
```

Run the app locally, start a chat, and refresh the page mid-stream. The response continues from where it left off. Open the Workflow Web UI to see the run trace:
```bash
npx workflow inspect runs --web
```

Each step shows its status, duration, retry attempts, and stream output. If something fails, you see it here before it reaches production.
This migration adds more than reconnection:
- Automatic retries. Each LLM call inside `DurableAgent` retries up to 3 times by default. External API calls in tool steps retry independently. Use `FatalError` for errors that should not be retried.
- Observability. The Web UI shows every run with full step traces. The CLI (`npx workflow inspect runs`) exposes the same data for scripting. No separate logging infrastructure required.
- Human-in-the-loop. Use `createWebhook()` to pause a workflow and wait for user approval, content review, or external signals. The webhook generates a URL that the UI can POST to with a typed payload.
- Local debugging. The Local World runs with zero configuration. Same execution model as production. Same Web UI. Same step debugger.