Three variations targeting different reader motivations. All technically grounded in Workflow DevKit documentation and following Vercel blog editorial conventions.
Format: Deep-dive
Primary keyword: durable AI streaming
AI streaming breaks when users do ordinary things. A tab reloads. A laptop sleeps. A mobile network switches towers. The model keeps generating, but the user sees a stalled response and assumes the product failed.
We built Vercel Workflow to close this gap. Workflow DevKit makes AI streams durable — the work keeps running when the client disconnects, and the client reconnects without starting over. One team migrated an existing AI chat app from ephemeral to durable streaming and picked up automatic retries, observability, and local debugging in the process.
The default AI SDK pattern ties the response to a single HTTP connection. Open a connection, stream tokens, render them. Three lines of code and you have a working chat experience.
The failure modes surface once the feature has real users:
- Page refresh kills the in-progress response. The server may still be generating, but the client has no way back in.
- Network interruption on mobile — switching from Wi-Fi to cellular, entering a tunnel — drops the connection permanently.
- Serverless function timeout at 30 seconds cuts off long agent responses mid-stream.
- Retries require custom code and risk duplicate work on the server.
Each of these forces teams to build custom recovery infrastructure. Before long, the reliability layer is bigger than the feature it protects.
{TODO: image — side-by-side diagram showing ephemeral streaming (connection breaks, response lost) vs durable streaming (connection breaks, run continues, client reconnects)}
Workflow DevKit uses event sourcing to make execution durable. Every step produces events (step_created, step_started, step_completed) persisted to an append-only log. When a workflow is interrupted and replayed, the framework reads the log, skips completed steps, and resumes from the exact point of interruption.
Streams are part of this model. They are backed by persistent storage — Redis on Vercel, filesystem locally — and identified by a run ID. The stream exists independently of the HTTP connection. If the browser disappears, the workflow continues. When the client reconnects, it picks up from the last chunk it received.
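The replay mechanics are worth pinning down with a sketch. The following is a self-contained illustration of the event-sourcing idea above, not the DevKit's internals; the types and function names are invented for the example:

```typescript
// Illustrative replay over an append-only log (not Workflow DevKit code).
// Steps already recorded as completed are skipped; only the remainder runs.
type LogEvent =
  | { type: "step_started"; step: string }
  | { type: "step_completed"; step: string; result: unknown };

function replay(
  log: LogEvent[],
  steps: Record<string, () => unknown>
): { executed: string[]; results: Record<string, unknown> } {
  // Recover results of steps that finished before the interruption.
  const results: Record<string, unknown> = {};
  for (const event of log) {
    if (event.type === "step_completed") results[event.step] = event.result;
  }

  // Resume: run only the steps with no completion event in the log.
  const executed: string[] = [];
  for (const name of Object.keys(steps)) {
    if (name in results) continue; // already done; never re-executed
    results[name] = steps[name]!();
    executed.push(name);
  }
  return { executed, results };
}
```

A run interrupted after its first step replays with that step's recorded result and executes only the remaining work.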
The migration is three changes to an existing AI SDK app.
DurableAgent replaces the AI SDK's standard Agent. It executes each LLM call as a durable step with automatic retries — 3 by default, configurable per function. Completed calls are never repeated on replay.
The "use workflow" directive tells the build system to transform the function for durable execution. getWritable() returns a persistent stream attached to the workflow run, not to the HTTP response.
```ts
import { DurableAgent } from "@workflow/ai/agent";
import { getWritable } from "workflow";
import type { ModelMessage, UIMessageChunk } from "ai";

export async function chatWorkflow(messages: ModelMessage[]) {
  "use workflow";

  const agent = new DurableAgent({
    model: "anthropic/claude-haiku-4.5",
    system: "You are a helpful assistant.",
  });

  await agent.stream({
    messages,
    writable: getWritable<UIMessageChunk>(),
  });
}
```

The API route starts the workflow and passes the run ID back in a response header. A second endpoint handles reconnection by looking up the existing run and returning its stream from a specific position using `startIndex`:
```ts
import { convertToModelMessages, createUIMessageStreamResponse } from "ai";
import { start } from "workflow/api";
import { chatWorkflow } from "@/workflows/chat/workflow";

export async function POST(req: Request) {
  const { messages } = await req.json();
  const run = await start(chatWorkflow, [convertToModelMessages(messages)]);

  return createUIMessageStreamResponse({
    stream: run.readable,
    headers: { "x-workflow-run-id": run.runId },
  });
}
```

WorkflowChatTransport is a drop-in replacement for the AI SDK's default transport. It stores the run ID, detects when a "finish" chunk is missing (indicating an interrupted stream), and automatically reconnects using the startIndex of the last chunk the client received.
```ts
import { useChat } from "@ai-sdk/react";
import { WorkflowChatTransport } from "@workflow/ai";

// activeRunId: the run ID of an in-flight response, restored from
// localStorage on mount, or undefined when nothing is in progress.
const { messages, sendMessage } = useChat({
  resume: Boolean(activeRunId),
  transport: new WorkflowChatTransport({
    api: "/api/chat",
    onChatSendMessage: (response) => {
      const runId = response.headers.get("x-workflow-run-id");
      if (runId) localStorage.setItem("active-run-id", runId);
    },
    onChatEnd: () => localStorage.removeItem("active-run-id"),
    prepareReconnectToStreamRequest: ({ api, ...rest }) => {
      const runId = localStorage.getItem("active-run-id");
      if (!runId) throw new Error("No active run ID");
      return { ...rest, api: `/api/chat/${encodeURIComponent(runId)}/stream` };
    },
  }),
});
```

After these three changes, users can refresh the page mid-stream and the response picks up where it left off.
The migration adds more than reconnection. Because every step produces events, observability is built in, with no additional logging infrastructure to stand up.
Retries are automatic. Steps retry up to 3 times by default. For external APIs that need backoff, throw a RetryableError with a retryAfter duration. For errors that should not be retried, FatalError fails the step immediately. getStepMetadata() exposes the current attempt number for custom exponential backoff.
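As a mental model for those semantics, here is a self-contained sketch of a retry loop. The class names mirror the documented RetryableError and FatalError; the loop itself, the signatures, and the backoff handling are assumptions for illustration, not the DevKit implementation:

```typescript
// Sketch of step retry semantics (illustrative; not the DevKit's code).
class RetryableError extends Error {
  constructor(message: string, public retryAfterMs = 0) {
    super(message);
  }
}
class FatalError extends Error {}

async function runStep<T>(
  fn: (attempt: number) => Promise<T>,
  maxAttempts = 3 // the documented default is 3; the exact counting here is an assumption
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      // FatalError fails immediately; anything else retries until exhausted.
      if (err instanceof FatalError || attempt >= maxAttempts) throw err;
      const delay = err instanceof RetryableError ? err.retryAfterMs : 0;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

A step that throws RetryableError is re-attempted after its hint; a step that throws FatalError surfaces the error on the first attempt.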
The Web UI shows the full step trace. Run `npx workflow inspect runs --web` to see every run with its step status, duration, retry count, and stream output. The CLI exposes the same data for scripting.
{TODO: image — screenshot of Workflow Web UI showing a run with step trace, including a retried step}
Human-in-the-loop is built in. For agent workflows that need confirmations or approvals, hooks pause execution and wait for external input. Use defineHook() with a Zod schema for type-safe payloads.
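Conceptually, a hook is a typed promise that outside input resolves. The sketch below models that concept in plain TypeScript; it is not the defineHook() API, and the validator stands in for the Zod schema:

```typescript
// Concept sketch: a hook pauses a workflow on a promise that an external
// caller (an HTTP handler, a human approval UI) later resolves.
// Not the defineHook() API; the validator plays the role of a Zod schema.
type Hook<T> = {
  wait: Promise<T>;                     // the workflow awaits this
  receive: (payload: unknown) => void;  // external input arrives here
};

function createHook<T>(validate: (p: unknown) => p is T): Hook<T> {
  let resolve!: (value: T) => void;
  let reject!: (error: Error) => void;
  const wait = new Promise<T>((res, rej) => {
    resolve = res;
    reject = rej;
  });
  return {
    wait,
    receive(payload) {
      if (validate(payload)) resolve(payload); // type-safe resume
      else reject(new Error("invalid hook payload"));
    },
  };
}
```

The workflow awaits the hook; an approval request arriving later resumes it with a validated payload.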
We built Workflow DevKit to run locally with zero configuration. The Local World stores events as JSON files, runs a queue in memory, and serves the full Web UI against local runs. No cloud account required.
This matters because long-running systems are hard to reason about when the only feedback loop is production. The step debugger shows state that console logs miss: steps that completed but returned unexpected data, streams that were written to but never closed, retries that masked a flaky external API.
The same workflow code runs on the Local World during development and the Vercel World in production. The infrastructure differs — Redis vs filesystem, distributed queue vs in-memory — but the workflow code does not change.
- Building Durable AI Agents guide — full walkthrough from AI SDK to DurableAgent
- Resumable Streams guide — step-by-step WorkflowChatTransport setup
- Flight Booking example — production-ready reference app
Format: Thought leadership
Primary keyword: local workflow debugging
There is a pattern in workflow tooling that keeps repeating. A team adopts a system for durable execution. The happy path works in a hosted dashboard. Then a failure happens, and the only debugging option is: deploy, trigger the error, read the logs, guess.
We think the debugging experience for long-running AI workflows should match what you already expect from frontend development: inspect state on your machine, reproduce the problem locally, fix it, and verify — all before it reaches production.
Most orchestration tools separate development from execution. You write workflows locally. You deploy them to a managed service. You inspect runs through a remote dashboard.
The feedback loop is minutes, not seconds. Teams stop writing small, focused workflows because the iteration cost is too high. Failures get debugged by reading logs rather than inspecting state. Edge cases in long-running flows go untested because reproducing them requires a deployed environment.
For AI agent workloads, this is worse. A DurableAgent workflow might make 5-10 LLM calls in sequence, each with tool invocations that hit external APIs. A silent failure in step 7 of 10 is nearly impossible to catch without step-level visibility. And step-level visibility that exists only in production is visibility you will not use during development.
Workflow DevKit runs the same execution model locally and in production. We use a "world" abstraction to separate workflow logic from infrastructure. The Local World stores events as JSON files in .workflow-data/, runs a queue in memory, and serves the full Web UI on your machine. The Vercel World uses Redis-backed streams, distributed queuing, and OIDC authentication. Your workflow code is identical in both.
What local development gives you:
Step-level debugging. The Web UI shows every run with its full step trace. Each step displays its status, duration, retry count, and the data it returned. If a step completed but returned unexpected data, you see it immediately.
{TODO: image — Web UI showing a step trace with a failed step highlighted, retry count visible}
Stream chunk inspection. For AI streaming workflows, the Web UI shows stream chunks as they are written. You can verify output without adding logging.
Retry simulation. Steps retry up to 3 times by default. Locally, you can watch a step fail, retry, and either succeed or exhaust its attempts — testing error-handling paths without deploying.
Event log visibility. Every workflow produces an append-only log following the event sourcing model. Events use a consistent format (run_created, step_started, step_completed, hook_received) with ULID-based entity IDs (wrun_, step_, hook_) that are lexicographically sortable by creation time.
The architecture enforces a clean boundary. Workflow functions carry the "use workflow" directive and run in a sandboxed environment that enforces determinism — required for reliable replay. Step functions carry "use step" and run with full Node.js access. Parameters pass by value between the two contexts.
This separation means orchestration logic is predictable and replayable, while step execution has full access to the runtime — npm packages, fetch, databases, external APIs. The framework handles the boundary automatically through a code transform at build time.
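A minimal sketch of that split. The directives are the real ones; the two functions are hypothetical, and without the build-time transform they execute here as ordinary async functions:

```typescript
// Hypothetical example of the workflow/step boundary. In a real project the
// build transform sandboxes the workflow body and records each step call.
export async function wordCountWorkflow(text: string) {
  "use workflow"; // deterministic orchestration, replayed from the event log
  const count = await countWords(text); // crosses the boundary by value
  return count > 3 ? "long" : "short";
}

async function countWords(text: string) {
  "use step"; // full Node.js access: npm packages, fetch, databases
  return text.trim().split(/\s+/).length;
}
```

The workflow function stays deterministic and cheap to replay, while the step function is free to do real I/O.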
When the framework replays a workflow after interruption, it reads the event log, skips completed steps, and resumes from the point of interruption. This is why local and production behavior match: the event log format is the same, the replay logic is the same. Only the storage backend differs.
AI agent workflows are inherently multi-step. A DurableAgent might make an LLM call, invoke a tool, wait for user approval via a hook, make another LLM call, and stream the result. Each operation is a step with its own retry behavior and observable state.
When you can inspect all of this on your machine — catch the step that silently returns an empty result, see the retry masking a flaky API, verify stream reconnection by refreshing the browser during a local run — you ship with more confidence and fewer surprises.
We are expanding the observability surface to include distributed tracing across workflows and cost attribution per step for LLM calls. The goal: the same visibility you expect from application monitoring, applied to every step in a durable workflow.
Format: Tutorial
Primary keyword: resumable AI streams
Your AI chat feature streams responses. Users like it. Then the bug reports arrive: refreshing the page loses the in-progress response, switching networks on mobile kills it mid-sentence, and long agent responses get cut off by serverless function timeouts.
These are not bugs in your app. They are the default behavior of ephemeral streaming — the response dies with the HTTP connection. With Vercel Workflow, you can make those streams resumable in three steps, without rewriting your app.
```
Client ──HTTP connection──▶ Server (streaming tokens)
             │
     connection breaks
             │
Client ──new request──▶ Server (starts from scratch)
```
The response is tied to the HTTP connection. When it breaks, the response is gone. The model may still be generating, but the client has no way back in.
```
Client ──HTTP connection──▶ Server (starts durable workflow run)
             │                        │
     connection breaks         run continues (backed by persistent stream)
             │                        │
Client ──reconnect via runId──▶ Server (resumes from last chunk)
```
The response is tied to a workflow run with a persistent stream. The client reconnects using the run ID and a startIndex that skips chunks it already received. No duplicate data, no restart.
{TODO: image — browser refreshing mid-stream, response continuing after reload}
Move your AI generation into a workflow function using DurableAgent. It replaces the AI SDK's standard Agent and runs each LLM call as a durable step with automatic retries (3 by default).
getWritable() returns a persistent stream attached to the workflow run. On Vercel, this stream is backed by Redis. Locally, it is stored in the filesystem. Either way, it survives client disconnects.
```ts
import { DurableAgent } from "@workflow/ai/agent";
import { getWritable } from "workflow";
import type { ModelMessage, UIMessageChunk } from "ai";

export async function chatWorkflow(messages: ModelMessage[]) {
  "use workflow";

  const writable = getWritable<UIMessageChunk>();
  const agent = new DurableAgent({
    model: "anthropic/claude-haiku-4.5",
    system: "You are a helpful assistant.",
  });

  await agent.stream({ messages, writable });
}
```

Update your API route to start the workflow and return the run ID in a response header. Then add a second endpoint that returns the stream for an existing run, starting from a specific chunk index:
```ts
import type { UIMessage } from "ai";
import { convertToModelMessages, createUIMessageStreamResponse } from "ai";
import { start } from "workflow/api";
import { chatWorkflow } from "@/workflows/chat/workflow";

export async function POST(req: Request) {
  const { messages }: { messages: UIMessage[] } = await req.json();
  const modelMessages = convertToModelMessages(messages);
  const run = await start(chatWorkflow, [modelMessages]);

  return createUIMessageStreamResponse({
    stream: run.readable,
    headers: { "x-workflow-run-id": run.runId },
  });
}
```

```ts
import { createUIMessageStreamResponse } from "ai";
import { getRun } from "workflow/api";

export async function GET(
  request: Request,
  { params }: { params: Promise<{ id: string }> }
) {
  const { id } = await params;
  const { searchParams } = new URL(request.url);
  const startIndexParam = searchParams.get("startIndex");
  const startIndex = startIndexParam
    ? parseInt(startIndexParam, 10)
    : undefined;

  const run = getRun(id);
  const stream = run.getReadable({ startIndex });

  return createUIMessageStreamResponse({ stream });
}
```

The `startIndex` parameter tells the server to skip chunks the client already received.
WorkflowChatTransport is a drop-in replacement for the default AI SDK transport. It stores the run ID from the initial response, detects when a stream is interrupted (no "finish" chunk received), and automatically reconnects through the reconnection endpoint.
```tsx
"use client";

import { useChat } from "@ai-sdk/react";
import { WorkflowChatTransport } from "@workflow/ai";
import { useMemo } from "react";

export default function ChatPage() {
  const activeRunId = useMemo(() => {
    if (typeof window === "undefined") return undefined;
    return localStorage.getItem("active-run-id") ?? undefined;
  }, []);

  const { messages, sendMessage } = useChat({
    resume: Boolean(activeRunId),
    transport: new WorkflowChatTransport({
      api: "/api/chat",
      onChatSendMessage: (response) => {
        const runId = response.headers.get("x-workflow-run-id");
        if (runId) localStorage.setItem("active-run-id", runId);
      },
      onChatEnd: () => localStorage.removeItem("active-run-id"),
      prepareReconnectToStreamRequest: ({ api, ...rest }) => {
        const runId = localStorage.getItem("active-run-id");
        if (!runId) throw new Error("No active run ID");
        return {
          ...rest,
          api: `/api/chat/${encodeURIComponent(runId)}/stream`,
        };
      },
    }),
  });

  return (
    <div>
      {messages.map((m) => (
        <div key={m.id}>
          <strong>{m.role}:</strong>{" "}
          {m.parts.map((part, i) =>
            part.type === "text" ? <span key={i}>{part.text}</span> : null
          )}
        </div>
      ))}
    </div>
  );
}
```

Run the app locally, start a chat, and refresh the page mid-stream. The response continues from where it left off. Open the Workflow Web UI to see the run trace:
```bash
npx workflow inspect runs --web
```

Each step shows its status, duration, retry attempts, and stream output.
{TODO: image — Workflow Web UI showing a completed run with step trace and stream output}
This is not a reconnection-only change. Because every step in a workflow produces durable events, the migration adds:
- Automatic retries. Each LLM call inside `DurableAgent` retries up to 3 times. For rate-limited external APIs, throw `RetryableError` with a `retryAfter` duration. For permanent failures, `FatalError` skips retries.
- Observability without extra infrastructure. The Web UI and CLI show full step traces for every run. Use `--backend vercel` to inspect production runs remotely.
- Human-in-the-loop. Use `defineHook()` with a Zod schema to pause a workflow for user approval or content review, then resume with a typed payload.
- Local debugging. The Local World runs with zero configuration — same execution model, same Web UI, same step debugger. No cloud account required.
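The reconnection arithmetic behind startIndex fits in a few lines. WorkflowChatTransport does this for you; the helper below is a hypothetical illustration, not part of the API:

```typescript
// Hypothetical helper: compute the reconnect URL from the chunks the client
// has already rendered. The next chunk needed is simply the count received.
function reconnectUrl(runId: string, receivedChunks: unknown[]): string {
  const startIndex = receivedChunks.length;
  return `/api/chat/${encodeURIComponent(runId)}/stream?startIndex=${startIndex}`;
}
```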
- Resumable Streams guide — full WorkflowChatTransport walkthrough
- Building Durable AI Agents — from AI SDK to DurableAgent
- Flight Booking example — production-ready reference app with resumable streams