@codenulls
Created November 6, 2025 08:43
Tool call race condition analysis

The original code's strategy of intercepting the finishChunk was absolutely correct in principle. The failure was in the mechanism it used to decide when to release that chunk. It was a manual, fragile system that broke under specific timing conditions.

Think of it like a relay race with two runners who need to finish at the same time.

  • Runner A: The Tool Executor. Its job is to run the tool and report back.
  • Runner B: The Model Stream. Its job is to send all its data and then signal that it's done.
  • The Finish Line: The attemptClose() function, which is supposed to end the step.
  • The Rule: The step can only end when both runners are at the finish line.

The problem is that the two runners were never properly synchronized: each checked the other's status through separate, independently updated state — a boolean flag (canClose) and a counter (outstandingToolResults.size).
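A minimal sketch of the fragile gate, assuming illustrative wiring around the real canClose / outstandingToolResults / attemptClose names (the handler functions here are hypothetical stand-ins, not the SDK's actual code):

```typescript
// Runner B's flag: set only when the model stream's flush() runs.
let canClose = false;
// Runner A's counter: one entry per tool call still executing.
const outstandingToolResults = new Set<string>();
// Stands in for "finishChunk enqueued and stream closed".
let closed = false;

function attemptClose(): void {
  // The rule: the step may only end when BOTH runners are done.
  if (canClose && outstandingToolResults.size === 0) {
    closed = true; // stands in for enqueue(finishChunk) + close()
  }
}

// Runner A's side: a settled tool removes itself and tries to close.
function onToolSettled(id: string): void {
  outstandingToolResults.delete(id);
  attemptClose();
}

// Runner B's side: the stream's flush() raises its flag and tries too.
function onFlush(): void {
  canClose = true;
  attemptClose();
}
```

Each side mutates its own state and calls attemptClose() independently — which is exactly what allows the desynchronization described below.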

The Race Condition: Step-by-Step Breakdown of the Failure

Here is the exact sequence of events for your Step 24, where the tool was extremely fast:

  1. The Race Starts: The model streams a tool-call for set_task_status.

    • Runner A (Tool Executor) immediately starts running. outstandingToolResults.size becomes 1.
    • Runner B (Model Stream) continues sending its (very short) stream of data.
  2. Runner A Finishes First (The Upset): Your set_task_status tool is so fast that it finishes its work almost instantly.

    • Its promise resolves. The .then() block runs.
    • It decrements the counter: outstandingToolResults.delete(...). The size is now 0.
    • It calls attemptClose().
  3. The First Check Fails: Inside attemptClose(), the code checks the rule: if (canClose && outstandingToolResults.size === 0).

    • outstandingToolResults.size is indeed 0.
    • BUT Runner B (Model Stream) hasn't finished its final internal processing yet, so it hasn't set canClose to true.
    • The check becomes if (false && true), which is false.
    • The code does nothing. Runner A has finished, reported in, but the finish line judge says "Not yet, I'm still waiting for the other guy."

    This is the key moment of desynchronization. The system now incorrectly believes there are no tools running, but it hasn't yet received the signal that the model is truly done.

  4. Runner B Finishes Second: A few milliseconds later, the Model Stream finishes all its work and its flush() method is called.

    • It sets its flag: canClose = true.
    • It calls attemptClose() again.
  5. The Second Check Succeeds Prematurely: Inside attemptClose(), the code checks the rule again: if (canClose && outstandingToolResults.size === 0).

    • canClose is now true.
    • outstandingToolResults.size is already 0 from when the super-fast tool finished moments ago.
    • The check becomes if (true && true), which is true.
    • The code declares the step over! It calls toolResultsStreamController!.enqueue(finishChunk) and toolResultsStreamController!.close().
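The five steps above can be replayed deterministically. In this hypothetical reproduction (all names illustrative), the fast tool resolves on the microtask queue while flush() arrives on a later timer, forcing the same ordering:

```typescript
async function replay(): Promise<string[]> {
  const events: string[] = [];
  let canClose = false;
  const outstandingToolResults = new Set(["set_task_status"]);

  function attemptClose(caller: string): void {
    if (canClose && outstandingToolResults.size === 0) {
      events.push(`${caller}: closed`);
    } else {
      events.push(`${caller}: blocked`);
    }
  }

  // Runner A: the fast tool's promise resolves on the microtask queue.
  const tool = Promise.resolve().then(() => {
    outstandingToolResults.delete("set_task_status"); // size -> 0
    attemptClose("tool"); // canClose is still false: check fails
  });

  // Runner B: flush() runs on a later timer, after the tool settled.
  const flush = new Promise<void>((resolve) =>
    setTimeout(() => {
      canClose = true;
      attemptClose("flush"); // size is already 0: check passes
      resolve();
    }, 0)
  );

  await Promise.all([tool, flush]);
  return events; // ["tool: blocked", "flush: closed"]
}
```

The tool's attemptClose() is a no-op and the stream's attemptClose() closes the step — exactly the premature closure analyzed above.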

The Consequence: Why It Skips onStepFinish

This premature closure is catastrophic for the final step.

The part of the AI SDK that is responsible for gathering all the step's information (the final text, the tool results, the usage stats) and calling your onStepFinish callback is listening to this stream.

When the stream closes this early, it signals to the listener, "That's it, this step is over, nothing more to see here!"

The listener then looks at the work it has processed so far. Because the stream closed before the result from the fast tool could be fully processed and aggregated, the listener doesn't have all the pieces needed to construct the final StepResult. More importantly, the main control loop in stream-text.ts sees the finish event and decides the entire multi-step process is complete, terminating everything before the logic that calls onStepFinish for that final step gets a chance to run.
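A small sketch of why an early close() loses data, using a plain ReadableStream rather than the SDK's actual stream (illustrative only): a consumer can only see chunks enqueued before close(), so a tool result still in flight simply never arrives downstream.

```typescript
async function demo(): Promise<string[]> {
  const stream = new ReadableStream<string>({
    start(controller) {
      controller.enqueue("text-delta");
      controller.enqueue("finish"); // premature finish chunk
      controller.close();           // stream is now closed...
      // ...so a late tool-result can never be delivered:
      // controller.enqueue("tool-result"); // would throw TypeError
    },
  });

  // The listener aggregates whatever it sees before `done`.
  const seen: string[] = [];
  const reader = stream.getReader();
  for (;;) {
    const result = await reader.read();
    if (result.done) break;
    seen.push(result.value);
  }
  return seen; // no "tool-result" — the aggregate is incomplete
}
```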

The system essentially packed up and went home before the scorekeepers could write down the final score of the game.

How the Fix Solves This

The fix with Promise.all replaces this fragile, two-runner system with a single, synchronized gate.

  • It collects promises for all the tools that start (Runner A, Runner C, Runner D...).
  • The flush method (Runner B) arrives at the finish line and then simply waits at the gate (await Promise.all(...)).
  • The gate only opens when every single tool promise has resolved.
  • Only then does it proceed to enqueue(finishChunk) and close the stream.

This makes the timing irrelevant. The process is now deterministic: the model finishes, then we wait for all tools, then we close the step. The race is eliminated.
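A minimal sketch of the gated version — the startTool / runStep names are hypothetical; only the Promise.all gate and the await-before-close ordering come from the fix described above:

```typescript
async function runStep(): Promise<string[]> {
  const order: string[] = [];
  const toolPromises: Promise<void>[] = [];

  // Every tool call that starts registers its promise at the gate.
  function startTool(name: string): void {
    toolPromises.push(
      Promise.resolve().then(() => {
        order.push(`tool-result:${name}`);
      })
    );
  }

  // flush() no longer consults flags; it simply waits at the gate.
  async function flush(): Promise<void> {
    await Promise.all(toolPromises); // opens only when every tool resolved
    order.push("finishChunk");       // stands in for enqueue(finishChunk)
    order.push("close");             // stands in for closing the stream
  }

  startTool("set_task_status");
  await flush();
  return order; // tool result always precedes finishChunk and close
}
```

However fast the tool is, its promise is already registered when flush() awaits the gate, so the finish chunk cannot overtake the tool result.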
