@bartlomieju
Created March 9, 2026 22:30

Architecture: Fix node:worker_threads idle termination (denoland/deno#23169)

Fix: node:worker_threads Worker Idle Termination (denoland/deno#23169)

Problem

Workers created with node:worker_threads are terminated when idle, even if they have ref'd transferable objects like MessagePort or SharedArrayBuffer that should keep them alive.

Root Cause: Two Competing Keepalive Systems

There are two independent keepalive decision systems, and that's the root problem:

System 1 — deno_core event loop (EventLoopPendingState::is_pending()): Tracks refed ops, refed timers, refed immediates, libuv handles, background tasks, module evaluation, tick scheduling, external ops. When ALL are zero → returns Poll::Ready ("idle").

System 2 — JS-level bolt-on check (hasMessageEventListener()): Called from Rust AFTER System 1 says "idle." It checks whether globalThis has "message" listeners AND parentPort is not unref'd, OR refedMessagePortsCount > 0. If yes → override the idle verdict, return Poll::Pending.
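The interaction between the two systems can be modeled in a few lines. This is only a sketch: `loopIsPending`, `jsKeepalive`, `currentVerdict`, and the plain-object state shapes are hypothetical stand-ins chosen for illustration, not deno_core or runtime APIs.

```javascript
// Toy model of today's two-stage keepalive decision. All names and state
// shapes are illustrative assumptions, not real Deno internals.

// System 1: the event loop is pending while any tracked counter is non-zero.
function loopIsPending(loop) {
  return loop.refedOps + loop.refedTimers + loop.refedImmediates > 0;
}

// System 2: the JS-level check, consulted only after System 1 reports idle.
function jsKeepalive(js) {
  return (js.messageListeners > 0 && !js.parentPortUnrefed) ||
    js.refedMessagePortsCount > 0;
}

// Combined verdict: "pending" keeps the worker alive, "exit" lets it stop.
function currentVerdict(loop, js) {
  if (loopIsPending(loop)) return "pending";
  return jsKeepalive(js) ? "pending" : "exit";
}

const idleLoop = { refedOps: 0, refedTimers: 0, refedImmediates: 0 };
const listening = {
  messageListeners: 1,
  parentPortUnrefed: false,
  refedMessagePortsCount: 0,
};
const unrefed = { ...listening, parentPortUnrefed: true };

const verdictListening = currentVerdict(idleLoop, listening); // "pending"
const verdictUnrefed = currentVerdict(idleLoop, unrefed); // "exit"
```

In this model the event loop itself never learns why the worker is being kept alive; only the second function knows, which is the split the rest of this document argues against.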

Key Code Paths

Worker creation (ext/node/polyfills/worker_threads.ts:405):

closeOnIdle: true,  // Hardcoded for all node workers

Rust event loop check (runtime/web_worker.rs:1032-1037):

if self.close_on_idle {
    if self.has_message_event_listener() {
        return Poll::Pending;  // Override idle verdict
    }
    return Poll::Ready(Ok(()));  // Worker exits
}

JS secondary check (runtime/js/99_main.js:198-205):

function hasMessageEventListener() {
  return (event.listenerCount(globalThis, "message") > 0 &&
    !globalThis[messagePort.unrefParentPort]) ||
    messagePort.refedMessagePortsCount > 0;
}

parentPort recv always unrefed (runtime/js/99_main.js:219-220):

if (closeOnIdle) {
  core.unrefOpPromise(recvMessage);  // Always unref for node workers
}

Why This Is Broken

1. Poll::Pending without a proper waker contract

When has_message_event_listener() overrides to Poll::Pending at web_worker.rs:1034, the only registered waker is terminate_waker. The worker only re-wakes when a message arrives (completing the unrefed op_worker_recv_message) or when terminated. This happens to work today, but violates the Future::poll contract — the implementation relies on an unrefed op waking the event loop as a side effect.
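To make the waker concern concrete, here is a toy single-task executor written in JavaScript. It is a simplified model under stated assumptions, not how deno_core or any Rust executor is implemented: `run`, `makeTask`, and the `"ready"`/`"pending"` strings are hypothetical. It shows that a task returning "pending" is only polled again if something later calls the waker it was given.

```javascript
// Toy executor: polls the task when it has been woken, otherwise processes
// one external event. If nothing is woken and no events remain, it stalls.
function run(task, events) {
  let polls = 0;
  let woken = true; // the first poll is unconditional
  const wake = () => { woken = true; };
  for (;;) {
    if (woken) {
      woken = false;
      polls += 1;
      if (task.poll(wake) === "ready") return { finished: true, polls };
      continue;
    }
    const event = events.shift();
    if (event === undefined) return { finished: false, polls }; // stalled
    event(); // an external event may or may not call a stored waker
  }
}

// A task that completes once a message has been delivered. When
// registerWaker is false it returns "pending" without storing the waker.
function makeTask(registerWaker) {
  return {
    messageArrived: false,
    storedWaker: null,
    poll(wake) {
      if (this.messageArrived) return "ready";
      if (registerWaker) this.storedWaker = wake;
      return "pending";
    },
    deliver() {
      this.messageArrived = true;
      if (this.storedWaker) this.storedWaker();
    },
  };
}

const good = makeTask(true);
const resultGood = run(good, [() => good.deliver()]);

const bad = makeTask(false);
const resultBad = run(bad, [() => bad.deliver()]);
```

In this model `resultGood` finishes after two polls, while `resultBad` stalls after one even though its message arrived. The worker described above avoids the stall only because the unrefed recv op happens to carry a waker of its own.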

2. The JS check is an incomplete proxy for "alive handles"

It knows about message listeners and refedMessagePortsCount, but not about:

  • Atomics.waitAsync() on SharedArrayBuffer (returns a promise, not tracked as a refed op)
  • A worker that receives a MessagePort via workerData but hasn't called .on('message') yet (async setup race)
  • Any future keepalive mechanism added to deno_core that this JS check doesn't know about
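For the first bullet, this short example shows what `Atomics.waitAsync()` hands back: a plain object wrapping a pending promise, with no op or handle visible to a check such as hasMessageEventListener(). It uses only standard JavaScript globals; whether a given runtime counts that promise toward keepalive is the claim made above, not something this snippet demonstrates.

```javascript
// Atomics.waitAsync returns immediately instead of blocking the thread.
const cells = new Int32Array(new SharedArrayBuffer(4));

// Wait while cells[0] === 0. The current value is 0, so the wait is async
// and the result carries a pending Promise.
const waitResult = Atomics.waitAsync(cells, 0, 0);
const isAsync = waitResult.async;
const isPromise = waitResult.value instanceof Promise;

// Wake the waiter so the example leaves nothing pending behind.
Atomics.store(cells, 0, 1);
Atomics.notify(cells, 0);
```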

3. pollForMessages() unconditionally unrefs its op, then Rust compensates

This inverts the correct design — in Node.js, parentPort is ref'd by default and keeps the worker alive. The Deno implementation unrefs it then uses a second system to pretend it's still ref'd.

4. parentPort.ref()/unref() are no-ops for the event loop

They only set a boolean flag (unrefParentPort). They don't actually ref/unref the pending op_worker_recv_message promise. The boolean is only checked later by hasMessageEventListener(). Compare to transferred MessagePorts where [refMessagePort]() actually calls core.refOpPromise()/core.unrefOpPromise().
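The difference between the two styles of ref()/unref() can be sketched with a toy set of refed ops. Everything here is an assumption made for illustration: `toyCore` is a Set-backed stand-in for core.refOpPromise()/core.unrefOpPromise(), and `makeFlagOnlyPort` and `makeWiredPort` are hypothetical helpers, not Deno code.

```javascript
// Toy stand-in for the event loop's view of which op promises are refed.
const refedOps = new Set();
const toyCore = {
  refOpPromise(promise) { refedOps.add(promise); },
  unrefOpPromise(promise) { refedOps.delete(promise); },
};

// Flag-only port: ref()/unref() flip a boolean that the loop never sees.
function makeFlagOnlyPort() {
  return {
    unrefed: false,
    ref() { this.unrefed = false; },
    unref() { this.unrefed = true; },
  };
}

// Wired port: ref()/unref() act on the pending recv op itself.
function makeWiredPort(recvPromise) {
  return {
    ref() { toyCore.refOpPromise(recvPromise); },
    unref() { toyCore.unrefOpPromise(recvPromise); },
  };
}

const flagOnly = makeFlagOnlyPort();
flagOnly.ref();
const afterFlagRef = refedOps.size; // the loop still sees nothing refed

const transferredRecv = { name: "placeholder for a pending recv promise" };
const wired = makeWiredPort(transferredRecv);
wired.ref();
const afterWiredRef = refedOps.size;
wired.unref();
const afterWiredUnref = refedOps.size;
```

With the flag-only port the refed set stays empty after ref(), so some other code path has to read the flag later. The wired port changes what the loop sees directly.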

Target Architecture

Single source of truth: EventLoopPendingState::is_pending() decides everything. No secondary JS check from Rust.

                    ┌─────────────────────────────────┐
                    │   EventLoopPendingState          │
                    │                                  │
                    │  refed ops (includes parentPort  │
                    │    recv when parentPort is ref'd │
                    │    + transferred MessagePorts)   │
                    │  refed timers                    │
                    │  refed immediates                │
                    │  libuv handles                   │
                    │  external ops                    │
                    │  ...                             │
                    └──────────────┬───────────────────┘
                                   │
                            is_pending()?
                              /       \
                           yes          no
                            |            |
                      Poll::Pending   closeOnIdle?
                                       /      \
                                     yes       no
                                      |         |
                              Poll::Ready    panic/log
                              (worker exits)  (bug)

No hasMessageEventListener() call. No JS→Rust cross-boundary check after the event loop decides. The event loop IS the decision.
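The decision tree in the diagram reduces to one small function. The sketch below is a model only: `isPending` stands in for EventLoopPendingState::is_pending(), and the counter names and the `"pending"`/`"exit"`/`"bug"` labels are assumptions chosen for readability.

```javascript
// Single source of truth: anything refed keeps the loop pending.
function isPending(state) {
  return Object.values(state).some((count) => count > 0);
}

// Mirrors the diagram: pending wins; otherwise closeOnIdle decides between
// a clean exit and the "should never happen" branch.
function decide(state, closeOnIdle) {
  if (isPending(state)) return "pending";
  return closeOnIdle ? "exit" : "bug";
}

const idle = { refedOps: 0, refedTimers: 0, refedImmediates: 0, externalOps: 0 };
const withParentRecv = { ...idle, refedOps: 1 };

const nodeWorkerListening = decide(withParentRecv, true);
const nodeWorkerIdle = decide(idle, true);
const webWorkerIdle = decide(idle, false);
```

Here `nodeWorkerListening` is "pending", `nodeWorkerIdle` is "exit", and `webWorkerIdle` is "bug". No listener counts appear in the decision; they only matter earlier, when deciding whether the recv op is refed.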

Implementation Plan

Step 1: Make pollForMessages() respect parentPort ref state

In runtime/js/99_main.js:

let currentParentRecvPromise = null;

// Called when parentPort ref state changes
function updateParentPortRecvRef() {
  if (!closeOnIdle || !currentParentRecvPromise) return;
  const shouldBeRefed = event.listenerCount(globalThis, "message") > 0
    && !globalThis[messagePort.unrefParentPort];
  if (shouldBeRefed) {
    core.refOpPromise(currentParentRecvPromise);
  } else {
    core.unrefOpPromise(currentParentRecvPromise);
  }
}

async function pollForMessages() {
  // ...setup...
  while (!isClosing) {
    currentParentRecvPromise = op_worker_recv_message();

    // Ref or unref based on current parentPort state
    if (closeOnIdle) {
      const shouldBeRefed = event.listenerCount(globalThis, "message") > 0
        && !globalThis[messagePort.unrefParentPort];
      if (!shouldBeRefed) {
        core.unrefOpPromise(currentParentRecvPromise);
      }
      // else: op stays refed (default), keeping event loop alive
    }

    const data = await currentParentRecvPromise;
    currentParentRecvPromise = null;
    // ...dispatch as before...
  }
}

Key change: the recv op is refed by default and only unrefed when parentPort has no listeners or is explicitly unref'd. This is the opposite of today's behavior, where it is always unrefed.

Step 2: Wire parentPort.ref()/unref() to the event loop

In ext/node/polyfills/worker_threads.ts, change the parentPort ref/unref stubs to actually affect the recv op:

parentPort.unref = () => {
  parentPort[unrefParentPort] = true;
  // Tell 99_main.js to unref the current recv promise
  internals.__updateParentPortRecvRef?.();
};
parentPort.ref = () => {
  parentPort[unrefParentPort] = false;
  // Tell 99_main.js to re-ref the current recv promise
  internals.__updateParentPortRecvRef?.();
};

And expose updateParentPortRecvRef via internals from 99_main.js:

internals.__updateParentPortRecvRef = updateParentPortRecvRef;
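The handshake between the two files can be modeled as follows. This is a sketch under assumptions: `internals` is a plain object standing in for the shared internal namespace, `parentPort` is a minimal hypothetical object, and the hook merely records what it would do.

```javascript
const internals = {};
const calls = [];

const parentPort = {
  unrefed: false,
  unref() {
    this.unrefed = true;
    internals.__updateParentPortRecvRef?.(); // no-op until the hook exists
  },
  ref() {
    this.unrefed = false;
    internals.__updateParentPortRecvRef?.();
  },
};

// Before the hook is installed, the optional call does nothing, but the
// flag is still recorded.
parentPort.unref();
const callsBeforeInstall = calls.length;
const flagBeforeInstall = parentPort.unrefed;

// The runtime side installs the hook; it reads the flag when invoked.
internals.__updateParentPortRecvRef = () => {
  calls.push(parentPort.unrefed ? "unref" : "ref");
};

parentPort.ref();
parentPort.unref();
```

A call made before the hook is installed is dropped, but the flag survives, and the Step 1 loop reads that flag each time it issues a new recv op. After the hook is installed, `calls` ends up as `["ref", "unref"]`.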

Step 3: Hook listener add/remove on globalThis

When parentPort.on('message', ...) is called, it calls globalThis.addEventListener('message', ...). We need to detect this and re-ref/unref the recv promise. Intercept add/removeEventListener on globalThis inside the worker:

In 99_main.js bootstrapWorkerRuntime(), after setting up pollForMessages:

if (closeOnIdle) {
  const origAddEventListener = globalThis.addEventListener;
  const origRemoveEventListener = globalThis.removeEventListener;

  globalThis.addEventListener = function(type, ...args) {
    const result = origAddEventListener.call(this, type, ...args);
    if (type === "message") {
      updateParentPortRecvRef();
    }
    return result;
  };

  globalThis.removeEventListener = function(type, ...args) {
    const result = origRemoveEventListener.call(this, type, ...args);
    if (type === "message") {
      updateParentPortRecvRef();
    }
    return result;
  };
}
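The same interception can be tried against a plain EventTarget, which stands in for the worker's globalThis here. The `updates` array is a hypothetical replacement for calls to updateParentPortRecvRef(); the last line shows the bypass that the Migration Risk section describes.

```javascript
const scope = new EventTarget();
const updates = [];

// Both methods are inherited from EventTarget.prototype at this point.
const origAdd = scope.addEventListener;
const origRemove = scope.removeEventListener;

scope.addEventListener = function (type, ...args) {
  const result = origAdd.call(this, type, ...args);
  if (type === "message") updates.push("add");
  return result;
};

scope.removeEventListener = function (type, ...args) {
  const result = origRemove.call(this, type, ...args);
  if (type === "message") updates.push("remove");
  return result;
};

const handler = () => {};
scope.addEventListener("message", handler); // recorded
scope.addEventListener("error", handler); // ignored by the hook
scope.removeEventListener("message", handler); // recorded

// A call through the prototype never reaches the instance-level wrapper.
EventTarget.prototype.addEventListener.call(scope, "message", handler);
```

After this runs, `updates` contains only `["add", "remove"]`: the final registration is invisible to the hook, so any ref bookkeeping tied to the wrapper would miss it.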

Step 4: Remove the secondary JS check from Rust

In runtime/web_worker.rs, simplify poll_event_loop():

fn poll_event_loop(
  &mut self,
  cx: &mut Context,
  poll_options: PollEventLoopOptions,
) -> Poll<Result<(), CoreError>> {
  if self.internal_handle.terminate_if_needed() {
    return Poll::Ready(Ok(()));
  }

  self.internal_handle.terminate_waker.register(cx.waker());

  match self.js_runtime.poll_event_loop(cx, poll_options) {
    Poll::Ready(r) => {
      if self.internal_handle.terminate_if_needed() {
        return Poll::Ready(Ok(()));
      }
      if let Err(e) = r {
        return Poll::Ready(Err(e));
      }
      if self.close_on_idle {
        // Event loop is truly idle — all refed ops, timers, handles
        // are gone. Worker exits naturally, matching Node.js behavior.
        return Poll::Ready(Ok(()));
      }
      // non-closeOnIdle workers (standard Deno web workers)
      if self.worker_type == WorkerThreadType::Module {
        panic!(
          "coding error: either js is polling or the worker is terminated"
        );
      } else {
        log::error!("classic worker terminated unexpectedly");
        Poll::Ready(Ok(()))
      }
    }
    Poll::Pending => Poll::Pending,
  }
}

Step 5: Clean up dead code

  • Remove hasMessageEventListener function from 99_main.js
  • Remove has_message_event_listener_fn field from WebWorker struct
  • Remove the globalThis.hasMessageEventListener = ... line in bootstrapWorkerRuntime
  • Remove has_message_event_listener() method from impl WebWorker

Correctness Analysis

| Scenario | What keeps worker alive | Mechanism |
| --- | --- | --- |
| parentPort.on('message', fn) | recv op is refed | pollForMessages refs op when listeners > 0 |
| parentPort.on('message', fn); parentPort.unref() | nothing; worker exits | updateParentPortRecvRef unrefs op |
| Transferred MessagePort with listener | port's op_message_port_recv_message is refed | Existing 13_message_port.js ref logic (already works) |
| setInterval(fn, 1000) | refed timer | deno_core timer ref tracking (already works) |
| TCP server / net socket | libuv handle or refed op | Existing deno_core tracking (already works) |
| No listeners, no timers | nothing; worker exits | Event loop idle → Poll::Ready |
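Several of the scenarios above can be checked against a small model that derives ref counts from a scenario description and then applies the single rule "alive while anything is refed". The field names (`parentListeners`, `listeningTransferredPorts`, and so on) are assumptions made for this sketch and do not correspond to real runtime state.

```javascript
// Translate a scenario into what the event loop would see as refed.
function refCounts(scenario) {
  const parentRecvRefed =
    scenario.parentListeners > 0 && !scenario.parentPortUnrefed;
  return {
    refedOps: (parentRecvRefed ? 1 : 0) + scenario.listeningTransferredPorts,
    refedTimers: scenario.intervals,
  };
}

function staysAlive(scenario) {
  const counts = refCounts(scenario);
  return counts.refedOps + counts.refedTimers > 0;
}

const base = {
  parentListeners: 0,
  parentPortUnrefed: false,
  listeningTransferredPorts: 0,
  intervals: 0,
};

const results = {
  parentListener: staysAlive({ ...base, parentListeners: 1 }),
  parentListenerUnrefed: staysAlive({
    ...base,
    parentListeners: 1,
    parentPortUnrefed: true,
  }),
  transferredPort: staysAlive({ ...base, listeningTransferredPorts: 1 }),
  interval: staysAlive({ ...base, intervals: 1 }),
  nothing: staysAlive(base),
};
```

In this model the worker stays alive for `parentListener`, `transferredPort`, and `interval`, and exits for `parentListenerUnrefed` and `nothing`.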

Edge Case: Race Between Script Execution and Event Loop

A worker that does:

await someAsyncSetup();
parentPort.on('message', handler);

During someAsyncSetup(), the recv op is unrefed (no listeners yet), but the loop stays alive as long as someAsyncSetup() is backed by a refed op or timer. When it resolves, the continuation runs synchronously to completion, including registering the listener, which re-refs the recv op through the Step 3 hook. The event loop then polls again and sees the refed recv op.

No race condition. The event loop only polls after all synchronous JS and microtasks complete.

Migration Risk

The main risk is subtle behavioral changes for workers that today rely on hasMessageEventListener() as a safety net. One example is a worker that calls parentPort.on('message') where the resulting addEventListener call on globalThis is not intercepted (e.g., because addEventListener has been monkey-patched). The Step 3 intercept handles the common path; edge cases with direct EventTarget.prototype.addEventListener calls would also need the ref to be managed from the parentPort.on() wrapper in worker_threads.ts.

Testing Strategy

  1. Existing spec tests — run cargo test specs to verify no regressions
  2. New spec test: worker stays alive with parentPort listener — worker with parentPort.on('message') that receives a message 1 second later, verifying it doesn't exit early
  3. New spec test: worker exits when idle — worker that does nothing, verifying it exits
  4. New spec test: parentPort.unref() allows exit — worker with listener but parentPort.unref(), verifying it exits
  5. New spec test: transferred MessagePort keeps worker alive — worker receives a port, listens on it, stays alive
  6. New spec test: worker with only timers stays alive — worker with setInterval, no message listeners

Files Changed

| File | Change |
| --- | --- |
| runtime/js/99_main.js | Replace hasMessageEventListener with updateParentPortRecvRef, modify pollForMessages, intercept globalThis addEventListener |
| runtime/web_worker.rs | Remove has_message_event_listener() call and method, simplify poll_event_loop |
| ext/node/polyfills/worker_threads.ts | Wire parentPort.ref()/unref() to internals.__updateParentPortRecvRef |
| tests/specs/node/worker_threads_idle_* | New spec tests for all scenarios |