Fix: node:worker_threads Worker Idle Termination (denoland/deno#23169)
Workers created with node:worker_threads are terminated when idle, even if they have ref'd transferable objects like MessagePort or SharedArrayBuffer that should keep them alive.
There are two independent keepalive decision systems, and that's the root problem:
System 1 — deno_core event loop (EventLoopPendingState::is_pending()):
Tracks refed ops, refed timers, refed immediates, libuv handles, background tasks, module evaluation, tick scheduling, and external ops. When all of them are zero, is_pending() is false and the event loop poll returns Poll::Ready ("idle").
System 2 — JS-level bolt-on check (hasMessageEventListener()):
Called from Rust only AFTER System 1 says "idle". It checks whether globalThis has "message" listeners and parentPort is not unref'd, or whether refedMessagePortsCount > 0. If either holds, it overrides the idle verdict and returns Poll::Pending.
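The two systems interact in a specific way, and a few lines of TypeScript can model it. This is a simplified sketch, not Deno code. PendingState, JsState, isPending, and pollWorker are hypothetical names. The counters stand in for only some of the sources is_pending() tracks, and closeOnIdle is assumed to be true.

```typescript
// Simplified model of today's idle decision for a node:worker_threads
// worker (closeOnIdle = true). All names are hypothetical stand-ins.

// System 1: a subset of what EventLoopPendingState counts.
interface PendingState {
  refedOps: number;
  refedTimers: number;
  refedImmediates: number;
  uvHandles: number;
  externalOps: number;
}

// State read by the JS bolt-on check (System 2).
interface JsState {
  globalMessageListeners: number; // "message" listeners on globalThis
  parentPortUnrefFlag: boolean; // globalThis[unrefParentPort]
  refedMessagePortsCount: number;
}

function isPending(s: PendingState): boolean {
  return (
    s.refedOps + s.refedTimers + s.refedImmediates + s.uvHandles +
        s.externalOps > 0
  );
}

// System 2: consulted only after System 1 reports idle.
function hasMessageEventListener(js: JsState): boolean {
  return (js.globalMessageListeners > 0 && !js.parentPortUnrefFlag) ||
    js.refedMessagePortsCount > 0;
}

function pollWorker(s: PendingState, js: JsState): "pending" | "exit" {
  if (isPending(s)) return "pending";
  // System 1 says idle, but System 2 can override the verdict.
  return hasMessageEventListener(js) ? "pending" : "exit";
}

// A worker with nothing refed from deno_core's point of view.
const idleState: PendingState = {
  refedOps: 0,
  refedTimers: 0,
  refedImmediates: 0,
  uvHandles: 0,
  externalOps: 0,
};
```

Take a worker in idleState with one "message" listener and no unref. pollWorker returns "pending" even though isPending(idleState) is false. That is the override path: a verdict that System 1 alone would never produce.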
Worker creation (ext/node/polyfills/worker_threads.ts:405):

```ts
closeOnIdle: true, // Hardcoded for all node workers
```

Rust event loop check (runtime/web_worker.rs:1032-1037):

```rust
if self.close_on_idle {
  if self.has_message_event_listener() {
    return Poll::Pending; // Override idle verdict
  }
  return Poll::Ready(Ok(())); // Worker exits
}
```

JS secondary check (runtime/js/99_main.js:198-205):

```js
function hasMessageEventListener() {
  return (event.listenerCount(globalThis, "message") > 0 &&
    !globalThis[messagePort.unrefParentPort]) ||
    messagePort.refedMessagePortsCount > 0;
}
```

parentPort recv always unrefed (runtime/js/99_main.js:219-220):

```js
if (closeOnIdle) {
  core.unrefOpPromise(recvMessage); // Always unref for node workers
}
```

When has_message_event_listener() overrides the verdict to Poll::Pending at web_worker.rs:1034, the only waker that poll_event_loop() deliberately registers is terminate_waker. The worker re-wakes only when a message arrives (completing the unrefed op_worker_recv_message) or when it is terminated. This happens to work today, but it violates the Future::poll contract: the implementation relies on an unrefed op waking the event loop as a side effect.
The JS check knows about message listeners and refedMessagePortsCount, but not about:

- Atomics.waitAsync() on a SharedArrayBuffer (returns a promise, not tracked as a refed op)
- A worker that receives a MessagePort via workerData but hasn't called .on('message') yet (async setup race)
- Any future keepalive mechanism added to deno_core that this JS check doesn't know about
This inverts the correct design. In Node.js, parentPort is ref'd by default and keeps the worker alive. The Deno implementation unrefs it and then uses a second system to pretend it is still ref'd.
The existing parentPort.ref()/unref() stubs only set a boolean flag (unrefParentPort). They don't actually ref/unref the pending op_worker_recv_message promise, and the boolean is only checked later by hasMessageEventListener(). Transferred MessagePorts work differently: there, [refMessagePort]() actually calls core.refOpPromise()/core.unrefOpPromise().
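A small model of an op table shows the difference between a flag and a real ref. It is a hypothetical sketch: OpTable, messagePortUnref, and parentPortUnref are illustrative names, not deno_core or Deno APIs. It assumes, as in deno_core, that new ops start out refed and that the event loop sees only the op table.

```typescript
// Hypothetical op table: the only ref state the event loop (System 1) sees.
class OpTable {
  private readonly refed = new Set<number>();
  private nextId = 1;

  // New ops start out refed.
  start(): number {
    const id = this.nextId++;
    this.refed.add(id);
    return id;
  }

  ref(id: number): void {
    this.refed.add(id);
  }

  unref(id: number): void {
    this.refed.delete(id);
  }

  isPending(): boolean {
    return this.refed.size > 0;
  }
}

const ops = new OpTable();

// Transferred MessagePort: unref() changes the op table itself,
// analogous to core.unrefOpPromise() in [refMessagePort]().
const portRecvOp = ops.start();
function messagePortUnref(): void {
  ops.unref(portRecvOp);
}

// parentPort today: unref() only flips a boolean that the op table
// never sees; only the later JS check reads it.
let unrefParentPortFlag = false;
function parentPortUnref(): void {
  unrefParentPortFlag = true;
}
```

Calling parentPortUnref() leaves ops.isPending() at true. Calling messagePortUnref() flips it to false, because only that path touches state the event loop can observe.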
Proposed fix: a single source of truth. EventLoopPendingState::is_pending() decides everything, with no secondary JS check called from Rust.
```
┌─────────────────────────────────┐
│ EventLoopPendingState           │
│                                 │
│ refed ops (includes parentPort  │
│   recv when parentPort is ref'd │
│   + transferred MessagePorts)   │
│ refed timers                    │
│ refed immediates                │
│ libuv handles                   │
│ external ops                    │
│ ...                             │
└────────────────┬────────────────┘
                 │
           is_pending()?
             /       \
           yes        no
            |          |
     Poll::Pending  closeOnIdle?
                     /       \
                   yes        no
                    |          |
               Poll::Ready  panic/log
            (worker exits)    (bug)
```
No hasMessageEventListener() call, and no Rust→JS cross-boundary check after the event loop decides. The event loop IS the decision.
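The decision in the diagram reduces to a pure function of is_pending() plus static worker configuration. The TypeScript sketch below mirrors the diagram's branches. decide, Outcome, and WorkerKind are hypothetical names, and it omits the terminate checks that the real poll_event_loop() performs first.

```typescript
// Hypothetical mirror of the proposed poll_event_loop() idle decision.
// Termination checks are omitted; only the idle branch is modeled.
type WorkerKind = "module" | "classic";

type Outcome =
  | "pending" // Poll::Pending: something refed is still outstanding
  | "exit" // Poll::Ready(Ok(())): closeOnIdle worker closes
  | "panic" // idle module worker: coding error
  | "log-and-exit"; // idle classic worker: log an error, then Poll::Ready

function decide(
  isPending: boolean,
  closeOnIdle: boolean,
  kind: WorkerKind,
): Outcome {
  if (isPending) return "pending";
  if (closeOnIdle) return "exit";
  return kind === "module" ? "panic" : "log-and-exit";
}
```

No JS state appears in the signature. The only dynamic input is whether the event loop still has refed work, so no second check can disagree with it.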
In runtime/js/99_main.js:

```js
let currentParentRecvPromise = null;

// True when the parentPort recv op should keep the event loop alive:
// at least one "message" listener and parentPort.unref() not called.
function parentPortShouldBeRefed() {
  return event.listenerCount(globalThis, "message") > 0 &&
    !globalThis[messagePort.unrefParentPort];
}

// Called when parentPort ref state changes
function updateParentPortRecvRef() {
  if (!closeOnIdle || !currentParentRecvPromise) return;
  if (parentPortShouldBeRefed()) {
    core.refOpPromise(currentParentRecvPromise);
  } else {
    core.unrefOpPromise(currentParentRecvPromise);
  }
}

async function pollForMessages() {
  // ...setup...
  while (!isClosing) {
    currentParentRecvPromise = op_worker_recv_message();
    // The op starts out refed (the default), keeping the event loop
    // alive; unref it only when parentPort should not keep the worker alive.
    if (closeOnIdle && !parentPortShouldBeRefed()) {
      core.unrefOpPromise(currentParentRecvPromise);
    }
    const data = await currentParentRecvPromise;
    currentParentRecvPromise = null;
    // ...dispatch as before...
  }
}
```

Key change: the recv op is refed by default and only unrefed when parentPort has no listeners or is explicitly unref'd. This is the opposite of today, where it is always unrefed.
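The ref bookkeeping in pollForMessages() and updateParentPortRecvRef() can be exercised without a runtime by faking the two core calls. This is a hypothetical model. FakeCore, startRecv, and the numeric op ids are illustrative stand-ins for core.refOpPromise()/core.unrefOpPromise() and the promises returned by op_worker_recv_message(). Listener counts and the unref flag are plain variables rather than state on globalThis.

```typescript
// Hypothetical stand-in for core.refOpPromise()/core.unrefOpPromise():
// an op is refed unless its id is in the unrefed set.
class FakeCore {
  private readonly unrefed = new Set<number>();
  refOpPromise(id: number): void {
    this.unrefed.delete(id);
  }
  unrefOpPromise(id: number): void {
    this.unrefed.add(id);
  }
  isRefed(id: number): boolean {
    return !this.unrefed.has(id);
  }
}

const fakeCore = new FakeCore();
const closeOnIdle: boolean = true;
let messageListeners = 0; // stands in for event.listenerCount(globalThis, "message")
let unrefParentPort = false; // stands in for globalThis[messagePort.unrefParentPort]
let nextOpId = 1;
let currentParentRecvPromise: number | null = null;

function parentPortShouldBeRefed(): boolean {
  return messageListeners > 0 && !unrefParentPort;
}

function updateParentPortRecvRef(): void {
  if (!closeOnIdle || currentParentRecvPromise === null) return;
  if (parentPortShouldBeRefed()) {
    fakeCore.refOpPromise(currentParentRecvPromise);
  } else {
    fakeCore.unrefOpPromise(currentParentRecvPromise);
  }
}

// One pollForMessages() iteration up to the await: start a recv op
// (refed by default) and unref it only if parentPort should not keep
// the worker alive.
function startRecv(): number {
  const id = nextOpId++;
  currentParentRecvPromise = id;
  if (closeOnIdle && !parentPortShouldBeRefed()) {
    fakeCore.unrefOpPromise(id);
  }
  return id;
}
```

A typical lifecycle in this model runs as follows:

1. startRecv() with no listeners yields an unrefed op.
2. Setting messageListeners = 1 and calling updateParentPortRecvRef() refs it.
3. The next startRecv() stays refed by default.
4. Setting unrefParentPort = true and calling updateParentPortRecvRef() unrefs it again.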
In ext/node/polyfills/worker_threads.ts, change the parentPort ref/unref stubs to actually affect the recv op:

```ts
parentPort.unref = () => {
  parentPort[unrefParentPort] = true;
  // Tell 99_main.js to unref the current recv promise
  internals.__updateParentPortRecvRef?.();
};

parentPort.ref = () => {
  parentPort[unrefParentPort] = false;
  // Tell 99_main.js to re-ref the current recv promise
  internals.__updateParentPortRecvRef?.();
};
```

And expose updateParentPortRecvRef via internals from 99_main.js:

```js
internals.__updateParentPortRecvRef = updateParentPortRecvRef;
```

Calling parentPort.on('message', ...) in turn calls globalThis.addEventListener('message', ...). We need to detect this and re-ref/unref the recv promise, so intercept addEventListener/removeEventListener on globalThis inside the worker.
In 99_main.js bootstrapWorkerRuntime(), after setting up pollForMessages:

```js
if (closeOnIdle) {
  const origAddEventListener = globalThis.addEventListener;
  const origRemoveEventListener = globalThis.removeEventListener;

  // parentPort.on('message', ...) calls globalThis.addEventListener, so
  // re-evaluate the recv op's ref whenever a "message" listener changes.
  // `this` is undefined for bare addEventListener() calls in strict-mode
  // code, so fall back to globalThis.
  globalThis.addEventListener = function (type, ...args) {
    const result = origAddEventListener.call(this ?? globalThis, type, ...args);
    if (type === "message") {
      updateParentPortRecvRef();
    }
    return result;
  };

  globalThis.removeEventListener = function (type, ...args) {
    const result = origRemoveEventListener.call(
      this ?? globalThis,
      type,
      ...args,
    );
    if (type === "message") {
      updateParentPortRecvRef();
    }
    return result;
  };
}
```

In runtime/web_worker.rs, simplify poll_event_loop():
```rust
fn poll_event_loop(
  &mut self,
  cx: &mut Context,
  poll_options: PollEventLoopOptions,
) -> Poll<Result<(), CoreError>> {
  if self.internal_handle.terminate_if_needed() {
    return Poll::Ready(Ok(()));
  }

  self.internal_handle.terminate_waker.register(cx.waker());

  match self.js_runtime.poll_event_loop(cx, poll_options) {
    Poll::Ready(r) => {
      if self.internal_handle.terminate_if_needed() {
        return Poll::Ready(Ok(()));
      }

      if let Err(e) = r {
        return Poll::Ready(Err(e));
      }

      if self.close_on_idle {
        // Event loop is truly idle: all refed ops, timers, handles
        // are gone. Worker exits naturally, matching Node.js behavior.
        return Poll::Ready(Ok(()));
      }

      // non-closeOnIdle workers (standard Deno web workers)
      if self.worker_type == WorkerThreadType::Module {
        panic!(
          "coding error: either js is polling or the worker is terminated"
        );
      } else {
        log::error!("classic worker terminated unexpectedly");
        Poll::Ready(Ok(()))
      }
    }
    Poll::Pending => Poll::Pending,
  }
}
```

- Remove the hasMessageEventListener function from 99_main.js
- Remove the has_message_event_listener_fn field from the WebWorker struct
- Remove the globalThis.hasMessageEventListener = ... line in bootstrapWorkerRuntime
- Remove the has_message_event_listener() method from impl WebWorker
| Scenario | What keeps worker alive | Mechanism |
|---|---|---|
| parentPort.on('message', fn) | recv op is refed | pollForMessages refs op when listeners > 0 |
| parentPort.on('message', fn); parentPort.unref() | nothing (worker exits) | updateParentPortRecvRef unrefs op |
| Transferred MessagePort with listener | port's op_message_port_recv_message is refed | Existing 13_message_port.js ref logic (already works) |
| setInterval(fn, 1000) | refed timer | deno_core timer ref tracking (already works) |
| TCP server / net socket | libuv handle or refed op | Existing deno_core tracking (already works) |
| No listeners, no timers | nothing (worker exits) | Event loop idle → Poll::Ready |
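The table can also be read as data: each row is a worker state, and the proposed rule is simply "alive if and only if something refed remains". The sketch below encodes that rule in TypeScript. WorkerState, staysAlive, and the field names are hypothetical. Each field collapses one mechanism from the Mechanism column into a count or flag.

```typescript
// Hypothetical summary of what the event loop would see in each scenario.
interface WorkerState {
  parentPortListeners: number;
  parentPortUnrefed: boolean;
  refedPortRecvOps: number; // transferred MessagePorts with listeners
  refedTimers: number;
  handlesOrRefedOps: number; // e.g. a TCP server or net socket
}

// Proposed rule: alive iff anything refed remains. The parentPort recv op
// counts only while it has listeners and has not been unref'd.
function staysAlive(s: WorkerState): boolean {
  const parentRecvRefed = s.parentPortListeners > 0 && !s.parentPortUnrefed;
  return parentRecvRefed ||
    s.refedPortRecvOps + s.refedTimers + s.handlesOrRefedOps > 0;
}

const baseline: WorkerState = {
  parentPortListeners: 0,
  parentPortUnrefed: false,
  refedPortRecvOps: 0,
  refedTimers: 0,
  handlesOrRefedOps: 0,
};

// One entry per table row: [scenario, state, expected to stay alive].
const scenarios: Array<[string, WorkerState, boolean]> = [
  ["parentPort.on('message', fn)", { ...baseline, parentPortListeners: 1 }, true],
  [
    "parentPort.on('message', fn); parentPort.unref()",
    { ...baseline, parentPortListeners: 1, parentPortUnrefed: true },
    false,
  ],
  ["Transferred MessagePort with listener", { ...baseline, refedPortRecvOps: 1 }, true],
  ["setInterval(fn, 1000)", { ...baseline, refedTimers: 1 }, true],
  ["TCP server / net socket", { ...baseline, handlesOrRefedOps: 1 }, true],
  ["No listeners, no timers", baseline, false],
];
```

Every row's verdict comes from the same function, which is the point of having a single source of truth.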
A worker that does:

```js
await someAsyncSetup();
parentPort.on('message', handler);
```

During someAsyncSetup(), the recv op is unrefed (no listeners yet), but the async op from someAsyncSetup() is refed, keeping the loop alive. When it resolves, JS runs synchronously to completion (including registering the listener), and then the event loop polls again, now seeing the refed recv op.
No race condition. The event loop only polls after all synchronous JS and microtasks complete.
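The ordering argument relies on standard JavaScript semantics: a promise continuation is a microtask, and the microtask queue is drained before the next macrotask runs. The sketch below demonstrates this in plain TypeScript. A later setTimeout callback stands in, hypothetically, for the event loop's next idle check, and a counter replaces the parentPort listener count. It illustrates the ordering, not Deno's actual polling.

```typescript
// Records the order in which things happen.
const order: string[] = [];
let parentPortListeners = 0; // stand-in for the "message" listener count

// Hypothetical async setup backed by a timer (a refed source of work).
function someAsyncSetup(): Promise<void> {
  return new Promise<void>((resolve) => {
    setTimeout(() => {
      resolve();
      // resolve() only queues the continuation of `await someAsyncSetup()`
      // as a microtask, so the listener is not registered yet.
      order.push(`setup resolved, listeners=${parentPortListeners}`);
      // Model the event loop's next idle check as a later macrotask.
      setTimeout(() => {
        order.push(`idle check, listeners=${parentPortListeners}`);
      }, 0);
    }, 0);
  });
}

async function workerMain(): Promise<void> {
  await someAsyncSetup();
  parentPortListeners += 1; // parentPort.on('message', handler)
  order.push("listener registered");
}

void workerMain();
```

The recorded order is "setup resolved, listeners=0", then "listener registered", then "idle check, listeners=1". By the time the next macrotask runs, the continuation has already registered the listener.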
The main risk is subtle behavioral changes for workers that today rely on hasMessageEventListener() as a safety net. Specifically, this affects a worker with parentPort.on('message') whose addEventListener call on globalThis is not intercepted (e.g., because addEventListener was monkey-patched). The globalThis addEventListener intercept in bootstrapWorkerRuntime() handles the common path. Edge cases with direct EventTarget.prototype.addEventListener calls would also need the ref to be managed from the parentPort.on() wrapper in worker_threads.ts.
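Any object whose addEventListener lives on its prototype reproduces the edge case. In the hypothetical sketch below, MiniTarget stands in for globalThis's EventTarget machinery, and refUpdates counts calls to the ref-update hook. An own-property wrapper, like the proposed intercept, sees ordinary calls but not a direct call through the prototype method.

```typescript
// Hypothetical minimal event target whose methods live on the prototype,
// as EventTarget's methods normally do.
type Listener = () => void;

class MiniTarget {
  private readonly listeners = new Map<string, Set<Listener>>();

  addEventListener(type: string, fn: Listener): void {
    let set = this.listeners.get(type);
    if (set === undefined) {
      set = new Set<Listener>();
      this.listeners.set(type, set);
    }
    set.add(fn);
  }

  listenerCount(type: string): number {
    return this.listeners.get(type)?.size ?? 0;
  }
}

const workerGlobal = new MiniTarget();
let refUpdates = 0; // calls to the hypothetical updateParentPortRecvRef()

// The intercept: an own-property wrapper that shadows the prototype method.
const origAdd = MiniTarget.prototype.addEventListener;
workerGlobal.addEventListener = (type: string, fn: Listener): void => {
  origAdd.call(workerGlobal, type, fn);
  if (type === "message") refUpdates++;
};

// Normal path: goes through the wrapper, so the hook runs.
workerGlobal.addEventListener("message", () => {});

// Edge case: calling the prototype method directly skips the wrapper.
MiniTarget.prototype.addEventListener.call(workerGlobal, "message", () => {});
```

After both calls, workerGlobal.listenerCount("message") is 2 but refUpdates is 1. The second listener exists, yet the recv op was never re-ref'd on its behalf.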
- Existing spec tests: run cargo test specs to verify no regressions
- New spec test, worker stays alive with a parentPort listener: a worker with parentPort.on('message') receives a message 1 second later; verify it doesn't exit early
- New spec test, worker exits when idle: a worker that does nothing; verify it exits
- New spec test, parentPort.unref() allows exit: a worker with a listener that calls parentPort.unref(); verify it exits
- New spec test, transferred MessagePort keeps worker alive: the worker receives a port and listens on it; verify it stays alive
- New spec test, worker with only timers stays alive: a worker with setInterval and no message listeners
| File | Change |
|---|---|
| runtime/js/99_main.js | Replace hasMessageEventListener with updateParentPortRecvRef, modify pollForMessages, intercept globalThis addEventListener |
| runtime/web_worker.rs | Remove has_message_event_listener() call and method, simplify poll_event_loop |
| ext/node/polyfills/worker_threads.ts | Wire parentPort.ref()/unref() to internals.__updateParentPortRecvRef |
| tests/specs/node/worker_threads_idle_* | New spec tests for all scenarios |