exec ${PAGER:-/usr/bin/less -R} "$0" || exit 1
Test settings: forge with network access
Host details: itmm4.prod.google.com Linux 6.6.65-smp-1300.170.0.0 x86_64 astoria-genoa-base
executor.INFO: analog/view?storage=borgremote&bns=/bns/it/borg/it/bns/build-forge-executor-tpu/prod-cbf-ghostlite.forge-executor/0&min_time=1764872604000000&ts=1764872614000000
Test command:
cd /build/work/aef67bf50706fee86777a93cc065340a246c/google3/runfiles/google3 && \
env - \
BORG_CELL=it \
CUSTOM_METRICS_DIR=/build/work/aef67bf50706fee86777a93cc065340a246c/google3/../custom_metrics \
Let's trace the values for `my_id = 1` with `num_devices = 4`:

| outer_step | Phase | Accumulation source | left_copy_device | right_copy_device | Device providing the data |
|---|---|---|---|---|---|
| 0 | LEFT | `x_ref[left_copy_device, ...]` | (1+0+1)%4 = 2 | (1-0-1)%4 = 0 | Device 2 |
| 0 | RIGHT | `x_ref[right_copy_device, ...]` | (1+0+1)%4 = 2 | (1-0-1)%4 = 0 | Device 0 |
| 1 | LEFT | `x_ref[left_copy_device, ...]` | (1+1+1)%4 = 3 | (1-1-1)%4 = 3 | Device 3 |
| 1 | RIGHT | `x_ref[right_copy_device, ...]` | (1+1+1)%4 = 3 | (1-1-1)%4 = 3 | Device 3 |
| 2 | LEFT | `x_ref[left_copy_device, ...]` | (1+2+1)%4 = 0 | (1-2-1)%4 = 2 | Device 0 |
| 2 | RIGHT | `x_ref[right_copy_device, ...]` | (1+2+1)%4 = 0 | (1-2-1)%4 = 2 | Device 2 |
As you can see, with each outer_step, the *_copy_device variables change, ensuring that the reduction operation fetches data from a new, distinct device. This systematic progression guarantees that by the end of all steps, each device has accumulated its required portion of the total sum from all other devices.
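The index arithmetic above can be sketched in plain Python (the helper name is hypothetical; in the real kernel these values index into `x_ref`):

```python
def copy_devices(my_id, outer_step, num_devices):
    # Device read from during the LEFT phase of this step.
    left_copy_device = (my_id + outer_step + 1) % num_devices
    # Device read from during the RIGHT phase of this step.
    right_copy_device = (my_id - outer_step - 1) % num_devices
    return left_copy_device, right_copy_device

# Reproduce the trace above for my_id = 1, num_devices = 4.
for step in range(3):
    left, right = copy_devices(1, step, 4)
    print(step, left, right)  # (2, 0), (3, 3), (0, 2)
```

Note that Python's `%` returns a non-negative result for negative operands, which is exactly what the RIGHT phase relies on.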
import jax
from jax import export
import jax.numpy as jnp
import pickle
import time
import statistics
with open("/home/xiowei_google_com/new_exports.pkl", "rb") as f:
    data = pickle.load(f)
import jax
from jax import export
import jax.numpy as jnp
import pickle
import time
import statistics
with open("/home/xiowei_google_com/old_exports.pkl", "rb") as f:
    data = pickle.load(f)
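A minimal timing helper in the spirit of these snippets, using the `time` and `statistics` imports above. This is a sketch; `fn` stands in for whatever callable is being benchmarked (e.g. the `.call` of a deserialized `jax.export` object, which is an assumption here):

```python
import time
import statistics

def bench(fn, *args, warmup=3, iters=10):
    """Return (median, stdev) of wall-clock seconds for fn(*args)."""
    for _ in range(warmup):
        fn(*args)  # discard warm-up runs (compilation, caches)
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn(*args)
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples), statistics.stdev(samples)
```

For JAX workloads specifically, one would also block on the result (e.g. `jax.block_until_ready`) inside the timed region so dispatch-only times are not measured.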
1. Start the benchmark server in VS Code as in [this gist](https://gist.github.com/vanbasten23/dd4f3cbb314a7b9cf6c003103c23c019). Select the correct Python interpreter.
2. Then start the vLLM server in the debugger.
3. Wait until the server is up and running.
4. Add the breakpoint (remember to turn off Dynamo and JAX JIT).
5. Use the [script](https://gist.github.com/vanbasten23/726b28f072993fb7587482672b9c96a9) to send a benchmarking request. Make sure to use the correct conda/Python environment.
6. Then dump the input and output.
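For reference, a benchmarking request to a vLLM server usually targets its OpenAI-compatible endpoint. A minimal payload might look like this (the URL and model name are assumptions; adjust to your deployment):

```python
import json

# Hypothetical endpoint of a locally running vLLM server.
URL = "http://localhost:8000/v1/completions"

payload = {
    "model": "Qwen/Qwen2.5-1.5B-Instruct",  # must match the served model
    "prompt": "Hello, world",
    "max_tokens": 64,
    "temperature": 0.0,  # deterministic output makes runs comparable
}
body = json.dumps(payload)
```

The body can then be POSTed with any HTTP client, e.g. `requests.post(URL, data=body, headers={"Content-Type": "application/json"})`.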
=========================
pip install flatbuffers
#!/bin/bash
# Usage:
# bash run_tpu_benchmark_client.sh --model Qwen/Qwen2.5-1.5B-Instruct --tp 1
OPTIONS=""
LONGOPTS=model:,tp:,profile
# Parse arguments
PARSED=$(getopt --options="$OPTIONS" --longoptions="$LONGOPTS" --name "$0" -- "$@")
if [[ $? -ne 0 ]]; then
    exit 2
fi
{
    "name": "newjax_benchmark_server",
    "type": "debugpy",
    "request": "launch",
    "program": "/home/xiowei_google_com/miniconda3/envs/vllm_newjax/bin/vllm",
    "console": "integratedTerminal",
    "justMyCode": false,
    "env": {
        "MODEL_IMPL_TYPE": "vllm",
        "TPU_BACKEND_TYPE": "jax",
local keymap = vim.keymap.set
local opts = { noremap = true, silent = true }
-- remap leader key
keymap("n", "<Space>", "", opts)
vim.g.mapleader = " "
vim.g.maplocalleader = " "
-- yank to system clipboard
keymap({"n", "v"}, "<leader>y", '"+y', opts)
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base = "Qwen/Qwen2.5-3B-Instruct"
adapter = "./lora-1plus1-666"

tok = AutoTokenizer.from_pretrained(base)
# Load the base model in bfloat16 and move it to GPU when available.
m = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16).to("cuda" if torch.cuda.is_available() else "cpu")
# Attach the LoRA adapter weights on top of the base model.
m = PeftModel.from_pretrained(m, adapter)