vllm main B200 oom
| INFO 09-26 08:47:55 [__init__.py:216] Automatically detected platform cuda. | |
| INFO 09-26 08:47:55 [__init__.py:216] Automatically detected platform cuda. | |
| WARNING 09-26 08:47:57 [__init__.py:1748] argument 'eplb_log_balancedness' is deprecated | |
| WARNING 09-26 08:47:57 [__init__.py:1748] argument 'eplb_window_size' is deprecated | |
| WARNING 09-26 08:47:57 [__init__.py:1748] argument 'eplb_step_interval' is deprecated | |
| WARNING 09-26 08:47:57 [__init__.py:1748] argument 'num_redundant_experts' is deprecated | |
| WARNING 09-26 08:47:57 [__init__.py:1748] argument 'num_redundant_experts' is deprecated | |
| WARNING 09-26 08:47:57 [__init__.py:1748] argument 'eplb_window_size' is deprecated | |
| WARNING 09-26 08:47:57 [__init__.py:1748] argument 'eplb_log_balancedness' is deprecated | |
| WARNING 09-26 08:47:57 [__init__.py:1748] argument 'eplb_step_interval' is deprecated | |
| (APIServer pid=81938) INFO 09-26 08:47:57 [api_server.py:1818] vLLM API server version 0.11.0rc2.dev153+gdb1e42f62 | |
| (APIServer pid=81938) INFO 09-26 08:47:57 [utils.py:233] non-default args: {'model_tag': 'deepseek-ai/DeepSeek-V3.1', 'model': 'deepseek-ai/DeepSeek-V3.1', 'data_parallel_size': 16, 'data_parallel_start_rank': 0, 'data_parallel_size_local': 8, 'data_parallel_address': 'dima-many-nodes-workers-0-0.dima-many-nodes.default.svc.cluster.local', 'data_parallel_rpc_port': 3389, 'enable_expert_parallel': True, 'enable_dbo': True, 'enable_eplb': True, 'num_redundant_experts': 32, 'eplb_window_size': 1000, 'eplb_step_interval': 3000, 'eplb_log_balancedness': True, 'compilation_config': {"level":null,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":null,"use_inductor":true,"compile_sizes":null,"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":[2,0],"use_cudagraph":true,"cudagraph_num_of_warmups":0,"cudagraph_capture_sizes":null,"cudagraph_copy_inputs":false,"full_cuda_graph":false,"use_inductor_graph_partition":false,"pass_config":{},"max_capture_size":null,"local_cache_dir":null}} | |
| INFO 09-26 08:47:58 [config.py:613] Detected quantization_config.scale_fmt=ue8m0; enabling Hopper UE8M0. | |
| (APIServer pid=81938) INFO 09-26 08:47:58 [config.py:613] Detected quantization_config.scale_fmt=ue8m0; enabling Hopper UE8M0. | |
| INFO 09-26 08:47:59 [model.py:544] Resolved architecture: DeepseekV3ForCausalLM | |
| INFO 09-26 08:47:59 [model.py:1507] Using max model len 163840 | |
| `torch_dtype` is deprecated! Use `dtype` instead! | |
| (APIServer pid=81938) `torch_dtype` is deprecated! Use `dtype` instead! | |
| (APIServer pid=81938) INFO 09-26 08:47:59 [model.py:544] Resolved architecture: DeepseekV3ForCausalLM | |
| (APIServer pid=81938) INFO 09-26 08:47:59 [model.py:1507] Using max model len 163840 | |
| INFO 09-26 08:47:59 [scheduler.py:205] Chunked prefill is enabled with max_num_batched_tokens=8192. | |
| INFO 09-26 08:47:59 [cuda.py:169] Forcing kv cache block size to 128 for CUTLASS_MLA backend. | |
| INFO 09-26 08:47:59 [serve.py:117] Launching 8 data parallel engine(s) in headless mode, with head node address tcp://dima-many-nodes-workers-0-0.dima-many-nodes.default.svc.cluster.local:3389. | |
| (APIServer pid=81938) INFO 09-26 08:47:59 [scheduler.py:205] Chunked prefill is enabled with max_num_batched_tokens=8192. | |
| (APIServer pid=81938) INFO 09-26 08:47:59 [cuda.py:169] Forcing kv cache block size to 128 for CUTLASS_MLA backend. | |
| /usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. | |
| import pynvml # type: ignore[import] | |
| /usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. | |
| import pynvml # type: ignore[import] | |
| /usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. | |
| import pynvml # type: ignore[import] | |
| /usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. | |
| import pynvml # type: ignore[import] | |
| /usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. | |
| import pynvml # type: ignore[import] | |
| /usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. | |
| import pynvml # type: ignore[import] | |
| /usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. | |
| import pynvml # type: ignore[import] | |
| /usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. | |
| import pynvml # type: ignore[import] | |
| (APIServer pid=81938) INFO 09-26 08:48:01 [utils.py:651] Started DP Coordinator process (PID: 82738) | |
| /usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. | |
| import pynvml # type: ignore[import] | |
| /usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. | |
| import pynvml # type: ignore[import] | |
| /usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. | |
| import pynvml # type: ignore[import] | |
| /usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. | |
| import pynvml # type: ignore[import] | |
| /usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. | |
| import pynvml # type: ignore[import] | |
| /usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. | |
| import pynvml # type: ignore[import] | |
| /usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. | |
| import pynvml # type: ignore[import] | |
| /usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. | |
| import pynvml # type: ignore[import] | |
| /usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. | |
| import pynvml # type: ignore[import] | |
| INFO 09-26 08:48:02 [__init__.py:216] Automatically detected platform cuda. | |
| INFO 09-26 08:48:02 [__init__.py:216] Automatically detected platform cuda. | |
| INFO 09-26 08:48:02 [__init__.py:216] Automatically detected platform cuda. | |
| INFO 09-26 08:48:02 [__init__.py:216] Automatically detected platform cuda. | |
| INFO 09-26 08:48:02 [__init__.py:216] Automatically detected platform cuda. | |
| INFO 09-26 08:48:02 [__init__.py:216] Automatically detected platform cuda. | |
| INFO 09-26 08:48:02 [__init__.py:216] Automatically detected platform cuda. | |
| INFO 09-26 08:48:02 [__init__.py:216] Automatically detected platform cuda. | |
| INFO 09-26 08:48:03 [__init__.py:216] Automatically detected platform cuda. | |
| INFO 09-26 08:48:03 [__init__.py:216] Automatically detected platform cuda. | |
| INFO 09-26 08:48:03 [__init__.py:216] Automatically detected platform cuda. | |
| INFO 09-26 08:48:04 [__init__.py:216] Automatically detected platform cuda. | |
| INFO 09-26 08:48:04 [__init__.py:216] Automatically detected platform cuda. | |
| INFO 09-26 08:48:04 [__init__.py:216] Automatically detected platform cuda. | |
| INFO 09-26 08:48:04 [__init__.py:216] Automatically detected platform cuda. | |
| INFO 09-26 08:48:04 [__init__.py:216] Automatically detected platform cuda. | |
| INFO 09-26 08:48:04 [__init__.py:216] Automatically detected platform cuda. | |
| (EngineCore_DP12 pid=80457) INFO 09-26 08:48:05 [core.py:644] Waiting for init message from front-end. | |
| (EngineCore_DP9 pid=80454) INFO 09-26 08:48:06 [core.py:644] Waiting for init message from front-end. | |
| (EngineCore_DP8 pid=80453) INFO 09-26 08:48:06 [core.py:644] Waiting for init message from front-end. | |
| (EngineCore_DP11 pid=80456) INFO 09-26 08:48:06 [core.py:644] Waiting for init message from front-end. | |
| (EngineCore_DP10 pid=80455) INFO 09-26 08:48:06 [core.py:644] Waiting for init message from front-end. | |
| (EngineCore_DP14 pid=80459) INFO 09-26 08:48:06 [core.py:644] Waiting for init message from front-end. | |
| (EngineCore_DP13 pid=80458) INFO 09-26 08:48:06 [core.py:644] Waiting for init message from front-end. | |
| (EngineCore_DP15 pid=80460) INFO 09-26 08:48:06 [core.py:644] Waiting for init message from front-end. | |
| (EngineCore_DP6 pid=82747) INFO 09-26 08:48:07 [core.py:644] Waiting for init message from front-end. | |
| (EngineCore_DP5 pid=82746) INFO 09-26 08:48:07 [core.py:644] Waiting for init message from front-end. | |
| (EngineCore_DP0 pid=82741) INFO 09-26 08:48:07 [core.py:644] Waiting for init message from front-end. | |
| (EngineCore_DP3 pid=82744) INFO 09-26 08:48:07 [core.py:644] Waiting for init message from front-end. | |
| (EngineCore_DP4 pid=82745) INFO 09-26 08:48:07 [core.py:644] Waiting for init message from front-end. | |
| (EngineCore_DP1 pid=82742) INFO 09-26 08:48:07 [core.py:644] Waiting for init message from front-end. | |
| (EngineCore_DP7 pid=82748) INFO 09-26 08:48:07 [core.py:644] Waiting for init message from front-end. | |
| (EngineCore_DP2 pid=82743) INFO 09-26 08:48:07 [core.py:644] Waiting for init message from front-end. | |
| [Gloo] Rank 12 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 8 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 14 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 11 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 10 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 0 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 1 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 2 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 3 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 4 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 5 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| (EngineCore_DP14 pid=80459) INFO 09-26 08:48:08 [core.py:77] Initializing a V1 LLM engine (v0.11.0rc2.dev153+gdb1e42f62) with config: model='deepseek-ai/DeepSeek-V3.1', speculative_config=None, tokenizer='deepseek-ai/DeepSeek-V3.1', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=163840, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=16, disable_custom_all_reduce=False, quantization=fp8, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=deepseek-ai/DeepSeek-V3.1, enable_prefix_caching=True, chunked_prefill_enabled=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2","vllm.mamba_mixer","vllm.short_conv","vllm.linear_attention","vllm.plamo2_mamba_mixer","vllm.gdn_attention"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":[2,0],"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"use_inductor_graph_partition":false,"pass_config":{},"max_capture_size":512,"local_cache_dir":null} | |
| (EngineCore_DP0 pid=82741) INFO 09-26 08:48:08 [core.py:77] Initializing a V1 LLM engine (v0.11.0rc2.dev153+gdb1e42f62) with config: model='deepseek-ai/DeepSeek-V3.1', speculative_config=None, tokenizer='deepseek-ai/DeepSeek-V3.1', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=163840, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=16, disable_custom_all_reduce=False, quantization=fp8, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=deepseek-ai/DeepSeek-V3.1, enable_prefix_caching=True, chunked_prefill_enabled=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2","vllm.mamba_mixer","vllm.short_conv","vllm.linear_attention","vllm.plamo2_mamba_mixer","vllm.gdn_attention"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":[2,0],"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"use_inductor_graph_partition":false,"pass_config":{},"max_capture_size":512,"local_cache_dir":null} | |
| (EngineCore_DP1 pid=82742) INFO 09-26 08:48:08 [core.py:77] Initializing a V1 LLM engine (v0.11.0rc2.dev153+gdb1e42f62) with config: model='deepseek-ai/DeepSeek-V3.1', speculative_config=None, tokenizer='deepseek-ai/DeepSeek-V3.1', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=163840, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=16, disable_custom_all_reduce=False, quantization=fp8, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=deepseek-ai/DeepSeek-V3.1, enable_prefix_caching=True, chunked_prefill_enabled=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2","vllm.mamba_mixer","vllm.short_conv","vllm.linear_attention","vllm.plamo2_mamba_mixer","vllm.gdn_attention"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":[2,0],"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"use_inductor_graph_partition":false,"pass_config":{},"max_capture_size":512,"local_cache_dir":null} | |
| (EngineCore_DP12 pid=80457) INFO 09-26 08:48:08 [core.py:77] Initializing a V1 LLM engine (v0.11.0rc2.dev153+gdb1e42f62) with config: model='deepseek-ai/DeepSeek-V3.1', speculative_config=None, tokenizer='deepseek-ai/DeepSeek-V3.1', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=163840, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=16, disable_custom_all_reduce=False, quantization=fp8, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=deepseek-ai/DeepSeek-V3.1, enable_prefix_caching=True, chunked_prefill_enabled=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2","vllm.mamba_mixer","vllm.short_conv","vllm.linear_attention","vllm.plamo2_mamba_mixer","vllm.gdn_attention"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":[2,0],"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"use_inductor_graph_partition":false,"pass_config":{},"max_capture_size":512,"local_cache_dir":null} | |
| (EngineCore_DP4 pid=82745) INFO 09-26 08:48:08 [core.py:77] Initializing a V1 LLM engine (v0.11.0rc2.dev153+gdb1e42f62) with config: model='deepseek-ai/DeepSeek-V3.1', speculative_config=None, tokenizer='deepseek-ai/DeepSeek-V3.1', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=163840, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=16, disable_custom_all_reduce=False, quantization=fp8, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=deepseek-ai/DeepSeek-V3.1, enable_prefix_caching=True, chunked_prefill_enabled=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2","vllm.mamba_mixer","vllm.short_conv","vllm.linear_attention","vllm.plamo2_mamba_mixer","vllm.gdn_attention"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":[2,0],"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"use_inductor_graph_partition":false,"pass_config":{},"max_capture_size":512,"local_cache_dir":null} | |
| (EngineCore_DP8 pid=80453) INFO 09-26 08:48:08 [core.py:77] Initializing a V1 LLM engine (v0.11.0rc2.dev153+gdb1e42f62) with config: model='deepseek-ai/DeepSeek-V3.1', speculative_config=None, tokenizer='deepseek-ai/DeepSeek-V3.1', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=163840, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=16, disable_custom_all_reduce=False, quantization=fp8, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=deepseek-ai/DeepSeek-V3.1, enable_prefix_caching=True, chunked_prefill_enabled=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2","vllm.mamba_mixer","vllm.short_conv","vllm.linear_attention","vllm.plamo2_mamba_mixer","vllm.gdn_attention"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":[2,0],"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"use_inductor_graph_partition":false,"pass_config":{},"max_capture_size":512,"local_cache_dir":null} | |
| (EngineCore_DP5 pid=82746) INFO 09-26 08:48:08 [core.py:77] Initializing a V1 LLM engine (v0.11.0rc2.dev153+gdb1e42f62) with config: model='deepseek-ai/DeepSeek-V3.1', speculative_config=None, tokenizer='deepseek-ai/DeepSeek-V3.1', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=163840, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=16, disable_custom_all_reduce=False, quantization=fp8, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=deepseek-ai/DeepSeek-V3.1, enable_prefix_caching=True, chunked_prefill_enabled=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2","vllm.mamba_mixer","vllm.short_conv","vllm.linear_attention","vllm.plamo2_mamba_mixer","vllm.gdn_attention"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":[2,0],"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"use_inductor_graph_partition":false,"pass_config":{},"max_capture_size":512,"local_cache_dir":null} | |
| [Gloo] Rank 9 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 6 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 13 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 15 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 7 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| (EngineCore_DP10 pid=80455) INFO 09-26 08:48:08 [core.py:77] Initializing a V1 LLM engine (v0.11.0rc2.dev153+gdb1e42f62) with config: model='deepseek-ai/DeepSeek-V3.1', speculative_config=None, tokenizer='deepseek-ai/DeepSeek-V3.1', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=163840, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=16, disable_custom_all_reduce=False, quantization=fp8, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=deepseek-ai/DeepSeek-V3.1, enable_prefix_caching=True, chunked_prefill_enabled=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2","vllm.mamba_mixer","vllm.short_conv","vllm.linear_attention","vllm.plamo2_mamba_mixer","vllm.gdn_attention"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":[2,0],"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"use_inductor_graph_partition":false,"pass_config":{},"max_capture_size":512,"local_cache_dir":null} | |
| (EngineCore_DP3 pid=82744) INFO 09-26 08:48:08 [core.py:77] Initializing a V1 LLM engine (v0.11.0rc2.dev153+gdb1e42f62) with config: model='deepseek-ai/DeepSeek-V3.1', speculative_config=None, tokenizer='deepseek-ai/DeepSeek-V3.1', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=163840, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=16, disable_custom_all_reduce=False, quantization=fp8, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=deepseek-ai/DeepSeek-V3.1, enable_prefix_caching=True, chunked_prefill_enabled=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2","vllm.mamba_mixer","vllm.short_conv","vllm.linear_attention","vllm.plamo2_mamba_mixer","vllm.gdn_attention"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":[2,0],"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"use_inductor_graph_partition":false,"pass_config":{},"max_capture_size":512,"local_cache_dir":null} | |
| (EngineCore_DP11 pid=80456) INFO 09-26 08:48:08 [core.py:77] Initializing a V1 LLM engine (v0.11.0rc2.dev153+gdb1e42f62) with config: model='deepseek-ai/DeepSeek-V3.1', speculative_config=None, tokenizer='deepseek-ai/DeepSeek-V3.1', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=163840, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=16, disable_custom_all_reduce=False, quantization=fp8, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=deepseek-ai/DeepSeek-V3.1, enable_prefix_caching=True, chunked_prefill_enabled=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2","vllm.mamba_mixer","vllm.short_conv","vllm.linear_attention","vllm.plamo2_mamba_mixer","vllm.gdn_attention"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":[2,0],"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"use_inductor_graph_partition":false,"pass_config":{},"max_capture_size":512,"local_cache_dir":null} | |
| (EngineCore_DP2 pid=82743) INFO 09-26 08:48:08 [core.py:77] Initializing a V1 LLM engine (v0.11.0rc2.dev153+gdb1e42f62) with config: model='deepseek-ai/DeepSeek-V3.1', speculative_config=None, tokenizer='deepseek-ai/DeepSeek-V3.1', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=163840, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=16, disable_custom_all_reduce=False, quantization=fp8, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=deepseek-ai/DeepSeek-V3.1, enable_prefix_caching=True, chunked_prefill_enabled=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2","vllm.mamba_mixer","vllm.short_conv","vllm.linear_attention","vllm.plamo2_mamba_mixer","vllm.gdn_attention"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":[2,0],"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"use_inductor_graph_partition":false,"pass_config":{},"max_capture_size":512,"local_cache_dir":null} | |
| (EngineCore_DP7 pid=82748) INFO 09-26 08:48:08 [core.py:77] Initializing a V1 LLM engine (v0.11.0rc2.dev153+gdb1e42f62) with config: model='deepseek-ai/DeepSeek-V3.1', speculative_config=None, tokenizer='deepseek-ai/DeepSeek-V3.1', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=163840, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=16, disable_custom_all_reduce=False, quantization=fp8, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=deepseek-ai/DeepSeek-V3.1, enable_prefix_caching=True, chunked_prefill_enabled=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2","vllm.mamba_mixer","vllm.short_conv","vllm.linear_attention","vllm.plamo2_mamba_mixer","vllm.gdn_attention"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":[2,0],"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"use_inductor_graph_partition":false,"pass_config":{},"max_capture_size":512,"local_cache_dir":null} | |
| (EngineCore_DP6 pid=82747) INFO 09-26 08:48:08 [core.py:77] Initializing a V1 LLM engine (v0.11.0rc2.dev153+gdb1e42f62) with config: model='deepseek-ai/DeepSeek-V3.1', speculative_config=None, tokenizer='deepseek-ai/DeepSeek-V3.1', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=163840, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=16, disable_custom_all_reduce=False, quantization=fp8, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=deepseek-ai/DeepSeek-V3.1, enable_prefix_caching=True, chunked_prefill_enabled=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2","vllm.mamba_mixer","vllm.short_conv","vllm.linear_attention","vllm.plamo2_mamba_mixer","vllm.gdn_attention"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":[2,0],"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"use_inductor_graph_partition":false,"pass_config":{},"max_capture_size":512,"local_cache_dir":null} | |
| (EngineCore_DP9 pid=80454) INFO 09-26 08:48:08 [core.py:77] Initializing a V1 LLM engine (v0.11.0rc2.dev153+gdb1e42f62) with config: model='deepseek-ai/DeepSeek-V3.1', speculative_config=None, tokenizer='deepseek-ai/DeepSeek-V3.1', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=163840, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=16, disable_custom_all_reduce=False, quantization=fp8, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=deepseek-ai/DeepSeek-V3.1, enable_prefix_caching=True, chunked_prefill_enabled=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2","vllm.mamba_mixer","vllm.short_conv","vllm.linear_attention","vllm.plamo2_mamba_mixer","vllm.gdn_attention"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":[2,0],"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"use_inductor_graph_partition":false,"pass_config":{},"max_capture_size":512,"local_cache_dir":null} | |
| (EngineCore_DP13 pid=80458) INFO 09-26 08:48:08 [core.py:77] Initializing a V1 LLM engine (v0.11.0rc2.dev153+gdb1e42f62) with config: model='deepseek-ai/DeepSeek-V3.1', speculative_config=None, tokenizer='deepseek-ai/DeepSeek-V3.1', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=163840, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=16, disable_custom_all_reduce=False, quantization=fp8, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=deepseek-ai/DeepSeek-V3.1, enable_prefix_caching=True, chunked_prefill_enabled=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2","vllm.mamba_mixer","vllm.short_conv","vllm.linear_attention","vllm.plamo2_mamba_mixer","vllm.gdn_attention"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":[2,0],"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"use_inductor_graph_partition":false,"pass_config":{},"max_capture_size":512,"local_cache_dir":null} | |
| (EngineCore_DP15 pid=80460) INFO 09-26 08:48:08 [core.py:77] Initializing a V1 LLM engine (v0.11.0rc2.dev153+gdb1e42f62) with config: model='deepseek-ai/DeepSeek-V3.1', speculative_config=None, tokenizer='deepseek-ai/DeepSeek-V3.1', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=163840, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=16, disable_custom_all_reduce=False, quantization=fp8, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=deepseek-ai/DeepSeek-V3.1, enable_prefix_caching=True, chunked_prefill_enabled=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2","vllm.mamba_mixer","vllm.short_conv","vllm.linear_attention","vllm.plamo2_mamba_mixer","vllm.gdn_attention"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":[2,0],"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"use_inductor_graph_partition":false,"pass_config":{},"max_capture_size":512,"local_cache_dir":null} | |
| (EngineCore_DP5 pid=82746) INFO 09-26 08:48:17 [parallel_state.py:1040] Adjusting world_size=16 rank=5 distributed_init_method=tcp://dima-many-nodes-workers-0-0.dima-many-nodes.default.svc.cluster.local:44105 for DP | |
| (EngineCore_DP2 pid=82743) INFO 09-26 08:48:17 [parallel_state.py:1040] Adjusting world_size=16 rank=2 distributed_init_method=tcp://dima-many-nodes-workers-0-0.dima-many-nodes.default.svc.cluster.local:44105 for DP | |
| (EngineCore_DP12 pid=80457) INFO 09-26 08:48:17 [parallel_state.py:1040] Adjusting world_size=16 rank=12 distributed_init_method=tcp://dima-many-nodes-workers-0-0.dima-many-nodes.default.svc.cluster.local:44105 for DP | |
| (EngineCore_DP0 pid=82741) INFO 09-26 08:48:17 [parallel_state.py:1040] Adjusting world_size=16 rank=0 distributed_init_method=tcp://dima-many-nodes-workers-0-0.dima-many-nodes.default.svc.cluster.local:44105 for DP | |
| (EngineCore_DP15 pid=80460) INFO 09-26 08:48:17 [parallel_state.py:1040] Adjusting world_size=16 rank=15 distributed_init_method=tcp://dima-many-nodes-workers-0-0.dima-many-nodes.default.svc.cluster.local:44105 for DP | |
| (EngineCore_DP4 pid=82745) INFO 09-26 08:48:17 [parallel_state.py:1040] Adjusting world_size=16 rank=4 distributed_init_method=tcp://dima-many-nodes-workers-0-0.dima-many-nodes.default.svc.cluster.local:44105 for DP | |
| (EngineCore_DP9 pid=80454) INFO 09-26 08:48:17 [parallel_state.py:1040] Adjusting world_size=16 rank=9 distributed_init_method=tcp://dima-many-nodes-workers-0-0.dima-many-nodes.default.svc.cluster.local:44105 for DP | |
| (EngineCore_DP14 pid=80459) INFO 09-26 08:48:17 [parallel_state.py:1040] Adjusting world_size=16 rank=14 distributed_init_method=tcp://dima-many-nodes-workers-0-0.dima-many-nodes.default.svc.cluster.local:44105 for DP | |
| (EngineCore_DP13 pid=80458) INFO 09-26 08:48:17 [parallel_state.py:1040] Adjusting world_size=16 rank=13 distributed_init_method=tcp://dima-many-nodes-workers-0-0.dima-many-nodes.default.svc.cluster.local:44105 for DP | |
| (EngineCore_DP8 pid=80453) INFO 09-26 08:48:17 [parallel_state.py:1040] Adjusting world_size=16 rank=8 distributed_init_method=tcp://dima-many-nodes-workers-0-0.dima-many-nodes.default.svc.cluster.local:44105 for DP | |
| (EngineCore_DP3 pid=82744) INFO 09-26 08:48:17 [parallel_state.py:1040] Adjusting world_size=16 rank=3 distributed_init_method=tcp://dima-many-nodes-workers-0-0.dima-many-nodes.default.svc.cluster.local:44105 for DP | |
| (EngineCore_DP10 pid=80455) INFO 09-26 08:48:17 [parallel_state.py:1040] Adjusting world_size=16 rank=10 distributed_init_method=tcp://dima-many-nodes-workers-0-0.dima-many-nodes.default.svc.cluster.local:44105 for DP | |
| (EngineCore_DP7 pid=82748) INFO 09-26 08:48:17 [parallel_state.py:1040] Adjusting world_size=16 rank=7 distributed_init_method=tcp://dima-many-nodes-workers-0-0.dima-many-nodes.default.svc.cluster.local:44105 for DP | |
| (EngineCore_DP11 pid=80456) INFO 09-26 08:48:17 [parallel_state.py:1040] Adjusting world_size=16 rank=11 distributed_init_method=tcp://dima-many-nodes-workers-0-0.dima-many-nodes.default.svc.cluster.local:44105 for DP | |
| (EngineCore_DP1 pid=82742) INFO 09-26 08:48:17 [parallel_state.py:1040] Adjusting world_size=16 rank=1 distributed_init_method=tcp://dima-many-nodes-workers-0-0.dima-many-nodes.default.svc.cluster.local:44105 for DP | |
| (EngineCore_DP6 pid=82747) INFO 09-26 08:48:17 [parallel_state.py:1040] Adjusting world_size=16 rank=6 distributed_init_method=tcp://dima-many-nodes-workers-0-0.dima-many-nodes.default.svc.cluster.local:44105 for DP | |
| [Gloo] Rank 8 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 9 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 10 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 11 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 0 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 12 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 14 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 13 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 15 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 1 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 2 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 3 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 4 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 5 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 6 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 7 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 0 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 1 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 2 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| [Gloo] Rank 14 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 8 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 11 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 9 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 10 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 12 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 13 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 15 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 3 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 4 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 5 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 6 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 7 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| (EngineCore_DP1 pid=82742) INFO 09-26 08:48:18 [__init__.py:1382] Found nccl from library libnccl.so.2 | |
| (EngineCore_DP0 pid=82741) INFO 09-26 08:48:18 [__init__.py:1382] Found nccl from library libnccl.so.2 | |
| (EngineCore_DP3 pid=82744) INFO 09-26 08:48:18 [__init__.py:1382] Found nccl from library libnccl.so.2 | |
| (EngineCore_DP2 pid=82743) INFO 09-26 08:48:18 [__init__.py:1382] Found nccl from library libnccl.so.2 | |
| (EngineCore_DP5 pid=82746) INFO 09-26 08:48:18 [__init__.py:1382] Found nccl from library libnccl.so.2 | |
| (EngineCore_DP1 pid=82742) INFO 09-26 08:48:18 [pynccl.py:103] vLLM is using nccl==2.27.3 | |
| (EngineCore_DP0 pid=82741) INFO 09-26 08:48:18 [pynccl.py:103] vLLM is using nccl==2.27.3 | |
| (EngineCore_DP14 pid=80459) INFO 09-26 08:48:18 [__init__.py:1382] Found nccl from library libnccl.so.2 | |
| (EngineCore_DP3 pid=82744) INFO 09-26 08:48:18 [pynccl.py:103] vLLM is using nccl==2.27.3 | |
| (EngineCore_DP11 pid=80456) INFO 09-26 08:48:18 [__init__.py:1382] Found nccl from library libnccl.so.2 | |
| (EngineCore_DP8 pid=80453) INFO 09-26 08:48:18 [__init__.py:1382] Found nccl from library libnccl.so.2 | |
| (EngineCore_DP14 pid=80459) INFO 09-26 08:48:18 [pynccl.py:103] vLLM is using nccl==2.27.3 | |
| (EngineCore_DP2 pid=82743) INFO 09-26 08:48:18 [pynccl.py:103] vLLM is using nccl==2.27.3 | |
| (EngineCore_DP4 pid=82745) INFO 09-26 08:48:18 [__init__.py:1382] Found nccl from library libnccl.so.2 | |
| (EngineCore_DP5 pid=82746) INFO 09-26 08:48:18 [pynccl.py:103] vLLM is using nccl==2.27.3 | |
| (EngineCore_DP7 pid=82748) INFO 09-26 08:48:18 [__init__.py:1382] Found nccl from library libnccl.so.2 | |
| (EngineCore_DP6 pid=82747) INFO 09-26 08:48:18 [__init__.py:1382] Found nccl from library libnccl.so.2 | |
| (EngineCore_DP4 pid=82745) INFO 09-26 08:48:18 [pynccl.py:103] vLLM is using nccl==2.27.3 | |
| (EngineCore_DP7 pid=82748) INFO 09-26 08:48:18 [pynccl.py:103] vLLM is using nccl==2.27.3 | |
| (EngineCore_DP6 pid=82747) INFO 09-26 08:48:18 [pynccl.py:103] vLLM is using nccl==2.27.3 | |
| (EngineCore_DP12 pid=80457) INFO 09-26 08:48:18 [__init__.py:1382] Found nccl from library libnccl.so.2 | |
| (EngineCore_DP9 pid=80454) INFO 09-26 08:48:18 [__init__.py:1382] Found nccl from library libnccl.so.2 | |
| (EngineCore_DP11 pid=80456) INFO 09-26 08:48:18 [pynccl.py:103] vLLM is using nccl==2.27.3 | |
| (EngineCore_DP12 pid=80457) INFO 09-26 08:48:18 [pynccl.py:103] vLLM is using nccl==2.27.3 | |
| (EngineCore_DP8 pid=80453) INFO 09-26 08:48:18 [pynccl.py:103] vLLM is using nccl==2.27.3 | |
| (EngineCore_DP13 pid=80458) INFO 09-26 08:48:18 [__init__.py:1382] Found nccl from library libnccl.so.2 | |
| (EngineCore_DP9 pid=80454) INFO 09-26 08:48:18 [pynccl.py:103] vLLM is using nccl==2.27.3 | |
| (EngineCore_DP13 pid=80458) INFO 09-26 08:48:18 [pynccl.py:103] vLLM is using nccl==2.27.3 | |
| (EngineCore_DP10 pid=80455) INFO 09-26 08:48:18 [__init__.py:1382] Found nccl from library libnccl.so.2 | |
| (EngineCore_DP10 pid=80455) INFO 09-26 08:48:18 [pynccl.py:103] vLLM is using nccl==2.27.3 | |
| (EngineCore_DP15 pid=80460) INFO 09-26 08:48:18 [__init__.py:1382] Found nccl from library libnccl.so.2 | |
| (EngineCore_DP15 pid=80460) INFO 09-26 08:48:18 [pynccl.py:103] vLLM is using nccl==2.27.3 | |
| [Gloo] Rank 0 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 1 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 2 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 3 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 4 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 5 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 6 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 7 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 8 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 10 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 11 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 9 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 12 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 14 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 15 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| [Gloo] Rank 13 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15 | |
| (EngineCore_DP1 pid=82742) INFO 09-26 08:48:52 [cuda_communicator.py:116] Using DeepEP Low-Latency all2all manager. | |
| (EngineCore_DP0 pid=82741) INFO 09-26 08:48:52 [cuda_communicator.py:116] Using DeepEP Low-Latency all2all manager. | |
| (EngineCore_DP1 pid=82742) INFO 09-26 08:48:52 [parallel_state.py:1201] rank 1 in world size 16 is assigned as DP rank 1, PP rank 0, TP rank 0, EP rank 1 | |
| (EngineCore_DP0 pid=82741) INFO 09-26 08:48:52 [parallel_state.py:1201] rank 0 in world size 16 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0 | |
| (EngineCore_DP2 pid=82743) INFO 09-26 08:48:52 [cuda_communicator.py:116] Using DeepEP Low-Latency all2all manager. | |
| (EngineCore_DP3 pid=82744) INFO 09-26 08:48:52 [cuda_communicator.py:116] Using DeepEP Low-Latency all2all manager. | |
| (EngineCore_DP3 pid=82744) INFO 09-26 08:48:52 [parallel_state.py:1201] rank 3 in world size 16 is assigned as DP rank 3, PP rank 0, TP rank 0, EP rank 3 | |
| (EngineCore_DP2 pid=82743) INFO 09-26 08:48:52 [parallel_state.py:1201] rank 2 in world size 16 is assigned as DP rank 2, PP rank 0, TP rank 0, EP rank 2 | |
| (EngineCore_DP4 pid=82745) INFO 09-26 08:48:52 [cuda_communicator.py:116] Using DeepEP Low-Latency all2all manager. | |
| (EngineCore_DP5 pid=82746) INFO 09-26 08:48:52 [cuda_communicator.py:116] Using DeepEP Low-Latency all2all manager. | |
| (EngineCore_DP4 pid=82745) INFO 09-26 08:48:52 [parallel_state.py:1201] rank 4 in world size 16 is assigned as DP rank 4, PP rank 0, TP rank 0, EP rank 4 | |
| (EngineCore_DP5 pid=82746) INFO 09-26 08:48:52 [parallel_state.py:1201] rank 5 in world size 16 is assigned as DP rank 5, PP rank 0, TP rank 0, EP rank 5 | |
| (EngineCore_DP6 pid=82747) INFO 09-26 08:48:52 [cuda_communicator.py:116] Using DeepEP Low-Latency all2all manager. | |
| (EngineCore_DP14 pid=80459) INFO 09-26 08:48:52 [cuda_communicator.py:116] Using DeepEP Low-Latency all2all manager. | |
| (EngineCore_DP15 pid=80460) INFO 09-26 08:48:52 [cuda_communicator.py:116] Using DeepEP Low-Latency all2all manager. | |
| (EngineCore_DP7 pid=82748) INFO 09-26 08:48:52 [cuda_communicator.py:116] Using DeepEP Low-Latency all2all manager. | |
| (EngineCore_DP6 pid=82747) INFO 09-26 08:48:52 [parallel_state.py:1201] rank 6 in world size 16 is assigned as DP rank 6, PP rank 0, TP rank 0, EP rank 6 | |
| (EngineCore_DP13 pid=80458) INFO 09-26 08:48:52 [cuda_communicator.py:116] Using DeepEP Low-Latency all2all manager. | |
| (EngineCore_DP14 pid=80459) INFO 09-26 08:48:52 [parallel_state.py:1201] rank 14 in world size 16 is assigned as DP rank 14, PP rank 0, TP rank 0, EP rank 14 | |
| (EngineCore_DP15 pid=80460) INFO 09-26 08:48:52 [parallel_state.py:1201] rank 15 in world size 16 is assigned as DP rank 15, PP rank 0, TP rank 0, EP rank 15 | |
| (EngineCore_DP12 pid=80457) INFO 09-26 08:48:52 [cuda_communicator.py:116] Using DeepEP Low-Latency all2all manager. | |
| (EngineCore_DP13 pid=80458) INFO 09-26 08:48:52 [parallel_state.py:1201] rank 13 in world size 16 is assigned as DP rank 13, PP rank 0, TP rank 0, EP rank 13 | |
| (EngineCore_DP7 pid=82748) INFO 09-26 08:48:52 [parallel_state.py:1201] rank 7 in world size 16 is assigned as DP rank 7, PP rank 0, TP rank 0, EP rank 7 | |
| (EngineCore_DP12 pid=80457) INFO 09-26 08:48:52 [parallel_state.py:1201] rank 12 in world size 16 is assigned as DP rank 12, PP rank 0, TP rank 0, EP rank 12 | |
| (EngineCore_DP11 pid=80456) INFO 09-26 08:48:52 [cuda_communicator.py:116] Using DeepEP Low-Latency all2all manager. | |
| (EngineCore_DP10 pid=80455) INFO 09-26 08:48:52 [cuda_communicator.py:116] Using DeepEP Low-Latency all2all manager. | |
| (EngineCore_DP11 pid=80456) INFO 09-26 08:48:52 [parallel_state.py:1201] rank 11 in world size 16 is assigned as DP rank 11, PP rank 0, TP rank 0, EP rank 11 | |
| (EngineCore_DP9 pid=80454) INFO 09-26 08:48:52 [cuda_communicator.py:116] Using DeepEP Low-Latency all2all manager. | |
| (EngineCore_DP10 pid=80455) INFO 09-26 08:48:52 [parallel_state.py:1201] rank 10 in world size 16 is assigned as DP rank 10, PP rank 0, TP rank 0, EP rank 10 | |
| (EngineCore_DP9 pid=80454) INFO 09-26 08:48:52 [parallel_state.py:1201] rank 9 in world size 16 is assigned as DP rank 9, PP rank 0, TP rank 0, EP rank 9 | |
| (EngineCore_DP8 pid=80453) INFO 09-26 08:48:52 [cuda_communicator.py:116] Using DeepEP Low-Latency all2all manager. | |
| (EngineCore_DP8 pid=80453) INFO 09-26 08:48:52 [parallel_state.py:1201] rank 8 in world size 16 is assigned as DP rank 8, PP rank 0, TP rank 0, EP rank 8 | |
| (EngineCore_DP14 pid=80459) INFO 09-26 08:48:52 [topk_topp_sampler.py:55] Using FlashInfer for top-p & top-k sampling. | |
| (EngineCore_DP15 pid=80460) INFO 09-26 08:48:52 [topk_topp_sampler.py:55] Using FlashInfer for top-p & top-k sampling. | |
| (EngineCore_DP11 pid=80456) INFO 09-26 08:48:52 [topk_topp_sampler.py:55] Using FlashInfer for top-p & top-k sampling. | |
| (EngineCore_DP12 pid=80457) INFO 09-26 08:48:52 [topk_topp_sampler.py:55] Using FlashInfer for top-p & top-k sampling. | |
| (EngineCore_DP1 pid=82742) INFO 09-26 08:48:52 [topk_topp_sampler.py:55] Using FlashInfer for top-p & top-k sampling. | |
| (EngineCore_DP4 pid=82745) INFO 09-26 08:48:52 [topk_topp_sampler.py:55] Using FlashInfer for top-p & top-k sampling. | |
| (EngineCore_DP3 pid=82744) INFO 09-26 08:48:52 [topk_topp_sampler.py:55] Using FlashInfer for top-p & top-k sampling. | |
| (EngineCore_DP5 pid=82746) INFO 09-26 08:48:52 [topk_topp_sampler.py:55] Using FlashInfer for top-p & top-k sampling. | |
| (EngineCore_DP9 pid=80454) INFO 09-26 08:48:52 [topk_topp_sampler.py:55] Using FlashInfer for top-p & top-k sampling. | |
| (EngineCore_DP10 pid=80455) INFO 09-26 08:48:52 [topk_topp_sampler.py:55] Using FlashInfer for top-p & top-k sampling. | |
| (EngineCore_DP0 pid=82741) INFO 09-26 08:48:52 [topk_topp_sampler.py:55] Using FlashInfer for top-p & top-k sampling. | |
| (EngineCore_DP2 pid=82743) INFO 09-26 08:48:52 [topk_topp_sampler.py:55] Using FlashInfer for top-p & top-k sampling. | |
| (EngineCore_DP8 pid=80453) INFO 09-26 08:48:52 [topk_topp_sampler.py:55] Using FlashInfer for top-p & top-k sampling. | |
| (EngineCore_DP6 pid=82747) INFO 09-26 08:48:52 [topk_topp_sampler.py:55] Using FlashInfer for top-p & top-k sampling. | |
| (EngineCore_DP13 pid=80458) INFO 09-26 08:48:52 [topk_topp_sampler.py:55] Using FlashInfer for top-p & top-k sampling. | |
| (EngineCore_DP14 pid=80459) INFO 09-26 08:48:52 [gpu_model_runner.py:2596] Starting to load model deepseek-ai/DeepSeek-V3.1... | |
| (EngineCore_DP15 pid=80460) INFO 09-26 08:48:52 [gpu_model_runner.py:2596] Starting to load model deepseek-ai/DeepSeek-V3.1... | |
| (EngineCore_DP11 pid=80456) INFO 09-26 08:48:52 [gpu_model_runner.py:2596] Starting to load model deepseek-ai/DeepSeek-V3.1... | |
| (EngineCore_DP1 pid=82742) INFO 09-26 08:48:52 [gpu_model_runner.py:2596] Starting to load model deepseek-ai/DeepSeek-V3.1... | |
| (EngineCore_DP4 pid=82745) INFO 09-26 08:48:52 [gpu_model_runner.py:2596] Starting to load model deepseek-ai/DeepSeek-V3.1... | |
| (EngineCore_DP3 pid=82744) INFO 09-26 08:48:52 [gpu_model_runner.py:2596] Starting to load model deepseek-ai/DeepSeek-V3.1... | |
| (EngineCore_DP7 pid=82748) INFO 09-26 08:48:52 [topk_topp_sampler.py:55] Using FlashInfer for top-p & top-k sampling. | |
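The FlashInfer top-p & top-k sampler selected above implements the usual filtering semantics: keep the k highest-scoring tokens, then keep the smallest prefix of the sorted distribution whose cumulative probability reaches p. A minimal reference sketch in plain PyTorch (illustration of the semantics only, not FlashInfer's fused kernel; all names here are placeholders):

    import torch

    def topk_topp_filter(logits: torch.Tensor, top_k: int, top_p: float) -> torch.Tensor:
        """Reference semantics of top-k + top-p filtering on [batch, vocab] logits."""
        if top_k > 0:
            # Keep only the k largest logits per row.
            kth = torch.topk(logits, top_k, dim=-1).values[..., -1, None]
            logits = logits.masked_fill(logits < kth, float("-inf"))
        if top_p < 1.0:
            # Nucleus filtering: drop tokens once the preceding cumulative mass exceeds p.
            sorted_logits, sorted_idx = torch.sort(logits, descending=True, dim=-1)
            probs = torch.softmax(sorted_logits, dim=-1)
            cum = probs.cumsum(dim=-1)
            sorted_logits = sorted_logits.masked_fill((cum - probs) > top_p, float("-inf"))
            logits = torch.full_like(logits, float("-inf")).scatter(-1, sorted_idx, sorted_logits)
        return logits

    # Sampling then draws from softmax(filtered_logits), e.g. with torch.multinomial.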
| (EngineCore_DP5 pid=82746) INFO 09-26 08:48:52 [gpu_model_runner.py:2596] Starting to load model deepseek-ai/DeepSeek-V3.1... | |
| (EngineCore_DP9 pid=80454) INFO 09-26 08:48:52 [gpu_model_runner.py:2596] Starting to load model deepseek-ai/DeepSeek-V3.1... | |
| (EngineCore_DP10 pid=80455) INFO 09-26 08:48:52 [gpu_model_runner.py:2596] Starting to load model deepseek-ai/DeepSeek-V3.1... | |
| (EngineCore_DP12 pid=80457) INFO 09-26 08:48:52 [gpu_model_runner.py:2596] Starting to load model deepseek-ai/DeepSeek-V3.1... | |
| (EngineCore_DP2 pid=82743) INFO 09-26 08:48:53 [gpu_model_runner.py:2596] Starting to load model deepseek-ai/DeepSeek-V3.1... | |
| (EngineCore_DP8 pid=80453) INFO 09-26 08:48:53 [gpu_model_runner.py:2596] Starting to load model deepseek-ai/DeepSeek-V3.1... | |
| (EngineCore_DP0 pid=82741) INFO 09-26 08:48:53 [gpu_model_runner.py:2596] Starting to load model deepseek-ai/DeepSeek-V3.1... | |
| (EngineCore_DP13 pid=80458) INFO 09-26 08:48:53 [gpu_model_runner.py:2596] Starting to load model deepseek-ai/DeepSeek-V3.1... | |
| (EngineCore_DP6 pid=82747) INFO 09-26 08:48:53 [gpu_model_runner.py:2596] Starting to load model deepseek-ai/DeepSeek-V3.1... | |
| (EngineCore_DP7 pid=82748) INFO 09-26 08:48:53 [gpu_model_runner.py:2596] Starting to load model deepseek-ai/DeepSeek-V3.1... | |
| (EngineCore_DP15 pid=80460) INFO 09-26 08:48:53 [gpu_model_runner.py:2628] Loading model from scratch... | |
| (EngineCore_DP14 pid=80459) INFO 09-26 08:48:53 [gpu_model_runner.py:2628] Loading model from scratch... | |
| (EngineCore_DP11 pid=80456) INFO 09-26 08:48:53 [gpu_model_runner.py:2628] Loading model from scratch... | |
| (EngineCore_DP1 pid=82742) INFO 09-26 08:48:53 [gpu_model_runner.py:2628] Loading model from scratch... | |
| (EngineCore_DP3 pid=82744) INFO 09-26 08:48:53 [gpu_model_runner.py:2628] Loading model from scratch... | |
| (EngineCore_DP9 pid=80454) INFO 09-26 08:48:53 [gpu_model_runner.py:2628] Loading model from scratch... | |
| (EngineCore_DP4 pid=82745) INFO 09-26 08:48:53 [gpu_model_runner.py:2628] Loading model from scratch... | |
| (EngineCore_DP5 pid=82746) INFO 09-26 08:48:53 [gpu_model_runner.py:2628] Loading model from scratch... | |
| (EngineCore_DP10 pid=80455) INFO 09-26 08:48:53 [gpu_model_runner.py:2628] Loading model from scratch... | |
| (EngineCore_DP14 pid=80459) INFO 09-26 08:48:53 [cuda.py:252] Using Cutlass MLA backend on V1 engine. | |
| (EngineCore_DP15 pid=80460) INFO 09-26 08:48:53 [cuda.py:252] Using Cutlass MLA backend on V1 engine. | |
| (EngineCore_DP14 pid=80459) WARNING 09-26 08:48:53 [cutlass_mla.py:127] Forcing num_kv_splits to 1 | |
| (EngineCore_DP15 pid=80460) WARNING 09-26 08:48:53 [cutlass_mla.py:127] Forcing num_kv_splits to 1 | |
| (EngineCore_DP11 pid=80456) INFO 09-26 08:48:53 [cuda.py:252] Using Cutlass MLA backend on V1 engine. | |
| (EngineCore_DP11 pid=80456) WARNING 09-26 08:48:53 [cutlass_mla.py:127] Forcing num_kv_splits to 1 | |
| (EngineCore_DP0 pid=82741) INFO 09-26 08:48:53 [gpu_model_runner.py:2628] Loading model from scratch... | |
| (EngineCore_DP14 pid=80459) INFO 09-26 08:48:53 [layer.py:1052] [EP Rank 14/16] Expert parallelism is enabled. Expert placement strategy: linear. Local/global number of experts: 18/288. Experts local to global index map: 0->252, 1->253, 2->254, 3->255, 4->256, 5->257, 6->258, 7->259, 8->260, 9->261, 10->262, 11->263, 12->264, 13->265, 14->266, 15->267, 16->268, 17->269. | |
| (EngineCore_DP14 pid=80459) INFO 09-26 08:48:53 [fp8.py:462] Using DeepGemm kernels for Fp8MoEMethod. | |
| (EngineCore_DP14 pid=80459) INFO 09-26 08:48:53 [fp8.py:475] Using CutlassBlockScaledGroupedGemm kernels for Fp8MoEMethod. | |
| (EngineCore_DP1 pid=82742) INFO 09-26 08:48:53 [cuda.py:252] Using Cutlass MLA backend on V1 engine. | |
| (EngineCore_DP8 pid=80453) INFO 09-26 08:48:53 [gpu_model_runner.py:2628] Loading model from scratch... | |
| (EngineCore_DP1 pid=82742) WARNING 09-26 08:48:53 [cutlass_mla.py:127] Forcing num_kv_splits to 1 | |
| (EngineCore_DP15 pid=80460) INFO 09-26 08:48:53 [layer.py:1052] [EP Rank 15/16] Expert parallelism is enabled. Expert placement strategy: linear. Local/global number of experts: 18/288. Experts local to global index map: 0->270, 1->271, 2->272, 3->273, 4->274, 5->275, 6->276, 7->277, 8->278, 9->279, 10->280, 11->281, 12->282, 13->283, 14->284, 15->285, 16->286, 17->287. | |
| (EngineCore_DP15 pid=80460) INFO 09-26 08:48:53 [fp8.py:462] Using DeepGemm kernels for Fp8MoEMethod. | |
| (EngineCore_DP15 pid=80460) INFO 09-26 08:48:53 [fp8.py:475] Using CutlassBlockScaledGroupedGemm kernels for Fp8MoEMethod. | |
| (EngineCore_DP3 pid=82744) INFO 09-26 08:48:53 [cuda.py:252] Using Cutlass MLA backend on V1 engine. | |
| (EngineCore_DP12 pid=80457) INFO 09-26 08:48:53 [gpu_model_runner.py:2628] Loading model from scratch... | |
| (EngineCore_DP9 pid=80454) INFO 09-26 08:48:53 [cuda.py:252] Using Cutlass MLA backend on V1 engine. | |
| (EngineCore_DP2 pid=82743) INFO 09-26 08:48:53 [gpu_model_runner.py:2628] Loading model from scratch... | |
| (EngineCore_DP3 pid=82744) WARNING 09-26 08:48:53 [cutlass_mla.py:127] Forcing num_kv_splits to 1 | |
| (EngineCore_DP11 pid=80456) INFO 09-26 08:48:53 [layer.py:1052] [EP Rank 11/16] Expert parallelism is enabled. Expert placement strategy: linear. Local/global number of experts: 18/288. Experts local to global index map: 0->198, 1->199, 2->200, 3->201, 4->202, 5->203, 6->204, 7->205, 8->206, 9->207, 10->208, 11->209, 12->210, 13->211, 14->212, 15->213, 16->214, 17->215. | |
| (EngineCore_DP11 pid=80456) INFO 09-26 08:48:53 [fp8.py:462] Using DeepGemm kernels for Fp8MoEMethod. | |
| (EngineCore_DP11 pid=80456) INFO 09-26 08:48:53 [fp8.py:475] Using CutlassBlockScaledGroupedGemm kernels for Fp8MoEMethod. | |
| (EngineCore_DP9 pid=80454) WARNING 09-26 08:48:53 [cutlass_mla.py:127] Forcing num_kv_splits to 1 | |
| (EngineCore_DP13 pid=80458) INFO 09-26 08:48:53 [gpu_model_runner.py:2628] Loading model from scratch... | |
| (EngineCore_DP4 pid=82745) INFO 09-26 08:48:53 [cuda.py:252] Using Cutlass MLA backend on V1 engine. | |
| (EngineCore_DP4 pid=82745) WARNING 09-26 08:48:53 [cutlass_mla.py:127] Forcing num_kv_splits to 1 | |
| (EngineCore_DP5 pid=82746) INFO 09-26 08:48:53 [cuda.py:252] Using Cutlass MLA backend on V1 engine. | |
| (EngineCore_DP5 pid=82746) WARNING 09-26 08:48:53 [cutlass_mla.py:127] Forcing num_kv_splits to 1 | |
| (EngineCore_DP10 pid=80455) INFO 09-26 08:48:53 [cuda.py:252] Using Cutlass MLA backend on V1 engine. | |
| (EngineCore_DP10 pid=80455) WARNING 09-26 08:48:53 [cutlass_mla.py:127] Forcing num_kv_splits to 1 | |
| (EngineCore_DP3 pid=82744) INFO 09-26 08:48:53 [layer.py:1052] [EP Rank 3/16] Expert parallelism is enabled. Expert placement strategy: linear. Local/global number of experts: 18/288. Experts local to global index map: 0->54, 1->55, 2->56, 3->57, 4->58, 5->59, 6->60, 7->61, 8->62, 9->63, 10->64, 11->65, 12->66, 13->67, 14->68, 15->69, 16->70, 17->71. | |
| (EngineCore_DP3 pid=82744) INFO 09-26 08:48:53 [fp8.py:462] Using DeepGemm kernels for Fp8MoEMethod. | |
| (EngineCore_DP3 pid=82744) INFO 09-26 08:48:53 [fp8.py:475] Using CutlassBlockScaledGroupedGemm kernels for Fp8MoEMethod. | |
| (EngineCore_DP1 pid=82742) INFO 09-26 08:48:53 [layer.py:1052] [EP Rank 1/16] Expert parallelism is enabled. Expert placement strategy: linear. Local/global number of experts: 18/288. Experts local to global index map: 0->18, 1->19, 2->20, 3->21, 4->22, 5->23, 6->24, 7->25, 8->26, 9->27, 10->28, 11->29, 12->30, 13->31, 14->32, 15->33, 16->34, 17->35. | |
| (EngineCore_DP1 pid=82742) INFO 09-26 08:48:53 [fp8.py:462] Using DeepGemm kernels for Fp8MoEMethod. | |
| (EngineCore_DP1 pid=82742) INFO 09-26 08:48:53 [fp8.py:475] Using CutlassBlockScaledGroupedGemm kernels for Fp8MoEMethod. | |
| (EngineCore_DP9 pid=80454) INFO 09-26 08:48:53 [layer.py:1052] [EP Rank 9/16] Expert parallelism is enabled. Expert placement strategy: linear. Local/global number of experts: 18/288. Experts local to global index map: 0->162, 1->163, 2->164, 3->165, 4->166, 5->167, 6->168, 7->169, 8->170, 9->171, 10->172, 11->173, 12->174, 13->175, 14->176, 15->177, 16->178, 17->179. | |
| (EngineCore_DP9 pid=80454) INFO 09-26 08:48:53 [fp8.py:462] Using DeepGemm kernels for Fp8MoEMethod. | |
| (EngineCore_DP9 pid=80454) INFO 09-26 08:48:53 [fp8.py:475] Using CutlassBlockScaledGroupedGemm kernels for Fp8MoEMethod. | |
| (EngineCore_DP6 pid=82747) INFO 09-26 08:48:53 [gpu_model_runner.py:2628] Loading model from scratch... | |
| (EngineCore_DP4 pid=82745) INFO 09-26 08:48:53 [layer.py:1052] [EP Rank 4/16] Expert parallelism is enabled. Expert placement strategy: linear. Local/global number of experts: 18/288. Experts local to global index map: 0->72, 1->73, 2->74, 3->75, 4->76, 5->77, 6->78, 7->79, 8->80, 9->81, 10->82, 11->83, 12->84, 13->85, 14->86, 15->87, 16->88, 17->89. | |
| (EngineCore_DP4 pid=82745) INFO 09-26 08:48:53 [fp8.py:462] Using DeepGemm kernels for Fp8MoEMethod. | |
| (EngineCore_DP4 pid=82745) INFO 09-26 08:48:53 [fp8.py:475] Using CutlassBlockScaledGroupedGemm kernels for Fp8MoEMethod. | |
| (EngineCore_DP0 pid=82741) INFO 09-26 08:48:53 [cuda.py:252] Using Cutlass MLA backend on V1 engine. | |
| (EngineCore_DP7 pid=82748) INFO 09-26 08:48:53 [gpu_model_runner.py:2628] Loading model from scratch... | |
| (EngineCore_DP5 pid=82746) INFO 09-26 08:48:53 [layer.py:1052] [EP Rank 5/16] Expert parallelism is enabled. Expert placement strategy: linear. Local/global number of experts: 18/288. Experts local to global index map: 0->90, 1->91, 2->92, 3->93, 4->94, 5->95, 6->96, 7->97, 8->98, 9->99, 10->100, 11->101, 12->102, 13->103, 14->104, 15->105, 16->106, 17->107. | |
| (EngineCore_DP5 pid=82746) INFO 09-26 08:48:53 [fp8.py:462] Using DeepGemm kernels for Fp8MoEMethod. | |
| (EngineCore_DP5 pid=82746) INFO 09-26 08:48:53 [fp8.py:475] Using CutlassBlockScaledGroupedGemm kernels for Fp8MoEMethod. | |
| (EngineCore_DP0 pid=82741) WARNING 09-26 08:48:53 [cutlass_mla.py:127] Forcing num_kv_splits to 1 | |
| (EngineCore_DP10 pid=80455) INFO 09-26 08:48:53 [layer.py:1052] [EP Rank 10/16] Expert parallelism is enabled. Expert placement strategy: linear. Local/global number of experts: 18/288. Experts local to global index map: 0->180, 1->181, 2->182, 3->183, 4->184, 5->185, 6->186, 7->187, 8->188, 9->189, 10->190, 11->191, 12->192, 13->193, 14->194, 15->195, 16->196, 17->197. | |
| (EngineCore_DP10 pid=80455) INFO 09-26 08:48:53 [fp8.py:462] Using DeepGemm kernels for Fp8MoEMethod. | |
| (EngineCore_DP10 pid=80455) INFO 09-26 08:48:53 [fp8.py:475] Using CutlassBlockScaledGroupedGemm kernels for Fp8MoEMethod. | |
| (EngineCore_DP8 pid=80453) INFO 09-26 08:48:53 [cuda.py:252] Using Cutlass MLA backend on V1 engine. | |
| (EngineCore_DP12 pid=80457) INFO 09-26 08:48:53 [cuda.py:252] Using Cutlass MLA backend on V1 engine. | |
| (EngineCore_DP8 pid=80453) WARNING 09-26 08:48:53 [cutlass_mla.py:127] Forcing num_kv_splits to 1 | |
| (EngineCore_DP12 pid=80457) WARNING 09-26 08:48:53 [cutlass_mla.py:127] Forcing num_kv_splits to 1 | |
| (EngineCore_DP2 pid=82743) INFO 09-26 08:48:53 [cuda.py:252] Using Cutlass MLA backend on V1 engine. | |
| (EngineCore_DP2 pid=82743) WARNING 09-26 08:48:53 [cutlass_mla.py:127] Forcing num_kv_splits to 1 | |
| (EngineCore_DP0 pid=82741) INFO 09-26 08:48:53 [layer.py:1052] [EP Rank 0/16] Expert parallelism is enabled. Expert placement strategy: linear. Local/global number of experts: 18/288. Experts local to global index map: 0->0, 1->1, 2->2, 3->3, 4->4, 5->5, 6->6, 7->7, 8->8, 9->9, 10->10, 11->11, 12->12, 13->13, 14->14, 15->15, 16->16, 17->17. | |
| (EngineCore_DP0 pid=82741) INFO 09-26 08:48:53 [fp8.py:462] Using DeepGemm kernels for Fp8MoEMethod. | |
| (EngineCore_DP0 pid=82741) INFO 09-26 08:48:53 [fp8.py:475] Using CutlassBlockScaledGroupedGemm kernels for Fp8MoEMethod. | |
| (EngineCore_DP13 pid=80458) INFO 09-26 08:48:53 [cuda.py:252] Using Cutlass MLA backend on V1 engine. | |
| (EngineCore_DP13 pid=80458) WARNING 09-26 08:48:53 [cutlass_mla.py:127] Forcing num_kv_splits to 1 | |
| (EngineCore_DP8 pid=80453) INFO 09-26 08:48:53 [layer.py:1052] [EP Rank 8/16] Expert parallelism is enabled. Expert placement strategy: linear. Local/global number of experts: 18/288. Experts local to global index map: 0->144, 1->145, 2->146, 3->147, 4->148, 5->149, 6->150, 7->151, 8->152, 9->153, 10->154, 11->155, 12->156, 13->157, 14->158, 15->159, 16->160, 17->161. | |
| (EngineCore_DP8 pid=80453) INFO 09-26 08:48:53 [fp8.py:462] Using DeepGemm kernels for Fp8MoEMethod. | |
| (EngineCore_DP8 pid=80453) INFO 09-26 08:48:53 [fp8.py:475] Using CutlassBlockScaledGroupedGemm kernels for Fp8MoEMethod. | |
| (EngineCore_DP2 pid=82743) INFO 09-26 08:48:53 [layer.py:1052] [EP Rank 2/16] Expert parallelism is enabled. Expert placement strategy: linear. Local/global number of experts: 18/288. Experts local to global index map: 0->36, 1->37, 2->38, 3->39, 4->40, 5->41, 6->42, 7->43, 8->44, 9->45, 10->46, 11->47, 12->48, 13->49, 14->50, 15->51, 16->52, 17->53. | |
| (EngineCore_DP2 pid=82743) INFO 09-26 08:48:53 [fp8.py:462] Using DeepGemm kernels for Fp8MoEMethod. | |
| (EngineCore_DP2 pid=82743) INFO 09-26 08:48:53 [fp8.py:475] Using CutlassBlockScaledGroupedGemm kernels for Fp8MoEMethod. | |
| (EngineCore_DP6 pid=82747) INFO 09-26 08:48:53 [cuda.py:252] Using Cutlass MLA backend on V1 engine. | |
| (EngineCore_DP6 pid=82747) WARNING 09-26 08:48:53 [cutlass_mla.py:127] Forcing num_kv_splits to 1 | |
| (EngineCore_DP12 pid=80457) INFO 09-26 08:48:53 [layer.py:1052] [EP Rank 12/16] Expert parallelism is enabled. Expert placement strategy: linear. Local/global number of experts: 18/288. Experts local to global index map: 0->216, 1->217, 2->218, 3->219, 4->220, 5->221, 6->222, 7->223, 8->224, 9->225, 10->226, 11->227, 12->228, 13->229, 14->230, 15->231, 16->232, 17->233. | |
| (EngineCore_DP12 pid=80457) INFO 09-26 08:48:53 [fp8.py:462] Using DeepGemm kernels for Fp8MoEMethod. | |
| (EngineCore_DP12 pid=80457) INFO 09-26 08:48:53 [fp8.py:475] Using CutlassBlockScaledGroupedGemm kernels for Fp8MoEMethod. | |
| (EngineCore_DP7 pid=82748) INFO 09-26 08:48:53 [cuda.py:252] Using Cutlass MLA backend on V1 engine. | |
| (EngineCore_DP13 pid=80458) INFO 09-26 08:48:53 [layer.py:1052] [EP Rank 13/16] Expert parallelism is enabled. Expert placement strategy: linear. Local/global number of experts: 18/288. Experts local to global index map: 0->234, 1->235, 2->236, 3->237, 4->238, 5->239, 6->240, 7->241, 8->242, 9->243, 10->244, 11->245, 12->246, 13->247, 14->248, 15->249, 16->250, 17->251. | |
| (EngineCore_DP13 pid=80458) INFO 09-26 08:48:53 [fp8.py:462] Using DeepGemm kernels for Fp8MoEMethod. | |
| (EngineCore_DP13 pid=80458) INFO 09-26 08:48:53 [fp8.py:475] Using CutlassBlockScaledGroupedGemm kernels for Fp8MoEMethod. | |
| (EngineCore_DP7 pid=82748) WARNING 09-26 08:48:53 [cutlass_mla.py:127] Forcing num_kv_splits to 1 | |
| (EngineCore_DP6 pid=82747) INFO 09-26 08:48:53 [layer.py:1052] [EP Rank 6/16] Expert parallelism is enabled. Expert placement strategy: linear. Local/global number of experts: 18/288. Experts local to global index map: 0->108, 1->109, 2->110, 3->111, 4->112, 5->113, 6->114, 7->115, 8->116, 9->117, 10->118, 11->119, 12->120, 13->121, 14->122, 15->123, 16->124, 17->125. | |
| (EngineCore_DP6 pid=82747) INFO 09-26 08:48:53 [fp8.py:462] Using DeepGemm kernels for Fp8MoEMethod. | |
| (EngineCore_DP6 pid=82747) INFO 09-26 08:48:53 [fp8.py:475] Using CutlassBlockScaledGroupedGemm kernels for Fp8MoEMethod. | |
| (EngineCore_DP7 pid=82748) INFO 09-26 08:48:53 [layer.py:1052] [EP Rank 7/16] Expert parallelism is enabled. Expert placement strategy: linear. Local/global number of experts: 18/288. Experts local to global index map: 0->126, 1->127, 2->128, 3->129, 4->130, 5->131, 6->132, 7->133, 8->134, 9->135, 10->136, 11->137, 12->138, 13->139, 14->140, 15->141, 16->142, 17->143. | |
| (EngineCore_DP7 pid=82748) INFO 09-26 08:48:53 [fp8.py:462] Using DeepGemm kernels for Fp8MoEMethod. | |
| (EngineCore_DP7 pid=82748) INFO 09-26 08:48:53 [fp8.py:475] Using CutlassBlockScaledGroupedGemm kernels for Fp8MoEMethod. | |
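The "linear" expert placement reported above is plain block arithmetic: 288 global expert slots are split evenly over the 16 EP ranks, so each rank owns 18 consecutive experts and local index i on EP rank r maps to global index r * 18 + i (rank 0: 0->0 ... 17->17, rank 14: 0->252 ... 17->269). A small sketch that reproduces the logged maps (illustration only, not vLLM's internal code):

    def linear_expert_map(ep_rank: int, ep_size: int = 16, global_experts: int = 288) -> dict[int, int]:
        """Local -> global expert index map for the 'linear' placement strategy logged above."""
        local_experts = global_experts // ep_size   # 288 // 16 = 18 experts per rank
        base = ep_rank * local_experts              # first global expert owned by this rank
        return {i: base + i for i in range(local_experts)}

    assert linear_expert_map(14)[0] == 252 and linear_expert_map(14)[17] == 269
    assert linear_expert_map(0) == {i: i for i in range(18)}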
| (EngineCore_DP15 pid=80460) INFO 09-26 08:48:54 [weight_utils.py:392] Using model weights format ['*.safetensors'] | |
| (EngineCore_DP11 pid=80456) INFO 09-26 08:48:54 [weight_utils.py:392] Using model weights format ['*.safetensors'] | |
| (EngineCore_DP3 pid=82744) INFO 09-26 08:48:54 [weight_utils.py:392] Using model weights format ['*.safetensors'] | |
| (EngineCore_DP9 pid=80454) INFO 09-26 08:48:54 [weight_utils.py:392] Using model weights format ['*.safetensors'] | |
| (EngineCore_DP5 pid=82746) INFO 09-26 08:48:54 [weight_utils.py:392] Using model weights format ['*.safetensors'] | |
| (EngineCore_DP10 pid=80455) INFO 09-26 08:48:54 [weight_utils.py:392] Using model weights format ['*.safetensors'] | |
| (EngineCore_DP1 pid=82742) INFO 09-26 08:48:54 [weight_utils.py:392] Using model weights format ['*.safetensors'] | |
| (EngineCore_DP0 pid=82741) INFO 09-26 08:48:54 [weight_utils.py:392] Using model weights format ['*.safetensors'] | |
| (EngineCore_DP2 pid=82743) INFO 09-26 08:48:54 [weight_utils.py:392] Using model weights format ['*.safetensors'] | |
| (EngineCore_DP4 pid=82745) INFO 09-26 08:48:54 [weight_utils.py:392] Using model weights format ['*.safetensors'] | |
| (EngineCore_DP13 pid=80458) INFO 09-26 08:48:54 [weight_utils.py:392] Using model weights format ['*.safetensors'] | |
| (EngineCore_DP6 pid=82747) INFO 09-26 08:48:54 [weight_utils.py:392] Using model weights format ['*.safetensors'] | |
| (EngineCore_DP7 pid=82748) INFO 09-26 08:48:54 [weight_utils.py:392] Using model weights format ['*.safetensors'] | |
| (EngineCore_DP14 pid=80459) INFO 09-26 08:48:54 [weight_utils.py:392] Using model weights format ['*.safetensors'] | |
| (EngineCore_DP12 pid=80457) INFO 09-26 08:48:54 [weight_utils.py:392] Using model weights format ['*.safetensors'] | |
| (EngineCore_DP8 pid=80453) INFO 09-26 08:48:54 [weight_utils.py:392] Using model weights format ['*.safetensors'] | |
| (EngineCore_DP0 pid=82741) INFO 09-26 08:48:56 [weight_utils.py:413] Time spent downloading weights for deepseek-ai/DeepSeek-V3.1: 0.714183 seconds | |
| Loading safetensors checkpoint shards: 0% Completed | 0/163 [00:00<?, ?it/s] | |
| Loading safetensors checkpoint shards: 1% Completed | 2/163 [00:00<00:21, 7.47it/s] | |
| Loading safetensors checkpoint shards: 2% Completed | 4/163 [00:00<00:13, 11.77it/s] | |
| Loading safetensors checkpoint shards: 4% Completed | 6/163 [00:00<00:17, 9.01it/s] | |
| Loading safetensors checkpoint shards: 5% Completed | 8/163 [00:00<00:19, 8.03it/s] | |
| Loading safetensors checkpoint shards: 6% Completed | 10/163 [00:01<00:20, 7.57it/s] | |
| Loading safetensors checkpoint shards: 7% Completed | 11/163 [00:01<00:23, 6.36it/s] | |
| Loading safetensors checkpoint shards: 8% Completed | 13/163 [00:01<00:22, 6.54it/s] | |
| Loading safetensors checkpoint shards: 9% Completed | 15/163 [00:01<00:17, 8.43it/s] | |
| Loading safetensors checkpoint shards: 12% Completed | 19/163 [00:02<00:10, 13.10it/s] | |
| Loading safetensors checkpoint shards: 13% Completed | 21/163 [00:02<00:14, 9.91it/s] | |
| Loading safetensors checkpoint shards: 14% Completed | 23/163 [00:02<00:15, 8.91it/s] | |
| Loading safetensors checkpoint shards: 15% Completed | 25/163 [00:02<00:17, 8.11it/s] | |
| Loading safetensors checkpoint shards: 16% Completed | 26/163 [00:03<00:34, 3.93it/s] | |
| Loading safetensors checkpoint shards: 17% Completed | 27/163 [00:04<00:38, 3.56it/s] | |
| Loading safetensors checkpoint shards: 17% Completed | 28/163 [00:04<00:39, 3.44it/s] | |
| Loading safetensors checkpoint shards: 18% Completed | 30/163 [00:04<00:31, 4.19it/s] | |
| Loading safetensors checkpoint shards: 19% Completed | 31/163 [00:05<00:33, 3.96it/s] | |
| Loading safetensors checkpoint shards: 21% Completed | 34/163 [00:05<00:19, 6.55it/s] | |
| Loading safetensors checkpoint shards: 23% Completed | 37/163 [00:05<00:13, 9.36it/s] | |
| Loading safetensors checkpoint shards: 25% Completed | 40/163 [00:05<00:12, 9.76it/s] | |
| Loading safetensors checkpoint shards: 26% Completed | 42/163 [00:05<00:10, 11.17it/s] | |
| Loading safetensors checkpoint shards: 27% Completed | 44/163 [00:06<00:12, 9.80it/s] | |
| Loading safetensors checkpoint shards: 28% Completed | 46/163 [00:06<00:13, 8.58it/s] | |
| Loading safetensors checkpoint shards: 29% Completed | 48/163 [00:06<00:14, 8.17it/s] | |
| Loading safetensors checkpoint shards: 30% Completed | 49/163 [00:06<00:17, 6.61it/s] | |
| Loading safetensors checkpoint shards: 31% Completed | 51/163 [00:07<00:17, 6.55it/s] | |
| Loading safetensors checkpoint shards: 33% Completed | 53/163 [00:07<00:13, 8.30it/s] | |
| Loading safetensors checkpoint shards: 34% Completed | 55/163 [00:07<00:18, 5.83it/s] | |
| Loading safetensors checkpoint shards: 35% Completed | 57/163 [00:08<00:18, 5.64it/s] | |
| Loading safetensors checkpoint shards: 36% Completed | 58/163 [00:08<00:20, 5.24it/s] | |
| Loading safetensors checkpoint shards: 37% Completed | 60/163 [00:08<00:18, 5.66it/s] | |
| Loading safetensors checkpoint shards: 38% Completed | 62/163 [00:08<00:13, 7.33it/s] | |
| Loading safetensors checkpoint shards: 39% Completed | 64/163 [00:09<00:14, 6.88it/s] | |
| Loading safetensors checkpoint shards: 40% Completed | 65/163 [00:09<00:19, 4.92it/s] | |
| Loading safetensors checkpoint shards: 41% Completed | 67/163 [00:10<00:17, 5.47it/s] | |
| Loading safetensors checkpoint shards: 43% Completed | 70/163 [00:10<00:11, 8.07it/s] | |
| Loading safetensors checkpoint shards: 44% Completed | 72/163 [00:10<00:11, 7.63it/s] | |
| Loading safetensors checkpoint shards: 45% Completed | 74/163 [00:10<00:12, 6.99it/s] | |
| Loading safetensors checkpoint shards: 47% Completed | 77/163 [00:11<00:11, 7.41it/s] | |
| Loading safetensors checkpoint shards: 48% Completed | 79/163 [00:11<00:09, 8.88it/s] | |
| Loading safetensors checkpoint shards: 50% Completed | 82/163 [00:11<00:06, 11.64it/s] | |
| Loading safetensors checkpoint shards: 52% Completed | 85/163 [00:11<00:05, 14.22it/s] | |
| Loading safetensors checkpoint shards: 53% Completed | 87/163 [00:11<00:06, 11.47it/s] | |
| Loading safetensors checkpoint shards: 55% Completed | 89/163 [00:12<00:07, 9.36it/s] | |
| Loading safetensors checkpoint shards: 56% Completed | 91/163 [00:12<00:08, 8.24it/s] | |
| Loading safetensors checkpoint shards: 57% Completed | 93/163 [00:12<00:09, 7.25it/s] | |
| Loading safetensors checkpoint shards: 58% Completed | 95/163 [00:12<00:07, 8.82it/s] | |
| Loading safetensors checkpoint shards: 60% Completed | 98/163 [00:13<00:07, 9.03it/s] | |
| Loading safetensors checkpoint shards: 61% Completed | 100/163 [00:13<00:07, 8.33it/s] | |
| Loading safetensors checkpoint shards: 62% Completed | 101/163 [00:13<00:09, 6.56it/s] | |
| Loading safetensors checkpoint shards: 64% Completed | 104/163 [00:14<00:06, 9.14it/s] | |
| Loading safetensors checkpoint shards: 66% Completed | 107/163 [00:14<00:06, 9.08it/s] | |
| Loading safetensors checkpoint shards: 67% Completed | 109/163 [00:14<00:06, 8.00it/s] | |
| Loading safetensors checkpoint shards: 68% Completed | 111/163 [00:14<00:06, 7.60it/s] | |
| Loading safetensors checkpoint shards: 70% Completed | 114/163 [00:15<00:04, 10.25it/s] | |
| Loading safetensors checkpoint shards: 71% Completed | 116/163 [00:15<00:05, 8.40it/s] | |
| Loading safetensors checkpoint shards: 72% Completed | 118/163 [00:15<00:07, 6.30it/s] | |
| Loading safetensors checkpoint shards: 73% Completed | 119/163 [00:16<00:07, 5.62it/s] | |
| Loading safetensors checkpoint shards: 74% Completed | 120/163 [00:16<00:08, 5.06it/s] | |
| Loading safetensors checkpoint shards: 77% Completed | 125/163 [00:16<00:03, 9.99it/s] | |
| Loading safetensors checkpoint shards: 79% Completed | 128/163 [00:17<00:03, 9.55it/s] | |
| Loading safetensors checkpoint shards: 80% Completed | 130/163 [00:17<00:03, 8.27it/s] | |
| Loading safetensors checkpoint shards: 82% Completed | 133/163 [00:17<00:03, 7.98it/s] | |
| Loading safetensors checkpoint shards: 83% Completed | 136/163 [00:18<00:03, 8.08it/s] | |
| Loading safetensors checkpoint shards: 85% Completed | 138/163 [00:18<00:02, 9.33it/s] | |
| (EngineCore_DP15 pid=80460) INFO 09-26 08:49:14 [default_loader.py:267] Loading weights took 20.01 seconds | |
| (EngineCore_DP15 pid=80460) INFO 09-26 08:49:14 [deep_gemm.py:51] DeepGEMM E8M0 enabled on Blackwell GPU. | |
| (EngineCore_DP3 pid=82744) INFO 09-26 08:49:15 [default_loader.py:267] Loading weights took 20.09 seconds | |
| (EngineCore_DP3 pid=82744) INFO 09-26 08:49:15 [deep_gemm.py:51] DeepGEMM E8M0 enabled on Blackwell GPU. | |
| Loading safetensors checkpoint shards: 87% Completed | 142/163 [00:18<00:02, 9.81it/s] | |
| Loading safetensors checkpoint shards: 88% Completed | 144/163 [00:18<00:01, 11.07it/s] | |
| Loading safetensors checkpoint shards: 90% Completed | 146/163 [00:19<00:01, 9.20it/s] | |
| (EngineCore_DP12 pid=80457) INFO 09-26 08:49:15 [default_loader.py:267] Loading weights took 20.00 seconds | |
| (EngineCore_DP12 pid=80457) INFO 09-26 08:49:15 [deep_gemm.py:51] DeepGEMM E8M0 enabled on Blackwell GPU. | |
| Loading safetensors checkpoint shards: 91% Completed | 149/163 [00:19<00:01, 11.44it/s] | |
| (EngineCore_DP7 pid=82748) INFO 09-26 08:49:15 [default_loader.py:267] Loading weights took 20.22 seconds | |
| (EngineCore_DP7 pid=82748) INFO 09-26 08:49:15 [deep_gemm.py:51] DeepGEMM E8M0 enabled on Blackwell GPU. | |
| Loading safetensors checkpoint shards: 93% Completed | 151/163 [00:19<00:01, 7.58it/s] | |
| Loading safetensors checkpoint shards: 94% Completed | 153/163 [00:19<00:01, 9.00it/s] | |
| Loading safetensors checkpoint shards: 95% Completed | 155/163 [00:20<00:00, 8.04it/s] | |
| Loading safetensors checkpoint shards: 96% Completed | 157/163 [00:20<00:00, 6.23it/s] | |
| (EngineCore_DP13 pid=80458) INFO 09-26 08:49:17 [default_loader.py:267] Loading weights took 21.00 seconds | |
| (EngineCore_DP13 pid=80458) INFO 09-26 08:49:17 [deep_gemm.py:51] DeepGEMM E8M0 enabled on Blackwell GPU. | |
| (EngineCore_DP6 pid=82747) INFO 09-26 08:49:17 [default_loader.py:267] Loading weights took 20.43 seconds | |
| (EngineCore_DP6 pid=82747) INFO 09-26 08:49:17 [deep_gemm.py:51] DeepGEMM E8M0 enabled on Blackwell GPU. | |
| Loading safetensors checkpoint shards: 97% Completed | 158/163 [00:20<00:00, 5.41it/s] | |
| Loading safetensors checkpoint shards: 98% Completed | 160/163 [00:21<00:00, 7.07it/s] | |
| (EngineCore_DP9 pid=80454) INFO 09-26 08:49:17 [default_loader.py:267] Loading weights took 19.56 seconds | |
| (EngineCore_DP9 pid=80454) INFO 09-26 08:49:17 [deep_gemm.py:51] DeepGEMM E8M0 enabled on Blackwell GPU. | |
| Loading safetensors checkpoint shards: 99% Completed | 162/163 [00:21<00:00, 6.90it/s] | |
| Loading safetensors checkpoint shards: 100% Completed | 163/163 [00:21<00:00, 7.61it/s] | |
| (EngineCore_DP0 pid=82741) | |
| (EngineCore_DP0 pid=82741) INFO 09-26 08:49:17 [default_loader.py:267] Loading weights took 21.42 seconds | |
| (EngineCore_DP0 pid=82741) INFO 09-26 08:49:17 [deep_gemm.py:51] DeepGEMM E8M0 enabled on Blackwell GPU. | |
| (EngineCore_DP14 pid=80459) INFO 09-26 08:49:18 [default_loader.py:267] Loading weights took 20.94 seconds | |
| (EngineCore_DP14 pid=80459) INFO 09-26 08:49:18 [deep_gemm.py:51] DeepGEMM E8M0 enabled on Blackwell GPU. | |
| (EngineCore_DP15 pid=80460) INFO 09-26 08:49:18 [gpu_model_runner.py:2647] Model loading took 63.0667 GiB and 24.902930 seconds | |
| (EngineCore_DP8 pid=80453) INFO 09-26 08:49:18 [default_loader.py:267] Loading weights took 19.31 seconds | |
| (EngineCore_DP8 pid=80453) INFO 09-26 08:49:18 [deep_gemm.py:51] DeepGEMM E8M0 enabled on Blackwell GPU. | |
| (EngineCore_DP3 pid=82744) INFO 09-26 08:49:18 [gpu_model_runner.py:2647] Model loading took 63.0667 GiB and 25.091802 seconds | |
| (EngineCore_DP11 pid=80456) INFO 09-26 08:49:18 [default_loader.py:267] Loading weights took 19.36 seconds | |
| (EngineCore_DP11 pid=80456) INFO 09-26 08:49:18 [deep_gemm.py:51] DeepGEMM E8M0 enabled on Blackwell GPU. | |
| (EngineCore_DP2 pid=82743) INFO 09-26 08:49:18 [default_loader.py:267] Loading weights took 20.34 seconds | |
| (EngineCore_DP2 pid=82743) INFO 09-26 08:49:18 [deep_gemm.py:51] DeepGEMM E8M0 enabled on Blackwell GPU. | |
| (EngineCore_DP12 pid=80457) INFO 09-26 08:49:19 [gpu_model_runner.py:2647] Model loading took 63.0667 GiB and 25.479601 seconds | |
| (EngineCore_DP1 pid=82742) INFO 09-26 08:49:19 [default_loader.py:267] Loading weights took 21.16 seconds | |
| (EngineCore_DP1 pid=82742) INFO 09-26 08:49:19 [deep_gemm.py:51] DeepGEMM E8M0 enabled on Blackwell GPU. | |
| (EngineCore_DP7 pid=82748) INFO 09-26 08:49:19 [gpu_model_runner.py:2647] Model loading took 63.0667 GiB and 25.535524 seconds | |
| (EngineCore_DP10 pid=80455) INFO 09-26 08:49:20 [default_loader.py:267] Loading weights took 22.63 seconds | |
| (EngineCore_DP10 pid=80455) INFO 09-26 08:49:20 [deep_gemm.py:51] DeepGEMM E8M0 enabled on Blackwell GPU. | |
| (EngineCore_DP13 pid=80458) INFO 09-26 08:49:20 [gpu_model_runner.py:2647] Model loading took 63.0667 GiB and 27.125275 seconds | |
| (EngineCore_DP5 pid=82746) INFO 09-26 08:49:21 [default_loader.py:267] Loading weights took 23.58 seconds | |
| (EngineCore_DP5 pid=82746) INFO 09-26 08:49:21 [deep_gemm.py:51] DeepGEMM E8M0 enabled on Blackwell GPU. | |
| (EngineCore_DP6 pid=82747) INFO 09-26 08:49:21 [gpu_model_runner.py:2647] Model loading took 63.0667 GiB and 27.441479 seconds | |
| (EngineCore_DP9 pid=80454) INFO 09-26 08:49:21 [gpu_model_runner.py:2647] Model loading took 63.0667 GiB and 27.809751 seconds | |
| (EngineCore_DP0 pid=82741) INFO 09-26 08:49:21 [gpu_model_runner.py:2647] Model loading took 63.0667 GiB and 27.949052 seconds | |
| (EngineCore_DP4 pid=82745) INFO 09-26 08:49:21 [default_loader.py:267] Loading weights took 22.33 seconds | |
| (EngineCore_DP4 pid=82745) INFO 09-26 08:49:21 [deep_gemm.py:51] DeepGEMM E8M0 enabled on Blackwell GPU. | |
| (EngineCore_DP14 pid=80459) INFO 09-26 08:49:21 [gpu_model_runner.py:2647] Model loading took 63.0667 GiB and 28.261098 seconds | |
| (EngineCore_DP8 pid=80453) INFO 09-26 08:49:21 [gpu_model_runner.py:2647] Model loading took 63.0667 GiB and 28.256103 seconds | |
| (EngineCore_DP11 pid=80456) INFO 09-26 08:49:22 [gpu_model_runner.py:2647] Model loading took 63.0667 GiB and 28.985281 seconds | |
| (EngineCore_DP2 pid=82743) INFO 09-26 08:49:22 [gpu_model_runner.py:2647] Model loading took 63.0667 GiB and 28.906650 seconds | |
| (EngineCore_DP1 pid=82742) INFO 09-26 08:49:22 [gpu_model_runner.py:2647] Model loading took 63.0667 GiB and 29.233658 seconds | |
| (EngineCore_DP10 pid=80455) INFO 09-26 08:49:23 [gpu_model_runner.py:2647] Model loading took 63.0667 GiB and 30.283362 seconds | |
| (EngineCore_DP5 pid=82746) INFO 09-26 08:49:24 [gpu_model_runner.py:2647] Model loading took 63.0667 GiB and 31.243887 seconds | |
| (EngineCore_DP4 pid=82745) INFO 09-26 08:49:25 [gpu_model_runner.py:2647] Model loading took 63.0667 GiB and 31.532850 seconds | |
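Every engine rank above reports the same 63.0667 GiB of weight memory. A rough aggregate follows directly from that number (a back-of-the-envelope sketch, assuming that with TP=1 only the 18/288 expert slice differs per rank while attention, dense, and embedding weights are held in full on every DP rank):

    # Rough weight-memory total implied by the per-rank figures logged above.
    ranks = 16
    per_rank_gib = 63.0667                      # "Model loading took 63.0667 GiB" per rank
    print(f"aggregate across ranks: {ranks * per_rank_gib:.1f} GiB")   # ~1009.1 GiB
    # Assumption, not a figure from this log: the aggregate exceeds the on-disk checkpoint
    # size because non-expert weights are replicated on each DP rank, while only the expert
    # parameters are sharded by EP.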
| (EngineCore_DP7 pid=82748) INFO 09-26 08:49:37 [gpu_model_runner.py:2658] EPLB is enabled for model deepseek-ai/DeepSeek-V3.1. | |
| (EngineCore_DP4 pid=82745) INFO 09-26 08:49:37 [gpu_model_runner.py:2658] EPLB is enabled for model deepseek-ai/DeepSeek-V3.1. | |
| (EngineCore_DP10 pid=80455) INFO 09-26 08:49:37 [gpu_model_runner.py:2658] EPLB is enabled for model deepseek-ai/DeepSeek-V3.1. | |
| (EngineCore_DP9 pid=80454) INFO 09-26 08:49:37 [gpu_model_runner.py:2658] EPLB is enabled for model deepseek-ai/DeepSeek-V3.1. | |
| (EngineCore_DP0 pid=82741) INFO 09-26 08:49:37 [gpu_model_runner.py:2658] EPLB is enabled for model deepseek-ai/DeepSeek-V3.1. | |
| (EngineCore_DP2 pid=82743) INFO 09-26 08:49:37 [gpu_model_runner.py:2658] EPLB is enabled for model deepseek-ai/DeepSeek-V3.1. | |
| (EngineCore_DP3 pid=82744) INFO 09-26 08:49:37 [gpu_model_runner.py:2658] EPLB is enabled for model deepseek-ai/DeepSeek-V3.1. | |
| (EngineCore_DP5 pid=82746) INFO 09-26 08:49:37 [gpu_model_runner.py:2658] EPLB is enabled for model deepseek-ai/DeepSeek-V3.1. | |
| (EngineCore_DP13 pid=80458) INFO 09-26 08:49:37 [gpu_model_runner.py:2658] EPLB is enabled for model deepseek-ai/DeepSeek-V3.1. | |
| (EngineCore_DP11 pid=80456) INFO 09-26 08:49:37 [gpu_model_runner.py:2658] EPLB is enabled for model deepseek-ai/DeepSeek-V3.1. | |
| (EngineCore_DP6 pid=82747) INFO 09-26 08:49:37 [gpu_model_runner.py:2658] EPLB is enabled for model deepseek-ai/DeepSeek-V3.1. | |
| (EngineCore_DP8 pid=80453) INFO 09-26 08:49:37 [gpu_model_runner.py:2658] EPLB is enabled for model deepseek-ai/DeepSeek-V3.1. | |
| (EngineCore_DP14 pid=80459) INFO 09-26 08:49:37 [gpu_model_runner.py:2658] EPLB is enabled for model deepseek-ai/DeepSeek-V3.1. | |
| (EngineCore_DP1 pid=82742) INFO 09-26 08:49:37 [gpu_model_runner.py:2658] EPLB is enabled for model deepseek-ai/DeepSeek-V3.1. | |
| (EngineCore_DP12 pid=80457) INFO 09-26 08:49:37 [gpu_model_runner.py:2658] EPLB is enabled for model deepseek-ai/DeepSeek-V3.1. | |
| (EngineCore_DP15 pid=80460) INFO 09-26 08:49:37 [gpu_model_runner.py:2658] EPLB is enabled for model deepseek-ai/DeepSeek-V3.1. | |
| (EngineCore_DP2 pid=82743) WARNING 09-26 08:49:40 [cudagraph_dispatcher.py:106] cudagraph dispatching keys are not initialized. No cudagraph will be used. | |
| (EngineCore_DP3 pid=82744) WARNING 09-26 08:49:40 [cudagraph_dispatcher.py:106] cudagraph dispatching keys are not initialized. No cudagraph will be used. | |
| (EngineCore_DP11 pid=80456) WARNING 09-26 08:49:40 [cudagraph_dispatcher.py:106] cudagraph dispatching keys are not initialized. No cudagraph will be used. | |
| (EngineCore_DP10 pid=80455) WARNING 09-26 08:49:40 [cudagraph_dispatcher.py:106] cudagraph dispatching keys are not initialized. No cudagraph will be used. | |
| (EngineCore_DP7 pid=82748) WARNING 09-26 08:49:40 [cudagraph_dispatcher.py:106] cudagraph dispatching keys are not initialized. No cudagraph will be used. | |
| (EngineCore_DP4 pid=82745) WARNING 09-26 08:49:40 [cudagraph_dispatcher.py:106] cudagraph dispatching keys are not initialized. No cudagraph will be used. | |
| (EngineCore_DP0 pid=82741) WARNING 09-26 08:49:40 [cudagraph_dispatcher.py:106] cudagraph dispatching keys are not initialized. No cudagraph will be used. | |
| (EngineCore_DP6 pid=82747) WARNING 09-26 08:49:40 [cudagraph_dispatcher.py:106] cudagraph dispatching keys are not initialized. No cudagraph will be used. | |
| (EngineCore_DP9 pid=80454) WARNING 09-26 08:49:40 [cudagraph_dispatcher.py:106] cudagraph dispatching keys are not initialized. No cudagraph will be used. | |
| (EngineCore_DP14 pid=80459) WARNING 09-26 08:49:40 [cudagraph_dispatcher.py:106] cudagraph dispatching keys are not initialized. No cudagraph will be used. | |
| (EngineCore_DP15 pid=80460) WARNING 09-26 08:49:40 [cudagraph_dispatcher.py:106] cudagraph dispatching keys are not initialized. No cudagraph will be used. | |
| (EngineCore_DP8 pid=80453) WARNING 09-26 08:49:40 [cudagraph_dispatcher.py:106] cudagraph dispatching keys are not initialized. No cudagraph will be used. | |
| (EngineCore_DP1 pid=82742) WARNING 09-26 08:49:40 [cudagraph_dispatcher.py:106] cudagraph dispatching keys are not initialized. No cudagraph will be used. | |
| (EngineCore_DP5 pid=82746) WARNING 09-26 08:49:40 [cudagraph_dispatcher.py:106] cudagraph dispatching keys are not initialized. No cudagraph will be used. | |
| (EngineCore_DP13 pid=80458) WARNING 09-26 08:49:40 [cudagraph_dispatcher.py:106] cudagraph dispatching keys are not initialized. No cudagraph will be used. | |
| (EngineCore_DP12 pid=80457) WARNING 09-26 08:49:40 [cudagraph_dispatcher.py:106] cudagraph dispatching keys are not initialized. No cudagraph will be used. | |
| (EngineCore_DP14 pid=80459) INFO 09-26 08:49:48 [backends.py:548] Using cache directory: /root/.cache/vllm/torch_compile_cache/2256bad88c/rank_0_14/backbone for vLLM's torch.compile | |
| (EngineCore_DP14 pid=80459) INFO 09-26 08:49:48 [backends.py:559] Dynamo bytecode transform time: 7.78 s | |
| (EngineCore_DP10 pid=80455) INFO 09-26 08:49:48 [backends.py:548] Using cache directory: /root/.cache/vllm/torch_compile_cache/2256bad88c/rank_0_10/backbone for vLLM's torch.compile | |
| (EngineCore_DP10 pid=80455) INFO 09-26 08:49:48 [backends.py:559] Dynamo bytecode transform time: 8.11 s | |
| (EngineCore_DP0 pid=82741) INFO 09-26 08:49:48 [backends.py:548] Using cache directory: /root/.cache/vllm/torch_compile_cache/2256bad88c/rank_0_0/backbone for vLLM's torch.compile | |
| (EngineCore_DP0 pid=82741) INFO 09-26 08:49:48 [backends.py:559] Dynamo bytecode transform time: 8.29 s | |
| (EngineCore_DP4 pid=82745) INFO 09-26 08:49:48 [backends.py:548] Using cache directory: /root/.cache/vllm/torch_compile_cache/2256bad88c/rank_0_4/backbone for vLLM's torch.compile | |
| (EngineCore_DP9 pid=80454) INFO 09-26 08:49:48 [backends.py:548] Using cache directory: /root/.cache/vllm/torch_compile_cache/2256bad88c/rank_0_9/backbone for vLLM's torch.compile | |
| (EngineCore_DP9 pid=80454) INFO 09-26 08:49:48 [backends.py:559] Dynamo bytecode transform time: 8.41 s | |
| (EngineCore_DP8 pid=80453) INFO 09-26 08:49:48 [backends.py:548] Using cache directory: /root/.cache/vllm/torch_compile_cache/2256bad88c/rank_0_8/backbone for vLLM's torch.compile | |
| (EngineCore_DP4 pid=82745) INFO 09-26 08:49:48 [backends.py:559] Dynamo bytecode transform time: 8.42 s | |
| (EngineCore_DP8 pid=80453) INFO 09-26 08:49:48 [backends.py:559] Dynamo bytecode transform time: 8.43 s | |
| (EngineCore_DP1 pid=82742) INFO 09-26 08:49:48 [backends.py:548] Using cache directory: /root/.cache/vllm/torch_compile_cache/2256bad88c/rank_0_1/backbone for vLLM's torch.compile | |
| (EngineCore_DP1 pid=82742) INFO 09-26 08:49:48 [backends.py:559] Dynamo bytecode transform time: 8.46 s | |
| (EngineCore_DP5 pid=82746) INFO 09-26 08:49:48 [backends.py:548] Using cache directory: /root/.cache/vllm/torch_compile_cache/2256bad88c/rank_0_5/backbone for vLLM's torch.compile | |
| (EngineCore_DP5 pid=82746) INFO 09-26 08:49:48 [backends.py:559] Dynamo bytecode transform time: 8.47 s | |
| (EngineCore_DP6 pid=82747) INFO 09-26 08:49:48 [backends.py:548] Using cache directory: /root/.cache/vllm/torch_compile_cache/2256bad88c/rank_0_6/backbone for vLLM's torch.compile | |
| (EngineCore_DP6 pid=82747) INFO 09-26 08:49:48 [backends.py:559] Dynamo bytecode transform time: 8.49 s | |
| (EngineCore_DP3 pid=82744) INFO 09-26 08:49:48 [backends.py:548] Using cache directory: /root/.cache/vllm/torch_compile_cache/2256bad88c/rank_0_3/backbone for vLLM's torch.compile | |
| (EngineCore_DP3 pid=82744) INFO 09-26 08:49:48 [backends.py:559] Dynamo bytecode transform time: 8.55 s | |
| (EngineCore_DP13 pid=80458) INFO 09-26 08:49:48 [backends.py:548] Using cache directory: /root/.cache/vllm/torch_compile_cache/2256bad88c/rank_0_13/backbone for vLLM's torch.compile | |
| (EngineCore_DP13 pid=80458) INFO 09-26 08:49:48 [backends.py:559] Dynamo bytecode transform time: 8.57 s | |
| (EngineCore_DP7 pid=82748) INFO 09-26 08:49:48 [backends.py:548] Using cache directory: /root/.cache/vllm/torch_compile_cache/2256bad88c/rank_0_7/backbone for vLLM's torch.compile | |
| (EngineCore_DP7 pid=82748) INFO 09-26 08:49:48 [backends.py:559] Dynamo bytecode transform time: 8.59 s | |
| (EngineCore_DP12 pid=80457) INFO 09-26 08:49:48 [backends.py:548] Using cache directory: /root/.cache/vllm/torch_compile_cache/2256bad88c/rank_0_12/backbone for vLLM's torch.compile | |
| (EngineCore_DP12 pid=80457) INFO 09-26 08:49:48 [backends.py:559] Dynamo bytecode transform time: 8.58 s | |
| (EngineCore_DP2 pid=82743) INFO 09-26 08:49:48 [backends.py:548] Using cache directory: /root/.cache/vllm/torch_compile_cache/2256bad88c/rank_0_2/backbone for vLLM's torch.compile | |
| (EngineCore_DP2 pid=82743) INFO 09-26 08:49:48 [backends.py:559] Dynamo bytecode transform time: 8.62 s | |
| (EngineCore_DP11 pid=80456) INFO 09-26 08:49:48 [backends.py:548] Using cache directory: /root/.cache/vllm/torch_compile_cache/2256bad88c/rank_0_11/backbone for vLLM's torch.compile | |
| (EngineCore_DP11 pid=80456) INFO 09-26 08:49:48 [backends.py:559] Dynamo bytecode transform time: 8.65 s | |
| (EngineCore_DP15 pid=80460) INFO 09-26 08:49:49 [backends.py:548] Using cache directory: /root/.cache/vllm/torch_compile_cache/2256bad88c/rank_0_15/backbone for vLLM's torch.compile | |
| (EngineCore_DP15 pid=80460) INFO 09-26 08:49:49 [backends.py:559] Dynamo bytecode transform time: 8.69 s | |
| (EngineCore_DP14 pid=80459) INFO 09-26 08:49:50 [backends.py:164] Directly load the compiled graph(s) for dynamic shape from the cache, took 2.297 s | |
| (EngineCore_DP10 pid=80455) INFO 09-26 08:49:51 [backends.py:164] Directly load the compiled graph(s) for dynamic shape from the cache, took 2.316 s | |
| (EngineCore_DP9 pid=80454) INFO 09-26 08:49:51 [backends.py:164] Directly load the compiled graph(s) for dynamic shape from the cache, took 2.326 s | |
| (EngineCore_DP0 pid=82741) INFO 09-26 08:49:51 [backends.py:164] Directly load the compiled graph(s) for dynamic shape from the cache, took 2.435 s | |
| (EngineCore_DP4 pid=82745) INFO 09-26 08:49:51 [backends.py:164] Directly load the compiled graph(s) for dynamic shape from the cache, took 2.441 s | |
| (EngineCore_DP1 pid=82742) INFO 09-26 08:49:51 [backends.py:164] Directly load the compiled graph(s) for dynamic shape from the cache, took 2.400 s | |
| (EngineCore_DP5 pid=82746) INFO 09-26 08:49:51 [backends.py:164] Directly load the compiled graph(s) for dynamic shape from the cache, took 2.421 s | |
| (EngineCore_DP8 pid=80453) INFO 09-26 08:49:51 [backends.py:164] Directly load the compiled graph(s) for dynamic shape from the cache, took 2.469 s | |
| (EngineCore_DP6 pid=82747) INFO 09-26 08:49:51 [backends.py:164] Directly load the compiled graph(s) for dynamic shape from the cache, took 2.491 s | |
| (EngineCore_DP3 pid=82744) INFO 09-26 08:49:51 [backends.py:164] Directly load the compiled graph(s) for dynamic shape from the cache, took 2.463 s | |
| (EngineCore_DP7 pid=82748) INFO 09-26 08:49:51 [backends.py:164] Directly load the compiled graph(s) for dynamic shape from the cache, took 2.451 s | |
| (EngineCore_DP12 pid=80457) INFO 09-26 08:49:51 [backends.py:164] Directly load the compiled graph(s) for dynamic shape from the cache, took 2.465 s | |
| (EngineCore_DP13 pid=80458) INFO 09-26 08:49:51 [backends.py:164] Directly load the compiled graph(s) for dynamic shape from the cache, took 2.481 s | |
| (EngineCore_DP2 pid=82743) INFO 09-26 08:49:51 [backends.py:164] Directly load the compiled graph(s) for dynamic shape from the cache, took 2.479 s | |
| (EngineCore_DP11 pid=80456) INFO 09-26 08:49:51 [backends.py:164] Directly load the compiled graph(s) for dynamic shape from the cache, took 2.463 s | |
| (EngineCore_DP15 pid=80460) INFO 09-26 08:49:51 [backends.py:164] Directly load the compiled graph(s) for dynamic shape from the cache, took 2.475 s | |
| (EngineCore_DP4 pid=82745) INFO 09-26 08:49:59 [monitor.py:34] torch.compile takes 8.42 s in total | |
| (EngineCore_DP6 pid=82747) INFO 09-26 08:49:59 [monitor.py:34] torch.compile takes 8.49 s in total | |
| (EngineCore_DP7 pid=82748) INFO 09-26 08:49:59 [monitor.py:34] torch.compile takes 8.59 s in total | |
| (EngineCore_DP5 pid=82746) INFO 09-26 08:49:59 [monitor.py:34] torch.compile takes 8.47 s in total | |
| (EngineCore_DP3 pid=82744) INFO 09-26 08:49:59 [monitor.py:34] torch.compile takes 8.55 s in total | |
| (EngineCore_DP0 pid=82741) INFO 09-26 08:49:59 [monitor.py:34] torch.compile takes 8.29 s in total | |
| (EngineCore_DP1 pid=82742) INFO 09-26 08:49:59 [monitor.py:34] torch.compile takes 8.46 s in total | |
| (EngineCore_DP2 pid=82743) INFO 09-26 08:49:59 [monitor.py:34] torch.compile takes 8.62 s in total | |
| (EngineCore_DP14 pid=80459) INFO 09-26 08:49:59 [monitor.py:34] torch.compile takes 7.78 s in total | |
| (EngineCore_DP10 pid=80455) INFO 09-26 08:49:59 [monitor.py:34] torch.compile takes 8.11 s in total | |
| (EngineCore_DP12 pid=80457) INFO 09-26 08:49:59 [monitor.py:34] torch.compile takes 8.58 s in total | |
| (EngineCore_DP9 pid=80454) INFO 09-26 08:49:59 [monitor.py:34] torch.compile takes 8.41 s in total | |
| (EngineCore_DP11 pid=80456) INFO 09-26 08:49:59 [monitor.py:34] torch.compile takes 8.65 s in total | |
| (EngineCore_DP13 pid=80458) INFO 09-26 08:49:59 [monitor.py:34] torch.compile takes 8.57 s in total | |
| (EngineCore_DP8 pid=80453) INFO 09-26 08:49:59 [monitor.py:34] torch.compile takes 8.43 s in total | |
| (EngineCore_DP15 pid=80460) INFO 09-26 08:49:59 [monitor.py:34] torch.compile takes 8.69 s in total | |
| (EngineCore_DP0 pid=82741) INFO 09-26 08:50:00 [eplb_state.py:433] Rearranging experts (profile)... | |
| [rank8]:[W926 08:50:03.656220602 ProcessGroupNCCL.cpp:5023] [PG ID 0 PG GUID 0 Rank 8] using GPU 0 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can specify device_id in init_process_group() to force use of a particular device. | |
| Warning: please use at least NVCC 12.9 for the best DeepGEMM performance | |
| [rank9]:[W926 08:50:03.715266028 ProcessGroupNCCL.cpp:5023] [PG ID 0 PG GUID 0 Rank 9] using GPU 0 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can specify device_id in init_process_group() to force use of a particular device. | |
| Warning: please use at least NVCC 12.9 for the best DeepGEMM performance | |
| [rank10]:[W926 08:50:03.738368168 ProcessGroupNCCL.cpp:5023] [PG ID 0 PG GUID 0 Rank 10] using GPU 0 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can specify device_id in init_process_group() to force use of a particular device. | |
| Warning: please use at least NVCC 12.9 for the best DeepGEMM performance | |
| [rank11]:[W926 08:50:03.744325836 ProcessGroupNCCL.cpp:5023] [PG ID 0 PG GUID 0 Rank 11] using GPU 0 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can specify device_id in init_process_group() to force use of a particular device. | |
| Warning: please use at least NVCC 12.9 for the best DeepGEMM performance | |
| [rank5]:[W926 08:50:03.073912577 ProcessGroupNCCL.cpp:5023] [PG ID 0 PG GUID 0 Rank 5] using GPU 0 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can specify device_id in init_process_group() to force use of a particular device. | |
| Warning: please use at least NVCC 12.9 for the best DeepGEMM performance | |
| Warning: please use at least NVCC 12.9 for the best DeepGEMM performance | |
| [rank12]:[W926 08:50:03.751552613 ProcessGroupNCCL.cpp:5023] [PG ID 0 PG GUID 0 Rank 12] using GPU 0 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can specify device_id in init_process_group() to force use of a particular device. | |
| Warning: please use at least NVCC 12.9 for the best DeepGEMM performance | |
| [rank1]:[W926 08:50:03.098485900 ProcessGroupNCCL.cpp:5023] [PG ID 0 PG GUID 0 Rank 1] using GPU 0 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can specify device_id in init_process_group() to force use of a particular device. | |
| [rank14]:[W926 08:50:03.778711217 ProcessGroupNCCL.cpp:5023] [PG ID 0 PG GUID 0 Rank 14] using GPU 0 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can specify device_id in init_process_group() to force use of a particular device. | |
| Warning: please use at least NVCC 12.9 for the best DeepGEMM performance | |
| [rank13]:[W926 08:50:03.782639455 ProcessGroupNCCL.cpp:5023] [PG ID 0 PG GUID 0 Rank 13] using GPU 0 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can specify device_id in init_process_group() to force use of a particular device. | |
| Warning: please use at least NVCC 12.9 for the best DeepGEMM performance | |
| [rank7]:[W926 08:50:03.114977437 ProcessGroupNCCL.cpp:5023] [PG ID 0 PG GUID 0 Rank 7] using GPU 0 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can specify device_id in init_process_group() to force use of a particular device. | |
| Warning: please use at least NVCC 12.9 for the best DeepGEMM performance | |
| [rank15]:[W926 08:50:03.804044008 ProcessGroupNCCL.cpp:5023] [PG ID 0 PG GUID 0 Rank 15] using GPU 0 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can specify device_id in init_process_group() to force use of a particular device. | |
| Warning: please use at least NVCC 12.9 for the best DeepGEMM performance | |
| [rank4]:[W926 08:50:03.133765818 ProcessGroupNCCL.cpp:5023] [PG ID 0 PG GUID 0 Rank 4] using GPU 0 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can specify device_id in init_process_group() to force use of a particular device. | |
| Warning: please use at least NVCC 12.9 for the best DeepGEMM performance | |
| [rank3]:[W926 08:50:03.147932894 ProcessGroupNCCL.cpp:5023] [PG ID 0 PG GUID 0 Rank 3] using GPU 0 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can specify device_id in init_process_group() to force use of a particular device. | |
| Warning: please use at least NVCC 12.9 for the best DeepGEMM performance | |
| [rank0]:[W926 08:50:03.154566795 ProcessGroupNCCL.cpp:5023] [PG ID 0 PG GUID 0 Rank 0] using GPU 0 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can specify device_id in init_process_group() to force use of a particular device. | |
| Warning: please use at least NVCC 12.9 for the best DeepGEMM performance | |
| Warning: please use at least NVCC 12.9 for the best DeepGEMM performance | |
| [rank6]:[W926 08:50:03.157637202 ProcessGroupNCCL.cpp:5023] [PG ID 0 PG GUID 0 Rank 6] using GPU 0 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can specify device_id in init_process_group() to force use of a particular device. | |
| [rank2]:[W926 08:50:03.199464975 ProcessGroupNCCL.cpp:5023] [PG ID 0 PG GUID 0 Rank 2] using GPU 0 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can specify device_id in init_process_group() to force use of a particular device. | |
| Warning: please use at least NVCC 12.9 for the best DeepGEMM performance | |
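| The NVCC warning above repeats once per rank and suggests the toolkit in the image is older than 12.9. A hedged way to confirm what toolchain the DeepGEMM JIT would see (assumes nvcc is on PATH inside the container): | |
| import subprocess | |
| import torch | |
| # toolkit available for JIT kernel compilation (what the DeepGEMM warning refers to) | |
| print(subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout) | |
| # CUDA version PyTorch itself was built against, for comparison | |
| print(torch.version.cuda) | |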
| (EngineCore_DP0 pid=82741) INFO 09-26 08:50:06 [eplb_state.py:549] Rearranged experts (profile) in 6.45 seconds. | |
| (EngineCore_DP4 pid=82745) INFO 09-26 08:50:07 [gpu_worker.py:298] Available KV cache memory: 75.16 GiB | |
| (EngineCore_DP15 pid=80460) INFO 09-26 08:50:07 [gpu_worker.py:298] Available KV cache memory: 75.16 GiB | |
| (EngineCore_DP5 pid=82746) INFO 09-26 08:50:07 [gpu_worker.py:298] Available KV cache memory: 75.16 GiB | |
| (EngineCore_DP10 pid=80455) INFO 09-26 08:50:07 [gpu_worker.py:298] Available KV cache memory: 75.16 GiB | |
| (EngineCore_DP2 pid=82743) INFO 09-26 08:50:07 [gpu_worker.py:298] Available KV cache memory: 75.16 GiB | |
| (EngineCore_DP6 pid=82747) INFO 09-26 08:50:07 [gpu_worker.py:298] Available KV cache memory: 75.16 GiB | |
| (EngineCore_DP9 pid=80454) INFO 09-26 08:50:07 [gpu_worker.py:298] Available KV cache memory: 75.17 GiB | |
| (EngineCore_DP7 pid=82748) INFO 09-26 08:50:07 [gpu_worker.py:298] Available KV cache memory: 75.16 GiB | |
| (EngineCore_DP4 pid=82745) INFO 09-26 08:50:07 [kv_cache_utils.py:1087] GPU KV cache size: 1,148,416 tokens | |
| (EngineCore_DP4 pid=82745) INFO 09-26 08:50:07 [kv_cache_utils.py:1091] Maximum concurrency for 163,840 tokens per request: 7.01x | |
| (EngineCore_DP5 pid=82746) INFO 09-26 08:50:07 [kv_cache_utils.py:1087] GPU KV cache size: 1,148,416 tokens | |
| (EngineCore_DP5 pid=82746) INFO 09-26 08:50:07 [kv_cache_utils.py:1091] Maximum concurrency for 163,840 tokens per request: 7.01x | |
| (EngineCore_DP0 pid=82741) INFO 09-26 08:50:07 [gpu_worker.py:298] Available KV cache memory: 75.16 GiB | |
| (EngineCore_DP1 pid=82742) INFO 09-26 08:50:07 [gpu_worker.py:298] Available KV cache memory: 75.16 GiB | |
| (EngineCore_DP15 pid=80460) INFO 09-26 08:50:07 [kv_cache_utils.py:1087] GPU KV cache size: 1,148,416 tokens | |
| (EngineCore_DP15 pid=80460) INFO 09-26 08:50:07 [kv_cache_utils.py:1091] Maximum concurrency for 163,840 tokens per request: 7.01x | |
| (EngineCore_DP8 pid=80453) INFO 09-26 08:50:07 [gpu_worker.py:298] Available KV cache memory: 75.16 GiB | |
| (EngineCore_DP11 pid=80456) INFO 09-26 08:50:07 [gpu_worker.py:298] Available KV cache memory: 75.16 GiB | |
| (EngineCore_DP13 pid=80458) INFO 09-26 08:50:07 [gpu_worker.py:298] Available KV cache memory: 75.16 GiB | |
| (EngineCore_DP12 pid=80457) INFO 09-26 08:50:07 [gpu_worker.py:298] Available KV cache memory: 75.17 GiB | |
| (EngineCore_DP3 pid=82744) INFO 09-26 08:50:07 [gpu_worker.py:298] Available KV cache memory: 75.16 GiB | |
| (EngineCore_DP14 pid=80459) INFO 09-26 08:50:07 [gpu_worker.py:298] Available KV cache memory: 75.16 GiB | |
| (EngineCore_DP10 pid=80455) INFO 09-26 08:50:07 [kv_cache_utils.py:1087] GPU KV cache size: 1,148,416 tokens | |
| (EngineCore_DP10 pid=80455) INFO 09-26 08:50:07 [kv_cache_utils.py:1091] Maximum concurrency for 163,840 tokens per request: 7.01x | |
| (EngineCore_DP2 pid=82743) INFO 09-26 08:50:07 [kv_cache_utils.py:1087] GPU KV cache size: 1,148,416 tokens | |
| (EngineCore_DP2 pid=82743) INFO 09-26 08:50:07 [kv_cache_utils.py:1091] Maximum concurrency for 163,840 tokens per request: 7.01x | |
| (EngineCore_DP6 pid=82747) INFO 09-26 08:50:07 [kv_cache_utils.py:1087] GPU KV cache size: 1,148,416 tokens | |
| (EngineCore_DP6 pid=82747) INFO 09-26 08:50:07 [kv_cache_utils.py:1091] Maximum concurrency for 163,840 tokens per request: 7.01x | |
| (EngineCore_DP9 pid=80454) INFO 09-26 08:50:07 [kv_cache_utils.py:1087] GPU KV cache size: 1,148,416 tokens | |
| (EngineCore_DP9 pid=80454) INFO 09-26 08:50:07 [kv_cache_utils.py:1091] Maximum concurrency for 163,840 tokens per request: 7.01x | |
| (EngineCore_DP7 pid=82748) INFO 09-26 08:50:07 [kv_cache_utils.py:1087] GPU KV cache size: 1,148,416 tokens | |
| (EngineCore_DP7 pid=82748) INFO 09-26 08:50:07 [kv_cache_utils.py:1091] Maximum concurrency for 163,840 tokens per request: 7.01x | |
| (EngineCore_DP12 pid=80457) INFO 09-26 08:50:07 [kv_cache_utils.py:1087] GPU KV cache size: 1,148,416 tokens | |
| (EngineCore_DP12 pid=80457) INFO 09-26 08:50:07 [kv_cache_utils.py:1091] Maximum concurrency for 163,840 tokens per request: 7.01x | |
| (EngineCore_DP11 pid=80456) INFO 09-26 08:50:07 [kv_cache_utils.py:1087] GPU KV cache size: 1,148,416 tokens | |
| (EngineCore_DP11 pid=80456) INFO 09-26 08:50:07 [kv_cache_utils.py:1091] Maximum concurrency for 163,840 tokens per request: 7.01x | |
| (EngineCore_DP8 pid=80453) INFO 09-26 08:50:07 [kv_cache_utils.py:1087] GPU KV cache size: 1,148,416 tokens | |
| (EngineCore_DP1 pid=82742) INFO 09-26 08:50:07 [kv_cache_utils.py:1087] GPU KV cache size: 1,148,416 tokens | |
| (EngineCore_DP1 pid=82742) INFO 09-26 08:50:07 [kv_cache_utils.py:1091] Maximum concurrency for 163,840 tokens per request: 7.01x | |
| (EngineCore_DP8 pid=80453) INFO 09-26 08:50:07 [kv_cache_utils.py:1091] Maximum concurrency for 163,840 tokens per request: 7.01x | |
| (EngineCore_DP13 pid=80458) INFO 09-26 08:50:07 [kv_cache_utils.py:1087] GPU KV cache size: 1,148,416 tokens | |
| (EngineCore_DP13 pid=80458) INFO 09-26 08:50:07 [kv_cache_utils.py:1091] Maximum concurrency for 163,840 tokens per request: 7.01x | |
| (EngineCore_DP0 pid=82741) INFO 09-26 08:50:07 [kv_cache_utils.py:1087] GPU KV cache size: 1,148,416 tokens | |
| (EngineCore_DP0 pid=82741) INFO 09-26 08:50:07 [kv_cache_utils.py:1091] Maximum concurrency for 163,840 tokens per request: 7.01x | |
| (EngineCore_DP3 pid=82744) INFO 09-26 08:50:07 [kv_cache_utils.py:1087] GPU KV cache size: 1,148,416 tokens | |
| (EngineCore_DP3 pid=82744) INFO 09-26 08:50:07 [kv_cache_utils.py:1091] Maximum concurrency for 163,840 tokens per request: 7.01x | |
| (EngineCore_DP14 pid=80459) INFO 09-26 08:50:07 [kv_cache_utils.py:1087] GPU KV cache size: 1,148,416 tokens | |
| (EngineCore_DP14 pid=80459) INFO 09-26 08:50:07 [kv_cache_utils.py:1091] Maximum concurrency for 163,840 tokens per request: 7.01x | |
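| The 7.01x figure in the kv_cache_utils lines above is simply the per-rank KV cache capacity divided by the configured max model length; a minimal sketch of the arithmetic, using the values from this log: | |
| kv_cache_tokens = 1_148_416  # "GPU KV cache size" reported per rank | |
| max_model_len = 163_840      # "tokens per request" in the concurrency line | |
| print(round(kv_cache_tokens / max_model_len, 2))  # -> 7.01 | |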
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([24576, 1536])): 100%|██████████| 8192/8192 [00:01<00:00, 6130.56it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([24576, 1536])): 100%|██████████| 8192/8192 [00:01<00:00, 6099.72it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([24576, 1536])): 100%|██████████| 8192/8192 [00:01<00:00, 6005.28it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([24576, 1536])): 100%|██████████| 8192/8192 [00:01<00:00, 6136.38it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([24576, 1536])): 100%|██████████| 8192/8192 [00:01<00:00, 6104.86it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([24576, 1536])): 100%|██████████| 8192/8192 [00:01<00:00, 6120.15it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([24576, 1536])): 100%|██████████| 8192/8192 [00:01<00:00, 6099.29it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([24576, 1536])): 100%|██████████| 8192/8192 [00:01<00:00, 6217.70it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([24576, 1536])): 100%|██████████| 8192/8192 [00:01<00:00, 6078.16it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([24576, 1536])): 100%|██████████| 8192/8192 [00:01<00:00, 6253.83it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([24576, 1536])): 100%|██████████| 8192/8192 [00:01<00:00, 6167.01it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([24576, 1536])): 100%|██████████| 8192/8192 [00:01<00:00, 6093.79it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([24576, 1536])): 100%|██████████| 8192/8192 [00:01<00:00, 6100.55it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([24576, 1536])): 100%|██████████| 8192/8192 [00:01<00:00, 6138.88it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([24576, 1536])): 100%|██████████| 8192/8192 [00:01<00:00, 6103.18it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([24576, 1536])): 100%|██████████| 8192/8192 [00:01<00:00, 6114.68it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([32768, 512])): 100%|██████████| 8192/8192 [00:01<00:00, 7807.13it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([32768, 512])): 100%|██████████| 8192/8192 [00:01<00:00, 7753.27it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([32768, 512])): 100%|██████████| 8192/8192 [00:01<00:00, 7769.11it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([32768, 512])): 100%|██████████| 8192/8192 [00:01<00:00, 7736.04it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([32768, 512])): 100%|██████████| 8192/8192 [00:01<00:00, 7738.89it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([32768, 512])): 100%|██████████| 8192/8192 [00:01<00:00, 7700.22it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([32768, 512])): 100%|██████████| 8192/8192 [00:01<00:00, 7631.36it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([32768, 512])): 100%|██████████| 8192/8192 [00:01<00:00, 7815.85it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([32768, 512])): 100%|██████████| 8192/8192 [00:01<00:00, 7631.47it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([32768, 512])): 100%|██████████| 8192/8192 [00:01<00:00, 7707.75it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([32768, 512])): 100%|██████████| 8192/8192 [00:01<00:00, 7790.89it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([32768, 512])): 100%|██████████| 8192/8192 [00:01<00:00, 7684.88it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([32768, 512])): 100%|██████████| 8192/8192 [00:01<00:00, 7745.49it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([32768, 512])): 100%|██████████| 8192/8192 [00:01<00:00, 7660.61it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([32768, 512])): 100%|██████████| 8192/8192 [00:01<00:00, 7684.63it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([32768, 512])): 100%|██████████| 8192/8192 [00:01<00:00, 7715.28it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 16384])): 100%|██████████| 8192/8192 [00:03<00:00, 2494.36it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 16384])): 100%|██████████| 8192/8192 [00:03<00:00, 2499.46it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 16384])): 100%|██████████| 8192/8192 [00:03<00:00, 2529.99it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 16384])): 100%|██████████| 8192/8192 [00:03<00:00, 2487.16it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 16384])): 100%|██████████| 8192/8192 [00:03<00:00, 2387.96it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 16384])): 100%|██████████| 8192/8192 [00:03<00:00, 2488.90it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 16384])): 100%|██████████| 8192/8192 [00:03<00:00, 2529.98it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 16384])): 100%|██████████| 8192/8192 [00:03<00:00, 2469.80it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 16384])): 100%|██████████| 8192/8192 [00:03<00:00, 2471.28it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 16384])): 100%|██████████| 8192/8192 [00:03<00:00, 2507.36it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 16384])): 100%|██████████| 8192/8192 [00:03<00:00, 2470.61it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 16384])): 100%|██████████| 8192/8192 [00:03<00:00, 2498.99it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 16384])): 100%|██████████| 8192/8192 [00:03<00:00, 2494.81it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 16384])): 100%|██████████| 8192/8192 [00:03<00:00, 2480.47it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 16384])): 100%|██████████| 8192/8192 [00:03<00:00, 2502.96it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 16384])): 100%|██████████| 8192/8192 [00:03<00:00, 2490.07it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([36864, 7168])): 100%|██████████| 8192/8192 [00:07<00:00, 1162.02it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([36864, 7168])): 100%|██████████| 8192/8192 [00:07<00:00, 1160.10it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([36864, 7168])): 100%|██████████| 8192/8192 [00:07<00:00, 1167.52it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([36864, 7168])): 100%|██████████| 8192/8192 [00:07<00:00, 1158.02it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([36864, 7168])): 100%|██████████| 8192/8192 [00:06<00:00, 1172.63it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([36864, 7168])): 100%|██████████| 8192/8192 [00:07<00:00, 1148.40it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([36864, 7168])): 100%|██████████| 8192/8192 [00:07<00:00, 1159.96it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([36864, 7168])): 100%|██████████| 8192/8192 [00:07<00:00, 1164.04it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([36864, 7168])): 100%|██████████| 8192/8192 [00:07<00:00, 1149.30it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([36864, 7168])): 100%|██████████| 8192/8192 [00:07<00:00, 1159.38it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([36864, 7168])): 100%|██████████| 8192/8192 [00:07<00:00, 1145.35it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([36864, 7168])): 100%|██████████| 8192/8192 [00:07<00:00, 1153.61it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([36864, 7168])): 100%|██████████| 8192/8192 [00:07<00:00, 1153.84it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([36864, 7168])): 100%|██████████| 8192/8192 [00:07<00:00, 1155.17it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([36864, 7168])): 100%|██████████| 8192/8192 [00:07<00:00, 1157.63it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([36864, 7168])): 100%|██████████| 8192/8192 [00:07<00:00, 1100.75it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 18432])): 100%|██████████| 8192/8192 [00:03<00:00, 2155.90it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 18432])): 100%|██████████| 8192/8192 [00:03<00:00, 2156.80it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 18432])): 100%|██████████| 8192/8192 [00:03<00:00, 2168.96it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 18432])): 100%|██████████| 8192/8192 [00:03<00:00, 2182.65it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 18432])): 100%|██████████| 8192/8192 [00:03<00:00, 2159.42it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 18432])): 100%|██████████| 8192/8192 [00:03<00:00, 2155.59it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 18432])): 100%|██████████| 8192/8192 [00:03<00:00, 2116.85it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 18432])): 100%|██████████| 8192/8192 [00:03<00:00, 2125.33it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 18432])): 100%|██████████| 8192/8192 [00:03<00:00, 2141.26it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 18432])): 100%|██████████| 8192/8192 [00:03<00:00, 2124.17it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 18432])): 100%|██████████| 8192/8192 [00:03<00:00, 2157.19it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 18432])): 100%|██████████| 8192/8192 [00:03<00:00, 2115.61it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 18432])): 100%|██████████| 8192/8192 [00:03<00:00, 2127.40it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 18432])): 100%|██████████| 8192/8192 [00:03<00:00, 2154.01it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 18432])): 100%|██████████| 8192/8192 [00:03<00:00, 2136.67it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 18432])): 100%|██████████| 8192/8192 [00:04<00:00, 2017.10it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([4096, 7168])): 100%|██████████| 8192/8192 [00:00<00:00, 8288.00it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([4096, 7168])): 100%|██████████| 8192/8192 [00:00<00:00, 8259.94it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([4096, 7168])): 100%|██████████| 8192/8192 [00:00<00:00, 8324.57it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([4096, 7168])): 100%|██████████| 8192/8192 [00:00<00:00, 8411.72it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([4096, 7168])): 100%|██████████| 8192/8192 [00:00<00:00, 8390.72it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([4096, 7168])): 100%|██████████| 8192/8192 [00:00<00:00, 8412.21it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([4096, 7168])): 100%|██████████| 8192/8192 [00:00<00:00, 8374.28it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([4096, 7168])): 100%|██████████| 8192/8192 [00:00<00:00, 8312.17it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([4096, 7168])): 100%|██████████| 8192/8192 [00:00<00:00, 8236.60it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([4096, 7168])): 100%|██████████| 8192/8192 [00:00<00:00, 8413.90it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([4096, 7168])): 100%|██████████| 8192/8192 [00:00<00:00, 8270.14it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([4096, 7168])): 100%|██████████| 8192/8192 [00:00<00:00, 8258.84it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([4096, 7168])): 100%|██████████| 8192/8192 [00:00<00:00, 8211.23it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([4096, 7168])): 100%|██████████| 8192/8192 [00:01<00:00, 8181.40it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([4096, 7168])): 100%|██████████| 8192/8192 [00:00<00:00, 8207.04it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 2048])): 100%|██████████| 8192/8192 [00:00<00:00, 12832.42it/s] | |
| (EngineCore_DP15 pid=80460) 2025-09-26 08:50:25,542 - INFO - autotuner.py:256 - flashinfer.jit: [Autotuner]: Autotuning process starts ... | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 2048])): 100%|██████████| 8192/8192 [00:00<00:00, 12271.68it/s] | |
| (EngineCore_DP4 pid=82745) 2025-09-26 08:50:25,555 - INFO - autotuner.py:256 - flashinfer.jit: [Autotuner]: Autotuning process starts ... | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 2048])): 100%|██████████| 8192/8192 [00:00<00:00, 12813.71it/s] | |
| (EngineCore_DP10 pid=80455) 2025-09-26 08:50:25,560 - INFO - autotuner.py:256 - flashinfer.jit: [Autotuner]: Autotuning process starts ... | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 2048])): 100%|██████████| 8192/8192 [00:00<00:00, 12942.22it/s] | |
| (EngineCore_DP2 pid=82743) 2025-09-26 08:50:25,715 - INFO - autotuner.py:256 - flashinfer.jit: [Autotuner]: Autotuning process starts ... | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 2048])): 100%|██████████| 8192/8192 [00:00<00:00, 12592.89it/s] | |
| (EngineCore_DP12 pid=80457) 2025-09-26 08:50:25,726 - INFO - autotuner.py:256 - flashinfer.jit: [Autotuner]: Autotuning process starts ... | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([4096, 7168])): 100%|██████████| 8192/8192 [00:01<00:00, 8179.73it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 2048])): 100%|██████████| 8192/8192 [00:00<00:00, 13036.02it/s] | |
| (EngineCore_DP11 pid=80456) 2025-09-26 08:50:25,924 - INFO - autotuner.py:256 - flashinfer.jit: [Autotuner]: Autotuning process starts ... | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 2048])): 100%|██████████| 8192/8192 [00:00<00:00, 13038.50it/s] | |
| (EngineCore_DP6 pid=82747) 2025-09-26 08:50:25,931 - INFO - autotuner.py:256 - flashinfer.jit: [Autotuner]: Autotuning process starts ... | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 2048])): 100%|██████████| 8192/8192 [00:00<00:00, 12868.11it/s] | |
| (EngineCore_DP9 pid=80454) 2025-09-26 08:50:26,023 - INFO - autotuner.py:256 - flashinfer.jit: [Autotuner]: Autotuning process starts ... | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 2048])): 100%|██████████| 8192/8192 [00:00<00:00, 12434.36it/s] | |
| (EngineCore_DP13 pid=80458) 2025-09-26 08:50:26,101 - INFO - autotuner.py:256 - flashinfer.jit: [Autotuner]: Autotuning process starts ... | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 2048])): 100%|██████████| 8192/8192 [00:00<00:00, 12736.70it/s] | |
| (EngineCore_DP0 pid=82741) 2025-09-26 08:50:26,105 - INFO - autotuner.py:256 - flashinfer.jit: [Autotuner]: Autotuning process starts ... | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 2048])): 100%|██████████| 8192/8192 [00:00<00:00, 12374.24it/s] | |
| (EngineCore_DP14 pid=80459) 2025-09-26 08:50:26,111 - INFO - autotuner.py:256 - flashinfer.jit: [Autotuner]: Autotuning process starts ... | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 2048])): 100%|██████████| 8192/8192 [00:00<00:00, 12789.09it/s] | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 2048])): 100%|██████████| 8192/8192 [00:00<00:00, 12522.23it/s] | |
| (EngineCore_DP1 pid=82742) 2025-09-26 08:50:26,143 - INFO - autotuner.py:256 - flashinfer.jit: [Autotuner]: Autotuning process starts ... | |
| (EngineCore_DP3 pid=82744) 2025-09-26 08:50:26,147 - INFO - autotuner.py:256 - flashinfer.jit: [Autotuner]: Autotuning process starts ... | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 2048])): 100%|██████████| 8192/8192 [00:00<00:00, 12384.20it/s] | |
| (EngineCore_DP8 pid=80453) 2025-09-26 08:50:26,155 - INFO - autotuner.py:256 - flashinfer.jit: [Autotuner]: Autotuning process starts ... | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 2048])): 100%|██████████| 8192/8192 [00:00<00:00, 12298.02it/s] | |
| (EngineCore_DP7 pid=82748) 2025-09-26 08:50:26,165 - INFO - autotuner.py:256 - flashinfer.jit: [Autotuner]: Autotuning process starts ... | |
| DeepGemm(fp8_gemm_nt) warmup (W=torch.Size([7168, 2048])): 100%|██████████| 8192/8192 [00:00<00:00, 12723.18it/s] | |
| (EngineCore_DP5 pid=82746) 2025-09-26 08:50:26,434 - INFO - autotuner.py:256 - flashinfer.jit: [Autotuner]: Autotuning process starts ... | |
| (EngineCore_DP2 pid=82743) 2025-09-26 08:50:31,995 - INFO - autotuner.py:262 - flashinfer.jit: [Autotuner]: Autotuning process ends | |
| (EngineCore_DP7 pid=82748) 2025-09-26 08:50:31,995 - INFO - autotuner.py:262 - flashinfer.jit: [Autotuner]: Autotuning process ends | |
| (EngineCore_DP4 pid=82745) 2025-09-26 08:50:31,995 - INFO - autotuner.py:262 - flashinfer.jit: [Autotuner]: Autotuning process ends | |
| (EngineCore_DP5 pid=82746) 2025-09-26 08:50:31,995 - INFO - autotuner.py:262 - flashinfer.jit: [Autotuner]: Autotuning process ends | |
| (EngineCore_DP0 pid=82741) 2025-09-26 08:50:31,995 - INFO - autotuner.py:262 - flashinfer.jit: [Autotuner]: Autotuning process ends | |
| (EngineCore_DP6 pid=82747) 2025-09-26 08:50:31,995 - INFO - autotuner.py:262 - flashinfer.jit: [Autotuner]: Autotuning process ends | |
| (EngineCore_DP3 pid=82744) 2025-09-26 08:50:31,995 - INFO - autotuner.py:262 - flashinfer.jit: [Autotuner]: Autotuning process ends | |
| (EngineCore_DP13 pid=80458) 2025-09-26 08:50:31,994 - INFO - autotuner.py:262 - flashinfer.jit: [Autotuner]: Autotuning process ends | |
| (EngineCore_DP9 pid=80454) 2025-09-26 08:50:31,994 - INFO - autotuner.py:262 - flashinfer.jit: [Autotuner]: Autotuning process ends | |
| (EngineCore_DP11 pid=80456) 2025-09-26 08:50:31,994 - INFO - autotuner.py:262 - flashinfer.jit: [Autotuner]: Autotuning process ends | |
| (EngineCore_DP10 pid=80455) 2025-09-26 08:50:31,994 - INFO - autotuner.py:262 - flashinfer.jit: [Autotuner]: Autotuning process ends | |
| (EngineCore_DP14 pid=80459) 2025-09-26 08:50:31,994 - INFO - autotuner.py:262 - flashinfer.jit: [Autotuner]: Autotuning process ends | |
| (EngineCore_DP12 pid=80457) 2025-09-26 08:50:31,994 - INFO - autotuner.py:262 - flashinfer.jit: [Autotuner]: Autotuning process ends | |
| (EngineCore_DP8 pid=80453) 2025-09-26 08:50:31,994 - INFO - autotuner.py:262 - flashinfer.jit: [Autotuner]: Autotuning process ends | |
| (EngineCore_DP15 pid=80460) 2025-09-26 08:50:31,994 - INFO - autotuner.py:262 - flashinfer.jit: [Autotuner]: Autotuning process ends | |
| (EngineCore_DP1 pid=82742) 2025-09-26 08:50:31,995 - INFO - autotuner.py:262 - flashinfer.jit: [Autotuner]: Autotuning process ends | |
| Capturing CUDA graphs (decode, FULL): 100%|██████████| 67/67 [01:14<00:00, 1.11s/it] | |
| (EngineCore_DP15 pid=80460) INFO 09-26 08:51:46 [gpu_model_runner.py:3443] Graph capturing finished in 75 secs, took -4.57 GiB | |
| (EngineCore_DP3 pid=82744) INFO 09-26 08:51:46 [gpu_model_runner.py:3443] Graph capturing finished in 75 secs, took -4.57 GiB | |
| (EngineCore_DP8 pid=80453) INFO 09-26 08:51:47 [gpu_model_runner.py:3443] Graph capturing finished in 75 secs, took -4.57 GiB | |
| (EngineCore_DP5 pid=82746) INFO 09-26 08:51:47 [gpu_model_runner.py:3443] Graph capturing finished in 75 secs, took -4.58 GiB | |
| (EngineCore_DP2 pid=82743) INFO 09-26 08:51:47 [gpu_model_runner.py:3443] Graph capturing finished in 75 secs, took -4.57 GiB | |
| (EngineCore_DP7 pid=82748) INFO 09-26 08:51:47 [gpu_model_runner.py:3443] Graph capturing finished in 75 secs, took -4.57 GiB | |
| (EngineCore_DP9 pid=80454) INFO 09-26 08:51:47 [gpu_model_runner.py:3443] Graph capturing finished in 75 secs, took -4.58 GiB | |
| (EngineCore_DP10 pid=80455) INFO 09-26 08:51:47 [gpu_model_runner.py:3443] Graph capturing finished in 75 secs, took -4.57 GiB | |
| (EngineCore_DP0 pid=82741) INFO 09-26 08:51:47 [gpu_model_runner.py:3443] Graph capturing finished in 75 secs, took -4.57 GiB | |
| (EngineCore_DP12 pid=80457) INFO 09-26 08:51:47 [gpu_model_runner.py:3443] Graph capturing finished in 75 secs, took -4.58 GiB | |
| (EngineCore_DP11 pid=80456) INFO 09-26 08:51:47 [gpu_model_runner.py:3443] Graph capturing finished in 75 secs, took -4.57 GiB | |
| (EngineCore_DP1 pid=82742) INFO 09-26 08:51:47 [gpu_model_runner.py:3443] Graph capturing finished in 75 secs, took -4.58 GiB | |
| (EngineCore_DP6 pid=82747) INFO 09-26 08:51:47 [gpu_model_runner.py:3443] Graph capturing finished in 75 secs, took -4.57 GiB | |
| (EngineCore_DP13 pid=80458) INFO 09-26 08:51:47 [gpu_model_runner.py:3443] Graph capturing finished in 75 secs, took -4.57 GiB | |
| (EngineCore_DP4 pid=82745) INFO 09-26 08:51:47 [gpu_model_runner.py:3443] Graph capturing finished in 75 secs, took -4.58 GiB | |
| (EngineCore_DP14 pid=80459) INFO 09-26 08:51:47 [gpu_model_runner.py:3443] Graph capturing finished in 75 secs, took -4.57 GiB | |
| (EngineCore_DP9 pid=80454) Exception in thread Thread-253 (_ubatch_thread): | |
| (EngineCore_DP9 pid=80454) Traceback (most recent call last): | |
| (EngineCore_DP9 pid=80454) File "/usr/lib/python3.12/threading.py", line 1075, in _bootstrap_inner | |
| (EngineCore_DP9 pid=80454) self.run() | |
| (EngineCore_DP0 pid=82741) Exception in thread Thread-253 (_ubatch_thread): | |
| (EngineCore_DP0 pid=82741) Traceback (most recent call last): | |
| (EngineCore_DP0 pid=82741) File "/usr/lib/python3.12/threading.py", line 1075, in _bootstrap_inner | |
| (EngineCore_DP9 pid=80454) File "/usr/lib/python3.12/threading.py", line 1012, in run | |
| (EngineCore_DP0 pid=82741) self.run() | |
| (EngineCore_DP0 pid=82741) File "/usr/lib/python3.12/threading.py", line 1012, in run | |
| (EngineCore_DP9 pid=80454) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP0 pid=82741) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP9 pid=80454) return func(*args, **kwargs) | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 234, in _ubatch_thread | |
| (EngineCore_DP9 pid=80454) model_output = model( | |
| (EngineCore_DP9 pid=80454) ^^^^^^ | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 317, in __call__ | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP0 pid=82741) return func(*args, **kwargs) | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 234, in _ubatch_thread | |
| (EngineCore_DP0 pid=82741) model_output = model( | |
| (EngineCore_DP0 pid=82741) ^^^^^^ | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 317, in __call__ | |
| (EngineCore_DP0 pid=82741) model_output = self.forward(*args, **kwargs) | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 764, in forward | |
| (EngineCore_DP0 pid=82741) def forward( | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 375, in __call__ | |
| (EngineCore_DP0 pid=82741) return super().__call__(*args, **kwargs) | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP0 pid=82741) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP0 pid=82741) return forward_call(*args, **kwargs) | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 929, in _fn | |
| (EngineCore_DP0 pid=82741) return fn(*args, **kwargs) | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP0 pid=82741) return self._wrapped_call(self, *args, **kwargs) | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP0 pid=82741) raise e | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP0 pid=82741) return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP0 pid=82741) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP0 pid=82741) return forward_call(*args, **kwargs) | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) File "<eval_with_key>.127", line 696, in forward | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP0 pid=82741) return self._wrapped_call(self, *args, **kwargs) | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP0 pid=82741) raise e | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP0 pid=82741) return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP0 pid=82741) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP0 pid=82741) return forward_call(*args, **kwargs) | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) File "<eval_with_key>.5", line 5, in forward | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1243, in __call__ | |
| (EngineCore_DP0 pid=82741) return self._op(*args, **kwargs) | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/vllm/attention/layer.py", line 611, in unified_attention_with_output | |
| (EngineCore_DP0 pid=82741) self.impl.forward(self, | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/attention/backends/mla/common.py", line 1537, in forward | |
| (EngineCore_DP0 pid=82741) _ = torch.empty( | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 8.00 GiB. GPU 0 has a total capacity of 178.36 GiB of which 2.10 GiB is free. Including non-PyTorch memory, this process has 176.24 GiB memory in use. Of the allocated memory 145.20 GiB is allocated by PyTorch, with 2.14 GiB allocated in private pools (e.g., CUDA Graphs), and 8.50 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables) | |
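| The message above already carries the usual mitigation hint; a hedged sketch of applying it (the variable name is taken verbatim from the error text, and placing it in the environment before CUDA initializes in each engine process is an assumption about where it belongs in this setup): | |
| import os | |
| # must be in place before torch/CUDA allocates anything, e.g. exported in the launch script | |
| os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True" | |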
| (EngineCore_DP6 pid=82747) Exception in thread Thread-253 (_ubatch_thread): | |
| (EngineCore_DP6 pid=82747) Traceback (most recent call last): | |
| (EngineCore_DP6 pid=82747) File "/usr/lib/python3.12/threading.py", line 1075, in _bootstrap_inner | |
| (EngineCore_DP6 pid=82747) self.run() | |
| (EngineCore_DP6 pid=82747) File "/usr/lib/python3.12/threading.py", line 1012, in run | |
| (EngineCore_DP6 pid=82747) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP6 pid=82747) return func(*args, **kwargs) | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 234, in _ubatch_thread | |
| (EngineCore_DP6 pid=82747) model_output = model( | |
| (EngineCore_DP6 pid=82747) ^^^^^^ | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 317, in __call__ | |
| (EngineCore_DP15 pid=80460) Exception in thread Thread-253 (_ubatch_thread): | |
| (EngineCore_DP15 pid=80460) Traceback (most recent call last): | |
| (EngineCore_DP15 pid=80460) File "/usr/lib/python3.12/threading.py", line 1075, in _bootstrap_inner | |
| (EngineCore_DP15 pid=80460) self.run() | |
| (EngineCore_DP15 pid=80460) File "/usr/lib/python3.12/threading.py", line 1012, in run | |
| (EngineCore_DP15 pid=80460) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP15 pid=80460) return func(*args, **kwargs) | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 234, in _ubatch_thread | |
| (EngineCore_DP12 pid=80457) Exception in thread Thread-253 (_ubatch_thread): | |
| (EngineCore_DP12 pid=80457) Traceback (most recent call last): | |
| (EngineCore_DP12 pid=80457) File "/usr/lib/python3.12/threading.py", line 1075, in _bootstrap_inner | |
| (EngineCore_DP12 pid=80457) self.run() | |
| (EngineCore_DP12 pid=80457) File "/usr/lib/python3.12/threading.py", line 1012, in run | |
| (EngineCore_DP12 pid=80457) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP12 pid=80457) return func(*args, **kwargs) | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 234, in _ubatch_thread | |
| (EngineCore_DP2 pid=82743) Exception in thread Thread-253 (_ubatch_thread): | |
| (EngineCore_DP2 pid=82743) Traceback (most recent call last): | |
| (EngineCore_DP2 pid=82743) File "/usr/lib/python3.12/threading.py", line 1075, in _bootstrap_inner | |
| (EngineCore_DP2 pid=82743) self.run() | |
| (EngineCore_DP2 pid=82743) File "/usr/lib/python3.12/threading.py", line 1012, in run | |
| (EngineCore_DP2 pid=82743) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP2 pid=82743) return func(*args, **kwargs) | |
| (EngineCore_DP8 pid=80453) Exception in thread Thread-253 (_ubatch_thread): | |
| (EngineCore_DP8 pid=80453) Traceback (most recent call last): | |
| (EngineCore_DP8 pid=80453) File "/usr/lib/python3.12/threading.py", line 1075, in _bootstrap_inner | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 234, in _ubatch_thread | |
| (EngineCore_DP2 pid=82743) model_output = model( | |
| (EngineCore_DP2 pid=82743) ^^^^^^ | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 317, in __call__ | |
| (EngineCore_DP8 pid=80453) self.run() | |
| (EngineCore_DP8 pid=80453) File "/usr/lib/python3.12/threading.py", line 1012, in run | |
| (EngineCore_DP8 pid=80453) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP8 pid=80453) return func(*args, **kwargs) | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 234, in _ubatch_thread | |
| (EngineCore_DP8 pid=80453) model_output = model( | |
| (EngineCore_DP8 pid=80453) ^^^^^^ | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 317, in __call__ | |
| (EngineCore_DP8 pid=80453) model_output = self.forward(*args, **kwargs) | |
| (EngineCore_DP2 pid=82743) model_output = self.forward(*args, **kwargs) | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 764, in forward | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 764, in forward | |
| (EngineCore_DP2 pid=82743) def forward( | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 375, in __call__ | |
| (EngineCore_DP8 pid=80453) def forward( | |
| (EngineCore_DP2 pid=82743) return super().__call__(*args, **kwargs) | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 375, in __call__ | |
| (EngineCore_DP8 pid=80453) return super().__call__(*args, **kwargs) | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP8 pid=80453) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP8 pid=80453) return forward_call(*args, **kwargs) | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 929, in _fn | |
| (EngineCore_DP8 pid=80453) return fn(*args, **kwargs) | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP8 pid=80453) return self._wrapped_call(self, *args, **kwargs) | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP8 pid=80453) raise e | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP8 pid=80453) return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP8 pid=80453) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP8 pid=80453) return forward_call(*args, **kwargs) | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) File "<eval_with_key>.127", line 696, in forward | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP8 pid=80453) return self._wrapped_call(self, *args, **kwargs) | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP8 pid=80453) raise e | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP8 pid=80453) return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP8 pid=80453) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP8 pid=80453) return forward_call(*args, **kwargs) | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) File "<eval_with_key>.5", line 5, in forward | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1243, in __call__ | |
| (EngineCore_DP8 pid=80453) return self._op(*args, **kwargs) | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/vllm/attention/layer.py", line 611, in unified_attention_with_output | |
| (EngineCore_DP8 pid=80453) self.impl.forward(self, | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/attention/backends/mla/common.py", line 1537, in forward | |
| (EngineCore_DP8 pid=80453) _ = torch.empty( | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 8.00 GiB. GPU 0 has a total capacity of 178.36 GiB of which 2.10 GiB is free. Including non-PyTorch memory, this process has 176.24 GiB memory in use. Of the allocated memory 145.20 GiB is allocated by PyTorch, with 2.14 GiB allocated in private pools (e.g., CUDA Graphs), and 8.50 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables) | |
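| A back-of-envelope check on the numbers in this message: free memory alone (2.10 GiB) cannot hold the requested 8.00 GiB block, but free plus the reserved-but-unallocated pool (8.50 GiB) could, which is the fragmentation scenario the expandable_segments hint targets. Sketch of the arithmetic: | |
| requested_gib = 8.00             # "Tried to allocate 8.00 GiB" | |
| free_gib = 2.10                  # "2.10 GiB is free" | |
| reserved_unallocated_gib = 8.50  # "reserved by PyTorch but unallocated" | |
| print(free_gib >= requested_gib)                             # False | |
| print(free_gib + reserved_unallocated_gib >= requested_gib)  # True | |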
| (EngineCore_DP0 pid=82741) Exception in thread Thread-252 (_ubatch_thread): | |
| (EngineCore_DP0 pid=82741) Traceback (most recent call last): | |
| (EngineCore_DP0 pid=82741) File "/usr/lib/python3.12/threading.py", line 1075, in _bootstrap_inner | |
| (EngineCore_DP0 pid=82741) self.run() | |
| (EngineCore_DP0 pid=82741) File "/usr/lib/python3.12/threading.py", line 1012, in run | |
| (EngineCore_DP0 pid=82741) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP0 pid=82741) return func(*args, **kwargs) | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 234, in _ubatch_thread | |
| (EngineCore_DP0 pid=82741) model_output = model( | |
| (EngineCore_DP0 pid=82741) ^^^^^^ | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 317, in __call__ | |
| (EngineCore_DP0 pid=82741) model_output = self.forward(*args, **kwargs) | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 764, in forward | |
| (EngineCore_DP0 pid=82741) def forward( | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 375, in __call__ | |
| (EngineCore_DP0 pid=82741) return super().__call__(*args, **kwargs) | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP0 pid=82741) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP0 pid=82741) return forward_call(*args, **kwargs) | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 929, in _fn | |
| (EngineCore_DP2 pid=82743) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP0 pid=82741) return fn(*args, **kwargs) | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP0 pid=82741) return self._wrapped_call(self, *args, **kwargs) | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP2 pid=82743) return forward_call(*args, **kwargs) | |
| (EngineCore_DP0 pid=82741) raise e | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 929, in _fn | |
| (EngineCore_DP0 pid=82741) return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP2 pid=82743) return fn(*args, **kwargs) | |
| (EngineCore_DP5 pid=82746) Exception in thread Thread-253 (_ubatch_thread): | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) Traceback (most recent call last): | |
| (EngineCore_DP5 pid=82746) File "/usr/lib/python3.12/threading.py", line 1075, in _bootstrap_inner | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP0 pid=82741) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP2 pid=82743) return self._wrapped_call(self, *args, **kwargs) | |
| (EngineCore_DP5 pid=82746) self.run() | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP5 pid=82746) File "/usr/lib/python3.12/threading.py", line 1012, in run | |
| (EngineCore_DP0 pid=82741) return forward_call(*args, **kwargs) | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) File "<eval_with_key>.127", line 718, in forward | |
| (EngineCore_DP2 pid=82743) raise e | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 121, in __call__ | |
| (EngineCore_DP0 pid=82741) return self.runnable(*args, **kwargs) | |
| (EngineCore_DP2 pid=82743) return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_piecewise_backend.py", line 96, in __call__ | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP0 pid=82741) return self.compiled_graph_for_general_shape(*args) | |
| (EngineCore_DP5 pid=82746) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/compiler_interface.py", line 518, in compiled_graph | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP0 pid=82741) graph_output = inductor_compiled_graph(list_args) | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 584, in __call__ | |
| (EngineCore_DP2 pid=82743) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP0 pid=82741) return self.current_callable(inputs) | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) File "/root/.cache/vllm/torch_compile_cache/2256bad88c/rank_0_0/inductor_cache/e7/ce7tkyorzlwqu6v6irtyxtkchaiwdr7h2f7nx7n7bd2dqfeorxj7.py", line 620, in call | |
| (EngineCore_DP0 pid=82741) buf5 = torch.ops.vllm.moe_forward_shared.default(buf3, buf4, 'model.layers.3.mlp.experts') | |
| (EngineCore_DP5 pid=82746) return func(*args, **kwargs) | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 829, in __call__ | |
| (EngineCore_DP2 pid=82743) return forward_call(*args, **kwargs) | |
| (EngineCore_DP0 pid=82741) return self._op(*args, **kwargs) | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 2163, in moe_forward_shared | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 234, in _ubatch_thread | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) File "<eval_with_key>.127", line 696, in forward | |
| (EngineCore_DP5 pid=82746) model_output = model( | |
| (EngineCore_DP5 pid=82746) ^^^^^^ | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 317, in __call__ | |
| (EngineCore_DP0 pid=82741) return self.forward_impl(hidden_states, router_logits) | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1998, in forward_impl | |
| (EngineCore_DP5 pid=82746) model_output = self.forward(*args, **kwargs) | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP0 pid=82741) return self.forward_impl_chunked(hidden_states, router_logits) | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1971, in forward_impl_chunked | |
| (EngineCore_DP6 pid=82747) model_output = self.forward(*args, **kwargs) | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 764, in forward | |
| (EngineCore_DP0 pid=82741) process_chunk(chunk_start, | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1903, in process_chunk | |
| (EngineCore_DP2 pid=82743) return self._wrapped_call(self, *args, **kwargs) | |
| (EngineCore_DP6 pid=82747) def forward( | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 375, in __call__ | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP0 pid=82741) final_hidden_states = self.quant_method.apply( | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/fp8.py", line 1036, in apply | |
| (EngineCore_DP2 pid=82743) Exception in thread Thread-252 (_ubatch_thread): | |
| (EngineCore_DP2 pid=82743) Traceback (most recent call last): | |
| (EngineCore_DP6 pid=82747) return super().__call__(*args, **kwargs) | |
| (EngineCore_DP2 pid=82743) File "/usr/lib/python3.12/threading.py", line 1075, in _bootstrap_inner | |
| (EngineCore_DP0 pid=82741) result = self.fused_experts( | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP2 pid=82743) raise e | |
| (EngineCore_DP3 pid=82744) Exception in thread Thread-253 (_ubatch_thread): | |
| (EngineCore_DP3 pid=82744) Traceback (most recent call last): | |
| (EngineCore_DP3 pid=82744) File "/usr/lib/python3.12/threading.py", line 1075, in _bootstrap_inner | |
| (EngineCore_DP0 pid=82741) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP2 pid=82743) self.run() | |
| (EngineCore_DP3 pid=82744) self.run() | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP2 pid=82743) File "/usr/lib/python3.12/threading.py", line 1012, in run | |
| (EngineCore_DP6 pid=82747) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP0 pid=82741) return forward_call(*args, **kwargs) | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP2 pid=82743) return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1027, in forward | |
| (EngineCore_DP3 pid=82744) File "/usr/lib/python3.12/threading.py", line 1012, in run | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP0 pid=82741) dbo_register_recv_hook(hook) | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/ubatching.py", line 184, in dbo_register_recv_hook | |
| (EngineCore_DP2 pid=82743) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP0 pid=82741) next_ctx.recv_hook = recv_hook | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) AttributeError: 'NoneType' object has no attribute 'recv_hook' | |
| (EngineCore_DP6 pid=82747) return forward_call(*args, **kwargs) | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 764, in forward | |
| (EngineCore_DP2 pid=82743) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 929, in _fn | |
| (EngineCore_DP2 pid=82743) return func(*args, **kwargs) | |
| (EngineCore_DP3 pid=82744) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP5 pid=82746) def forward( | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 375, in __call__ | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 234, in _ubatch_thread | |
| (EngineCore_DP6 pid=82747) return fn(*args, **kwargs) | |
| (EngineCore_DP5 pid=82746) return super().__call__(*args, **kwargs) | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP2 pid=82743) return forward_call(*args, **kwargs) | |
| (EngineCore_DP6 pid=82747) return self._wrapped_call(self, *args, **kwargs) | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) File "<eval_with_key>.5", line 5, in forward | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP3 pid=82744) return func(*args, **kwargs) | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) model_output = model( | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 234, in _ubatch_thread | |
| (EngineCore_DP2 pid=82743) ^^^^^^ | |
| (EngineCore_DP5 pid=82746) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP6 pid=82747) raise e | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 317, in __call__ | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1243, in __call__ | |
| (EngineCore_DP2 pid=82743) model_output = self.forward(*args, **kwargs) | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 764, in forward | |
| (EngineCore_DP5 pid=82746) return forward_call(*args, **kwargs) | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 929, in _fn | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP2 pid=82743) return self._op(*args, **kwargs) | |
| (EngineCore_DP6 pid=82747) return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/py(EngineCore_DP13 pid=80458) Exception in thread Thread-253 (_ubatch_thread): | |
| (EngineCore_DP13 pid=80458) Traceback (most recent call last): | |
| (EngineCore_DP13 pid=80458) File "/usr/lib/python3.12/threading.py", line 1075, in _bootstrap_inner | |
| (EngineCore_DP13 pid=80458) self.run() | |
| thon3.12/dist-packages/vllm/attention/layer.py", line 611, in unified_attention_with_output | |
| (EngineCore_DP13 pid=80458) File "/usr/lib/python3.12/threading.py", line 1012, in run | |
| (EngineCore_DP13 pid=80458) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP5 pid=82746) return fn(*args, **kwargs) | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) def forward( | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP2 pid=82743) self.impl.forward(self, | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 375, in __call__ | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/attention/backends/mla/common.py", line 1537, in forward | |
| (EngineCore_DP6 pid=82747) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP13 pid=80458) return func(*args, **kwargs) | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 234, in _ubatch_thread | |
| (EngineCore_DP13 pid=80458) model_output = model( | |
| (EngineCore_DP13 pid=80458) ^^^^^^ | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 317, in __call__ | |
| (EngineCore_DP13 pid=80458) model_output = self.forward(*args, **kwargs) | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 764, in forward | |
| (EngineCore_DP13 pid=80458) def forward( | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 375, in __call__ | |
| (EngineCore_DP13 pid=80458) return super().__call__(*args, **kwargs) | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP9 pid=80454) model_output = self.forward(*args, **kwargs) | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 764, in forward | |
| (EngineCore_DP9 pid=80454) def forward( | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 375, in __call__ | |
| (EngineCore_DP9 pid=80454) return super().__call__(*args, **kwargs) | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP13 pid=80458) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP9 pid=80454) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP13 pid=80458) return forward_call(*args, **kwargs) | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 929, in _fn | |
| (EngineCore_DP13 pid=80458) return fn(*args, **kwargs) | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP9 pid=80454) return forward_call(*args, **kwargs) | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 929, in _fn | |
| (EngineCore_DP13 pid=80458) return self._wrapped_call(self, *args, **kwargs) | |
| (EngineCore_DP2 pid=82743) return super().__call__(*args, **kwargs) | |
| (EngineCore_DP6 pid=82747) Exception in thread Thread-252 (_ubatch_thread): | |
| (EngineCore_DP6 pid=82747) Traceback (most recent call last): | |
| (EngineCore_DP6 pid=82747) File "/usr/lib/python3.12/threading.py", line 1075, in _bootstrap_inner | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP13 pid=80458) raise e | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP9 pid=80454) return fn(*args, **kwargs) | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP9 pid=80454) return self._wrapped_call(self, *args, **kwargs) | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP13 pid=80458) return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP0 pid=82741) ERROR 09-26 08:51:47 [core.py:708] EngineCore failed to start. | |
| (EngineCore_DP0 pid=82741) ERROR 09-26 08:51:47 [core.py:708] Traceback (most recent call last): | |
| (EngineCore_DP0 pid=82741) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 695, in run_engine_core | |
| (EngineCore_DP0 pid=82741) ERROR 09-26 08:51:47 [core.py:708] engine_core = DPEngineCoreProc(*args, **kwargs) | |
| (EngineCore_DP0 pid=82741) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 965, in __init__ | |
| (EngineCore_DP0 pid=82741) ERROR 09-26 08:51:47 [core.py:708] super().__init__(vllm_config, local_client, handshake_address, | |
| (EngineCore_DP0 pid=82741) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 498, in __init__ | |
| (EngineCore_DP0 pid=82741) ERROR 09-26 08:51:47 [core.py:708] super().__init__(vllm_config, executor_class, log_stats, | |
| (EngineCore_DP0 pid=82741) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 92, in __init__ | |
| (EngineCore_DP0 pid=82741) ERROR 09-26 08:51:47 [core.py:708] self._initialize_kv_caches(vllm_config) | |
| (EngineCore_DP0 pid=82741) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 207, in _initialize_kv_caches | |
| (EngineCore_DP0 pid=82741) ERROR 09-26 08:51:47 [core.py:708] self.model_executor.initialize_from_config(kv_cache_configs) | |
| (EngineCore_DP0 pid=82741) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 75, in initialize_from_config | |
| (EngineCore_DP0 pid=82741) ERROR 09-26 08:51:47 [core.py:708] self.collective_rpc("compile_or_warm_up_model") | |
| (EngineCore_DP0 pid=82741) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc | |
| (EngineCore_DP0 pid=82741) ERROR 09-26 08:51:47 [core.py:708] return [run_method(self.driver_worker, method, args, kwargs)] | |
| (EngineCore_DP0 pid=82741) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 3120, in run_method | |
| (EngineCore_DP0 pid=82741) ERROR 09-26 08:51:47 [core.py:708] return func(*args, **kwargs) | |
| (EngineCore_DP0 pid=82741) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 406, in compile_or_warm_up_model | |
| (EngineCore_DP0 pid=82741) ERROR 09-26 08:51:47 [core.py:708] self.model_runner._dummy_run( | |
| (EngineCore_DP0 pid=82741) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP0 pid=82741) ERROR 09-26 08:51:47 [core.py:708] return func(*args, **kwargs) | |
| (EngineCore_DP0 pid=82741) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 3132, in _dummy_run | |
| (EngineCore_DP0 pid=82741) ERROR 09-26 08:51:47 [core.py:708] outputs = self.model( | |
| (EngineCore_DP0 pid=82741) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 387, in __call__ | |
| (EngineCore_DP0 pid=82741) ERROR 09-26 08:51:47 [core.py:708] return self._run_ubatches(ubatch_metadata, self.model) | |
| (EngineCore_DP0 pid=82741) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 263, in _run_ubatches | |
| (EngineCore_DP0 pid=82741) ERROR 09-26 08:51:47 [core.py:708] result = torch.cat(sorted_results, dim=0) | |
| (EngineCore_DP0 pid=82741) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) ERROR 09-26 08:51:47 [core.py:708] RuntimeError: torch.cat(): expected a non-empty list of Tensors | |
| (EngineCore_DP5 pid=82746) return self._wrapped_call(self, *args, **kwargs) | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP5 pid=82746) raise e | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP6 pid=82747) return forward_call(*args, **kwargs) | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) File "<eval_with_key>.127", line 696, in forward | |
| (EngineCore_DP2 pid=82743) _ = torch.empty( | |
| (EngineCore_DP5 pid=82746) return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP2 pid=82743) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 8.00 GiB. GPU 0 has a total capacity of 178.36 GiB of which 2.10 GiB is free. Including non-PyTorch memory, this process has 176.24 GiB memory in use. Of the allocated memory 145.20 GiB is allocated by PyTorch, with 2.14 GiB allocated in private pools (e.g., CUDA Graphs), and 8.50 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables) | |
| (EngineCore_DP6 pid=82747) self.run() | |
| (EngineCore_DP6 pid=82747) File "/usr/lib/python3.12/threading.py", line 1012, in run | |
| (EngineCore_DP0 pid=82741) Process EngineCore_DP0: | |
| (EngineCore_DP2 pid=82743) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP6 pid=82747) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP6 pid=82747) return func(*args, **kwargs) | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 234, in _ubatch_thread | |
| (EngineCore_DP5 pid=82746) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP6 pid=82747) model_output = model( | |
| (EngineCore_DP6 pid=82747) ^^^^^^ | |
| (EngineCore_DP2 pid=82743) return forward_call(*args, **kwargs) | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 317, in __call__ | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 929, in _fn | |
| (EngineCore_DP9 pid=80454) raise e | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP9 pid=80454) return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP13 pid=80458) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP9 pid=80454) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP13 pid=80458) return forward_call(*args, **kwargs) | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) File "<eval_with_key>.127", line 696, in forward | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP14 pid=80459) Exception in thread Thread-253 (_ubatch_thread): | |
| (EngineCore_DP14 pid=80459) Traceback (most recent call last): | |
| (EngineCore_DP14 pid=80459) File "/usr/lib/python3.12/threading.py", line 1075, in _bootstrap_inner | |
| (EngineCore_DP9 pid=80454) Exception in thread Thread-252 (_ubatch_thread): | |
| (EngineCore_DP9 pid=80454) Traceback (most recent call last): | |
| (EngineCore_DP9 pid=80454) File "/usr/lib/python3.12/threading.py", line 1075, in _bootstrap_inner | |
| (EngineCore_DP14 pid=80459) self.run() | |
| (EngineCore_DP13 pid=80458) Exception in thread Thread-252 (_ubatch_thread): | |
| (EngineCore_DP13 pid=80458) Traceback (most recent call last): | |
| (EngineCore_DP13 pid=80458) File "/usr/lib/python3.12/threading.py", line 1075, in _bootstrap_inner | |
| (EngineCore_DP6 pid=82747) model_output = self.forward(*args, **kwargs) | |
| (EngineCore_DP0 pid=82741) Traceback (most recent call last): | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 764, in forward | |
| (EngineCore_DP5 pid=82746) Exception in thread Thread-252 (_ubatch_thread): | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) Traceback (most recent call last): | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP2 pid=82743) return fn(*args, **kwargs) | |
| (EngineCore_DP5 pid=82746) File "/usr/lib/python3.12/threading.py", line 1075, in _bootstrap_inner | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP6 pid=82747) return self._wrapped_call(self, *args, **kwargs) | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP9 pid=80454) return forward_call(*args, **kwargs) | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) File "/usr/lib/python3.12/threading.py", line 1012, in run | |
| (EngineCore_DP9 pid=80454) File "<eval_with_key>.127", line 696, in forward | |
| (EngineCore_DP13 pid=80458) self.run() | |
| (EngineCore_DP13 pid=80458) File "/usr/lib/python3.12/threading.py", line 1012, in run | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP9 pid=80454) self.run() | |
| (EngineCore_DP13 pid=80458) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP9 pid=80454) File "/usr/lib/python3.12/threading.py", line 1012, in run | |
| (EngineCore_DP13 pid=80458) return self._wrapped_call(self, *args, **kwargs) | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP13 pid=80458) return func(*args, **kwargs) | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) return func(*args, **kwargs) | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 234, in _ubatch_thread | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 234, in _ubatch_thread | |
| (EngineCore_DP0 pid=82741) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap | |
| (EngineCore_DP0 pid=82741) self.run() | |
| (EngineCore_DP0 pid=82741) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run | |
| (EngineCore_DP0 pid=82741) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 712, in run_engine_core | |
| (EngineCore_DP0 pid=82741) raise e | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 695, in run_engine_core | |
| (EngineCore_DP0 pid=82741) engine_core = DPEngineCoreProc(*args, **kwargs) | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 965, in __init__ | |
| (EngineCore_DP0 pid=82741) super().__init__(vllm_config, local_client, handshake_address, | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 498, in __init__ | |
| (EngineCore_DP0 pid=82741) super().__init__(vllm_config, executor_class, log_stats, | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 92, in __init__ | |
| (EngineCore_DP0 pid=82741) self._initialize_kv_caches(vllm_config) | |
| (EngineCore_DP2 pid=82743) return self._wrapped_call(self, *args, **kwargs) | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 207, in _initialize_kv_caches | |
| (EngineCore_DP0 pid=82741) self.model_executor.initialize_from_config(kv_cache_configs) | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 75, in initialize_from_config | |
| (EngineCore_DP0 pid=82741) self.collective_rpc("compile_or_warm_up_model") | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc | |
| (EngineCore_DP0 pid=82741) return [run_method(self.driver_worker, method, args, kwargs)] | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 3120, in run_method | |
| (EngineCore_DP0 pid=82741) return func(*args, **kwargs) | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 406, in compile_or_warm_up_model | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) self.model_runner._dummy_run( | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP0 pid=82741) return func(*args, **kwargs) | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 3132, in _dummy_run | |
| (EngineCore_DP0 pid=82741) outputs = self.model( | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 387, in __call__ | |
| (EngineCore_DP0 pid=82741) return self._run_ubatches(ubatch_metadata, self.model) | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 263, in _run_ubatches | |
| (EngineCore_DP0 pid=82741) result = torch.cat(sorted_results, dim=0) | |
| (EngineCore_DP0 pid=82741) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=82741) RuntimeError: torch.cat(): expected a non-empty list of Tensors | |
| (EngineCore_DP6 pid=82747) def forward( | |
| (EngineCore_DP2 pid=82743) raise e | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP5 pid=82746) return forward_call(*args, **kwargs) | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) File "<eval_with_key>.127", line 696, in forward | |
| (EngineCore_DP2 pid=82743) return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| raise e | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 375, in __call__ | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP6 pid=82747) return super().__call__(*args, **kwargs) | |
| (EngineCore_DP5 pid=82746) self.run() | |
| (EngineCore_DP2 pid=82743) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP5 pid=82746) File "/usr/lib/python3.12/threading.py", line 1012, in run | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP13 pid=80458) raise e | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP9 pid=80454) return self._wrapped_call(self, *args, **kwargs) | |
| (EngineCore_DP13 pid=80458) model_output = model( | |
| (EngineCore_DP13 pid=80458) ^^^^^^ | |
| (EngineCore_DP11 pid=80456) Exception in thread Thread-253 (_ubatch_thread): | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 317, in __call__ | |
| (EngineCore_DP11 pid=80456) Traceback (most recent call last): | |
| (EngineCore_DP11 pid=80456) File "/usr/lib/python3.12/threading.py", line 1075, in _bootstrap_inner | |
| (EngineCore_DP14 pid=80459) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| model_output = model( | |
| (EngineCore_DP13 pid=80458) return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) ^^^^^^ | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 317, in __call__ | |
| (EngineCore_DP11 pid=80456) self.run() | |
| (EngineCore_DP13 pid=80458) model_output = self.forward(*args, **kwargs) | |
| (EngineCore_DP11 pid=80456) File "/usr/lib/python3.12/threading.py", line 1012, in run | |
| (EngineCore_DP9 pid=80454) raise e | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP5 pid=82746) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP2 pid=82743) return forward_call(*args, **kwargs) | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) File "<eval_with_key>.127", line 718, in forward | |
| (EngineCore_DP5 pid=82746) return func(*args, **kwargs) | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 234, in _ubatch_thread | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 121, in __call__ | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP2 pid=82743) return self.runnable(*args, **kwargs) | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_piecewise_backend.py", line 96, in __call__ | |
| (EngineCore_DP6 pid=82747) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP5 pid=82746) model_output = model( | |
| (EngineCore_DP2 pid=82743) return self.compiled_graph_for_general_shape(*args) | |
| (EngineCore_DP5 pid=82746) ^^^^^^ | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 764, in forward | |
| (EngineCore_DP14 pid=80459) return func(*args, **kwargs) | |
| (EngineCore_DP9 pid=80454) return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 234, in _ubatch_thread | |
| (EngineCore_DP14 pid=80459) model_output = model( | |
| (EngineCore_DP14 pid=80459) ^^^^^^ | |
| model_output = self.forward(*args, **kwargs) | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 317, in __call__ | |
| (EngineCore_DP12 pid=80457) model_output = model( | |
| (EngineCore_DP12 pid=80457) ^^^^^^ | |
| (EngineCore_DP13 pid=80458) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 317, in __call__ | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 764, in forward | |
| (EngineCore_DP15 pid=80460) model_output = model( | |
| (EngineCore_DP15 pid=80460) ^^^^^^ | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 317, in __call__ | |
| (EngineCore_DP12 pid=80457) model_output = self.forward(*args, **kwargs) | |
| (EngineCore_DP13 pid=80458) def forward( | |
| (EngineCore_DP15 pid=80460) model_output = self.forward(*args, **kwargs) | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 375, in __call__ | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) def forward( | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 764, in forward | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 764, in forward | |
| (EngineCore_DP11 pid=80456) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP13 pid=80458) return forward_call(*args, **kwargs) | |
| (EngineCore_DP15 pid=80460) def forward( | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) File "<eval_with_key>.5", line 5, in forward | |
| (EngineCore_DP12 pid=80457) def forward( | |
| return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 375, in __call__ | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 375, in __call__ | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP14 pid=80459) model_output = self.forward(*args, **kwargs) | |
| (EngineCore_DP13 pid=80458) return super().__call__(*args, **kwargs) | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 375, in __call__ | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) return super().__call__(*args, **kwargs) | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 764, in forward | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP15 pid=80460) return super().__call__(*args, **kwargs) | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1243, in __call__ | |
| (EngineCore_DP10 pid=80455) Exception in thread Thread-253 (_ubatch_thread): | |
| (EngineCore_DP10 pid=80455) Traceback (most recent call last): | |
| (EngineCore_DP10 pid=80455) File "/usr/lib/python3.12/threading.py", line 1075, in _bootstrap_inner | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP11 pid=80456) return func(*args, **kwargs) | |
| (EngineCore_DP8 pid=80453) Exception in thread Thread-252 (_ubatch_thread): | |
| (EngineCore_DP14 pid=80459) def forward( | |
| (EngineCore_DP8 pid=80453) Traceback (most recent call last): | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 317, in __call__ | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/compiler_interface.py", line 518, in compiled_graph | |
| (EngineCore_DP2 pid=82743) graph_output = inductor_compiled_graph(list_args) | |
| (EngineCore_DP5 pid=82746) return self._wrapped_call(self, *args, **kwargs) | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 584, in __call__ | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP2 pid=82743) return self.current_callable(inputs) | |
| return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) File "/root/.cache/vllm/torch_compile_cache/2256bad88c/rank_0_2/inductor_cache/q4/cq425g5xnesdfzbq6bq3zas7apd2ofpm6cb5qwqsalvrhvcuwe2u.py", line 620, in call | |
| (EngineCore_DP5 pid=82746) model_output = self.forward(*args, **kwargs) | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP2 pid=82743) buf5 = torch.ops.vllm.moe_forward_shared.default(buf3, buf4, 'model.layers.3.mlp.experts') | |
| raise e | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 829, in __call__ | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 764, in forward | |
| (EngineCore_DP2 pid=82743) return self._op(*args, **kwargs) | |
| (EngineCore_DP6 pid=82747) return forward_call(*args, **kwargs) | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 2163, in moe_forward_shared | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 929, in _fn | |
| (EngineCore_DP5 pid=82746) def forward( | |
| return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP6 pid=82747) return fn(*args, **kwargs) | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 375, in __call__ | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) return self.forward_impl(hidden_states, router_logits) | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1998, in forward_impl | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP5 pid=82746) return super().__call__(*args, **kwargs) | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) return forward_call(*args, **kwargs) | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) File "<eval_with_key>.5", line 5, in forward | |
| (EngineCore_DP2 pid=82743) return self.forward_impl_chunked(hidden_states, router_logits) | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1971, in forward_impl_chunked | |
| (EngineCore_DP6 pid=82747) return self._wrapped_call(self, *args, **kwargs) | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP5 pid=82746) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP2 pid=82743) process_chunk(chunk_start, | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1903, in process_chunk | |
| (EngineCore_DP6 pid=82747) raise e | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1243, in __call__ | |
| (EngineCore_DP2 pid=82743) final_hidden_states = self.quant_method.apply( | |
| (EngineCore_DP6 pid=82747) return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/fp8.py", line 1036, in apply | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP2 pid=82743) result = self.fused_experts( | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP6 pid=82747) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP5 pid=82746) return forward_call(*args, **kwargs) | |
| (EngineCore_DP2 pid=82743) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) File "<eval_with_key>.5", line 5, in forward | |
| return self._op(*args, **kwargs) | |
| (EngineCore_DP2 pid=82743) return forward_call(*args, **kwargs) | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1027, in forward | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/vllm/attention/layer.py", line 611, in unified_attention_with_output | |
| (EngineCore_DP5 pid=82746) return forward_call(*args, **kwargs) | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 929, in _fn | |
| (EngineCore_DP2 pid=82743) dbo_register_recv_hook(hook) | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/ubatching.py", line 184, in dbo_register_recv_hook | |
| (EngineCore_DP2 pid=82743) next_ctx.recv_hook = recv_hook | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) AttributeError: 'NoneType' object has no attribute 'recv_hook' | |
| (EngineCore_DP5 pid=82746) return fn(*args, **kwargs) | |
| (EngineCore_DP6 pid=82747) return forward_call(*args, **kwargs) | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) File "<eval_with_key>.127", line 718, in forward | |
| (EngineCore_DP6 pid=82747) self.impl.forward(self, | |
| (EngineCore_DP5 pid=82746) return self._wrapped_call(self, *args, **kwargs) | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/attention/backends/mla/common.py", line 1537, in forward | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1243, in __call__ | |
| (EngineCore_DP5 pid=82746) raise e | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP5 pid=82746) return self._op(*args, **kwargs) | |
| (EngineCore_DP6 pid=82747) _ = torch.empty( | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/vllm/attention/layer.py", line 611, in unified_attention_with_output | |
| (EngineCore_DP6 pid=82747) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 8.00 GiB. GPU 0 has a total capacity of 178.36 GiB of which 2.10 GiB is free. Including non-PyTorch memory, this process has 176.24 GiB memory in use. Of the allocated memory 145.20 GiB is allocated by PyTorch, with 2.14 GiB allocated in private pools (e.g., CUDA Graphs), and 8.50 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables) | |
| (EngineCore_DP5 pid=82746) return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 121, in __call__ | |
| (EngineCore_DP6 pid=82747) return self.runnable(*args, **kwargs) | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_piecewise_backend.py", line 96, in __call__ | |
| (EngineCore_DP6 pid=82747) return self.compiled_graph_for_general_shape(*args) | |
| (EngineCore_DP5 pid=82746) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/compiler_interface.py", line 518, in compiled_graph | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP6 pid=82747) graph_output = inductor_compiled_graph(list_args) | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 584, in __call__ | |
| (EngineCore_DP5 pid=82746) self.impl.forward(self, | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/attention/backends/mla/common.py", line 1537, in forward | |
| (EngineCore_DP6 pid=82747) return self.current_callable(inputs) | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) File "/root/.cache/vllm/torch_compile_cache/2256bad88c/rank_0_6/inductor_cache/ts/cts2ar5qhxathybomqk65dtff5qcvqwumjrzvt6ah5b5646revcf.py", line 620, in call | |
| (EngineCore_DP6 pid=82747) buf5 = torch.ops.vllm.moe_forward_shared.default(buf3, buf4, 'model.layers.3.mlp.experts') | |
| (EngineCore_DP5 pid=82746) return forward_call(*args, **kwargs) | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) File "<eval_with_key>.127", line 718, in forward | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 829, in __call__ | |
| (EngineCore_DP6 pid=82747) return self._op(*args, **kwargs) | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 2163, in moe_forward_shared | |
| (EngineCore_DP5 pid=82746) _ = torch.empty( | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 8.00 GiB. GPU 0 has a total capacity of 178.36 GiB of which 2.11 GiB is free. Including non-PyTorch memory, this process has 176.22 GiB memory in use. Of the allocated memory 145.20 GiB is allocated by PyTorch, with 2.14 GiB allocated in private pools (e.g., CUDA Graphs), and 8.49 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables) | |
| (EngineCore_DP6 pid=82747) return self.forward_impl(hidden_states, router_logits) | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1998, in forward_impl | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 121, in __call__ | |
| (EngineCore_DP2 pid=82743) Process EngineCore_DP2: | |
| (EngineCore_DP5 pid=82746) return self.runnable(*args, **kwargs) | |
| (EngineCore_DP6 pid=82747) return self.forward_impl_chunked(hidden_states, router_logits) | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_piecewise_backend.py", line 96, in __call__ | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1971, in forward_impl_chunked | |
| (EngineCore_DP5 pid=82746) return self.compiled_graph_for_general_shape(*args) | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/compiler_interface.py", line 518, in compiled_graph | |
| (EngineCore_DP5 pid=82746) graph_output = inductor_compiled_graph(list_args) | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 584, in __call__ | |
| (EngineCore_DP6 pid=82747) process_chunk(chunk_start, | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1903, in process_chunk | |
| (EngineCore_DP5 pid=82746) return self.current_callable(inputs) | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) File "/root/.cache/vllm/torch_compile_cache/2256bad88c/rank_0_5/inductor_cache/xp/cxphdunaxc7bqs7uvqt4kyvfr6ektkeozuv5juzj55d3wvfhuhco.py", line 620, in call | |
| (EngineCore_DP5 pid=82746) buf5 = torch.ops.vllm.moe_forward_shared.default(buf3, buf4, 'model.layers.3.mlp.experts') | |
| (EngineCore_DP6 pid=82747) final_hidden_states = self.quant_method.apply( | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) ERROR 09-26 08:51:47 [core.py:708] EngineCore failed to start. | |
| (EngineCore_DP2 pid=82743) ERROR 09-26 08:51:47 [core.py:708] Traceback (most recent call last): | |
| (EngineCore_DP2 pid=82743) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 695, in run_engine_core | |
| (EngineCore_DP2 pid=82743) ERROR 09-26 08:51:47 [core.py:708] engine_core = DPEngineCoreProc(*args, **kwargs) | |
| (EngineCore_DP2 pid=82743) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 965, in __init__ | |
| (EngineCore_DP2 pid=82743) ERROR 09-26 08:51:47 [core.py:708] super().__init__(vllm_config, local_client, handshake_address, | |
| (EngineCore_DP2 pid=82743) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 498, in __init__ | |
| (EngineCore_DP2 pid=82743) ERROR 09-26 08:51:47 [core.py:708] super().__init__(vllm_config, executor_class, log_stats, | |
| (EngineCore_DP2 pid=82743) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 92, in __init__ | |
| (EngineCore_DP2 pid=82743) ERROR 09-26 08:51:47 [core.py:708] self._initialize_kv_caches(vllm_config) | |
| (EngineCore_DP2 pid=82743) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 207, in _initialize_kv_caches | |
| (EngineCore_DP2 pid=82743) ERROR 09-26 08:51:47 [core.py:708] self.model_executor.initialize_from_config(kv_cache_configs) | |
| (EngineCore_DP2 pid=82743) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 75, in initialize_from_config | |
| (EngineCore_DP2 pid=82743) ERROR 09-26 08:51:47 [core.py:708] self.collective_rpc("compile_or_warm_up_model") | |
| (EngineCore_DP2 pid=82743) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc | |
| (EngineCore_DP2 pid=82743) ERROR 09-26 08:51:47 [core.py:708] return [run_method(self.driver_worker, method, args, kwargs)] | |
| (EngineCore_DP2 pid=82743) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 3120, in run_method | |
| (EngineCore_DP2 pid=82743) ERROR 09-26 08:51:47 [core.py:708] return func(*args, **kwargs) | |
| (EngineCore_DP2 pid=82743) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 406, in compile_or_warm_up_model | |
| (EngineCore_DP2 pid=82743) ERROR 09-26 08:51:47 [core.py:708] self.model_runner._dummy_run( | |
| (EngineCore_DP2 pid=82743) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP2 pid=82743) ERROR 09-26 08:51:47 [core.py:708] return func(*args, **kwargs) | |
| (EngineCore_DP2 pid=82743) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 3132, in _dummy_run | |
| (EngineCore_DP2 pid=82743) ERROR 09-26 08:51:47 [core.py:708] outputs = self.model( | |
| (EngineCore_DP2 pid=82743) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 387, in __call__ | |
| (EngineCore_DP2 pid=82743) ERROR 09-26 08:51:47 [core.py:708] return self._run_ubatches(ubatch_metadata, self.model) | |
| (EngineCore_DP2 pid=82743) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 263, in _run_ubatches | |
| (EngineCore_DP2 pid=82743) ERROR 09-26 08:51:47 [core.py:708] result = torch.cat(sorted_results, dim=0) | |
| (EngineCore_DP2 pid=82743) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) ERROR 09-26 08:51:47 [core.py:708] RuntimeError: torch.cat(): expected a non-empty list of Tensors | |
| (EngineCore_DP6 pid=82747) ERROR 09-26 08:51:47 [core.py:708] EngineCore failed to start. | |
| (EngineCore_DP6 pid=82747) ERROR 09-26 08:51:47 [core.py:708] Traceback (most recent call last): | |
| (EngineCore_DP6 pid=82747) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 695, in run_engine_core | |
| (EngineCore_DP6 pid=82747) ERROR 09-26 08:51:47 [core.py:708] engine_core = DPEngineCoreProc(*args, **kwargs) | |
| (EngineCore_DP6 pid=82747) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 965, in __init__ | |
| (EngineCore_DP6 pid=82747) ERROR 09-26 08:51:47 [core.py:708] super().__init__(vllm_config, local_client, handshake_address, | |
| (EngineCore_DP6 pid=82747) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 498, in __init__ | |
| (EngineCore_DP6 pid=82747) ERROR 09-26 08:51:47 [core.py:708] super().__init__(vllm_config, executor_class, log_stats, | |
| (EngineCore_DP6 pid=82747) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 92, in __init__ | |
| (EngineCore_DP6 pid=82747) ERROR 09-26 08:51:47 [core.py:708] self._initialize_kv_caches(vllm_config) | |
| (EngineCore_DP6 pid=82747) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 207, in _initialize_kv_caches | |
| (EngineCore_DP6 pid=82747) ERROR 09-26 08:51:47 [core.py:708] self.model_executor.initialize_from_config(kv_cache_configs) | |
| (EngineCore_DP6 pid=82747) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 75, in initialize_from_config | |
| (EngineCore_DP6 pid=82747) ERROR 09-26 08:51:47 [core.py:708] self.collective_rpc("compile_or_warm_up_model") | |
| (EngineCore_DP6 pid=82747) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc | |
| (EngineCore_DP6 pid=82747) ERROR 09-26 08:51:47 [core.py:708] return [run_method(self.driver_worker, method, args, kwargs)] | |
| (EngineCore_DP6 pid=82747) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 3120, in run_method | |
| (EngineCore_DP6 pid=82747) ERROR 09-26 08:51:47 [core.py:708] return func(*args, **kwargs) | |
| (EngineCore_DP6 pid=82747) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 406, in compile_or_warm_up_model | |
| (EngineCore_DP6 pid=82747) ERROR 09-26 08:51:47 [core.py:708] self.model_runner._dummy_run( | |
| (EngineCore_DP6 pid=82747) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP6 pid=82747) ERROR 09-26 08:51:47 [core.py:708] return func(*args, **kwargs) | |
| (EngineCore_DP6 pid=82747) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 3132, in _dummy_run | |
| (EngineCore_DP6 pid=82747) ERROR 09-26 08:51:47 [core.py:708] outputs = self.model( | |
| (EngineCore_DP6 pid=82747) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 387, in __call__ | |
| (EngineCore_DP6 pid=82747) ERROR 09-26 08:51:47 [core.py:708] return self._run_ubatches(ubatch_metadata, self.model) | |
| (EngineCore_DP6 pid=82747) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 263, in _run_ubatches | |
| (EngineCore_DP6 pid=82747) ERROR 09-26 08:51:47 [core.py:708] result = torch.cat(sorted_results, dim=0) | |
| (EngineCore_DP6 pid=82747) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) ERROR 09-26 08:51:47 [core.py:708] RuntimeError: torch.cat(): expected a non-empty list of Tensors | |
| (EngineCore_DP5 pid=82746) ERROR 09-26 08:51:47 [core.py:708] EngineCore failed to start. | |
| (EngineCore_DP5 pid=82746) ERROR 09-26 08:51:47 [core.py:708] Traceback (most recent call last): | |
| (EngineCore_DP5 pid=82746) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 695, in run_engine_core | |
| (EngineCore_DP5 pid=82746) ERROR 09-26 08:51:47 [core.py:708] engine_core = DPEngineCoreProc(*args, **kwargs) | |
| (EngineCore_DP5 pid=82746) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 965, in __init__ | |
| (EngineCore_DP5 pid=82746) ERROR 09-26 08:51:47 [core.py:708] super().__init__(vllm_config, local_client, handshake_address, | |
| (EngineCore_DP5 pid=82746) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 498, in __init__ | |
| (EngineCore_DP5 pid=82746) ERROR 09-26 08:51:47 [core.py:708] super().__init__(vllm_config, executor_class, log_stats, | |
| (EngineCore_DP5 pid=82746) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 92, in __init__ | |
| (EngineCore_DP5 pid=82746) ERROR 09-26 08:51:47 [core.py:708] self._initialize_kv_caches(vllm_config) | |
| (EngineCore_DP5 pid=82746) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 207, in _initialize_kv_caches | |
| (EngineCore_DP5 pid=82746) ERROR 09-26 08:51:47 [core.py:708] self.model_executor.initialize_from_config(kv_cache_configs) | |
| (EngineCore_DP5 pid=82746) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 75, in initialize_from_config | |
| (EngineCore_DP5 pid=82746) ERROR 09-26 08:51:47 [core.py:708] self.collective_rpc("compile_or_warm_up_model") | |
| (EngineCore_DP5 pid=82746) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc | |
| (EngineCore_DP5 pid=82746) ERROR 09-26 08:51:47 [core.py:708] return [run_method(self.driver_worker, method, args, kwargs)] | |
| (EngineCore_DP5 pid=82746) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 3120, in run_method | |
| (EngineCore_DP5 pid=82746) ERROR 09-26 08:51:47 [core.py:708] return func(*args, **kwargs) | |
| (EngineCore_DP5 pid=82746) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 406, in compile_or_warm_up_model | |
| (EngineCore_DP5 pid=82746) ERROR 09-26 08:51:47 [core.py:708] self.model_runner._dummy_run( | |
| (EngineCore_DP5 pid=82746) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP5 pid=82746) ERROR 09-26 08:51:47 [core.py:708] return func(*args, **kwargs) | |
| (EngineCore_DP5 pid=82746) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 3132, in _dummy_run | |
| (EngineCore_DP5 pid=82746) ERROR 09-26 08:51:47 [core.py:708] outputs = self.model( | |
| (EngineCore_DP5 pid=82746) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 387, in __call__ | |
| (EngineCore_DP5 pid=82746) ERROR 09-26 08:51:47 [core.py:708] return self._run_ubatches(ubatch_metadata, self.model) | |
| (EngineCore_DP5 pid=82746) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 263, in _run_ubatches | |
| (EngineCore_DP5 pid=82746) ERROR 09-26 08:51:47 [core.py:708] result = torch.cat(sorted_results, dim=0) | |
| (EngineCore_DP5 pid=82746) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) ERROR 09-26 08:51:47 [core.py:708] RuntimeError: torch.cat(): expected a non-empty list of Tensors | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 829, in __call__ | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/fp8.py", line 1036, in apply | |
| (EngineCore_DP2 pid=82743) Traceback (most recent call last): | |
| (EngineCore_DP5 pid=82746) return self._op(*args, **kwargs) | |
| (EngineCore_DP6 pid=82747) result = self.fused_experts( | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 2163, in moe_forward_shared | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP2 pid=82743) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap | |
| (EngineCore_DP2 pid=82743) self.run() | |
| (EngineCore_DP2 pid=82743) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run | |
| (EngineCore_DP2 pid=82743) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 712, in run_engine_core | |
| (EngineCore_DP2 pid=82743) raise e | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 695, in run_engine_core | |
| (EngineCore_DP2 pid=82743) engine_core = DPEngineCoreProc(*args, **kwargs) | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 965, in __init__ | |
| (EngineCore_DP2 pid=82743) super().__init__(vllm_config, local_client, handshake_address, | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 498, in __init__ | |
| (EngineCore_DP2 pid=82743) super().__init__(vllm_config, executor_class, log_stats, | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 92, in __init__ | |
| (EngineCore_DP2 pid=82743) self._initialize_kv_caches(vllm_config) | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 207, in _initialize_kv_caches | |
| (EngineCore_DP2 pid=82743) self.model_executor.initialize_from_config(kv_cache_configs) | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 75, in initialize_from_config | |
| (EngineCore_DP2 pid=82743) self.collective_rpc("compile_or_warm_up_model") | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc | |
| (EngineCore_DP2 pid=82743) return [run_method(self.driver_worker, method, args, kwargs)] | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 3120, in run_method | |
| (EngineCore_DP6 pid=82747) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP2 pid=82743) return func(*args, **kwargs) | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 406, in compile_or_warm_up_model | |
| (EngineCore_DP2 pid=82743) self.model_runner._dummy_run( | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP2 pid=82743) return func(*args, **kwargs) | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 3132, in _dummy_run | |
| (EngineCore_DP2 pid=82743) outputs = self.model( | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 387, in __call__ | |
| (EngineCore_DP2 pid=82743) return self._run_ubatches(ubatch_metadata, self.model) | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) return self.forward_impl(hidden_states, router_logits) | |
| (EngineCore_DP2 pid=82743) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 263, in _run_ubatches | |
| (EngineCore_DP2 pid=82743) result = torch.cat(sorted_results, dim=0) | |
| (EngineCore_DP2 pid=82743) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP2 pid=82743) RuntimeError: torch.cat(): expected a non-empty list of Tensors | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1998, in forward_impl | |
| (EngineCore_DP6 pid=82747) return forward_call(*args, **kwargs) | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1027, in forward | |
| (EngineCore_DP5 pid=82746) return self.forward_impl_chunked(hidden_states, router_logits) | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1971, in forward_impl_chunked | |
| (EngineCore_DP6 pid=82747) dbo_register_recv_hook(hook) | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/ubatching.py", line 184, in dbo_register_recv_hook | |
| (EngineCore_DP6 pid=82747) next_ctx.recv_hook = recv_hook | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) AttributeError: 'NoneType' object has no attribute 'recv_hook' | |
| (EngineCore_DP5 pid=82746) process_chunk(chunk_start, | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1903, in process_chunk | |
| (EngineCore_DP5 pid=82746) final_hidden_states = self.quant_method.apply( | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/fp8.py", line 1036, in apply | |
| (EngineCore_DP5 pid=82746) result = self.fused_experts( | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP5 pid=82746) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP5 pid=82746) return forward_call(*args, **kwargs) | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1027, in forward | |
| (EngineCore_DP5 pid=82746) dbo_register_recv_hook(hook) | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/ubatching.py", line 184, in dbo_register_recv_hook | |
| (EngineCore_DP5 pid=82746) next_ctx.recv_hook = recv_hook | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) AttributeError: 'NoneType' object has no attribute 'recv_hook' | |
| (EngineCore_DP6 pid=82747) Process EngineCore_DP6: | |
| (EngineCore_DP6 pid=82747) Traceback (most recent call last): | |
| (EngineCore_DP5 pid=82746) Process EngineCore_DP5: | |
| (EngineCore_DP6 pid=82747) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap | |
| (EngineCore_DP6 pid=82747) self.run() | |
| (EngineCore_DP6 pid=82747) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run | |
| (EngineCore_DP6 pid=82747) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 712, in run_engine_core | |
| (EngineCore_DP6 pid=82747) raise e | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 695, in run_engine_core | |
| (EngineCore_DP6 pid=82747) engine_core = DPEngineCoreProc(*args, **kwargs) | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 965, in __init__ | |
| (EngineCore_DP6 pid=82747) super().__init__(vllm_config, local_client, handshake_address, | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 498, in __init__ | |
| (EngineCore_DP6 pid=82747) super().__init__(vllm_config, executor_class, log_stats, | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 92, in __init__ | |
| (EngineCore_DP6 pid=82747) self._initialize_kv_caches(vllm_config) | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 207, in _initialize_kv_caches | |
| (EngineCore_DP6 pid=82747) self.model_executor.initialize_from_config(kv_cache_configs) | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 75, in initialize_from_config | |
| (EngineCore_DP6 pid=82747) self.collective_rpc("compile_or_warm_up_model") | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc | |
| (EngineCore_DP6 pid=82747) return [run_method(self.driver_worker, method, args, kwargs)] | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 3120, in run_method | |
| (EngineCore_DP6 pid=82747) return func(*args, **kwargs) | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 406, in compile_or_warm_up_model | |
| (EngineCore_DP6 pid=82747) self.model_runner._dummy_run( | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP6 pid=82747) return func(*args, **kwargs) | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 3132, in _dummy_run | |
| (EngineCore_DP6 pid=82747) outputs = self.model( | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 387, in __call__ | |
| (EngineCore_DP6 pid=82747) return self._run_ubatches(ubatch_metadata, self.model) | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 263, in _run_ubatches | |
| (EngineCore_DP6 pid=82747) result = torch.cat(sorted_results, dim=0) | |
| (EngineCore_DP6 pid=82747) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP6 pid=82747) RuntimeError: torch.cat(): expected a non-empty list of Tensors | |
| (EngineCore_DP5 pid=82746) Traceback (most recent call last): | |
| (EngineCore_DP5 pid=82746) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap | |
| (EngineCore_DP5 pid=82746) self.run() | |
| (EngineCore_DP5 pid=82746) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run | |
| (EngineCore_DP5 pid=82746) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 712, in run_engine_core | |
| (EngineCore_DP5 pid=82746) raise e | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 695, in run_engine_core | |
| (EngineCore_DP5 pid=82746) engine_core = DPEngineCoreProc(*args, **kwargs) | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 965, in __init__ | |
| (EngineCore_DP5 pid=82746) super().__init__(vllm_config, local_client, handshake_address, | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 498, in __init__ | |
| (EngineCore_DP5 pid=82746) super().__init__(vllm_config, executor_class, log_stats, | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 92, in __init__ | |
| (EngineCore_DP5 pid=82746) self._initialize_kv_caches(vllm_config) | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 207, in _initialize_kv_caches | |
| (EngineCore_DP5 pid=82746) self.model_executor.initialize_from_config(kv_cache_configs) | |
| (EngineCore_DP8 pid=80453) File "/usr/lib/python3.12/threading.py", line 1075, in _bootstrap_inner | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 234, in _ubatch_thread | |
| (EngineCore_DP13 pid=80458) return self._op(*args, **kwargs) | |
| (EngineCore_DP10 pid=80455) self.run() | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 375, in __call__ | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/vllm/attention/layer.py", line 611, in unified_attention_with_output | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 75, in initialize_from_config | |
| (EngineCore_DP9 pid=80454) return forward_call(*args, **kwargs) | |
| (EngineCore_DP10 pid=80455) File "/usr/lib/python3.12/threading.py", line 1012, in run | |
| (EngineCore_DP5 pid=82746) self.collective_rpc("compile_or_warm_up_model") | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc | |
| (EngineCore_DP5 pid=82746) return [run_method(self.driver_worker, method, args, kwargs)] | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 3120, in run_method | |
| (EngineCore_DP5 pid=82746) return func(*args, **kwargs) | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 406, in compile_or_warm_up_model | |
| (EngineCore_DP5 pid=82746) self.model_runner._dummy_run( | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP5 pid=82746) return func(*args, **kwargs) | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) self.run() | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) File "/usr/lib/python3.12/threading.py", line 1012, in run | |
| (EngineCore_DP12 pid=80457) return super().__call__(*args, **kwargs) | |
| (EngineCore_DP14 pid=80459) return super().__call__(*args, **kwargs) | |
| (EngineCore_DP13 pid=80458) self.impl.forward(self, | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/attention/backends/mla/common.py", line 1537, in forward | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP15 pid=80460) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP8 pid=80453) return func(*args, **kwargs) | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 234, in _ubatch_thread | |
| (EngineCore_DP8 pid=80453) model_output = model( | |
| return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP8 pid=80453) ^^^^^^ | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 317, in __call__ | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP9 pid=80454) File "<eval_with_key>.5", line 5, in forward | |
| (EngineCore_DP13 pid=80458) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP8 pid=80453) model_output = self.forward(*args, **kwargs) | |
| (EngineCore_DP12 pid=80457) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 764, in forward | |
| (EngineCore_DP11 pid=80456) model_output = model( | |
| (EngineCore_DP11 pid=80456) ^^^^^^ | |
| (EngineCore_DP8 pid=80453) def forward( | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 317, in __call__ | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 375, in __call__ | |
| (EngineCore_DP11 pid=80456) model_output = self.forward(*args, **kwargs) | |
| (EngineCore_DP8 pid=80453) return super().__call__(*args, **kwargs) | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 764, in forward | |
| (EngineCore_DP14 pid=80459) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP12 pid=80457) return forward_call(*args, **kwargs) | |
| (EngineCore_DP10 pid=80455) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP9 pid=80454) return forward_call(*args, **kwargs) | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 3132, in _dummy_run | |
| (EngineCore_DP5 pid=82746) outputs = self.model( | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 387, in __call__ | |
| (EngineCore_DP5 pid=82746) return self._run_ubatches(ubatch_metadata, self.model) | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 263, in _run_ubatches | |
| (EngineCore_DP5 pid=82746) result = torch.cat(sorted_results, dim=0) | |
| (EngineCore_DP5 pid=82746) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP5 pid=82746) RuntimeError: torch.cat(): expected a non-empty list of Tensors | |
| (EngineCore_DP13 pid=80458) _ = torch.empty( | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP4 pid=82745) Exception in thread Thread-253 (_ubatch_thread): | |
| (EngineCore_DP4 pid=82745) Traceback (most recent call last): | |
| (EngineCore_DP4 pid=82745) File "/usr/lib/python3.12/threading.py", line 1075, in _bootstrap_inner | |
| (EngineCore_DP4 pid=82745) self.run() | |
| (EngineCore_DP4 pid=82745) File "/usr/lib/python3.12/threading.py", line 1012, in run | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 929, in _fn | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP13 pid=80458) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 8.00 GiB. GPU 0 has a total capacity of 178.36 GiB of which 2.10 GiB is free. Including non-PyTorch memory, this process has 176.24 GiB memory in use. Of the allocated memory 145.20 GiB is allocated by PyTorch, with 2.14 GiB allocated in private pools (e.g., CUDA Graphs), and 8.50 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables) | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 929, in _fn | |
| (EngineCore_DP3 pid=82744) model_output = model( | |
| (EngineCore_DP3 pid=82744) ^^^^^^ | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 317, in __call__ | |
| (EngineCore_DP4 pid=82745) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP4 pid=82745) return func(*args, **kwargs) | |
| (EngineCore_DP3 pid=82744) model_output = self.forward(*args, **kwargs) | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 234, in _ubatch_thread | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 764, in forward | |
| (EngineCore_DP7 pid=82748) Exception in thread Thread-253 (_ubatch_thread): | |
| (EngineCore_DP3 pid=82744) def forward( | |
| (EngineCore_DP7 pid=82748) Traceback (most recent call last): | |
| (EngineCore_DP7 pid=82748) File "/usr/lib/python3.12/threading.py", line 1075, in _bootstrap_inner | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 375, in __call__ | |
| (EngineCore_DP3 pid=82744) return super().__call__(*args, **kwargs) | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP3 pid=82744) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP7 pid=82748) self.run() | |
| (EngineCore_DP7 pid=82748) File "/usr/lib/python3.12/threading.py", line 1012, in run | |
| (EngineCore_DP3 pid=82744) Exception in thread Thread-252 (_ubatch_thread): | |
| (EngineCore_DP3 pid=82744) Traceback (most recent call last): | |
| (EngineCore_DP3 pid=82744) File "/usr/lib/python3.12/threading.py", line 1075, in _bootstrap_inner | |
| (EngineCore_DP7 pid=82748) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP7 pid=82748) return func(*args, **kwargs) | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) self.run() | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 234, in _ubatch_thread | |
| (EngineCore_DP3 pid=82744) File "/usr/lib/python3.12/threading.py", line 1012, in run | |
| (EngineCore_DP3 pid=82744) return forward_call(*args, **kwargs) | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 929, in _fn | |
| (EngineCore_DP3 pid=82744) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP3 pid=82744) return fn(*args, **kwargs) | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP3 pid=82744) return func(*args, **kwargs) | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 234, in _ubatch_thread | |
| (EngineCore_DP3 pid=82744) return self._wrapped_call(self, *args, **kwargs) | |
| model_output = model( | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) ^^^^^^ | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 317, in __call__ | |
| (EngineCore_DP3 pid=82744) raise e | |
| model_output = self.forward(*args, **kwargs) | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 764, in forward | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP3 pid=82744) def forward( | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 375, in __call__ | |
| (EngineCore_DP3 pid=82744) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP3 pid=82744) return super().__call__(*args, **kwargs) | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP3 pid=82744) return forward_call(*args, **kwargs) | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) File "<eval_with_key>.127", line 696, in forward | |
| (EngineCore_DP3 pid=82744) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP7 pid=82748) model_output = model( | |
| (EngineCore_DP7 pid=82748) ^^^^^^ | |
| (EngineCore_DP4 pid=82745) model_output = model( | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 317, in __call__ | |
| (EngineCore_DP4 pid=82745) ^^^^^^ | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 317, in __call__ | |
| (EngineCore_DP3 pid=82744) return forward_call(*args, **kwargs) | |
| (EngineCore_DP4 pid=82745) model_output = self.forward(*args, **kwargs) | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1243, in __call__ | |
| (EngineCore_DP7 pid=82748) model_output = self.forward(*args, **kwargs) | |
| File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 764, in forward | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) return forward_call(*args, **kwargs) | |
| (EngineCore_DP10 pid=80455) return func(*args, **kwargs) | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 929, in _fn | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 764, in forward | |
| (EngineCore_DP3 pid=82744) return self._wrapped_call(self, *args, **kwargs) | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 929, in _fn | |
| (EngineCore_DP4 pid=82745) def forward( | |
| (EngineCore_DP7 pid=82748) def forward( | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 375, in __call__ | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 375, in __call__ | |
| (EngineCore_DP8 pid=80453) return self._call_impl(*args, **kwargs) | |
| return fn(*args, **kwargs) | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) return super().__call__(*args, **kwargs) | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP11 pid=80456) def forward( | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 375, in __call__ | |
| (EngineCore_DP9 pid=80454) return fn(*args, **kwargs) | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP15 pid=80460) return fn(*args, **kwargs) | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP4 pid=82745) return super().__call__(*args, **kwargs) | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP3 pid=82744) raise e | |
| (EngineCore_DP13 pid=80458) return forward_call(*args, **kwargs) | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP1 pid=82742) Exception in thread Thread-253 (_ubatch_thread): | |
| (EngineCore_DP1 pid=82742) Traceback (most recent call last): | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP1 pid=82742) File "/usr/lib/python3.12/threading.py", line 1075, in _bootstrap_inner | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 929, in _fn | |
| (EngineCore_DP8 pid=80453) return forward_call(*args, **kwargs) | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 929, in _fn | |
| (EngineCore_DP12 pid=80457) Exception in thread Thread-252 (_ubatch_thread): | |
| (EngineCore_DP12 pid=80457) Traceback (most recent call last): | |
| (EngineCore_DP12 pid=80457) File "/usr/lib/python3.12/threading.py", line 1075, in _bootstrap_inner | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 234, in _ubatch_thread | |
| (EngineCore_DP9 pid=80454) return self._op(*args, **kwargs) | |
| (EngineCore_DP8 pid=80453) return fn(*args, **kwargs) | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP14 pid=80459) return forward_call(*args, **kwargs) | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 929, in _fn | |
| (EngineCore_DP13 pid=80458) return fn(*args, **kwargs) | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP8 pid=80453) return self._wrapped_call(self, *args, **kwargs) | |
| (EngineCore_DP12 pid=80457) return fn(*args, **kwargs) | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| return self._wrapped_call(self, *args, **kwargs) | |
| (EngineCore_DP8 pid=80453) raise e | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) return self._wrapped_call(self, *args, **kwargs) | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/vllm/attention/layer.py", line 611, in unified_attention_with_output | |
| (EngineCore_DP14 pid=80459) return fn(*args, **kwargs) | |
| (EngineCore_DP15 pid=80460) return self._wrapped_call(self, *args, **kwargs) | |
| (EngineCore_DP11 pid=80456) return super().__call__(*args, **kwargs) | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP13 pid=80458) raise e | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP12 pid=80457) self.run() | |
| (EngineCore_DP12 pid=80457) File "/usr/lib/python3.12/threading.py", line 1012, in run | |
| (EngineCore_DP13 pid=80458) return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP9 pid=80454) self.impl.forward(self, | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/attention/backends/mla/common.py", line 1537, in forward | |
| (EngineCore_DP14 pid=80459) return self._wrapped_call(self, *args, **kwargs) | |
| (EngineCore_DP8 pid=80453) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP15 pid=80460) Exception in thread Thread-252 (_ubatch_thread): | |
| (EngineCore_DP9 pid=80454) raise e | |
| (EngineCore_DP15 pid=80460) Traceback (most recent call last): | |
| (EngineCore_DP15 pid=80460) File "/usr/lib/python3.12/threading.py", line 1075, in _bootstrap_inner | |
| (EngineCore_DP12 pid=80457) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP10 pid=80455) model_output = model( | |
| (EngineCore_DP10 pid=80455) ^^^^^^ | |
| (EngineCore_DP11 pid=80456) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP15 pid=80460) raise e | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP13 pid=80458) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 317, in __call__ | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP8 pid=80453) return forward_call(*args, **kwargs) | |
| (EngineCore_DP12 pid=80457) return self._wrapped_call(self, *args, **kwargs) | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) File "<eval_with_key>.127", line 718, in forward | |
| (EngineCore_DP10 pid=80455) model_output = self.forward(*args, **kwargs) | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 121, in __call__ | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) self.run() | |
| (EngineCore_DP8 pid=80453) return self.runnable(*args, **kwargs) | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 764, in forward | |
| (EngineCore_DP12 pid=80457) raise e | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_piecewise_backend.py", line 96, in __call__ | |
| (EngineCore_DP15 pid=80460) File "/usr/lib/python3.12/threading.py", line 1012, in run | |
| (EngineCore_DP13 pid=80458) return forward_call(*args, **kwargs) | |
| (EngineCore_DP8 pid=80453) return self.compiled_graph_for_general_shape(*args) | |
| _ = torch.empty( | |
| (EngineCore_DP12 pid=80457) return func(*args, **kwargs) | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/compiler_interface.py", line 518, in compiled_graph | |
| (EngineCore_DP13 pid=80458) File "<eval_with_key>.127", line 718, in forward | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP14 pid=80459) raise e | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 234, in _ubatch_thread | |
| (EngineCore_DP10 pid=80455) def forward( | |
| (EngineCore_DP9 pid=80454) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 8.00 GiB. GPU 0 has a total capacity of 178.36 GiB of which 2.11 GiB is free. Including non-PyTorch memory, this process has 176.22 GiB memory in use. Of the allocated memory 145.20 GiB is allocated by PyTorch, with 2.14 GiB allocated in private pools (e.g., CUDA Graphs), and 8.49 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables) | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP8 pid=80453) graph_output = inductor_compiled_graph(list_args) | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 121, in __call__ | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 375, in __call__ | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 584, in __call__ | |
| (EngineCore_DP13 pid=80458) return self.runnable(*args, **kwargs) | |
| (EngineCore_DP12 pid=80457) return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_piecewise_backend.py", line 96, in __call__ | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP9 pid=80454) return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP13 pid=80458) return self.compiled_graph_for_general_shape(*args) | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) return self.current_callable(inputs) | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/compiler_interface.py", line 518, in compiled_graph | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) File "/root/.cache/vllm/torch_compile_cache/2256bad88c/rank_0_8/inductor_cache/yf/cyfhx55dn4jky4v4iy6ontf4vx3bs6f4hq7pcgy64uytl3cvcjdb.py", line 620, in call | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP12 pid=80457) model_output = model( | |
| (EngineCore_DP12 pid=80457) ^^^^^^ | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 317, in __call__ | |
| (EngineCore_DP10 pid=80455) return super().__call__(*args, **kwargs) | |
| (EngineCore_DP13 pid=80458) graph_output = inductor_compiled_graph(list_args) | |
| (EngineCore_DP15 pid=80460) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) buf5 = torch.ops.vllm.moe_forward_shared.default(buf3, buf4, 'model.layers.3.mlp.experts') | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 584, in __call__ | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP11 pid=80456) return forward_call(*args, **kwargs) | |
| (EngineCore_DP14 pid=80459) Exception in thread Thread-252 (_ubatch_thread): | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 829, in __call__ | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 929, in _fn | |
| (EngineCore_DP13 pid=80458) return self.current_callable(inputs) | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP13 pid=80458) File "/root/.cache/vllm/torch_compile_cache/2256bad88c/rank_0_13/inductor_cache/66/c66lq3ueoq356oy4abzniyhekxbcgt6tpupnwwkwbvfmhj6aupia.py", line 620, in call | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP8 pid=80453) return self._op(*args, **kwargs) | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP12 pid=80457) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 2163, in moe_forward_shared | |
| (EngineCore_DP11 pid=80456) return fn(*args, **kwargs) | |
| (EngineCore_DP13 pid=80458) buf5 = torch.ops.vllm.moe_forward_shared.default(buf3, buf4, 'model.layers.3.mlp.experts') | |
| (EngineCore_DP10 pid=80455) return self._call_impl(*args, **kwargs) | |
| model_output = self.forward(*args, **kwargs) | |
| (EngineCore_DP14 pid=80459) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 829, in __call__ | |
| (EngineCore_DP14 pid=80459) Traceback (most recent call last): | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) return forward_call(*args, **kwargs) | |
| (EngineCore_DP15 pid=80460) return forward_call(*args, **kwargs) | |
| (EngineCore_DP14 pid=80459) File "/usr/lib/python3.12/threading.py", line 1075, in _bootstrap_inner | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP8 pid=80453) return self.forward_impl(hidden_states, router_logits) | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 764, in forward | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP13 pid=80458) return self._op(*args, **kwargs) | |
| Exception in thread Thread-252 (_ubatch_thread): | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1998, in forward_impl | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 2163, in moe_forward_shared | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) Traceback (most recent call last): | |
| (EngineCore_DP15 pid=80460) File "<eval_with_key>.127", line 696, in forward | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP9 pid=80454) File "<eval_with_key>.127", line 718, in forward | |
| (EngineCore_DP15 pid=80460) return func(*args, **kwargs) | |
| (EngineCore_DP12 pid=80457) def forward( | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 121, in __call__ | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 234, in _ubatch_thread | |
| (EngineCore_DP14 pid=80459) self.run() | |
| (EngineCore_DP9 pid=80454) return self.runnable(*args, **kwargs) | |
| (EngineCore_DP14 pid=80459) File "/usr/lib/python3.12/threading.py", line 1012, in run | |
| (EngineCore_DP8 pid=80453) return self.forward_impl_chunked(hidden_states, router_logits) | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) File "/usr/lib/python3.12/threading.py", line 1075, in _bootstrap_inner | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_piecewise_backend.py", line 96, in __call__ | |
| (EngineCore_DP11 pid=80456) return self._wrapped_call(self, *args, **kwargs) | |
| (EngineCore_DP15 pid=80460) model_output = model( | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1971, in forward_impl_chunked | |
| (EngineCore_DP15 pid=80460) ^^^^^^ | |
| (EngineCore_DP9 pid=80454) return self.compiled_graph_for_general_shape(*args) | |
| (EngineCore_DP10 pid=80455) return forward_call(*args, **kwargs) | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 317, in __call__ | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/compiler_interface.py", line 518, in compiled_graph | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP13 pid=80458) return self.forward_impl(hidden_states, router_logits) | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 929, in _fn | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1998, in forward_impl | |
| (EngineCore_DP15 pid=80460) model_output = self.forward(*args, **kwargs) | |
| (EngineCore_DP9 pid=80454) graph_output = inductor_compiled_graph(list_args) | |
| return forward_call(*args, **kwargs) | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 584, in __call__ | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 375, in __call__ | |
| (EngineCore_DP8 pid=80453) process_chunk(chunk_start, | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1903, in process_chunk | |
| (EngineCore_DP14 pid=80459) return forward_call(*args, **kwargs) | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 764, in forward | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) File "<eval_with_key>.127", line 696, in forward | |
| (EngineCore_DP11 pid=80456) self.run() | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP3 pid=82744) return self._wrapped_call(self, *args, **kwargs) | |
| (EngineCore_DP14 pid=80459) File "<eval_with_key>.127", line 696, in forward | |
| (EngineCore_DP11 pid=80456) File "/usr/lib/python3.12/threading.py", line 1012, in run | |
| (EngineCore_DP9 pid=80454) return self.current_callable(inputs) | |
| (EngineCore_DP10 pid=80455) return fn(*args, **kwargs) | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) File "/root/.cache/vllm/torch_compile_cache/2256bad88c/rank_0_9/inductor_cache/vx/cvxnnlhpaoc5bn2pc4pnxyijhhriljqjrszhf3y5mduog5t66joz.py", line 620, in call | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP13 pid=80458) return self.forward_impl_chunked(hidden_states, router_logits) | |
| (EngineCore_DP4 pid=82745) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP3 pid=82744) return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP3 pid=82744) raise e | |
| (EngineCore_DP1 pid=82742) self.run() | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) File "/usr/lib/python3.12/threading.py", line 1012, in run | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP3 pid=82744) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP7 pid=82748) return forward_call(*args, **kwargs) | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 929, in _fn | |
| return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP1 pid=82742) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP7 pid=82748) return fn(*args, **kwargs) | |
| (EngineCore_DP4 pid=82745) return forward_call(*args, **kwargs) | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 929, in _fn | |
| (EngineCore_DP3 pid=82744) return forward_call(*args, **kwargs) | |
| (EngineCore_DP4 pid=82745) return fn(*args, **kwargs) | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP4 pid=82745) return self._wrapped_call(self, *args, **kwargs) | |
| (EngineCore_DP1 pid=82742) return func(*args, **kwargs) | |
| return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP7 pid=82748) return self._wrapped_call(self, *args, **kwargs) | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) File "<eval_with_key>.5", line 5, in forward | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP4 pid=82745) raise e | |
| (EngineCore_DP7 pid=82748) Exception in thread Thread-252 (_ubatch_thread): | |
| (EngineCore_DP3 pid=82744) return forward_call(*args, **kwargs) | |
| (EngineCore_DP7 pid=82748) Traceback (most recent call last): | |
| (EngineCore_DP7 pid=82748) File "/usr/lib/python3.12/threading.py", line 1075, in _bootstrap_inner | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) File "<eval_with_key>.127", line 718, in forward | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1243, in __call__ | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP7 pid=82748) raise e | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 234, in _ubatch_thread | |
| (EngineCore_DP4 pid=82745) return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP1 pid=82742) model_output = model( | |
| (EngineCore_DP1 pid=82742) ^^^^^^ | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 317, in __call__ | |
| (EngineCore_DP7 pid=82748) self.run() | |
| (EngineCore_DP3 pid=82744) return self._op(*args, **kwargs) | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/vllm/attention/layer.py", line 611, in unified_attention_with_output | |
| (EngineCore_DP1 pid=82742) model_output = self.forward(*args, **kwargs) | |
| Exception in thread Thread-252 (_ubatch_thread): | |
| return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) Traceback (most recent call last): | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 764, in forward | |
| (EngineCore_DP7 pid=82748) File "/usr/lib/python3.12/threading.py", line 1012, in run | |
| (EngineCore_DP3 pid=82744) self.impl.forward(self, | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/attention/backends/mla/common.py", line 1537, in forward | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 121, in __call__ | |
| (EngineCore_DP4 pid=82745) File "/usr/lib/python3.12/threading.py", line 1075, in _bootstrap_inner | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP1 pid=82742) def forward( | |
| (EngineCore_DP7 pid=82748) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 375, in __call__ | |
| (EngineCore_DP4 pid=82745) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP3 pid=82744) _ = torch.empty( | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP3 pid=82744) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 8.00 GiB. GPU 0 has a total capacity of 178.36 GiB of which 2.10 GiB is free. Including non-PyTorch memory, this process has 176.24 GiB memory in use. Of the allocated memory 145.20 GiB is allocated by PyTorch, with 2.14 GiB allocated in private pools (e.g., CUDA Graphs), and 8.50 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables) | |
| (EngineCore_DP1 pid=82742) return super().__call__(*args, **kwargs) | |
| (EngineCore_DP7 pid=82748) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) return self.runnable(*args, **kwargs) | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_piecewise_backend.py", line 96, in __call__ | |
| return func(*args, **kwargs) | |
| (EngineCore_DP3 pid=82744) return self.compiled_graph_for_general_shape(*args) | |
| (EngineCore_DP4 pid=82745) self.run() | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/compiler_interface.py", line 518, in compiled_graph | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 234, in _ubatch_thread | |
| (EngineCore_DP3 pid=82744) graph_output = inductor_compiled_graph(list_args) | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 584, in __call__ | |
| return forward_call(*args, **kwargs) | |
| (EngineCore_DP3 pid=82744) return self.current_callable(inputs) | |
| (EngineCore_DP4 pid=82745) File "/usr/lib/python3.12/threading.py", line 1012, in run | |
| (EngineCore_DP7 pid=82748) return forward_call(*args, **kwargs) | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) File "/root/.cache/vllm/torch_compile_cache/2256bad88c/rank_0_3/inductor_cache/2b/c2bbqbchd5dnq5st3dfruy4bsquispr76wkjkfpu3aevmgxjjrer.py", line 620, in call | |
| (EngineCore_DP4 pid=82745) File "<eval_with_key>.127", line 696, in forward | |
| model_output = model( | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) ^^^^^^ | |
| (EngineCore_DP7 pid=82748) File "<eval_with_key>.127", line 696, in forward | |
| (EngineCore_DP3 pid=82744) buf5 = torch.ops.vllm.moe_forward_shared.default(buf3, buf4, 'model.layers.3.mlp.experts') | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 317, in __call__ | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 829, in __call__ | |
| (EngineCore_DP1 pid=82742) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP7 pid=82748) model_output = self.forward(*args, **kwargs) | |
| (EngineCore_DP3 pid=82744) return self._op(*args, **kwargs) | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 2163, in moe_forward_shared | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) return func(*args, **kwargs) | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 764, in forward | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 234, in _ubatch_thread | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP7 pid=82748) def forward( | |
| (EngineCore_DP4 pid=82745) model_output = model( | |
| (EngineCore_DP4 pid=82745) ^^^^^^ | |
| (EngineCore_DP1 pid=82742) return forward_call(*args, **kwargs) | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 317, in __call__ | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 375, in __call__ | |
| (EngineCore_DP3 pid=82744) return self.forward_impl(hidden_states, router_logits) | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1998, in forward_impl | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 929, in _fn | |
| (EngineCore_DP7 pid=82748) return super().__call__(*args, **kwargs) | |
| (EngineCore_DP4 pid=82745) return self._wrapped_call(self, *args, **kwargs) | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP3 pid=82744) return self.forward_impl_chunked(hidden_states, router_logits) | |
| (EngineCore_DP1 pid=82742) return fn(*args, **kwargs) | |
| (EngineCore_DP4 pid=82745) model_output = self.forward(*args, **kwargs) | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP7 pid=82748) return self._wrapped_call(self, *args, **kwargs) | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1971, in forward_impl_chunked | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 764, in forward | |
| (EngineCore_DP4 pid=82745) raise e | |
| (EngineCore_DP1 pid=82742) return self._wrapped_call(self, *args, **kwargs) | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP3 pid=82744) process_chunk(chunk_start, | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1903, in process_chunk | |
| (EngineCore_DP7 pid=82748) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP4 pid=82745) def forward( | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 375, in __call__ | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP3 pid=82744) final_hidden_states = self.quant_method.apply( | |
| (EngineCore_DP4 pid=82745) return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/fp8.py", line 1036, in apply | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP7 pid=82748) raise e | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP3 pid=82744) result = self.fused_experts( | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP4 pid=82745) return super().__call__(*args, **kwargs) | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP1 pid=82742) raise e | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP7 pid=82748) return forward_call(*args, **kwargs) | |
| (EngineCore_DP3 pid=82744) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) Exception in thread Thread-252 (_ubatch_thread): | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP1 pid=82742) Traceback (most recent call last): | |
| (EngineCore_DP1 pid=82742) File "/usr/lib/python3.12/threading.py", line 1075, in _bootstrap_inner | |
| return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP4 pid=82745) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 929, in _fn | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP3 pid=82744) return forward_call(*args, **kwargs) | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1027, in forward | |
| (EngineCore_DP7 pid=82748) return fn(*args, **kwargs) | |
| (EngineCore_DP3 pid=82744) dbo_register_recv_hook(hook) | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/ubatching.py", line 184, in dbo_register_recv_hook | |
| (EngineCore_DP3 pid=82744) next_ctx.recv_hook = recv_hook | |
| return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) AttributeError: 'NoneType' object has no attribute 'recv_hook' | |
| (EngineCore_DP1 pid=82742) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP7 pid=82748) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP1 pid=82742) self.run() | |
| (EngineCore_DP1 pid=82742) File "/usr/lib/python3.12/threading.py", line 1012, in run | |
| (EngineCore_DP4 pid=82745) return forward_call(*args, **kwargs) | |
| (EngineCore_DP7 pid=82748) return self._wrapped_call(self, *args, **kwargs) | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) File "<eval_with_key>.5", line 5, in forward | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP1 pid=82742) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP7 pid=82748) return forward_call(*args, **kwargs) | |
| (EngineCore_DP4 pid=82745) return forward_call(*args, **kwargs) | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) return forward_call(*args, **kwargs) | |
| (EngineCore_DP7 pid=82748) File "<eval_with_key>.5", line 5, in forward | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 929, in _fn | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) File "<eval_with_key>.127", line 696, in forward | |
| (EngineCore_DP1 pid=82742) return func(*args, **kwargs) | |
| (EngineCore_DP7 pid=82748) raise e | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 234, in _ubatch_thread | |
| (EngineCore_DP4 pid=82745) return fn(*args, **kwargs) | |
| (EngineCore_DP7 pid=82748) return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1243, in __call__ | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP1 pid=82742) model_output = model( | |
| (EngineCore_DP1 pid=82742) ^^^^^^ | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 317, in __call__ | |
| (EngineCore_DP4 pid=82745) return self._wrapped_call(self, *args, **kwargs) | |
| (EngineCore_DP1 pid=82742) model_output = self.forward(*args, **kwargs) | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 764, in forward | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP7 pid=82748) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1243, in __call__ | |
| return self._op(*args, **kwargs) | |
| (EngineCore_DP1 pid=82742) def forward( | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 375, in __call__ | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/vllm/attention/layer.py", line 611, in unified_attention_with_output | |
| (EngineCore_DP1 pid=82742) return self._wrapped_call(self, *args, **kwargs) | |
| (EngineCore_DP7 pid=82748) return forward_call(*args, **kwargs) | |
| (EngineCore_DP4 pid=82745) raise e | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP7 pid=82748) File "<eval_with_key>.127", line 718, in forward | |
| (EngineCore_DP1 pid=82742) return super().__call__(*args, **kwargs) | |
| (EngineCore_DP4 pid=82745) self.impl.forward(self, | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/attention/backends/mla/common.py", line 1537, in forward | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP7 pid=82748) return self._op(*args, **kwargs) | |
| (EngineCore_DP4 pid=82745) return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/vllm/attention/layer.py", line 611, in unified_attention_with_output | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) raise e | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP1 pid=82742) return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP7 pid=82748) self.impl.forward(self, | |
| File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 121, in __call__ | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/attention/backends/mla/common.py", line 1537, in forward | |
| (EngineCore_DP7 pid=82748) return self.runnable(*args, **kwargs) | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) _ = torch.empty( | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 8.00 GiB. GPU 0 has a total capacity of 178.36 GiB of which 2.11 GiB is free. Including non-PyTorch memory, this process has 176.22 GiB memory in use. Of the allocated memory 145.20 GiB is allocated by PyTorch, with 2.14 GiB allocated in private pools (e.g., CUDA Graphs), and 8.49 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables) | |
| return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP4 pid=82745) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP3 pid=82744) ERROR 09-26 08:51:47 [core.py:708] EngineCore failed to start. | |
| (EngineCore_DP3 pid=82744) ERROR 09-26 08:51:47 [core.py:708] Traceback (most recent call last): | |
| (EngineCore_DP3 pid=82744) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 695, in run_engine_core | |
| (EngineCore_DP3 pid=82744) ERROR 09-26 08:51:47 [core.py:708] engine_core = DPEngineCoreProc(*args, **kwargs) | |
| (EngineCore_DP3 pid=82744) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 965, in __init__ | |
| (EngineCore_DP3 pid=82744) ERROR 09-26 08:51:47 [core.py:708] super().__init__(vllm_config, local_client, handshake_address, | |
| (EngineCore_DP3 pid=82744) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 498, in __init__ | |
| (EngineCore_DP3 pid=82744) ERROR 09-26 08:51:47 [core.py:708] super().__init__(vllm_config, executor_class, log_stats, | |
| (EngineCore_DP3 pid=82744) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 92, in __init__ | |
| (EngineCore_DP3 pid=82744) ERROR 09-26 08:51:47 [core.py:708] self._initialize_kv_caches(vllm_config) | |
| (EngineCore_DP3 pid=82744) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 207, in _initialize_kv_caches | |
| (EngineCore_DP3 pid=82744) ERROR 09-26 08:51:47 [core.py:708] self.model_executor.initialize_from_config(kv_cache_configs) | |
| (EngineCore_DP3 pid=82744) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 75, in initialize_from_config | |
| (EngineCore_DP3 pid=82744) ERROR 09-26 08:51:47 [core.py:708] self.collective_rpc("compile_or_warm_up_model") | |
| (EngineCore_DP3 pid=82744) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc | |
| (EngineCore_DP3 pid=82744) ERROR 09-26 08:51:47 [core.py:708] return [run_method(self.driver_worker, method, args, kwargs)] | |
| (EngineCore_DP3 pid=82744) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 3120, in run_method | |
| (EngineCore_DP3 pid=82744) ERROR 09-26 08:51:47 [core.py:708] return func(*args, **kwargs) | |
| (EngineCore_DP15 pid=80460) def forward( | |
| (EngineCore_DP12 pid=80457) return super().__call__(*args, **kwargs) | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) raise e | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1971, in forward_impl_chunked | |
| (EngineCore_DP3 pid=82744) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 406, in compile_or_warm_up_model | |
| (EngineCore_DP3 pid=82744) ERROR 09-26 08:51:47 [core.py:708] self.model_runner._dummy_run( | |
| (EngineCore_DP3 pid=82744) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP3 pid=82744) ERROR 09-26 08:51:47 [core.py:708] return func(*args, **kwargs) | |
| (EngineCore_DP3 pid=82744) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 3132, in _dummy_run | |
| (EngineCore_DP3 pid=82744) ERROR 09-26 08:51:47 [core.py:708] outputs = self.model( | |
| (EngineCore_DP3 pid=82744) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 387, in __call__ | |
| (EngineCore_DP3 pid=82744) ERROR 09-26 08:51:47 [core.py:708] return self._run_ubatches(ubatch_metadata, self.model) | |
| (EngineCore_DP3 pid=82744) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 263, in _run_ubatches | |
| (EngineCore_DP3 pid=82744) ERROR 09-26 08:51:47 [core.py:708] result = torch.cat(sorted_results, dim=0) | |
| (EngineCore_DP3 pid=82744) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) ERROR 09-26 08:51:47 [core.py:708] RuntimeError: torch.cat(): expected a non-empty list of Tensors | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 375, in __call__ | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP8 pid=80453) final_hidden_states = self.quant_method.apply( | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP14 pid=80459) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP9 pid=80454) buf5 = torch.ops.vllm.moe_forward_shared.default(buf3, buf4, 'model.layers.3.mlp.experts') | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^^^^^^^^^^^^^^ | |
| _ = torch.empty( | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/fp8.py", line 1036, in apply | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_piecewise_backend.py", line 96, in __call__ | |
| (EngineCore_DP3 pid=82744) Process EngineCore_DP3: | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 829, in __call__ | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) return self.compiled_graph_for_general_shape(*args) | |
| (EngineCore_DP15 pid=80460) return super().__call__(*args, **kwargs) | |
| (EngineCore_DP8 pid=80453) result = self.fused_experts( | |
| (EngineCore_DP11 pid=80456) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP9 pid=80454) return self._op(*args, **kwargs) | |
| (EngineCore_DP13 pid=80458) process_chunk(chunk_start, | |
| (EngineCore_DP14 pid=80459) return func(*args, **kwargs) | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1903, in process_chunk | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 2163, in moe_forward_shared | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 234, in _ubatch_thread | |
| (EngineCore_DP12 pid=80457) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP10 pid=80455) return self._wrapped_call(self, *args, **kwargs) | |
| return self._wrapped_call(self, *args, **kwargs) | |
| return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) model_output = model( | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP1 pid=82742) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP7 pid=82748) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 8.00 GiB. GPU 0 has a total capacity of 178.36 GiB of which 2.10 GiB is free. Including non-PyTorch memory, this process has 176.24 GiB memory in use. Of the allocated memory 145.20 GiB is allocated by PyTorch, with 2.14 GiB allocated in private pools (e.g., CUDA Graphs), and 8.50 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables) | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP14 pid=80459) ^^^^^^ | |
| (EngineCore_DP8 pid=80453) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 317, in __call__ | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP4 pid=82745) return forward_call(*args, **kwargs) | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/compiler_interface.py", line 518, in compiled_graph | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP4 pid=82745) File "<eval_with_key>.127", line 718, in forward | |
| (EngineCore_DP7 pid=82748) graph_output = inductor_compiled_graph(list_args) | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 584, in __call__ | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 121, in __call__ | |
| (EngineCore_DP4 pid=82745) return self.runnable(*args, **kwargs) | |
| (EngineCore_DP1 pid=82742) return forward_call(*args, **kwargs) | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_piecewise_backend.py", line 96, in __call__ | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 929, in _fn | |
| (EngineCore_DP4 pid=82745) return self.compiled_graph_for_general_shape(*args) | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP13 pid=80458) final_hidden_states = self.quant_method.apply( | |
| (EngineCore_DP14 pid=80459) model_output = self.forward(*args, **kwargs) | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/fp8.py", line 1036, in apply | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) raise e | |
| (EngineCore_DP9 pid=80454) return self.forward_impl(hidden_states, router_logits) | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 764, in forward | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/compiler_interface.py", line 518, in compiled_graph | |
| (EngineCore_DP7 pid=82748) return self.current_callable(inputs) | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) File "/root/.cache/vllm/torch_compile_cache/2256bad88c/rank_0_7/inductor_cache/ux/cux7bnmqblvexqg4t7swsrwo2moahaum3v4v5g23ozklv4odtnlt.py", line 620, in call | |
| (EngineCore_DP3 pid=82744) Traceback (most recent call last): | |
| (EngineCore_DP4 pid=82745) graph_output = inductor_compiled_graph(list_args) | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 584, in __call__ | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1998, in forward_impl | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP12 pid=80457) return forward_call(*args, **kwargs) | |
| (EngineCore_DP8 pid=80453) return forward_call(*args, **kwargs) | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1027, in forward | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 929, in _fn | |
| (EngineCore_DP7 pid=82748) buf5 = torch.ops.vllm.moe_forward_shared.default(buf3, buf4, 'model.layers.3.mlp.experts') | |
| (EngineCore_DP1 pid=82742) return forward_call(*args, **kwargs) | |
| (EngineCore_DP4 pid=82745) return self.current_callable(inputs) | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) File "<eval_with_key>.5", line 5, in forward | |
| (EngineCore_DP4 pid=82745) File "/root/.cache/vllm/torch_compile_cache/2256bad88c/rank_0_4/inductor_cache/2i/c2ivbxlqmiln2bmllilb3v5ciyvgfqx57dyjnno4k3sf27ctcs4b.py", line 620, in call | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 829, in __call__ | |
| (EngineCore_DP3 pid=82744) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap | |
| (EngineCore_DP3 pid=82744) self.run() | |
| (EngineCore_DP3 pid=82744) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run | |
| (EngineCore_DP3 pid=82744) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 712, in run_engine_core | |
| (EngineCore_DP3 pid=82744) raise e | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 695, in run_engine_core | |
| (EngineCore_DP3 pid=82744) engine_core = DPEngineCoreProc(*args, **kwargs) | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 965, in __init__ | |
| (EngineCore_DP3 pid=82744) super().__init__(vllm_config, local_client, handshake_address, | |
| (EngineCore_DP4 pid=82745) buf5 = torch.ops.vllm.moe_forward_shared.default(buf3, buf4, 'model.layers.3.mlp.experts') | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 498, in __init__ | |
| (EngineCore_DP3 pid=82744) super().__init__(vllm_config, executor_class, log_stats, | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 92, in __init__ | |
| (EngineCore_DP3 pid=82744) self._initialize_kv_caches(vllm_config) | |
| (EngineCore_DP7 pid=82748) return self._op(*args, **kwargs) | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 207, in _initialize_kv_caches | |
| (EngineCore_DP3 pid=82744) self.model_executor.initialize_from_config(kv_cache_configs) | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 75, in initialize_from_config | |
| (EngineCore_DP3 pid=82744) self.collective_rpc("compile_or_warm_up_model") | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc | |
| (EngineCore_DP3 pid=82744) return [run_method(self.driver_worker, method, args, kwargs)] | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 3120, in run_method | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 2163, in moe_forward_shared | |
| (EngineCore_DP3 pid=82744) return func(*args, **kwargs) | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 406, in compile_or_warm_up_model | |
| (EngineCore_DP3 pid=82744) self.model_runner._dummy_run( | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP3 pid=82744) return func(*args, **kwargs) | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 3132, in _dummy_run | |
| (EngineCore_DP3 pid=82744) outputs = self.model( | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 829, in __call__ | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 387, in __call__ | |
| (EngineCore_DP3 pid=82744) return self._run_ubatches(ubatch_metadata, self.model) | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 263, in _run_ubatches | |
| (EngineCore_DP3 pid=82744) result = torch.cat(sorted_results, dim=0) | |
| (EngineCore_DP3 pid=82744) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP3 pid=82744) RuntimeError: torch.cat(): expected a non-empty list of Tensors | |
| (EngineCore_DP1 pid=82742) return fn(*args, **kwargs) | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP4 pid=82745) return self._op(*args, **kwargs) | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1243, in __call__ | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 2163, in moe_forward_shared | |
| (EngineCore_DP7 pid=82748) return self.forward_impl(hidden_states, router_logits) | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) return self._wrapped_call(self, *args, **kwargs) | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1998, in forward_impl | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP4 pid=82745) return self.forward_impl(hidden_states, router_logits) | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1998, in forward_impl | |
| (EngineCore_DP7 pid=82748) return self.forward_impl_chunked(hidden_states, router_logits) | |
| (EngineCore_DP1 pid=82742) return self._op(*args, **kwargs) | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1971, in forward_impl_chunked | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/vllm/attention/layer.py", line 611, in unified_attention_with_output | |
| (EngineCore_DP1 pid=82742) raise e | |
| (EngineCore_DP4 pid=82745) return self.forward_impl_chunked(hidden_states, router_logits) | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1971, in forward_impl_chunked | |
| (EngineCore_DP7 pid=82748) process_chunk(chunk_start, | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1903, in process_chunk | |
| (EngineCore_DP1 pid=82742) self.impl.forward(self, | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP13 pid=80458) result = self.fused_experts( | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP11 pid=80456) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP14 pid=80459) def forward( | |
| (EngineCore_DP8 pid=80453) dbo_register_recv_hook(hook) | |
| return func(*args, **kwargs) | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/ubatching.py", line 184, in dbo_register_recv_hook | |
| (EngineCore_DP9 pid=80454) return self.forward_impl_chunked(hidden_states, router_logits) | |
| (EngineCore_DP12 pid=80457) return fn(*args, **kwargs) | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) next_ctx.recv_hook = recv_hook | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1971, in forward_impl_chunked | |
| (EngineCore_DP8 pid=80453) AttributeError: 'NoneType' object has no attribute 'recv_hook' | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 234, in _ubatch_thread | |
| (EngineCore_DP13 pid=80458) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/attention/backends/mla/common.py", line 1537, in forward | |
| (EngineCore_DP1 pid=82742) return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP4 pid=82745) process_chunk(chunk_start, | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1903, in process_chunk | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP7 pid=82748) final_hidden_states = self.quant_method.apply( | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/fp8.py", line 1036, in apply | |
| (EngineCore_DP7 pid=82748) result = self.fused_experts( | |
| (EngineCore_DP4 pid=82745) final_hidden_states = self.quant_method.apply( | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/fp8.py", line 1036, in apply | |
| (EngineCore_DP1 pid=82742) _ = torch.empty( | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 8.00 GiB. GPU 0 has a total capacity of 178.36 GiB of which 2.11 GiB is free. Including non-PyTorch memory, this process has 176.22 GiB memory in use. Of the allocated memory 145.20 GiB is allocated by PyTorch, with 2.14 GiB allocated in private pools (e.g., CUDA Graphs), and 8.49 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables) | |
| (EngineCore_DP4 pid=82745) result = self.fused_experts( | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP7 pid=82748) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP4 pid=82745) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP1 pid=82742) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP7 pid=82748) return forward_call(*args, **kwargs) | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1027, in forward | |
| (EngineCore_DP4 pid=82745) return forward_call(*args, **kwargs) | |
| (EngineCore_DP7 pid=82748) dbo_register_recv_hook(hook) | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/ubatching.py", line 184, in dbo_register_recv_hook | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1027, in forward | |
| (EngineCore_DP7 pid=82748) next_ctx.recv_hook = recv_hook | |
| (EngineCore_DP1 pid=82742) return forward_call(*args, **kwargs) | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) AttributeError: 'NoneType' object has no attribute 'recv_hook' | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) File "<eval_with_key>.127", line 718, in forward | |
| (EngineCore_DP4 pid=82745) dbo_register_recv_hook(hook) | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/ubatching.py", line 184, in dbo_register_recv_hook | |
| (EngineCore_DP4 pid=82745) next_ctx.recv_hook = recv_hook | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) AttributeError: 'NoneType' object has no attribute 'recv_hook' | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 121, in __call__ | |
| (EngineCore_DP1 pid=82742) return self.runnable(*args, **kwargs) | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_piecewise_backend.py", line 96, in __call__ | |
| (EngineCore_DP1 pid=82742) return self.compiled_graph_for_general_shape(*args) | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/compiler_interface.py", line 518, in compiled_graph | |
| (EngineCore_DP1 pid=82742) graph_output = inductor_compiled_graph(list_args) | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 584, in __call__ | |
| (EngineCore_DP1 pid=82742) return self.current_callable(inputs) | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) File "/root/.cache/vllm/torch_compile_cache/2256bad88c/rank_0_1/inductor_cache/bt/cbtdwmyu445dkgu57fxtufiuuij3nakbvycwmdarkck4zd3qsxwi.py", line 620, in call | |
| (EngineCore_DP1 pid=82742) buf5 = torch.ops.vllm.moe_forward_shared.default(buf3, buf4, 'model.layers.3.mlp.experts') | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 829, in __call__ | |
| (EngineCore_DP1 pid=82742) return self._op(*args, **kwargs) | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 2163, in moe_forward_shared | |
| (EngineCore_DP1 pid=82742) return self.forward_impl(hidden_states, router_logits) | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1998, in forward_impl | |
| (EngineCore_DP1 pid=82742) return self.forward_impl_chunked(hidden_states, router_logits) | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1971, in forward_impl_chunked | |
| (EngineCore_DP1 pid=82742) process_chunk(chunk_start, | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1903, in process_chunk | |
| (EngineCore_DP1 pid=82742) final_hidden_states = self.quant_method.apply( | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/fp8.py", line 1036, in apply | |
| (EngineCore_DP1 pid=82742) result = self.fused_experts( | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP1 pid=82742) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) Process EngineCore_DP7: | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) return forward_call(*args, **kwargs) | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP14 pid=80459) return self._wrapped_call(self, *args, **kwargs) | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 375, in __call__ | |
| return self._wrapped_call(self, *args, **kwargs) | |
| (EngineCore_DP10 pid=80455) return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^^^^^^^^^ | |
| raise e | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP9 pid=80454) process_chunk(chunk_start, | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1903, in process_chunk | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 929, in _fn | |
| (EngineCore_DP13 pid=80458) return forward_call(*args, **kwargs) | |
| (EngineCore_DP14 pid=80459) return super().__call__(*args, **kwargs) | |
| (EngineCore_DP11 pid=80456) return forward_call(*args, **kwargs) | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1027, in forward | |
| (EngineCore_DP15 pid=80460) return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) File "<eval_with_key>.127", line 696, in forward | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP12 pid=80457) return self._wrapped_call(self, *args, **kwargs) | |
| (EngineCore_DP10 pid=80455) Exception in thread Thread-252 (_ubatch_thread): | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP13 pid=80458) dbo_register_recv_hook(hook) | |
| (EngineCore_DP9 pid=80454) final_hidden_states = self.quant_method.apply( | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/ubatching.py", line 184, in dbo_register_recv_hook | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) ERROR 09-26 08:51:47 [core.py:708] EngineCore failed to start. | |
| (EngineCore_DP7 pid=82748) ERROR 09-26 08:51:47 [core.py:708] Traceback (most recent call last): | |
| (EngineCore_DP7 pid=82748) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 695, in run_engine_core | |
| (EngineCore_DP7 pid=82748) ERROR 09-26 08:51:47 [core.py:708] engine_core = DPEngineCoreProc(*args, **kwargs) | |
| (EngineCore_DP7 pid=82748) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 965, in __init__ | |
| (EngineCore_DP7 pid=82748) ERROR 09-26 08:51:47 [core.py:708] super().__init__(vllm_config, local_client, handshake_address, | |
| (EngineCore_DP7 pid=82748) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 498, in __init__ | |
| (EngineCore_DP7 pid=82748) ERROR 09-26 08:51:47 [core.py:708] super().__init__(vllm_config, executor_class, log_stats, | |
| (EngineCore_DP7 pid=82748) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 92, in __init__ | |
| (EngineCore_DP7 pid=82748) ERROR 09-26 08:51:47 [core.py:708] self._initialize_kv_caches(vllm_config) | |
| (EngineCore_DP7 pid=82748) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 207, in _initialize_kv_caches | |
| (EngineCore_DP7 pid=82748) ERROR 09-26 08:51:47 [core.py:708] self.model_executor.initialize_from_config(kv_cache_configs) | |
| (EngineCore_DP7 pid=82748) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 75, in initialize_from_config | |
| (EngineCore_DP7 pid=82748) ERROR 09-26 08:51:47 [core.py:708] self.collective_rpc("compile_or_warm_up_model") | |
| (EngineCore_DP7 pid=82748) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc | |
| (EngineCore_DP7 pid=82748) ERROR 09-26 08:51:47 [core.py:708] return [run_method(self.driver_worker, method, args, kwargs)] | |
| (EngineCore_DP7 pid=82748) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 3120, in run_method | |
| (EngineCore_DP7 pid=82748) ERROR 09-26 08:51:47 [core.py:708] return func(*args, **kwargs) | |
| (EngineCore_DP7 pid=82748) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 406, in compile_or_warm_up_model | |
| (EngineCore_DP7 pid=82748) ERROR 09-26 08:51:47 [core.py:708] self.model_runner._dummy_run( | |
| (EngineCore_DP7 pid=82748) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP7 pid=82748) ERROR 09-26 08:51:47 [core.py:708] return func(*args, **kwargs) | |
| (EngineCore_DP7 pid=82748) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 3132, in _dummy_run | |
| (EngineCore_DP7 pid=82748) ERROR 09-26 08:51:47 [core.py:708] outputs = self.model( | |
| (EngineCore_DP1 pid=82742) return forward_call(*args, **kwargs) | |
| (EngineCore_DP4 pid=82745) Process EngineCore_DP4: | |
| (EngineCore_DP7 pid=82748) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 387, in __call__ | |
| (EngineCore_DP7 pid=82748) ERROR 09-26 08:51:47 [core.py:708] return self._run_ubatches(ubatch_metadata, self.model) | |
| (EngineCore_DP7 pid=82748) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 263, in _run_ubatches | |
| (EngineCore_DP7 pid=82748) ERROR 09-26 08:51:47 [core.py:708] result = torch.cat(sorted_results, dim=0) | |
| (EngineCore_DP7 pid=82748) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) ERROR 09-26 08:51:47 [core.py:708] RuntimeError: torch.cat(): expected a non-empty list of Tensors | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1027, in forward | |
| (EngineCore_DP4 pid=82745) ERROR 09-26 08:51:47 [core.py:708] EngineCore failed to start. | |
| (EngineCore_DP4 pid=82745) ERROR 09-26 08:51:47 [core.py:708] Traceback (most recent call last): | |
| (EngineCore_DP4 pid=82745) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 695, in run_engine_core | |
| (EngineCore_DP4 pid=82745) ERROR 09-26 08:51:47 [core.py:708] engine_core = DPEngineCoreProc(*args, **kwargs) | |
| (EngineCore_DP4 pid=82745) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 965, in __init__ | |
| (EngineCore_DP4 pid=82745) ERROR 09-26 08:51:47 [core.py:708] super().__init__(vllm_config, local_client, handshake_address, | |
| (EngineCore_DP4 pid=82745) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 498, in __init__ | |
| (EngineCore_DP4 pid=82745) ERROR 09-26 08:51:47 [core.py:708] super().__init__(vllm_config, executor_class, log_stats, | |
| (EngineCore_DP4 pid=82745) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 92, in __init__ | |
| (EngineCore_DP4 pid=82745) ERROR 09-26 08:51:47 [core.py:708] self._initialize_kv_caches(vllm_config) | |
| (EngineCore_DP4 pid=82745) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 207, in _initialize_kv_caches | |
| (EngineCore_DP4 pid=82745) ERROR 09-26 08:51:47 [core.py:708] self.model_executor.initialize_from_config(kv_cache_configs) | |
| (EngineCore_DP4 pid=82745) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 75, in initialize_from_config | |
| (EngineCore_DP4 pid=82745) ERROR 09-26 08:51:47 [core.py:708] self.collective_rpc("compile_or_warm_up_model") | |
| (EngineCore_DP4 pid=82745) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc | |
| (EngineCore_DP4 pid=82745) ERROR 09-26 08:51:47 [core.py:708] return [run_method(self.driver_worker, method, args, kwargs)] | |
| (EngineCore_DP4 pid=82745) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 3120, in run_method | |
| (EngineCore_DP4 pid=82745) ERROR 09-26 08:51:47 [core.py:708] return func(*args, **kwargs) | |
| (EngineCore_DP4 pid=82745) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 406, in compile_or_warm_up_model | |
| (EngineCore_DP4 pid=82745) ERROR 09-26 08:51:47 [core.py:708] self.model_runner._dummy_run( | |
| (EngineCore_DP4 pid=82745) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP4 pid=82745) ERROR 09-26 08:51:47 [core.py:708] return func(*args, **kwargs) | |
| (EngineCore_DP4 pid=82745) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 3132, in _dummy_run | |
| (EngineCore_DP4 pid=82745) ERROR 09-26 08:51:47 [core.py:708] outputs = self.model( | |
| (EngineCore_DP1 pid=82742) dbo_register_recv_hook(hook) | |
| (EngineCore_DP4 pid=82745) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 387, in __call__ | |
| (EngineCore_DP4 pid=82745) ERROR 09-26 08:51:47 [core.py:708] return self._run_ubatches(ubatch_metadata, self.model) | |
| (EngineCore_DP4 pid=82745) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 263, in _run_ubatches | |
| (EngineCore_DP4 pid=82745) ERROR 09-26 08:51:47 [core.py:708] result = torch.cat(sorted_results, dim=0) | |
| (EngineCore_DP4 pid=82745) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) ERROR 09-26 08:51:47 [core.py:708] RuntimeError: torch.cat(): expected a non-empty list of Tensors | |
| (EngineCore_DP11 pid=80456) model_output = model( | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/fp8.py", line 1036, in apply | |
| (EngineCore_DP11 pid=80456) ^^^^^^ | |
| (EngineCore_DP12 pid=80457) raise e | |
| return fn(*args, **kwargs) | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 317, in __call__ | |
| (EngineCore_DP13 pid=80458) next_ctx.recv_hook = recv_hook | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) AttributeError: 'NoneType' object has no attribute 'recv_hook' | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP14 pid=80459) return self._call_impl(*args, **kwargs) | |
| raise e | |
| (EngineCore_DP9 pid=80454) result = self.fused_experts( | |
| (EngineCore_DP11 pid=80456) model_output = self.forward(*args, **kwargs) | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 764, in forward | |
| (EngineCore_DP10 pid=80455) Traceback (most recent call last): | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/ubatching.py", line 184, in dbo_register_recv_hook | |
| (EngineCore_DP1 pid=82742) next_ctx.recv_hook = recv_hook | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) AttributeError: 'NoneType' object has no attribute 'recv_hook' | |
| (EngineCore_DP7 pid=82748) Traceback (most recent call last): | |
| (EngineCore_DP4 pid=82745) Traceback (most recent call last): | |
| (EngineCore_DP7 pid=82748) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap | |
| (EngineCore_DP7 pid=82748) self.run() | |
| (EngineCore_DP7 pid=82748) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run | |
| (EngineCore_DP7 pid=82748) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 712, in run_engine_core | |
| (EngineCore_DP7 pid=82748) raise e | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 695, in run_engine_core | |
| (EngineCore_DP7 pid=82748) engine_core = DPEngineCoreProc(*args, **kwargs) | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 965, in __init__ | |
| (EngineCore_DP7 pid=82748) super().__init__(vllm_config, local_client, handshake_address, | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 498, in __init__ | |
| (EngineCore_DP7 pid=82748) super().__init__(vllm_config, executor_class, log_stats, | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 92, in __init__ | |
| (EngineCore_DP7 pid=82748) self._initialize_kv_caches(vllm_config) | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 207, in _initialize_kv_caches | |
| (EngineCore_DP7 pid=82748) self.model_executor.initialize_from_config(kv_cache_configs) | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 75, in initialize_from_config | |
| (EngineCore_DP7 pid=82748) self.collective_rpc("compile_or_warm_up_model") | |
| (EngineCore_DP10 pid=80455) File "/usr/lib/python3.12/threading.py", line 1075, in _bootstrap_inner | |
| (EngineCore_DP14 pid=80459) raise e | |
| (EngineCore_DP12 pid=80457) return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP11 pid=80456) return self._wrapped_call(self, *args, **kwargs) | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP9 pid=80454) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP10 pid=80455) self.run() | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP14 pid=80459) return forward_call(*args, **kwargs) | |
| (EngineCore_DP10 pid=80455) File "/usr/lib/python3.12/threading.py", line 1012, in run | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 929, in _fn | |
| def forward( | |
| (EngineCore_DP8 pid=80453) ERROR 09-26 08:51:47 [core.py:708] EngineCore failed to start. | |
| (EngineCore_DP8 pid=80453) ERROR 09-26 08:51:47 [core.py:708] Traceback (most recent call last): | |
| (EngineCore_DP8 pid=80453) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 695, in run_engine_core | |
| (EngineCore_DP8 pid=80453) ERROR 09-26 08:51:47 [core.py:708] engine_core = DPEngineCoreProc(*args, **kwargs) | |
| (EngineCore_DP8 pid=80453) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 965, in __init__ | |
| (EngineCore_DP8 pid=80453) ERROR 09-26 08:51:47 [core.py:708] super().__init__(vllm_config, local_client, handshake_address, | |
| (EngineCore_DP8 pid=80453) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 498, in __init__ | |
| (EngineCore_DP8 pid=80453) ERROR 09-26 08:51:47 [core.py:708] super().__init__(vllm_config, executor_class, log_stats, | |
| (EngineCore_DP8 pid=80453) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 92, in __init__ | |
| (EngineCore_DP8 pid=80453) ERROR 09-26 08:51:47 [core.py:708] self._initialize_kv_caches(vllm_config) | |
| (EngineCore_DP8 pid=80453) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 207, in _initialize_kv_caches | |
| (EngineCore_DP8 pid=80453) ERROR 09-26 08:51:47 [core.py:708] self.model_executor.initialize_from_config(kv_cache_configs) | |
| (EngineCore_DP8 pid=80453) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 75, in initialize_from_config | |
| (EngineCore_DP8 pid=80453) ERROR 09-26 08:51:47 [core.py:708] self.collective_rpc("compile_or_warm_up_model") | |
| (EngineCore_DP8 pid=80453) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc | |
| (EngineCore_DP8 pid=80453) ERROR 09-26 08:51:47 [core.py:708] return [run_method(self.driver_worker, method, args, kwargs)] | |
| (EngineCore_DP8 pid=80453) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 3120, in run_method | |
| (EngineCore_DP8 pid=80453) ERROR 09-26 08:51:47 [core.py:708] return func(*args, **kwargs) | |
| (EngineCore_DP8 pid=80453) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 406, in compile_or_warm_up_model | |
| (EngineCore_DP8 pid=80453) ERROR 09-26 08:51:47 [core.py:708] self.model_runner._dummy_run( | |
| (EngineCore_DP8 pid=80453) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP8 pid=80453) ERROR 09-26 08:51:47 [core.py:708] return func(*args, **kwargs) | |
| (EngineCore_DP8 pid=80453) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 3132, in _dummy_run | |
| (EngineCore_DP8 pid=80453) ERROR 09-26 08:51:47 [core.py:708] outputs = self.model( | |
| (EngineCore_DP8 pid=80453) | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 375, in __call__ | |
| (EngineCore_DP9 pid=80454) return forward_call(*args, **kwargs) | |
| ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 387, in __call__ | |
| (EngineCore_DP8 pid=80453) ERROR 09-26 08:51:47 [core.py:708] return self._run_ubatches(ubatch_metadata, self.model) | |
| (EngineCore_DP8 pid=80453) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 263, in _run_ubatches | |
| (EngineCore_DP8 pid=80453) ERROR 09-26 08:51:47 [core.py:708] result = torch.cat(sorted_results, dim=0) | |
| (EngineCore_DP8 pid=80453) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) ERROR 09-26 08:51:47 [core.py:708] RuntimeError: torch.cat(): expected a non-empty list of Tensors | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc | |
| (EngineCore_DP7 pid=82748) return [run_method(self.driver_worker, method, args, kwargs)] | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 3120, in run_method | |
| (EngineCore_DP7 pid=82748) return func(*args, **kwargs) | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 406, in compile_or_warm_up_model | |
| (EngineCore_DP7 pid=82748) self.model_runner._dummy_run( | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP7 pid=82748) return func(*args, **kwargs) | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 3132, in _dummy_run | |
| (EngineCore_DP7 pid=82748) outputs = self.model( | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 387, in __call__ | |
| (EngineCore_DP7 pid=82748) return self._run_ubatches(ubatch_metadata, self.model) | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 263, in _run_ubatches | |
| (EngineCore_DP7 pid=82748) result = torch.cat(sorted_results, dim=0) | |
| (EngineCore_DP7 pid=82748) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP7 pid=82748) RuntimeError: torch.cat(): expected a non-empty list of Tensors | |
| (EngineCore_DP4 pid=82745) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap | |
| (EngineCore_DP4 pid=82745) self.run() | |
| (EngineCore_DP4 pid=82745) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run | |
| (EngineCore_DP4 pid=82745) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 712, in run_engine_core | |
| (EngineCore_DP4 pid=82745) raise e | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 695, in run_engine_core | |
| (EngineCore_DP4 pid=82745) engine_core = DPEngineCoreProc(*args, **kwargs) | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 965, in __init__ | |
| (EngineCore_DP4 pid=82745) super().__init__(vllm_config, local_client, handshake_address, | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 498, in __init__ | |
| (EngineCore_DP4 pid=82745) super().__init__(vllm_config, executor_class, log_stats, | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 92, in __init__ | |
| (EngineCore_DP4 pid=82745) self._initialize_kv_caches(vllm_config) | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 207, in _initialize_kv_caches | |
| (EngineCore_DP4 pid=82745) self.model_executor.initialize_from_config(kv_cache_configs) | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 75, in initialize_from_config | |
| (EngineCore_DP4 pid=82745) self.collective_rpc("compile_or_warm_up_model") | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc | |
| (EngineCore_DP4 pid=82745) return [run_method(self.driver_worker, method, args, kwargs)] | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 3120, in run_method | |
| (EngineCore_DP4 pid=82745) return func(*args, **kwargs) | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 406, in compile_or_warm_up_model | |
| (EngineCore_DP4 pid=82745) self.model_runner._dummy_run( | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP4 pid=82745) return func(*args, **kwargs) | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 3132, in _dummy_run | |
| (EngineCore_DP4 pid=82745) outputs = self.model( | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 387, in __call__ | |
| (EngineCore_DP4 pid=82745) return self._run_ubatches(ubatch_metadata, self.model) | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 263, in _run_ubatches | |
| (EngineCore_DP4 pid=82745) result = torch.cat(sorted_results, dim=0) | |
| (EngineCore_DP4 pid=82745) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP4 pid=82745) RuntimeError: torch.cat(): expected a non-empty list of Tensors | |
| (EngineCore_DP1 pid=82742) Process EngineCore_DP1: | |
| (EngineCore_DP1 pid=82742) ERROR 09-26 08:51:47 [core.py:708] EngineCore failed to start. | |
| (EngineCore_DP1 pid=82742) ERROR 09-26 08:51:47 [core.py:708] Traceback (most recent call last): | |
| (EngineCore_DP1 pid=82742) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 695, in run_engine_core | |
| (EngineCore_DP1 pid=82742) ERROR 09-26 08:51:47 [core.py:708] engine_core = DPEngineCoreProc(*args, **kwargs) | |
| (EngineCore_DP1 pid=82742) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 965, in __init__ | |
| (EngineCore_DP1 pid=82742) ERROR 09-26 08:51:47 [core.py:708] super().__init__(vllm_config, local_client, handshake_address, | |
| (EngineCore_DP1 pid=82742) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 498, in __init__ | |
| (EngineCore_DP1 pid=82742) ERROR 09-26 08:51:47 [core.py:708] super().__init__(vllm_config, executor_class, log_stats, | |
| (EngineCore_DP1 pid=82742) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 92, in __init__ | |
| (EngineCore_DP1 pid=82742) ERROR 09-26 08:51:47 [core.py:708] self._initialize_kv_caches(vllm_config) | |
| (EngineCore_DP1 pid=82742) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 207, in _initialize_kv_caches | |
| (EngineCore_DP1 pid=82742) ERROR 09-26 08:51:47 [core.py:708] self.model_executor.initialize_from_config(kv_cache_configs) | |
| (EngineCore_DP1 pid=82742) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 75, in initialize_from_config | |
| (EngineCore_DP1 pid=82742) ERROR 09-26 08:51:47 [core.py:708] self.collective_rpc("compile_or_warm_up_model") | |
| (EngineCore_DP1 pid=82742) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc | |
| (EngineCore_DP1 pid=82742) ERROR 09-26 08:51:47 [core.py:708] return [run_method(self.driver_worker, method, args, kwargs)] | |
| (EngineCore_DP1 pid=82742) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 3120, in run_method | |
| (EngineCore_DP1 pid=82742) ERROR 09-26 08:51:47 [core.py:708] return func(*args, **kwargs) | |
| (EngineCore_DP1 pid=82742) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 406, in compile_or_warm_up_model | |
| (EngineCore_DP1 pid=82742) ERROR 09-26 08:51:47 [core.py:708] self.model_runner._dummy_run( | |
| (EngineCore_DP1 pid=82742) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP1 pid=82742) ERROR 09-26 08:51:47 [core.py:708] return func(*args, **kwargs) | |
| (EngineCore_DP1 pid=82742) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 3132, in _dummy_run | |
| (EngineCore_DP1 pid=82742) ERROR 09-26 08:51:47 [core.py:708] outputs = self.model( | |
| (EngineCore_DP1 pid=82742) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 387, in __call__ | |
| (EngineCore_DP1 pid=82742) ERROR 09-26 08:51:47 [core.py:708] return self._run_ubatches(ubatch_metadata, self.model) | |
| (EngineCore_DP1 pid=82742) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 263, in _run_ubatches | |
| (EngineCore_DP1 pid=82742) ERROR 09-26 08:51:47 [core.py:708] result = torch.cat(sorted_results, dim=0) | |
| (EngineCore_DP1 pid=82742) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) ERROR 09-26 08:51:47 [core.py:708] RuntimeError: torch.cat(): expected a non-empty list of Tensors | |
| (EngineCore_DP1 pid=82742) Traceback (most recent call last): | |
| (EngineCore_DP1 pid=82742) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap | |
| (EngineCore_DP1 pid=82742) self.run() | |
| (EngineCore_DP1 pid=82742) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run | |
| (EngineCore_DP1 pid=82742) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 712, in run_engine_core | |
| (EngineCore_DP1 pid=82742) raise e | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 695, in run_engine_core | |
| (EngineCore_DP1 pid=82742) engine_core = DPEngineCoreProc(*args, **kwargs) | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 965, in __init__ | |
| (EngineCore_DP1 pid=82742) super().__init__(vllm_config, local_client, handshake_address, | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 498, in __init__ | |
| (EngineCore_DP1 pid=82742) super().__init__(vllm_config, executor_class, log_stats, | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 92, in __init__ | |
| (EngineCore_DP1 pid=82742) self._initialize_kv_caches(vllm_config) | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 207, in _initialize_kv_caches | |
| (EngineCore_DP1 pid=82742) self.model_executor.initialize_from_config(kv_cache_configs) | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 75, in initialize_from_config | |
| (EngineCore_DP1 pid=82742) self.collective_rpc("compile_or_warm_up_model") | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc | |
| (EngineCore_DP1 pid=82742) return [run_method(self.driver_worker, method, args, kwargs)] | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 3120, in run_method | |
| (EngineCore_DP1 pid=82742) return func(*args, **kwargs) | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 406, in compile_or_warm_up_model | |
| (EngineCore_DP1 pid=82742) self.model_runner._dummy_run( | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP1 pid=82742) return func(*args, **kwargs) | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 3132, in _dummy_run | |
| (EngineCore_DP1 pid=82742) outputs = self.model( | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 387, in __call__ | |
| (EngineCore_DP1 pid=82742) return self._run_ubatches(ubatch_metadata, self.model) | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 263, in _run_ubatches | |
| (EngineCore_DP1 pid=82742) result = torch.cat(sorted_results, dim=0) | |
| (EngineCore_DP1 pid=82742) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP1 pid=82742) RuntimeError: torch.cat(): expected a non-empty list of Tensors | |
| (EngineCore_DP15 pid=80460) return forward_call(*args, **kwargs) | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1027, in forward | |
| (EngineCore_DP14 pid=80459) return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP12 pid=80457) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP11 pid=80456) return super().__call__(*args, **kwargs) | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| return self._wrapped_call(self, *args, **kwargs) | |
| (EngineCore_DP9 pid=80454) dbo_register_recv_hook(hook) | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/ubatching.py", line 184, in dbo_register_recv_hook | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP10 pid=80455) return forward_call(*args, **kwargs) | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) File "<eval_with_key>.5", line 5, in forward | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) next_ctx.recv_hook = recv_hook | |
| (EngineCore_DP10 pid=80455) File "<eval_with_key>.127", line 696, in forward | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| return fn(*args, **kwargs) | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) AttributeError: 'NoneType' object has no attribute 'recv_hook' | |
| (EngineCore_DP11 pid=80456) raise e | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1243, in __call__ | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP12 pid=80457) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP10 pid=80455) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP15 pid=80460) raise e | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP11 pid=80456) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP14 pid=80459) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP15 pid=80460) return self._op(*args, **kwargs) | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) return forward_call(*args, **kwargs) | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/vllm/attention/layer.py", line 611, in unified_attention_with_output | |
| (EngineCore_DP10 pid=80455) return self._wrapped_call(self, *args, **kwargs) | |
| (EngineCore_DP11 pid=80456) return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP13 pid=80458) ERROR 09-26 08:51:47 [core.py:708] EngineCore failed to start. | |
| (EngineCore_DP13 pid=80458) ERROR 09-26 08:51:47 [core.py:708] Traceback (most recent call last): | |
| (EngineCore_DP13 pid=80458) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 695, in run_engine_core | |
| (EngineCore_DP13 pid=80458) ERROR 09-26 08:51:47 [core.py:708] engine_core = DPEngineCoreProc(*args, **kwargs) | |
| (EngineCore_DP13 pid=80458) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 965, in __init__ | |
| (EngineCore_DP13 pid=80458) ERROR 09-26 08:51:47 [core.py:708] super().__init__(vllm_config, local_client, handshake_address, | |
| (EngineCore_DP13 pid=80458) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 498, in __init__ | |
| (EngineCore_DP13 pid=80458) ERROR 09-26 08:51:47 [core.py:708] super().__init__(vllm_config, executor_class, log_stats, | |
| (EngineCore_DP13 pid=80458) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 92, in __init__ | |
| (EngineCore_DP13 pid=80458) ERROR 09-26 08:51:47 [core.py:708] self._initialize_kv_caches(vllm_config) | |
| (EngineCore_DP13 pid=80458) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 207, in _initialize_kv_caches | |
| (EngineCore_DP13 pid=80458) ERROR 09-26 08:51:47 [core.py:708] self.model_executor.initialize_from_config(kv_cache_configs) | |
| (EngineCore_DP13 pid=80458) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 75, in initialize_from_config | |
| (EngineCore_DP13 pid=80458) ERROR 09-26 08:51:47 [core.py:708] self.collective_rpc("compile_or_warm_up_model") | |
| (EngineCore_DP13 pid=80458) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc | |
| (EngineCore_DP13 pid=80458) ERROR 09-26 08:51:47 [core.py:708] return [run_method(self.driver_worker, method, args, kwargs)] | |
| (EngineCore_DP13 pid=80458) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 3120, in run_method | |
| (EngineCore_DP13 pid=80458) ERROR 09-26 08:51:47 [core.py:708] return func(*args, **kwargs) | |
| (EngineCore_DP13 pid=80458) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 406, in compile_or_warm_up_model | |
| (EngineCore_DP13 pid=80458) ERROR 09-26 08:51:47 [core.py:708] self.model_runner._dummy_run( | |
| (EngineCore_DP13 pid=80458) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP13 pid=80458) ERROR 09-26 08:51:47 [core.py:708] return func(*args, **kwargs) | |
| (EngineCore_DP13 pid=80458) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 3132, in _dummy_run | |
| (EngineCore_DP13 pid=80458) ERROR 09-26 08:51:47 [core.py:708] outputs = self.model( | |
| (EngineCore_DP13 pid=80458) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 387, in __call__ | |
| (EngineCore_DP13 pid=80458) ERROR 09-26 08:51:47 [core.py:708] return self._run_ubatches(ubatch_metadata, self.model) | |
| (EngineCore_DP13 pid=80458) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 263, in _run_ubatches | |
| (EngineCore_DP13 pid=80458) ERROR 09-26 08:51:47 [core.py:708] result = torch.cat(sorted_results, dim=0) | |
| (EngineCore_DP13 pid=80458) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) ERROR 09-26 08:51:47 [core.py:708] RuntimeError: torch.cat(): expected a non-empty list of Tensors | |
| (EngineCore_DP15 pid=80460) return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP10 pid=80455) return func(*args, **kwargs) | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 234, in _ubatch_thread | |
| (EngineCore_DP14 pid=80459) return forward_call(*args, **kwargs) | |
| (EngineCore_DP10 pid=80455) raise e | |
| self.impl.forward(self, | |
| return forward_call(*args, **kwargs) | |
| (EngineCore_DP9 pid=80454) ERROR 09-26 08:51:47 [core.py:708] EngineCore failed to start. | |
| (EngineCore_DP9 pid=80454) ERROR 09-26 08:51:47 [core.py:708] Traceback (most recent call last): | |
| (EngineCore_DP9 pid=80454) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 695, in run_engine_core | |
| (EngineCore_DP9 pid=80454) ERROR 09-26 08:51:47 [core.py:708] engine_core = DPEngineCoreProc(*args, **kwargs) | |
| (EngineCore_DP9 pid=80454) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 965, in __init__ | |
| (EngineCore_DP9 pid=80454) ERROR 09-26 08:51:47 [core.py:708] super().__init__(vllm_config, local_client, handshake_address, | |
| (EngineCore_DP9 pid=80454) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 498, in __init__ | |
| (EngineCore_DP9 pid=80454) ERROR 09-26 08:51:47 [core.py:708] super().__init__(vllm_config, executor_class, log_stats, | |
| (EngineCore_DP9 pid=80454) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 92, in __init__ | |
| (EngineCore_DP9 pid=80454) ERROR 09-26 08:51:47 [core.py:708] self._initialize_kv_caches(vllm_config) | |
| (EngineCore_DP9 pid=80454) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 207, in _initialize_kv_caches | |
| (EngineCore_DP9 pid=80454) ERROR 09-26 08:51:47 [core.py:708] self.model_executor.initialize_from_config(kv_cache_configs) | |
| (EngineCore_DP9 pid=80454) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 75, in initialize_from_config | |
| (EngineCore_DP9 pid=80454) ERROR 09-26 08:51:47 [core.py:708] self.collective_rpc("compile_or_warm_up_model") | |
| (EngineCore_DP9 pid=80454) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc | |
| (EngineCore_DP9 pid=80454) ERROR 09-26 08:51:47 [core.py:708] return [run_method(self.driver_worker, method, args, kwargs)] | |
| (EngineCore_DP9 pid=80454) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 3120, in run_method | |
| (EngineCore_DP9 pid=80454) ERROR 09-26 08:51:47 [core.py:708] return func(*args, **kwargs) | |
| (EngineCore_DP9 pid=80454) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 406, in compile_or_warm_up_model | |
| (EngineCore_DP9 pid=80454) ERROR 09-26 08:51:47 [core.py:708] self.model_runner._dummy_run( | |
| (EngineCore_DP9 pid=80454) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP9 pid=80454) ERROR 09-26 08:51:47 [core.py:708] return func(*args, **kwargs) | |
| (EngineCore_DP9 pid=80454) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 3132, in _dummy_run | |
| (EngineCore_DP9 pid=80454) ERROR 09-26 08:51:47 [core.py:708] outputs = self.model( | |
| (EngineCore_DP9 pid=80454) | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/attention/backends/mla/common.py", line 1537, in forward | |
| (EngineCore_DP12 pid=80457) File "<eval_with_key>.5", line 5, in forward | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP11 pid=80456) return forward_call(*args, **kwargs) | |
| (EngineCore_DP10 pid=80455) return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 387, in __call__ | |
| (EngineCore_DP9 pid=80454) ERROR 09-26 08:51:47 [core.py:708] return self._run_ubatches(ubatch_metadata, self.model) | |
| (EngineCore_DP9 pid=80454) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 263, in _run_ubatches | |
| (EngineCore_DP9 pid=80454) ERROR 09-26 08:51:47 [core.py:708] result = torch.cat(sorted_results, dim=0) | |
| (EngineCore_DP9 pid=80454) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) ERROR 09-26 08:51:47 [core.py:708] RuntimeError: torch.cat(): expected a non-empty list of Tensors | |
| ^^^ | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| return self._wrapped_call(self, *args, **kwargs) | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 929, in _fn | |
| (EngineCore_DP12 pid=80457) File "<eval_with_key>.127", line 718, in forward | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) File "<eval_with_key>.5", line 5, in forward | |
| (EngineCore_DP10 pid=80455) model_output = model( | |
| (EngineCore_DP10 pid=80455) ^^^^^^ | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1243, in __call__ | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 317, in __call__ | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 121, in __call__ | |
| (EngineCore_DP15 pid=80460) _ = torch.empty( | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) raise e | |
| (EngineCore_DP11 pid=80456) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP15 pid=80460) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 8.00 GiB. GPU 0 has a total capacity of 178.36 GiB of which 2.10 GiB is free. Including non-PyTorch memory, this process has 176.24 GiB memory in use. Of the allocated memory 145.20 GiB is allocated by PyTorch, with 2.14 GiB allocated in private pools (e.g., CUDA Graphs), and 8.50 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables) | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1243, in __call__ | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP12 pid=80457) return self._op(*args, **kwargs) | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/vllm/attention/layer.py", line 611, in unified_attention_with_output | |
| (EngineCore_DP14 pid=80459) return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP10 pid=80455) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP8 pid=80453) Process EngineCore_DP8: | |
| (EngineCore_DP12 pid=80457) return self.runnable(*args, **kwargs) | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_piecewise_backend.py", line 96, in __call__ | |
| (EngineCore_DP11 pid=80456) return fn(*args, **kwargs) | |
| (EngineCore_DP10 pid=80455) model_output = self.forward(*args, **kwargs) | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP12 pid=80457) self.impl.forward(self, | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 764, in forward | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/attention/backends/mla/common.py", line 1537, in forward | |
| (EngineCore_DP14 pid=80459) return self._op(*args, **kwargs) | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) return self.compiled_graph_for_general_shape(*args) | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/vllm/attention/layer.py", line 611, in unified_attention_with_output | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) return forward_call(*args, **kwargs) | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) return forward_call(*args, **kwargs) | |
| (EngineCore_DP15 pid=80460) File "<eval_with_key>.127", line 718, in forward | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) return forward_call(*args, **kwargs) | |
| (EngineCore_DP11 pid=80456) File "<eval_with_key>.5", line 5, in forward | |
| (EngineCore_DP14 pid=80459) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 121, in __call__ | |
| (EngineCore_DP8 pid=80453) Traceback (most recent call last): | |
| (EngineCore_DP11 pid=80456) return self._wrapped_call(self, *args, **kwargs) | |
| (EngineCore_DP15 pid=80460) return self.runnable(*args, **kwargs) | |
| def forward( | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_piecewise_backend.py", line 96, in __call__ | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| _ = torch.empty( | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) self.impl.forward(self, | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/compiler_interface.py", line 518, in compiled_graph | |
| (EngineCore_DP15 pid=80460) return self.compiled_graph_for_general_shape(*args) | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 375, in __call__ | |
| (EngineCore_DP13 pid=80458) Process EngineCore_DP13: | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/attention/backends/mla/common.py", line 1537, in forward | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) File "<eval_with_key>.5", line 5, in forward | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 8.00 GiB. GPU 0 has a total capacity of 178.36 GiB of which 2.10 GiB is free. Including non-PyTorch memory, this process has 176.23 GiB memory in use. Of the allocated memory 145.20 GiB is allocated by PyTorch, with 2.14 GiB allocated in private pools (e.g., CUDA Graphs), and 8.49 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables) | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/compiler_interface.py", line 518, in compiled_graph | |
| (EngineCore_DP15 pid=80460) graph_output = inductor_compiled_graph(list_args) | |
| (EngineCore_DP8 pid=80453) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap | |
| (EngineCore_DP8 pid=80453) self.run() | |
| (EngineCore_DP8 pid=80453) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run | |
| (EngineCore_DP8 pid=80453) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 712, in run_engine_core | |
| (EngineCore_DP8 pid=80453) raise e | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) raise e | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 584, in __call__ | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 695, in run_engine_core | |
| (EngineCore_DP8 pid=80453) engine_core = DPEngineCoreProc(*args, **kwargs) | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 965, in __init__ | |
| (EngineCore_DP8 pid=80453) super().__init__(vllm_config, local_client, handshake_address, | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 498, in __init__ | |
| (EngineCore_DP8 pid=80453) super().__init__(vllm_config, executor_class, log_stats, | |
| (EngineCore_DP12 pid=80457) graph_output = inductor_compiled_graph(list_args) | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 92, in __init__ | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP8 pid=80453) self._initialize_kv_caches(vllm_config) | |
| (EngineCore_DP10 pid=80455) return super().__call__(*args, **kwargs) | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 207, in _initialize_kv_caches | |
| (EngineCore_DP8 pid=80453) self.model_executor.initialize_from_config(kv_cache_configs) | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 75, in initialize_from_config | |
| (EngineCore_DP8 pid=80453) self.collective_rpc("compile_or_warm_up_model") | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc | |
| (EngineCore_DP8 pid=80453) return [run_method(self.driver_worker, method, args, kwargs)] | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 3120, in run_method | |
| (EngineCore_DP8 pid=80453) return func(*args, **kwargs) | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 406, in compile_or_warm_up_model | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) self.model_runner._dummy_run( | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 584, in __call__ | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP8 pid=80453) return func(*args, **kwargs) | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 3132, in _dummy_run | |
| (EngineCore_DP8 pid=80453) outputs = self.model( | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) Process EngineCore_DP9: | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 387, in __call__ | |
| (EngineCore_DP8 pid=80453) return self._run_ubatches(ubatch_metadata, self.model) | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 263, in _run_ubatches | |
| (EngineCore_DP8 pid=80453) result = torch.cat(sorted_results, dim=0) | |
| (EngineCore_DP8 pid=80453) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP8 pid=80453) RuntimeError: torch.cat(): expected a non-empty list of Tensors | |
| (EngineCore_DP15 pid=80460) return self.current_callable(inputs) | |
| (EngineCore_DP11 pid=80456) return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP15 pid=80460) File "/root/.cache/vllm/torch_compile_cache/2256bad88c/rank_0_15/inductor_cache/p6/cp6ogimz7bw7ycjdndhhjmwbrtrhv7xzscjoes2b3nls6mysw74h.py", line 620, in call | |
| (EngineCore_DP14 pid=80459) return forward_call(*args, **kwargs) | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1243, in __call__ | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1243, in __call__ | |
| (EngineCore_DP14 pid=80459) File "<eval_with_key>.127", line 718, in forward | |
| (EngineCore_DP15 pid=80460) buf5 = torch.ops.vllm.moe_forward_shared.default(buf3, buf4, 'model.layers.3.mlp.experts') | |
| (EngineCore_DP12 pid=80457) return self.current_callable(inputs) | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) File "/root/.cache/vllm/torch_compile_cache/2256bad88c/rank_0_12/inductor_cache/y7/cy7itu4tzfawlb2w2u6yidjl4zzuuouuxfmoncvfp6omfez2ocof.py", line 620, in call | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 829, in __call__ | |
| (EngineCore_DP15 pid=80460) return self._op(*args, **kwargs) | |
| (EngineCore_DP12 pid=80457) buf5 = torch.ops.vllm.moe_forward_shared.default(buf3, buf4, 'model.layers.3.mlp.experts') | |
| (EngineCore_DP10 pid=80455) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP11 pid=80456) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 2163, in moe_forward_shared | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 829, in __call__ | |
| (EngineCore_DP13 pid=80458) Traceback (most recent call last): | |
| (EngineCore_DP14 pid=80459) _ = torch.empty( | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 8.00 GiB. GPU 0 has a total capacity of 178.36 GiB of which 2.10 GiB is free. Including non-PyTorch memory, this process has 176.24 GiB memory in use. Of the allocated memory 145.20 GiB is allocated by PyTorch, with 2.14 GiB allocated in private pools (e.g., CUDA Graphs), and 8.50 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables) | |
| (EngineCore_DP12 pid=80457) return self._op(*args, **kwargs) | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 2163, in moe_forward_shared | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 121, in __call__ | |
| (EngineCore_DP10 pid=80455) return self._op(*args, **kwargs) | |
| (EngineCore_DP15 pid=80460) return self.forward_impl(hidden_states, router_logits) | |
| return self._op(*args, **kwargs) | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/vllm/attention/layer.py", line 611, in unified_attention_with_output | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1998, in forward_impl | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) return self.runnable(*args, **kwargs) | |
| (EngineCore_DP9 pid=80454) Traceback (most recent call last): | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap | |
| (EngineCore_DP13 pid=80458) self.run() | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP13 pid=80458) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run | |
| (EngineCore_DP13 pid=80458) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 712, in run_engine_core | |
| (EngineCore_DP13 pid=80458) raise e | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/vllm/attention/layer.py", line 611, in unified_attention_with_output | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 695, in run_engine_core | |
| (EngineCore_DP13 pid=80458) engine_core = DPEngineCoreProc(*args, **kwargs) | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 965, in __init__ | |
| (EngineCore_DP13 pid=80458) super().__init__(vllm_config, local_client, handshake_address, | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 498, in __init__ | |
| (EngineCore_DP13 pid=80458) super().__init__(vllm_config, executor_class, log_stats, | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 92, in __init__ | |
| (EngineCore_DP13 pid=80458) self._initialize_kv_caches(vllm_config) | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 207, in _initialize_kv_caches | |
| (EngineCore_DP13 pid=80458) self.model_executor.initialize_from_config(kv_cache_configs) | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_piecewise_backend.py", line 96, in __call__ | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 75, in initialize_from_config | |
| (EngineCore_DP13 pid=80458) self.collective_rpc("compile_or_warm_up_model") | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc | |
| (EngineCore_DP13 pid=80458) return [run_method(self.driver_worker, method, args, kwargs)] | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) return self.forward_impl(hidden_states, router_logits) | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 3120, in run_method | |
| (EngineCore_DP13 pid=80458) return func(*args, **kwargs) | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 406, in compile_or_warm_up_model | |
| (EngineCore_DP13 pid=80458) self.model_runner._dummy_run( | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP13 pid=80458) return func(*args, **kwargs) | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 3132, in _dummy_run | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) outputs = self.model( | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1998, in forward_impl | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 387, in __call__ | |
| (EngineCore_DP14 pid=80459) return self.compiled_graph_for_general_shape(*args) | |
| (EngineCore_DP13 pid=80458) return self._run_ubatches(ubatch_metadata, self.model) | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP13 pid=80458) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 263, in _run_ubatches | |
| (EngineCore_DP13 pid=80458) result = torch.cat(sorted_results, dim=0) | |
| (EngineCore_DP13 pid=80458) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) return self.forward_impl_chunked(hidden_states, router_logits) | |
| (EngineCore_DP13 pid=80458) RuntimeError: torch.cat(): expected a non-empty list of Tensors | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/compiler_interface.py", line 518, in compiled_graph | |
| (EngineCore_DP10 pid=80455) return forward_call(*args, **kwargs) | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1971, in forward_impl_chunked | |
| (EngineCore_DP9 pid=80454) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap | |
| (EngineCore_DP9 pid=80454) self.run() | |
| (EngineCore_DP9 pid=80454) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run | |
| (EngineCore_DP14 pid=80459) graph_output = inductor_compiled_graph(list_args) | |
| (EngineCore_DP9 pid=80454) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 712, in run_engine_core | |
| (EngineCore_DP9 pid=80454) raise e | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 695, in run_engine_core | |
| (EngineCore_DP9 pid=80454) engine_core = DPEngineCoreProc(*args, **kwargs) | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 965, in __init__ | |
| (EngineCore_DP9 pid=80454) super().__init__(vllm_config, local_client, handshake_address, | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 498, in __init__ | |
| (EngineCore_DP9 pid=80454) super().__init__(vllm_config, executor_class, log_stats, | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 92, in __init__ | |
| (EngineCore_DP9 pid=80454) self._initialize_kv_caches(vllm_config) | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 584, in __call__ | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 207, in _initialize_kv_caches | |
| (EngineCore_DP9 pid=80454) self.model_executor.initialize_from_config(kv_cache_configs) | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 75, in initialize_from_config | |
| (EngineCore_DP9 pid=80454) self.collective_rpc("compile_or_warm_up_model") | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc | |
| (EngineCore_DP9 pid=80454) return [run_method(self.driver_worker, method, args, kwargs)] | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 3120, in run_method | |
| (EngineCore_DP9 pid=80454) return func(*args, **kwargs) | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 406, in compile_or_warm_up_model | |
| (EngineCore_DP9 pid=80454) self.model_runner._dummy_run( | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP9 pid=80454) return func(*args, **kwargs) | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 3132, in _dummy_run | |
| (EngineCore_DP9 pid=80454) outputs = self.model( | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 387, in __call__ | |
| (EngineCore_DP9 pid=80454) return self._run_ubatches(ubatch_metadata, self.model) | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP9 pid=80454) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 263, in _run_ubatches | |
| (EngineCore_DP9 pid=80454) result = torch.cat(sorted_results, dim=0) | |
| (EngineCore_DP9 pid=80454) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) return self.forward_impl_chunked(hidden_states, router_logits) | |
| (EngineCore_DP10 pid=80455) self.impl.forward(self, | |
| (EngineCore_DP9 pid=80454) RuntimeError: torch.cat(): expected a non-empty list of Tensors | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) return self.current_callable(inputs) | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1971, in forward_impl_chunked | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/attention/backends/mla/common.py", line 1537, in forward | |
| (EngineCore_DP15 pid=80460) process_chunk(chunk_start, | |
| (EngineCore_DP11 pid=80456) return forward_call(*args, **kwargs) | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1903, in process_chunk | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 929, in _fn | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) File "/root/.cache/vllm/torch_compile_cache/2256bad88c/rank_0_14/inductor_cache/mb/cmbxxvzn32p3patwd35docgkqlxuxkytbvg5xhyg6lcv6p73heqf.py", line 620, in call | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) File "<eval_with_key>.127", line 718, in forward | |
| (EngineCore_DP14 pid=80459) buf5 = torch.ops.vllm.moe_forward_shared.default(buf3, buf4, 'model.layers.3.mlp.experts') | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) process_chunk(chunk_start, | |
| (EngineCore_DP15 pid=80460) final_hidden_states = self.quant_method.apply( | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 829, in __call__ | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1903, in process_chunk | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/fp8.py", line 1036, in apply | |
| (EngineCore_DP14 pid=80459) return self._op(*args, **kwargs) | |
| (EngineCore_DP10 pid=80455) _ = torch.empty( | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) result = self.fused_experts( | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 2163, in moe_forward_shared | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) self.impl.forward(self, | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP10 pid=80455) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 8.00 GiB. GPU 0 has a total capacity of 178.36 GiB of which 2.10 GiB is free. Including non-PyTorch memory, this process has 176.24 GiB memory in use. Of the allocated memory 145.20 GiB is allocated by PyTorch, with 2.14 GiB allocated in private pools (e.g., CUDA Graphs), and 8.50 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables) | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/attention/backends/mla/common.py", line 1537, in forward | |
| (EngineCore_DP12 pid=80457) final_hidden_states = self.quant_method.apply( | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 121, in __call__ | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/fp8.py", line 1036, in apply | |
| (EngineCore_DP15 pid=80460) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP12 pid=80457) result = self.fused_experts( | |
| (EngineCore_DP10 pid=80455) return fn(*args, **kwargs) | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 848, in call_wrapped | |
| (EngineCore_DP14 pid=80459) return self.forward_impl(hidden_states, router_logits) | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1998, in forward_impl | |
| (EngineCore_DP10 pid=80455) return self._wrapped_call(self, *args, **kwargs) | |
| (EngineCore_DP15 pid=80460) return forward_call(*args, **kwargs) | |
| (EngineCore_DP12 pid=80457) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP11 pid=80456) _ = torch.empty( | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1027, in forward | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 424, in __call__ | |
| (EngineCore_DP11 pid=80456) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 8.00 GiB. GPU 0 has a total capacity of 178.36 GiB of which 2.10 GiB is free. Including non-PyTorch memory, this process has 176.24 GiB memory in use. Of the allocated memory 145.20 GiB is allocated by PyTorch, with 2.14 GiB allocated in private pools (e.g., CUDA Graphs), and 8.50 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables) | |
| (EngineCore_DP14 pid=80459) return self.forward_impl_chunked(hidden_states, router_logits) | |
| (EngineCore_DP11 pid=80456) return self.runnable(*args, **kwargs) | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1971, in forward_impl_chunked | |
| (EngineCore_DP15 pid=80460) dbo_register_recv_hook(hook) | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_piecewise_backend.py", line 96, in __call__ | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/ubatching.py", line 184, in dbo_register_recv_hook | |
| (EngineCore_DP10 pid=80455) raise e | |
| (EngineCore_DP12 pid=80457) return forward_call(*args, **kwargs) | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 411, in __call__ | |
| (EngineCore_DP15 pid=80460) next_ctx.recv_hook = recv_hook | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1027, in forward | |
| (EngineCore_DP15 pid=80460) AttributeError: 'NoneType' object has no attribute 'recv_hook' | |
| (EngineCore_DP11 pid=80456) return self.compiled_graph_for_general_shape(*args) | |
| (EngineCore_DP10 pid=80455) return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/compiler_interface.py", line 518, in compiled_graph | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP14 pid=80459) process_chunk(chunk_start, | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1903, in process_chunk | |
| (EngineCore_DP12 pid=80457) dbo_register_recv_hook(hook) | |
| (EngineCore_DP12 pid=80457) ERROR 09-26 08:51:47 [core.py:708] EngineCore failed to start. | |
| (EngineCore_DP12 pid=80457) ERROR 09-26 08:51:47 [core.py:708] Traceback (most recent call last): | |
| (EngineCore_DP12 pid=80457) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 695, in run_engine_core | |
| (EngineCore_DP12 pid=80457) ERROR 09-26 08:51:47 [core.py:708] engine_core = DPEngineCoreProc(*args, **kwargs) | |
| (EngineCore_DP12 pid=80457) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 965, in __init__ | |
| (EngineCore_DP12 pid=80457) ERROR 09-26 08:51:47 [core.py:708] super().__init__(vllm_config, local_client, handshake_address, | |
| (EngineCore_DP12 pid=80457) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 498, in __init__ | |
| (EngineCore_DP12 pid=80457) ERROR 09-26 08:51:47 [core.py:708] super().__init__(vllm_config, executor_class, log_stats, | |
| (EngineCore_DP12 pid=80457) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 92, in __init__ | |
| (EngineCore_DP12 pid=80457) ERROR 09-26 08:51:47 [core.py:708] self._initialize_kv_caches(vllm_config) | |
| (EngineCore_DP12 pid=80457) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 207, in _initialize_kv_caches | |
| (EngineCore_DP12 pid=80457) ERROR 09-26 08:51:47 [core.py:708] self.model_executor.initialize_from_config(kv_cache_configs) | |
| (EngineCore_DP12 pid=80457) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 75, in initialize_from_config | |
| (EngineCore_DP12 pid=80457) ERROR 09-26 08:51:47 [core.py:708] self.collective_rpc("compile_or_warm_up_model") | |
| (EngineCore_DP12 pid=80457) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc | |
| (EngineCore_DP12 pid=80457) ERROR 09-26 08:51:47 [core.py:708] return [run_method(self.driver_worker, method, args, kwargs)] | |
| (EngineCore_DP12 pid=80457) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 3120, in run_method | |
| (EngineCore_DP12 pid=80457) ERROR 09-26 08:51:47 [core.py:708] return func(*args, **kwargs) | |
| (EngineCore_DP12 pid=80457) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 406, in compile_or_warm_up_model | |
| (EngineCore_DP12 pid=80457) ERROR 09-26 08:51:47 [core.py:708] self.model_runner._dummy_run( | |
| (EngineCore_DP12 pid=80457) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP12 pid=80457) ERROR 09-26 08:51:47 [core.py:708] return func(*args, **kwargs) | |
| (EngineCore_DP12 pid=80457) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 3132, in _dummy_run | |
| (EngineCore_DP12 pid=80457) ERROR 09-26 08:51:47 [core.py:708] outputs = self.model( | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/ubatching.py", line 184, in dbo_register_recv_hook | |
| (EngineCore_DP12 pid=80457) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 387, in __call__ | |
| (EngineCore_DP12 pid=80457) ERROR 09-26 08:51:47 [core.py:708] return self._run_ubatches(ubatch_metadata, self.model) | |
| (EngineCore_DP12 pid=80457) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 263, in _run_ubatches | |
| (EngineCore_DP12 pid=80457) ERROR 09-26 08:51:47 [core.py:708] result = torch.cat(sorted_results, dim=0) | |
| (EngineCore_DP12 pid=80457) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) ERROR 09-26 08:51:47 [core.py:708] RuntimeError: torch.cat(): expected a non-empty list of Tensors | |
| (EngineCore_DP15 pid=80460) ERROR 09-26 08:51:47 [core.py:708] EngineCore failed to start. | |
| (EngineCore_DP15 pid=80460) ERROR 09-26 08:51:47 [core.py:708] Traceback (most recent call last): | |
| (EngineCore_DP15 pid=80460) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 695, in run_engine_core | |
| (EngineCore_DP15 pid=80460) ERROR 09-26 08:51:47 [core.py:708] engine_core = DPEngineCoreProc(*args, **kwargs) | |
| (EngineCore_DP15 pid=80460) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 965, in __init__ | |
| (EngineCore_DP15 pid=80460) ERROR 09-26 08:51:47 [core.py:708] super().__init__(vllm_config, local_client, handshake_address, | |
| (EngineCore_DP15 pid=80460) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 498, in __init__ | |
| (EngineCore_DP15 pid=80460) ERROR 09-26 08:51:47 [core.py:708] super().__init__(vllm_config, executor_class, log_stats, | |
| (EngineCore_DP15 pid=80460) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 92, in __init__ | |
| (EngineCore_DP15 pid=80460) ERROR 09-26 08:51:47 [core.py:708] self._initialize_kv_caches(vllm_config) | |
| (EngineCore_DP15 pid=80460) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 207, in _initialize_kv_caches | |
| (EngineCore_DP15 pid=80460) ERROR 09-26 08:51:47 [core.py:708] self.model_executor.initialize_from_config(kv_cache_configs) | |
| (EngineCore_DP15 pid=80460) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 75, in initialize_from_config | |
| (EngineCore_DP15 pid=80460) ERROR 09-26 08:51:47 [core.py:708] self.collective_rpc("compile_or_warm_up_model") | |
| (EngineCore_DP15 pid=80460) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc | |
| (EngineCore_DP15 pid=80460) ERROR 09-26 08:51:47 [core.py:708] return [run_method(self.driver_worker, method, args, kwargs)] | |
| (EngineCore_DP15 pid=80460) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 3120, in run_method | |
| (EngineCore_DP15 pid=80460) ERROR 09-26 08:51:47 [core.py:708] return func(*args, **kwargs) | |
| (EngineCore_DP15 pid=80460) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 406, in compile_or_warm_up_model | |
| (EngineCore_DP15 pid=80460) ERROR 09-26 08:51:47 [core.py:708] self.model_runner._dummy_run( | |
| (EngineCore_DP15 pid=80460) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP15 pid=80460) ERROR 09-26 08:51:47 [core.py:708] return func(*args, **kwargs) | |
| (EngineCore_DP15 pid=80460) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 3132, in _dummy_run | |
| (EngineCore_DP15 pid=80460) ERROR 09-26 08:51:47 [core.py:708] outputs = self.model( | |
| (EngineCore_DP15 pid=80460) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 387, in __call__ | |
| (EngineCore_DP15 pid=80460) ERROR 09-26 08:51:47 [core.py:708] return self._run_ubatches(ubatch_metadata, self.model) | |
| (EngineCore_DP15 pid=80460) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 263, in _run_ubatches | |
| (EngineCore_DP15 pid=80460) ERROR 09-26 08:51:47 [core.py:708] result = torch.cat(sorted_results, dim=0) | |
| (EngineCore_DP15 pid=80460) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) ERROR 09-26 08:51:47 [core.py:708] RuntimeError: torch.cat(): expected a non-empty list of Tensors | |
| (EngineCore_DP12 pid=80457) next_ctx.recv_hook = recv_hook | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) AttributeError: 'NoneType' object has no attribute 'recv_hook' | |
| (EngineCore_DP11 pid=80456) graph_output = inductor_compiled_graph(list_args) | |
| (EngineCore_DP10 pid=80455) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 584, in __call__ | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP14 pid=80459) final_hidden_states = self.quant_method.apply( | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/fp8.py", line 1036, in apply | |
| (EngineCore_DP11 pid=80456) return self.current_callable(inputs) | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) File "/root/.cache/vllm/torch_compile_cache/2256bad88c/rank_0_11/inductor_cache/hq/chqj27nb2yidxldowt52rr42qhd653iywffvped725a72sqftflk.py", line 620, in call | |
| (EngineCore_DP14 pid=80459) result = self.fused_experts( | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP10 pid=80455) return forward_call(*args, **kwargs) | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) File "<eval_with_key>.127", line 718, in forward | |
| (EngineCore_DP11 pid=80456) buf5 = torch.ops.vllm.moe_forward_shared.default(buf3, buf4, 'model.layers.3.mlp.experts') | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 121, in __call__ | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) return self.runnable(*args, **kwargs) | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 829, in __call__ | |
| (EngineCore_DP14 pid=80459) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_piecewise_backend.py", line 96, in __call__ | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP10 pid=80455) return self.compiled_graph_for_general_shape(*args) | |
| (EngineCore_DP11 pid=80456) return self._op(*args, **kwargs) | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/compiler_interface.py", line 518, in compiled_graph | |
| (EngineCore_DP14 pid=80459) ERROR 09-26 08:51:47 [core.py:708] EngineCore failed to start. | |
| (EngineCore_DP14 pid=80459) ERROR 09-26 08:51:47 [core.py:708] Traceback (most recent call last): | |
| (EngineCore_DP14 pid=80459) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 695, in run_engine_core | |
| (EngineCore_DP14 pid=80459) ERROR 09-26 08:51:47 [core.py:708] engine_core = DPEngineCoreProc(*args, **kwargs) | |
| (EngineCore_DP14 pid=80459) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 965, in __init__ | |
| (EngineCore_DP14 pid=80459) ERROR 09-26 08:51:47 [core.py:708] super().__init__(vllm_config, local_client, handshake_address, | |
| (EngineCore_DP14 pid=80459) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 498, in __init__ | |
| (EngineCore_DP14 pid=80459) ERROR 09-26 08:51:47 [core.py:708] super().__init__(vllm_config, executor_class, log_stats, | |
| (EngineCore_DP14 pid=80459) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 92, in __init__ | |
| (EngineCore_DP14 pid=80459) ERROR 09-26 08:51:47 [core.py:708] self._initialize_kv_caches(vllm_config) | |
| (EngineCore_DP14 pid=80459) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 207, in _initialize_kv_caches | |
| (EngineCore_DP14 pid=80459) ERROR 09-26 08:51:47 [core.py:708] self.model_executor.initialize_from_config(kv_cache_configs) | |
| (EngineCore_DP14 pid=80459) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 75, in initialize_from_config | |
| (EngineCore_DP14 pid=80459) ERROR 09-26 08:51:47 [core.py:708] self.collective_rpc("compile_or_warm_up_model") | |
| (EngineCore_DP14 pid=80459) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc | |
| (EngineCore_DP14 pid=80459) ERROR 09-26 08:51:47 [core.py:708] return [run_method(self.driver_worker, method, args, kwargs)] | |
| (EngineCore_DP14 pid=80459) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 3120, in run_method | |
| (EngineCore_DP14 pid=80459) ERROR 09-26 08:51:47 [core.py:708] return func(*args, **kwargs) | |
| (EngineCore_DP14 pid=80459) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 406, in compile_or_warm_up_model | |
| (EngineCore_DP14 pid=80459) ERROR 09-26 08:51:47 [core.py:708] self.model_runner._dummy_run( | |
| (EngineCore_DP14 pid=80459) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP14 pid=80459) ERROR 09-26 08:51:47 [core.py:708] return func(*args, **kwargs) | |
| (EngineCore_DP14 pid=80459) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 3132, in _dummy_run | |
| (EngineCore_DP14 pid=80459) ERROR 09-26 08:51:47 [core.py:708] outputs = self.model( | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 2163, in moe_forward_shared | |
| (EngineCore_DP10 pid=80455) graph_output = inductor_compiled_graph(list_args) | |
| (EngineCore_DP14 pid=80459) return forward_call(*args, **kwargs) | |
| (EngineCore_DP14 pid=80459) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 387, in __call__ | |
| (EngineCore_DP14 pid=80459) ERROR 09-26 08:51:47 [core.py:708] return self._run_ubatches(ubatch_metadata, self.model) | |
| (EngineCore_DP14 pid=80459) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 263, in _run_ubatches | |
| (EngineCore_DP14 pid=80459) ERROR 09-26 08:51:47 [core.py:708] result = torch.cat(sorted_results, dim=0) | |
| (EngineCore_DP14 pid=80459) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) ERROR 09-26 08:51:47 [core.py:708] RuntimeError: torch.cat(): expected a non-empty list of Tensors | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 584, in __call__ | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1027, in forward | |
| (EngineCore_DP10 pid=80455) return self.current_callable(inputs) | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) File "/root/.cache/vllm/torch_compile_cache/2256bad88c/rank_0_10/inductor_cache/qv/cqvawgu5evibcb7arqadczzmaealan5bfpkaliplmhqdfjdxzp6x.py", line 620, in call | |
| (EngineCore_DP14 pid=80459) dbo_register_recv_hook(hook) | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/ubatching.py", line 184, in dbo_register_recv_hook | |
| (EngineCore_DP11 pid=80456) return self.forward_impl(hidden_states, router_logits) | |
| (EngineCore_DP14 pid=80459) next_ctx.recv_hook = recv_hook | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1998, in forward_impl | |
| (EngineCore_DP14 pid=80459) AttributeError: 'NoneType' object has no attribute 'recv_hook' | |
| (EngineCore_DP10 pid=80455) buf5 = torch.ops.vllm.moe_forward_shared.default(buf3, buf4, 'model.layers.3.mlp.experts') | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 829, in __call__ | |
| (EngineCore_DP10 pid=80455) return self._op(*args, **kwargs) | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 2163, in moe_forward_shared | |
| (EngineCore_DP11 pid=80456) return self.forward_impl_chunked(hidden_states, router_logits) | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1971, in forward_impl_chunked | |
| (EngineCore_DP11 pid=80456) ERROR 09-26 08:51:47 [core.py:708] EngineCore failed to start. | |
| (EngineCore_DP11 pid=80456) ERROR 09-26 08:51:47 [core.py:708] Traceback (most recent call last): | |
| (EngineCore_DP11 pid=80456) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 695, in run_engine_core | |
| (EngineCore_DP11 pid=80456) ERROR 09-26 08:51:47 [core.py:708] engine_core = DPEngineCoreProc(*args, **kwargs) | |
| (EngineCore_DP11 pid=80456) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 965, in __init__ | |
| (EngineCore_DP11 pid=80456) ERROR 09-26 08:51:47 [core.py:708] super().__init__(vllm_config, local_client, handshake_address, | |
| (EngineCore_DP11 pid=80456) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 498, in __init__ | |
| (EngineCore_DP11 pid=80456) ERROR 09-26 08:51:47 [core.py:708] super().__init__(vllm_config, executor_class, log_stats, | |
| (EngineCore_DP11 pid=80456) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 92, in __init__ | |
| (EngineCore_DP11 pid=80456) ERROR 09-26 08:51:47 [core.py:708] self._initialize_kv_caches(vllm_config) | |
| (EngineCore_DP11 pid=80456) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 207, in _initialize_kv_caches | |
| (EngineCore_DP11 pid=80456) ERROR 09-26 08:51:47 [core.py:708] self.model_executor.initialize_from_config(kv_cache_configs) | |
| (EngineCore_DP11 pid=80456) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 75, in initialize_from_config | |
| (EngineCore_DP11 pid=80456) ERROR 09-26 08:51:47 [core.py:708] self.collective_rpc("compile_or_warm_up_model") | |
| (EngineCore_DP11 pid=80456) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc | |
| (EngineCore_DP11 pid=80456) ERROR 09-26 08:51:47 [core.py:708] return [run_method(self.driver_worker, method, args, kwargs)] | |
| (EngineCore_DP11 pid=80456) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 3120, in run_method | |
| (EngineCore_DP11 pid=80456) ERROR 09-26 08:51:47 [core.py:708] return func(*args, **kwargs) | |
| (EngineCore_DP11 pid=80456) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 406, in compile_or_warm_up_model | |
| (EngineCore_DP11 pid=80456) ERROR 09-26 08:51:47 [core.py:708] self.model_runner._dummy_run( | |
| (EngineCore_DP11 pid=80456) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP11 pid=80456) ERROR 09-26 08:51:47 [core.py:708] return func(*args, **kwargs) | |
| (EngineCore_DP11 pid=80456) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 3132, in _dummy_run | |
| (EngineCore_DP11 pid=80456) ERROR 09-26 08:51:47 [core.py:708] outputs = self.model( | |
| (EngineCore_DP10 pid=80455) return self.forward_impl(hidden_states, router_logits) | |
| (EngineCore_DP11 pid=80456) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 387, in __call__ | |
| (EngineCore_DP11 pid=80456) ERROR 09-26 08:51:47 [core.py:708] return self._run_ubatches(ubatch_metadata, self.model) | |
| (EngineCore_DP11 pid=80456) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 263, in _run_ubatches | |
| (EngineCore_DP11 pid=80456) ERROR 09-26 08:51:47 [core.py:708] result = torch.cat(sorted_results, dim=0) | |
| (EngineCore_DP11 pid=80456) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) ERROR 09-26 08:51:47 [core.py:708] RuntimeError: torch.cat(): expected a non-empty list of Tensors | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1998, in forward_impl | |
| (EngineCore_DP11 pid=80456) process_chunk(chunk_start, | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1903, in process_chunk | |
| (EngineCore_DP10 pid=80455) ERROR 09-26 08:51:47 [core.py:708] EngineCore failed to start. | |
| (EngineCore_DP10 pid=80455) ERROR 09-26 08:51:47 [core.py:708] Traceback (most recent call last): | |
| (EngineCore_DP10 pid=80455) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 695, in run_engine_core | |
| (EngineCore_DP10 pid=80455) ERROR 09-26 08:51:47 [core.py:708] engine_core = DPEngineCoreProc(*args, **kwargs) | |
| (EngineCore_DP10 pid=80455) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 965, in __init__ | |
| (EngineCore_DP10 pid=80455) ERROR 09-26 08:51:47 [core.py:708] super().__init__(vllm_config, local_client, handshake_address, | |
| (EngineCore_DP10 pid=80455) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 498, in __init__ | |
| (EngineCore_DP10 pid=80455) ERROR 09-26 08:51:47 [core.py:708] super().__init__(vllm_config, executor_class, log_stats, | |
| (EngineCore_DP10 pid=80455) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 92, in __init__ | |
| (EngineCore_DP10 pid=80455) ERROR 09-26 08:51:47 [core.py:708] self._initialize_kv_caches(vllm_config) | |
| (EngineCore_DP10 pid=80455) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 207, in _initialize_kv_caches | |
| (EngineCore_DP10 pid=80455) ERROR 09-26 08:51:47 [core.py:708] self.model_executor.initialize_from_config(kv_cache_configs) | |
| (EngineCore_DP10 pid=80455) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 75, in initialize_from_config | |
| (EngineCore_DP10 pid=80455) ERROR 09-26 08:51:47 [core.py:708] self.collective_rpc("compile_or_warm_up_model") | |
| (EngineCore_DP10 pid=80455) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc | |
| (EngineCore_DP10 pid=80455) ERROR 09-26 08:51:47 [core.py:708] return [run_method(self.driver_worker, method, args, kwargs)] | |
| (EngineCore_DP10 pid=80455) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 3120, in run_method | |
| (EngineCore_DP10 pid=80455) ERROR 09-26 08:51:47 [core.py:708] return func(*args, **kwargs) | |
| (EngineCore_DP10 pid=80455) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 406, in compile_or_warm_up_model | |
| (EngineCore_DP10 pid=80455) ERROR 09-26 08:51:47 [core.py:708] self.model_runner._dummy_run( | |
| (EngineCore_DP10 pid=80455) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP10 pid=80455) ERROR 09-26 08:51:47 [core.py:708] return func(*args, **kwargs) | |
| (EngineCore_DP10 pid=80455) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 3132, in _dummy_run | |
| (EngineCore_DP10 pid=80455) ERROR 09-26 08:51:47 [core.py:708] outputs = self.model( | |
| (EngineCore_DP10 pid=80455) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 387, in __call__ | |
| (EngineCore_DP10 pid=80455) ERROR 09-26 08:51:47 [core.py:708] return self._run_ubatches(ubatch_metadata, self.model) | |
| (EngineCore_DP10 pid=80455) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) ERROR 09-26 08:51:47 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 263, in _run_ubatches | |
| (EngineCore_DP10 pid=80455) ERROR 09-26 08:51:47 [core.py:708] result = torch.cat(sorted_results, dim=0) | |
| (EngineCore_DP10 pid=80455) ERROR 09-26 08:51:47 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) ERROR 09-26 08:51:47 [core.py:708] RuntimeError: torch.cat(): expected a non-empty list of Tensors | |
| (EngineCore_DP10 pid=80455) return self.forward_impl_chunked(hidden_states, router_logits) | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1971, in forward_impl_chunked | |
| (EngineCore_DP11 pid=80456) final_hidden_states = self.quant_method.apply( | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/fp8.py", line 1036, in apply | |
| (EngineCore_DP12 pid=80457) Process EngineCore_DP12: | |
| (EngineCore_DP11 pid=80456) result = self.fused_experts( | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP10 pid=80455) process_chunk(chunk_start, | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1903, in process_chunk | |
| (EngineCore_DP15 pid=80460) Process EngineCore_DP15: | |
| (EngineCore_DP11 pid=80456) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP10 pid=80455) final_hidden_states = self.quant_method.apply( | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/fp8.py", line 1036, in apply | |
| (EngineCore_DP10 pid=80455) result = self.fused_experts( | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
| (EngineCore_DP11 pid=80456) return forward_call(*args, **kwargs) | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1027, in forward | |
| (EngineCore_DP12 pid=80457) Traceback (most recent call last): | |
| (EngineCore_DP11 pid=80456) dbo_register_recv_hook(hook) | |
| (EngineCore_DP15 pid=80460) Traceback (most recent call last): | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/ubatching.py", line 184, in dbo_register_recv_hook | |
| (EngineCore_DP10 pid=80455) return self._call_impl(*args, **kwargs) | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
| (EngineCore_DP11 pid=80456) next_ctx.recv_hook = recv_hook | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) AttributeError: 'NoneType' object has no attribute 'recv_hook' | |
| (EngineCore_DP12 pid=80457) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap | |
| (EngineCore_DP12 pid=80457) self.run() | |
| (EngineCore_DP12 pid=80457) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run | |
| (EngineCore_DP12 pid=80457) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 712, in run_engine_core | |
| (EngineCore_DP12 pid=80457) raise e | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 695, in run_engine_core | |
| (EngineCore_DP12 pid=80457) engine_core = DPEngineCoreProc(*args, **kwargs) | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 965, in __init__ | |
| (EngineCore_DP12 pid=80457) super().__init__(vllm_config, local_client, handshake_address, | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 498, in __init__ | |
| (EngineCore_DP12 pid=80457) super().__init__(vllm_config, executor_class, log_stats, | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 92, in __init__ | |
| (EngineCore_DP12 pid=80457) self._initialize_kv_caches(vllm_config) | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 207, in _initialize_kv_caches | |
| (EngineCore_DP12 pid=80457) self.model_executor.initialize_from_config(kv_cache_configs) | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 75, in initialize_from_config | |
| (EngineCore_DP12 pid=80457) self.collective_rpc("compile_or_warm_up_model") | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc | |
| (EngineCore_DP12 pid=80457) return [run_method(self.driver_worker, method, args, kwargs)] | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 3120, in run_method | |
| (EngineCore_DP12 pid=80457) return func(*args, **kwargs) | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 406, in compile_or_warm_up_model | |
| (EngineCore_DP12 pid=80457) self.model_runner._dummy_run( | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP12 pid=80457) return func(*args, **kwargs) | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 3132, in _dummy_run | |
| (EngineCore_DP12 pid=80457) outputs = self.model( | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 387, in __call__ | |
| (EngineCore_DP12 pid=80457) return self._run_ubatches(ubatch_metadata, self.model) | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 263, in _run_ubatches | |
| (EngineCore_DP12 pid=80457) result = torch.cat(sorted_results, dim=0) | |
| (EngineCore_DP12 pid=80457) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP12 pid=80457) RuntimeError: torch.cat(): expected a non-empty list of Tensors | |
| (EngineCore_DP15 pid=80460) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap | |
| (EngineCore_DP15 pid=80460) self.run() | |
| (EngineCore_DP15 pid=80460) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run | |
| (EngineCore_DP15 pid=80460) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 712, in run_engine_core | |
| (EngineCore_DP15 pid=80460) raise e | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 695, in run_engine_core | |
| (EngineCore_DP15 pid=80460) engine_core = DPEngineCoreProc(*args, **kwargs) | |
| (EngineCore_DP10 pid=80455) return forward_call(*args, **kwargs) | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 965, in __init__ | |
| (EngineCore_DP15 pid=80460) super().__init__(vllm_config, local_client, handshake_address, | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 498, in __init__ | |
| (EngineCore_DP15 pid=80460) super().__init__(vllm_config, executor_class, log_stats, | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 92, in __init__ | |
| (EngineCore_DP15 pid=80460) self._initialize_kv_caches(vllm_config) | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 207, in _initialize_kv_caches | |
| (EngineCore_DP15 pid=80460) self.model_executor.initialize_from_config(kv_cache_configs) | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 75, in initialize_from_config | |
| (EngineCore_DP15 pid=80460) self.collective_rpc("compile_or_warm_up_model") | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1027, in forward | |
| (EngineCore_DP15 pid=80460) return [run_method(self.driver_worker, method, args, kwargs)] | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 3120, in run_method | |
| (EngineCore_DP15 pid=80460) return func(*args, **kwargs) | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 406, in compile_or_warm_up_model | |
| (EngineCore_DP15 pid=80460) self.model_runner._dummy_run( | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP15 pid=80460) return func(*args, **kwargs) | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 3132, in _dummy_run | |
| (EngineCore_DP15 pid=80460) outputs = self.model( | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 387, in __call__ | |
| (EngineCore_DP15 pid=80460) return self._run_ubatches(ubatch_metadata, self.model) | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 263, in _run_ubatches | |
| (EngineCore_DP15 pid=80460) result = torch.cat(sorted_results, dim=0) | |
| (EngineCore_DP15 pid=80460) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP15 pid=80460) RuntimeError: torch.cat(): expected a non-empty list of Tensors | |
| (EngineCore_DP14 pid=80459) Process EngineCore_DP14: | |
| (EngineCore_DP10 pid=80455) dbo_register_recv_hook(hook) | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/ubatching.py", line 184, in dbo_register_recv_hook | |
| (EngineCore_DP10 pid=80455) next_ctx.recv_hook = recv_hook | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) AttributeError: 'NoneType' object has no attribute 'recv_hook' | |
| (EngineCore_DP14 pid=80459) Traceback (most recent call last): | |
| (EngineCore_DP14 pid=80459) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap | |
| (EngineCore_DP14 pid=80459) self.run() | |
| (EngineCore_DP14 pid=80459) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run | |
| (EngineCore_DP14 pid=80459) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 712, in run_engine_core | |
| (EngineCore_DP14 pid=80459) raise e | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 695, in run_engine_core | |
| (EngineCore_DP14 pid=80459) engine_core = DPEngineCoreProc(*args, **kwargs) | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 965, in __init__ | |
| (EngineCore_DP14 pid=80459) super().__init__(vllm_config, local_client, handshake_address, | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 498, in __init__ | |
| (EngineCore_DP14 pid=80459) super().__init__(vllm_config, executor_class, log_stats, | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 92, in __init__ | |
| (EngineCore_DP14 pid=80459) self._initialize_kv_caches(vllm_config) | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 207, in _initialize_kv_caches | |
| (EngineCore_DP14 pid=80459) self.model_executor.initialize_from_config(kv_cache_configs) | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 75, in initialize_from_config | |
| (EngineCore_DP14 pid=80459) self.collective_rpc("compile_or_warm_up_model") | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc | |
| (EngineCore_DP14 pid=80459) return [run_method(self.driver_worker, method, args, kwargs)] | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 3120, in run_method | |
| (EngineCore_DP14 pid=80459) return func(*args, **kwargs) | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 406, in compile_or_warm_up_model | |
| (EngineCore_DP14 pid=80459) self.model_runner._dummy_run( | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP14 pid=80459) return func(*args, **kwargs) | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 3132, in _dummy_run | |
| (EngineCore_DP14 pid=80459) outputs = self.model( | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 387, in __call__ | |
| (EngineCore_DP14 pid=80459) return self._run_ubatches(ubatch_metadata, self.model) | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 263, in _run_ubatches | |
| (EngineCore_DP14 pid=80459) result = torch.cat(sorted_results, dim=0) | |
| (EngineCore_DP14 pid=80459) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP14 pid=80459) RuntimeError: torch.cat(): expected a non-empty list of Tensors | |
| (EngineCore_DP11 pid=80456) Process EngineCore_DP11: | |
| (EngineCore_DP10 pid=80455) Process EngineCore_DP10: | |
| (EngineCore_DP11 pid=80456) Traceback (most recent call last): | |
| (EngineCore_DP11 pid=80456) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap | |
| (EngineCore_DP11 pid=80456) self.run() | |
| (EngineCore_DP11 pid=80456) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run | |
| (EngineCore_DP11 pid=80456) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 712, in run_engine_core | |
| (EngineCore_DP11 pid=80456) raise e | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 695, in run_engine_core | |
| (EngineCore_DP11 pid=80456) engine_core = DPEngineCoreProc(*args, **kwargs) | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 965, in __init__ | |
| (EngineCore_DP11 pid=80456) super().__init__(vllm_config, local_client, handshake_address, | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 498, in __init__ | |
| (EngineCore_DP11 pid=80456) super().__init__(vllm_config, executor_class, log_stats, | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 92, in __init__ | |
| (EngineCore_DP11 pid=80456) self._initialize_kv_caches(vllm_config) | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 207, in _initialize_kv_caches | |
| (EngineCore_DP11 pid=80456) self.model_executor.initialize_from_config(kv_cache_configs) | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 75, in initialize_from_config | |
| (EngineCore_DP11 pid=80456) self.collective_rpc("compile_or_warm_up_model") | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc | |
| (EngineCore_DP11 pid=80456) return [run_method(self.driver_worker, method, args, kwargs)] | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 3120, in run_method | |
| (EngineCore_DP11 pid=80456) return func(*args, **kwargs) | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 406, in compile_or_warm_up_model | |
| (EngineCore_DP11 pid=80456) self.model_runner._dummy_run( | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP11 pid=80456) return func(*args, **kwargs) | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 3132, in _dummy_run | |
| (EngineCore_DP11 pid=80456) outputs = self.model( | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 387, in __call__ | |
| (EngineCore_DP11 pid=80456) return self._run_ubatches(ubatch_metadata, self.model) | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 263, in _run_ubatches | |
| (EngineCore_DP11 pid=80456) result = torch.cat(sorted_results, dim=0) | |
| (EngineCore_DP11 pid=80456) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP11 pid=80456) RuntimeError: torch.cat(): expected a non-empty list of Tensors | |
| (EngineCore_DP10 pid=80455) Traceback (most recent call last): | |
| (EngineCore_DP10 pid=80455) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap | |
| (EngineCore_DP10 pid=80455) self.run() | |
| (EngineCore_DP10 pid=80455) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run | |
| (EngineCore_DP10 pid=80455) self._target(*self._args, **self._kwargs) | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 712, in run_engine_core | |
| (EngineCore_DP10 pid=80455) raise e | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 695, in run_engine_core | |
| (EngineCore_DP10 pid=80455) engine_core = DPEngineCoreProc(*args, **kwargs) | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 965, in __init__ | |
| (EngineCore_DP10 pid=80455) super().__init__(vllm_config, local_client, handshake_address, | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 498, in __init__ | |
| (EngineCore_DP10 pid=80455) super().__init__(vllm_config, executor_class, log_stats, | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 92, in __init__ | |
| (EngineCore_DP10 pid=80455) self._initialize_kv_caches(vllm_config) | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 207, in _initialize_kv_caches | |
| (EngineCore_DP10 pid=80455) self.model_executor.initialize_from_config(kv_cache_configs) | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 75, in initialize_from_config | |
| (EngineCore_DP10 pid=80455) self.collective_rpc("compile_or_warm_up_model") | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc | |
| (EngineCore_DP10 pid=80455) return [run_method(self.driver_worker, method, args, kwargs)] | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 3120, in run_method | |
| (EngineCore_DP10 pid=80455) return func(*args, **kwargs) | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 406, in compile_or_warm_up_model | |
| (EngineCore_DP10 pid=80455) self.model_runner._dummy_run( | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
| (EngineCore_DP10 pid=80455) return func(*args, **kwargs) | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 3132, in _dummy_run | |
| (EngineCore_DP10 pid=80455) outputs = self.model( | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 387, in __call__ | |
| (EngineCore_DP10 pid=80455) return self._run_ubatches(ubatch_metadata, self.model) | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_ubatch_wrapper.py", line 263, in _run_ubatches | |
| (EngineCore_DP10 pid=80455) result = torch.cat(sorted_results, dim=0) | |
| (EngineCore_DP10 pid=80455) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP10 pid=80455) RuntimeError: torch.cat(): expected a non-empty list of Tensors | |
| [rank15]:[W926 08:51:48.063156927 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) | |
| [rank3]:[W926 08:51:48.413411483 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) | |
| [rank8]:[W926 08:51:48.123121933 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) | |
| [rank5]:[W926 08:51:48.452768152 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) | |
| [rank2]:[W926 08:51:48.464782528 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) | |
| [rank9]:[W926 08:51:49.186449406 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) | |
| [rank0]:[W926 08:51:49.513193401 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) | |
| [rank7]:[W926 08:51:49.546589145 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) | |
| [rank10]:[W926 08:51:49.231472370 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) | |
| [rank6]:[W926 08:51:49.594014921 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) | |
| [rank1]:[W926 08:51:49.608585279 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) | |
| [rank12]:[W926 08:51:49.295293051 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) | |
| [rank4]:[W926 08:51:49.630535240 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) | |
| [rank11]:[W926 08:51:49.417741295 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) | |
| [rank13]:[W926 08:51:49.458375329 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) | |
| [rank14]:[W926 08:51:49.475619028 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) | |
| (periodic-mono) mac :: ~/dist/gcp_benchmarking % E0926 08:51:52.719141 80453 init.cc:229] grpc_wait_for_shutdown_with_timeout() timed out. | |
| E0926 08:51:52.711594 82744 init.cc:229] grpc_wait_for_shutdown_with_timeout() timed out. | |
| E0926 08:51:52.785455 82741 init.cc:229] grpc_wait_for_shutdown_with_timeout() timed out. |
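Every failing rank other than DP10 aborts at the same line of gpu_ubatch_wrapper.py (_run_ubatches), where sorted_results reaches torch.cat empty. A minimal sketch, assuming nothing beyond a stock PyTorch install and nothing vLLM-specific, reproduces that exact error signature:

import torch

# torch.cat() requires at least one tensor; an empty result list from the
# micro-batch (DBO/ubatch) path therefore fails before any output is built.
try:
    torch.cat([], dim=0)
except RuntimeError as err:
    print(err)  # -> torch.cat(): expected a non-empty list of Tensors

DP10 instead dies with AttributeError: 'NoneType' object has no attribute 'recv_hook' inside dbo_register_recv_hook, which reads as the peer ubatch context being absent on that rank; that is an interpretation of the traceback, not something the snippet above demonstrates.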