- Cluster: <cluster-name>
- Instance Type: g6e.2xlarge
- Source Image: public.ecr.aws/aws-containers/aiml/ray-2.43.0-py311-vllm0.7.3:latest
- Test Image: <account-id>.dkr.ecr.<region>.amazonaws.com/ray-vllm-soci:latest (the source image with a SOCI index added, pushed to private ECR; see the sketch below)
- Image Size: 7,976,980,513 bytes (~7.4 GB)
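One way to produce and push such an index is the soci CLI from awslabs/soci-snapshotter. A minimal sketch of that step, assuming the CLI is installed and the image has already been pulled into the local containerd content store:

```python
import subprocess

# Sketch: build and push a SOCI index for the test image.
# Assumes the soci CLI (awslabs/soci-snapshotter) is installed and the image
# already exists in the local containerd content store.
IMAGE = "<account-id>.dkr.ecr.<region>.amazonaws.com/ray-vllm-soci:latest"

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

run(["soci", "create", IMAGE])  # generate zTOCs and the SOCI index locally
run(["soci", "push", IMAGE])    # push the index artifacts to the same ECR repo
```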
The performance tests were conducted with the following Ray setup:
- Model Storage: Mistral-7B model pre-downloaded on FSx for Lustre filesystem
- Ray Head Node: Deployed on managed node group (non-GPU nodes with image already cached)
- Ray Worker Nodes: GPU nodes provisioned by Karpenter during test execution
- Image Pull Focus: Tests specifically measure GPU worker node provisioning and large container image pull times
This configuration isolates the performance impact of SOCI on GPU worker node startup, as the non-GPU nodes and model files are already available.
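For reference, the phase timings below can be reconstructed from Kubernetes events on the worker pod. A minimal sketch, assuming the kubernetes Python client and a placeholder pod name:

```python
from kubernetes import client, config

# Sketch: inspect scheduling / image-pull events for a Ray worker pod.
# Namespace and pod name are placeholders for this cluster's values.
config.load_kube_config()
v1 = client.CoreV1Api()

events = v1.list_namespaced_event(
    namespace="default",
    field_selector="involvedObject.name=<ray-worker-pod>",
)
# Kubelet's "Pulled" event message includes the measured pull duration,
# e.g. "Successfully pulled image ... in 4m42s".
for e in events.items:
    if e.reason in ("Scheduled", "Pulling", "Pulled", "Created", "Started"):
        print(e.last_timestamp, e.reason, e.message)
```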
Test Date (standard pull baseline): Mon Aug 11 19:38:00 CDT 2025 (clean run; no scheduling issues)
| Phase | Time | Notes |
|---|---|---|
| Head Pod Startup | 6s | Ray head pod initialization |
| Node Provisioning | 43s | Karpenter GPU node creation |
| Image Pull | 282s | Standard Docker pull |
| Service Initialization | 63s | Ray service becoming ready |
| Inference Readiness | 4s | First successful inference |
| Total Time to Inference | 401s | End-to-end wall-clock time |
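Readiness was confirmed with a single chat completion against the vLLM OpenAI-compatible endpoint. A minimal sketch of that probe, consistent with the responses recorded below (the service URL is a placeholder):

```python
import requests

# Sketch: first-inference readiness probe. The endpoint URL is a placeholder;
# the model path and max_tokens match the recorded responses.
URL = "http://<service-endpoint>:8000/v1/chat/completions"

payload = {
    "model": "/models/mistral-7b-v0-3",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 10,
}

resp = requests.post(URL, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```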
Inference verification response (standard pull):

```json
{
  "id": "chatcmpl-35a287e4-99fd-41b2-bc33-7606d1d41e56",
  "object": "chat.completion",
  "created": 1754957852,
  "model": "/models/mistral-7b-v0-3",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": " Hello! How can I help you today? Is"
    },
    "finish_reason": "length"
  }],
  "usage": {
    "prompt_tokens": 4,
    "total_tokens": 14,
    "completion_tokens": 10
  }
}
```

Test Date (SOCI pull): Mon Aug 11 19:29:06 CDT 2025
| Phase | Time | Notes |
|---|---|---|
| Head Pod Startup | 6s | Ray head pod initialization |
| Node Provisioning | 53s | Karpenter GPU node creation |
| Image Pull | 46s | SOCI lazy loading |
| Service Initialization | 104s | Ray service becoming ready |
| Inference Readiness | 5s | First successful inference |
| Total Time to Inference | 216s | End-to-end wall-clock time |
Inference verification response (SOCI pull):

```json
{
  "id": "chatcmpl-5b271c74-a94f-47a4-a070-dff660db0f66",
  "object": "chat.completion",
  "created": 1754958759,
  "model": "/models/mistral-7b-v0-3",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": " Hello! How can I help you today? Is"
    },
    "finish_reason": "length"
  }],
  "usage": {
    "prompt_tokens": 4,
    "total_tokens": 14,
    "completion_tokens": 10
  }
}
```

| Metric | SOCI | Standard | Improvement |
|---|---|---|---|
| Image Pull Time | 46s | 282s | 84% faster |
| Node Provisioning | 53s | 43s | 10s slower |
| Service Initialization | 104s | 63s | 41s slower |
| Total Time to Inference | 216s | 401s | 46% faster |
- SOCI achieved an 84% faster image pull: 46s vs 282s
- Overall time to inference was 46% faster: 216s vs 401s
- The same ~7.4 GB image was pulled in both tests
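As a quick sanity check, the headline percentages follow directly from the raw timings:

```python
# Recompute the headline improvements from the measured timings.
standard = {"pull": 282, "total": 401}
soci = {"pull": 46, "total": 216}

pull_gain = (standard["pull"] - soci["pull"]) / standard["pull"]
total_gain = (standard["total"] - soci["total"]) / standard["total"]

print(f"Image pull: {pull_gain:.0%} faster")          # 84% faster
print(f"Time to inference: {total_gain:.0%} faster")  # 46% faster
```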
- Lazy Loading: only the file ranges needed at startup are fetched eagerly; the rest of each layer streams in on demand
- Faster Container Startup: the container reaches the running state sooner
- Network Efficiency: less data is transferred during the startup window
- Consistent Performance: no scheduling issues were encountered in either run
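Note that lazy loading only takes effect when the node's containerd is configured to use the SOCI snapshotter. A minimal check, assuming the conventional proxy-plugin setup from soci-snapshotter (section name and socket path can vary by install):

```python
import pathlib

# Sketch: confirm a GPU node registers the SOCI snapshotter with containerd.
# soci-snapshotter is typically wired in as a containerd proxy plugin; the
# exact config layout depends on the AMI / bootstrap configuration.
cfg = pathlib.Path("/etc/containerd/config.toml").read_text()
configured = "proxy_plugins" in cfg and "soci" in cfg
print("SOCI snapshotter configured:", configured)
```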