SOCI vs Standard Time to Inference Performance Test Results

Test Environment

  • Cluster: <cluster-name>
  • Instance Type: g6e.2xlarge
  • Source Image: public.ecr.aws/aws-containers/aiml/ray-2.43.0-py311-vllm0.7.3:latest
  • Test Image: <account-id>.dkr.ecr.<region>.amazonaws.com/ray-vllm-soci:latest (the source image with a SOCI index added and pushed to a private ECR repository; see the sketch after this list)
  • Image Size: 7,976,980,513 bytes (~7.4 GiB)
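
For reference, the SOCI-indexed copy of the image can be produced with the soci CLI from the soci-snapshotter project. The sketch below is illustrative only: the region, account, and credential handling mirror the placeholders above, and exact flags can vary between soci-snapshotter releases.

```bash
# Illustrative sketch: build and push a SOCI index for the test image.
# REGION/ACCOUNT are placeholders matching the test environment above.
REGION="<region>"
ACCOUNT="<account-id>"
SRC=public.ecr.aws/aws-containers/aiml/ray-2.43.0-py311-vllm0.7.3:latest
DST="${ACCOUNT}.dkr.ecr.${REGION}.amazonaws.com/ray-vllm-soci:latest"

# Pull the source image into containerd's content store
sudo ctr image pull "$SRC"

# Re-tag for the private ECR repository and push the image itself
sudo ctr image tag "$SRC" "$DST"
sudo ctr image push --user "AWS:$(aws ecr get-login-password --region "$REGION")" "$DST"

# Create the SOCI index (zTOCs) and push it alongside the image
sudo soci create "$DST"
sudo soci push --user "AWS:$(aws ecr get-login-password --region "$REGION")" "$DST"
```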

Ray Configuration Context

The performance tests were conducted with the following Ray setup:

  • Model Storage: Mistral-7B model pre-downloaded on FSx for Lustre filesystem
  • Ray Head Node: Deployed on managed node group (non-GPU nodes with image already cached)
  • Ray Worker Nodes: GPU nodes provisioned by Karpenter during test execution
  • Image Pull Focus: Tests specifically measure GPU worker node provisioning and large container image pull times

This configuration isolates the performance impact of SOCI on GPU worker node startup, as the non-GPU nodes and model files are already available.
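
For example, the image-pull portion of those timings can be read back from kubelet events on the worker pod, and node provisioning can be bracketed with node timestamps. A minimal sketch, assuming a hypothetical worker pod name and namespace:

```bash
# Hypothetical pod/namespace names; adjust to the actual RayService resources.
POD="rayservice-llm-worker-xxxxx"
NS="default"

# kubelet emits Pulling/Pulled events; the Pulled message includes the measured
# pull duration (e.g. "Successfully pulled image ... in 4m42s").
kubectl get events -n "$NS" \
  --field-selector involvedObject.name="$POD",reason=Pulled \
  -o custom-columns=TIME:.lastTimestamp,MESSAGE:.message

# Node provisioning can be bracketed with the Karpenter-launched node's
# creation timestamp versus when the pending worker pod was first observed.
kubectl get nodes -l karpenter.sh/nodepool \
  -o custom-columns=NAME:.metadata.name,CREATED:.metadata.creationTimestamp
```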

Standard (Non-SOCI) Results

Test Date: Mon Aug 11 19:38:00 CDT 2025 (Clean run - no scheduling issues)

Timing Breakdown

| Phase | Time | Notes |
|---|---|---|
| Head Pod Startup | 6s | Ray head pod initialization |
| Node Provisioning | 43s | Karpenter GPU node creation |
| Image Pull | 282s | Standard (non-SOCI) image pull |
| Service Initialization | 63s | Ray service becoming ready |
| Inference Readiness | 4s | First successful inference |
| Total Time to Inference | 401s | Complete time to inference |

Inference Test

{
  "id": "chatcmpl-35a287e4-99fd-41b2-bc33-7606d1d41e56",
  "object": "chat.completion",
  "created": 1754957852,
  "model": "/models/mistral-7b-v0-3",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": " Hello! How can I help you today? Is"
    },
    "finish_reason": "length"
  }],
  "usage": {
    "prompt_tokens": 4,
    "total_tokens": 14,
    "completion_tokens": 10
  }
}
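
For context, the response above is an OpenAI-compatible chat completion served by vLLM behind Ray Serve, so a request of roughly the following shape would produce it. The service hostname, port, and prompt are assumptions; max_tokens is inferred from the 10 completion tokens and the "length" finish reason.

```bash
# Assumed Ray Serve endpoint; replace with the actual service DNS name/port.
curl -s http://<ray-serve-service>:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "/models/mistral-7b-v0-3",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 10
      }'
```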

SOCI Results

Test Date: Mon Aug 11 19:29:06 CDT 2025

Timing Breakdown

| Phase | Time | Notes |
|---|---|---|
| Head Pod Startup | 6s | Ray head pod initialization |
| Node Provisioning | 53s | Karpenter GPU node creation |
| Image Pull | 46s | SOCI lazy loading |
| Service Initialization | 104s | Ray service becoming ready |
| Inference Readiness | 5s | First successful inference |
| Total Time to Inference | 216s | Complete time to inference |

Inference Test

{
  "id": "chatcmpl-5b271c74-a94f-47a4-a070-dff660db0f66",
  "object": "chat.completion",
  "created": 1754958759,
  "model": "/models/mistral-7b-v0-3",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": " Hello! How can I help you today? Is"
    },
    "finish_reason": "length"
  }],
  "usage": {
    "prompt_tokens": 4,
    "total_tokens": 14,
    "completion_tokens": 10
  }
}

Performance Comparison

| Metric | SOCI | Standard | SOCI vs Standard |
|---|---|---|---|
| Image Pull Time | 46s | 282s | 🚀 84% faster |
| Node Provisioning | 53s | 43s | 10s slower |
| Service Initialization | 104s | 63s | 41s slower |
| Total Time to Inference | 216s | 401s | 🎯 46% faster |

The slower service initialization in the SOCI run is expected: lazy loading starts the container before all layers are local, so part of the remaining layer fetches overlaps with service startup. The node-provisioning gap is independent of the pull mechanism and likely reflects run-to-run variance in Karpenter provisioning; the end-to-end total is still 46% faster.
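
The headline percentages follow directly from the raw timings; a quick arithmetic check:

```bash
# Verify the rounded improvement figures from the measured times.
awk 'BEGIN {
  printf "image pull:        %.0f%% faster\n", (282 - 46) / 282 * 100   # 84%
  printf "time to inference: %.0f%% faster\n", (401 - 216) / 401 * 100  # 46%
}'
```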

Key Findings

🎯 SOCI delivered an 84% faster image pull: 46s vs 282s
⚡ Overall time to inference was 46% faster: 216s vs 401s
📦 The same ~7.4 GiB image was used in both tests

SOCI Benefits

  • Lazy Loading: Only pulls required layers initially
  • Faster Container Startup: Reduced time to running state
  • Network Efficiency: Less data transfer during startup
  • Consistent Performance: No scheduling issues encountered
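
For completeness, lazy loading requires the soci-snapshotter to be installed on the GPU nodes and registered with containerd as the CRI snapshotter. The following is a minimal sketch of that wiring; how it reaches the node (EC2NodeClass user data, a custom AMI, a Bottlerocket setting) is an assumption and depends on the Karpenter setup.

```bash
# Illustrative node-level wiring for soci-snapshotter; paths and the delivery
# mechanism are AMI-specific assumptions. Appends the proxy plugin and CRI
# snapshotter settings to containerd's configuration.
cat <<'EOF' | sudo tee -a /etc/containerd/config.toml
[proxy_plugins.soci]
  type = "snapshot"
  address = "/run/soci-snapshotter-grpc/soci-snapshotter-grpc.sock"

[plugins."io.containerd.grpc.v1.cri".containerd]
  snapshotter = "soci"
  disable_snapshot_annotations = false
EOF

sudo systemctl restart soci-snapshotter containerd
```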