- Cluster: <cluster-name>
- Instance Type: g6e.2xlarge
- Source Image: public.ecr.aws/aws-containers/aiml/ray-2.43.0-py311-vllm0.7.3:latest
- Test Image: <account-id>.dkr.ecr.<region>.amazonaws.com/ray-vllm-soci:latest (the source image with a SOCI index added, pushed to private ECR; see the sketch below)
- Image Size: 7,976,980,513 bytes (~7.4 GB)
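One way to produce and push such an index is the soci CLI from awslabs/soci-snapshotter. A minimal sketch of that step, assuming the CLI is installed and the image has already been pulled into the local containerd content store:

```python
import subprocess

# Sketch: build and push a SOCI index for the test image.
# Assumes the soci CLI (awslabs/soci-snapshotter) is installed and the image
# already exists in the local containerd content store.
IMAGE = "<account-id>.dkr.ecr.<region>.amazonaws.com/ray-vllm-soci:latest"

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

run(["soci", "create", IMAGE])  # generate zTOCs and the SOCI index locally
run(["soci", "push", IMAGE])    # push the index artifacts to the same ECR repo
```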
The performance tests were conducted with the following Ray setup:
- Model Storage: Mistral-7B model pre-downloaded on FSx for Lustre filesystem
- Ray Head Node: Deployed on managed node group (non-GPU nodes with image already cached)
- Ray Worker Nodes: GPU nodes provisioned by Karpenter during test execution
- Image Pull Focus: Tests specifically measure GPU worker node provisioning and large container image pull times
This configuration isolates the performance impact of SOCI on GPU worker node startup, as the non-GPU nodes and model files are already available.
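For reference, the phase timings below can be reconstructed from Kubernetes events on the worker pod. A minimal sketch, assuming the kubernetes Python client and a placeholder pod name:

```python
from kubernetes import client, config

# Sketch: inspect scheduling / image-pull events for a Ray worker pod.
# Namespace and pod name are placeholders for this cluster's values.
config.load_kube_config()
v1 = client.CoreV1Api()

events = v1.list_namespaced_event(
    namespace="default",
    field_selector="involvedObject.name=<ray-worker-pod>",
)
# Kubelet's "Pulled" event message includes the measured pull duration,
# e.g. "Successfully pulled image ... in 4m42s".
for e in events.items:
    if e.reason in ("Scheduled", "Pulling", "Pulled", "Created", "Started"):
        print(e.last_timestamp, e.reason, e.message)
```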
Test Date (standard pull baseline): Mon Aug 11 19:38:00 CDT 2025 (clean run; no scheduling issues)
| Phase | Time | Notes |
|---|---|---|
| Head Pod Startup | 6s | Ray head pod initialization |
| Node Provisioning | 43s | Karpenter GPU node creation |
| Image Pull | 282s | Standard Docker pull |
| Service Initialization | 63s | Ray service becoming ready |
| Inference Readiness | 4s | First successful inference |
| Total Time to Inference | 401s | End-to-end wall-clock time |
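Readiness was confirmed with a single chat completion against the vLLM OpenAI-compatible endpoint. A minimal sketch of that probe, consistent with the responses recorded below (the service URL is a placeholder):

```python
import requests

# Sketch: first-inference readiness probe. The endpoint URL is a placeholder;
# the model path and max_tokens match the recorded responses.
URL = "http://<service-endpoint>:8000/v1/chat/completions"

payload = {
    "model": "/models/mistral-7b-v0-3",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 10,
}

resp = requests.post(URL, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```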
Inference verification response (standard pull):

```json
{
  "id": "chatcmpl-35a287e4-99fd-41b2-bc33-7606d1d41e56",
  "object": "chat.completion",
  "created": 1754957852,
  "model": "/models/mistral-7b-v0-3",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": " Hello! How can I help you today? Is"
    },
    "finish_reason": "length"
  }],
  "usage": {
    "prompt_tokens": 4,
    "total_tokens": 14,
    "completion_tokens": 10
  }
}
```

Test Date (SOCI pull): Mon Aug 11 19:29:06 CDT 2025
| Phase | Time | Notes |
|---|---|---|
| Head Pod Startup | 6s | Ray head pod initialization |
| Node Provisioning | 53s | Karpenter GPU node creation |
| Image Pull | 46s | SOCI lazy loading |
| Service Initialization | 104s | Ray service becoming ready |
| Inference Readiness | 5s | First successful inference |
| Total Time to Inference | 216s | End-to-end wall-clock time |
Inference verification response (SOCI pull):

```json
{
  "id": "chatcmpl-5b271c74-a94f-47a4-a070-dff660db0f66",
  "object": "chat.completion",
  "created": 1754958759,
  "model": "/models/mistral-7b-v0-3",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": " Hello! How can I help you today? Is"
    },
    "finish_reason": "length"
  }],
  "usage": {
    "prompt_tokens": 4,
    "total_tokens": 14,
    "completion_tokens": 10
  }
}
```

| Metric | SOCI | Standard | Improvement |
|---|---|---|---|
| Image Pull Time | 46s | 282s | 84% faster |
| Node Provisioning | 53s | 43s | 10s slower |
| Service Initialization | 104s | 63s | 41s slower |
| Total Time to Inference | 216s | 401s | 46% faster |
- SOCI achieved an 84% faster image pull: 46s vs 282s
- Overall time to inference was 46% faster: 216s vs 401s
- The same ~7.4 GB image was pulled in both tests
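As a quick sanity check, the headline percentages follow directly from the raw timings:

```python
# Recompute the headline improvements from the measured timings.
standard = {"pull": 282, "total": 401}
soci = {"pull": 46, "total": 216}

pull_gain = (standard["pull"] - soci["pull"]) / standard["pull"]
total_gain = (standard["total"] - soci["total"]) / standard["total"]

print(f"Image pull: {pull_gain:.0%} faster")          # 84% faster
print(f"Time to inference: {total_gain:.0%} faster")  # 46% faster
```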
- Lazy Loading: only the file ranges needed at startup are fetched eagerly; the rest of each layer streams in on demand
- Faster Container Startup: the container reaches the running state sooner
- Network Efficiency: less data is transferred during the startup window
- Consistent Performance: no scheduling issues were encountered in either run
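Note that lazy loading only takes effect when the node's containerd is configured to use the SOCI snapshotter. A minimal check, assuming the conventional proxy-plugin setup from soci-snapshotter (section name and socket path can vary by install):

```python
import pathlib

# Sketch: confirm a GPU node registers the SOCI snapshotter with containerd.
# soci-snapshotter is typically wired in as a containerd proxy plugin; the
# exact config layout depends on the AMI / bootstrap configuration.
cfg = pathlib.Path("/etc/containerd/config.toml").read_text()
configured = "proxy_plugins" in cfg and "soci" in cfg
print("SOCI snapshotter configured:", configured)
```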