@askulkarni2
Created January 26, 2026 20:49
EKS Karpenter with G6f Fractional GPU - Complete Setup Guide

Complete configuration for running GPU workloads on AWS EKS with Karpenter v1.8 and G6f fractional GPU instances using multiple NodePools for dynamic instance selection.

Prerequisites

  • EKS cluster (v1.34+) already created
  • kubectl configured to access the cluster
  • helm installed
  • AWS CLI configured
  • Cluster has OIDC provider enabled

Architecture Overview

This setup uses multiple NodePools to dynamically select G6f instance types based on workload GPU memory requirements:

  • Small GPU (3GB): g6f.large, g6f.xlarge → 1/8 fractional GPU
  • Medium GPU (6GB): g6f.2xlarge → 1/4 fractional GPU
  • Large GPU (12GB): g6f.4xlarge, gr6f.4xlarge → 1/2 fractional GPU

Workloads use node selectors to target specific GPU sizes, and Karpenter provisions the appropriate instance type.

Important: NodeOverlay is required because AWS reports GPU count as 0 for fractional GPU instances in the EC2 API.


Step 1: Install Karpenter

1.1 Set Environment Variables

export KARPENTER_NAMESPACE="kube-system"
export KARPENTER_VERSION="1.8.6"
export CLUSTER_NAME="<your-cluster-name>"
export AWS_DEFAULT_REGION="<your-region>"
export AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query Account --output text)"
export AWS_PARTITION="aws"  # or aws-cn, aws-us-gov
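Before creating any AWS resources, it's worth failing fast on missing values. A minimal sketch (bash-specific, using `${!var}` indirect expansion; `check_env` is a hypothetical helper, not part of any tooling above):

```shell
# Verify the exported variables are non-empty and no <placeholder> values remain.
check_env() {
  local var val rc=0
  for var in "$@"; do
    val="${!var}"                      # bash indirect expansion
    case "$val" in
      ""|"<"*">")
        echo "ERROR: $var is unset or still a placeholder" >&2
        rc=1
        ;;
    esac
  done
  return $rc
}

check_env KARPENTER_NAMESPACE KARPENTER_VERSION CLUSTER_NAME \
          AWS_DEFAULT_REGION AWS_ACCOUNT_ID AWS_PARTITION \
  || echo "Fix the variables above before continuing." >&2
```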

1.2 Create IAM Resources

Download and deploy the CloudFormation template:

curl -fsSL https://raw.githubusercontent.com/aws/karpenter-provider-aws/v${KARPENTER_VERSION}/website/content/en/preview/getting-started/getting-started-with-karpenter/cloudformation.yaml > karpenter-cloudformation.yaml

aws cloudformation deploy \
  --stack-name "Karpenter-${CLUSTER_NAME}" \
  --template-file karpenter-cloudformation.yaml \
  --capabilities CAPABILITY_NAMED_IAM \
  --parameter-overrides "ClusterName=${CLUSTER_NAME}"

1.3 Create Service-Linked Role for Spot

aws iam create-service-linked-role --aws-service-name spot.amazonaws.com || true

1.4 Install Karpenter via Helm

# Log out of the public ECR Helm registry so the chart is pulled anonymously
helm registry logout public.ecr.aws

# Install Karpenter with NodeOverlay feature enabled
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version "${KARPENTER_VERSION}" \
  --namespace "${KARPENTER_NAMESPACE}" \
  --create-namespace \
  --set "settings.clusterName=${CLUSTER_NAME}" \
  --set "settings.interruptionQueue=${CLUSTER_NAME}" \
  --set "settings.featureGates.nodeOverlay=true" \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi \
  --set controller.resources.limits.cpu=1 \
  --set controller.resources.limits.memory=1Gi \
  --wait

Note: nodeOverlay=true is required for fractional GPU support.

1.5 Verify Karpenter Installation

kubectl get pods -n kube-system -l app.kubernetes.io/name=karpenter
kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter -c controller --tail=20

Step 2: Install NVIDIA Device Plugin (AWS Recommended)

2.1 Add Helm Repository

helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update

2.2 Install NVIDIA Device Plugin with GPU Feature Discovery

helm install nvdp nvdp/nvidia-device-plugin \
  --namespace nvidia \
  --create-namespace \
  --version 0.18.2 \
  --set gfd.enabled=true \
  --set-json 'nfd.worker.tolerations=[{"key":"nvidia.com/gpu","operator":"Exists","effect":"NoSchedule"},{"key":"node-role.kubernetes.io/master","operator":"Equal","effect":"NoSchedule"}]'

This installs the device plugin with GPU Feature Discovery and configures NFD worker to tolerate GPU node taints.

2.3 Verify Device Plugin

kubectl get pods -n nvidia
kubectl get daemonset -n nvidia
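For a scripted pass/fail rather than eyeballing the output, the DaemonSet's counters can be compared directly. A sketch, assuming the chart's standard `app.kubernetes.io/name=nvidia-device-plugin` label (`plugin_ready` is a hypothetical helper):

```shell
# Succeeds only when the device-plugin DaemonSet is fully rolled out
# (desired pod count equals ready pod count).
plugin_ready() {
  local counts
  counts=$(kubectl get daemonset -n nvidia \
    -l app.kubernetes.io/name=nvidia-device-plugin \
    -o jsonpath='{.items[0].status.desiredNumberScheduled} {.items[0].status.numberReady}')
  set -- $counts
  [ -n "${1:-}" ] && [ "$1" = "${2:-}" ]
}

# Usage: plugin_ready && echo "device plugin ready"
```

Note that before any GPU node exists, both counters are 0 and the check trivially passes.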

Step 3: Create NodeOverlay for G6f Fractional GPUs

Why NodeOverlay is Required: AWS EC2 API reports GPU count as 0 for fractional GPU instances. NodeOverlay tells Karpenter these instances actually have GPU capacity during scheduling simulation.

File: g6f-nodeoverlay.yaml

# NodeOverlay for G6f fractional GPU instances
# REQUIRED: AWS reports GPU count as 0 for fractional GPUs
# This tells Karpenter these instances have GPU capacity

apiVersion: karpenter.sh/v1alpha1
kind: NodeOverlay
metadata:
  name: g6f-fractional-gpu
spec:
  weight: 100
  requirements:
    - key: node.kubernetes.io/instance-type
      operator: In
      values:
        - "g6f.large"
        - "g6f.xlarge"
        - "g6f.2xlarge"
        - "g6f.4xlarge"
        - "gr6f.4xlarge"
  capacity:
    # Critical: Override AWS's "0" GPU count
    nvidia.com/gpu: "1"

Apply the NodeOverlay configuration:

kubectl apply -f g6f-nodeoverlay.yaml
kubectl get nodeoverlay

Verification:

# Check NodeOverlay status
kubectl describe nodeoverlay g6f-fractional-gpu

# Should show Ready=True

Step 4: Create EC2NodeClass for G6f Instances

File: g6f-ec2nodeclass.yaml

apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: g6f-gpu
spec:
  # Replace with your Karpenter node role
  role: "KarpenterNodeRole-${CLUSTER_NAME}"
  
  # Use GPU-optimized AMI (NVIDIA drivers pre-installed)
  amiSelectorTerms:
    - alias: "al2023@latest"
  amiFamily: AL2023
  
  # Note: The AL2023 GPU AMI includes:
  # - NVIDIA drivers pre-installed and configured
  # - NVIDIA container toolkit configured
  # - Containerd configured for GPU support
  # No custom userData needed for basic GPU functionality
  
  # Subnet and security group discovery
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
  
  # Block device configuration
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        encrypted: true
        deleteOnTermination: true
  
  # Metadata options
  metadataOptions:
    httpEndpoint: enabled
    httpProtocolIPv6: disabled
    httpPutResponseHopLimit: 2
    httpTokens: required

Apply with environment variable substitution:

envsubst < g6f-ec2nodeclass.yaml | kubectl apply -f -
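`envsubst` (from GNU gettext) expands every `${VAR}` it finds, so an unrelated `${...}` in the manifest would be silently blanked if that variable is unset. A narrower sketch that substitutes only `CLUSTER_NAME` and refuses to emit a manifest with leftover placeholders (`render_manifest` is a hypothetical helper):

```shell
# Substitute only ${CLUSTER_NAME}; fail if any ${...} placeholder survives.
render_manifest() {
  local in="$1" out="$2"
  sed "s|\${CLUSTER_NAME}|${CLUSTER_NAME}|g" "$in" > "$out"
  if grep -q '\${' "$out"; then
    echo "ERROR: unresolved placeholders remain in $out:" >&2
    grep '\${' "$out" >&2
    return 1
  fi
}

# Usage:
#   render_manifest g6f-ec2nodeclass.yaml g6f-ec2nodeclass.rendered.yaml \
#     && kubectl apply -f g6f-ec2nodeclass.rendered.yaml
```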

Step 5: Create Multiple NodePools for Dynamic Instance Selection

This is the key to dynamic G6f instance selection. Each NodePool targets specific instance types and labels nodes accordingly.

File: g6f-nodepools.yaml

---
# NodePool for Small GPU workloads (1/8 GPU, 3GB)
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: g6f-gpu-small
spec:
  template:
    metadata:
      labels:
        gpu-memory-size: "3gb"
        gpu-fraction: "0.125"
    spec:
      requirements:
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
            - "g6f.large"
            - "g6f.xlarge"
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand", "spot"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: g6f-gpu
      taints:
        - key: nvidia.com/gpu
          effect: NoSchedule
      expireAfter: 720h
  limits:
    cpu: 50
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 5m

---
# NodePool for Medium GPU workloads (1/4 GPU, 6GB)
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: g6f-gpu-medium
spec:
  template:
    metadata:
      labels:
        gpu-memory-size: "6gb"
        gpu-fraction: "0.25"
    spec:
      requirements:
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
            - "g6f.2xlarge"
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand", "spot"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: g6f-gpu
      taints:
        - key: nvidia.com/gpu
          effect: NoSchedule
      expireAfter: 720h
  limits:
    cpu: 50
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 5m

---
# NodePool for Large GPU workloads (1/2 GPU, 12GB)
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: g6f-gpu-large
spec:
  template:
    metadata:
      labels:
        gpu-memory-size: "12gb"
        gpu-fraction: "0.5"
    spec:
      requirements:
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
            - "g6f.4xlarge"
            - "gr6f.4xlarge"
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand", "spot"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: g6f-gpu
      taints:
        - key: nvidia.com/gpu
          effect: NoSchedule
      expireAfter: 720h
  limits:
    cpu: 50
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 5m

Apply the NodePools:

kubectl apply -f g6f-nodepools.yaml
kubectl get nodepool
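Before deploying workloads, you can gate on the NodePools actually becoming ready. A sketch; it assumes Karpenter v1 NodePools expose a `Ready` status condition that `kubectl wait` can poll, and uses the NodePool names defined above:

```shell
# Block until every g6f NodePool reports Ready=True, or fail on timeout.
wait_for_nodepools() {
  local np
  for np in g6f-gpu-small g6f-gpu-medium g6f-gpu-large; do
    kubectl wait --for=condition=Ready "nodepool/${np}" --timeout=60s || return 1
  done
  echo "All g6f NodePools are ready."
}
```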

Step 6: Deploy Sample GPU Workloads

File: sample-gpu-workloads.yaml

---
# Small GPU workload (3GB) - Will provision g6f.large or g6f.xlarge
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-small-inference
spec:
  replicas: 0
  selector:
    matchLabels:
      app: gpu-small-inference
  template:
    metadata:
      labels:
        app: gpu-small-inference
    spec:
      tolerations:
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
      # Target small GPU NodePool
      nodeSelector:
        gpu-memory-size: "3gb"
      containers:
      - name: inference
        image: nvidia/cuda:12.3.0-base-ubuntu22.04
        command: ["sleep", "infinity"]
        resources:
          requests:
            cpu: 2
            memory: 8Gi
            nvidia.com/gpu: 1
          limits:
            cpu: 2
            memory: 8Gi
            nvidia.com/gpu: 1

---
# Medium GPU workload (6GB) - Will provision g6f.2xlarge
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-medium-inference
spec:
  replicas: 0
  selector:
    matchLabels:
      app: gpu-medium-inference
  template:
    metadata:
      labels:
        app: gpu-medium-inference
    spec:
      tolerations:
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
      # Target medium GPU NodePool
      nodeSelector:
        gpu-memory-size: "6gb"
      containers:
      - name: inference
        image: nvidia/cuda:12.3.0-base-ubuntu22.04
        command: ["sleep", "infinity"]
        resources:
          requests:
            cpu: 4
            memory: 16Gi
            nvidia.com/gpu: 1
          limits:
            cpu: 4
            memory: 16Gi
            nvidia.com/gpu: 1

---
# Large GPU workload (12GB) - Will provision g6f.4xlarge or gr6f.4xlarge
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-large-inference
spec:
  replicas: 0
  selector:
    matchLabels:
      app: gpu-large-inference
  template:
    metadata:
      labels:
        app: gpu-large-inference
    spec:
      tolerations:
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
      # Target large GPU NodePool
      nodeSelector:
        gpu-memory-size: "12gb"
      containers:
      - name: inference
        image: nvidia/cuda:12.3.0-base-ubuntu22.04
        command: ["sleep", "infinity"]
        resources:
          requests:
            cpu: 8
            memory: 32Gi
            nvidia.com/gpu: 1
          limits:
            cpu: 8
            memory: 32Gi
            nvidia.com/gpu: 1

Deploy and test:

# Deploy workloads
kubectl apply -f sample-gpu-workloads.yaml

# Test small GPU (3GB)
kubectl scale deployment gpu-small-inference --replicas 1

# Monitor provisioning
kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter -c controller -f

# Wait for pod to be ready
kubectl wait --for=condition=ready pod -l app=gpu-small-inference --timeout=300s

# Verify GPU
kubectl exec deployment/gpu-small-inference -- nvidia-smi

Expected output:

NVIDIA L4-3Q, 3072 MiB  # 1/8 fractional GPU

Verification Commands

Check Karpenter Status

kubectl get pods -n kube-system -l app.kubernetes.io/name=karpenter
kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter -c controller --tail=50

Check NodeOverlay

kubectl get nodeoverlay
kubectl describe nodeoverlay g6f-fractional-gpu

Check NodePools

kubectl get nodepool
kubectl describe nodepool g6f-gpu-small

Check NVIDIA Device Plugin

kubectl get pods -n nvidia
kubectl get daemonset -n nvidia

Check GPU Nodes

kubectl get nodes -L node.kubernetes.io/instance-type,gpu-memory-size,gpu-fraction

Check GPU Capacity on Nodes

kubectl get nodes -o json | jq '.items[] | select(.status.capacity["nvidia.com/gpu"] != null) | {name: .metadata.name, instance: .metadata.labels["node.kubernetes.io/instance-type"], gpu: .status.capacity["nvidia.com/gpu"]}'

How Dynamic Selection Works

  1. Pod requests GPU with node selector:

    nodeSelector:
      gpu-memory-size: "3gb"
    resources:
      requests:
        nvidia.com/gpu: 1
  2. Karpenter matches to NodePool:

    • Sees gpu-memory-size: "3gb" requirement
    • Matches to g6f-gpu-small NodePool
    • NodePool only allows g6f.large or g6f.xlarge
  3. NodeOverlay enables GPU detection:

    • AWS API reports GPU count = 0 for fractional GPUs
    • NodeOverlay overrides this to GPU count = 1
    • Karpenter knows instance can satisfy GPU request
  4. Instance provisioned:

    • Karpenter provisions most cost-effective option
    • Node gets labeled with gpu-memory-size: "3gb"
    • Pod schedules to the new node
  5. GPU registered:

    • NVIDIA device plugin starts on node
    • Detects actual GPU hardware
    • Registers nvidia.com/gpu: 1 on node
    • Pod can access fractional GPU

Why NodeOverlay is Required

The Problem

AWS EC2 API reports GPU count as 0 for fractional GPU instances:

aws ec2 describe-instance-types --instance-types g6f.xlarge
"GpuInfo": {
  "Gpus": [{
    "Name": "L4",
    "Count": 0,  // ← AWS reports 0!
    "MemoryInfo": {"SizeInMiB": 2861}
  }]
}
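To spot-check this across the whole family in your own region, the response can be tabulated with a JMESPath `--query` (requires the AWS CLI; wrapped in a function here for convenience — the reported values may vary by region):

```shell
# Show what EC2 reports per instance type: name, GPU count, GPU memory (MiB).
# Expect Count = 0 for every g6f/gr6f size -- the gap NodeOverlay fills.
gpu_counts() {
  aws ec2 describe-instance-types \
    --instance-types g6f.large g6f.xlarge g6f.2xlarge g6f.4xlarge gr6f.4xlarge \
    --query 'InstanceTypes[].[InstanceType, GpuInfo.Gpus[0].Count, GpuInfo.Gpus[0].MemoryInfo.SizeInMiB]' \
    --output table
}
```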

Without NodeOverlay

Pod requests: nvidia.com/gpu: 1
↓
Karpenter checks AWS API
↓
AWS says: GPU Count = 0
↓
Karpenter: "Instance has 0 GPUs, can't satisfy pod"
↓
❌ Won't provision g6f instance

With NodeOverlay

Pod requests: nvidia.com/gpu: 1
↓
Karpenter checks NodeOverlay
↓
NodeOverlay says: nvidia.com/gpu = "1"
↓
Karpenter: "Instance has 1 GPU, can satisfy pod"
↓
✅ Provisions g6f instance

Cost Optimization

Instance Pricing (Approximate)

| Instance    | GPU Fraction | GPU Memory | On-Demand | Spot (avg) |
|-------------|--------------|------------|-----------|------------|
| g6f.large   | 1/8          | 3 GB       | $0.08/hr  | $0.03/hr   |
| g6f.xlarge  | 1/8          | 3 GB       | $0.16/hr  | $0.05/hr   |
| g6f.2xlarge | 1/4          | 6 GB       | $0.32/hr  | $0.10/hr   |
| g6f.4xlarge | 1/2          | 12 GB      | $0.64/hr  | $0.20/hr   |

Cost Savings Example

Without dynamic selection:

  • 10 small workloads → 10x g6f.2xlarge = $3.20/hr

With dynamic selection:

  • 10 small workloads → 10x g6f.xlarge = $1.60/hr
  • Savings: 50%
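The arithmetic behind that claim, using the approximate on-demand rates from the pricing table:

```shell
# 10 small workloads on over-sized g6f.2xlarge vs right-sized g6f.xlarge.
workloads=10
price_2xlarge=0.32   # $/hr, g6f.2xlarge on-demand (approximate)
price_xlarge=0.16    # $/hr, g6f.xlarge on-demand (approximate)

before=$(awk -v n="$workloads" -v p="$price_2xlarge" 'BEGIN { printf "%.2f", n * p }')
after=$(awk  -v n="$workloads" -v p="$price_xlarge"  'BEGIN { printf "%.2f", n * p }')
pct=$(awk -v b="$before" -v a="$after" 'BEGIN { printf "%.0f", (1 - a / b) * 100 }')

echo "Over-provisioned: \$${before}/hr; right-sized: \$${after}/hr; savings: ${pct}%"
```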

Troubleshooting

Pod Stuck in Pending

# Check pod events
kubectl describe pod <pod-name>

# Common issues:
# 1. Missing node selector
# 2. Node selector doesn't match NodePool labels
# 3. NodePool limits reached
# 4. NodeOverlay not applied

GPU Not Detected

# Check device plugin on node
kubectl get pods -n nvidia -o wide

# Check GPU capacity
kubectl describe node <node-name> | grep nvidia

# Check device plugin logs
kubectl logs -n nvidia -l app.kubernetes.io/name=nvidia-device-plugin

NodeOverlay Not Working

# Check NodeOverlay status
kubectl get nodeoverlay
kubectl describe nodeoverlay g6f-fractional-gpu

# Verify feature gate is enabled
kubectl get deployment -n kube-system karpenter -o yaml | grep -i nodeoverlay

# Check Karpenter logs
kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter -c controller | grep overlay

Best Practices

1. Always Use Node Selectors

# Required for dynamic selection
nodeSelector:
  gpu-memory-size: "3gb"  # or "6gb", "12gb"

2. Match CPU/Memory to Instance

  • g6f.large/xlarge: 2-4 vCPUs, 8-16GB RAM
  • g6f.2xlarge: 4-8 vCPUs, 16-32GB RAM
  • g6f.4xlarge: 8-16 vCPUs, 32-64GB RAM

3. Use Spot for Cost Savings

NodePools are configured for both Spot and On-Demand; when both are allowed, Karpenter favors Spot for its lower price.

4. Monitor GPU Utilization

Install DCGM exporter for GPU metrics:

kubectl apply -f https://raw.githubusercontent.com/NVIDIA/dcgm-exporter/main/dcgm-exporter.yaml

Cleanup

# Delete workloads
kubectl delete deployment gpu-small-inference gpu-medium-inference gpu-large-inference

# Delete NodePools (will drain nodes)
kubectl delete nodepool g6f-gpu-small g6f-gpu-medium g6f-gpu-large

# Delete NodeOverlay
kubectl delete nodeoverlay g6f-fractional-gpu

# Delete EC2NodeClass
kubectl delete ec2nodeclass g6f-gpu

# Uninstall NVIDIA device plugin
helm uninstall nvdp --namespace nvidia
kubectl delete namespace nvidia

# Uninstall Karpenter
helm uninstall karpenter --namespace kube-system

# Delete CloudFormation stack
aws cloudformation delete-stack --stack-name "Karpenter-${CLUSTER_NAME}"

Summary

This setup provides:

  • Dynamic G6f instance selection based on workload requirements
  • Cost optimization through right-sizing (50%+ savings)
  • NodeOverlay support for fractional GPU detection
  • Simple node selector approach
  • Karpenter v1.8 with the NodeOverlay feature gate enabled
  • AWS-recommended NVIDIA device plugin installed via Helm
  • Production-ready configuration

Key Files:

  • g6f-nodeoverlay.yaml - Required for fractional GPU support
  • g6f-ec2nodeclass.yaml - EC2NodeClass for GPU nodes
  • g6f-nodepools.yaml - Multiple NodePools for dynamic selection
  • sample-gpu-workloads.yaml - Example workloads

Critical Insight: NodeOverlay is required because AWS reports GPU count as 0 for fractional GPU instances. Without it, Karpenter won't provision G6f instances for GPU workloads.

Result: Workloads automatically get the right-sized G6f instance based on their GPU memory requirements, optimizing both cost and GPU utilization.
