@Halliax
Last active January 10, 2020 21:15
A section of a CloudFormation template for a GPU mixed-instance node group
GPUSpotNodeGroup:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    AutoScalingGroupName: !Sub "${ClusterName}-${NodeGroupName}"
    DesiredCapacity: !Ref NodeAutoScalingGroupDesiredSize # 0
    MinSize: !Ref NodeAutoScalingGroupMinSize # 0
    MaxSize: !Ref NodeAutoScalingGroupMaxSize # 10, arbitrarily
    MixedInstancesPolicy:
      InstancesDistribution:
        OnDemandBaseCapacity: !Ref OnDemandBaseCapacity # 0
        OnDemandPercentageAboveBaseCapacity: !Ref OnDemandPercentageAboveBaseCapacity # 0
        SpotAllocationStrategy: lowest-price
        SpotInstancePools: !Ref SpotInstancePools
        SpotMaxPrice: !Ref SpotMaxPrice
      LaunchTemplate:
        LaunchTemplateSpecification:
          LaunchTemplateId: !Ref NodeLaunchTemplate # defined later in the template
          Version: !GetAtt NodeLaunchTemplate.LatestVersionNumber
        Overrides: # InstanceTypesOverride = "p2.xlarge,g3s.xlarge,g3.4xlarge"
          - InstanceType: !Select [0, !Split [ ",", !Ref InstanceTypesOverride ] ]
          - InstanceType: !Select [1, !Split [ ",", !Ref InstanceTypesOverride ] ]
          - InstanceType: !Select [2, !Split [ ",", !Ref InstanceTypesOverride ] ]
    VPCZoneIdentifier:
      - !Select [0, !Ref Subnets] # Subnets is of type List<AWS::EC2::Subnet::Id>
    Tags:
      - Key: Name
        Value: !Sub "${ClusterName}-${NodeGroupName}-Node"
        PropagateAtLaunch: 'true'
      - Key: !Sub 'kubernetes.io/cluster/${ClusterName}'
        Value: 'owned'
        PropagateAtLaunch: 'true'
      - Key: k8s.io/cluster-autoscaler/node-template/label/nvidia.com/gpu
        Value: 'true'
        PropagateAtLaunch: 'true'
      - Key: k8s.io/cluster-autoscaler/node-template/taint/dedicated
        Value: nvidia.com/gpu=true
        PropagateAtLaunch: 'true'
      - Key: k8s.io/cluster-autoscaler/enabled
        Value: 'true'
        PropagateAtLaunch: 'true'
  UpdatePolicy:
    AutoScalingRollingUpdate:
      MinInstancesInService: !Ref NodeAutoScalingGroupDesiredSize
      MaxBatchSize: '1'
      PauseTime: 'PT5M'
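
# The node group above references several template parameters that this
# section does not show. What follows is a minimal sketch of how they might
# be declared in the template's top-level Parameters section. It is NOT the
# author's actual declaration: the parameter names and the defaults marked
# "from the inline comments" come from this section, while every Type and
# the absence of other defaults are assumptions made for illustration.
Parameters:
  ClusterName:
    Type: String # assumed type
  NodeGroupName:
    Type: String # assumed type
  NodeAutoScalingGroupDesiredSize:
    Type: Number # assumed type
    Default: 0 # from the inline comments
  NodeAutoScalingGroupMinSize:
    Type: Number # assumed type
    Default: 0 # from the inline comments
  NodeAutoScalingGroupMaxSize:
    Type: Number # assumed type
    Default: 10 # from the inline comments ("arbitrarily")
  OnDemandBaseCapacity:
    Type: Number # assumed type
    Default: 0 # from the inline comments
  OnDemandPercentageAboveBaseCapacity:
    Type: Number # assumed type
    Default: 0 # from the inline comments; 0 means Spot only above the base
  SpotInstancePools:
    Type: Number # assumed type; no default is given in this section
  SpotMaxPrice:
    Type: String # assumed type; no default is given in this section
  InstanceTypesOverride:
    Type: String # assumed type; a comma-separated list split with !Split
    Default: "p2.xlarge,g3s.xlarge,g3.4xlarge" # from the inline comments
  Subnets:
    Type: List<AWS::EC2::Subnet::Id> # from the inline comments
# Note: the Overrides list selects indexes 0 to 2, so InstanceTypesOverride
# needs at least three comma-separated entries or !Select will fail.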