Created
January 28, 2026 12:29
AWS EC2 Instance with A10 GPU on RHEL
Spin up the AWS EC2 instance:
```
# 1. Create a key pair and save the private key in the current directory
aws ec2 create-key-pair --key-name rnoriega --query 'KeyMaterial' --output text > rnoriega.pem
chmod 400 rnoriega.pem
# 2. Find a Subnet in an Availability Zone that supports G5 hardware
SUBNET_ID=$(aws ec2 describe-subnets --filters "Name=availability-zone,Values=$(aws ec2 describe-instance-type-offerings --location-type availability-zone --filters Name=instance-type,Values=g5.xlarge --query 'InstanceTypeOfferings[0].Location' --output text)" --query 'Subnets[0].SubnetId' --output text)
# 3. Get the 'default' Security Group for that specific Subnet's VPC
VPC_ID=$(aws ec2 describe-subnets --subnet-ids "$SUBNET_ID" --query 'Subnets[0].VpcId' --output text)
SG_ID=$(aws ec2 describe-security-groups --filters Name=vpc-id,Values="$VPC_ID" Name=group-name,Values=default --query "SecurityGroups[0].GroupId" --output text)
# 4. Authorize SSH (port 22) access on that Security Group
# Note: 0.0.0.0/0 allows SSH from any address; use your own IP as x.x.x.x/32 to restrict it
aws ec2 authorize-security-group-ingress --group-id "$SG_ID" --protocol tcp --port 22 --cidr 0.0.0.0/0
# 5. Launch the g5.xlarge instance with RHEL 10 (AMI IDs are region-specific)
aws ec2 run-instances \
  --image-id ami-0167eca66536f2900 \
  --count 1 \
  --instance-type g5.xlarge \
  --key-name rnoriega \
  --security-group-ids "$SG_ID" \
  --subnet-id "$SUBNET_ID" \
  --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":100,"VolumeType":"gp3"}}]' \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=RHEL10-GPU}]'
# Get the instance IP address (returns a value only once the instance is in the "running" state)
INSTANCE_IP=$(aws ec2 describe-instances \
  --filters "Name=tag:Name,Values=RHEL10-GPU" "Name=instance-state-name,Values=running" \
  --query "Reservations[0].Instances[0].PublicIpAddress" --output text)
# Use the key file saved in step 1
ssh -i rnoriega.pem ec2-user@"$INSTANCE_IP"
```
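The IP lookup above filters on the `running` state, so it can come back empty if it runs right after launch. A minimal sketch of one way to avoid that race, assuming you capture the instance ID at launch time; `instance_public_ip` is a hypothetical helper name, not part of the original steps:

```shell
# Hypothetical helper (not part of the original steps): block until the given
# instance reaches the "running" state, then print its public IP address.
instance_public_ip() {
  local instance_id="$1"
  # 'aws ec2 wait instance-running' polls until the instance is running
  aws ec2 wait instance-running --instance-ids "$instance_id" || return 1
  aws ec2 describe-instances --instance-ids "$instance_id" \
    --query 'Reservations[0].Instances[0].PublicIpAddress' --output text
}
```

To use it, add `--query 'Instances[0].InstanceId' --output text` to the `run-instances` call and store the result in a variable such as `INSTANCE_ID`, then run `INSTANCE_IP=$(instance_public_ip "$INSTANCE_ID")`.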
```
# ==============================================================================
# AWS EC2 RHEL 10 GPU SETUP (G5 / A10G)
# Optimized for vLLM and High-Performance Compute
# ==============================================================================
# 1. Update and install kernel headers matching your current version
# Note: if the update installs a new kernel, reboot before the next command so
# that $(uname -r) matches the kernel the driver will be built for.
sudo dnf update -y
sudo dnf install -y kernel-devel-$(uname -r) kernel-headers-$(uname -r) gcc make
# 2. Unlock the Red Hat repositories for NVIDIA (AWS RHUI path)
# Note: subscription-manager is often disabled on AWS RHEL; this ensures access.
sudo dnf install -y rhel-drivers
# 3. Add the official NVIDIA Repository for RHEL 10
sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel10/x86_64/cuda-rhel10.repo
# 4. Install EPEL (required for DKMS support)
sudo dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm
sudo dnf install -y dkms
# 5. Install the NVIDIA Driver (Headless/Compute-only version)
# Using 'cuda-drivers' avoids the GUI/Wayland dependencies that crash reboots.
sudo dnf install -y cuda-drivers --allowerasing
# 6. Disable the open-source Nouveau driver to prevent conflicts
echo "blacklist nouveau" | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
echo "options nouveau modeset=0" | sudo tee -a /etc/modprobe.d/blacklist-nouveau.conf
sudo dracut --force
# 7. REBOOT (required to load the new NVIDIA kernel modules)
sudo reboot
# 8. VERIFICATION (run after logging back in)
# Should display the NVIDIA A10G table.
nvidia-smi
```
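If `nvidia-smi` fails after the reboot, it helps to know whether the kernel modules ended up in the expected state. The following is a small sketch of such a check, not part of the original setup; `gpu_modules_ok` is a hypothetical helper name, and it only inspects `lsmod` output:

```shell
# Hypothetical helper: succeed only if the 'nvidia' kernel module is loaded
# and the 'nouveau' module is not.
gpu_modules_ok() {
  local modules
  modules="$(lsmod)"
  # Module names are the first column of lsmod output, followed by a space
  if printf '%s\n' "$modules" | grep -q '^nouveau '; then
    echo "nouveau is still loaded" >&2
    return 1
  fi
  if ! printf '%s\n' "$modules" | grep -q '^nvidia '; then
    echo "nvidia module is not loaded" >&2
    return 1
  fi
  echo "nvidia loaded, nouveau absent"
}
```

Once the check passes, `nvidia-smi --query-gpu=name,driver_version --format=csv,noheader` prints a one-line summary that is easier to use in scripts than the full table.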
G5 instances include a local NVMe instance store volume at no extra cost (250 GB on g5.xlarge). Its contents do not survive a stop or terminate, so use it for data you can download again. Mount it:
```
# Find the 250GB local drive (usually /dev/nvme1n1)
lsblk
# Format and mount to /mnt/models
sudo mkfs.xfs /dev/nvme1n1
sudo mkdir -p /mnt/models
sudo mount /dev/nvme1n1 /mnt/models
sudo chown ec2-user:ec2-user /mnt/models
# Point vLLM's cache at this drive
export VLLM_CACHE_ROOT=/mnt/models/vllm_cache
# Weights downloaded from Hugging Face follow HF_HOME, so move that as well
export HF_HOME=/mnt/models/huggingface
```
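Because the device name is only "usually" `/dev/nvme1n1`, it may be safer to look the volume up than to hard-code it before running `mkfs.xfs`. A minimal sketch, assuming the instance store reports a model string containing "Instance Storage" in `lsblk` (EBS volumes report "Amazon Elastic Block Store"); `find_instance_store` is a hypothetical helper name:

```shell
# Hypothetical helper: print the device path of the first disk whose model
# string contains "Instance Storage" (assumed to be the local NVMe volume).
find_instance_store() {
  # -d: whole disks only, -n: no header row, -o: select the columns to print
  lsblk -d -n -o NAME,MODEL | awk '/Instance Storage/ { print "/dev/" $1; exit }'
}
```

For example, `DEV=$(find_instance_store)` followed by `sudo mkfs.xfs "$DEV"` and `sudo mount "$DEV" /mnt/models`. If the function prints nothing, check the `lsblk` output by hand before formatting anything.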
Create an AMI from the instance:
```
# --no-reboot skips the shutdown before the snapshot, so file system
# consistency of the image is not guaranteed
aws ec2 create-image \
  --instance-id <your-instance-id> \
  --name "RHEL10-vLLM-Omni-Backup-Jan2026" \
  --description "Verified G5 A10G setup with drivers and vllm-omni" \
  --no-reboot
# Returned AMI ID: ami-01df964fa6e78559b
```
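`create-image` returns as soon as the request is accepted, while the AMI itself can take several minutes to become usable. A sketch of a wrapper that captures the new ID and waits for it, assuming you pass the instance ID and image name yourself; `create_ami_and_wait` is a hypothetical helper name:

```shell
# Hypothetical helper: create an AMI from an instance without rebooting it,
# wait until the image is available, then print the AMI ID.
create_ami_and_wait() {
  local instance_id="$1" name="$2" ami_id
  ami_id="$(aws ec2 create-image --instance-id "$instance_id" --name "$name" \
    --no-reboot --query 'ImageId' --output text)" || return 1
  # 'aws ec2 wait image-available' polls until the image state is "available"
  aws ec2 wait image-available --image-ids "$ami_id" || return 1
  echo "$ami_id"
}
```

Usage: `AMI_ID=$(create_ami_and_wait <your-instance-id> "RHEL10-vLLM-Omni-Backup-Jan2026")`. The image covers the EBS volumes only, so anything stored on the instance store mount at `/mnt/models` is not included.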
Recreate the VM later:
```
aws ec2 run-instances \
  --image-id <your-new-personal-ami-id> \
  --count 1 \
  --instance-type g5.xlarge \
  --key-name rnoriega \
  --security-group-ids "$SG_ID" \
  --subnet-id "$SUBNET_ID"
```
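When this runs in a fresh shell, `SG_ID` and `SUBNET_ID` from the first section are no longer set, and the launch fails with a less obvious error. A small guard can catch that early; this is an illustrative sketch, and `require_vars` is a hypothetical helper name:

```shell
# Hypothetical helper: return non-zero and name every variable in the
# argument list that is unset or empty.
require_vars() {
  local name value missing=0
  for name in "$@"; do
    # Indirect lookup: read the value of the variable whose name is in $name
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Calling `require_vars SG_ID SUBNET_ID || exit 1` before `run-instances` stops the script if either value still needs to be looked up again.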