Version: 1.0
Last Updated: February 2026
Target Audience: Platform Engineers, DevOps Teams, Migration Specialists
Migrating Kubernetes workloads from Amazon EKS to Azure AKS appears straightforward: both are managed Kubernetes services running the same core platform. However, cloud provider-specific integrations, CSI drivers, networking models, and authentication mechanisms create significant friction points that can cause application failures post-migration.
This document catalogs real-world migration pain points encountered when moving stateful and stateless workloads from EKS to AKS, with detailed remediation strategies, code examples, and automated detection patterns for migration tooling.
- Identity & Access: IRSA vs Workload Identity requires application-level changes
- Storage: Different CSI drivers, performance characteristics, and access modes
- Networking: Security Groups for Pods don't translate to Network Policies
- Secrets: AWS Secrets Manager vs Azure Key Vault require different CSI configurations
- Ingress: ALB-specific features need AGIC or nginx equivalents
- Observability: CloudWatch vs Azure Monitor have different collection mechanisms
- Cost: Different pricing models for storage, networking, and compute
- Identity and Authentication
- Persistent Storage
- Secrets Management
- Ingress and Load Balancing
- Networking and Security
- Observability and Logging
- Container Registry
- Backup and Disaster Recovery
- Compute and Node Configuration
- Service Mesh Integration
- Database-Specific Integrations
- GitOps and CI/CD
- Detection Patterns
- Migration Strategies
- Quick Reference Tables
Severity: 🔴 High - Application breaking
Frequency: Very Common
Impact: Authentication failures, unable to access cloud resources
EKS uses IRSA to provide AWS credentials to pods via service account annotations. This integrates seamlessly with AWS SDK libraries. AKS uses Azure Workload Identity (the successor to the now-deprecated AAD Pod Identity), which has a completely different configuration model.
apiVersion: v1
kind: ServiceAccount
metadata:
name: s3-reader
namespace: production
annotations:
eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/prod-s3-reader-role
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: document-processor
namespace: production
spec:
replicas: 3
selector:
matchLabels:
app: doc-processor
template:
metadata:
labels:
app: doc-processor
spec:
serviceAccountName: s3-reader
containers:
- name: processor
image: myregistry/doc-processor:v1.2.3
env:
- name: AWS_REGION
value: us-east-1
- name: S3_BUCKET
value: production-documents
- name: AWS_DEFAULT_REGION
value: us-east-1
Application Code (Python):
import boto3
# This just works - AWS SDK automatically uses IRSA credentials
s3_client = boto3.client('s3')
response = s3_client.list_objects_v2(Bucket='production-documents')
# Pod starts but fails at runtime
kubectl logs document-processor-7d9f8b5c4-x8k2m
# Output:
# botocore.exceptions.NoCredentialsError: Unable to locate credentials
# or
# botocore.exceptions.ClientError: An error occurred (InvalidAccessKeyId) when calling the ListObjectsV2 operation
Why it fails:
- Service Account annotation is AWS-specific: AKS does not recognize eks.amazonaws.com/role-arn
- OIDC provider is different: EKS OIDC endpoint vs Azure AD
- Token format differs: AWS STS tokens vs Azure AD tokens
- SDK credential chain changes: the AWS SDK will not find Azure credentials automatically (see the diagnostic sketch below)
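A quick way to see which identity system is actually wired into a running pod is to check the environment variables each mutating webhook injects: IRSA adds AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE, while the Azure Workload Identity webhook adds AZURE_CLIENT_ID, AZURE_TENANT_ID, and AZURE_FEDERATED_TOKEN_FILE. A minimal diagnostic sketch you could run inside a debug container (variable names per the upstream webhooks; verify against your versions):
import os

IRSA_VARS = ("AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE")
WORKLOAD_IDENTITY_VARS = ("AZURE_CLIENT_ID", "AZURE_TENANT_ID", "AZURE_FEDERATED_TOKEN_FILE")

def detect_pod_identity() -> str:
    """Report which identity webhook (if any) has injected credentials into this pod."""
    has_irsa = all(v in os.environ for v in IRSA_VARS)
    has_wi = all(v in os.environ for v in WORKLOAD_IDENTITY_VARS)
    if has_wi:
        return "Azure Workload Identity is configured"
    if has_irsa:
        return "AWS IRSA is configured (will not work on AKS)"
    return "No pod-level cloud identity detected - check ServiceAccount annotations and pod labels"

if __name__ == "__main__":
    print(detect_pod_identity())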
Prerequisites:
- Create Azure Storage Account
- Create Managed Identity with Storage Blob Data Contributor role
- Set up Workload Identity federation
Azure Configuration:
# Create storage account
az storage account create \
--name prodstorageacct \
--resource-group production-rg \
--location eastus \
--sku Standard_ZRS
# Create container
az storage container create \
--name documents \
--account-name prodstorageacct
# Create managed identity
az identity create \
--name doc-processor-identity \
--resource-group production-rg
# Get identity client ID
IDENTITY_CLIENT_ID=$(az identity show \
--name doc-processor-identity \
--resource-group production-rg \
--query clientId -o tsv)
# Assign storage permissions
az role assignment create \
--role "Storage Blob Data Contributor" \
--assignee $IDENTITY_CLIENT_ID \
--scope /subscriptions/<subscription-id>/resourceGroups/production-rg/providers/Microsoft.Storage/storageAccounts/prodstorageacct
AKS Configuration:
apiVersion: v1
kind: ServiceAccount
metadata:
name: blob-reader
namespace: production
annotations:
azure.workload.identity/client-id: "12345678-1234-1234-1234-123456789012"
azure.workload.identity/tenant-id: "87654321-4321-4321-4321-210987654321"
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: document-processor
namespace: production
labels:
azure.workload.identity/use: "true" # Required!
spec:
replicas: 3
selector:
matchLabels:
app: doc-processor
template:
metadata:
labels:
app: doc-processor
azure.workload.identity/use: "true" # Required on pod!
spec:
serviceAccountName: blob-reader
containers:
- name: processor
image: myregistry/doc-processor:v2.0.0 # Updated image
env:
- name: AZURE_STORAGE_ACCOUNT_NAME
value: prodstorageacct
- name: AZURE_STORAGE_CONTAINER_NAME
value: documents
# Note: No explicit credentials - Workload Identity handles it
Application Code Changes (Python):
# NEW: Azure SDK instead of boto3
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient
# DefaultAzureCredential automatically uses Workload Identity
credential = DefaultAzureCredential()
blob_service_client = BlobServiceClient(
account_url="https://prodstorageacct.blob.core.windows.net",
credential=credential
)
container_client = blob_service_client.get_container_client("documents")
# List blobs (equivalent to S3 list_objects_v2)
blob_list = container_client.list_blobs()
for blob in blob_list:
print(f"Blob name: {blob.name}")
Testing:
# Verify workload identity is working
# (--serviceaccount was removed from kubectl run in v1.24; use --overrides instead)
kubectl run -it --rm debug \
--image=mcr.microsoft.com/azure-cli \
--labels=azure.workload.identity/use=true \
--overrides='{"apiVersion": "v1", "spec": {"serviceAccountName": "blob-reader"}}' \
-- bash
# Inside pod:
az login --identity
az storage blob list \
--account-name prodstorageacct \
--container-name documents \
--auth-mode login
Use Case: Multi-cloud strategy, data residency, or gradual migration
apiVersion: v1
kind: ServiceAccount
metadata:
name: s3-reader
namespace: production
annotations:
azure.workload.identity/client-id: "12345678-1234-1234-1234-123456789012"
---
apiVersion: v1
kind: Secret
metadata:
name: aws-credentials
namespace: production
type: Opaque
stringData:
AWS_ACCESS_KEY_ID: "AKIA..."
AWS_SECRET_ACCESS_KEY: "wJalrXUtn..."
# OR use Azure Key Vault CSI to inject these
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: document-processor
spec:
template:
spec:
serviceAccountName: s3-reader
containers:
- name: processor
image: myregistry/doc-processor:v1.2.3
envFrom:
- secretRef:
name: aws-credentials
env:
- name: AWS_REGION
value: us-east-1
Better Approach: Use Azure Managed Identity to assume AWS IAM role via OIDC federation
# Set up federated identity credential in Azure
az identity federated-credential create \
--name aws-federation \
--identity-name doc-processor-identity \
--resource-group production-rg \
--issuer "https://sts.amazonaws.com" \
--subject "arn:aws:iam::123456789012:role/prod-s3-reader-role" \
--audience "sts.amazonaws.com"
# Configure AWS IAM role to trust Azure AD
# (Complex setup - beyond scope, generally not recommended)
Migration checklist:
- Inventory all ServiceAccounts with eks.amazonaws.com/* annotations (see the inventory sketch after this list)
- Identify AWS SDK usage in application code
- Decide: migrate to Azure services or maintain cross-cloud access
- Create Azure Managed Identities
- Set up Workload Identity federation
- Update application code to use Azure SDKs (if migrating to Azure services)
- Update Kubernetes manifests with Azure annotations
- Test authentication in a non-production environment
- Update CI/CD pipelines to build new container images
- Document credential management changes
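For the first checklist item, a hedged sketch that shells out to kubectl (assumed to be on the PATH and pointed at the EKS cluster) and lists every ServiceAccount carrying an eks.amazonaws.com/* annotation:
import json
import subprocess

def find_irsa_service_accounts() -> None:
    """List ServiceAccounts annotated for IRSA across all namespaces."""
    out = subprocess.run(
        ["kubectl", "get", "serviceaccounts", "--all-namespaces", "-o", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    for item in json.loads(out)["items"]:
        annotations = item["metadata"].get("annotations", {})
        for key, value in annotations.items():
            if key.startswith("eks.amazonaws.com/"):
                print(f'{item["metadata"]["namespace"]}/{item["metadata"]["name"]}: {key}={value}')

if __name__ == "__main__":
    find_irsa_service_accounts()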
Common pitfalls:
- Forgetting the pod label: azure.workload.identity/use: "true" must be on the pod template (the webhook matches pod labels; a Deployment-level label alone is not enough)
- Token expiration: Azure AD tokens have different lifetimes than AWS STS tokens
- SDK version: older Azure SDK versions don't support Workload Identity
- Regional endpoints: Azure Storage URLs differ from S3 URLs
- Permissions model: Azure RBAC roles vs AWS IAM policies have different granularity
Severity: 🔴 High - Application won't start
Frequency: Universal (every stateful app)
Impact: PVCs stuck in Pending, StatefulSets won't deploy
EBS-specific StorageClasses don't exist in AKS. Different provisioners, parameters, and performance tiers require manifest updates.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
type: gp3
iops: "16000"
throughput: "1000"
encrypted: "true"
kmsKeyId: arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: postgres-data
namespace: database
spec:
accessModes:
- ReadWriteOnce
storageClassName: fast-ssd
resources:
requests:
storage: 500Gi
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: postgres
namespace: database
spec:
serviceName: postgres
replicas: 1
selector:
matchLabels:
app: postgres
template:
metadata:
labels:
app: postgres
spec:
containers:
- name: postgres
image: postgres:15
ports:
- containerPort: 5432
volumeMounts:
- name: data
mountPath: /var/lib/postgresql/data
volumeClaimTemplates:
- metadata:
name: data
spec:
accessModes: ["ReadWriteOnce"]
storageClassName: fast-ssd
resources:
requests:
storage: 500Gi
kubectl get pvc -n database
# NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS
# postgres-data Pending fast-ssd
kubectl describe pvc postgres-data -n database
# Events:
# Warning ProvisioningFailed storageclass.storage.k8s.io "fast-ssd" not found
Performance Tier Mapping:
| EBS Type | IOPS | Throughput | Azure Disk Equivalent | SKU | IOPS | Throughput |
|---|---|---|---|---|---|---|
| gp3 (baseline) | 3,000 | 125 MB/s | Premium SSD v2 | PremiumV2_LRS | 3,000 | 125 MB/s |
| gp3 (16k IOPS) | 16,000 | 1,000 MB/s | Ultra Disk | UltraSSD_LRS | 16,000+ | 1,000 MB/s |
| io2 Block Express | 256,000 | 4,000 MB/s | Ultra Disk | UltraSSD_LRS | 160,000 | 4,000 MB/s |
| st1 (throughput) | 500 | 500 MB/s | Standard SSD | StandardSSD_LRS | varies | ~60 MB/s |
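For migration tooling, the table above can be captured as a small lookup that proposes an Azure Disk skuName for a given EBS volume type. This is a heuristic sketch mirroring the table, not a sizing exercise; validate IOPS and throughput per workload:
# Heuristic mapping from EBS volume types to Azure Disk CSI skuName values,
# following the tier table above; adjust thresholds to your workloads.
def suggest_azure_sku(ebs_type: str, iops: int = 3000) -> str:
    if ebs_type in ("io1", "io2") or iops >= 16000:
        return "UltraSSD_LRS"       # or PremiumV2_LRS if its limits fit and cost matters
    if ebs_type in ("st1", "sc1"):
        return "StandardSSD_LRS"
    return "PremiumV2_LRS"          # gp2/gp3 general purpose

print(suggest_azure_sku("gp3", iops=16000))  # -> UltraSSD_LRS
print(suggest_azure_sku("gp3"))              # -> PremiumV2_LRS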
AKS Configuration:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: fast-ssd
provisioner: disk.csi.azure.com
parameters:
skuName: UltraSSD_LRS # For high IOPS requirement
cachingMode: None # Ultra Disk doesn't support caching
# DiskIOPSReadWrite and DiskMBpsReadWrite are set per-PVC for Ultra Disk
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: postgres-data
namespace: database
annotations:
# Ultra Disk specific parameters
disk.csi.azure.com/diskIOPSReadWrite: "16000"
disk.csi.azure.com/diskMBpsReadWrite: "1000"
spec:
accessModes:
- ReadWriteOnce
storageClassName: fast-ssd
resources:
requests:
storage: 500Gi
Important Notes:
1. Ultra Disk requires specific VM sizes - not all Azure VM SKUs support Ultra Disk.
# Check if node pool supports Ultra Disk
az aks nodepool show \
--resource-group myResourceGroup \
--cluster-name myAKSCluster \
--name nodepool1 \
--query "enableUltraSsd"
2. Customer-managed key encryption is not configured by default - to match the EBS kmsKeyId setup, use an Azure Disk Encryption Set (platform-managed encryption at rest is already on).
parameters:
  skuName: UltraSSD_LRS
  diskEncryptionSetID: /subscriptions/.../diskEncryptionSets/myDES
3. Cost implications - Ultra Disk is significantly more expensive.
- Pay for provisioned IOPS and throughput, not just capacity
- Consider Premium SSD v2 for better cost/performance balance
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: fast-ssd
provisioner: disk.csi.azure.com
parameters:
skuName: PremiumV2_LRS
cachingMode: ReadOnly # Premium v2 supports caching
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: postgres-data
namespace: database
annotations:
disk.csi.azure.com/diskIOPSReadWrite: "10000"
disk.csi.azure.com/diskMBpsReadWrite: "500"
spec:
accessModes:
- ReadWriteOnce
storageClassName: fast-ssd
resources:
requests:
storage: 500Gi
Data migration options:
Option A: Velero Backup/Restore (Volume Data)
# In EKS
velero backup create postgres-backup \
--include-namespaces database \
--snapshot-volumes
# In AKS (after setting up the new StorageClass)
velero restore create postgres-restore \
--from-backup postgres-backup
Option B: Database-Native Dump/Restore
# In EKS - dump the database inside the pod
kubectl exec -n database postgres-0 -- \
pg_dumpall -U postgres -f /tmp/postgres-dump.sql
# Copy to local machine
kubectl cp database/postgres-0:/tmp/postgres-dump.sql ./postgres-dump.sql
# In AKS - restore after the new StatefulSet is running
kubectl cp ./postgres-dump.sql database/postgres-0:/tmp/postgres-dump.sql
kubectl exec -n database postgres-0 -- \
psql -U postgres -f /tmp/postgres-dump.sql
Option C: Continuous Replication (Zero Downtime)
# Set up PostgreSQL streaming replication from EKS to AKS
# Primary in EKS, replica in AKS
# Promote the AKS replica to primary during cutover
Severity: 🟡 Medium - Depends on use case
Frequency: Common (20-30% of workloads)
Impact: Shared storage not available, multi-pod writes fail
EFS provides NFS-based shared storage with ReadWriteMany access mode. Azure Files provides similar capability but with different performance characteristics, protocols (SMB vs NFS), and pricing.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: efs-storage
provisioner: efs.csi.aws.com
parameters:
provisioningMode: efs-ap
fileSystemId: fs-0123456789abcdef0
directoryPerms: "700"
gidRangeStart: "1000"
gidRangeEnd: "2000"
basePath: "/dynamic_provisioning"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: shared-uploads
namespace: web
spec:
accessModes:
- ReadWriteMany
storageClassName: efs-storage
resources:
requests:
storage: 100Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: web-frontend
namespace: web
spec:
replicas: 5 # Multiple pods share the volume
selector:
matchLabels:
app: frontend
template:
metadata:
labels:
app: frontend
spec:
containers:
- name: nginx
image: nginx:1.21
volumeMounts:
- name: uploads
mountPath: /var/www/uploads
volumes:
- name: uploads
persistentVolumeClaim:
claimName: shared-uploads
kubectl get pvc -n web
# PVC pending - EFS driver not available
kubectl describe pvc shared-uploads -n web
# provisioner "efs.csi.aws.com" not found
Protocol Decision:
- NFS 4.1: Better for Linux workloads, POSIX compliance, better performance
- SMB 3.0: Better for Windows workloads, AD integration, encryption at rest
Option 1: Azure Files with NFS (Recommended for Linux)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: azurefile-nfs
provisioner: file.csi.azure.com
parameters:
protocol: nfs
skuName: Premium_LRS # NFS requires Premium tier
# Network settings for better performance
networkEndpointType: privateEndpoint # Optional: for private access
mountOptions:
- nconnect=4 # Parallel connections for better throughput
- actimeo=30 # Attribute cache timeout
allowVolumeExpansion: true
volumeBindingMode: Immediate
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: shared-uploads
namespace: web
spec:
accessModes:
- ReadWriteMany
storageClassName: azurefile-nfs
resources:
requests:
storage: 100Gi
Option 2: Azure Files with SMB
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: azurefile-smb
provisioner: file.csi.azure.com
parameters:
skuName: Standard_LRS # Or Premium_LRS
protocol: smb
# Optional: Use existing storage account
# storageAccount: mystorageaccount
# resourceGroup: myResourceGroup
mountOptions:
- dir_mode=0777
- file_mode=0777
- uid=33 # www-data user
- gid=33
- mfsymlinks # Enable symlinks
- cache=strict
- actimeo=30
allowVolumeExpansion: true
volumeBindingMode: Immediate
| Metric | EFS | Azure Files Premium (NFS) | Azure Files Standard (SMB) |
|---|---|---|---|
| Max throughput | 10 GB/s | 10 GB/s | 60 MB/s per share |
| Max IOPS | 500,000+ | 100,000 | 1,000-20,000 |
| Latency | Low (single-digit ms) | Low (single-digit ms) | Higher (varies) |
| Min size | No minimum | 100 GiB | 1 GiB |
| Pricing model | Pay per GB used | Pay per GB provisioned | Pay per GB used |
| Bursting | Yes | Yes | Limited |
Compatibility gotchas:
1. File Permissions
# EFS uses NFSv4 ACLs
# Azure Files NFS uses NFSv4.1 - mostly compatible
# Azure Files SMB uses NTFS ACLs - potential permission issues
# Test file operations
kubectl exec -it web-frontend-xxx -- touch /var/www/uploads/test.txt
kubectl exec -it web-frontend-xxx -- ls -la /var/www/uploads/
2. Symbolic Links
# Azure Files SMB requires the mfsymlinks mount option
mountOptions:
- mfsymlinks
3. File Locking
# EFS supports byte-range locking
# Azure Files NFS: full support
# Azure Files SMB: full support
# Test your application's file locking behavior
4. Case Sensitivity (a collision-check sketch follows this list)
# EFS: case-sensitive (Linux NFS)
# Azure Files NFS: case-sensitive
# Azure Files SMB: case-insensitive by default
# This can break applications expecting case-sensitivity!
touch /uploads/File.txt
touch /uploads/file.txt
# Different files on EFS/NFS, the same file on SMB
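Before choosing SMB, it is worth scanning the existing EFS data for names that differ only by case, since those collapse into a single path on a case-insensitive share. A minimal sketch; the /var/www/uploads path is the example mount from above and should be adjusted:
import os
from collections import defaultdict

def find_case_collisions(root: str) -> dict:
    """Report paths that differ only by case and would collide on a case-insensitive (SMB) share."""
    seen = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            seen[full.lower()].append(full)
    return {key: paths for key, paths in seen.items() if len(paths) > 1}

for lowered, paths in find_case_collisions("/var/www/uploads").items():
    print("collision:", paths)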
Option 1: Rsync Between Volumes
# Create sync pod with both volumes mounted
apiVersion: v1
kind: Pod
metadata:
name: efs-to-azurefile-sync
namespace: web
spec:
containers:
- name: sync
image: instrumentisto/rsync-ssh:latest
command: ["/bin/sh", "-c"]
args:
- |
rsync -avz --progress \
/source/ /destination/
echo "Sync complete"
sleep infinity
volumeMounts:
- name: source
mountPath: /source
- name: destination
mountPath: /destination
volumes:
- name: source
persistentVolumeClaim:
claimName: efs-pvc # EKS cluster - requires cross-cluster volume access
- name: destination
persistentVolumeClaim:
claimName: azurefile-pvc # AKS cluster
Option 2: AWS DataSync to Azure Blob, then mount
# Use AWS DataSync to copy data from EFS to S3
# Use AzCopy to copy from S3 to Azure Blob (azcopy copies S3 to Blob, not directly to Azure Files)
azcopy copy \
"https://my-bucket.s3.amazonaws.com/*" \
"https://mystorageaccount.blob.core.windows.net/mycontainer" \
--recursive
Option 3: Application-Level Migration
# 1. Deploy application in AKS with empty Azure Files volume
# 2. Configure application to write to both EFS (in AWS) and Azure Files
# 3. Run backfill job to copy existing data
# 4. Switch application to read from Azure Files
# 5. Decommission EFS
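Step 2 of this option describes writing to both volumes during the transition; a minimal sketch of that dual-write wrapper, assuming both shares are mounted in the pod at the hypothetical paths /mnt/efs-uploads and /mnt/azurefiles-uploads:
import shutil
from pathlib import Path

LEGACY_ROOT = Path("/mnt/efs-uploads")       # existing EFS mount (hypothetical path)
NEW_ROOT = Path("/mnt/azurefiles-uploads")   # new Azure Files mount (hypothetical path)

def save_upload(relative_path: str, data: bytes) -> None:
    """Write the file to the new volume first, then mirror it to the legacy volume."""
    for root in (NEW_ROOT, LEGACY_ROOT):
        target = root / relative_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)

def backfill(relative_path: str) -> None:
    """Copy a file that only exists on the legacy volume into the new one."""
    src, dst = LEGACY_ROOT / relative_path, NEW_ROOT / relative_path
    if src.exists() and not dst.exists():
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)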
Severity: 🔴 High - Application won't start
Frequency: Very Common (80%+ of secure applications)
Impact: Secrets not available, authentication failures
Applications using AWS Secrets Manager via the Secrets Store CSI Driver need reconfiguration to use Azure Key Vault. The SecretProviderClass CRD has completely different parameters.
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
name: application-secrets
namespace: production
spec:
provider: aws
parameters:
objects: |
- objectName: "production/database/postgres"
objectType: "secretsmanager"
objectAlias: "db-password"
- objectName: "production/api/jwt-secret"
objectType: "secretsmanager"
objectAlias: "jwt-key"
- objectName: "production/ssl/certificate"
objectType: "secretsmanager"
objectAlias: "ssl-cert"
secretObjects: # Auto-create Kubernetes Secrets
- secretName: db-credentials
type: Opaque
data:
- objectName: db-password
key: password
- secretName: jwt-credentials
type: Opaque
data:
- objectName: jwt-key
key: secret
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: api-server
namespace: production
spec:
template:
spec:
serviceAccountName: api-sa # Has IRSA permissions
containers:
- name: api
image: myapi:v1.0
env:
- name: DB_PASSWORD
valueFrom:
secretKeyRef:
name: db-credentials
key: password
- name: JWT_SECRET
valueFrom:
secretKeyRef:
name: jwt-credentials
key: secret
volumeMounts:
- name: secrets
mountPath: "/mnt/secrets"
readOnly: true
volumes:
- name: secrets
csi:
driver: secrets-store.csi.k8s.io
readOnly: true
volumeAttributes:
secretProviderClass: "application-secrets"
kubectl get pods -n production
# NAME READY STATUS RESTARTS AGE
# api-server-6d8f9c5b4-abc123 0/1 ContainerCreating 0 5m
kubectl describe pod api-server-6d8f9c5b4-abc123 -n production
# Events:
# Warning FailedMount MountVolume.SetUp failed for volume "secrets" :
# rpc error: code = Unknown desc = failed to mount secrets store objects for pod:
# provider "aws" not found
Prerequisites:
# 1. Enable Azure Key Vault Provider for Secrets Store CSI Driver
az aks enable-addons \
--addons azure-keyvault-secrets-provider \
--name myAKSCluster \
--resource-group myResourceGroup
# 2. Create Azure Key Vault
az keyvault create \
--name prodappvault \
--resource-group production-rg \
--location eastus
# 3. Create Managed Identity for workload
az identity create \
--name api-server-identity \
--resource-group production-rg
# 4. Grant Key Vault access
IDENTITY_CLIENT_ID=$(az identity show \
--name api-server-identity \
--resource-group production-rg \
--query clientId -o tsv)
az keyvault set-policy \
--name prodappvault \
--secret-permissions get list \
--spn $IDENTITY_CLIENT_ID
Migrate Secrets:
# Export from AWS Secrets Manager
aws secretsmanager get-secret-value \
--secret-id production/database/postgres \
--query SecretString \
--output text > db-password.txt
# Import to Azure Key Vault
az keyvault secret set \
--vault-name prodappvault \
--name db-password \
--file db-password.txt
# Repeat for other secrets
az keyvault secret set \
--vault-name prodappvault \
--name jwt-secret \
--value "$(aws secretsmanager get-secret-value --secret-id production/api/jwt-secret --query SecretString --output text)"
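The per-secret commands above are fine for a handful of entries; for a larger inventory, a scripted pass is less error-prone. A hedged sketch using boto3 and the azure-keyvault-secrets SDK; Key Vault secret names only allow alphanumerics and dashes, so names such as production/database/postgres must be sanitized, and the simple name mapping below is an assumption to adjust:
import re
import boto3
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

kv = SecretClient(vault_url="https://prodappvault.vault.azure.net", credential=DefaultAzureCredential())
sm = boto3.client("secretsmanager", region_name="us-east-1")

def to_kv_name(aws_name: str) -> str:
    # Key Vault allows only [0-9a-zA-Z-]; collapse everything else to dashes.
    return re.sub(r"[^0-9a-zA-Z-]+", "-", aws_name).strip("-")

paginator = sm.get_paginator("list_secrets")
for page in paginator.paginate():
    for meta in page["SecretList"]:
        value = sm.get_secret_value(SecretId=meta["ARN"]).get("SecretString")
        if value is None:
            continue  # binary secrets need separate handling
        kv.set_secret(to_kv_name(meta["Name"]), value)
        print(f'migrated {meta["Name"]} -> {to_kv_name(meta["Name"])}')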
AKS Configuration:
apiVersion: v1
kind: ServiceAccount
metadata:
name: api-sa
namespace: production
annotations:
azure.workload.identity/client-id: "12345678-1234-1234-1234-123456789012"
azure.workload.identity/tenant-id: "87654321-4321-4321-4321-210987654321"
---
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
name: application-secrets
namespace: production
spec:
provider: azure
parameters:
usePodIdentity: "false"
useVMManagedIdentity: "false"
clientID: "12345678-1234-1234-1234-123456789012" # Managed Identity Client ID
keyvaultName: "prodappvault"
cloudName: "" # Empty for Azure Public Cloud
objects: |
array:
- |
objectName: db-password
objectType: secret
objectAlias: db-password
- |
objectName: jwt-secret
objectType: secret
objectAlias: jwt-key
- |
objectName: ssl-certificate
objectType: secret
objectAlias: ssl-cert
tenantId: "87654321-4321-4321-4321-210987654321"
secretObjects: # Create Kubernetes Secrets (same as before)
- secretName: db-credentials
type: Opaque
data:
- objectName: db-password
key: password
- secretName: jwt-credentials
type: Opaque
data:
- objectName: jwt-key
key: secret
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: api-server
namespace: production
labels:
azure.workload.identity/use: "true"
spec:
template:
metadata:
labels:
azure.workload.identity/use: "true"
spec:
serviceAccountName: api-sa
containers:
- name: api
image: myapi:v1.0
env:
- name: DB_PASSWORD
valueFrom:
secretKeyRef:
name: db-credentials
key: password
- name: JWT_SECRET
valueFrom:
secretKeyRef:
name: jwt-credentials
key: secret
volumeMounts:
- name: secrets
mountPath: "/mnt/secrets"
readOnly: true
volumes:
- name: secrets
csi:
driver: secrets-store.csi.k8s.io
readOnly: true
volumeAttributes:
secretProviderClass: "application-secrets"
AWS Secrets Manager doesn't auto-rotate mounted secrets by default in CSI.
Azure Key Vault CSI supports auto-rotation:
Rotation is enabled on the Key Vault provider add-on (not in the SecretProviderClass itself), e.g.:
# Enable rotation and set the polling interval on the AKS add-on
az aks addon update \
--resource-group production-rg \
--name myAKSCluster \
--addon azure-keyvault-secrets-provider \
--enable-secret-rotation \
--rotation-poll-interval 2m
# Mounted secret files (and synced Kubernetes Secrets) refresh on the polling interval;
# pods consuming secrets via environment variables still need a restart to pick up new values.
Common issues:
1. Token Expiration
# Workload Identity tokens expire
# Symptoms: "AuthenticationFailed" after ~24 hours
# Solution: ensure the pod has the correct label
azure.workload.identity/use: "true"
2. Permission Errors
# Error: "Caller is not authorized to perform action"
# Check Key Vault access policies
az keyvault show --name prodappvault --query properties.accessPolicies
# Grant missing permissions
az keyvault set-policy \
--name prodappvault \
--object-id <managed-identity-object-id> \
--secret-permissions get list
3. Secret Not Syncing
# Check CSI driver logs
kubectl logs -n kube-system -l app=secrets-store-csi-driver
# Check provider logs
kubectl logs -n kube-system -l app=csi-secrets-store-provider-azure
Severity: 🟡 Medium - Functionality degraded
Frequency: Very Common
Impact: Lost features (SSL, redirects, WAF), different costs
AWS ALB Ingress Controller annotations don't work on AKS. Features like SSL termination, HTTP-to-HTTPS redirects, health checks, and WAF integration need reconfiguration.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: production-ingress
namespace: production
annotations:
kubernetes.io/ingress.class: alb
alb.ingress.kubernetes.io/scheme: internet-facing
alb.ingress.kubernetes.io/target-type: ip
# SSL Certificate from ACM
alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:123456789012:certificate/abc-def-ghi
alb.ingress.kubernetes.io/ssl-policy: ELBSecurityPolicy-TLS-1-2-2017-01
# HTTP to HTTPS redirect
alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS":443}]'
alb.ingress.kubernetes.io/actions.ssl-redirect: |
{"Type": "redirect", "RedirectConfig": {
"Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301"
}}
# Health check configuration
alb.ingress.kubernetes.io/healthcheck-path: /health
alb.ingress.kubernetes.io/healthcheck-interval-seconds: "15"
alb.ingress.kubernetes.io/healthcheck-timeout-seconds: "5"
alb.ingress.kubernetes.io/success-codes: "200"
# Access logs
alb.ingress.kubernetes.io/load-balancer-attributes: access_logs.s3.enabled=true,access_logs.s3.bucket=my-alb-logs
# WAF
alb.ingress.kubernetes.io/wafv2-acl-arn: arn:aws:wafv2:us-east-1:123456789012:regional/webacl/MyWAF/a1b2c3d4
spec:
rules:
- host: api.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: ssl-redirect
port:
name: use-annotation
- path: /api
pathType: Prefix
backend:
service:
name: api-service
port:
number: 8080
# Ingress created but:
# - No ALB (falls back to nginx or nothing)
# - No SSL termination
# - No HTTP redirect
# - No WAF
# - No custom health checks
# - Different cost model
kubectl get ingress -n production
# NAME CLASS HOSTS ADDRESS PORTS AGE
# production-ingress <none> api.example.com 80 5m
Option 1: Application Gateway Ingress Controller (AGIC) - most similar to ALB, enterprise features
Prerequisites:
# Create Application Gateway
az network application-gateway create \
--name prodAppGateway \
--resource-group production-rg \
--location eastus \
--sku WAF_v2 \
--capacity 2 \
--vnet-name aksVNet \
--subnet appgw-subnet \
--public-ip-address appgw-pip
# Enable AGIC addon on AKS
az aks enable-addons \
--name myAKSCluster \
--resource-group production-rg \
--addon ingress-appgw \
--appgw-id /subscriptions/.../resourceGroups/production-rg/providers/Microsoft.Network/applicationGateways/prodAppGateway
AKS Configuration:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: production-ingress
namespace: production
annotations:
kubernetes.io/ingress.class: azure/application-gateway
# SSL Certificate from Azure Key Vault
appgw.ingress.kubernetes.io/appgw-ssl-certificate: "api-example-com-cert"
# HTTP to HTTPS redirect
appgw.ingress.kubernetes.io/ssl-redirect: "true"
# Backend protocol
appgw.ingress.kubernetes.io/backend-protocol: "http"
# Health probe
appgw.ingress.kubernetes.io/health-probe-path: "/health"
appgw.ingress.kubernetes.io/health-probe-interval: "15"
appgw.ingress.kubernetes.io/health-probe-timeout: "5"
appgw.ingress.kubernetes.io/health-probe-unhealthy-threshold: "3"
# WAF Policy
appgw.ingress.kubernetes.io/waf-policy-for-path: "/subscriptions/.../resourceGroups/production-rg/providers/Microsoft.Network/applicationGatewayWebApplicationFirewallPolicies/prodWAF"
# Connection draining
appgw.ingress.kubernetes.io/connection-draining: "true"
appgw.ingress.kubernetes.io/connection-draining-timeout: "30"
spec:
tls:
- hosts:
- api.example.com
secretName: api-tls-secret # Certificate must be in Key Vault and referenced
rules:
- host: api.example.com
http:
paths:
- path: /api
pathType: Prefix
backend:
service:
name: api-service
port:
number: 8080
Certificate Setup:
# Import certificate to Key Vault
az keyvault certificate import \
--vault-name prodappvault \
--name api-example-com-cert \
--file certificate.pfx \
--password "cert-password"
# Grant Application Gateway access
az keyvault set-policy \
--name prodappvault \
--spn <appgw-identity> \
--secret-permissions get \
--certificate-permissions getWAF Configuration:
# Create WAF policy
az network application-gateway waf-policy create \
--name prodWAF \
--resource-group production-rg \
--location eastus
# Configure OWASP rules
az network application-gateway waf-policy managed-rule rule-set add \
--policy-name prodWAF \
--resource-group production-rg \
--type OWASP \
--version 3.2Better for multi-cloud, more mature, larger community
# Install nginx ingress controller
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm install ingress-nginx ingress-nginx/ingress-nginx \
--namespace ingress-nginx \
--create-namespace \
--set controller.service.annotations."service\.beta\.kubernetes\.io/azure-load-balancer-health-probe-request-path"=/healthz
AKS Configuration:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: production-ingress
namespace: production
annotations:
kubernetes.io/ingress.class: nginx
# SSL redirect
nginx.ingress.kubernetes.io/ssl-redirect: "true"
# Force SSL
nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
# Certificate management via cert-manager
cert-manager.io/cluster-issuer: "letsencrypt-prod"
# Rate limiting
nginx.ingress.kubernetes.io/limit-rps: "100"
# Custom health check
nginx.ingress.kubernetes.io/health-check-path: "/health"
# CORS
nginx.ingress.kubernetes.io/enable-cors: "true"
nginx.ingress.kubernetes.io/cors-allow-origin: "https://example.com"
spec:
tls:
- hosts:
- api.example.com
secretName: api-tls-secret # Auto-provisioned by cert-manager
rules:
- host: api.example.com
http:
paths:
- path: /api
pathType: Prefix
backend:
service:
name: api-service
port:
number: 8080
cert-manager Setup (for automated SSL):
# Install cert-manager
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.0/cert-manager.yaml
# Create ClusterIssuer for Let's Encrypt
cat <<EOF | kubectl apply -f -
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
name: letsencrypt-prod
spec:
acme:
server: https://acme-v02.api.letsencrypt.org/directory
email: admin@example.com
privateKeySecretRef:
name: letsencrypt-prod
solvers:
- http01:
ingress:
class: nginx
EOF
| Feature | AWS ALB | Azure App Gateway (AGIC) | nginx Ingress |
|---|---|---|---|
| SSL Termination | ACM | Key Vault | cert-manager/manual |
| WAF | AWS WAF | Azure WAF | ModSecurity (addon) |
| Path-based routing | ✅ | ✅ | ✅ |
| HTTP redirects | ✅ | ✅ | ✅ |
| Header manipulation | Limited | ✅ | ✅ (extensive) |
| Rate limiting | Via WAF | Via WAF | ✅ (native) |
| Canary deployments | Via target groups | Via backend pools | ✅ (native) |
| mTLS | ✅ | ✅ | ✅ |
| Cost | Pay per hour + LCU | Pay per hour + capacity | Free (infra only) |
| Multi-cloud | AWS only | Azure only | Any cloud |
Severity: 🔴 High - Security controls lost
Frequency: Common in regulated industries
Impact: Pod-level network isolation not available
EKS allows assigning AWS Security Groups directly to pods via the VPC CNI plugin. AKS uses standard Kubernetes Network Policies, which have different capabilities and granularity.
# Custom Resource for Security Group Policy
apiVersion: vpcresources.k8s.aws/v1beta1
kind: SecurityGroupPolicy
metadata:
name: database-pod-sg
namespace: database
spec:
podSelector:
matchLabels:
app: postgres
tier: database
securityGroups:
groupIds:
- sg-0a1b2c3d4e5f6g7h8 # Only allows 5432 from app tier SG
- sg-1a2b3c4d5e6f7g8h9 # Allows SSH from bastion SG
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: postgres
namespace: database
spec:
serviceName: postgres
replicas: 3
selector:
matchLabels:
app: postgres
tier: database
template:
metadata:
labels:
app: postgres
tier: database
spec:
containers:
- name: postgres
image: postgres:15
ports:
- containerPort: 5432
name: postgres
# Pod automatically gets dedicated ENI with security group sg-0a1b2c3d4e5f6g7h8
AWS Security Group Rules (defined in AWS):
# sg-0a1b2c3d4e5f6g7h8 - Database Security Group
# Inbound:
# - Port 5432 from sg-app-tier-xyz (application pods)
# - Port 5432 from sg-bastion-abc (admin access)
# Outbound:
# - Port 5432 to sg-0a1b2c3d4e5f6g7h8 (cluster communication)
# SecurityGroupPolicy CRD doesn't exist
kubectl get securitygrouppolicy -n database
# error: the server doesn't have a resource type "securitygrouppolicy"
# Pods have no network restrictions
# All pods can communicate with all pods!
Enable Azure Network Policy:
# When creating cluster
az aks create \
--resource-group production-rg \
--name myAKSCluster \
--network-plugin azure \
--network-policy azure # or "calico"
# For existing cluster (requires recreation of node pools)
az aks update \
--resource-group production-rg \
--name myAKSCluster \
--network-policy azure
AKS Configuration:
# Default deny all ingress traffic
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: default-deny-ingress
namespace: database
spec:
podSelector: {}
policyTypes:
- Ingress
---
# Allow specific ingress to PostgreSQL
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: postgres-allow-from-app
namespace: database
spec:
podSelector:
matchLabels:
app: postgres
tier: database
policyTypes:
- Ingress
- Egress
ingress:
# Allow from application tier
- from:
- namespaceSelector:
matchLabels:
name: application
podSelector:
matchLabels:
tier: application
ports:
- protocol: TCP
port: 5432
# Allow from monitoring
- from:
- namespaceSelector:
matchLabels:
name: monitoring
ports:
- protocol: TCP
port: 9187 # postgres_exporter
# Allow from same namespace (replica communication)
- from:
- podSelector:
matchLabels:
app: postgres
ports:
- protocol: TCP
port: 5432
egress:
# Allow DNS
- to:
- namespaceSelector:
matchLabels:
name: kube-system
podSelector:
matchLabels:
k8s-app: kube-dns
ports:
- protocol: UDP
port: 53
# Allow PostgreSQL replication
- to:
- podSelector:
matchLabels:
app: postgres
ports:
- protocol: TCP
port: 5432
---
# Label namespaces for network policy
apiVersion: v1
kind: Namespace
metadata:
name: application
labels:
name: application
---
apiVersion: v1
kind: Namespace
metadata:
name: database
labels:
name: database
| Aspect | AWS Security Groups | K8s Network Policies |
|---|---|---|
| Scope | ENI (pod gets own network interface) | Pod-to-pod |
| Statefulness | Stateful (return traffic automatic) | Varies by CNI plugin |
| IP-based rules | Can reference external IPs | Can reference IP blocks (CIDR) |
| Cloud integration | Native AWS (RDS, ELB, etc.) | Kubernetes-only |
| Management | AWS Console/API/Terraform | Kubernetes manifests |
| Performance | Enforced at VPC level (hardware) | Enforced at node level (software) |
| Granularity | Per-ENI (can be per-pod) | Per-pod only |
| Cost | No additional cost | No additional cost |
For node-level security (not pod-level):
# Create NSG for AKS nodes
az network nsg create \
--resource-group production-rg \
--name aks-node-nsg
# Add rules (10.240.1.0/24 = app tier subnet)
az network nsg rule create \
--resource-group production-rg \
--nsg-name aks-node-nsg \
--name allow-postgres-from-app-nodes \
--priority 100 \
--source-address-prefixes 10.240.1.0/24 \
--destination-port-ranges 5432 \
--access Allow \
--protocol Tcp
# Associate with subnet
az network vnet subnet update \
--resource-group production-rg \
--vnet-name aksVNet \
--name database-subnet \
--network-security-group aks-node-nsg
Limitation: NSGs apply to ALL pods on a node, not to individual pods like Security Groups for Pods
Migration steps:
1. Inventory Security Groups
# List all SecurityGroupPolicies in EKS
kubectl get securitygrouppolicy --all-namespaces -o yaml > eks-sg-policies.yaml
2. Map to Network Policies (see the sketch after this list)
- Security Group → Network Policy (pod selector)
- Security Group rules → Ingress/Egress rules
- Source Security Groups → Namespace/Pod selectors
3. Test Thoroughly
# Test connectivity between pods
kubectl run -it --rm debug --image=nicolaka/netshoot --restart=Never -- /bin/bash
# Inside pod:
nc -zv postgres-0.postgres.database.svc.cluster.local 5432
4. Use Policy Enforcement Tools
# Apply example policies from the kubernetes-network-policy-recipes repo (use the raw file URL)
kubectl apply -f https://raw.githubusercontent.com/ahmetb/kubernetes-network-policy-recipes/master/00-deny-all-traffic-to-an-application.yaml
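For step 2 above, a small generator keeps the translation mechanical. A hedged sketch (requires PyYAML; names, selectors, and ports are illustrative) that emits a NetworkPolicy allowing one labelled tier to reach another on a single TCP port, which covers the common "allow port X from security group Y" rule shape; stateful semantics and external CIDR references still need manual review:
import yaml  # PyYAML

def allow_policy(name: str, namespace: str, target_labels: dict,
                 source_ns: str, source_labels: dict, port: int) -> str:
    """Render a NetworkPolicy that allows source pods to reach target pods on one TCP port."""
    policy = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": target_labels},
            "policyTypes": ["Ingress"],
            "ingress": [{
                "from": [{
                    "namespaceSelector": {"matchLabels": {"name": source_ns}},
                    "podSelector": {"matchLabels": source_labels},
                }],
                "ports": [{"protocol": "TCP", "port": port}],
            }],
        },
    }
    return yaml.safe_dump(policy, sort_keys=False)

# Roughly equivalent to "allow 5432 from the app-tier security group":
print(allow_policy("postgres-allow-from-app", "database", {"app": "postgres"},
                   "application", {"tier": "application"}, 5432))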
Severity: 🟡 Medium - Operational visibility
Frequency: Universal
Impact: Different query language, metrics, alerting
EKS integrates with CloudWatch for logs and metrics. AKS uses Azure Monitor with different collection mechanisms, query languages (KQL vs CloudWatch Insights), and pricing models.
FluentBit DaemonSet for CloudWatch:
apiVersion: v1
kind: ConfigMap
metadata:
name: fluent-bit-config
namespace: amazon-cloudwatch
data:
fluent-bit.conf: |
[SERVICE]
Flush 5
Grace 30
Daemon Off
Log_Level info
[INPUT]
Name tail
Path /var/log/containers/*.log
Parser docker
Tag kube.*
DB /var/fluent-bit/state/flb_kube.db
Mem_Buf_Limit 5MB
Skip_Long_Lines On
Refresh_Interval 10
[FILTER]
Name kubernetes
Match kube.*
Kube_URL https://kubernetes.default.svc.cluster.local:443
Merge_Log On
Keep_Log Off
K8S-Logging.Parser On
K8S-Logging.Exclude On
[OUTPUT]
Name cloudwatch_logs
Match *
region us-east-1
log_group_name /aws/eks/production-cluster/application
log_stream_prefix from-fluent-bit-
auto_create_group true
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: fluent-bit
namespace: amazon-cloudwatch
spec:
selector:
matchLabels:
name: fluent-bit
template:
metadata:
labels:
name: fluent-bit
spec:
serviceAccountName: fluent-bit
containers:
- name: fluent-bit
image: amazon/aws-for-fluent-bit:latest
volumeMounts:
- name: varlog
mountPath: /var/log
- name: varlibdockercontainers
mountPath: /var/lib/docker/containers
readOnly: true
- name: fluent-bit-config
mountPath: /fluent-bit/etc/
volumes:
- name: varlog
hostPath:
path: /var/log
- name: varlibdockercontainers
hostPath:
path: /var/lib/docker/containers
- name: fluent-bit-config
configMap:
name: fluent-bit-config
Querying in CloudWatch Insights:
fields @timestamp, @message
| filter kubernetes.namespace_name = "production"
| filter kubernetes.labels.app = "api-server"
| filter @message like /ERROR/
| stats count() by bin(5m)
# Logs not reaching any destination
# CloudWatch not accessible from Azure
# Need to reconfigure entire logging pipeline
Enable Container Insights:
# Create Log Analytics Workspace
az monitor log-analytics workspace create \
--resource-group production-rg \
--workspace-name prodLogAnalytics \
--location eastus
# Enable on AKS cluster
az aks enable-addons \
--resource-group production-rg \
--name myAKSCluster \
--addons monitoring \
--workspace-resource-id /subscriptions/<subscription-id>/resourceGroups/production-rg/providers/Microsoft.OperationalInsights/workspaces/prodLogAnalytics
This automatically deploys:
- OMS Agent DaemonSet (collects logs and metrics)
- Container Insights solution
- Pre-configured workbooks and dashboards
Querying in Azure Monitor (KQL):
ContainerLog
| where TimeGenerated > ago(1h)
| where Namespace == "production"
| where PodLabel_app_s == "api-server"
| where LogEntry contains "ERROR"
| summarize count() by bin(TimeGenerated, 5m)
| render timechart
Query Translation Examples:
| CloudWatch Insights | Azure Monitor (KQL) |
|---|---|
| fields @timestamp, @message | project TimeGenerated, LogEntry |
| filter kubernetes.namespace = "prod" | where Namespace == "prod" |
| filter @message like /ERROR/ | where LogEntry contains "ERROR" |
| stats count() by bin(5m) | summarize count() by bin(TimeGenerated, 5m) |
| sort @timestamp desc | sort by TimeGenerated desc |
| limit 100 | take 100 |
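For saved CloudWatch Insights queries, a naive substitution pass based on this table can produce a first-draft KQL version for manual review. A sketch (string substitution only; field, namespace, and label names will still need hand-mapping):
import re

TRANSLATIONS = [
    (r"fields @timestamp, @message", "project TimeGenerated, LogEntry"),
    (r"filter @message like /(\w+)/", r'where LogEntry contains "\1"'),
    (r"stats count\(\) by bin\((\w+)\)", r"summarize count() by bin(TimeGenerated, \1)"),
    (r"sort @timestamp desc", "sort by TimeGenerated desc"),
    (r"limit (\d+)", r"take \1"),
]

def rough_kql(cloudwatch_query: str) -> str:
    """Best-effort CloudWatch Insights -> KQL rewrite, intended for manual review."""
    kql = cloudwatch_query
    for pattern, replacement in TRANSLATIONS:
        kql = re.sub(pattern, replacement, kql)
    return kql

print(rough_kql("fields @timestamp, @message | filter @message like /ERROR/ | limit 100"))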
In EKS (CloudWatch Custom Metrics):
import boto3
cloudwatch = boto3.client('cloudwatch')
cloudwatch.put_metric_data(
Namespace='Production/API',
MetricData=[
{
'MetricName': 'RequestDuration',
'Value': 123.45,
'Unit': 'Milliseconds',
'Dimensions': [
{'Name': 'Endpoint', 'Value': '/api/users'},
{'Name': 'StatusCode', 'Value': '200'}
]
}
]
)
In AKS (Azure Monitor Custom Metrics):
from azure.monitor.ingestion import LogsIngestionClient
from azure.identity import DefaultAzureCredential
credential = DefaultAzureCredential()
client = LogsIngestionClient(
endpoint="https://prodLogAnalytics.eastus-1.ingest.monitor.azure.com",
credential=credential
)
# Send custom logs
client.upload(
rule_id="/subscriptions/.../dataCollectionRules/myDCR",
stream_name="Custom-RequestMetrics",
logs=[
{
"TimeGenerated": "2024-02-16T10:00:00Z",
"Endpoint": "/api/users",
"Duration": 123.45,
"StatusCode": 200
}
]
)
EKS (CloudWatch Alarms):
aws cloudwatch put-metric-alarm \
--alarm-name high-error-rate \
--alarm-description "Alert when error rate > 5%" \
--metric-name Errors \
--namespace AWS/EKS \
--statistic Sum \
--period 300 \
--evaluation-periods 2 \
--threshold 100 \
--comparison-operator GreaterThanThreshold \
--alarm-actions arn:aws:sns:us-east-1:123456789012:critical-alerts
AKS (Azure Monitor Alerts):
# Create alert rule
az monitor metrics alert create \
--name high-error-rate \
--resource-group production-rg \
--scopes /subscriptions/.../resourceGroups/production-rg/providers/Microsoft.ContainerService/managedClusters/myAKSCluster \
--condition "avg Percentage CPU > 80" \
--window-size 5m \
--evaluation-frequency 1m \
--action /subscriptions/.../resourceGroups/production-rg/providers/microsoft.insights/actionGroups/critical-alerts
Or using KQL-based log alerts:
az monitor scheduled-query create \
--name high-error-rate-log \
--resource-group production-rg \
--scopes /subscriptions/.../workspaces/prodLogAnalytics \
--condition "count > 100" \
--condition-query "ContainerLog | where LogEntry contains 'ERROR' | summarize count()" \
--window-size 5m \
--evaluation-frequency 1m \
--action /subscriptions/.../actionGroups/critical-alerts
| Feature | CloudWatch | Azure Monitor |
|---|---|---|
| Log ingestion | $0.50/GB | $2.76/GB (first 5GB/day free per workspace) |
| Log storage | $0.03/GB/month | Included for 31 days, $0.12/GB/month after |
| Metrics | Custom metrics $0.30/metric | Included (native), $0.60/metric (custom) |
| Queries | $0.005/GB scanned | Included |
| Data export | $0.09/GB | $0.13/GB |
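As a back-of-the-envelope check, the ingestion rates above can be turned into a monthly estimate. The sketch below uses the table's illustrative prices; list prices change and vary by region, so treat it as a rough comparison, not a quote:
def monthly_log_cost(gb_per_day: float) -> dict:
    """Rough monthly log-ingestion cost using the illustrative $/GB rates in the table above."""
    days = 30
    cloudwatch = gb_per_day * days * 0.50
    # Azure Monitor: the table assumes 5 GB/day free per workspace (verify current terms).
    azure = max(gb_per_day - 5, 0) * days * 2.76
    return {"cloudwatch_usd": round(cloudwatch, 2), "azure_monitor_usd": round(azure, 2)}

print(monthly_log_cost(gb_per_day=50))  # e.g. 50 GB/day of container logs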
Severity: 🟢 Low - Straightforward migration
Frequency: Universal
Impact: Image pulls fail until reconfigured
Container images stored in Amazon ECR need to be migrated to ACR, and image pull secrets need updating.
apiVersion: v1
kind: Secret
metadata:
name: ecr-registry
namespace: production
type: kubernetes.io/dockerconfigjson
data:
.dockerconfigjson: <base64-encoded-ecr-credentials>
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: api-server
namespace: production
spec:
template:
spec:
imagePullSecrets:
- name: ecr-registry
containers:
- name: api
image: 123456789012.dkr.ecr.us-east-1.amazonaws.com/api-server:v1.2.3
1. Create ACR:
az acr create \
--resource-group production-rg \
--name prodacr \
--sku Premium \
--location eastus
2. Enable ACR Integration with AKS:
# Attach ACR to AKS (automatic image pull)
az aks update \
--resource-group production-rg \
--name myAKSCluster \
--attach-acr prodacr
3. Migrate Images:
# Login to both registries
aws ecr get-login-password --region us-east-1 | \
docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
az acr login --name prodacr
# Pull from ECR
docker pull 123456789012.dkr.ecr.us-east-1.amazonaws.com/api-server:v1.2.3
# Tag for ACR
docker tag \
123456789012.dkr.ecr.us-east-1.amazonaws.com/api-server:v1.2.3 \
prodacr.azurecr.io/api-server:v1.2.3
# Push to ACR
docker push prodacr.azurecr.io/api-server:v1.2.3
Automated migration script:
#!/bin/bash
ECR_REGISTRY="123456789012.dkr.ecr.us-east-1.amazonaws.com"
ACR_REGISTRY="prodacr.azurecr.io"
# List all images in ECR
aws ecr describe-repositories --region us-east-1 --output json | \
jq -r '.repositories[].repositoryName' | \
while read repo; do
# List all tags for repository
aws ecr list-images --region us-east-1 --repository-name $repo --output json | \
jq -r '.imageIds[].imageTag' | \
while read tag; do
echo "Migrating $repo:$tag"
# Pull from ECR
docker pull $ECR_REGISTRY/$repo:$tag
# Tag for ACR
docker tag $ECR_REGISTRY/$repo:$tag $ACR_REGISTRY/$repo:$tag
# Push to ACR
docker push $ACR_REGISTRY/$repo:$tag
# Clean up local image
docker rmi $ECR_REGISTRY/$repo:$tag $ACR_REGISTRY/$repo:$tag
done
done
4. Update Manifests:
apiVersion: apps/v1
kind: Deployment
metadata:
name: api-server
namespace: production
spec:
template:
spec:
# No imagePullSecrets needed when ACR is attached to AKS
containers:
- name: api
image: prodacr.azurecr.io/api-server:v1.2.3 # Updated image reference
5. Update CI/CD Pipelines:
# GitHub Actions example
- name: Login to ACR
uses: azure/docker-login@v1
with:
login-server: prodacr.azurecr.io
username: ${{ secrets.ACR_USERNAME }}
password: ${{ secrets.ACR_PASSWORD }}
- name: Build and push
run: |
docker build -t prodacr.azurecr.io/api-server:${{ github.sha }} .
docker push prodacr.azurecr.io/api-server:${{ github.sha }}
# Replicate to multiple regions for faster pulls
az acr replication create \
--registry prodacr \
--location westus2
az acr replication create \
--registry prodacr \
--location westeurope
Severity: 🟡 Medium
Frequency: Common
Impact: Backup/restore processes need reconfiguration
EBS snapshot-based backups (via tools like Velero) use AWS-specific APIs. Azure has different snapshot mechanisms.
EKS Velero Configuration:
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
name: default
namespace: velero
spec:
provider: aws
objectStorage:
bucket: prod-velero-backups
prefix: eks-cluster
config:
region: us-east-1
---
apiVersion: velero.io/v1
kind: VolumeSnapshotLocation
metadata:
name: default
namespace: velero
spec:
provider: aws
config:
region: us-east-1
AKS Velero Configuration:
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
name: default
namespace: velero
spec:
provider: azure
objectStorage:
bucket: velero-backups # Actually an Azure Blob container
prefix: aks-cluster
config:
resourceGroup: production-rg
storageAccount: prodvelarostorage
subscriptionId: 12345678-1234-1234-1234-123456789012
---
apiVersion: velero.io/v1
kind: VolumeSnapshotLocation
metadata:
name: default
namespace: velero
spec:
provider: azure
config:
resourceGroup: production-rg
subscriptionId: 12345678-1234-1234-1234-123456789012
Install Velero with Azure Plugin:
# Create storage account for backups
az storage account create \
--name prodvelarostorage \
--resource-group production-rg \
--sku Standard_GRS \
--encryption-services blob \
--https-only true
# Create blob container
az storage container create \
--name velero-backups \
--account-name prodvelarostorage
# Install Velero
velero install \
--provider azure \
--plugins velero/velero-plugin-for-microsoft-azure:v1.9.0 \
--bucket velero-backups \
--secret-file ./credentials-velero \
--backup-location-config resourceGroup=production-rg,storageAccount=prodvelarostorage,subscriptionId=12345678-1234-1234-1234-123456789012 \
--snapshot-location-config resourceGroup=production-rg,subscriptionId=12345678-1234-1234-1234-123456789012
Severity: 🟢 Low - Configuration change
Frequency: Universal
Impact: Performance characteristics may differ
Node pools configured for specific EC2 instance types don't exist in Azure. VM sizes have different names, capabilities, and pricing.
| EKS (EC2) | vCPU | Memory | AKS (Azure VM) | vCPU | Memory | Notes |
|---|---|---|---|---|---|---|
| t3.medium | 2 | 4 GiB | Standard_B2ms | 2 | 8 GiB | Burstable |
| m5.large | 2 | 8 GiB | Standard_D2s_v5 | 2 | 8 GiB | General purpose |
| m5.xlarge | 4 | 16 GiB | Standard_D4s_v5 | 4 | 16 GiB | General purpose |
| c5.xlarge | 4 | 8 GiB | Standard_F4s_v2 | 4 | 8 GiB | Compute optimized |
| r5.xlarge | 4 | 32 GiB | Standard_E4s_v5 | 4 | 32 GiB | Memory optimized |
| p3.2xlarge | 8 | 61 GiB + V100 | Standard_NC6s_v3 | 6 | 112 GiB + V100 | GPU |
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
name: production-cluster
region: us-east-1
nodeGroups:
- name: general-purpose
instanceType: m5.xlarge
desiredCapacity: 3
minSize: 2
maxSize: 10
labels:
workload-type: general
taints:
- key: workload-type
value: general
effect: NoSchedule
az aks nodepool add \
--resource-group production-rg \
--cluster-name myAKSCluster \
--name generalpurpose \
--node-count 3 \
--min-count 2 \
--max-count 10 \
--node-vm-size Standard_D4s_v5 \
--labels workload-type=general \
--node-taints workload-type=general:NoSchedule \
--enable-cluster-autoscaler
No changes needed - Kubernetes-native constructs work identically:
apiVersion: apps/v1
kind: Deployment
metadata:
name: compute-intensive-app
spec:
template:
spec:
nodeSelector:
workload-type: general
tolerations:
- key: "workload-type"
operator: "Equal"
value: "general"
effect: "NoSchedule"
containers:
- name: app
image: my-app:latest
Severity: 🟡 Medium - Advanced use cases
Frequency: Uncommon
Impact: Service mesh configuration incompatible
AWS App Mesh uses AWS-specific CRDs and control plane. Azure supports open-source service meshes (Istio, Linkerd, OSM).
This is complex and beyond the scope of this document, but key considerations:
1. Install Istio on AKS
istioctl install --set profile=default   # built-in profiles are default, demo, minimal, etc.; there is no "production" profile
2. Migrate Virtual Services
- App Mesh VirtualServices → Istio VirtualServices
- Different syntax, similar concepts
3. Update mTLS Configuration
- App Mesh uses AWS Certificate Manager
- Istio uses cert-manager or manual certificates
4. Rewrite Traffic Policies
- App Mesh Routes/VirtualRouters map roughly to Istio VirtualService routing rules and DestinationRules
Severity: 🟢 Low - If using managed databases
Frequency: Common
Impact: Connection strings, authentication
Applications connecting to AWS RDS need connection string updates for Azure Database for PostgreSQL/MySQL.
apiVersion: v1
kind: Secret
metadata:
name: db-connection
namespace: production
stringData:
host: "prod-postgres.c9akz82fkwix.us-east-1.rds.amazonaws.com"
port: "5432"
database: "production"
username: "app_user"
password: "secure-password"
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: api-server
spec:
template:
spec:
containers:
- name: api
env:
- name: DB_HOST
valueFrom:
secretKeyRef:
name: db-connection
key: host
- name: DB_PORT
valueFrom:
secretKeyRef:
name: db-connection
key: port
# etc.
apiVersion: v1
kind: Secret
metadata:
name: db-connection
namespace: production
stringData:
host: "prod-postgres.postgres.database.azure.com" # Changed!
port: "5432"
database: "production"
username: "app_user@prod-postgres" # Azure requires @servername
password: "secure-password"
# Optional: SSL parameters for Azure Database
sslmode: "require"
---
# Rest of deployment unchanged
Additional Azure-specific considerations:
1. SSL/TLS Required
# Connection string must include SSL
conn = psycopg2.connect(
    host="prod-postgres.postgres.database.azure.com",
    port=5432,
    database="production",
    user="app_user@prod-postgres",
    password="password",
    sslmode="require"
)
2. Firewall Rules
# Allow AKS nodes to access Azure Database
az postgres server firewall-rule create \
--resource-group production-rg \
--server-name prod-postgres \
--name AllowAKSNodes \
--start-ip-address 10.240.0.0 \
--end-ip-address 10.240.255.255
3. Private Endpoints (recommended)
# Create private endpoint for the database
az network private-endpoint create \
--name postgres-private-endpoint \
--resource-group production-rg \
--vnet-name aksVNet \
--subnet database-subnet \
--private-connection-resource-id /subscriptions/.../servers/prod-postgres \
--group-id postgresqlServer \
--connection-name postgres-connection
Severity: 🟢 Low - CI/CD reconfiguration
Frequency: Very Common
Impact: Build/deploy pipelines need rewriting
AWS-native CI/CD tools (CodePipeline, CodeBuild, CodeDeploy) need replacement or reconfiguration.
Option 1: GitHub Actions (Cloud-agnostic)
name: Deploy to AKS
on:
push:
branches: [ main ]
jobs:
build-and-deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Azure Login
uses: azure/login@v1
with:
creds: ${{ secrets.AZURE_CREDENTIALS }}
- name: Build and push image
run: |
az acr login --name prodacr
docker build -t prodacr.azurecr.io/api-server:${{ github.sha }} .
docker push prodacr.azurecr.io/api-server:${{ github.sha }}
- name: Set AKS context
uses: azure/aks-set-context@v3
with:
resource-group: production-rg
cluster-name: myAKSCluster
- name: Deploy to AKS
uses: azure/k8s-deploy@v4
with:
manifests: |
k8s/deployment.yaml
k8s/service.yaml
images: |
prodacr.azurecr.io/api-server:${{ github.sha }}
Option 2: Azure DevOps
trigger:
branches:
include:
- main
pool:
vmImage: 'ubuntu-latest'
variables:
acrName: 'prodacr'
imageName: 'api-server'
aksResourceGroup: 'production-rg'
aksClusterName: 'myAKSCluster'
stages:
- stage: Build
jobs:
- job: BuildAndPush
steps:
- task: Docker@2
inputs:
containerRegistry: 'prodacr'
repository: $(imageName)
command: 'buildAndPush'
Dockerfile: '**/Dockerfile'
tags: |
$(Build.BuildId)
latest
- stage: Deploy
jobs:
- job: DeployToAKS
steps:
- task: KubernetesManifest@0
inputs:
action: 'deploy'
kubernetesServiceConnection: 'myAKSCluster'
namespace: 'production'
manifests: |
k8s/deployment.yaml
k8s/service.yaml
containers: |
$(acrName).azurecr.io/$(imageName):$(Build.BuildId)
Tools like Konveyor should flag these patterns:
# PATTERN: EKS-specific annotations
annotations:
eks.amazonaws.com/role-arn: *
alb.ingress.kubernetes.io/*: *
# ACTION: Flag for Workload Identity or AGIC migration
# PATTERN: AWS storage drivers
spec:
csi:
driver: ebs.csi.aws.com
driver: efs.csi.aws.com
# ACTION: Suggest Azure Disk or Azure Files
# PATTERN: AWS-only Custom Resources
apiVersion: vpcresources.k8s.aws/*
apiVersion: secretsproviderclass.k8s.aws/*
# ACTION: Recommend Kubernetes Network Policies or Azure equivalents
# PATTERN: AWS SDK environment variables
env:
- name: AWS_REGION
- name: AWS_DEFAULT_REGION
- name: AWS_ACCESS_KEY_ID
# ACTION: Warn about credential management changes
# PATTERN: AWS service endpoints
env:
- name: S3_ENDPOINT
value: "https://s3.us-east-1.amazonaws.com"
- name: SQS_URL
value: "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"
# ACTION: Suggest Azure service equivalents
# PATTERN: Code accessing EC2 metadata
import requests
response = requests.get('http://169.254.169.254/latest/meta-data/')
# ACTION: Flag for Azure Instance Metadata Service (IMDS) migration
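Before running a full analysis tool, a lightweight scanner can catch most of the patterns above. A minimal sketch that walks a manifest directory and flags AWS-specific markers; the marker list mirrors this section and is meant to be extended:
import sys
from pathlib import Path

AWS_MARKERS = (
    "eks.amazonaws.com/",            # IRSA annotations
    "alb.ingress.kubernetes.io/",    # ALB ingress annotations
    "ebs.csi.aws.com",               # EBS CSI driver / provisioner
    "efs.csi.aws.com",               # EFS CSI driver / provisioner
    "vpcresources.k8s.aws",          # SecurityGroupPolicy CRDs
    "secretsmanager",                # Secrets Store CSI objects
    "AWS_ACCESS_KEY_ID",
    "AWS_DEFAULT_REGION",
    ".amazonaws.com",                # hard-coded AWS endpoints
    "169.254.169.254",               # EC2 instance metadata service
)

def scan(manifest_dir: str) -> int:
    """Print each manifest line containing an AWS-specific marker and return the finding count."""
    findings = 0
    for path in Path(manifest_dir).rglob("*.y*ml"):
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), start=1):
            for marker in AWS_MARKERS:
                if marker in line:
                    findings += 1
                    print(f"{path}:{lineno}: found '{marker}'")
    return findings

if __name__ == "__main__":
    sys.exit(1 if scan(sys.argv[1] if len(sys.argv) > 1 else ".") else 0)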
Strategy 1: Lift and Shift
Phase 1: Infrastructure (Week 1)
- Create AKS cluster
- Configure networking, storage classes
- Set up Azure equivalents (ACR, Key Vault, etc.)
Phase 2: Data Migration (Week 2)
- Velero backup from EKS
- Velero restore to AKS (data only)
- Validate data integrity
Phase 3: Application Deployment (Week 3)
- Update manifests (storage classes, ingress, etc.)
- Deploy via GitOps
- Run smoke tests
Phase 4: Cutover (Week 4)
- DNS cutover
- Decommission EKS
Pros: Fast, minimal code changes
Cons: Doesn't leverage Azure-native features, potential performance issues
Strategy 2: Blue/Green
Phase 1: Build Green (AKS) (Weeks 1-2)
- Parallel infrastructure build
- Migrate data
Phase 2: Validate Green (Week 3)
- Run integration tests
- Performance testing
- Security validation
Phase 3: Traffic Split (Week 4)
- 10% traffic to AKS
- Monitor for 48 hours
- Increase to 50%, then 100%
Phase 4: Decommission Blue (EKS) (Week 5)
- Archive data
- Terminate EKS
Pros: Safest, easy rollback
Cons: Highest cost (dual infrastructure), complex traffic splitting
Strategy 3: Incremental Migration
Phase 1: Stateless Workloads (Weeks 1-3)
- Migrate stateless apps first
- Test in production with real traffic
Phase 2: Stateful Non-Database (Weeks 4-6)
- Redis, message queues
- Can tolerate brief downtime
Phase 3: Databases (Weeks 7-10)
- Set up replication
- Gradual cutover per database
Phase 4: Cleanup (Week 11+)
- Remove EKS resources
- Optimize AKS
Pros: Lowest risk, learn as you go
Cons: Longest duration, complex coordination
| Category | EKS Component | AKS Equivalent | Migration Effort | Blocking? |
|---|---|---|---|---|
| Auth | IRSA | Workload Identity | High (code changes) | 🔴 Yes |
| Storage | EBS CSI | Azure Disk CSI | Medium (manifests) | 🔴 Yes |
| Secrets | Secrets Manager CSI | Key Vault CSI | Medium (manifests + data) | 🔴 Yes |
| Ingress | ALB Controller | AGIC / nginx | Medium (manifests) | 🟡 Partial |
| Network | Security Groups for Pods | Network Policies | High (different model) | 🟡 Partial |
| Registry | ECR | ACR | Low (image migration) | 🔴 Yes |
| Monitoring | CloudWatch | Azure Monitor | Medium (queries) | 🟢 No |
| Backups | Velero (AWS) | Velero (Azure) | Low (config) | 🟢 No |
- Inventory all AWS-specific annotations across all manifests
- List all StatefulSets and PersistentVolumeClaims
- Document all IRSA service accounts and their permissions
- Export all AWS Secrets Manager secrets
- List all ALB Ingresses and their annotations
- Document CloudWatch dashboards and alerts
- Map EC2 instance types to Azure VM sizes
- Plan database migration strategy (native tools vs Velero)
- Update CI/CD pipelines
- Train team on Azure-specific tooling (KQL, Azure Portal)
- Set up cost monitoring in Azure
- Plan DNS cutover strategy
- Define rollback procedures
- Pod authentication works (Workload Identity)
- PVCs provision correctly (storage classes)
- Secrets mount successfully (Key Vault CSI)
- Ingress creates load balancer (AGIC/nginx)
- Network policies block unauthorized traffic
- Applications can connect to databases
- Logs appear in Azure Monitor
- Metrics are collected
- Alerts fire correctly
- Backups complete successfully
- Load testing passes
- Security scanning passes
- Cost is within budget
Cause: IRSA not configured, Workload Identity missing
Fix: Add Workload Identity annotations to ServiceAccount and pod labels
Cause: EBS StorageClass doesn't exist in AKS
Fix: Create Azure Disk or Azure Files StorageClass
Cause: AWS Secrets Store CSI provider not installed
Fix: Reconfigure SecretProviderClass for Azure
Cause: Volume driver mismatch
Fix: Update CSI driver in PV/PVC specs
| Scenario | EKS (EBS gp3) | AKS (Premium SSD) | Savings |
|---|---|---|---|
| 1 TB, 3000 IOPS | $80/month | $135/month | -69% |
| 1 TB, 10000 IOPS | $145/month | $180/month | -24% |
Recommendation: Use Azure Premium SSD v2 for cost-effective high-IOPS workloads
| Workload | EKS (m5.xlarge) | AKS (D4s_v5) | Savings |
|---|---|---|---|
| 24/7 production | $122/month | $140/month | -15% |
| Dev/test (8h/day) | $41/month | $47/month | -15% |
Note: Costs vary by region and commitment (Reserved Instances vs Spot)
Migrating from EKS to AKS requires careful planning and attention to cloud-specific integrations. The most common pain points involve:
- Authentication: IRSA → Workload Identity
- Storage: EBS/EFS → Azure Disk/Files
- Secrets: AWS Secrets Manager → Key Vault
- Networking: Security Groups → Network Policies
- Observability: CloudWatch → Azure Monitor
Success Factors:
- Thorough inventory of AWS-specific resources
- Automated detection of cloud-specific patterns
- Comprehensive testing in staging environment
- Incremental migration approach
- Team training on Azure-specific concepts
Tools to Leverage:
- Konveyor for automated migration analysis
- Velero for data migration
- GitOps (ArgoCD/Flux) for consistent deployments
- Azure Migrate for assessment
This document should serve as a comprehensive reference for platform teams undertaking EKS to AKS migrations.