πŸ“ ASSIGNMENT INSTRUCTIONS: Week 10 - Teach One Another: Infrastructure Standards

Purpose

Create a set of standards for a company and understand how enforcing standards via code can increase predictability and reduce administrative overhead.

Scenario

You were asked to help the cloud engineering team migrate from an on-premise* infrastructure to a cloud-based infrastructure for a medium-sized company with $500 million in annual revenue. This company currently has about 500 internal and customer-facing web services, web applications, and other data integration-based applications. You are to report to the CTO (chief technology officer) with your plan for performing this lift-and-shift from on-premise to cloud. At this stage of the planning, you are only to report back on naming conventions, consistency, reliability, and potential cost management. After discussing with the teams, you have identified the following resources that need to be created for each application:

  • Virtual machines (minimum of 4) (since it's a lift-n-shift)
  • DNS
  • Logging management
  • Load balancers (a single load balancer will be sufficient)
  • IAM role for permissions
  • Cloud database
  • Secrets manager (for rotation of secrets)
  • Cloud storage (to store files and other assets)

The final requirement the CTO gave you was to have a smaller, scaled-down test environment as well as a production environment for each application.

Tasks

As a team, submit the following:

  • A diagram with this cloud architecture (how each cloud resource would interact with the other resources)
  • A naming convention (since we are using staging and production environments for 500 applications, a naming convention is necessary)
  • A timeline with the lift-and-shift migration process. For example:
      • Target start date
      • Target staging completion time length
      • Target production completion time length
  • Your plan for ensuring there is no company down time
  • How IaC can play a major role

* On-Premise: On-premises ("on-prem") refers to IT infrastructure, software, or systems deployed, operated, and maintained locally within an organization's own physical facilities rather than on remote, third-party cloud servers. It gives businesses full control over hardware, data, and security, but requires managing all maintenance, upgrades, and costs.


πŸ“ ASSIGNMENT RESOLUTION

1. Cloud Architecture Diagram

---
title: '~ Lift-and-Shift Cloud Architecture ~'
config:
  theme: dark
---

flowchart TD

  User["Users / External Systems"]
  DNS["(a) DNS Service"]
  LB["(b) Load Balancer"]
  VM1to4["(c) 4 Virtual Machines"]
  DB["(d) Cloud Database"]
  Storage["(e) Cloud Storage"]
  Secrets["(f) Secrets Manager"]
  Logs["(g) Logging System"]
  IAM["(h) IAM Role / Permissions"]

  User --> DNS
  DNS --> LB
  LB --> VM1to4

  VM1to4 <--> DB
  VM1to4 <--> Storage
  VM1to4 <--> Secrets
  VM1to4 <--> Logs
  IAM --> VM1to4

Architecture Flow

| Step | Component | Purpose |
| --- | --- | --- |
| a | DNS | Routes requests to the correct application endpoint |
| b | Load Balancer | Distributes traffic across VM instances |
| c | Virtual Machines | Host the migrated applications |
| d | Cloud Database | Stores application data |
| e | Cloud Storage | Stores files, assets, and backups |
| f | Secrets Manager | Manages credentials and secret rotation |
| g | Logging | Centralized monitoring and auditing |
| h | IAM | Controls access permissions between services |

2. Naming Convention

  • Applications: 500 total
  • Environments required (2 total): Test (staging) and production
  • Resources required (8 total): Virtual Machines + DNS + Load Balancer + Logging System + IAM Role + Cloud Database + Secrets Manager + Cloud Storage

With 500 applications × 2 environments × 8 resource types, there are at least 8,000 resources to name (more once each VM is counted separately), so a structured naming convention is essential.

Standard Pattern

<company>-<application>-<environment>-<resource>-<region>-<number>

Component Meaning

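| Element | Description | Example |
| --- | --- | --- |
| company | Company identifier | acme |
| application | Application short name | billing |
| environment | stg (test/staging) or prd (production) | prd |
| resource | Resource type | vm, db, lb |
| region | Cloud region | us-east |
| number | Two-digit instance number; omitted for single-instance resources such as the load balancer | 01 |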

Examples

| Resource | Name |
| --- | --- |
| VM | acme-billing-prd-vm-us-east-01 |
| Load Balancer | acme-billing-prd-lb-us-east |
| Database | acme-billing-prd-db-us-east |
| Storage | acme-billing-prd-storage-us-east |
| Secrets Manager | acme-billing-prd-secret-us-east |
| DNS | billing-prd.company.com |
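The pattern can be enforced in code so that no name is typed by hand. The following is a minimal sketch, assuming the five resource abbreviations shown in the examples and a two-digit instance number; `build_name` and the allowed-value sets are hypothetical helpers, not part of any cloud SDK.

```python
import re
from typing import Optional

# Assumed abbreviations: only these five appear in the examples above.
ALLOWED_ENVIRONMENTS = {"stg", "prd"}
ALLOWED_RESOURCES = {"vm", "db", "lb", "storage", "secret"}

# company/application: lowercase alphanumeric, no hyphens (keeps names parseable).
_COMPONENT = re.compile(r"[a-z0-9]+")
# region: lowercase alphanumeric segments joined by hyphens, e.g. "us-east".
_REGION = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")


def build_name(company: str, application: str, environment: str,
               resource: str, region: str,
               number: Optional[int] = None) -> str:
    """Build <company>-<application>-<environment>-<resource>-<region>[-<number>]."""
    if environment not in ALLOWED_ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    if resource not in ALLOWED_RESOURCES:
        raise ValueError(f"unknown resource type: {resource!r}")
    for label, value in (("company", company), ("application", application)):
        if not _COMPONENT.fullmatch(value):
            raise ValueError(f"{label} must be lowercase alphanumeric: {value!r}")
    if not _REGION.fullmatch(region):
        raise ValueError(f"invalid region: {region!r}")

    parts = [company, application, environment, resource, region]
    if number is not None:
        parts.append(f"{number:02d}")  # zero-padded instance number
    return "-".join(parts)


vm_name = build_name("acme", "billing", "prd", "vm", "us-east", 1)
lb_name = build_name("acme", "billing", "prd", "lb", "us-east")
print(vm_name)
print(lb_name)
```

Rejecting hyphens inside the company and application names is a design choice of this sketch: because the region itself contains a hyphen, free-form hyphens elsewhere would make a name impossible to split back into its parts.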

Benefits

| Benefit | Explanation |
| --- | --- |
| Predictability | Engineers immediately know what the resource is |
| Automation | IaC scripts can auto-generate names |
| Scalability | Easy to support thousands of resources |
| Monitoring | Easier log filtering and metrics grouping |

3. Migration Timeline

  • Lift-and-shift (also called rehosting) is a cloud migration strategy where applications are moved from on-premise servers to cloud servers without changing the application architecture.
  • Assuming 500 applications, migration must occur in phases.

| Phase | Duration | Duration (%) | Description |
| --- | --- | --- | --- |
| Planning | 2–3 weeks | ~10 | Inventory systems, dependencies, architecture |
| Infrastructure Setup | 2 weeks | ~8 | Create base cloud networking, IAM, logging |
| Staging Migration | 6–8 weeks | ~27 | Migrate applications to staging environment |
| Testing | 3 weeks | ~12 | Functional, performance, security testing |
| Production Migration | 8–10 weeks | ~35 | Gradual migration of live services |
| Optimization | 2 weeks | ~8 | Cost and performance tuning |

Total: 23–28 weeks, or about 5.3–6.5 months (100%)
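The totals can be checked by summing the phase durations. This is a small sketch that assumes the phases run strictly back to back with no overlap, and that a month averages 52/12 weeks; both are assumptions of the sketch, not statements from the plan.

```python
# Phase durations in weeks as (name, minimum, maximum), copied from the table above.
PHASES = [
    ("Planning", 2, 3),
    ("Infrastructure Setup", 2, 2),
    ("Staging Migration", 6, 8),
    ("Testing", 3, 3),
    ("Production Migration", 8, 10),
    ("Optimization", 2, 2),
]

WEEKS_PER_MONTH = 52 / 12  # average calendar month, about 4.33 weeks


def cumulative_end_weeks(phases):
    """Return (phase, earliest_end_week, latest_end_week) for sequential phases."""
    schedule = []
    earliest = latest = 0
    for name, min_weeks, max_weeks in phases:
        earliest += min_weeks
        latest += max_weeks
        schedule.append((name, earliest, latest))
    return schedule


schedule = cumulative_end_weeks(PHASES)
total_min, total_max = schedule[-1][1], schedule[-1][2]

for name, earliest, latest in schedule:
    print(f"{name:<22} ends between week {earliest:>2} and week {latest:>2}")
print(f"Total: {total_min}-{total_max} weeks "
      f"({total_min / WEEKS_PER_MONTH:.1f}-{total_max / WEEKS_PER_MONTH:.1f} months)")
```

Under these assumptions the staging migration finishes between week 10 and week 13, and the whole plan takes 23 to 28 weeks.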

Example Schedule (Optimistic)

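| Milestone | Target |
| --- | --- |
| Migration Start | Week 1 |
| Staging Ready | Week 8 |
| Production Ready | Week 18 |
| Full Migration Complete | Week 22 |

Note: Week 22 is one week shorter than the 23-week minimum obtained by summing the phase durations above, so this optimistic schedule only holds if some phases overlap.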

4. Ensuring Zero Downtime

| Strategy | Explanation |
| --- | --- |
| Blue-Green Deployment | Old system stays active while new cloud system is validated |
| DNS Cutover | Gradually shift traffic to cloud infrastructure |
| Database Replication | Sync on-premise DB to cloud DB |
| Incremental Migration | Move apps in batches |
| Rollback Plan | Immediate fallback to on-prem if issues occur |
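To make "move apps in batches" concrete, the sketch below splits the 500 applications into one batch per week of the 8–10 week Production Migration phase from the timeline. One batch per week and the `app-NNN` names are assumptions for illustration; a real plan would group applications by dependency and risk.

```python
import math


def plan_batches(app_names, weeks):
    """Split applications into at most `weeks` batches of near-equal size."""
    if weeks <= 0:
        raise ValueError("weeks must be positive")
    if not app_names:
        return []
    batch_size = math.ceil(len(app_names) / weeks)
    return [app_names[i:i + batch_size]
            for i in range(0, len(app_names), batch_size)]


# Placeholder application names: app-001 ... app-500.
apps = [f"app-{n:03d}" for n in range(1, 501)]

ten_week_plan = plan_batches(apps, 10)   # slower, smaller batches
eight_week_plan = plan_batches(apps, 8)  # faster, larger batches

print(len(ten_week_plan), "batches of up to", len(ten_week_plan[0]), "apps")
print(len(eight_week_plan), "batches of up to", len(eight_week_plan[0]), "apps")
```

Smaller batches limit how many applications a single failed cutover can affect, at the cost of a longer production phase.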

Migration Flow

  1. Deploy cloud environment - Create the base cloud infrastructure (network, subnets, VMs, load balancer, storage, database, IAM roles, logging).
  2. Replicate database - Synchronize the on-premise database with the cloud database so both contain the same data during migration.
  3. Deploy application VMs - Install and configure the application on the cloud virtual machines.
  4. Run staging validation - Test the application in the staging environment to verify functionality, performance, and connectivity.
  5. Route partial traffic - Gradually direct a small percentage of user traffic to the cloud environment to monitor stability.
  6. Full DNS switch - Update DNS records so all traffic is routed to the cloud infrastructure instead of the on-premise servers.
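Steps 5 and 6 amount to a traffic ramp with a rollback condition. The sketch below models that logic only; the stage percentages, the 1% error threshold, and the `observe_error_rate` callback (a stand-in for a real monitoring query) are all assumptions, and no real DNS or load balancer API is called.

```python
from typing import Callable, Sequence

# Hypothetical ramp: percentage of traffic sent to the cloud at each stage.
RAMP_STAGES = (5, 25, 50, 100)
MAX_ERROR_RATE = 0.01  # assumed rollback threshold: 1% of requests failing


def run_cutover(stages: Sequence[int],
                observe_error_rate: Callable[[int], float],
                max_error_rate: float = MAX_ERROR_RATE) -> int:
    """Walk through the ramp and return the final cloud traffic percentage.

    observe_error_rate(percent) represents the error rate measured after
    `percent` of traffic has been routed to the cloud. If it exceeds the
    threshold at any stage, all traffic goes back to on-premise (0%).
    """
    for percent in stages:
        if observe_error_rate(percent) > max_error_rate:
            return 0  # rollback: on-premise serves all traffic again
    return stages[-1] if stages else 0


# Healthy migration: error rate stays low at every stage.
healthy = run_cutover(RAMP_STAGES, lambda percent: 0.002)

# Failing migration: errors spike once half of the traffic is on the cloud.
failing = run_cutover(RAMP_STAGES, lambda percent: 0.05 if percent >= 50 else 0.001)

print("healthy run ends at", healthy, "% cloud traffic")
print("failing run ends at", failing, "% cloud traffic")
```

Because the on-premise system stays active until the final stage succeeds, a rollback is only a routing change rather than a redeployment.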

5. Cost Management Strategy

| Method | Description |
| --- | --- |
| Auto-scaling | Scale VMs based on traffic |
| Reserved Instances | Reduce cost for predictable workloads |
| Storage Tiering | Move older data to cheaper storage |
| Monitoring | Identify unused resources |
| Environment Downsizing | Smaller infrastructure for staging |

Environment Size Example

| Resource | Staging | Production |
| --- | --- | --- |
| Virtual Machines | 2 | 4 |
| Database | Medium | Large |
| Storage | Standard | High availability |
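The effect of downsizing staging shows up when the resource inventory is counted. This sketch uses the VM counts from the table above and assumes exactly one of each remaining resource type per application per environment; it counts resources only and makes no claim about prices.

```python
APPLICATIONS = 500

# VM counts per environment, from the Environment Size Example table.
VMS_PER_ENV = {"staging": 2, "production": 4}

# Assumption: one of each remaining resource type per application per environment.
SINGLETON_RESOURCES = ["dns", "logging", "load_balancer", "iam_role",
                       "database", "secrets_manager", "storage"]


def resource_counts(applications, vms_per_env, singletons):
    """Return the total number of resources per environment across all applications."""
    return {
        env: applications * (vm_count + len(singletons))
        for env, vm_count in vms_per_env.items()
    }


counts = resource_counts(APPLICATIONS, VMS_PER_ENV, SINGLETON_RESOURCES)
total_vms = APPLICATIONS * sum(VMS_PER_ENV.values())
# VMs avoided compared with running 4 VMs in staging as well.
vms_saved = APPLICATIONS * (VMS_PER_ENV["production"] - VMS_PER_ENV["staging"])

print(counts)
print("total resources:", sum(counts.values()))
print("total VMs:", total_vms, "| VMs saved by downsizing staging:", vms_saved)
```

With two VMs instead of four in staging, the plan runs 1,000 fewer VMs across the 500 applications.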

6. Role of Infrastructure as Code (IaC)

FAQ

  1. What is IaC?
  • Infrastructure as Code (IaC) means the infrastructure is defined in code files instead of being created manually in the cloud portal.
  • Tools such as Terraform enforce standards and automate deployments.
  2. Where are the files located?
  • The infrastructure files are stored in a Git repository hosted on a platform such as GitHub, GitLab, or Azure DevOps, and are applied through a CI/CD pipeline.

cloud-infrastructure/
├─ modules/
│   ├─ vm/
│   ├─ database/
│   ├─ load-balancer/
│   └─ storage/
├─ environments/
│   ├─ staging/
│   │   └─ main.tf
│   └─ production/
│       └─ main.tf
├─ variables.tf
└─ terraform.tfvars

  3. Who makes the changes?
  • Infrastructure changes follow a software development workflow.

| Role | Responsibility |
| --- | --- |
| Cloud Architect | Defines infrastructure standards |
| DevOps Engineers | Write Terraform modules |
| Cloud Engineers | Modify infrastructure configs |
| Reviewers | Approve pull requests |

  4. How does the deployment work?
  • Deployment is usually automated.

Flow:

  • Developer commits code
  • Pipeline runs Terraform
  • Terraform compares desired state vs current infrastructure
  • Terraform creates/updates resources
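The "desired state vs current infrastructure" comparison can be illustrated with a toy diff. This is only a conceptual model with made-up resource attributes; Terraform's real planner also handles providers, dependencies between resources, and changes that force replacement.

```python
def plan(desired: dict, current: dict) -> dict:
    """Classify resources by comparing desired configuration with current state."""
    return {
        "create": sorted(name for name in desired if name not in current),
        "update": sorted(name for name in desired
                         if name in current and desired[name] != current[name]),
        "delete": sorted(name for name in current if name not in desired),
    }


# Desired state: what the code in the repository describes (illustrative values).
desired = {
    "acme-billing-prd-vm-us-east-01": {"size": "large"},
    "acme-billing-prd-vm-us-east-02": {"size": "large"},
    "acme-billing-prd-lb-us-east": {"listeners": 1},
}

# Current state: what actually exists in the cloud account (illustrative values).
current = {
    "acme-billing-prd-vm-us-east-01": {"size": "medium"},
    "acme-billing-prd-lb-us-east": {"listeners": 1},
    "acme-billing-prd-vm-us-east-09": {"size": "large"},
}

changes = plan(desired, current)
for action, names in changes.items():
    print(action, names)
```

Resources that already match, such as the load balancer here, appear in no list, which is why re-running the pipeline on unchanged code makes no changes.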

Benefits

| Benefit | Explanation |
| --- | --- |
| Consistency | Same infrastructure across all apps |
| Version Control | Infrastructure changes tracked in Git |
| Automation | One script deploys full environments |
| Compliance | Policies enforced automatically |
| Faster Deployments | Infrastructure in minutes |

Example Terraform Concept

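# Wrapper module (not listed in the directory tree above) that would compose
# the per-resource modules for one application in one environment.
module "application_stack" {
  source = "./modules/app"

  app_name    = "billing" # application short name used in resource names
  environment = "prd"     # "stg" or "prd"
  vm_count    = 4         # production size; staging would use fewer
}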

This would automatically create:

  • VMs
  • Load balancer
  • Database
  • Storage
  • Secrets
  • Logging

for each application.


Final Outcome

After implementation:

| Metric | Result |
| --- | --- |
| Applications Migrated | 500 |
| Environments | Staging/Testing + Production |
| Infrastructure Management | Fully automated |
| Deployment Time | Reduced from days to minutes |
| Downtime | Near zero |

  • For 500 applications, IaC ensures thousands of resources can be deployed consistently and safely.
  • Staffing for this work plan: with about 30 engineers, completing the migration in about 6 months becomes realistic.