Introduction

Infrastructure as Code (IaC) is how modern teams build reliable systems. Instead of manually clicking through cloud consoles or SSHing into servers, you define infrastructure in code: testable, version-controlled, repeatable. This guide shows you practical patterns for Terraform, Ansible, and Kubernetes with real examples, not just theory.

Why Infrastructure as Code?

Consider a production outage scenario:

Without IaC:

  • Database server dies
  • You manually recreate it through AWS console (30 minutes)
  • Forgot to enable backups? Another 15 minutes
  • Need to reconfigure custom security groups? More time
  • Total recovery: 2-4 hours
  • Risk of missing steps = still broken

With IaC:

  • Database server dies
  • Run: terraform apply (5 minutes)
  • Everything recreates: RDS instance, security groups, backups, monitoring, IAM roles
  • Verify with: terraform plan before applying
  • Total recovery: 15-30 minutes
  • Zero manual steps = zero mistakes

When you’re in the middle of an incident, that difference is everything.

Terraform: State Management in Practice

The Problem With Manual Infrastructure

Imagine this scenario: Two SREs both need to provision servers, so they both log into AWS console and create them. Now you have:

  • Two sources of truth for the same infrastructure (the console and people’s memories)
  • No version history of what changed
  • No rollback capability
  • No way to know why a security group exists
  • Manual changes drift from documentation
  • Disaster recovery means starting from scratch

This is configuration drift, and it’s your enemy.

Remote State: The Foundation

Terraform’s .tfstate file tracks what exists in your cloud. Without it, Terraform can’t know whether to create, update, or delete resources.

The wrong way (local state):

# Your laptop
terraform apply
# Creates infrastructure AND stores state in local terraform.tfstate
# Problem: If laptop crashes, you lose the state file
# Problem: Team members have different state files (chaos)
# Problem: State contains secrets (passwords, API keys)

The right way (remote state with encryption):

# versions.tf
terraform {
  backend "s3" {
    bucket         = "company-terraform-state"
    key            = "production/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}

This simple configuration solves multiple problems:

State Storage:

  • State file stored in S3 (survives laptop crashes)
  • Encrypted at rest (secrets are protected)

State Locking:

  • DynamoDB table prevents concurrent modifications
  • When you run terraform apply, it locks the state
  • Another team member can’t apply until you’re done
  • Prevents conflicts and corruption

Example scenario:

# Alice starts deployment
$ terraform apply

# Bob tries to apply at same time
$ terraform apply
# Waits... waiting... (DynamoDB lock acquired by Alice)
# After Alice finishes, Bob can proceed

# This prevents both modifying the same infrastructure at once
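
If you started with local state, Terraform can copy it into the new backend the next time you initialize. A minimal sketch (the bucket and table are the ones assumed in the backend block above):

# One-time migration after adding the backend "s3" block
$ terraform init -migrate-state
# Terraform detects the backend change and offers to copy the local
# terraform.tfstate into the S3 bucket

# Inspect what the (now remote) state is tracking
$ terraform state list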

Organizing Infrastructure Into Modules

Monolithic infrastructure code becomes unmaintainable fast:

Bad: Single 5000-line main.tf

# main.tf (everything dumped here)
resource "aws_vpc" "main" { ... }           # 50 lines
resource "aws_subnet" "public_1" { ... }    # 20 lines
resource "aws_subnet" "public_2" { ... }    # 20 lines
resource "aws_route_table" "public" { ... } # 15 lines
# ... 4900 more lines
# You can't find anything. New team members are confused.

Good: Organized modules

infrastructure/
├── modules/
│   ├── networking/
│   │   ├── main.tf        (VPC, subnets, routing)
│   │   ├── variables.tf   (inputs)
│   │   └── outputs.tf     (what other modules need)
│   ├── compute/
│   │   ├── main.tf        (EC2, security groups)
│   │   ├── variables.tf
│   │   └── outputs.tf
│   └── database/
│       ├── main.tf        (RDS, backups)
│       ├── variables.tf
│       └── outputs.tf
│
├── environments/
│   ├── dev/
│   │   └── main.tf
│   ├── staging/
│   │   └── main.tf
│   └── production/
│       └── main.tf
│
└── versions.tf

Each module has a single responsibility. This enables reuse and clarity.

Step 1: Define Module Inputs (variables.tf)

What it does: The variables.tf file declares what inputs your module accepts. Think of it like function parametersβ€”it defines what values the module needs to work. This is where you validate inputs and set defaults.

# modules/networking/variables.tf

# STEP 1: Define what CIDR block the VPC should use
variable "vpc_cidr" {
  description = "CIDR block for VPC"
  type        = string
  # Validation: Only accept valid CIDR notation
  # Examples: 10.0.0.0/16, 172.16.0.0/16, 192.168.0.0/16
  validation {
    condition     = can(cidrhost(var.vpc_cidr, 0))
    error_message = "Must be valid CIDR notation (e.g., 10.0.0.0/16)"
  }
}

# STEP 2: Define which availability zones to use
# AZs are physical data centers in a region (us-east-1a, us-east-1b, etc.)
# We need at least 2 for high availability
variable "azs" {
  description = "List of availability zones"
  type        = list(string)
  # Example: ["us-east-1a", "us-east-1b"]
  # Having 2 means if one data center fails, the other still works
}

# STEP 3: Define environment name (used for naming resources)
variable "environment" {
  description = "Environment name (dev, staging, production)"
  type        = string
  # Validates that only valid environment names are used
  validation {
    condition     = contains(["dev", "staging", "production"], var.environment)
    error_message = "Must be dev, staging, or production"
  }
}

# STEP 4: Optional feature flag for NAT Gateway
# In dev, we might skip this to save costs
# In production, we need it for security
variable "enable_nat_gateway" {
  description = "Create NAT gateway for private subnets to access internet"
  type        = bool
  # Default: false (skip it unless explicitly enabled)
  # NAT Gateway costs ~$32/month, so dev doesn't need it
  default     = false
}

# STEP 5: Tags for cost tracking and organization
# Tags are labels that help you identify and organize resources
variable "tags" {
  description = "Tags to apply to all resources"
  type        = map(string)
  # Example: { Environment = "prod", Team = "platform", CostCenter = "ops" }
  # These help with:
  # - AWS billing (who spent what)
  # - Resource identification (which team owns this)
  # - Automation (find all prod resources)
  default     = {}
}

Why this matters:

  • Reusability: Same module works with different inputs (dev vs prod)
  • Validation: Prevents invalid inputs from being applied (e.g., invalid CIDR)
  • Documentation: Each variable explains what it does
  • Safety: Errors caught before applying to cloud
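
To see the validation above in action, here is a sketch of what a bad input looks like at plan time (the variable values are made up for illustration):

# Passing an environment name outside the allowed list
$ terraform plan -var='environment=qa' \
    -var='vpc_cidr=10.0.0.0/16' \
    -var='azs=["us-east-1a","us-east-1b"]'
# Terraform stops before touching the cloud and prints the
# error_message from the validation block:
#   "Must be dev, staging, or production"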

Step 2: Create Resources (main.tf)

What it does: The main.tf file contains the actual resource definitions. This is where you say “create a VPC”, “create subnets”, etc. It references the variables from variables.tf to customize behavior.

# modules/networking/main.tf

# RESOURCE 1: Create the Virtual Private Cloud (VPC)
# A VPC is like a private network in AWS
# CIDR block (10.0.0.0/16) means:
#   - 10.0.0.0 to 10.0.255.255 = 65,536 IP addresses available
#   - Subnets below carve this into smaller chunks (each /20 = 4,096 IPs)
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr  # Use the CIDR from variables
  enable_dns_hostnames = true          # Allows friendly DNS names
  enable_dns_support   = true          # Enables DNS resolution

  # Tags are crucial for organization
  tags = merge(var.tags, {
    Name = "${var.environment}-vpc"  # Creates: "dev-vpc", "prod-vpc", etc.
  })
}

# RESOURCE 2: Create Public Subnets
# Public subnets are where you put load balancers and NAT gateways
# They have internet access directly
# We create one per availability zone for redundancy
# count = length(var.azs) means: if we have 2 AZs, create 2 subnets
resource "aws_subnet" "public" {
  count = length(var.azs)  # Creates one subnet per availability zone

  vpc_id                  = aws_vpc.main.id  # Put subnet in the VPC we created
  cidr_block              = cidrsubnet(var.vpc_cidr, 4, count.index)
  # cidrsubnet breaks the VPC CIDR into smaller chunks
  # Example: 10.0.0.0/16 becomes 10.0.0.0/20, 10.0.16.0/20, etc.
  
  availability_zone       = var.azs[count.index]  # First subnet in first AZ, etc.
  map_public_ip_on_launch = true  # Instances here get public IPs automatically

  tags = merge(var.tags, {
    Name = "${var.environment}-public-${count.index + 1}"
    Type = "public"  # Tag helps identify purpose
  })
}

# RESOURCE 3: Create Private Subnets
# Private subnets are for application servers and databases
# They don't have direct internet access (safer)
# Applications reach internet through NAT Gateway (in public subnet)
# We place them in the upper half of the CIDR block (after public subnets)
resource "aws_subnet" "private" {
  count = length(var.azs)  # One per availability zone, just like public

  vpc_id            = aws_vpc.main.id
  # offset by number of public subnets (count.index + length(var.azs))
  # so they don't overlap with public subnets
  cidr_block        = cidrsubnet(var.vpc_cidr, 4, count.index + length(var.azs))
  availability_zone = var.azs[count.index]

  tags = merge(var.tags, {
    Name = "${var.environment}-private-${count.index + 1}"
    Type = "private"
  })
}

# RESOURCE 4: Create NAT Gateway (Optional)
# NAT Gateway allows private subnets to reach the internet
# But internet can't reach back into private subnets (secure)
# Only created if enable_nat_gateway = true
# count = var.enable_nat_gateway ? 1 : 0 means:
#   - If true: create 1 NAT Gateway
#   - If false: create 0 (none)
resource "aws_nat_gateway" "main" {
  count = var.enable_nat_gateway ? 1 : 0

  # NAT Gateway needs an Elastic IP address (public IP that doesn't change)
  allocation_id = aws_eip.nat[0].id
  
  # Put NAT in a public subnet (so it's accessible from internet)
  subnet_id = aws_subnet.public[0].id

  tags = merge(var.tags, {
    Name = "${var.environment}-nat"
  })

  # Terraform best practice: depend on internet gateway existing first
  depends_on = [aws_internet_gateway.main]
}

Why this structure:

  • Modular: Each resource is a building block
  • Reusable: Same code works for different environments
  • Readable: Clear what each resource does
  • Maintainable: Easy to find and update specific resources
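
If you want to convince yourself of the cidrsubnet() math before applying, terraform console evaluates expressions interactively; a quick sketch:

$ terraform console
> cidrsubnet("10.0.0.0/16", 4, 0)
"10.0.0.0/20"
> cidrsubnet("10.0.0.0/16", 4, 2)
"10.0.32.0/20"
> exit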

Step 3: Define Module Outputs (outputs.tf)

What it does: The outputs.tf file specifies what values the module exposes to other modules. If variables.tf defines the module’s inputs, outputs.tf defines its outputs. Other modules need IDs, ARNs, and addresses from this module.

# modules/networking/outputs.tf

# OUTPUT 1: VPC ID
# Other modules (compute, database) need this to know which VPC to use
# Example value: vpc-0a1b2c3d4e5f6g7h8
output "vpc_id" {
  value       = aws_vpc.main.id
  description = "VPC ID for reference by other modules"
  # This value is used by compute module to create EC2 instances in this VPC
}

# OUTPUT 2: Public Subnet IDs
# Load balancers and NAT gateways need public subnets
# Example value: ["subnet-0abc123", "subnet-0def456"]
output "public_subnet_ids" {
  value       = aws_subnet.public[*].id
  description = "Public subnet IDs where load balancers go"
  # The [*].id syntax means: extract the id from each subnet
}

# OUTPUT 3: Private Subnet IDs
# Application servers and databases go in private subnets
# Example value: ["subnet-0ghi789", "subnet-0jkl012"]
output "private_subnet_ids" {
  value       = aws_subnet.private[*].id
  description = "Private subnet IDs where apps and databases go"
}

# OUTPUT 4: NAT Gateway IP (if created)
# Other resources might need to know the NAT Gateway's public IP
output "nat_gateway_eip" {
  value       = var.enable_nat_gateway ? aws_eip.nat[0].public_ip : null
  description = "Public IP of NAT Gateway for firewall rules"
  # null means "not created" if NAT Gateway is disabled
}

Why outputs matter:

  • Module communication: Lets modules pass data to each other
  • Dependency management: Terraform uses outputs to understand what depends on what
  • Documentation: Shows what useful values the module provides
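
Outputs are also handy from the command line once the module has been applied; for example (output names as defined above):

# Read a single output
$ terraform output vpc_id

# Machine-readable form, useful in scripts and CI
$ terraform output -json private_subnet_ids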

Step 4: Use the Module (environments/*)

What it does: Now that you have a reusable networking module, use it in different environments with different configurations. This is where the real power of modules shows.

# environments/dev/main.tf
# This is for the DEV environment
# Dev is cheaper, simpler, fewer redundancies

module "vpc" {
  # Where is the module code?
  source = "../../modules/networking"

  # Input configuration for DEV
  vpc_cidr           = "10.0.0.0/16"           # Small network for dev
  environment        = "dev"
  enable_nat_gateway = false                   # Don't spend money on NAT in dev
  azs                = ["us-east-1a", "us-east-1b"]  # 2 AZs

  tags = {
    Environment = "dev"
    Team        = "platform"
    CostCenter  = "engineering"  # Charge to engineering budget
  }
}

# Use the outputs
output "dev_vpc_id" {
  value = module.vpc.vpc_id
}

output "dev_subnet_ids" {
  value = module.vpc.private_subnet_ids
}

# environments/production/main.tf
# This is for the PRODUCTION environment
# Production needs redundancy, security, and monitoring

module "vpc" {
  source = "../../modules/networking"

  # Input configuration for PRODUCTION
  vpc_cidr           = "10.100.0.0/16"  # Larger network
  environment        = "production"
  enable_nat_gateway = true             # Must have NAT for security and compliance
  azs                = ["us-east-1a", "us-east-1b", "us-east-1c"]  # 3 AZs for extra redundancy

  tags = {
    Environment = "production"
    Team        = "platform"
    CostCenter  = "ops"  # Charge to operations budget
  }
}

# Use the outputs
output "prod_vpc_id" {
  value = module.vpc.vpc_id
}

output "prod_private_subnets" {
  value = module.vpc.private_subnet_ids
}

output "prod_public_subnets" {
  value = module.vpc.public_subnet_ids
}

Key differences between dev and prod:

Aspect       | Dev                     | Production
VPC CIDR     | 10.0.0.0/16             | 10.100.0.0/16
NAT Gateway  | Disabled (cost savings) | Enabled (security requirement)
AZs          | 2 zones                 | 3 zones (more redundancy)
CostCenter   | engineering             | ops

The power of this approach:

  • Same module code for both environments (no duplication)
  • Different configurations per environment (flexibility)
  • Easy to maintain: bug fix in module helps both dev and prod
  • Easy to add staging: just copy production’s config, change CIDR and name
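
Because each environment is just a directory with its own configuration, the day-to-day workflow is the same everywhere; a typical sequence:

# Apply dev
$ cd environments/dev
$ terraform init
$ terraform apply

# Plan production, review, then apply
$ cd ../production
$ terraform init
$ terraform plan -out=tfplan
$ terraform apply tfplan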

Step 5: Connecting Modules Together

What it does: Modules don’t live in isolation. The compute module needs to know which VPC to create instances in. This is where module outputs become inputs to other modules.

# environments/production/main.tf (expanded)

# First: Create networking
module "vpc" {
  source = "../../modules/networking"
  
  vpc_cidr           = "10.100.0.0/16"
  environment        = "production"
  enable_nat_gateway = true
  azs                = ["us-east-1a", "us-east-1b", "us-east-1c"]
  
  tags = {
    Environment = "production"
    Team        = "platform"
  }
}

# Then: Create compute infrastructure in that VPC
module "compute" {
  source = "../../modules/compute"
  
  # Use VPC outputs as inputs
  vpc_id              = module.vpc.vpc_id  # ← VPC output becomes compute input
  subnet_ids          = module.vpc.private_subnet_ids  # ← Same here
  
  instance_count = 3
  instance_type  = "t3.large"
  environment    = "production"
  
  tags = {
    Environment = "production"
  }
}

# Then: Create database in that VPC
module "database" {
  source = "../../modules/database"
  
  # Use VPC outputs again
  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnet_ids
  
  # Database specific settings
  engine         = "postgres"
  instance_class = "db.r5.large"
  environment    = "production"
  
  tags = {
    Environment = "production"
  }
}

# Output the complete stack
output "app_url" {
  value = module.compute.load_balancer_dns
}

output "database_endpoint" {
  value = module.database.endpoint
}

How modules work together:

modules/networking/outputs.tf
         ↓ (outputs)
         ↓ vpc_id, subnet_ids
         ↓
modules/compute/variables.tf ← (receives as input)
modules/database/variables.tf ← (receives as input)
         ↓
         ↓ Creates resources using these IDs
         ↓
Complete production stack!

Same module code, different configurations. You’re not copy-pasting; you’re reusing.
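
Terraform works out the ordering from these references: because module.compute consumes module.vpc outputs, the VPC is created first. You can inspect or narrow the graph from the CLI:

# Render the dependency graph (DOT format; pipe into Graphviz if installed)
$ terraform graph

# Plan only the networking module and its dependencies
$ terraform plan -target=module.vpc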

Testing Infrastructure Changes

Before applying to production, validate your code:

# Syntax check
$ terraform validate

# Format check (consistency)
$ terraform fmt -recursive

# Show exactly what will change
$ terraform plan -out=tfplan

# Policy checking (security rules), e.g., with Open Policy Agent via conftest
$ terraform show -json tfplan > tfplan.json
$ conftest test -p policies/ tfplan.json

# Review the plan carefully before applying
$ terraform apply tfplan

Real-world workflow:

# 1. Local development
$ terraform plan -var-file=environments/dev/terraform.tfvars
# Output: 3 resources will be created

# 2. Version control
$ git add -A
$ git commit -m "Add NAT gateway to dev environment"
$ git push origin feature/add-nat-gateway

# 3. Code review
# Team reviews the changes, sees exactly what will change

# 4. CI/CD runs automated checks
# GitHub Actions runs:
#   - terraform validate
#   - terraform plan
#   - tfsec (security scanning)
# If all pass, PR can be merged

# 5. Auto-apply in CI/CD
$ terraform apply  # After merge to main

Ansible: Idempotent Configuration Management

The Challenge: Making Things Repeatable

Manual configuration work breaks easily:

Without Ansible (manual SSH):

$ ssh web1.example.com
$ sudo apt-get update
$ sudo apt-get install -y nginx
$ sudo systemctl start nginx
$ sudo systemctl enable nginx

$ ssh web2.example.com
# Repeat the same commands... forgot a step? Now they're inconsistent
# New team member doesn't know what's installed where
# Disaster recovery = manually SSH to each server

With Ansible (repeatable):

  • Define desired state once
  • Apply to 100 servers with one command
  • Re-run anytime, same result (idempotent)
  • Document what’s on each server
  • Version controlled playbooks
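
The starting point is an inventory file listing your hosts, plus a quick connectivity check. A minimal sketch (hostnames and the inventory path are assumptions):

# inventory/dev.ini (hypothetical hosts)
$ cat > inventory/dev.ini <<'EOF'
[web]
web1.example.com
web2.example.com
EOF

# Ad-hoc connectivity check against the "web" group
$ ansible web -i inventory/dev.ini -m ping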

Building Idempotent Playbooks

Idempotency means: running the playbook multiple times = same result (no changes on subsequent runs).

Bad: Non-idempotent task

---
- name: Configure web server
  hosts: web
  tasks:
    - name: Run setup script
      shell: /opt/setup.sh
      # Problem: Runs every time, even if already configured
      # Problem: Could fail if run twice

    - name: Append to config file
      shell: echo "ServerLimit 256" >> /etc/apache2/apache2.conf
      # Problem: Each run appends more lines!

Good: Idempotent tasks

---
- name: Configure web server
  hosts: web
  tasks:
    - name: Ensure required packages installed
      package:
        name: "{{ item }}"
        state: present  # Ensures installed, idempotent
      loop:
        - nginx
        - openssl
        - curl

    - name: Copy nginx configuration
      copy:
        src: files/nginx.conf
        dest: /etc/nginx/nginx.conf
        owner: root
        group: root
        mode: '0644'
        backup: yes  # Creates backup if changes
      notify: restart nginx  # Only restart if changed

    - name: Ensure nginx is running and enabled
      systemd:
        name: nginx
        state: started  # Only starts if not running
        enabled: yes    # Only enables if not enabled
        daemon_reload: yes

  handlers:
    - name: restart nginx
      systemd:
        name: nginx
        state: restarted

What makes this idempotent:

  • state: present checks if already installed
  • copy module compares files, only updates if different
  • systemd checks current status before acting
  • Handlers only run if a notify was triggered

Running this multiple times:

# First run:
$ ansible-playbook site.yml
# Output: 3 changed (installed packages, copied config, started nginx)

# Second run:
$ ansible-playbook site.yml
# Output: 0 changed (everything already in desired state)
# This is idempotency - safe to run repeatedly
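
When you're unsure what a run would do, Ansible can report the delta without changing anything:

# Dry run: show what would change, and the file diffs, without applying
$ ansible-playbook site.yml --check --diff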

Real Configuration Example

Role structure for a web application:

roles/
└── web_app/
    ├── files/
    │   └── nginx.conf           # Static nginx config
    ├── templates/
    │   └── app-env.j2           # Template with variables
    ├── tasks/
    │   ├── main.yml
    │   ├── install.yml
    │   ├── configure.yml
    │   └── deploy.yml
    ├── handlers/
    │   └── main.yml             # Service restart handlers
    ├── defaults/
    │   └── main.yml             # Default variables
    └── vars/
        └── main.yml             # Role-specific variables

# roles/web_app/tasks/main.yml
---
- name: Install dependencies
  package:
    name: "{{ item }}"
    state: present
  loop: "{{ packages_to_install }}"

- name: Create application user
  user:
    name: appuser
    home: /home/appuser
    shell: /bin/bash
    create_home: yes
    state: present

- name: Create app directory
  file:
    path: /opt/myapp
    state: directory
    owner: appuser
    group: appuser
    mode: '0755'

- name: Copy application files
  copy:
    src: app/   # resolved relative to roles/web_app/files/
    dest: /opt/myapp/
    owner: appuser
    group: appuser
    mode: '0755'

- name: Generate environment configuration
  template:
    src: app-env.j2
    dest: /opt/myapp/.env
    owner: appuser
    group: appuser
    mode: '0600'  # Secrets file, restrictive permissions
  notify: restart app service

- name: Install Python dependencies
  pip:
    requirements: /opt/myapp/requirements.txt
    virtualenv: /opt/myapp/venv
  become_user: appuser

- name: Create systemd service file
  template:
    src: app-service.j2
    dest: /etc/systemd/system/myapp.service
    owner: root
    group: root
    mode: '0644'
  notify: restart app service

- name: Enable and start application
  systemd:
    name: myapp
    state: started
    enabled: yes
    daemon_reload: yes

# roles/web_app/templates/app-env.j2
# Generated from Ansible template
ENVIRONMENT={{ app_environment }}
DATABASE_URL=postgresql://{{ db_user }}:{{ db_password }}@{{ db_host }}/{{ db_name }}
LOG_LEVEL={{ log_level }}
SECRET_KEY={{ secret_key }}
API_TIMEOUT=30

# roles/web_app/handlers/main.yml
---
- name: restart app service
  systemd:
    name: myapp
    state: restarted
  listen: "app service needs restart"

# roles/web_app/defaults/main.yml
---
packages_to_install:
  - python3
  - python3-pip
  - postgresql-client
  - curl

app_environment: production
log_level: info

Using the role:

# playbooks/deploy.yml
---
- name: Deploy web application
  hosts: web_servers
  become: yes
  roles:
    - web_app
  vars:
    app_environment: "{{ target_env }}"  # From -e flag
    db_host: "db.example.com"
    db_user: "app_user"
    db_password: "{{ vault_db_password }}"  # From Ansible vault
    db_name: "app_db"

Deploying:

# Deploy to dev
$ ansible-playbook playbooks/deploy.yml \
  -i inventory/dev.ini \
  -e target_env=dev

# Deploy to production
$ ansible-playbook playbooks/deploy.yml \
  -i inventory/production.ini \
  -e target_env=production

This playbook is idempotent. Run it 10 times, same result.
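
The vault_db_password variable referenced above is expected to live in an encrypted vars file. A minimal sketch of how that might look (the group_vars path is an assumption):

# Create/edit the encrypted file that defines vault_db_password
$ ansible-vault create group_vars/all/vault.yml

# Supply the vault password at deploy time
$ ansible-playbook playbooks/deploy.yml \
    -i inventory/production.ini \
    -e target_env=production \
    --ask-vault-pass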

Kubernetes: GitOps and IaC

The GitOps Pattern

Modern Kubernetes deployments follow GitOps: your Git repository is the source of truth for your entire cluster.

Without GitOps (manual kubectl):

$ kubectl apply -f app.yaml
$ kubectl set image deployment/app app=myapp:v2.1
$ kubectl scale deployment/app --replicas=5
$ kubectl port-forward pod/debug-xyz 8080:8080

# Now your cluster state is different from your Git repo
# No audit trail of who changed what
# New team member doesn't know how to recreate the cluster
# Disaster recovery = starting from scratch

With GitOps (all in Git):

gitops-repo/
├── base/
│   ├── kustomization.yaml
│   ├── app-deployment.yaml
│   ├── app-service.yaml
│   └── app-config.yaml
│
├── overlays/
│   ├── dev/
│   │   └── kustomization.yaml  (3 replicas, dev domain)
│   ├── staging/
│   │   └── kustomization.yaml  (5 replicas, staging domain)
│   └── production/
│       └── kustomization.yaml  (10 replicas, prod domain)
│
└── .github/workflows/
    └── deploy.yml  (ArgoCD watches this repo)

All changes flow through Git:

  1. Edit YAML in Git
  2. Create PR
  3. Team reviews
  4. Merge to main
  5. ArgoCD automatically syncs to cluster

Example: Deploying new app version

# base/app-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
  namespace: default
spec:
  replicas: 3  # Overridden by overlays
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
        version: v2.1
    spec:
      containers:
      - name: app
        image: myregistry/app:v2.1  # Image tag in Git
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 512Mi

To deploy a new version:

# Instead of: kubectl set image deployment/app app=myapp:v2.2

# Do this:
$ git checkout -b bump-app-version
# Edit: base/app-deployment.yaml, change image tag to v2.2
$ git add base/app-deployment.yaml
$ git commit -m "Bump app to v2.2"
$ git push origin bump-app-version
# Open PR → team reviews → merge

# ArgoCD automatically sees the change and syncs
# Result: New version running, change tracked in Git
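
Before opening the PR you can render an overlay locally and diff it against the live cluster; both commands are standard kubectl, and the paths match the repo layout above:

# Render the production overlay exactly as the sync tool would
$ kubectl kustomize overlays/production

# Compare the rendered manifests against what's running
$ kubectl diff -k overlays/production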

Resource Limits and Requests

Kubernetes requires you to think about resource usage:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  namespace: production
spec:
  replicas: 3
  template:
    spec:
      containers:
      - name: api
        image: myapp:2.1
        resources:
          requests:  # Minimum resources needed
            cpu: 200m        # 0.2 CPU cores
            memory: 256Mi    # 256 MB
          limits:    # Maximum resources allowed
            cpu: 1000m       # 1 CPU core
            memory: 1Gi      # 1 GB
        # Note: memory above the limit gets the container OOMKilled;
        # CPU above the limit is throttled, not killed
        # Tune both after observing real usage with monitoring

What happens:

Pod requests 200m CPU, limit 1000m
├── Scheduler only places the pod on a node with at least 200m unreserved
├── Pod can use 200m-1000m depending on contention
└── If it tries to use more than 1000m, it gets throttled (not killed)

Pod requests 256Mi memory, limit 1Gi
├── Scheduler reserves 256Mi for scheduling purposes
├── Pod can use up to 1Gi
└── If it exceeds 1Gi, the container is OOMKilled and restarted

Good practices:

# Requests = what the pod needs to run (for scheduling)
# Limits = maximum before it gets killed (for safety)

# Conservative but safe:
requests:
  cpu: 100m
  memory: 128Mi
limits:
  cpu: 500m
  memory: 512Mi

# Watch your actual usage with:
# kubectl top pod POD_NAME
# kubectl top node

# Adjust after observing real usage for 1-2 weeks

Security with RBAC and Service Accounts

What is RBAC?

RBAC stands for Role-Based Access Control. In Kubernetes, it’s a security mechanism that answers three questions:

Who can do what?

  • Who: Service accounts (identities for applications)
  • Do what: Specific actions (get, list, create, delete)
  • On what: Specific resources (pods, secrets, configmaps)

Without RBAC - The Problem:

Imagine your application is deployed in Kubernetes. By default:

  • Your container may run as root (depending on the image)
  • Your pod automatically mounts a service account token; if that account is over-privileged, or RBAC isn’t enforced, the app can do far more than it needs
  • If an attacker compromises your app, they inherit those permissions
  • They can steal secrets, delete workloads, and reach other applications

Compromised App → Broad Cluster Access → Data Breach

With RBAC - The Solution:

Your app has a service account with minimal permissions:

  • Can only read its own ConfigMap
  • Can only read its own Secret
  • Cannot list other secrets
  • Cannot delete pods
  • Cannot access other namespaces

Compromised App → Limited to own resources → Breach contained

Why Do We Need It?

Real-world scenario: Data breach through a compromised application

1. Attacker finds SQL injection in your app
2. Exploits it to run commands inside pod
3. WITHOUT RBAC:
   - Attacker runs: kubectl get secrets -A
   - Gets ALL secrets from ALL namespaces
   - Finds database credentials for production database
   - Accesses production data
   - Data breach: 100 million users affected
   
4. WITH RBAC:
   - Attacker runs: kubectl get secrets -A
   - Kubernetes returns: "Error: permission denied"
   - Attacker only has access to this app's one secret
   - Cannot see other applications' credentials
   - Breach limited to this one app's data

The principle: Principle of Least Privilege

  • Give each application the MINIMUM permissions it needs
  • If app only needs to read ConfigMap, don’t give it Secret access
  • If app only needs one Secret, don’t give it all Secrets
  • If app doesn’t need to delete pods, don’t give it that permission

Step 1: Create a Service Account

A service account is an identity for your application (like a user account for apps).

---
# Step 1: Create a service account
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app                    # Name of the service account
  namespace: production        # Only valid in this namespace

# Why you need this:
# - Each app should have its own identity
# - Kubernetes authenticates using this account
# - Audit logs show which account did what
# - Makes security easier to manage

What happens without a service account:

  • Pod uses default service account
  • Default account often has too many permissions
  • Hard to track which app did what in logs
  • Security risk if default is compromised

Step 2: Create a Role with Minimal Permissions

A role defines what actions are allowed on which resources.

---
# Step 2: Create a role with EXACTLY the permissions needed
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: app                    # Role name
  namespace: production        # Only for this namespace

rules:
# RULE 1: Read ConfigMaps, but only the app's own ConfigMap
- apiGroups: [""]             # Empty string = core API
  resources: ["configmaps"]   # Only ConfigMaps
  resourceNames: ["app-config"]  # ONLY this specific ConfigMap!
  verbs: ["get", "list", "watch"]  # Only read operations
  
  # Example: App needs to read configuration
  # Allowed: kubectl get configmap app-config
  # Denied: kubectl get configmap other-config
  # Denied: kubectl delete configmap app-config

# RULE 2: Read Secrets, but only the app's own Secret
- apiGroups: [""]
  resources: ["secrets"]      # Only Secrets
  resourceNames: ["app-secret"]  # ONLY this specific Secret!
  verbs: ["get"]              # Only get (not list, not delete)
  
  # Example: App needs database password from Secret
  # Allowed: kubectl get secret app-secret
  # Denied: kubectl get secret admin-secret
  # Denied: kubectl list secrets (can't see all secrets)
  # This is important! Even listing secrets can be a leak!

# What's NOT in this role:
# - Can't create pods (can't spawn new containers)
# - Can't delete pods (can't break cluster)
# - Can't create secrets (can't store malicious data)
# - Can't access other namespaces (confined to production)

Real example of what happens:

# App inside pod tries to:
$ kubectl get secrets
# Result: Error! Permission denied
# Why: Role only allows "get" on specific secret "app-secret", not "list" all secrets

$ kubectl get secret app-secret
# Result: Success! App gets its credentials
# Why: Role specifically allows this

$ kubectl delete configmap app-config
# Result: Error! Permission denied
# Why: Role only allows "get, list, watch" - not "delete"

Step 3: Bind Role to Service Account

A RoleBinding connects a Role to a ServiceAccount, saying “this service account has this role”.

---
# Step 3: Create a RoleBinding (connect Role to ServiceAccount)
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app                    # Name of the binding
  namespace: production        # Same namespace as Role

roleRef:                       # Reference to the Role
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: app                    # Name of the Role we created above

subjects:                      # Who gets this role
- kind: ServiceAccount         # It's a service account
  name: app                    # The service account name
  namespace: production        # In this namespace

What this accomplishes:

  • Service account “app” now has the permissions defined in Role “app”
  • Any pod using this service account gets these permissions
  • Multiple service accounts can have the same role
  • Multiple roles can be applied to one service account
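
You can verify the binding without exec’ing into a pod by impersonating the service account; kubectl auth can-i answers with a plain yes or no:

# Allowed by the Role above
$ kubectl auth can-i get secret/app-secret \
    --as=system:serviceaccount:production:app -n production
yes

# Not allowed (list is not granted)
$ kubectl auth can-i list secrets \
    --as=system:serviceaccount:production:app -n production
no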

Step 4: Use Service Account in Deployment

Now configure your pod to use this restricted service account.

---
# Step 4: Use the service account in your deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
  namespace: production
spec:
  replicas: 3
  template:
    spec:
      # CRITICAL: Use the restricted service account
      serviceAccountName: app  # Use our service account!
      
      # CRITICAL: Don't run as root
      securityContext:
        runAsNonRoot: true      # Refuse to run as root
        runAsUser: 1000         # Run as unprivileged user (UID 1000)
        fsGroup: 2000           # File system group for volume mounts
        
      containers:
      - name: app
        image: myapp:2.1
        
        # Container-level security settings
        securityContext:
          # Prevent privilege escalation (sudo-like operations)
          allowPrivilegeEscalation: false
          
          # Read-only filesystem (app can't modify container)
          # If compromised, attacker can't install tools or backdoors
          readOnlyRootFilesystem: true
          
          # Drop all Linux capabilities
          # Prevents: mount, network operations, privilege escalation
          capabilities:
            drop:
              - ALL  # Drop EVERYTHING by default
          
          # If your app needs specific capabilities, add them back:
          # add:
          #   - NET_BIND_SERVICE  # Only if it needs to bind to ports <1024

Why each security setting matters:

Setting                          | Protection
serviceAccountName               | Only access allowed resources
runAsNonRoot                     | Can’t run as root (blocks full cluster takeover)
runAsUser: 1000                  | Uses unprivileged user (limited damage)
allowPrivilegeEscalation: false  | Can’t escalate to root with sudo
readOnlyRootFilesystem           | Can’t install backdoors/malware
drop: ALL capabilities           | Can’t do system-level operations

Real-World Security Scenario

Scenario: a Kubernetes cluster running several applications (three shown here)

# App 1: Web Frontend (needs only to read config)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: web-frontend
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: web-frontend
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  resourceNames: ["frontend-config"]
  verbs: ["get"]

---
# App 2: API Server (needs config + database secret + logging)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: api-server
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: api-server
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  resourceNames: ["api-config"]
  verbs: ["get"]
- apiGroups: [""]
  resources: ["secrets"]
  resourceNames: ["db-credentials", "api-key"]
  verbs: ["get"]

---
# App 3: Background Worker (needs only message queue secret)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: worker
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: worker
rules:
- apiGroups: [""]
  resources: ["secrets"]
  resourceNames: ["queue-credentials"]
  verbs: ["get"]

If the api-server application is compromised:

Attacker can:
  ✓ Read api-config ConfigMap
  ✓ Read db-credentials Secret
  ✓ Read api-key Secret

Attacker CANNOT:
  ✗ Read frontend-config (not allowed)
  ✗ Read queue-credentials (not allowed)
  ✗ Create new secrets (no permission)
  ✗ Delete pods (no permission)
  ✗ Access admin account (different service account)

Damage contained! Without RBAC, attacker could access everything.

Multi-Tool Orchestration

Understanding the Complete Pipeline

Real infrastructure deployments combine multiple tools in sequence:

The complete flow (AWS best practice):

1. Infrastructure Layer (Terraform)
   ↓
2. Application Layer (Kubernetes / EKS)
   ↓
3. Running Applications

Using AWS EKS simplifies this dramatically:

Why EKS over manual Kubernetes on EC2?

Aspect            | EC2 + Manual K8s         | AWS EKS
Setup Time        | 2-3 hours                | 15-20 minutes
Maintenance       | You manage everything    | AWS manages the control plane
Updates           | Manual version upgrades  | Managed upgrade process
Security Patches  | You apply them           | AWS applies them (control plane)
Multi-AZ          | Manual setup             | Built-in by default
Cost              | Lower (you manage it)    | Higher (but less operational work)
Best for          | Learning, custom needs   | Production, managed service

With EKS, you only manage worker nodes. AWS manages:

  • Kubernetes control plane (API server, etcd, scheduler)
  • Master node availability
  • Security patches
  • Updates
  • Backups

Terraform + EKS Deployment (Simplified)

File structure:

infrastructure/
├── main.tf           (Create VPC and EKS cluster)
├── variables.tf      (Input variables)
├── outputs.tf        (Cluster info for kubectl)
└── environments/
    ├── dev/
    │   └── terraform.tfvars
    └── production/
        └── terraform.tfvars

Step 1: Create VPC for EKS

# main.tf - Part 1: Network Infrastructure

# Create VPC
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "${var.environment}-vpc"
  }
}

# Create public subnets (for load balancers and NAT)
resource "aws_subnet" "public" {
  count = length(var.azs)

  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(var.vpc_cidr, 4, count.index)
  availability_zone       = var.azs[count.index]
  map_public_ip_on_launch = true

  tags = {
    Name                                      = "${var.environment}-public-${count.index + 1}"
    "kubernetes.io/role/elb"                  = "1"  # EKS needs this tag
  }
}

# Create private subnets (for EKS worker nodes)
resource "aws_subnet" "private" {
  count = length(var.azs)

  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 4, count.index + length(var.azs))
  availability_zone = var.azs[count.index]

  tags = {
    Name                                      = "${var.environment}-private-${count.index + 1}"
    "kubernetes.io/role/internal-elb"         = "1"  # EKS needs this tag
  }
}

# Create Internet Gateway
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "${var.environment}-igw"
  }
}

# Create NAT Gateway for private subnet internet access
resource "aws_eip" "nat" {
  count  = length(var.azs)
  domain = "vpc"

  tags = {
    Name = "${var.environment}-nat-eip-${count.index + 1}"
  }

  depends_on = [aws_internet_gateway.main]
}

resource "aws_nat_gateway" "main" {
  count         = length(var.azs)
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id

  tags = {
    Name = "${var.environment}-nat-${count.index + 1}"
  }

  depends_on = [aws_internet_gateway.main]
}

# Route tables for public subnets
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block      = "0.0.0.0/0"
    gateway_id      = aws_internet_gateway.main.id
  }

  tags = {
    Name = "${var.environment}-public-rt"
  }
}

resource "aws_route_table_association" "public" {
  count          = length(aws_subnet.public)
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

# Route tables for private subnets
resource "aws_route_table" "private" {
  count  = length(var.azs)
  vpc_id = aws_vpc.main.id

  route {
    cidr_block      = "0.0.0.0/0"
    nat_gateway_id  = aws_nat_gateway.main[count.index].id
  }

  tags = {
    Name = "${var.environment}-private-rt-${count.index + 1}"
  }
}

resource "aws_route_table_association" "private" {
  count          = length(aws_subnet.private)
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private[count.index].id
}

Step 2: Create IAM roles for EKS

# main.tf - Part 2: IAM Roles

# IAM role for EKS cluster (control plane)
resource "aws_iam_role" "eks_cluster_role" {
  name = "${var.environment}-eks-cluster-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "eks.amazonaws.com"  # EKS service can assume this role
        }
      }
    ]
  })
}

# Attach required policy for cluster
resource "aws_iam_role_policy_attachment" "eks_cluster_policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
  role       = aws_iam_role.eks_cluster_role.name
}

# IAM role for EKS worker nodes
resource "aws_iam_role" "eks_worker_role" {
  name = "${var.environment}-eks-worker-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "ec2.amazonaws.com"  # EC2 instances (nodes) can assume this role
        }
      }
    ]
  })
}

# Attach required policies for worker nodes
resource "aws_iam_role_policy_attachment" "eks_worker_node_policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
  role       = aws_iam_role.eks_worker_role.name
}

resource "aws_iam_role_policy_attachment" "eks_cni_policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
  role       = aws_iam_role.eks_worker_role.name
}

resource "aws_iam_role_policy_attachment" "eks_registry_policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
  role       = aws_iam_role.eks_worker_role.name
}

# Instance profile for worker nodes
resource "aws_iam_instance_profile" "eks_worker_profile" {
  name = "${var.environment}-eks-worker-profile"
  role = aws_iam_role.eks_worker_role.name
}

Step 3: Create EKS Cluster

# main.tf - Part 3: EKS Cluster

# Security group for EKS cluster
resource "aws_security_group" "eks_cluster" {
  name        = "${var.environment}-eks-cluster-sg"
  description = "Security group for EKS cluster"
  vpc_id      = aws_vpc.main.id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]  # In production, restrict this!
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "${var.environment}-eks-cluster-sg"
  }
}

# Create EKS Cluster
resource "aws_eks_cluster" "main" {
  name            = "${var.environment}-cluster"
  role_arn        = aws_iam_role.eks_cluster_role.arn
  version         = var.kubernetes_version  # e.g., "1.27"

  vpc_config {
    subnet_ids              = concat(aws_subnet.private[*].id, aws_subnet.public[*].id)
    endpoint_private_access = true   # Internal access
    endpoint_public_access  = true   # External access via kubectl
    security_group_ids      = [aws_security_group.eks_cluster.id]
  }

  enabled_cluster_log_types = [
    "api",
    "audit",
    "authenticator",
    "controllerManager",
    "scheduler"
  ]

  tags = {
    Name = "${var.environment}-eks-cluster"
  }

  depends_on = [
    aws_iam_role_policy_attachment.eks_cluster_policy
  ]
}

# Create EKS Node Group (managed worker nodes)
resource "aws_eks_node_group" "main" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "${var.environment}-node-group"
  node_role_arn   = aws_iam_role.eks_worker_role.arn
  subnet_ids      = aws_subnet.private[*].id

  scaling_config {
    desired_size = var.desired_node_count
    max_size     = var.max_node_count
    min_size     = var.min_node_count
  }

  instance_types = [var.node_instance_type]  # e.g., "t3.medium"

  tags = {
    Name = "${var.environment}-node-group"
  }

  depends_on = [
    aws_iam_role_policy_attachment.eks_worker_node_policy,
    aws_iam_role_policy_attachment.eks_cni_policy,
    aws_iam_role_policy_attachment.eks_registry_policy
  ]
}

Step 4: Output Kubeconfig Information

# outputs.tf

output "cluster_name" {
  value       = aws_eks_cluster.main.name
  description = "EKS cluster name"
}

output "cluster_endpoint" {
  value       = aws_eks_cluster.main.endpoint
  description = "EKS cluster API endpoint"
}

output "cluster_version" {
  value       = aws_eks_cluster.main.version
  description = "EKS cluster Kubernetes version"
}

# Command to update kubeconfig
output "configure_kubectl" {
  value       = "aws eks update-kubeconfig --region ${var.aws_region} --name ${aws_eks_cluster.main.name}"
  description = "Command to configure kubectl"
}
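
After terraform apply, the configure_kubectl output is literally the command you run next; for a production environment it would look roughly like this (cluster name follows the ${var.environment}-cluster pattern above):

# Point kubectl at the new cluster
$ aws eks update-kubeconfig --region us-east-1 --name production-cluster

# Sanity check
$ kubectl get nodes
$ kubectl cluster-info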

Simplified Deployment Script

File: deploy-infrastructure.sh

#!/bin/bash
# Simplified deployment: Terraform + EKS only
# No need for Ansible anymore!

set -e

ENVIRONMENT=${1:-dev}
REGION=${2:-us-east-1}

echo "╔════════════════════════════════════════════════╗"
echo "β•‘  Deploying EKS Infrastructure: ${ENVIRONMENT}   β•‘"
echo "β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•"

# ════════════════════════════════════════════════════════
# PHASE 1: PROVISION INFRASTRUCTURE WITH TERRAFORM
# ════════════════════════════════════════════════════════
echo ""
echo "βœ“ PHASE 1: Creating VPC and EKS cluster..."
cd terraform/

terraform init
terraform plan \
  -var-file="environments/${ENVIRONMENT}/terraform.tfvars" \
  -var="aws_region=$REGION" \
  -out=tfplan
terraform apply tfplan

# Get cluster info
CLUSTER_NAME=$(terraform output -raw cluster_name)
KUBECONFIG_CMD=$(terraform output -raw configure_kubectl)

cd ..

echo "   β€’ EKS Cluster: $CLUSTER_NAME"
echo "   β€’ Region: $REGION"

# ════════════════════════════════════════════════════════
# PHASE 2: CONFIGURE KUBECTL
# ════════════════════════════════════════════════════════
echo ""
echo "βœ“ PHASE 2: Configuring kubectl..."

# Update kubeconfig (AWS managed, no SSH needed!)
eval $KUBECONFIG_CMD

# Verify cluster access
kubectl get nodes

echo "   β€’ Cluster access configured"
echo "   β€’ Worker nodes ready"

# ════════════════════════════════════════════════════════
# PHASE 3: DEPLOY APPLICATIONS
# ════════════════════════════════════════════════════════
echo ""
echo "βœ“ PHASE 3: Deploying applications..."

# Wait for nodes to be ready
kubectl wait --for=condition=Ready node --all --timeout=600s

# Deploy applications
kubectl apply -k kubernetes/overlays/${ENVIRONMENT}

echo "   β€’ Applications deployed"
echo "   β€’ Services configured"

# ════════════════════════════════════════════════════════
# VERIFY DEPLOYMENT
# ════════════════════════════════════════════════════════
echo ""
echo "βœ“ VERIFICATION"

echo "   β€’ Nodes:"
kubectl get nodes -o wide

echo ""
echo "   β€’ Pods:"
kubectl get pods -A

echo "   β€’ Services:"
kubectl get svc -A

# ════════════════════════════════════════════════════════
# FINAL STATUS
# ════════════════════════════════════════════════════════
echo ""
echo "╔════════════════════════════════════════════════╗"
echo "β•‘  βœ… DEPLOYMENT COMPLETE!                      β•‘"
echo "╠════════════════════════════════════════════════╣"
echo "β•‘  EKS Cluster: $CLUSTER_NAME                    β•‘"
echo "β•‘  Region: $REGION                               β•‘"
echo "β•‘                                                β•‘"
echo "β•‘  Access your applications:                     β•‘"
echo "β•‘  β†’ kubectl get svc -A                          β•‘"
echo "β•‘  β†’ kubectl port-forward svc/...                β•‘"
echo "β•‘                                                β•‘"
echo "β•‘  View cluster:                                 β•‘"
echo "β•‘  β†’ AWS Console: EKS β†’ Clusters                 β•‘"
echo "β•‘  β†’ AWS CloudWatch Logs                         β•‘"
echo "β•‘                                                β•‘"
echo "β•‘  Kubectl context:                              β•‘"
echo "β•‘  β†’ kubectl config current-context              β•‘"
echo "β•‘  β†’ kubectl cluster-info                        β•‘"
echo "β•‘                                                β•‘"
echo "β•‘  Timestamp: $(date)                            β•‘"
echo "β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•"

Why EKS is Better for AWS

Advantages:

  1. No Configuration Layer - managed node AMIs come pre-configured, so no Ansible is needed
  2. Managed Control Plane - AWS runs the API server, etcd, and scheduler, and handles their upgrades and patches
  3. Multi-AZ by Default - the control plane spans multiple availability zones
  4. Integrated with AWS Services - RDS, ALB, IAM, CloudWatch, VPC
  5. Security Patches - AWS patches the control plane; managed node groups provide patched AMIs
  6. Simpler Backup/Recovery - control plane state (etcd) is backed up by AWS
  7. Compliance - easier to meet regulatory requirements

Simpler deployment flow:

Terraform creates:
  βœ“ VPC with subnets
  βœ“ EKS cluster (control plane)
  βœ“ Node groups (worker nodes)
  βœ“ IAM roles

kubectl applies:
  βœ“ Deployments
  βœ“ Services
  βœ“ Ingress
  βœ“ ConfigMaps, Secrets

No Ansible needed because:

  • EKS worker nodes come pre-configured
  • Kubernetes, a container runtime, and the CNI plugin already installed
  • Security hardening already applied by AWS
  • Just add your applications with kubectl

Infrastructure as Code with EKS

Your final deployment:

Terraform        β†’ Create cloud infrastructure (VPC, EKS, nodes)
kubectl/Helm     β†’ Deploy containerized applications
ArgoCD/Flux      β†’ Continuous GitOps synchronization

This is the AWS-native, production-recommended approach! 🎯