Introduction
Terraform state is the source of truth for your infrastructure. Proper state management is critical for team collaboration, preventing conflicts, and maintaining infrastructure integrity. This guide covers remote backends, locking mechanisms, and workspace strategies.
Understanding Terraform State
What is State?
State is Terraform’s way of tracking which real-world resources correspond to your configuration. By default it is stored locally in a file named terraform.tfstate.
The state file contains:
- Resource mappings
- Metadata
- Resource dependencies
- Attribute values (including sensitive values such as passwords, stored in plaintext)
Why State Matters
Without proper state management:
- Team members overwrite each other’s changes
- State file gets lost or corrupted
- No locking causes race conditions
- Secrets exposed in version control
- Cannot track infrastructure drift
With proper state management:
- Team collaboration enabled
- State is backed up and versioned
- Locking prevents conflicts
- Secrets are encrypted
- Audit trail maintained
Remote Backends
Local State (Default)
Bad for teams:
# terraform.tf
# No backend configuration
# State stored locally in terraform.tfstate
Problems:
- State on individual laptops
- No collaboration possible
- No locking
- No backup
- Secrets in plaintext
Only use for:
- Personal learning
- Proof of concepts
- Single-user environments
S3 Backend (AWS)
Recommended for AWS:
# backend.tf
terraform {
  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "production/vpc/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"

    # Optional: assume an IAM role for state access
    # (Terraform 1.6+ syntax; older versions use a top-level role_arn)
    assume_role = {
      role_arn = "arn:aws:iam::123456789012:role/TerraformRole"
    }

    # Optional: server-side encryption with a customer-managed KMS key
    kms_key_id = "arn:aws:kms:us-east-1:123456789012:key/abc-def-123"

    # Versioning is enabled on the bucket itself, not in the backend block
  }
}
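A backend block cannot reference variables or locals, so teams often keep shared settings out of the code with partial configuration: omit some arguments from the backend block and supply them at init time with -backend-config. A minimal sketch, assuming a file named backend.hcl (the name is arbitrary) that holds the shared settings; both files are shown together here:

```hcl
# backend.tf: only the per-configuration state key stays in code
terraform {
  backend "s3" {
    key = "production/vpc/terraform.tfstate"
  }
}

# backend.hcl: shared settings, passed at init time with
#   terraform init -backend-config=backend.hcl
bucket         = "mycompany-terraform-state"
region         = "us-east-1"
encrypt        = true
dynamodb_table = "terraform-state-lock"
```

Single values can also be passed inline, for example terraform init -backend-config="bucket=mycompany-terraform-state".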
Set up the S3 backend:
# Create S3 bucket
aws s3 mb s3://mycompany-terraform-state --region us-east-1
# Enable versioning
aws s3api put-bucket-versioning \
--bucket mycompany-terraform-state \
--versioning-configuration Status=Enabled
# Enable encryption
aws s3api put-bucket-encryption \
--bucket mycompany-terraform-state \
--server-side-encryption-configuration '{
"Rules": [{
"ApplyServerSideEncryptionByDefault": {
"SSEAlgorithm": "AES256"
}
}]
}'
# Block public access
aws s3api put-public-access-block \
--bucket mycompany-terraform-state \
--public-access-block-configuration \
BlockPublicAcls=true,\
IgnorePublicAcls=true,\
BlockPublicPolicy=true,\
RestrictPublicBuckets=true
# Create DynamoDB table for locking
aws dynamodb create-table \
--table-name terraform-state-lock \
--attribute-definitions AttributeName=LockID,AttributeType=S \
--key-schema AttributeName=LockID,KeyType=HASH \
--billing-mode PAY_PER_REQUEST \
--region us-east-1
Bucket policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:role/TerraformRole"
      },
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::mycompany-terraform-state/*"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:role/TerraformRole"
      },
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::mycompany-terraform-state"
    }
  ]
}
Azure Storage Backend
# backend.tf
terraform {
  backend "azurerm" {
    resource_group_name  = "terraform-state-rg"
    storage_account_name = "tfstate12345"
    container_name       = "tfstate"
    key                  = "production.terraform.tfstate"

    # Authenticate to blob storage with Microsoft Entra ID (service principal
    # or managed identity) instead of a storage account access key
    use_azuread_auth = true
  }
}
Set up the Azure backend:
# Create resource group
az group create --name terraform-state-rg --location eastus
# Create storage account
az storage account create \
--name tfstate12345 \
--resource-group terraform-state-rg \
--location eastus \
--sku Standard_LRS \
--encryption-services blob \
--https-only true \
--min-tls-version TLS1_2
# Create container
az storage container create \
--name tfstate \
--account-name tfstate12345 \
--auth-mode login
# Enable versioning
az storage account blob-service-properties update \
--account-name tfstate12345 \
--resource-group terraform-state-rg \
--enable-versioning true
GCS Backend (Google Cloud)
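# backend.tf
terraform {
  backend "gcs" {
    bucket = "mycompany-terraform-state"
    prefix = "production/vpc"

    # Optional: customer-supplied encryption key (32 bytes, base64-encoded).
    # Prefer the GOOGLE_ENCRYPTION_KEY environment variable over hard-coding it.
    encryption_key = "base64-encoded-key"
  }
}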
Set up the GCS backend:
# Create bucket
gsutil mb gs://mycompany-terraform-state
# Enable versioning
gsutil versioning set on gs://mycompany-terraform-state
# Delete noncurrent state versions once 10 newer versions exist
cat > lifecycle.json <<EOF
{
  "lifecycle": {
    "rule": [
      {
        "action": {"type": "Delete"},
        "condition": {
          "numNewerVersions": 10,
          "isLive": false
        }
      }
    ]
  }
}
EOF
gsutil lifecycle set lifecycle.json gs://mycompany-terraform-state
Terraform Cloud Backend
# backend.tf
terraform {
  cloud {
    organization = "mycompany"

    workspaces {
      name = "production-vpc"
    }
  }
}
Benefits:
- Built-in state management
- Automatic locking
- State versioning
- Encrypted at rest
- Access control
- Audit logs
- Remote execution
- Cost estimation
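If one configuration is deployed to several HCP Terraform (formerly Terraform Cloud) workspaces, the workspaces block can select them by tag instead of by a single name; terraform workspace list and terraform workspace select then operate on the matching workspaces. A sketch, assuming the workspaces carry a hypothetical "vpc" tag:

```hcl
terraform {
  cloud {
    organization = "mycompany"

    workspaces {
      # Hypothetical tag: matches every workspace in the organization tagged "vpc"
      tags = ["vpc"]
    }
  }
}
```

The name and tags arguments are mutually exclusive, so a configuration uses one or the other.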
State Locking
Why Locking Matters
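Without locking:
User A: terraform apply (starts)
User B: terraform apply (starts simultaneously)
Result: both runs write state and one can overwrite the other's changes, leading to state corruption and resource conflicts
With locking:
User A: terraform apply (acquires lock)
User B: terraform apply (cannot acquire the lock: fails immediately, or retries until -lock-timeout expires)
User A: completes, releases lock
User B: re-runs (or its retry succeeds), acquires lock, proceeds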
Locking Mechanisms
S3 + DynamoDB:
terraform {
  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "production/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-state-lock" # Enables locking
  }
}
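Newer Terraform releases can lock S3 state without DynamoDB: Terraform 1.10 added S3-native locking through the use_lockfile argument as an experiment, and 1.11 made it generally available and deprecated dynamodb_table. A sketch of the same backend using it:

```hcl
terraform {
  required_version = ">= 1.11.0"

  backend "s3" {
    bucket       = "mycompany-terraform-state"
    key          = "production/terraform.tfstate"
    region       = "us-east-1"
    use_lockfile = true # lock with a ".tflock" object stored next to the state
  }
}
```

During a migration, use_lockfile and dynamodb_table can both be set. Because the lock is an S3 object, the IAM policy also needs s3:GetObject, s3:PutObject, and s3:DeleteObject on the .tflock key.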
Azure Storage:
# Locking enabled automatically
terraform {
  backend "azurerm" {
    # Blob lease used for locking
    resource_group_name  = "terraform-state-rg"
    storage_account_name = "tfstate12345"
    container_name       = "tfstate"
    key                  = "production.terraform.tfstate"
  }
}
GCS:
# Locking enabled automatically
terraform {
  backend "gcs" {
    bucket = "mycompany-terraform-state"
    prefix = "production"
  }
}
Handling Lock Issues
Release a lock manually:
# Remove the lock; Terraform asks for confirmation
terraform force-unlock <LOCK_ID>
# Skip the confirmation prompt (only if you're sure no other process is running)
terraform force-unlock -force <LOCK_ID>
Lock timeout:
# Retry acquiring the lock for up to 10 minutes (default 0s: fail immediately if locked)
terraform apply -lock-timeout=10m
Stuck lock scenario:
# Identify lock ID
terraform plan
# Error: Error acquiring the state lock
# Lock ID: abc123-def456-...
# Verify no other Terraform process is running (ps only checks this machine;
# also confirm with teammates and CI pipelines)
ps aux | grep terraform
# Force unlock (use cautiously!)
terraform force-unlock abc123-def456-...
Workspace Strategies
What are Workspaces?
Workspaces allow multiple state files for the same configuration.
Default workspace:
# Always exists
terraform workspace list
# * default
Creating Workspaces
# Create workspace
terraform workspace new development
terraform workspace new staging
terraform workspace new production
# List workspaces (workspace new also switches to the new workspace,
# so production is now current)
terraform workspace list
#   default
#   development
# * production
#   staging
# Switch workspace
terraform workspace select production
# Show current workspace
terraform workspace show
Workspace Use Cases
1. Environment Separation
# variables.tf
variable "instance_count" {
  type = map(number)
  default = {
    development = 1
    staging     = 2
    production  = 5
  }
}

# Hypothetical: AMI ID to launch (aws_instance requires ami)
variable "ami_id" {
  type = string
}

# main.tf
# Variable defaults must be literal values, so terraform.workspace
# is referenced through a local instead of a variable default
locals {
  environment = terraform.workspace
}

resource "aws_instance" "app" {
  count         = var.instance_count[terraform.workspace]
  ami           = var.ami_id
  instance_type = local.environment == "production" ? "t3.large" : "t3.micro"

  tags = {
    Name        = "app-${local.environment}-${count.index}"
    Environment = local.environment
  }
}
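Indexing var.instance_count with terraform.workspace fails in any workspace that has no entry in the map, including default. One option, sketched here with a hypothetical local named app_instance_count, is lookup() with a fallback value:

```hcl
locals {
  # lookup(map, key, default) returns 1 when the current workspace
  # (for example "default") has no entry in var.instance_count
  app_instance_count = lookup(var.instance_count, terraform.workspace, 1)
}
```

aws_instance.app would then use count = local.app_instance_count. Whether a silent fallback is preferable to a hard error is a judgment call: the error at least stops an apply in an unintended workspace.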
Deploy to different environments:
# Development
terraform workspace select development
terraform apply
# Staging
terraform workspace select staging
terraform apply
# Production
terraform workspace select production
terraform apply
2. Feature Branch Development
# Create workspace for feature branch
terraform workspace new feature-vpc-peering
# Develop and test
terraform apply
# When done, destroy and delete workspace
terraform destroy
terraform workspace select default
terraform workspace delete feature-vpc-peering
3. Multi-Region Deployment
# main.tf
locals {
  region_map = {
    us-east    = "us-east-1"
    us-west    = "us-west-2"
    eu-central = "eu-central-1"
  }

  # Errors in any workspace without an entry here, including "default"
  region = local.region_map[terraform.workspace]
}

provider "aws" {
  region = local.region
}

resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"

  tags = {
    Name   = "vpc-${terraform.workspace}"
    Region = local.region
  }
}
# Deploy to multiple regions
terraform workspace new us-east
terraform apply
terraform workspace new us-west
terraform apply
terraform workspace new eu-central
terraform apply
Workspace Limitations
Not recommended for:
- Long-lived environment separation (use separate directories, each with its own backend configuration)
- When environments need different backends
- When you need strict RBAC per environment
Better alternative: Separate directories
infrastructure/
├── environments/
│   ├── development/
│   │   ├── main.tf
│   │   ├── backend.tf   (dev bucket)
│   │   └── variables.tf
│   ├── staging/
│   │   ├── main.tf
│   │   ├── backend.tf   (staging bucket)
│   │   └── variables.tf
│   └── production/
│       ├── main.tf
│       ├── backend.tf   (prod bucket)
│       └── variables.tf
└── modules/
    └── vpc/
        └── main.tf
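With separate directories, each environment is a small root module that calls the shared modules with its own inputs. A sketch of environments/production/main.tf, assuming modules/vpc declares the hypothetical input variables cidr_block and environment:

```hcl
# environments/production/main.tf
module "vpc" {
  source = "../../modules/vpc"

  # Hypothetical inputs: use whatever variables modules/vpc actually declares
  cidr_block  = "10.0.0.0/16"
  environment = "production"
}
```

The development and staging directories contain the same call with different values, and each directory's backend.tf points at its own bucket, so an apply in one directory can only touch that environment's state.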
State File Organization
Hierarchical Structure
s3://mycompany-terraform-state/
├── production/
│   ├── network/
│   │   └── terraform.tfstate
│   ├── databases/
│   │   └── terraform.tfstate
│   ├── kubernetes/
│   │   └── terraform.tfstate
│   └── applications/
│       └── terraform.tfstate
├── staging/
│   ├── network/
│   │   └── terraform.tfstate
│   └── applications/
│       └── terraform.tfstate
└── development/
    └── all/
        └── terraform.tfstate
Benefits:
- Blast radius isolation
- Easier to manage
- Faster operations
- Better access control
State File Separation
Monolithic (not recommended):
# Single state file for everything
terraform {
  backend "s3" {
    bucket = "mycompany-terraform-state"
    key    = "production/everything.tfstate"
    region = "us-east-1"
  }
}

# All resources in one configuration (arguments omitted for brevity)
resource "aws_vpc" "main" {}
resource "aws_db_instance" "main" {}
resource "aws_eks_cluster" "main" {}
# ... 100s more resources
Problems:
- Long apply times
- Higher risk of errors
- Hard to delegate
- Complex dependencies
Modular (recommended):
infrastructure/
├── network/
│   ├── backend.tf   # key = "production/network/terraform.tfstate"
│   └── main.tf      # VPC, subnets, etc.
├── database/
│   ├── backend.tf   # key = "production/database/terraform.tfstate"
│   └── main.tf      # RDS instances
└── kubernetes/
    ├── backend.tf   # key = "production/kubernetes/terraform.tfstate"
    └── main.tf      # EKS cluster
Data Sources for Cross-State References
# database/main.tf
terraform {
  backend "s3" {
    bucket = "mycompany-terraform-state"
    key    = "production/database/terraform.tfstate"
    region = "us-east-1"
  }
}

# Read outputs (here, private subnet IDs) from the network state
data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "mycompany-terraform-state"
    key    = "production/network/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_db_subnet_group" "main" {
  subnet_ids = data.terraform_remote_state.network.outputs.private_subnet_ids
}
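terraform_remote_state can only read root-module outputs of the other configuration, so the network configuration has to export the subnet IDs explicitly. A sketch of network/outputs.tf, assuming the private subnets are declared there as a counted resource named aws_subnet.private:

```hcl
# network/outputs.tf
output "private_subnet_ids" {
  description = "Private subnet IDs, read by other configurations through terraform_remote_state"
  # Assumes aws_subnet.private uses count; with for_each, use values(aws_subnet.private)[*].id
  value = aws_subnet.private[*].id
}
```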
State Operations
Viewing State
# List all resources
terraform state list
# Show specific resource
terraform state show aws_instance.app
# Show all state
terraform show
Moving Resources
# Rename a resource address within the same state
terraform state mv aws_instance.app aws_instance.web
# Move a resource into a different state file
# (-state-out applies only to local state, not remote backends)
terraform state mv -state-out=../other/terraform.tfstate \
aws_instance.app aws_instance.app
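Terraform 1.1 and later can also record a rename in configuration with a moved block, so the change shows up in terraform plan for review and takes effect for everyone using the configuration, instead of being run by hand against the state. A sketch of the same rename:

```hcl
# The object tracked as aws_instance.app is now tracked as aws_instance.web
moved {
  from = aws_instance.app
  to   = aws_instance.web
}

resource "aws_instance" "web" {
  # ... the arguments previously declared for aws_instance.app
}
```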
Removing Resources
# Remove from state (resource stays in cloud)
terraform state rm aws_instance.old_app
# Remove and destroy
terraform destroy -target=aws_instance.old_app
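Terraform 1.7 added the removed block, the configuration-driven counterpart of terraform state rm. A sketch that drops aws_instance.old_app from state without destroying it (the resource block itself is deleted from the configuration at the same time):

```hcl
removed {
  from = aws_instance.old_app

  lifecycle {
    # false: forget the object in state but leave the real instance running
    destroy = false
  }
}
```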
Importing Existing Resources
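# Import resource (the aws_instance.app block must already exist in the configuration)
terraform import aws_instance.app i-1234567890abcdef
# Import with module
terraform import 'module.vpc.aws_vpc.main' vpc-abc123
Import block (Terraform 1.5+):
import {
  to = aws_instance.app
  id = "i-1234567890abcdef"
}

resource "aws_instance" "app" {
  # Write the configuration yourself, or omit this block and run
  #   terraform plan -generate-config-out=generated.tf
  # to have Terraform generate a draft of it
}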
State Refresh
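# Update state to match real infrastructure, reviewing the changes first
terraform apply -refresh-only
# Older equivalent, deprecated because it writes state without review
terraform refresh
# Plan automatically refreshes (unless disabled)
terraform plan
# Disable refresh during plan
terraform plan -refresh=false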
State Recovery
Restore from S3 versioning:
# List versions
aws s3api list-object-versions \
--bucket mycompany-terraform-state \
--prefix production/vpc/terraform.tfstate
# Download specific version
aws s3api get-object \
--bucket mycompany-terraform-state \
--key production/vpc/terraform.tfstate \
--version-id abc123 \
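restored.tfstate
# From the configuration directory, make it the current state again;
# -force is needed because newer writes gave the remote state a higher serial
terraform state push -force restored.tfstate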
Restore from Terraform Cloud:
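# terraform state list shows resources, not state versions.
# Browse and download earlier versions in the workspace's States tab
# (or through the HCP Terraform State Versions API).
# Save a local copy of the current state:
terraform state pull > terraform.tfstate.backup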
Security Best Practices
1. Encrypt State
S3 with KMS:
terraform {
  backend "s3" {
    bucket     = "mycompany-terraform-state"
    key        = "production/terraform.tfstate"
    region     = "us-east-1"
    encrypt    = true
    kms_key_id = "arn:aws:kms:us-east-1:123456789012:key/abc-123"
  }
}
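The kms_key_id above must point at an existing key, and because a backend block cannot reference resources, its ARN has to be copied in literally. A sketch of creating the key with rotation enabled and making it the bucket's default encryption; the resource names and the 30-day deletion window are assumptions, and this would replace the AES256 setting used elsewhere in this guide:

```hcl
resource "aws_kms_key" "terraform_state" {
  description             = "Encrypts Terraform state objects"
  enable_key_rotation     = true
  deletion_window_in_days = 30
}

resource "aws_kms_alias" "terraform_state" {
  name          = "alias/terraform-state"
  target_key_id = aws_kms_key.terraform_state.key_id
}

resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
  bucket = "mycompany-terraform-state"

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = aws_kms_key.terraform_state.arn
    }
    # S3 Bucket Keys reduce the number of requests sent to KMS
    bucket_key_enabled = true
  }
}
```

Principals that read or write state also need kms:Decrypt and kms:GenerateDataKey on this key.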
2. Restrict Access
S3 bucket policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::mycompany-terraform-state",
        "arn:aws:s3:::mycompany-terraform-state/*"
      ],
      "Condition": {
        "Bool": {
          "aws:SecureTransport": "false"
        }
      }
    }
  ]
}
IAM policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::mycompany-terraform-state",
        "arn:aws:s3:::mycompany-terraform-state/production/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:DescribeTable",
        "dynamodb:GetItem",
        "dynamodb:PutItem",
        "dynamodb:DeleteItem"
      ],
      "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/terraform-state-lock"
    }
  ]
}
3. Enable Versioning
Always enable versioning on state storage:
- S3: Bucket versioning
- Azure: Blob versioning
- GCS: Object versioning
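Versioning keeps every state write, so old versions accumulate over time. A sketch of an S3 lifecycle rule that expires noncurrent versions, similar in purpose to the GCS lifecycle rule above; the 90-day retention period is an arbitrary assumption:

```hcl
resource "aws_s3_bucket_lifecycle_configuration" "terraform_state" {
  bucket = "mycompany-terraform-state"

  rule {
    id     = "expire-noncurrent-state-versions"
    status = "Enabled"

    # An empty filter applies the rule to every object in the bucket
    filter {}

    noncurrent_version_expiration {
      noncurrent_days = 90
    }
  }
}
```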
4. Don’t Commit State to Git
.gitignore:
# .gitignore
*.tfstate
*.tfstate.*
.terraform/
# Do not ignore .terraform.lock.hcl: commit the dependency lock file
5. Audit Access
CloudTrail for S3:
resource "aws_cloudtrail" "state_access" {
  name           = "terraform-state-audit"
  s3_bucket_name = aws_s3_bucket.cloudtrail.id # log bucket defined elsewhere

  event_selector {
    read_write_type           = "All"
    include_management_events = true

    data_resource {
      type = "AWS::S3::Object"
      # A trailing slash selects every object in the bucket
      values = ["arn:aws:s3:::mycompany-terraform-state/"]
    }
  }
}
Troubleshooting
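State Corruption and Version Mismatches
Symptoms (version mismatch):
terraform plan
# Error: state snapshot was created by Terraform v1.6.0,
# which is newer than current v1.5.0
This error does not mean the state is damaged: it was last written by a newer Terraform CLI. Upgrade Terraform to v1.6.0 or later rather than restoring an older state.
Recovery (genuinely corrupted state, such as a truncated or invalid file):
# Restore from backup (the local backend keeps terraform.tfstate.backup)
cp terraform.tfstate.backup terraform.tfstate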
# Or restore from S3 version
aws s3api get-object \
--bucket mycompany-terraform-state \
--key production/terraform.tfstate \
--version-id <GOOD_VERSION_ID> \
terraform.tfstate
State Drift
Detect drift:
# Check for changes
terraform plan -detailed-exitcode
# Exit codes:
# 0 = no changes
# 1 = error
# 2 = changes detected
Automatic drift detection:
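#!/bin/bash
# drift-check.sh
# Tip: add -refresh-only to report only out-of-band changes, not pending config changes
terraform plan -detailed-exitcode -out=plan.tfplan
rc=$?
if [ "$rc" -eq 2 ]; then
  echo "Drift detected!"
  terraform show plan.tfplan
  # Send alert
elif [ "$rc" -ne 0 ]; then
  echo "terraform plan failed (exit code $rc)" >&2
  exit "$rc"
fi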
Lost State
If state is lost:
# 1. Try to recover from backup
# 2. Recreate state by importing
# Import all resources (each needs a matching resource block in the configuration)
terraform import aws_vpc.main vpc-abc123
terraform import aws_subnet.public subnet-def456
# ... import all resources
# Or use tools like terraformer
terraformer import aws --resources=vpc,subnet --regions=us-east-1
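On Terraform 1.5 and later, the re-import can also be written as import blocks, which appear in terraform plan for review before anything is written to state. A sketch using the same IDs as the commands above:

```hcl
import {
  to = aws_vpc.main
  id = "vpc-abc123"
}

import {
  to = aws_subnet.public
  id = "subnet-def456"
}
```

If the matching resource blocks were lost along with the state, terraform plan -generate-config-out=generated.tf writes draft configuration for them, which should be reviewed before applying.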
Complete Example
Production-Ready Setup
infrastructure/
├── backend-setup/
│   └── main.tf           # Creates S3 bucket and DynamoDB
├── environments/
│   └── production/
│       ├── network/
│       │   ├── backend.tf
│       │   ├── main.tf
│       │   ├── outputs.tf
│       │   └── variables.tf
│       ├── database/
│       │   ├── backend.tf
│       │   ├── main.tf
│       │   └── data.tf   # References network outputs
│       └── kubernetes/
│           ├── backend.tf
│           ├── main.tf
│           └── data.tf
└── modules/
    ├── vpc/
    └── rds/
backend-setup/main.tf:
# Create backend infrastructure
terraform {
  # This uses local state
  # Run once to create S3 and DynamoDB
}

provider "aws" {
  region = "us-east-1"
}

resource "aws_s3_bucket" "terraform_state" {
  bucket = "mycompany-terraform-state"
}

resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
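The CLI setup earlier also blocked public access to the bucket, which backend-setup/main.tf does not do yet. A sketch of the equivalent resource:

```hcl
resource "aws_s3_bucket_public_access_block" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  block_public_acls       = true
  ignore_public_acls      = true
  block_public_policy     = true
  restrict_public_buckets = true
}
```

Adding lifecycle { prevent_destroy = true } to aws_s3_bucket.terraform_state guards against deleting the bucket by accident. Once the bucket exists, backend-setup itself can be given an s3 backend block and its local state moved into the bucket with terraform init -migrate-state.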
environments/production/network/backend.tf:
terraform {
  required_version = ">= 1.5.0"

  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "production/network/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"
  }

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}
Conclusion
Proper Terraform state management is essential for:
- Team collaboration - Remote backends enable multiple users
- Safety - Locking prevents conflicts and corruption
- Security - Encryption and access control protect sensitive data
- Reliability - Versioning and backups enable recovery
- Organization - Workspaces and state separation improve maintainability
Key takeaways:
- Always use remote backends for team environments
- Enable state locking to prevent conflicts
- Encrypt state at rest and in transit
- Use workspaces thoughtfully (or separate directories)
- Split large states into smaller, manageable pieces
- Back up state with versioning
- Audit state access
- Never commit state to version control
Following these practices ensures your Terraform state is secure, reliable, and manageable at scale.