Incident: SSL Certificate Expiry Causes Complete Outage

Incident Summary

Date: 2025-08-15
Time: 08:00 UTC
Duration: 1 hour 45 minutes
Severity: SEV-1 (Critical)
Impact: Complete service unavailability for all users

Quick Facts

Users Affected: 100% - all external traffic
Services Affected: All public-facing services
Revenue Impact: ~$12,000 in lost sales
SLO Impact: 80% of monthly error budget consumed in single incident

Timeline

08:00:00 - SSL certificate expired (not detected)
08:00:30 - User reports started coming in: “Your connection is not private”
08:02:00 - PagerDuty alert: Health check failures from external monitoring
08:02:30 - On-call engineer (Sarah) acknowledged alert
08:03:00 - Opened website, saw SSL certificate error
08:03:30 - Checked certificate expiry: Expired at 08:00 UTC
08:04:00 - Root cause identified: SSL certificate expired
08:04:30 - Incident escalated to SEV-1, incident commander assigned
08:05:00 - Senior SRE (Mike) joined as incident commander
08:06:00 - Attempted automatic renewal with certbot: Failed - rate limit exceeded
08:08:00 - Checked Let’s Encrypt rate limits: Hit weekly renewal limit
08:10:00 - Decision: Use backup certificate from 6 months ago (still valid)
08:12:00 - Located backup certificate in secure storage
08:15:00 - Deployed backup certificate to load balancer
08:18:00 - Certificate updated, but services still showing errors
08:20:00 - Discovered cached certificate in CDN (Cloudflare)
08:22:00 - Purged Cloudflare cache
08:25:00 - Still seeing errors from some users
08:27:00 - Realized nginx not reloaded after certificate update
08:30:00 - Reloaded nginx on all load balancers
08:33:00 - Service partially restored, some users still affected
08:35:00 - Identified browser certificate caching
08:38:00 - Communicated workaround to users (clear browser cache)
08:45:00 - Traffic gradually recovering
09:00:00 - 90% of users able to access site
09:30:00 - 98% recovery, remaining issues browser caching
09:45:00 - Incident marked as resolved

Root Cause Analysis

What Happened

Primary cause: SSL certificate for *.example.com expired at 08:00 UTC on August 15th, 2025.
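
The timeline shows two detection gaps: the expiry itself went unnoticed until users reported errors, and a stale certificate kept being served after the file on the load balancer was replaced. Both can be caught by checking the certificate an endpoint actually serves. The following is a minimal sketch of such a check using only the Python standard library. The function names and the 30-day and 7-day thresholds are illustrative assumptions, not part of the incident record or of any existing tooling.

```python
import socket
import ssl
from datetime import datetime, timezone

# Assumed thresholds for illustration; tune them to the renewal process.
WARN_DAYS = 30
CRIT_DAYS = 7


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` and a certificate's notAfter timestamp.

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    for example "Aug 15 08:00:00 2025 GMT". `now` must be timezone-aware.
    """
    expiry_ts = ssl.cert_time_to_seconds(not_after)
    return (expiry_ts - now.timestamp()) / 86400


def classify(days_left: float) -> str:
    """Map remaining lifetime to an alert level."""
    if days_left <= CRIT_DAYS:
        return "critical"
    if days_left <= WARN_DAYS:
        return "warning"
    return "ok"


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return notAfter of the certificate the endpoint currently serves."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def check_host(host: str) -> str:
    """Classify a live endpoint; a failed verification counts as critical."""
    try:
        not_after = fetch_not_after(host)
    except ssl.SSLCertVerificationError:
        # The handshake is rejected when the served certificate has already
        # expired (or fails verification for another reason).
        return "critical"
    return classify(days_until_expiry(not_after, datetime.now(timezone.utc)))
```

Calling `check_host` with a public hostname (any hostname used here is a placeholder) from an external monitor on a schedule would have raised a warning weeks before 08:00 UTC on the day of expiry. Because it inspects the served certificate rather than the file on disk, the same call also confirms whether a reload took effect after a certificate is swapped.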
…