🛠️ Guide
12 min
Kubernetes Troubleshooting: Pod Crashes, Networking, and Resources
Introduction
Kubernetes troubleshooting can be challenging due to its distributed nature and multiple abstraction layers. This guide covers the most common issues and systematic approaches to diagnosing and fixing them.
Pod Crash Loops
Understanding CrashLoopBackOff
What it means: The pod starts, crashes, restarts, and repeats in an exponential backoff pattern.
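To make the backoff pattern concrete, here is a minimal sketch of how the wait between restarts grows. It assumes the kubelet's documented defaults (a 10-second initial delay that doubles after each crash, capped at 300 seconds). The function name `crashloop_delays` is illustrative only, and real timings can differ by Kubernetes version and configuration.

```python
from itertools import islice


def crashloop_delays(base=10, cap=300):
    """Yield the approximate wait in seconds before each successive restart.

    Assumption: base=10 and cap=300 mirror the kubelet defaults; this is an
    illustration of the pattern, not the kubelet's actual implementation.
    """
    delay = base
    while True:
        yield min(delay, cap)
        # Double the delay after every crash, never exceeding the cap.
        delay = min(delay * 2, cap)


# First seven restart delays.
print(list(islice(crashloop_delays(), 7)))
# -> [10, 20, 40, 80, 160, 300, 300]
```

This is why a pod with only a handful of restarts can already be several minutes old: the delays add up quickly once the cap is reached.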
Diagnostic Process
Step 1: Check pod status
kubectl get pods -n production
# Output:
# NAME                    READY   STATUS             RESTARTS   AGE
# myapp-7d8f9c6b5-xyz12   0/1     CrashLoopBackOff   5          10m
Step 2: Describe the pod
…
October 15, 2025 · 12 min · DevOps Engineer
🚨 Incident
2 min
Incident: Missing DAGs in Apache Airflow
Incident Description Time: 2025-08-17 02:00 UTC
Duration: 45 minutes
Impact: Critical - all scheduled tasks stopped
Symptoms
DAGs disappeared from the Airflow UI
Scheduler logs showed import errors
Tasks were not running on schedule
Timeline
02:00 - Issue Detection
# Monitoring showed no tasks
airflow dags list | wc -l
# Result: 0 (should be ~50)
02:05 - Initial Diagnosis
# Check scheduler status
systemctl status airflow-scheduler
# Check logs
tail -f /var/log/airflow/scheduler.log
02:10 - Root Cause Found
# Error found in logs:
ImportError: No module named 'pandas'
# DAG file imports pandas, but library is missing
Root Cause Analysis
Cause
A virtual environment update removed the pandas dependency used in one of the DAG files. In this incident, the resulting import error stopped all DAGs from loading.
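A missing dependency like this can be caught before it reaches the scheduler. The following is a minimal sketch, not part of the original incident response. It statically scans each DAG file for top-level imports and reports any module that the current Python environment cannot find. The function names `missing_imports` and `check_dag_folder` are hypothetical, and the check assumes it runs inside the same virtual environment the scheduler uses.

```python
import ast
import importlib.util
from pathlib import Path


def _is_importable(name):
    """Return True if the current environment can locate module `name`."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def missing_imports(dag_file):
    """Return sorted top-level modules imported by dag_file that are not installed."""
    tree = ast.parse(Path(dag_file).read_text(), filename=str(dag_file))
    modules = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports such as "from . import x".
            modules.add(node.module.split(".")[0])
    return sorted(m for m in modules if not _is_importable(m))


def check_dag_folder(dag_folder):
    """Map each DAG file name to the modules it imports that cannot be found."""
    report = {}
    for path in sorted(Path(dag_folder).glob("*.py")):
        missing = missing_imports(path)
        if missing:
            report[path.name] = missing
    return report
```

Calling `check_dag_folder` on the DAG directory after a virtual environment update returns an empty dict when every import resolves. Because the scan never executes the DAG files, it is safe to run in CI, but it also means helper modules that live inside the DAG folder are reported as missing unless that folder is on `sys.path`.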
…
August 17, 2025 · 2 min · DevOps Engineer