SRE Practice
13 min
Observability: The Three Pillars of Metrics, Logs, and Traces
Introduction

Observability is the ability to understand the internal state of a system based on its external outputs. Unlike traditional monitoring, which tells you what is broken, observability helps you understand why it's broken, even for issues you've never encountered before.
Core Principle: "You can't fix what you can't see. You can't see what you don't measure."
The Three Pillars Overview

┌─────────────────────────────────────────┐
│              OBSERVABILITY              │
├─────────────┬───────────────┬───────────┤
│   METRICS   │     LOGS      │  TRACES   │
├─────────────┼───────────────┼───────────┤
│ What/When   │ Why/Details   │ Where     │
│ Aggregated  │ Individual    │ Causal    │
│ Time-series │ Events        │ Flows     │
│ Dashboards  │ Search        │ Waterfall │
└─────────────┴───────────────┴───────────┘

When to Use Each:
…
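A minimal, stdlib-only Python sketch of how the three signals look side by side for a single request; the metric key, log fields, and span structure here are illustrative assumptions, not code from the article.

# Sketch: one request emits a metric, a structured log line, and a trace span.
import json, logging, time, uuid

logging.basicConfig(level=logging.INFO, format="%(message)s")
request_count = {}  # METRIC: aggregated counter, keyed by (endpoint, status)

def handle_request(endpoint: str) -> None:
    trace_id = uuid.uuid4().hex        # shared id that ties the signals together
    start = time.monotonic()
    status = 200                       # pretend the real work happened here
    duration_ms = (time.monotonic() - start) * 1000

    # METRIC: cheap aggregate, answers "what/when" on a dashboard
    request_count[(endpoint, status)] = request_count.get((endpoint, status), 0) + 1

    # LOG: individual event with full context, answers "why/details"
    logging.info(json.dumps({"trace_id": trace_id, "endpoint": endpoint,
                             "status": status, "duration_ms": duration_ms}))

    # TRACE: a single span here, the causal "where" a waterfall view is built from
    logging.info(json.dumps({"span": {"trace_id": trace_id, "name": endpoint,
                                      "duration_ms": duration_ms}}))

handle_request("/checkout")
print(request_count)

The shared trace_id is what lets a spike on a metrics dashboard be drilled down into a specific log line and its span.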
October 16, 2025 · 13 min · DevOps Engineer
11 min
Linux Observability: Metrics, Logs, eBPF Tools, and 5-Minute Triage
Executive Summary

Observability = see inside your systems: metrics (CPU, memory, I/O), logs (audit trail), traces (syscalls, latency).
This guide covers:
Metrics: node_exporter → Prometheus (system-level health)
Logs: journald → rsyslog/Vector/Fluent Bit (aggregation)
eBPF tools: 5 quick wins (trace syscalls, network, I/O)
Triage: 5-minute flowchart to diagnose CPU, memory, I/O, network issues

1. Metrics: node_exporter & Prometheus

What It Is

node_exporter: Exposes OS metrics (CPU, memory, disk, network) as Prometheus scrape target
Prometheus: Time-series database; collects metrics, queries, alerts
Dashboard: Grafana visualizes Prometheus data

Install node_exporter

Ubuntu/Debian:
…
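As a rough companion to the metrics pipeline the excerpt describes, here is a Python sketch that pulls a node_exporter metric back out of Prometheus via its HTTP query API; the server address and the exact PromQL expression are assumptions for illustration.

# Sketch: query Prometheus for per-instance CPU idle %, as exported by node_exporter.
# Assumes a Prometheus server at localhost:9090 that scrapes node_exporter targets.
import json
import urllib.parse
import urllib.request

PROMETHEUS = "http://localhost:9090"  # assumption: local Prometheus instance
QUERY = 'avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100'

url = f"{PROMETHEUS}/api/v1/query?" + urllib.parse.urlencode({"query": QUERY})
with urllib.request.urlopen(url, timeout=5) as resp:
    data = json.load(resp)

for result in data.get("data", {}).get("result", []):
    instance = result["metric"].get("instance", "unknown")
    idle_pct = float(result["value"][1])   # value comes back as [timestamp, "string"]
    print(f"{instance}: {idle_pct:.1f}% CPU idle")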
October 16, 2025 · 11 min · DevOps Engineer
12 min
Linux Reliability & Lifecycle: Time Sync, Logging, Shutdown, and Patching
Executive Summary

Reliability means predictable, auditable behavior. This guide covers:

Time sync: Chrony for clock accuracy (critical for logging, security)
Networking: Stable interface names & hostnames (infrastructure consistency)
Logging: Persistent journald + logrotate (audit trail + disk management)
Shutdown: Clean hooks to prevent data loss
Patching: Kernel updates with livepatch (zero-downtime), tested rollback

1. Time Synchronization (chrony)

Why Time Matters

Critical for:

Logging: Accurate timestamps for debugging, compliance audits
Security: TLS cert validation, Kerberos, API token expiry
Distributed systems: Causality ordering (happens-before relationships)
Monitoring: Alert timing, metric correlation

Cost of poor time sync:
…
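A small Python sketch of the kind of offset check the time-sync section motivates, assuming chrony is running and chronyc is on PATH; the 100 ms alert threshold is an arbitrary assumption, not a value from the article.

# Sketch: parse `chronyc tracking` and warn when the clock offset exceeds a threshold.
import re
import subprocess
import sys

THRESHOLD_SECONDS = 0.1  # assumption: alert above 100 ms of offset

out = subprocess.run(["chronyc", "tracking"],
                     capture_output=True, text=True, check=True).stdout

# Example line in the output: "Last offset     : +0.000012345 seconds"
match = re.search(r"Last offset\s*:\s*([+-]?\d+\.\d+)\s*seconds", out)
if not match:
    sys.exit("could not find 'Last offset' in chronyc tracking output")

offset = abs(float(match.group(1)))
print(f"clock offset: {offset:.6f}s")
if offset > THRESHOLD_SECONDS:
    sys.exit(f"WARNING: offset {offset:.6f}s exceeds {THRESHOLD_SECONDS}s")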
October 16, 2025 · 12 min · DevOps Engineer
Guide
10 min
ELK Stack Tuning: Elasticsearch Index Lifecycle and Logstash Pipelines
Introduction

The ELK stack (Elasticsearch, Logstash, Kibana) is powerful for log aggregation and analysis, but requires proper tuning for production workloads. This guide covers Elasticsearch index lifecycle management, Logstash pipeline optimization, and performance best practices.

Elasticsearch Index Lifecycle Management (ILM)

Understanding ILM

ILM automates index management through lifecycle phases:

Phases:

Hot - Actively writing and querying
Warm - No longer writing, still querying
Cold - Rarely queried, compressed
Frozen - Very rarely queried, minimal resources
Delete - Removed from cluster

Basic ILM Policy

Create policy:
…
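As a hedged illustration of the "Create policy" step the excerpt breaks off at, the following Python sketch PUTs a minimal hot/delete ILM policy to Elasticsearch's _ilm/policy endpoint; the cluster URL, policy name, and rollover/retention values are assumptions, not the article's policy.

# Sketch: create a simple ILM policy (hot rollover, delete after 30 days).
# Assumes Elasticsearch is reachable at localhost:9200 without auth; values are illustrative.
import json
import urllib.request

ES_URL = "http://localhost:9200"      # assumption
POLICY_NAME = "logs-basic-policy"     # assumption

policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    "rollover": {"max_primary_shard_size": "50gb", "max_age": "1d"}
                }
            },
            "delete": {
                "min_age": "30d",
                "actions": {"delete": {}}
            }
        }
    }
}

req = urllib.request.Request(
    url=f"{ES_URL}/_ilm/policy/{POLICY_NAME}",
    data=json.dumps(policy).encode(),
    headers={"Content-Type": "application/json"},
    method="PUT",
)
with urllib.request.urlopen(req, timeout=10) as resp:
    print(resp.status, resp.read().decode())

Once a policy like this exists, it is attached to indices through an index template so that new indices roll over and age out automatically.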
October 15, 2025 · 10 min · DevOps Engineer