The Three Pillars of Observability

One-Liner

The three core types of telemetry data that provide a comprehensive view of a system’s health: Metrics, Logs, and Traces.

What It Is

  • Metrics: Numeric measurements sampled or aggregated over time (e.g., CPU usage, request latency). Storage cost depends on the number of time series rather than on request volume, which makes them cheap to keep and well suited to dashboards and alerting.
  • Logs: Immutable, timestamped records of discrete events. They provide detailed, contextual information about what occurred at a specific point in time.
  • Traces: A representation of a single request as it flows through a distributed system. Traces are made up of spans, which represent individual operations.
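
To make the three definitions concrete, here is a minimal sketch of what one record of each type might look like. The class and field names (`MetricPoint`, `LogRecord`, `Span`, `parent_id`, and so on) are illustrative assumptions, not the schema of any particular platform.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class MetricPoint:
    """One sample in a time series: a name, a number, and a few tags."""
    name: str
    value: float
    timestamp: float
    tags: Tuple[Tuple[str, str], ...] = ()


@dataclass(frozen=True)
class LogRecord:
    """An immutable, timestamped record of one discrete event."""
    timestamp: float
    level: str
    message: str
    context: Tuple[Tuple[str, str], ...] = ()


@dataclass(frozen=True)
class Span:
    """One operation inside a trace; parent_id links spans into a tree."""
    trace_id: str
    span_id: str
    name: str
    start: float
    end: float
    parent_id: Optional[str] = None

    @property
    def duration(self) -> float:
        return self.end - self.start


# Hypothetical sample data for a single checkout request.
cpu = MetricPoint("cpu.usage", 0.73, timestamp=1700000000.0, tags=(("host", "web-1"),))
event = LogRecord(1700000000.2, "ERROR", "payment declined", (("order_id", "A-17"),))
root = Span("t1", "s1", "POST /checkout", start=0.000, end=0.250)
child = Span("t1", "s2", "charge_card", start=0.050, end=0.200, parent_id="s1")
```

The two spans share a `trace_id`, and `child.parent_id` points at `root.span_id`, which is what lets a tracing backend reassemble one request from its individual operations.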

Why It Exists

To provide a framework for understanding and debugging complex systems. Each pillar provides a different perspective, and together they give a more complete picture than any one pillar alone.

How It Works

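  • Metrics tell you what is happening: an aggregate such as error rate or latency has changed.
  • Logs tell you why it is happening for a specific event, through the detail recorded at that moment.
  • Traces tell you where in the system the problem is, by showing which span in a request's path was slow or failed.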

Tradeoffs

Metrics

  • Pros: Cheap, efficient, good for aggregation.
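  • Cons: Lack context. Aggregation discards per-request detail, so a metric can show that latency rose but not which requests were slow.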

Logs

  • Pros: Rich context.
  • Cons: Expensive to store and query, can be unstructured.

Traces

  • Pros: Great for debugging latency in distributed systems.
  • Cons: Can be complex to set up, because every service on the request path must propagate trace context. Sampling may miss rare events.
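
The sampling risk can be quantified with a short calculation. This sketch assumes uniform random sampling where each trace is kept or dropped independently; the function name `capture_probability` and the example rates are illustrative.

```python
def capture_probability(sample_rate: float, occurrences: int) -> float:
    """Probability that at least one of `occurrences` traces is kept.

    Assumes each trace is sampled independently with the same rate.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    if occurrences < 0:
        raise ValueError("occurrences must be non-negative")
    return 1.0 - (1.0 - sample_rate) ** occurrences


# A failure that happens 10 times while sampling 1% of traces.
rare = capture_probability(sample_rate=0.01, occurrences=10)
# The same failure happening 1000 times at the same rate.
common = capture_probability(sample_rate=0.01, occurrences=1000)
```

Under these assumptions `rare` is roughly 0.096, so a failure that occurs only ten times is more likely than not to leave no trace at all, while `common` is above 0.99.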

Failure Modes

  • Relying on only one pillar: For example, having metrics but no logs to explain why a metric has spiked.
  • Uncorrelated data: Having all three pillars but no way to link them together (e.g., finding the logs for a specific trace).
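
One common way to avoid uncorrelated data is to stamp every log line with the ID of the trace that produced it. The sketch below uses only Python's standard `logging`, `json`, and `contextvars` modules. The names `current_trace_id`, `TraceIdFilter`, `JsonFormatter`, and `logs_for_trace` are hypothetical, and a real system would typically take the trace ID from its tracing library rather than set it by hand.

```python
import contextvars
import io
import json
import logging

# Hypothetical holder for the trace ID of the request being handled.
current_trace_id = contextvars.ContextVar("current_trace_id", default=None)


class TraceIdFilter(logging.Filter):
    """Copy the active trace ID onto every record passing through the handler."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = current_trace_id.get()
        return True


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            "trace_id": getattr(record, "trace_id", None),
        })


def build_logger(stream) -> logging.Logger:
    """Create a logger that writes trace-stamped JSON lines to `stream`."""
    logger = logging.getLogger("checkout")
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.addFilter(TraceIdFilter())
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


def logs_for_trace(raw: str, trace_id: str) -> list:
    """Return the parsed log lines that belong to one trace."""
    records = [json.loads(line) for line in raw.splitlines() if line]
    return [r for r in records if r["trace_id"] == trace_id]


buffer = io.StringIO()
log = build_logger(buffer)

current_trace_id.set("trace-a")
log.info("charging card")
current_trace_id.set("trace-b")
log.error("payment declined")

matches = logs_for_trace(buffer.getvalue(), "trace-b")
```

Here `matches` holds only the "payment declined" line, so starting from a slow or failed trace you can pull exactly the logs that belong to it.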

Interview Traps

  • Not being able to explain the role of each pillar.
  • Not understanding that they are complementary, not mutually exclusive.

Real-World Usage

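  • Modern observability platforms such as Datadog and New Relic collect and correlate all three signal types. Honeycomb ingests them as well, although its founders have argued against the three-pillars framing in favor of wide structured events.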

Anti-Patterns

  • Putting high-cardinality data (like user IDs) in metric tags, which can cause an “explosion” of time series.
  • Logging unstructured text that is hard to parse and query.
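
The time series "explosion" follows from simple multiplication: in the worst case, a metric produces one series per combination of tag values. This sketch computes that upper bound. The function name `series_count` and the tag counts are made-up illustrations.

```python
from math import prod


def series_count(tag_cardinalities: dict) -> int:
    """Upper bound on time series for one metric.

    Each distinct combination of tag values can become its own series,
    so the bound is the product of the distinct-value counts per tag.
    """
    return prod(tag_cardinalities.values())


# Hypothetical tag sets for a single request-count metric.
bounded = series_count({"service": 20, "region": 5, "status_code": 8})
with_user_id = series_count(
    {"service": 20, "region": 5, "status_code": 8, "user_id": 100_000}
)
```

With these example numbers, `bounded` is 800 series and `with_user_id` is 80,000,000. Identifiers such as user IDs usually belong in logs or span attributes, where each event is stored once rather than multiplying the number of series.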

Related Concepts

  • The Four Golden Signals
  • Structured Logging
  • Distributed Tracing