Metrics: counter, gauge, histogram, summary — when to use… — Cracked Java

Metrics: counter, gauge, histogram, summary — when to use each

Picking the wrong metric type silently produces meaningless data: a counter that gets reset and averaged, a gauge fed to rate(), a latency stored only as an average. Knowing the four types (the Prometheus taxonomy; OpenTelemetry offers close equivalents for counters, gauges, and histograms, and treats summaries as a legacy type) and matching each to what you're measuring is a concrete senior signal.

Counter — a value that only goes up

A counter is a monotonically increasing cumulative value. It only ever goes up (or resets to zero on process restart). You almost never look at the raw counter value; you look at its rate of change.

  • Use for: total requests served, errors, bytes sent, tasks completed — anything you count.
  • Query pattern: rate(http_requests_total[5m]) gives requests-per-second; dividing the error counter's rate by the total counter's rate gives the error rate.
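  • Why monotonic matters: because the value is cumulative, a missed scrape only costs resolution, since the next sample still includes everything counted in between. A restart shows up as a drop in the value, which rate() detects as a counter reset and compensates for. Exporting a per-interval count instead would silently lose every interval whose sample was missed.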
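
The reset handling described above can be sketched in a few lines. This is a simplified illustration, not Prometheus's actual rate() implementation (which also extrapolates to the window boundaries); the class and method names are hypothetical.

```java
import java.util.List;

// Hypothetical helper: a simplified, reset-aware version of what a
// rate()-style function does with raw counter samples.
class CounterRate {

    // Total increase across the samples, treating any drop as a counter
    // reset (the process restarted and began counting again from zero).
    static double increase(List<Double> samples) {
        double total = 0.0;
        for (int i = 1; i < samples.size(); i++) {
            double prev = samples.get(i - 1);
            double curr = samples.get(i);
            // After a reset, everything counted since the restart is new increase.
            total += (curr >= prev) ? (curr - prev) : curr;
        }
        return total;
    }

    // Average per-second rate over the window covered by the samples.
    static double perSecond(List<Double> samples, double windowSeconds) {
        return increase(samples) / windowSeconds;
    }

    public static void main(String[] args) {
        // Scrapes every 15 s; the process restarts between the 3rd and 4th scrape.
        List<Double> samples = List.of(100.0, 130.0, 160.0, 20.0, 50.0);
        System.out.println(increase(samples)); // 30 + 30 + 20 + 30, prints 110.0
        System.out.println(perSecond(samples, 60.0));
    }
}
```

An error rate follows the same pattern: compute perSecond for the error counter and for the total counter over the same window, then divide the first by the second.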

Gauge — a value that goes up and down

A gauge is a snapshot of a value that can increase or decrease. You read the current value directly.

  • Use for: current memory usage, in-flight requests, queue depth, CPU utilization, active connections, temperature — any instantaneous level.
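  • Query pattern: the value itself, or avg/max/sum across instances. Taking rate() of a gauge is usually a mistake, because rate() interprets every decrease as a counter reset; to see how a gauge changes over time, PromQL offers delta() and deriv().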
  • Rule of thumb: if "the number going down" is meaningful, it's a gauge; if it can only ever climb, it's a counter.
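
To make the counter/gauge distinction concrete, here is a minimal sketch of instrumenting a request handler with one of each. RequestMetrics is a hypothetical wrapper built on plain JDK atomics, not the API of any real metrics library.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical instrumentation wrapper, for illustration only.
class RequestMetrics {
    // Counter: only ever incremented.
    private final AtomicLong requestsTotal = new AtomicLong();
    // Gauge: goes up when a request starts and down when it finishes.
    private final AtomicInteger inFlight = new AtomicInteger();

    void handle(Runnable request) {
        requestsTotal.incrementAndGet();
        inFlight.incrementAndGet();
        try {
            request.run();
        } finally {
            // Always restore the level, even if the request throws.
            inFlight.decrementAndGet();
        }
    }

    long requestsTotal() { return requestsTotal.get(); }

    int inFlight() { return inFlight.get(); }
}
```

After two sequential requests, requestsTotal() is 2 while inFlight() is back to 0: the counter remembers history, the gauge reports only the current level.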

Histogram — the distribution of observations

A histogram samples observations (typically durations or sizes) into pre-defined buckets, plus a running sum and count. It's how you measure latency and answer percentile questions.

  • Use for: request latency, response size — anything where the distribution matters, not just the average.
  • Why not an average: an average latency of 50 ms hides that 1% of users wait 5 seconds. Percentiles (p50, p95, p99) are what matter, and you can only compute them from a distribution.
  • How it works: the client counts observations falling into each bucket (≤10 ms, ≤50 ms, ≤100 ms, …). Percentiles are computed server-side from the bucket counts (in Prometheus, with histogram_quantile()), so histograms are aggregatable across instances: sum the matching buckets and you can compute a fleet-wide p99. The cost: you must choose bucket boundaries up front, and each percentile is an estimate interpolated within a bucket, so it can be off by up to that bucket's width.
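
The bucket mechanics can be sketched as follows. This is a simplified model under stated assumptions: BucketHistogram is a hypothetical class, it needs at least one finite bound, it assumes observations are non-negative (the first bucket starts at 0), and its linear interpolation is in the spirit of a server-side quantile estimate rather than a copy of any particular implementation.

```java
import java.util.Arrays;

// Hypothetical fixed-bucket histogram, for illustration only.
class BucketHistogram {
    private final double[] upperBounds; // sorted finite upper bounds; an implicit +Inf bucket follows
    private final long[] counts;        // counts[i] = observations that landed in bucket i
    private long count;
    private double sum;

    BucketHistogram(double... upperBounds) {
        this.upperBounds = upperBounds.clone();
        this.counts = new long[upperBounds.length + 1];
    }

    void observe(double value) {
        int i = 0;
        while (i < upperBounds.length && value > upperBounds[i]) i++;
        counts[i]++;
        count++;
        sum += value;
    }

    // Merging is element-wise addition, which is why histograms aggregate across instances.
    BucketHistogram merge(BucketHistogram other) {
        if (!Arrays.equals(upperBounds, other.upperBounds)) {
            throw new IllegalArgumentException("bucket layouts differ");
        }
        BucketHistogram merged = new BucketHistogram(upperBounds);
        for (int i = 0; i < counts.length; i++) {
            merged.counts[i] = counts[i] + other.counts[i];
        }
        merged.count = count + other.count;
        merged.sum = sum + other.sum;
        return merged;
    }

    // Estimate the q-quantile by linear interpolation inside the bucket holding the target rank.
    double quantile(double q) {
        if (count == 0) return Double.NaN;
        double rank = q * count;
        long cumulative = 0;
        for (int i = 0; i < counts.length; i++) {
            long before = cumulative;
            cumulative += counts[i];
            if (cumulative >= rank && counts[i] > 0) {
                // In the +Inf bucket there is no upper edge, so report the highest finite bound.
                if (i == upperBounds.length) return upperBounds[upperBounds.length - 1];
                double lower = (i == 0) ? 0.0 : upperBounds[i - 1];
                double upper = upperBounds[i];
                return lower + (upper - lower) * (rank - before) / counts[i];
            }
        }
        return upperBounds[upperBounds.length - 1];
    }

    public static void main(String[] args) {
        BucketHistogram a = new BucketHistogram(10.0, 50.0, 100.0, 1000.0);
        BucketHistogram b = new BucketHistogram(10.0, 50.0, 100.0, 1000.0);
        for (int i = 0; i < 90; i++) a.observe(5.0);  // fast instance
        for (int i = 0; i < 10; i++) b.observe(80.0); // slow instance
        System.out.println(a.merge(b).quantile(0.95)); // fleet-wide p95 estimate
    }
}
```

In this example the fleet-wide p95 estimate comes out around 75 ms even though every slow request actually took 80 ms; that gap is the interpolation error, and it shrinks as buckets get narrower around the latencies you care about.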

Summary — client-side quantiles

A summary also tracks the distribution but computes quantiles on the client over a sliding window, exporting the φ-quantiles (e.g., 0.5, 0.9, 0.99) directly along with sum and count.

  • Use for: latency where you need precise quantiles and pre-chosen buckets won't do.
  • The catch: client-computed quantiles cannot be aggregated across instances — you can't average ten servers' p99 to get a fleet p99 (averaging percentiles is mathematically invalid). It also costs more CPU on the client.
  • Practice: prefer histograms in distributed systems precisely because they aggregate; reach for summaries only when you need exact quantiles on a single instance and won't aggregate.
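
A small worked example shows why averaging per-instance percentiles goes wrong. It assumes an invented traffic split (a busy fast server and a quiet slow one) and uses exact nearest-rank quantiles over raw values; PercentileAveraging is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo: averaged per-server p99 versus the true fleet-wide p99.
class PercentileAveraging {

    // Exact nearest-rank quantile over raw observations.
    static double quantile(List<Double> values, double q) {
        List<Double> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(q * sorted.size());
        return sorted.get(Math.max(rank, 1) - 1);
    }

    // What you get by averaging two client-side quantiles.
    static double averagedP99(List<Double> a, List<Double> b) {
        return (quantile(a, 0.99) + quantile(b, 0.99)) / 2.0;
    }

    // The real fleet-wide value, which needs the pooled observations.
    static double fleetP99(List<Double> a, List<Double> b) {
        List<Double> pooled = new ArrayList<>(a);
        pooled.addAll(b);
        return quantile(pooled, 0.99);
    }

    public static void main(String[] args) {
        List<Double> serverA = Collections.nCopies(1000, 10.0); // 1000 requests at 10 ms
        List<Double> serverB = Collections.nCopies(10, 1000.0); // 10 requests at 1000 ms
        System.out.println(averagedP99(serverA, serverB)); // prints 505.0
        System.out.println(fleetP99(serverA, serverB));    // prints 10.0
    }
}
```

The averaged figure (505 ms) matches no request that was ever served and ignores that server B handled under 1% of the traffic. A summary exports only the per-server quantiles, so the pooled calculation is not available after the fact.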
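
Choosing the right metric type by what you're measuring

| Type      | Direction    | Example                      | Read as                 | Aggregatable? |
|-----------|--------------|------------------------------|-------------------------|---------------|
| Counter   | Up only      | requests_total, errors_total | rate()                  | Yes           |
| Gauge     | Up/down      | memory_bytes, queue_depth    | value                   | Yes (avg/max) |
| Histogram | Distribution | request_duration buckets     | server-side p50/p95/p99 | Yes           |
| Summary   | Distribution | request_duration quantiles   | client-side φ-quantiles | No            |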
