Latency numbers every engineer should know
Jeff Dean's "latency numbers" are the back-of-the-envelope intuition behind every capacity estimate and every "where should this data live?" decision. You don't need the digits to the nanosecond — you need the orders of magnitude and the ratios between them, so you can reason about what dominates a request's latency.
The canonical table
Rounded to memorable orders of magnitude (approximate; exact values vary with hardware generation):
OPERATION                                     LATENCY           RELATIVE
--------------------------------------------  ----------------  ---------------
L1 cache reference                            ~1 ns             1x
Branch mispredict                             ~3 ns
L2 cache reference                            ~4 ns             ~4x L1
Mutex lock/unlock                             ~17 ns
Main memory (RAM) reference                   ~100 ns           ~100x L1
Compress 1 KB (fast algorithm)                ~2 us
Read 1 MB sequentially from RAM               ~3 us
SSD random read                               ~16-100 us        ~1,000x RAM
Round trip within same datacenter             ~500 us (0.5 ms)
Read 1 MB sequentially from SSD               ~1 ms
Disk seek (spinning HDD)                      ~2-10 ms
Read 1 MB sequentially from disk              ~20 ms
Round trip CA <-> Netherlands (cross-region)  ~150 ms           ~1,000,000x RAM
(ns = nanosecond, us = microsecond, ms = millisecond; 1 ms = 1,000 us = 1,000,000 ns.)
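The ratios in the RELATIVE column can be recomputed from the latencies themselves. Below is a minimal sketch in Python using the table's rounded values. Where the table gives a range, the upper bound is used, which is a simplifying assumption; the names `LATENCY_NS` and `ratio` are illustrative and not taken from any library.

```python
# Rounded latencies from the table, in nanoseconds.
# Where the table gives a range, the upper bound is used (an assumption).
LATENCY_NS = {
    "l1_cache": 1,
    "l2_cache": 4,
    "ram": 100,
    "ssd_random_read": 100_000,          # upper end of ~16-100 us
    "datacenter_round_trip": 500_000,    # ~0.5 ms
    "disk_seek": 10_000_000,             # upper end of ~2-10 ms
    "cross_region_round_trip": 150_000_000,
}


def ratio(slow: str, fast: str) -> float:
    """Return how many times slower `slow` is than `fast`."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]


if __name__ == "__main__":
    pairs = [
        ("l2_cache", "l1_cache"),
        ("ram", "l1_cache"),
        ("ssd_random_read", "ram"),
        ("disk_seek", "ssd_random_read"),
        ("cross_region_round_trip", "ram"),
    ]
    for slow, fast in pairs:
        print(f"{slow} is ~{ratio(slow, fast):,.0f}x {fast}")
```

The cross-region figure works out to 1,500,000x RAM, which the table rounds down to ~1,000,000x; only the order of magnitude matters for estimation.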
The takeaways that actually matter
- RAM is ~100-1,000× faster than an SSD random read, which is in turn ~100× faster than a disk seek. Keeping the working set in memory is the single biggest performance lever; this is the whole case for caching.
- A datacenter round trip (~0.5 ms, roughly 5,000 RAM references) dwarfs local compute. Within a request, network hops usually dominate; cutting one cross-service call beats micro-optimizing code.
- Cross-region is ~150 ms. A single synchronous transatlantic round trip blows most latency budgets — hence read replicas near users, CDNs, and async cross-region replication.
- Sequential beats random by orders of magnitude, especially on disk. This is why LSM-trees (sequential writes) and log-structured designs win on write throughput.
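These takeaways can be checked with rough arithmetic. The sketch below uses the table's rounded values; the request shape (1,000 RAM references plus three same-datacenter calls) and the 4 KB block size for random reads are hypothetical, chosen only for illustration. It also assumes one full seek per random block and ignores transfer time, CPU work, queuing, and parallelism.

```python
# Rounded values from the table, in nanoseconds.
RAM_REF_NS = 100
DC_ROUND_TRIP_NS = 500_000
CROSS_REGION_NS = 150_000_000
DISK_SEEK_NS = 10_000_000           # upper end of ~2-10 ms
DISK_SEQ_READ_1MB_NS = 20_000_000


def request_latency_ns(ram_refs: int, dc_calls: int, cross_region_calls: int = 0) -> int:
    """Sum the serial costs of one request (no parallelism, no queuing)."""
    return (
        ram_refs * RAM_REF_NS
        + dc_calls * DC_ROUND_TRIP_NS
        + cross_region_calls * CROSS_REGION_NS
    )


def random_read_1mb_ns(block_bytes: int = 4096) -> int:
    """Cost of reading 1 MB from disk as random blocks, one seek per block."""
    blocks = (1024 * 1024) // block_bytes
    return blocks * DISK_SEEK_NS


if __name__ == "__main__":
    base = request_latency_ns(ram_refs=1_000, dc_calls=3)
    one_less_hop = request_latency_ns(ram_refs=1_000, dc_calls=2)
    with_cross_region = request_latency_ns(ram_refs=1_000, dc_calls=3, cross_region_calls=1)
    print(f"3 datacenter hops:      {base / 1e6:.2f} ms")
    print(f"2 datacenter hops:      {one_less_hop / 1e6:.2f} ms")
    print(f"plus 1 cross-region:    {with_cross_region / 1e6:.2f} ms")

    slowdown = random_read_1mb_ns() / DISK_SEQ_READ_1MB_NS
    print(f"random vs sequential 1 MB on disk: ~{slowdown:.0f}x slower")
```

Under these assumptions, the 1,000 memory references contribute 0.1 ms of a 1.6 ms request, so removing one hop saves about five times more than eliminating all of the local memory work.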
Using them in an interview
Latency numbers turn vague claims into arithmetic. "We need p99 < 50 ms, and a cross-region call alone is ~150 ms, so this read must be served from a same-region replica or cache" is the kind of sentence that lands. Likewise: "At ~10 ms per read, one disk serving reads one at a time manages about 100 reads/s, so 100K reads/s against a disk-bound store is impossible on one node; it must be served from memory."
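The two interview claims reduce to a budget check and a throughput division. Here is a small sketch of that arithmetic, assuming each disk serves one read at a time at ~10 ms per read; the function names are hypothetical helpers, not part of any library.

```python
import math

# Rounded values from the table, in milliseconds.
CROSS_REGION_ROUND_TRIP_MS = 150
DISK_READ_MS = 10  # upper end of the ~2-10 ms disk seek range


def fits_budget(budget_ms: float, *costs_ms: float) -> bool:
    """True if the serial costs sum to no more than the latency budget."""
    return sum(costs_ms) <= budget_ms


def max_reads_per_second(latency_ms: float) -> float:
    """Throughput of one device that serves a single read at a time."""
    return 1000 / latency_ms


def devices_needed(target_rps: float, latency_ms: float) -> int:
    """Devices required to reach target_rps under the same assumption."""
    return math.ceil(target_rps / max_reads_per_second(latency_ms))


if __name__ == "__main__":
    # A 50 ms budget cannot absorb one cross-region round trip.
    print(fits_budget(50, CROSS_REGION_ROUND_TRIP_MS))
    # One disk at ~10 ms per read.
    print(max_reads_per_second(DISK_READ_MS))
    # Disks needed for 100K reads/s.
    print(devices_needed(100_000, DISK_READ_MS))
```

With these inputs the budget check fails, a single disk tops out around 100 reads/s, and 100K reads/s would take on the order of 1,000 such disks, which is the arithmetic behind "it must be served from memory."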