Range vs hash sharding — pros and cons. — Cracked Java
// High-Level Design (HLD / Distributed Systems) · Replication & Partitioning (Sharding)
SeniorSystem DesignAmazon

Range vs hash sharding — pros and cons.

Range vs hash sharding — pros and cons

When a dataset or write volume exceeds one machine, you partition (shard): each node holds a disjoint subset of the data, keyed by a partition key. The central decision is how you map a key to a shard. The two base strategies are range and hash, and the choice is dominated by one trade-off: range queries vs. even load distribution.

Range partitioning

Assign contiguous key ranges to shards: A–F on shard 1, G–M on shard 2, and so on (keys can be IDs, timestamps, usernames).

  • Pro — efficient range scans. "All orders from last Tuesday" or "users H through K" touch one or a few adjacent shards. Sorted-order queries are natural.
  • Con — hot spots from skew. If the key is monotonically increasing (a timestamp, an auto-increment ID), all new writes hit the last shard while the rest sit idle. Uneven natural distribution (lots of users named "S") also concentrates load.
  • Used by: HBase, Bigtable, MongoDB (ranged), many time-series stores.

Hash partitioning

Apply a hash function to the key and assign by the hash (e.g., hash(key) mod N, or a position on a hash ring). The hash spreads even sequential keys uniformly.

  • Pro — even load. A good hash scatters writes and storage across all shards, eliminating monotonic-key hot spots.
  • Con — range queries are gone. Adjacent keys land on random shards, so a range scan must hit every shard (scatter-gather) — expensive. Sorted iteration is impossible without secondary structures.
  • Used by: Cassandra, DynamoDB, most distributed caches.
Keys:  k1 k2 k3 k4 k5 k6 (k7=newest, monotonic)

RANGE:   [ k1 k2 ] [ k3 k4 ] [ k5 k6 k7 ]   <- new writes pile on last shard
          shard A    shard B    shard C  (HOT)

HASH:    [ k2 k5 ] [ k1 k7 ] [ k3 k4 k6 ]   <- even, but "k3..k5" spans all shards
          shard A    shard B    shard C
Same keys, two strategies: range keeps order (and hot spots); hash spreads load (and scatters ranges)

Side by side

RangeHash
Range / sorted queriesEfficient (few shards)Scatter-gather (all shards)
Load distributionRisky (hot spots on skew/monotonic keys)Even
Point lookupsEfficientEfficient
RebalancingSplit/merge rangesEasier with consistent hashing
Best forTime-series, sorted scans, range filtersUniform point-access at scale

Choosing the partition key

The strategy is only half the decision — the partition key matters as much:

  • High cardinality so load spreads (don't shard on "country" with 3 dominant values).
  • Matches the access pattern — shard chat messages by chat_id so a conversation lives on one shard; shard by user_id if reads are per-user.
  • Avoid monotonic keys with range sharding — or prefix the timestamp with a hashed bucket to break the hot spot ("salting").

Mark your status