Cracked Java

Range vs hash sharding — pros and cons

When a dataset or write volume exceeds one machine, you partition (shard): each node holds a disjoint subset of the data, keyed by a partition key. The central decision is how you map a key to a shard. The two base strategies are range and hash, and the choice is dominated by one trade-off: range queries vs. even load distribution.

Range partitioning

Assign contiguous key ranges to shards: A–F on shard 1, G–M on shard 2, and so on (keys can be IDs, timestamps, usernames).

Pro: efficient range scans. "All orders from last Tuesday" or "users H through K" touch one or a few adjacent shards. Sorted-order queries are natural.
Con: hot spots from skew. If the key is monotonically increasing (a timestamp, an auto-increment ID), all new writes hit the last shard while the rest sit idle. Uneven natural distribution (lots of usernames starting with "S") also concentrates load.
Used by: HBase, Bigtable, MongoDB (ranged), many time-series stores.
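
As a rough illustration of the routing described above, here is a minimal Java sketch. The class name RangeRouter, the shard names, and the boundaries after "g" are assumptions made for the example, not taken from any particular database. It keeps the inclusive lower bound of each range in a sorted map, so a point lookup is a floor search and a range scan touches only the shards whose ranges overlap the interval.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical range router: a shard owns keys from its lower bound up to the next bound. */
final class RangeRouter {
    // Inclusive lower bound of each range -> shard name, kept in key order.
    private final TreeMap<String, String> lowerBounds = new TreeMap<>();

    RangeRouter(Map<String, String> bounds) {
        lowerBounds.putAll(bounds);
    }

    /** Point lookup: the shard with the greatest lower bound that is <= key. */
    String shardFor(String key) {
        Map.Entry<String, String> entry = lowerBounds.floorEntry(key);
        if (entry == null) {
            throw new IllegalArgumentException("key sorts before the first range: " + key);
        }
        return entry.getValue();
    }

    /** Range scan over [from, to]: only the shards whose ranges overlap the interval. */
    List<String> shardsFor(String from, String to) {
        String first = lowerBounds.floorKey(from);
        if (first == null) {
            throw new IllegalArgumentException("range starts before the first range: " + from);
        }
        return List.copyOf(lowerBounds.subMap(first, true, to, true).values());
    }

    public static void main(String[] args) {
        // a-f on shard-1 and g-m on shard-2 follow the text; the rest is assumed.
        RangeRouter router = new RangeRouter(Map.of(
                "a", "shard-1", "g", "shard-2", "n", "shard-3", "t", "shard-4"));
        System.out.println(router.shardFor("harry"));    // shard-2
        System.out.println(router.shardsFor("h", "k"));  // [shard-2]
        System.out.println(router.shardsFor("e", "p"));  // [shard-1, shard-2, shard-3]
    }
}
```

With a monotonically increasing key, every new value sorts after the last lower bound, so shardFor keeps returning the final shard. That is the hot spot described in the "Con" item.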

Hash partitioning

Apply a hash function to the key and assign by the hash (e.g., hash(key) mod N, or a position on a hash ring). The hash spreads even sequential keys uniformly.

Pro: even load. A good hash scatters writes and storage across all shards, eliminating monotonic-key hot spots.
Con: range queries get expensive. Adjacent keys land on unrelated shards, so a range scan must query every shard and merge the results (scatter-gather). Globally sorted iteration is impossible without secondary structures.
Used by: Cassandra, DynamoDB, most distributed caches.
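
The hash(key) mod N variant can be sketched in a few lines of Java. This is only an illustration: the class name HashRouter, the "order-" key prefix, and the use of CRC32 as the hash are assumptions, and real systems generally pick their own hash functions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;

/** Hypothetical hash router: shard = hash(key) mod N. */
final class HashRouter {
    private final int shardCount;

    HashRouter(int shardCount) {
        this.shardCount = shardCount;
    }

    /** CRC32 is an illustrative, deterministic hash choice, not a recommendation. */
    int shardFor(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % shardCount); // getValue() is never negative
    }

    /** Counts how many of the sequential keys prefix0..prefix(n-1) land on each shard. */
    int[] distribution(String prefix, int n) {
        int[] counts = new int[shardCount];
        for (int i = 0; i < n; i++) {
            counts[shardFor(prefix + i)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        HashRouter router = new HashRouter(4);
        // Sequential IDs are spread over all shards instead of piling onto the last one.
        System.out.println(Arrays.toString(router.distribution("order-", 1000)));
        // Neighbouring keys generally map to different shards, so a scan over
        // order-3..order-5 has to ask every shard and merge the answers.
        for (int i = 3; i <= 5; i++) {
            System.out.println("order-" + i + " -> shard " + router.shardFor("order-" + i));
        }
    }
}
```

The router has no way to answer "which shards hold keys between X and Y" other than "all of them", which is the scatter-gather cost listed above.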

Keys:  k1 k2 k3 k4 k5 k6 k7  (k7 = newest, monotonic)

RANGE:   [ k1 k2 ] [ k3 k4 ] [ k5 k6 k7 ]   <- new writes pile on last shard
          shard A    shard B    shard C  (HOT)

HASH:    [ k2 k5 ] [ k1 k4 k7 ] [ k3 k6 ]   <- even, but "k3..k5" spans all shards
          shard A    shard B     shard C

Same keys, two strategies: range keeps order (and hot spots); hash spreads load (and scatters ranges)

Side by side

Aspect	Range	Hash
Range / sorted queries	Efficient (few shards)	Scatter-gather (all shards)
Load distribution	Risky (hot spots on skew/monotonic keys)	Even
Point lookups	Efficient	Efficient
Rebalancing	Split/merge ranges	Easier with consistent hashing (plain mod N remaps most keys when N changes)
Best for	Time-series, sorted scans, range filters	Uniform point-access at scale
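
To make the rebalancing row concrete, the following Java sketch builds a small consistent-hash ring and counts how many keys change owner when a fifth node joins, next to the count for plain hash mod N. Everything named here is an assumption for illustration: the HashRing class, the node names A to E, 100 virtual nodes per node, and the first 8 bytes of MD5 as the ring position.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical consistent-hash ring with virtual nodes. */
final class HashRing {
    private static final int VNODES = 100; // assumed ring positions per node
    private final TreeMap<Long, String> ring = new TreeMap<>();

    /** First 8 bytes of MD5 as a ring position; an illustrative hash choice. */
    static long position(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.wrap(digest).getLong();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is required on every Java platform
        }
    }

    void addNode(String node) {
        for (int i = 0; i < VNODES; i++) {
            ring.put(position(node + "#" + i), node);
        }
    }

    /** Owner = first ring position at or after the key's position, wrapping around. */
    String nodeFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(position(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        HashRing hashRing = new HashRing();
        for (String node : new String[] {"A", "B", "C", "D"}) {
            hashRing.addNode(node);
        }
        int keys = 10_000;
        String[] before = new String[keys];
        for (int i = 0; i < keys; i++) {
            before[i] = hashRing.nodeFor("key-" + i);
        }

        hashRing.addNode("E"); // grow from 4 to 5 nodes
        int movedRing = 0;
        int movedMod = 0;
        for (int i = 0; i < keys; i++) {
            if (!hashRing.nodeFor("key-" + i).equals(before[i])) {
                movedRing++;
            }
            long h = position("key-" + i);
            if (Long.remainderUnsigned(h, 4) != Long.remainderUnsigned(h, 5)) {
                movedMod++;
            }
        }
        System.out.println("ring moved: " + movedRing + ", mod N moved: " + movedMod);
    }
}
```

With a well-mixed hash, roughly one fifth of the keys should move on the ring, all of them to the new node, while roughly four fifths change shard under mod N. The exact counts depend on the hash and on the number of virtual nodes.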

Choosing the partition key

The strategy is only half the decision — the partition key matters as much:

High cardinality so load spreads (don't shard on "country" with 3 dominant values).
Matches the access pattern — shard chat messages by chat_id so a conversation lives on one shard; shard by user_id if reads are per-user.
Avoid monotonic keys with range sharding, or prefix the timestamp with a small hashed bucket ("salting") to break the hot spot. The price is that a time-range read must then query every bucket.
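
A minimal Java sketch of the salting idea, under stated assumptions: the class name SaltedKey, the 8-bucket count, the "bucket|timestamp|id" key layout, and deriving the bucket from String.hashCode() of an event ID are all illustrative choices rather than any store's actual scheme.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical salted row keys for a range-partitioned store. */
final class SaltedKey {
    static final int BUCKETS = 8; // assumed bucket count

    /** Key layout "bb|timestamp|id": the bucket comes from a hash of the id, not the time. */
    static String rowKey(String eventId, long epochMillis) {
        int bucket = Math.floorMod(eventId.hashCode(), BUCKETS);
        // Zero-padding keeps lexicographic order equal to numeric order.
        return String.format(Locale.ROOT, "%02d|%013d|%s", bucket, epochMillis, eventId);
    }

    /** A time-range read becomes one [start, end) key range per bucket. */
    static List<String[]> scanRanges(long fromMillis, long toMillis) {
        List<String[]> ranges = new ArrayList<>();
        for (int b = 0; b < BUCKETS; b++) {
            ranges.add(new String[] {
                String.format(Locale.ROOT, "%02d|%013d", b, fromMillis),
                String.format(Locale.ROOT, "%02d|%013d", b, toMillis)
            });
        }
        return ranges;
    }

    public static void main(String[] args) {
        long now = 1_700_000_000_000L; // fixed timestamp so the demo is repeatable
        // Consecutive writes get different prefixes, so they spread over key ranges.
        for (int i = 0; i < 5; i++) {
            System.out.println(rowKey("evt-" + i, now + i));
        }
        // Reading one second of data needs BUCKETS separate range scans.
        for (String[] range : scanRanges(now, now + 1000)) {
            System.out.println(range[0] + " .. " + range[1]);
        }
    }
}
```

The bucket count is the knob: more buckets spread writes further but multiply the number of scans each time-range read has to issue.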