Why is size() approximate? What does mappingCount() do? — Cracked Java
// Java Collections Framework · Concurrent Collections — ConcurrentHashMap
SeniorTrick

Why is size() approximate? What does mappingCount() do?

size() is approximate because CHM doesn't maintain a single global counter — that would itself become a contention bottleneck. Instead it uses a striped counter (like LongAdder): each thread updates its own cell, and size() sums the cells on demand. Concurrent writes can change the total mid-summation. mappingCount() returns the same value as a long, which matters once the map exceeds Integer.MAX_VALUE entries.

Why not a single counter?

Imagine a volatile long count incremented on every put. Every writer would contend on the same cache line — CAS storms, cache-line ping-pong between cores. CHM's whole point is to scale; a single counter would erase the gain.

The striped counter design

CHM uses LongAdder-style internals (CounterCell[]):

  • Each thread, indexed by a per-thread probe, updates its own CounterCell.
  • The array grows under contention up to roughly NCPU cells.
  • size() walks the array and sums every cell's value.

Result: writes scale linearly with cores, and reading the size is O(cells), not O(map).

Why "approximate"?

long total = baseCount;
for (CounterCell c : cells) {
    if (c != null) total += c.value;
}

Between reading cell 0 and cell 7, other threads can put or remove. The returned number was the true size at no single instant — it's a sum of snapshots. For monitoring, it's plenty accurate. For control flow, don't trust the exact value.

size() vs mappingCount()

MethodReturn typeSaturates at
size()intInteger.MAX_VALUE (~2.1B). Larger maps overflow / silently clamp.
mappingCount()longLong.MAX_VALUE — for serious large maps.

If you only need a rough count and your map will never exceed 2 billion entries, size() is fine. If you might (caching, log aggregation, etc.), reach for mappingCount().

Cost characteristics

OperationBestAverageWorstNote
putO(1)O(1)O(log n) per bucket (treeified)CAS or sync on head
getO(1)O(1)O(log n)lock-free
size() / mappingCount()O(NCPU)O(NCPU)O(NCPU)sums striped cells

Misuse

// BAD — race condition
if (map.size() < MAX) {
    map.put(k, v);     // another thread may have put between check and act
}

// GOOD — atomic
map.compute(k, (key, old) -> {
    if (mapSizeWithinLimit()) return v;
    return old;
});
// or use a Semaphore / explicit counter for capacity gating

Mark your status