size() is approximate because CHM doesn't maintain a single global counter — that would itself become a contention bottleneck. Instead it uses a striped counter (like LongAdder): each thread updates its own cell, and size() sums the cells on demand. Concurrent writes can change the total mid-summation. mappingCount() returns the same value as a long, which matters once the map exceeds Integer.MAX_VALUE entries.
Why not a single counter?
Imagine a volatile long count incremented on every put. Every writer would contend on the same cache line — CAS storms, cache-line ping-pong between cores. CHM's whole point is to scale; a single counter would erase the gain.
The striped counter design
CHM uses LongAdder-style internals (CounterCell[]):
- Each thread, indexed by a per-thread probe, updates its own
CounterCell. - The array grows under contention up to roughly
NCPUcells. size()walks the array and sums every cell's value.
Result: writes scale linearly with cores, and reading the size is O(cells), not O(map).
Why "approximate"?
long total = baseCount;
for (CounterCell c : cells) {
if (c != null) total += c.value;
}
Between reading cell 0 and cell 7, other threads can put or remove. The returned number was the true size at no single instant — it's a sum of snapshots. For monitoring, it's plenty accurate. For control flow, don't trust the exact value.
size() vs mappingCount()
| Method | Return type | Saturates at |
|---|---|---|
size() | int | Integer.MAX_VALUE (~2.1B). Larger maps overflow / silently clamp. |
mappingCount() | long | Long.MAX_VALUE — for serious large maps. |
If you only need a rough count and your map will never exceed 2 billion entries, size() is fine. If you might (caching, log aggregation, etc.), reach for mappingCount().
Cost characteristics
| Operation | Best | Average | Worst | Note |
|---|---|---|---|---|
| put | O(1) | O(1) | O(log n) per bucket (treeified) | CAS or sync on head |
| get | O(1) | O(1) | O(log n) | lock-free |
| size() / mappingCount() | O(NCPU) | O(NCPU) | O(NCPU) | sums striped cells |
Misuse
// BAD — race condition
if (map.size() < MAX) {
map.put(k, v); // another thread may have put between check and act
}
// GOOD — atomic
map.compute(k, (key, old) -> {
if (mapSizeWithinLimit()) return v;
return old;
});
// or use a Semaphore / explicit counter for capacity gating