A rate limiter restricts how often an action can happen — calls per second per user, requests per minute per API key, login attempts per IP. The two canonical algorithms are token bucket and leaky bucket: same goal, different shape of traffic they allow. Pick token bucket unless the interviewer specifically asks why you'd use leaky.
Requirements
- Allow at most
Rrequests per second per key, with burst tolerance of up toB. - Operations:
tryAcquire(key)returnstrueif allowed,falseif throttled. - Single JVM, multi-threaded (many request handlers calling
tryAcquireconcurrently). - Out of scope: distributed coordination, retries, backoff suggestions.
Token bucket vs leaky bucket — the one-line difference
- Token bucket: a bucket holds up to
Btokens, refilled atRtokens/second. Each request consumes one token. Idle periods accumulate tokens up toB— so bursts are allowed (up toBrequests in zero time). - Leaky bucket: requests enter a fixed-capacity queue that drains at
Rrequests/second. Overflows are dropped. Idle periods don't earn capacity — output rate is steady, no bursts.
| Property | Token bucket | Leaky bucket |
|---|---|---|
| Burst handling | Allows bursts up to B | No bursts — steady output rate |
| Implementation | Counter + timestamp | Queue + scheduler |
| Memory per key | O(1) | O(queue size) |
| Most common in practice | Yes (Stripe, AWS, GitHub API) | Rare today |
Token bucket is the right default because real APIs want to allow short bursts — users do refresh-twice or open three tabs at once and shouldn't be punished for it.
Token bucket implementation
public final class TokenBucket {
private final long capacity; // B: max tokens
private final double refillPerSec; // R: refill rate
private double tokens; // current count (double for fractional refill)
private long lastRefillNanos; // last update timestamp
private final Object lock = new Object();
public TokenBucket(long capacity, double refillPerSec) {
this.capacity = capacity;
this.refillPerSec = refillPerSec;
this.tokens = capacity;
this.lastRefillNanos = System.nanoTime();
}
public boolean tryAcquire() {
synchronized (lock) {
refill();
if (tokens >= 1.0) {
tokens -= 1.0;
return true;
}
return false;
}
}
private void refill() {
long now = System.nanoTime();
double elapsedSec = (now - lastRefillNanos) / 1e9;
tokens = Math.min(capacity, tokens + elapsedSec * refillPerSec);
lastRefillNanos = now;
}
}
Two design choices worth defending:
- Lazy refill instead of a background scheduler. Adding tokens only when somebody calls
tryAcquiresaves a thread per key and avoids drift. The math is just "elapsed time times rate." - Double instead of long for
tokens. WithR = 100and 1ms between calls, you should add 0.1 tokens — integer truncation would make the bucket never refill in fast-call scenarios.
Per-key rate limiter
public final class RateLimiter {
private final long capacity;
private final double refillPerSec;
private final ConcurrentMap<String, TokenBucket> buckets = new ConcurrentHashMap<>();
public RateLimiter(long capacity, double refillPerSec) {
this.capacity = capacity;
this.refillPerSec = refillPerSec;
}
public boolean tryAcquire(String key) {
return buckets.computeIfAbsent(key, k -> new TokenBucket(capacity, refillPerSec))
.tryAcquire();
}
}
ConcurrentHashMap plus computeIfAbsent gives lock-free bucket lookup. The per-bucket synchronized block inside tryAcquire keeps contention bounded to one key.
Concurrency upgrade: lock-free token bucket
The synchronized block above serializes every tryAcquire on the same key. For very high QPS, you can replace it with a CAS loop on an AtomicLong packing tokens + timestamp into a single 64-bit value. It's harder to get right; mention it as a "I'd reach for it when profiling demanded it" — don't write it on the whiteboard unless asked.
Memory hygiene
A naive ConcurrentHashMap grows forever — every distinct key keeps a bucket alive. Two fixes:
- TTL the buckets: wrap each entry with
lastAccessedAt, sweep with a scheduled cleanup, or use Caffeine'sexpireAfterAccess. - LRU eviction on the bucket map: bounded total bucket count, evict least-recently-used keys.
Leaky bucket sketch (for comparison)
public final class LeakyBucket {
private final int capacity;
private final long drainIntervalNanos; // 1 / R seconds
private final Deque<Long> queue = new ArrayDeque<>();
private final Object lock = new Object();
public boolean tryEnqueue() {
synchronized (lock) {
drain();
if (queue.size() >= capacity) return false;
queue.add(System.nanoTime());
return true;
}
}
private void drain() {
long now = System.nanoTime();
while (!queue.isEmpty() && now - queue.peek() >= drainIntervalNanos) {
queue.poll();
}
}
}
Output rate is steady at R per second. The queue absorbs bursty input but smooths the rate downstream. Use this when the downstream consumer needs steady pacing (e.g. throttling a third-party API that hates bursts).
Complexity
| Operation | Best | Average | Worst | Note |
|---|---|---|---|---|
| tryAcquire (token bucket) | O(1) | O(1) | O(1) | constant arithmetic |
| tryAcquire (leaky bucket) | O(1) | O(1) | O(k) | k = queued entries to drain |
| per-key lookup | O(1) | O(1) | O(1) | ConcurrentHashMap |