Fork/Join & Parallel Streams — Java Interview Guide | Cracked Java
Senior

Fork/Join & Parallel Streams

Divide-and-conquer parallelism: the work-stealing ForkJoinPool, RecursiveTask vs RecursiveAction, the common pool trap, and when parallel streams actually help.

Prereqs: thread-pools

The Fork/Join framework is Java's answer to divide-and-conquer parallelism: split a big task into smaller subtasks, run them across a pool of worker threads, and combine the results. Its engine is the ForkJoinPool, which uses work-stealing to keep cores busy even when subtasks are unevenly sized. Parallel streams are the high-level, declarative face of the same machinery — under the hood, stream().parallel() schedules its work on a ForkJoinPool.

Work-stealing, not work-sharing

A classic thread pool has one shared queue that every worker pulls from — a contention point. A ForkJoinPool instead gives each worker its own double-ended queue (deque). A worker pushes newly forked subtasks onto the head of its own deque and pops from the head too (LIFO, which is cache-friendly and keeps the most recently split work hot). When a worker runs dry, it steals from the tail of another worker's deque (FIFO), grabbing the oldest, typically largest, unstarted chunk. This minimizes contention and balances load automatically.

Worker A deque:  [head] t5  t4  t3 [tail]
                 ^own push/pop      ^steal target
Worker B deque:  [head] t9  t8 [tail]   (B is busy)
Worker C: empty  --->  steals t3 from A's tail
Each worker owns a deque; idle workers steal from the tail of a busy one

RecursiveTask, RecursiveAction, and the common pool

You model work as a RecursiveTask<V> (returns a value) or RecursiveAction (no result). The idiom inside compute(): if the problem is below a threshold, solve it directly; otherwise split, fork() one half (push it for async execution), compute() the other half inline, then join() the forked half. Forking both and joining both wastes a thread — always compute one side directly.

Parallel streams: the same engine, declarative

Parallel streams help when data is large, CPU-bound, and cheaply splittable (arrays, ArrayList, IntStream.range). They hurt on small datasets, poorly-splittable sources (LinkedList, Iterator-backed), I/O-bound bodies, or ordered/stateful operations where merge and ordering costs eat the gains. The default rule of thumb: measure before you parallelize.

Questions

5 in this topic