Parallel streams help when you have a large amount of CPU-bound work over a cheaply splittable source. They hurt when any of those conditions is missing: small data, expensive-to-split sources, I/O-bound or blocking bodies, or order-sensitive operations. The honest answer is to measure, because the overhead of splitting, scheduling on the common pool, and merging is real and can easily exceed the savings.
When they help
- Big N. There must be enough elements that the per-element work, summed up, dwarfs the fixed cost of forking tasks and combining partial results.
- CPU-bound, independent per-element work. Math, parsing, transformation — work that genuinely saturates a core.
- Splittable source with a good Spliterator. Arrays, ArrayList, IntStream.range, and HashMap split in O(1) into balanced or nearly balanced halves. These give the work-stealing pool even chunks.
// Good fit: large array, pure CPU work, trivially splittable
double[] data = ...; // millions of elements
double sum = Arrays.stream(data)
        .parallel()
        .map(Math::sqrt) // CPU-bound, stateless, no I/O
        .sum();
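Since the advice is to measure, here is a rough sketch of how the pipeline above could be timed against its sequential twin at a small and a large size. The class and method names (`ParallelSumTiming`, `bestMillis`) are hypothetical, and the wall-clock timing is deliberately crude; a proper benchmark harness such as JMH would give more trustworthy numbers.

```java
import java.util.Arrays;
import java.util.function.DoubleSupplier;

// Hypothetical demo class; all names here are illustrative.
class ParallelSumTiming {

    // Keeps results observable so the JIT is less likely to discard the work.
    static volatile double sink;

    // Sum of square roots, sequential.
    static double sequentialSum(double[] data) {
        return Arrays.stream(data).map(Math::sqrt).sum();
    }

    // Same pipeline, executed as a parallel stream.
    static double parallelSum(double[] data) {
        return Arrays.stream(data).parallel().map(Math::sqrt).sum();
    }

    // Crude timing: best wall-clock time over several runs, in milliseconds.
    static double bestMillis(DoubleSupplier task, int runs) {
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            sink = task.getAsDouble();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best / 1e6;
    }

    public static void main(String[] args) {
        // One small and one large input (sizes are assumptions for illustration).
        for (int n : new int[] {500, 5_000_000}) {
            double[] data = new double[n];
            Arrays.setAll(data, i -> i + 1.0);

            // Warm up both variants before timing them.
            bestMillis(() -> sequentialSum(data), 5);
            bestMillis(() -> parallelSum(data), 5);

            double seq = bestMillis(() -> sequentialSum(data), 10);
            double par = bestMillis(() -> parallelSum(data), 10);
            System.out.printf("n=%d sequential=%.3f ms parallel=%.3f ms%n", n, seq, par);
        }
    }
}
```

The numbers depend on the machine, the JVM, and what else is using the common pool, so treat any single run as a hint rather than a verdict.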
When they hurt
- Small datasets. For a few hundred elements the setup cost typically loses to a plain loop.
- Poorly-splittable sources. LinkedList, Stream.iterate, BufferedReader.lines(), anything Iterator-backed: splitting is sequential and lopsided, so one worker does most of the work.
- I/O-bound or blocking bodies. Network/DB calls block common-pool workers (see the common-pool trap); you get no CPU parallelism and risk starving the whole JVM.
- Ordered / stateful pipelines. forEachOrdered, limit, findFirst, sorted, and distinct force coordination or buffering that erodes or erases the gain. Use forEach over forEachOrdered and findAny over findFirst when order is irrelevant.
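The difference between a well-splittable and a poorly-splittable source can be observed directly by asking each source's Spliterator to split once and comparing the two pieces. The sketch below is illustrative only: `SplitBalanceDemo` and its helpers are hypothetical names, and the exact sizes a LinkedList hands out are an implementation detail of the JDK in use.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Spliterator;

// Hypothetical demo class; all names here are illustrative.
class SplitBalanceDemo {

    // Returns {size of the split-off prefix, size remaining in the original}
    // after a single trySplit() on the list's spliterator.
    static long[] firstSplitSizes(List<Integer> source) {
        Spliterator<Integer> rest = source.spliterator();
        Spliterator<Integer> prefix = rest.trySplit();
        long prefixSize = (prefix == null) ? 0 : prefix.estimateSize();
        return new long[] {prefixSize, rest.estimateSize()};
    }

    // Fills the given list with the integers 0..n-1 and returns it.
    static List<Integer> fill(List<Integer> list, int n) {
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return list;
    }

    public static void main(String[] args) {
        int n = 100_000;
        long[] array = firstSplitSizes(fill(new ArrayList<>(), n));
        long[] linked = firstSplitSizes(fill(new LinkedList<>(), n));

        // ArrayList splits by index into two halves.
        System.out.printf("ArrayList:  prefix=%d rest=%d%n", array[0], array[1]);
        // LinkedList has to walk its nodes and copy a batch into an array,
        // so the prefix is a small slice and the rest stays large.
        System.out.printf("LinkedList: prefix=%d rest=%d%n", linked[0], linked[1]);
    }
}
```

On OpenJDK the ArrayList pieces come out equal while the LinkedList prefix is a small batch, which is the lopsided split described in the list above.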
Also remember: all parallel streams in the process share the single common pool, so two parallel streams running at once compete for the same cores − 1 workers. If parallel throughput matters, run on a dedicated ForkJoinPool.
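A minimal sketch of the dedicated-pool approach follows. It relies on the widely used behavior that a parallel stream started from inside a ForkJoinPool task runs its subtasks in that pool rather than the common pool; this is how current OpenJDK behaves, but it is an implementation detail rather than a documented guarantee of the Stream API. The class name `DedicatedPoolDemo`, the method `sumOfSquares`, and the parallelism value are assumptions for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.LongStream;

// Hypothetical demo class; all names here are illustrative.
class DedicatedPoolDemo {

    // Computes 1^2 + 2^2 + ... + n^2 with a parallel stream that is
    // submitted to its own pool instead of the shared common pool.
    static long sumOfSquares(int parallelism, long n) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            // The stream's terminal operation executes inside a pool task.
            Callable<Long> task = () ->
                    LongStream.rangeClosed(1, n).parallel().map(x -> x * x).sum();
            // join() waits for the result without checked exceptions.
            return pool.submit(task).join();
        } finally {
            // Always release the pool's threads when done.
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(4, 1_000));
    }
}
```

Creating a pool per call is only for brevity here; in a real service a long-lived pool shared by the relevant workload would usually be the more sensible design.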