When do parallel streams help, and when do they hurt?

Divide-and-conquer parallelism: the work-stealing ForkJoinPool, RecursiveTask vs RecursiveAction, the common pool trap, and when parallel streams actually help.

Cracked Java

Parallel streams help when you have a large amount of CPU-bound work over a cheaply splittable source. They hurt when any of those conditions is missing (small data, expensive-to-split sources, I/O-bound or blocking bodies) or when the pipeline is order-sensitive. The honest answer is to measure, because the overhead of splitting, scheduling on the common pool, and merging is real and can easily exceed the savings.
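To make "measure" concrete, here is a rough timing sketch. The class name ParallelTiming, the workload, and the input sizes are all hypothetical, and a best-of-N System.nanoTime loop is only a crude approximation; a proper harness such as JMH gives far more trustworthy numbers.

```java
import java.util.function.LongSupplier;
import java.util.stream.LongStream;

public class ParallelTiming {
    // Written to after every run so the JIT cannot discard the computation.
    static volatile long sink;

    // Hypothetical CPU-bound workload over the range [0, n).
    static long sequentialSum(long n) {
        return LongStream.range(0, n).map(i -> (i * i) % 7).sum();
    }

    // Same workload, but the pipeline runs on the common pool.
    static long parallelSum(long n) {
        return LongStream.range(0, n).parallel().map(i -> (i * i) % 7).sum();
    }

    // Crude timer: repeats the task and keeps the fastest run, in nanoseconds.
    static long bestNanos(LongSupplier task, int runs) {
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            sink = task.getAsLong();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best;
    }

    public static void main(String[] args) {
        // Assumed sizes: one tiny input, one large input.
        for (long n : new long[] {100, 10_000_000}) {
            long seq = bestNanos(() -> sequentialSum(n), 10);
            long par = bestNanos(() -> parallelSum(n), 10);
            System.out.printf("n=%d sequential=%dus parallel=%dus%n",
                    n, seq / 1_000, par / 1_000);
        }
    }
}
```

The absolute numbers depend on the machine and JVM, so the useful signal is the ratio between the two columns at each size.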

When they help

Big N. There must be enough elements that the per-element work, summed up, dwarfs the fixed cost of forking tasks and combining partial results.
CPU-bound, independent per-element work. Math, parsing, transformation — work that genuinely saturates a core.
Splittable source with a good Spliterator. Arrays, ArrayList, and IntStream.range split in O(1) into balanced halves, which gives the work-stealing pool even chunks. HashMap also splits cheaply, though how even the halves are depends on how well the keys hash.

// Good fit: large array, pure CPU work, trivially splittable
double[] data = ...; // millions of elements
double sum = Arrays.stream(data)
    .parallel()
    .map(Math::sqrt)   // CPU-bound, stateless, no I/O
    .sum();
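To see what "splittable" means in practice, the following sketch calls trySplit once on an ArrayList and on a LinkedList and reports the size of each half. SplitDemo and splitOnce are hypothetical names, and the exact LinkedList figures depend on the JDK implementation, so treat the output as illustrative rather than guaranteed.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Spliterator;

public class SplitDemo {
    // Splits the list's spliterator once and returns {prefixSize, remainingSize}.
    static long[] splitOnce(List<Integer> list) {
        Spliterator<Integer> rest = list.spliterator();
        Spliterator<Integer> prefix = rest.trySplit(); // null if the source declines to split
        long prefixSize = (prefix == null) ? 0 : prefix.estimateSize();
        return new long[] {prefixSize, rest.estimateSize()};
    }

    // Builds a list of n integers using the given (empty) list as the backing store.
    static List<Integer> fill(List<Integer> target, int n) {
        for (int i = 0; i < n; i++) {
            target.add(i);
        }
        return target;
    }

    public static void main(String[] args) {
        int n = 100_000; // assumed size, chosen only for illustration
        long[] array = splitOnce(fill(new ArrayList<>(), n));
        long[] linked = splitOnce(fill(new LinkedList<>(), n));
        // On OpenJDK the ArrayList halves come out equal; the LinkedList
        // split is typically a small copied batch plus a large remainder.
        System.out.printf("ArrayList:  %d / %d%n", array[0], array[1]);
        System.out.printf("LinkedList: %d / %d%n", linked[0], linked[1]);
    }
}
```

An index-based source can hand off half of its range without touching any element, whereas a linked structure has to walk its nodes to produce a chunk.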

When they hurt

Small datasets. For a few hundred elements of cheap per-element work, the setup cost almost always loses to a plain loop.
Poorly-splittable sources. LinkedList, Stream.iterate, BufferedReader.lines(), and anything Iterator-backed split sequentially and unevenly, so one worker ends up doing most of the work.
I/O-bound or blocking bodies. Network/DB calls block common-pool workers (see the common-pool trap); you get no CPU parallelism and risk starving the whole JVM.
Ordered / stateful pipelines. forEachOrdered, limit, findFirst, sorted, and distinct force coordination or buffering that erodes or erases the gain. Use forEach over forEachOrdered and findAny over findFirst when order is irrelevant.
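The order-relaxing substitutions can be sketched as follows. The class OrderRelaxed, the divisor 97, and the method names are made up for illustration; the point is only which terminal operation is used.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.IntStream;

public class OrderRelaxed {
    // findFirst must return the earliest match in encounter order,
    // so workers that find later matches still have to coordinate.
    static int firstMatch(int n) {
        return IntStream.range(1, n).parallel()
                .filter(v -> v % 97 == 0)
                .findFirst()
                .orElse(-1);
    }

    // findAny may return whichever match some worker reaches first.
    static int anyMatch(int n) {
        return IntStream.range(1, n).parallel()
                .filter(v -> v % 97 == 0)
                .findAny()
                .orElse(-1);
    }

    // forEach makes no ordering promise; the side effect goes through
    // a thread-safe LongAdder because many workers call it concurrently.
    static long countEvens(int n) {
        LongAdder counter = new LongAdder();
        IntStream.range(0, n).parallel()
                .filter(v -> v % 2 == 0)
                .forEach(v -> counter.increment());
        return counter.sum();
    }

    public static void main(String[] args) {
        System.out.println("findFirst: " + firstMatch(1_000_000));
        System.out.println("findAny:   " + anyMatch(1_000_000));
        System.out.println("evens:     " + countEvens(1_000));
    }
}
```

Here findAny can return any multiple of 97 in the range, which is acceptable only when the caller truly does not care which match it gets.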

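Also remember: all parallel streams in the process share the single common pool, so two parallel streams running at once compete for the same workers (by default, one fewer than the number of available processors). If parallel throughput matters, run the pipeline from inside a dedicated ForkJoinPool. This works because of how the stream implementation schedules its tasks, not because of a documented guarantee.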
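A minimal sketch of the dedicated-pool approach, assuming a hypothetical helper sumInPool and an arbitrary parallelism of 4. It relies on the common observation that a parallel stream started from inside a ForkJoinPool task runs on that pool, which is implementation behavior rather than a documented contract.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.LongStream;

public class DedicatedPool {
    // Runs a parallel sum of 1..n inside a private pool instead of the common pool.
    static long sumInPool(int parallelism, long n) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            Callable<Long> task = () -> LongStream.rangeClosed(1, n).parallel().sum();
            // join() waits for the result and rethrows failures unchecked.
            return pool.submit(task).join();
        } finally {
            pool.shutdown(); // a private pool must be shut down explicitly
        }
    }

    public static void main(String[] args) {
        System.out.println("common pool parallelism: "
                + ForkJoinPool.commonPool().getParallelism());
        System.out.println("sum in dedicated pool:   " + sumInPool(4, 1_000_000));
    }
}
```

Creating a pool per call, as here, is only for brevity; a long-lived application would more plausibly create the pool once and reuse it.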