Parallel Streams: ForkJoinPool, Spliterators, and When NOT to Parallelize
Parallel streams are one of Java 8’s most misused features. It is tempting to add .parallel() to any slow stream pipeline, but the performance characteristics are counterintuitive: parallel execution can be slower on small data, and blocking I/O inside a parallel stream can starve every other task that shares the common pool. This article explains the mechanics, the cases where parallel genuinely helps, and the patterns to avoid.
How Parallel Streams Work
A parallel stream splits its source into sub-sequences, processes each sub-sequence on a separate thread, and merges the results. The mechanism is ForkJoin — specifically ForkJoinPool.commonPool(), a shared thread pool managed by the JVM.
// Sequential: processes on the calling thread
List<String> seq = names.stream()
    .map(String::toUpperCase)
    .collect(Collectors.toList());

// Parallel: splits work across ForkJoinPool.commonPool()
List<String> par = names.parallelStream()
    .map(String::toUpperCase)
    .collect(Collectors.toList());

// Convert existing stream to parallel
List<String> par2 = names.stream()
    .parallel() // switch to parallel
    .map(String::toUpperCase)
    .collect(Collectors.toList());
A single call to .parallel() or .sequential() anywhere in a pipeline applies to the entire pipeline, not just to the stages after it. If both appear, the last call wins.
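A quick way to check which mode a pipeline ends up in is to ask the stream itself via isParallel(). The sketch below is illustrative only; the class and method names are made up for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class PipelineModeDemo {
    // Hypothetical helper: .parallel() early in the chain, .sequential() later.
    static boolean parallelThenSequential(List<String> names) {
        Stream<String> s = names.stream()
                .parallel()
                .map(String::toUpperCase)
                .sequential(); // the last mode call decides for the whole pipeline
        return s.isParallel();
    }

    // Hypothetical helper: the same pipeline with the two calls swapped.
    static boolean sequentialThenParallel(List<String> names) {
        Stream<String> s = names.stream()
                .sequential()
                .map(String::toUpperCase)
                .parallel();
        return s.isParallel();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("ann", "bob", "cy");
        System.out.println(parallelThenSequential(names)); // false
        System.out.println(sequentialThenParallel(names)); // true
    }
}
```

Nothing runs until a terminal operation is invoked, which is why the mode can still be flipped by a later call in the chain.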
ForkJoinPool.commonPool
The common pool is shared across the entire JVM process. Its parallelism defaults to Runtime.getRuntime().availableProcessors() - 1, with a minimum of 1. One processor is left free because the thread that invokes the terminal operation also takes part in the work.
System.out.println(ForkJoinPool.commonPool().getParallelism());
// e.g., 7 on an 8-core machine
You can change the pool size with a system property:
-Djava.util.concurrent.ForkJoinPool.common.parallelism=4
The Shared Pool Problem
Because the common pool is shared, one slow parallel stream can starve all others. A blocking I/O call inside a parallel stream blocks a ForkJoin worker thread, which can cascade:
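// DANGEROUS: blocking inside parallel stream consumes ForkJoin threads
orders.parallelStream()
    .map(order -> httpClient.fetchDetails(order.getId())) // blocks!
    .collect(Collectors.toList());
Fix: Run the stream from inside a dedicated pool, so that blocked threads belong to that pool rather than to the common pool. This relies on long-standing but undocumented behaviour (a parallel stream started from a ForkJoinPool worker runs its tasks in that worker's pool), so treat it as a workaround rather than a guarantee:
ForkJoinPool customPool = new ForkJoinPool(8);
List<OrderDetail> results;
try {
    results = customPool.submit(() ->
        orders.parallelStream()
            .map(order -> httpClient.fetchDetails(order.getId()))
            .collect(Collectors.toList())
    ).get(); // throws InterruptedException / ExecutionException
} finally {
    customPool.shutdown(); // always release the pool's threads
}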
Spliterators
The engine behind stream splitting is Spliterator<T> — an iterator that knows how to split itself for parallel processing.
public interface Spliterator<T> {
    boolean tryAdvance(Consumer<? super T> action); // process next element
    Spliterator<T> trySplit();                      // split off half
    long estimateSize();                            // estimated remaining elements
    int characteristics();                          // ORDERED, SIZED, DISTINCT, etc.
}
When a stream goes parallel:
- trySplit() is called recursively to create sub-tasks down to a threshold
- Each sub-task processes its split and produces partial results
- Results are combined using the pipeline’s combiner
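The recursive splitting step can be imitated by hand. The sketch below is a sequential simulation, not what the framework literally does: the real implementation forks tasks and chooses its own threshold internally. The class name, method names, and the threshold of 4 are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;

class RecursiveSplitDemo {
    // Keep splitting while a chunk is larger than the threshold; record leaf sizes.
    static void split(Spliterator<Integer> s, long threshold, List<Long> leafSizes) {
        if (s.estimateSize() > threshold) {
            Spliterator<Integer> prefix = s.trySplit(); // null if it cannot split
            if (prefix != null) {
                split(prefix, threshold, leafSizes); // first part
                split(s, threshold, leafSizes);      // what is left in the original
                return;
            }
        }
        leafSizes.add(s.estimateSize()); // small enough: this would be one sub-task
    }

    static List<Long> leafSizes(List<Integer> source, long threshold) {
        List<Long> sizes = new ArrayList<>();
        split(source.spliterator(), threshold, sizes);
        return sizes;
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < 16; i++) list.add(i);
        System.out.println(leafSizes(list, 4)); // [4, 4, 4, 4]
    }
}
```

An ArrayList splits at its midpoint each time, so 16 elements with a threshold of 4 yield four equal chunks.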
Spliterator Characteristics
Characteristics tell the framework what guarantees the data source provides, enabling optimisations:
| Characteristic | Meaning | Example sources |
|---|---|---|
| ORDERED | Elements have a defined encounter order | List, LinkedList, Arrays.stream |
| SORTED | Elements are sorted | TreeSet, sorted stream |
| SIZED | estimateSize() is exact | ArrayList, HashSet |
| DISTINCT | No duplicate elements | Set |
| NONNULL | No null elements | ConcurrentHashMap |
| IMMUTABLE | Source cannot be modified | Arrays.stream, CopyOnWriteArrayList |
| SUBSIZED | Sub-spliterators are also SIZED | Arrays |
ArrayList is ORDERED + SIZED + SUBSIZED — it splits perfectly. HashSet is SIZED + DISTINCT but not ORDERED — it can’t guarantee encounter order.
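These flags can be inspected directly with Spliterator.hasCharacteristics(). The following is a minimal sketch; the class and helper names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.Spliterator;

class CharacteristicsDemo {
    // Hypothetical helper: does this collection's spliterator report the flag?
    static boolean has(Collection<?> c, int characteristic) {
        return c.spliterator().hasCharacteristics(characteristic);
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3));
        Set<Integer> set = new HashSet<>(list);

        System.out.println(has(list, Spliterator.ORDERED));  // true
        System.out.println(has(list, Spliterator.SIZED));    // true
        System.out.println(has(list, Spliterator.SUBSIZED)); // true

        System.out.println(has(set, Spliterator.SIZED));     // true
        System.out.println(has(set, Spliterator.DISTINCT));  // true
        System.out.println(has(set, Spliterator.ORDERED));   // false
    }
}
```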
When Parallel Is Actually Faster
Parallel streams have overhead: task splitting, thread coordination, result merging. They are only faster when the computational savings outweigh this overhead.
Conditions for parallel wins
- Large data set: typically 10,000+ elements; below that, overhead dominates
- Computationally expensive per-element operation: CPU-bound work that takes non-trivial time
- Splittable source: ArrayList, arrays, and IntStream.range split evenly; LinkedList does not
- No ordering requirement, or the ordering is cheap to restore
- No shared mutable state: thread-safe operations only
Quick benchmark: CPU-bound work
long N = 10_000_000L;
// Sequential
long start = System.nanoTime();
long sumSeq = LongStream.range(0, N)
    .map(n -> n * n % 1_000_000_007L)
    .sum();
long seqMs = (System.nanoTime() - start) / 1_000_000;
// Parallel
start = System.nanoTime();
long sumPar = LongStream.range(0, N)
    .parallel()
    .map(n -> n * n % 1_000_000_007L)
    .sum();
long parMs = (System.nanoTime() - start) / 1_000_000;
System.out.printf("Sequential: %dms, Parallel: %dms%n", seqMs, parMs);
// A several-fold speedup is common on an 8-core machine, but a single
// unwarmed run like this is only indicative; use JMH for real numbers
Sources with good split behaviour
| Source | Parallelism quality | Reason |
|---|---|---|
| ArrayList | Excellent | O(1) split, exact size |
| int[] / long[] | Excellent | O(1) split, exact size |
| IntStream.range | Excellent | O(1) arithmetic split |
| TreeSet / TreeMap | Good | Balanced tree splits reasonably |
| HashSet / HashMap | Moderate | Splits by bucket, uneven possible |
| LinkedList | Poor | O(n) split |
| Files.lines() | Poor | Sequential read only |
When NOT to Use Parallel Streams
Small data sets
// Bad: 10 elements — overhead far exceeds savings
List<Integer> small = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
small.parallelStream().map(n -> n * 2).collect(Collectors.toList());
// Use sequential
small.stream().map(n -> n * 2).collect(Collectors.toList());
Order matters and the source is ordered
// Parallel + ordered source: collected results stay in order (at some re-ordering cost),
// but side effects in forEach run in whatever order the threads reach them
List<String> ordered = new ArrayList<>(names);
ordered.parallelStream()
    .forEach(System.out::println); // order not guaranteed
// Fix: use forEachOrdered (but this negates most parallelism benefit)
ordered.parallelStream()
    .forEachOrdered(System.out::println);
Shared mutable state
// BROKEN: ArrayList.add is not thread-safe
List<Integer> results = new ArrayList<>();
numbers.parallelStream()
    .filter(n -> n > 5)
    .forEach(results::add); // race condition!
// Fix: use collect instead
List<Integer> safeResults = numbers.parallelStream()
    .filter(n -> n > 5)
    .collect(Collectors.toList()); // thread-safe: each thread fills its own list, then they are merged
I/O-bound operations (blocking)
Use CompletableFuture with a custom thread pool instead of parallel streams for HTTP calls, database queries, or file I/O.
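A minimal sketch of that approach follows. The fetchDetails method is a stand-in that merely sleeps to imitate a blocking call, and the class name, method names, and pool size are assumptions for illustration, not part of any real client API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

class BlockingCallsDemo {
    // Stand-in for a blocking HTTP or database call (hypothetical).
    static String fetchDetails(int orderId) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "details-" + orderId;
    }

    static List<String> fetchAll(List<Integer> orderIds, int threads) {
        // Dedicated pool: blocked threads here never touch the common pool.
        ExecutorService ioPool = Executors.newFixedThreadPool(threads);
        try {
            // Start every call first so that they overlap...
            List<CompletableFuture<String>> futures = orderIds.stream()
                    .map(id -> CompletableFuture.supplyAsync(() -> fetchDetails(id), ioPool))
                    .collect(Collectors.toList());
            // ...then wait for each result, keeping the original order.
            return futures.stream()
                    .map(CompletableFuture::join)
                    .collect(Collectors.toList());
        } finally {
            ioPool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchAll(Arrays.asList(1, 2, 3), 3));
        // [details-1, details-2, details-3]
    }
}
```

Collecting the futures into a list before joining matters: joining inside the same map step would start and wait for one call at a time, because a sequential stream processes elements lazily.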
Short pipelines
If the pipeline has only one or two operations and the element count is modest, the overhead of splitting and merging will dominate.
Ordering Guarantees
| Stream type | forEach | collect(toList()) | findFirst |
|---|---|---|---|
| Sequential, ordered | ✓ ordered | ✓ ordered | first element |
| Parallel, ordered | ✗ any order | ✓ ordered (re-ordered) | first element (expensive) |
| Parallel, unordered | ✗ any order | ✗ any order | any element (fast) |
For parallel streams on ordered sources (lists), collect(Collectors.toList()) always preserves encounter order — Java guarantees this even for parallel streams. Only forEach loses order.
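The guarantee for collect can be checked with a small comparison. This is only a sketch with invented names: it builds the same list sequentially and in parallel and compares the two.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class OrderingDemo {
    // Squares of 0..n-1, computed either sequentially or in parallel.
    static List<Integer> squares(int n, boolean parallel) {
        IntStream range = IntStream.range(0, n);
        if (parallel) {
            range = range.parallel();
        }
        return range.map(i -> i * i)
                .boxed()
                .collect(Collectors.toList()); // encounter order is preserved
    }

    public static void main(String[] args) {
        List<Integer> seq = squares(1_000, false);
        List<Integer> par = squares(1_000, true);
        System.out.println(seq.equals(par)); // true
    }
}
```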
To tell the stream “I don’t care about order” and enable optimisations:
names.parallelStream()
    .unordered() // removes ORDERED characteristic
    .filter(s -> s.length() > 3)
    .collect(Collectors.toList()); // may be in any order, which can be faster
Practical Guide
// Template for deciding stream mode
Stream<T> stream = source.stream();
if (source.size() > 10_000            // large enough
        && operationIsCpuBound        // not I/O
        && !needsOrderedSideEffects   // no System.out::println in forEach
        && operationIsThreadSafe) {   // no shared mutable state
    stream = stream.parallel();
}
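The template can be turned into a compilable helper along these lines. Everything here is hypothetical: the class name, the method, and the three boolean flags stand for judgments the caller has to make, and the 10,000 threshold is just the rule of thumb from this article.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.stream.Stream;

class StreamModeChooser {
    // Rule-of-thumb threshold; tune it by measuring your own workload.
    static final int PARALLEL_THRESHOLD = 10_000;

    // Returns a parallel stream only when every condition holds.
    static <T> Stream<T> choose(Collection<T> source,
                                boolean cpuBound,
                                boolean needsOrderedSideEffects,
                                boolean threadSafe) {
        boolean goParallel = source.size() > PARALLEL_THRESHOLD
                && cpuBound
                && !needsOrderedSideEffects
                && threadSafe;
        return goParallel ? source.stream().parallel() : source.stream();
    }

    public static void main(String[] args) {
        List<Integer> big = new ArrayList<>();
        for (int i = 0; i < 20_000; i++) big.add(i);

        System.out.println(choose(big, true, false, true).isParallel());  // true
        System.out.println(choose(big, false, false, true).isParallel()); // false
    }
}
```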
Common patterns that are safe to parallelise
// CPU-heavy computation, result in a list
List<Result> results = items.parallelStream()
    .map(item -> expensiveCompute(item))
    .collect(Collectors.toList());

// Sum / reduce over large numeric arrays
long total = LongStream.range(0, 1_000_000).parallel().sum();

// Filtering large collections
List<Order> bigOrders = orders.parallelStream()
    .filter(o -> o.getTotal() > 10_000)
    .collect(Collectors.toList());
Common Mistakes
Combining parallel streams with sorted() or distinct()
sorted() and distinct() are stateful operations. sorted() is a full barrier: it must see every element before it can emit any. distinct() has to remember every element it has already seen, and on an ordered parallel stream it must also buffer results to preserve encounter order. In a parallel stream, each worker produces partial results that must then be merged, which can cancel out much of the parallel benefit:
// Often no faster than sequential for small and medium lists; measure before relying on it
names.parallelStream()
    .sorted()
    .collect(Collectors.toList());
// Sequential sorted is often faster for in-memory lists
names.stream()
    .sorted()
    .collect(Collectors.toList());
Assuming parallelStream() is always safe on all sources
Stream.of(...) and Collection.stream() sources are generally safe, but not all collections split well. LinkedList.parallelStream() is essentially sequential because LinkedList cannot be split efficiently:
// LinkedList: trySplit() has to walk the nodes and copy a batch into an array,
// so every split costs time linear in the batch size
List<String> linked = new LinkedList<>(names);
linked.parallelStream().map(String::toUpperCase).collect(Collectors.toList());
// Almost certainly slower than sequential
// ArrayList: O(1) split by index arithmetic
List<String> array = new ArrayList<>(names);
array.parallelStream().map(String::toUpperCase).collect(Collectors.toList());
// Can be faster for large lists with expensive per-element work
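The difference in split behaviour can be observed by performing a single trySplit() on each list type and comparing the sizes of the two parts. This is a rough sketch with invented names; the exact LinkedList split sizes are an implementation detail, so the example only prints them.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Spliterator;

class SplitBalanceDemo {
    // Returns {sizeOfSplitOffPart, sizeLeftBehind} after one trySplit().
    static long[] firstSplit(List<Integer> source) {
        Spliterator<Integer> rest = source.spliterator();
        Spliterator<Integer> prefix = rest.trySplit(); // null if it cannot split
        long prefixSize = (prefix == null) ? 0 : prefix.estimateSize();
        return new long[] { prefixSize, rest.estimateSize() };
    }

    public static void main(String[] args) {
        List<Integer> array = new ArrayList<>();
        for (int i = 0; i < 5_000; i++) array.add(i);
        List<Integer> linked = new LinkedList<>(array);

        long[] a = firstSplit(array);
        long[] l = firstSplit(linked);
        System.out.println("ArrayList:  " + a[0] + " / " + a[1]);
        System.out.println("LinkedList: " + l[0] + " / " + l[1]);
    }
}
```

The ArrayList halves are equal, while the LinkedList typically hands off a fixed-size batch and keeps the rest, which leaves the work unevenly divided.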
Not measuring before and after
Every parallel stream usage should be accompanied by a benchmark. The JMH microbenchmark framework is the standard tool:
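// pom.xml dependency (jmh-generator-annprocess, same version, is also needed
// so that the benchmark harness is generated at compile time)
// <dependency>
//     <groupId>org.openjdk.jmh</groupId>
//     <artifactId>jmh-core</artifactId>
//     <version>1.37</version>
// </dependency>
// State object injected into each benchmark method (nested in the benchmark class)
@State(Scope.Benchmark)
public static class BenchmarkState {
    @Param({"10000000"})
    public long N; // number of elements to process
}
@Benchmark
public long sequentialSum(BenchmarkState state) {
    return LongStream.range(0, state.N).map(n -> n * n % 1_000_000_007L).sum();
}
@Benchmark
public long parallelSum(BenchmarkState state) {
    return LongStream.range(0, state.N).parallel().map(n -> n * n % 1_000_000_007L).sum();
}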
Without measuring, you are guessing.
Summary
| Concept | Key point |
|---|---|
| How it works | ForkJoin splits source → process in parallel → merge results |
| Default pool | ForkJoinPool.commonPool(), size = CPU count - 1 |
| Spliterator | Enables splitting; ArrayList / arrays split best |
| Ordering | collect(toList()) preserves order; forEach does not |
| Safe use | Large + CPU-bound + stateless + no ordering side effects |
| Avoid when | Small data, I/O-bound, shared mutable state, order matters |
| Always do | Measure with JMH before and after adding .parallel() |
Part of the DevOps Monk Java tutorial series: Java 8 → Java 11 → Java 17 → Java 21