Parallel Streams: ForkJoinPool, Spliterators, and When NOT to Parallelize

Part 8 of 16

May 04, 2026 Abhay


Parallel streams are one of Java 8’s most misused features. It is tempting to add .parallel() to any slow stream pipeline, but the performance characteristics are counterintuitive: parallel can make small workloads slower, and blocking I/O inside a parallel stream can starve every other parallel stream in the JVM. This article explains the mechanics, the cases where parallel genuinely helps, and the patterns to avoid.

How Parallel Streams Work

A parallel stream splits its source into sub-sequences, processes each sub-sequence on a separate thread, and merges the results. The mechanism is the Fork/Join framework, specifically ForkJoinPool.commonPool(), a single thread pool shared by the whole JVM.

// Sequential — processes on the calling thread
List<String> seq = names.stream()
    .map(String::toUpperCase)
    .collect(Collectors.toList());

// Parallel — splits work across ForkJoinPool.commonPool()
List<String> par = names.parallelStream()
    .map(String::toUpperCase)
    .collect(Collectors.toList());

// Convert existing stream to parallel
List<String> par2 = names.stream()
    .parallel()           // switch to parallel
    .map(String::toUpperCase)
    .collect(Collectors.toList());

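Parallelism is a property of the whole pipeline, not of individual stages. If .parallel() and .sequential() are both called, the last call before the terminal operation wins and applies to every stage.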
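A quick way to confirm this is Stream.isParallel(), which reports the mode the pipeline will run in. The class and method names below are illustrative:

```java
import java.util.stream.Stream;

public class PipelineModeDemo {
    // sequential() early, parallel() late: the whole pipeline runs in parallel
    static boolean endsParallel() {
        return Stream.of("a", "b", "c")
            .sequential()
            .map(String::toUpperCase)
            .parallel()           // last mode call wins
            .isParallel();
    }

    // parallel() early, sequential() late: the whole pipeline runs sequentially
    static boolean endsSequential() {
        return Stream.of("a", "b", "c")
            .parallel()
            .map(String::toUpperCase)
            .sequential()         // last mode call wins
            .isParallel();
    }

    public static void main(String[] args) {
        System.out.println(endsParallel());   // true
        System.out.println(endsSequential()); // false
    }
}
```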

ForkJoinPool.commonPool

The common pool is shared across the entire JVM process. Its parallelism defaults to Runtime.getRuntime().availableProcessors() - 1 (at least 1): one fewer than the CPU count, because the thread that runs the terminal operation also does part of the work.

System.out.println(ForkJoinPool.commonPool().getParallelism());
// e.g., 7 on an 8-core machine

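You can change the parallelism with a system property. It is read once, when the common pool is initialised, so set it on the command line (or before any code touches the pool):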

-Djava.util.concurrent.ForkJoinPool.common.parallelism=4
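To make the split, process, merge cycle concrete, here is a hand-written sketch of the same idea using the Fork/Join API directly: a RecursiveTask that halves an array until a piece is small, sums each piece, and adds the partial sums together. It illustrates the pattern, not the streams library’s internal code; the class names and the THRESHOLD value of 10,000 are assumptions chosen for the example.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ForkJoinSum {
    // Assumed cut-off: below this many elements, summing directly beats forking
    private static final int THRESHOLD = 10_000;

    static final class SumTask extends RecursiveTask<Long> {
        private final long[] data;
        private final int from, to;   // half-open range [from, to)

        SumTask(long[] data, int from, int to) {
            this.data = data;
            this.from = from;
            this.to = to;
        }

        @Override
        protected Long compute() {
            if (to - from <= THRESHOLD) {       // small enough: process directly
                long sum = 0;
                for (int i = from; i < to; i++) sum += data[i];
                return sum;
            }
            int mid = (from + to) >>> 1;        // split
            SumTask left = new SumTask(data, from, mid);
            SumTask right = new SumTask(data, mid, to);
            left.fork();                        // schedule the left half for another worker
            long rightSum = right.compute();    // process the right half on this thread
            return left.join() + rightSum;      // merge
        }
    }

    static long sum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) data[i] = i;
        System.out.println(sum(data)); // 499999500000
    }
}
```

Calling left.fork() and then right.compute() keeps the current thread busy on one half instead of forking both halves and waiting idle.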

The Shared Pool Problem

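Because the common pool is shared, one slow parallel stream can starve all others. A blocking I/O call inside a parallel stream parks a common-pool worker for the whole duration of the call; with enough concurrent calls, every worker is blocked and unrelated parallel streams in the same JVM wait behind them: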

// DANGEROUS: blocking inside parallel stream consumes ForkJoin threads
orders.parallelStream()
    .map(order -> httpClient.fetchDetails(order.getId()))  // blocks!
    .collect(Collectors.toList());

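A common workaround is to run the stream inside a dedicated ForkJoinPool. This relies on an implementation detail rather than documented behaviour: tasks forked from a ForkJoinPool worker stay in that worker’s pool. For I/O, CompletableFuture with a dedicated executor (see "I/O-bound operations" below) is usually the cleaner fix.

ForkJoinPool customPool = new ForkJoinPool(8);
try {
    // Tasks forked by the stream run in customPool, not the common pool
    List<OrderDetail> results = customPool.submit(() ->
        orders.parallelStream()
            .map(order -> httpClient.fetchDetails(order.getId()))
            .collect(Collectors.toList())
    ).get();  // get() throws InterruptedException and ExecutionException
    // ... use results
} finally {
    customPool.shutdown();  // always release the pool's threads
}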
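The sketch below checks that behaviour empirically: it runs a parallel stream inside a dedicated pool and records the names of the threads that executed the per-element work. The class and method names are hypothetical, and the name check relies on OpenJDK’s default worker naming (common-pool workers are called "ForkJoinPool.commonPool-worker-N"), so treat it as a diagnostic rather than a guarantee.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

public class CustomPoolDemo {
    // Runs a parallel stream inside a dedicated pool and records which
    // threads executed the per-element work
    static Set<String> workerThreadNames(int parallelism) {
        Set<String> names = ConcurrentHashMap.newKeySet();
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            pool.submit(() ->
                IntStream.range(0, 10_000).parallel()
                    .forEach(i -> names.add(Thread.currentThread().getName()))
            ).join();  // join() rethrows any failure as an unchecked exception
        } finally {
            pool.shutdown();
        }
        return names;
    }

    public static void main(String[] args) {
        // With the default thread factory, names look like "ForkJoinPool-1-worker-1"
        System.out.println(workerThreadNames(4));
    }
}
```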

Spliterators

The engine behind stream splitting is Spliterator<T> — an iterator that knows how to split itself for parallel processing.

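// Simplified: the four abstract methods of java.util.Spliterator
public interface Spliterator<T> {
    boolean tryAdvance(Consumer<? super T> action);  // process the next element, if any
    Spliterator<T> trySplit();                        // hand off a prefix (ideally half), or return null
    long estimateSize();                              // estimated number of remaining elements
    int characteristics();                            // bit flags: ORDERED, SIZED, DISTINCT, etc.
}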

When a stream goes parallel:

trySplit() is called recursively, creating sub-tasks until the pieces are small enough or trySplit() returns null
Each sub-task processes its piece and produces a partial result
Partial results are merged (for collect, by the collector’s combiner)
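The recursion above is easiest to see in a custom Spliterator. This is a minimal sketch, assuming a source that is just the half-open int range [from, to): trySplit() hands off the first half as a new spliterator (a prefix, as the Spliterator contract requires for ORDERED sources) and keeps the second half, and StreamSupport.intStream(..., true) turns it into a parallel stream. RangeSpliterator is a hypothetical class written for this example; in real code, IntStream.range already does this.

```java
import java.util.Spliterator;
import java.util.function.IntConsumer;
import java.util.stream.StreamSupport;

public class RangeSpliteratorDemo {
    // Hypothetical spliterator over the half-open int range [from, to)
    static final class RangeSpliterator implements Spliterator.OfInt {
        private int from;
        private final int to;

        RangeSpliterator(int from, int to) {
            this.from = from;
            this.to = to;
        }

        @Override
        public boolean tryAdvance(IntConsumer action) {
            if (from >= to) return false;   // exhausted
            action.accept(from++);
            return true;
        }

        @Override
        public Spliterator.OfInt trySplit() {
            int remaining = to - from;
            if (remaining < 2) return null;             // too small to split
            int mid = from + remaining / 2;
            Spliterator.OfInt prefix = new RangeSpliterator(from, mid);
            from = mid;                                 // this instance keeps the suffix
            return prefix;
        }

        @Override
        public long estimateSize() {
            return to - from;                           // exact, hence SIZED
        }

        @Override
        public int characteristics() {
            return ORDERED | SIZED | SUBSIZED | IMMUTABLE | NONNULL | DISTINCT;
        }
    }

    static long parallelSum(int n) {
        return StreamSupport.intStream(new RangeSpliterator(0, n), true) // true = parallel
            .asLongStream()
            .sum();
    }

    public static void main(String[] args) {
        System.out.println(parallelSum(1_000_000)); // 499999500000
    }
}
```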

Spliterator Characteristics

Characteristics tell the framework what guarantees the data source provides, enabling optimisations:

Characteristic	Meaning	Example sources
`ORDERED`	Elements have a defined encounter order	`List`, `LinkedList`, `Arrays.stream`
`SORTED`	Elements are sorted	`TreeSet`, sorted stream
`SIZED`	`estimateSize()` is exact	`ArrayList`, `HashSet`
`DISTINCT`	No duplicate elements	`Set`
`NONNULL`	No null elements	`ConcurrentHashMap`
`IMMUTABLE`	Source cannot be structurally modified	`Arrays.stream`, `CopyOnWriteArrayList`
`SUBSIZED`	Sub-spliterators are also `SIZED`	Arrays

ArrayList is ORDERED + SIZED + SUBSIZED, so it splits evenly in O(1). HashSet is SIZED + DISTINCT but neither ORDERED nor SUBSIZED: it has no defined encounter order, and the sizes of its splits are only estimates.
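You can query these flags directly with Spliterator.hasCharacteristics. This small sketch (class and method names are illustrative) prints the flags discussed above for an ArrayList, a HashSet and a TreeSet; the values in the comments follow the JDK documentation for those collections’ spliterators.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Spliterator;
import java.util.TreeSet;

public class CharacteristicsDemo {
    // True if the collection's top-level spliterator reports the given flag
    static boolean has(Collection<?> source, int flag) {
        return source.spliterator().hasCharacteristics(flag);
    }

    public static void main(String[] args) {
        Collection<Integer> list = new ArrayList<>(Arrays.asList(3, 1, 2));
        Collection<Integer> hashSet = new HashSet<>(list);
        Collection<Integer> treeSet = new TreeSet<>(list);

        System.out.println("ArrayList ORDERED:  " + has(list, Spliterator.ORDERED));     // true
        System.out.println("ArrayList SUBSIZED: " + has(list, Spliterator.SUBSIZED));    // true
        System.out.println("HashSet ORDERED:    " + has(hashSet, Spliterator.ORDERED));  // false
        System.out.println("HashSet SUBSIZED:   " + has(hashSet, Spliterator.SUBSIZED)); // false
        System.out.println("TreeSet SORTED:     " + has(treeSet, Spliterator.SORTED));   // true
    }
}
```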

When Parallel Is Actually Faster

Parallel streams have overhead: task splitting, thread coordination, result merging. They are only faster when the computational savings outweigh this overhead.

Conditions for parallel wins

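Large data set: 10,000+ elements is a common rule of thumb, but what matters is element count multiplied by the cost per element; for very cheap operations the break-even point is much higher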
Computationally expensive per-element operation — CPU-bound work that takes non-trivial time
Splittable source — ArrayList, arrays, IntStream.range split evenly; LinkedList does not
No ordering requirement — or the ordering is cheap to restore
No shared mutable state — thread-safe operations only

Quick benchmark: CPU-bound work

long N = 10_000_000L;

// Sequential
long start = System.nanoTime();
long sumSeq = LongStream.range(0, N)
    .map(n -> n * n % 1000000007L)
    .sum();
long seqMs = (System.nanoTime() - start) / 1_000_000;

// Parallel
start = System.nanoTime();
long sumPar = LongStream.range(0, N)
    .parallel()
    .map(n -> n * n % 1000000007L)
    .sum();
long parMs = (System.nanoTime() - start) / 1_000_000;

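System.out.printf("Sequential: %dms, Parallel: %dms, same result: %b%n",
    seqMs, parMs, sumSeq == sumPar);
// Speedup depends on core count and JIT warm-up; a single cold run like this
// is only indicative, so confirm with JMH before drawing conclusions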

Sources with good split behaviour

Source	Parallelism quality	Reason
`ArrayList`	Excellent	O(1) split, exact size
`int[]` / `long[]`	Excellent	O(1) split, exact size
`IntStream.range`	Excellent	O(1) arithmetic split
`TreeSet` / `TreeMap`	Good	Balanced tree splits reasonably
`HashSet` / `HashMap`	Moderate	Splits by bucket, uneven possible
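`LinkedList`	Poor	Must walk the list to split; pieces are uneven
`Files.lines()`	Poor on Java 8, better on Java 9+	Java 8 reads lines sequentially; Java 9+ can split the file for UTF-8, US-ASCII and ISO-8859-1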
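The difference between the top and bottom rows shows up in the very first trySplit() call. This sketch (hypothetical class name) asks each list for its first split and reports the size of the piece handed off. An ArrayList hands off exactly half; in OpenJDK, the LinkedList spliterator instead copies a fixed-size batch of elements into an array, so the first piece is much smaller than half and the rest must still be walked node by node.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class SplitSizeDemo {
    // Size of the piece the first trySplit() hands off, or -1 if it refuses to split
    static long firstSplitSize(List<Integer> list) {
        Spliterator<Integer> prefix = list.spliterator().trySplit();
        return prefix == null ? -1 : prefix.estimateSize();
    }

    public static void main(String[] args) {
        List<Integer> values = IntStream.range(0, 10_000).boxed().collect(Collectors.toList());
        System.out.println("ArrayList:  " + firstSplitSize(new ArrayList<>(values))); // 5000
        System.out.println("LinkedList: " + firstSplitSize(new LinkedList<>(values)));
    }
}
```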

When NOT to Use Parallel Streams

Small data sets

// Bad: 10 elements — overhead far exceeds savings
List<Integer> small = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
small.parallelStream().map(n -> n * 2).collect(Collectors.toList());

// Use sequential
small.stream().map(n -> n * 2).collect(Collectors.toList());

Order matters and the source is ordered

// forEach on a parallel stream does not respect encounter order:
// the output order can change from run to run
List<String> ordered = new ArrayList<>(names);
ordered.parallelStream()
    .forEach(System.out::println); // order not guaranteed

// Fix: forEachOrdered prints in list order, but the action then runs one element
// at a time, so little benefit remains unless earlier stages are expensive
ordered.parallelStream()
    .forEachOrdered(System.out::println);
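The difference is easy to observe by recording elements instead of printing them. In this sketch (names are illustrative), both methods add 0..n-1 to a list from a parallel stream: forEachOrdered always produces the original order, while forEach produces the same elements in an order that may change between runs. The synchronized wrapper is there because forEach runs the action concurrently on several threads.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.IntStream;

public class ForEachOrderDemo {
    // forEach: the action runs concurrently and ignores encounter order,
    // so the target list must be thread-safe
    static List<Integer> withForEach(int n) {
        List<Integer> out = Collections.synchronizedList(new ArrayList<>());
        IntStream.range(0, n).boxed().parallel().forEach(out::add);
        return out;
    }

    // forEachOrdered: the action sees elements in encounter order
    static List<Integer> withForEachOrdered(int n) {
        List<Integer> out = Collections.synchronizedList(new ArrayList<>());
        IntStream.range(0, n).boxed().parallel().forEachOrdered(out::add);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(withForEach(20));        // same numbers, order varies
        System.out.println(withForEachOrdered(20)); // 0 through 19, in order
    }
}
```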

Shared mutable state

// BROKEN: ArrayList.add is not thread-safe
List<Integer> results = new ArrayList<>();
numbers.parallelStream()
    .filter(n -> n > 5)
    .forEach(results::add);  // race condition!

// Fix: use collect; each sub-task fills its own container and the
// containers are merged afterwards, so no list is shared between threads
List<Integer> filtered = numbers.parallelStream()
    .filter(n -> n > 5)
    .collect(Collectors.toList());
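Why is collect safe when forEach(results::add) is not? The three-argument form of collect makes the mechanism visible. This sketch (the method name greaterThanFive is illustrative) is equivalent to the Collectors.toList() version: the supplier creates a separate list for each sub-task, the accumulator only ever touches that sub-task’s own list, and the combiner merges partial lists in encounter order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class CollectDemo {
    // Same result as collect(Collectors.toList()), spelled out with the
    // three-argument collect so the per-sub-task containers are visible
    static List<Integer> greaterThanFive(List<Integer> numbers) {
        return numbers.parallelStream()
            .filter(n -> n > 5)
            .collect(ArrayList::new,     // supplier: a fresh list for each sub-task
                     ArrayList::add,     // accumulator: adds to that sub-task's own list
                     ArrayList::addAll); // combiner: appends the right partial list to the left
    }

    public static void main(String[] args) {
        List<Integer> numbers = IntStream.range(0, 100).boxed().collect(Collectors.toList());
        List<Integer> result = greaterThanFive(numbers);
        System.out.println(result.size() + " elements, first " + result.get(0)); // 94 elements, first 6
    }
}
```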

I/O-bound operations (blocking)

Use CompletableFuture with a custom thread pool instead of parallel streams for HTTP calls, database queries, or file I/O.
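A minimal sketch of that pattern, assuming a hypothetical blocking fetchDetails call (simulated here with Thread.sleep) and a fixed pool sized for the number of calls you want in flight. Nothing here touches the common pool.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

public class BlockingFanOut {
    // Hypothetical stand-in for a blocking HTTP or database call
    static String fetchDetails(int orderId) {
        try {
            Thread.sleep(50); // simulate network latency
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return "details-" + orderId;
    }

    static List<String> fetchAll(List<Integer> orderIds, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Pass 1: start every call on the dedicated pool
            List<CompletableFuture<String>> futures = orderIds.stream()
                .map(id -> CompletableFuture.supplyAsync(() -> fetchDetails(id), pool))
                .collect(Collectors.toList());
            // Pass 2: wait for the results, in the original order
            return futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchAll(Arrays.asList(1, 2, 3, 4), 4));
        // [details-1, details-2, details-3, details-4]
    }
}
```

Note the two separate stream passes: the first starts every call, the second waits for the results. Chaining supplyAsync and join in a single map would wait for each call before starting the next, because a sequential stream pushes one element at a time through the whole pipeline.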

Short pipelines

If the pipeline has only one or two operations and the element count is modest, the overhead of splitting and merging will dominate.

Ordering Guarantees

Stream type	`forEach`	`collect(toList())`	`findFirst`
Sequential, ordered	✓ ordered	✓ ordered	first element
Parallel, ordered	✗ any order	✓ ordered (re-ordered)	first element (expensive)
Parallel, unordered	✗ any order	✗ any order	any element (fast)

For parallel streams on ordered sources such as lists, collect(Collectors.toList()) preserves encounter order; the Stream specification guarantees this even for parallel execution. Of the operations in the table, only forEach gives up encounter order.
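The following sketch (illustrative names) checks these guarantees on a parallel stream over a list of 0..9,999: collect keeps list order, findFirst always returns the earliest match, and findAny returns some match, not necessarily the earliest.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class OrderingDemo {
    // collect(toList()) keeps encounter order even in parallel
    static List<Integer> doubled(List<Integer> source) {
        return source.parallelStream().map(n -> n * 2).collect(Collectors.toList());
    }

    // findFirst respects encounter order: always the earliest match
    static Optional<Integer> firstOver(List<Integer> source, int limit) {
        return source.parallelStream().filter(n -> n > limit).findFirst();
    }

    // findAny may return whichever match a worker happens to find first
    static Optional<Integer> anyOver(List<Integer> source, int limit) {
        return source.parallelStream().filter(n -> n > limit).findAny();
    }

    public static void main(String[] args) {
        List<Integer> source = IntStream.range(0, 10_000).boxed().collect(Collectors.toList());
        System.out.println(doubled(source).subList(0, 5)); // [0, 2, 4, 6, 8]
        System.out.println(firstOver(source, 500));        // Optional[501]
        System.out.println(anyOver(source, 500));          // some value above 500
    }
}
```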

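To tell the stream “I don’t care about order”, call unordered(). This mainly helps order-sensitive operations such as limit(), skip(), distinct() and findFirst(), which otherwise need extra buffering or coordination in parallel:

names.parallelStream()
    .unordered()  // drops the ORDERED constraint
    .filter(s -> s.length() > 3)
    .collect(Collectors.toList()); // result may be in any order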
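Here is a small sketch of where unordered() can pay off, using limit() (method names are illustrative). On an ordered parallel stream, limit(10) must return exactly the first ten matches; after unordered(), any ten matches satisfy the contract, which gives the library more freedom. Whether that is measurably faster depends on the workload, so benchmark it.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class UnorderedLimitDemo {
    // Ordered: must return exactly the first ten even numbers
    static List<Integer> firstTenEvens(int n) {
        return IntStream.range(0, n).boxed().parallel()
            .filter(i -> i % 2 == 0)
            .limit(10)
            .collect(Collectors.toList());
    }

    // Unordered: any ten even numbers are acceptable, so limit() does not
    // have to track which matches come earliest in the source
    static List<Integer> anyTenEvens(int n) {
        return IntStream.range(0, n).boxed().parallel()
            .unordered()
            .filter(i -> i % 2 == 0)
            .limit(10)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstTenEvens(1_000_000)); // [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
        System.out.println(anyTenEvens(1_000_000));   // ten even numbers, not necessarily the smallest
    }
}
```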

Practical Guide

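// Template for deciding stream mode: the three flags are judgments you make
// about your own pipeline, and 10,000 is a rule of thumb, not a hard limit
static <T> Stream<T> chooseStream(Collection<T> source,
                                  boolean operationIsCpuBound,      // not I/O
                                  boolean needsOrderedSideEffects,  // e.g. System.out::println in forEach
                                  boolean operationIsThreadSafe) {  // no shared mutable state
    Stream<T> stream = source.stream();
    if (source.size() > 10_000          // large enough
            && operationIsCpuBound
            && !needsOrderedSideEffects
            && operationIsThreadSafe) {
        stream = stream.parallel();
    }
    return stream;
}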

Common patterns that are safe to parallelise

// CPU-heavy computation, result in a list
List<Result> results = items.parallelStream()
    .map(item -> expensiveCompute(item))
    .collect(Collectors.toList());

// Sum / reduce over a large numeric range (large primitive arrays behave similarly)
long total = LongStream.range(0, 1_000_000).parallel().sum();

// Filtering large collections
List<Order> bigOrders = orders.parallelStream()
    .filter(o -> o.getTotal() > 10_000)
    .collect(Collectors.toList());

Common Mistakes

Combining parallel streams with sorted() or distinct()

sorted() and distinct() are stateful operations: they must see every element before they can emit output, so they act as a barrier in the pipeline. In a parallel stream the elements are buffered, sorted or de-duplicated, and merged, and distinct() on an ordered stream must also preserve encounter order. Unless the data set is large, this coordination cancels out most of the parallel benefit:

// Often no faster than sequential: sorted() buffers every element first
names.parallelStream()
    .sorted()
    .collect(Collectors.toList());

// For moderate in-memory lists, sequential sorted() is usually the better choice
names.stream()
    .sorted()
    .collect(Collectors.toList());

Assuming parallelStream() is always safe on all sources

parallelStream() gives correct results on any Collection, but not every collection splits well. LinkedList.parallelStream() gains little because a linked list cannot be split without walking it:

// LinkedList: trySplit() must walk the list, copying a batch of elements into an array
List<String> linked = new LinkedList<>(names);
linked.parallelStream().map(String::toUpperCase).collect(Collectors.toList());
// Usually gains little over sequential

// ArrayList: O(1) split at the midpoint
List<String> array = new ArrayList<>(names);
array.parallelStream().map(String::toUpperCase).collect(Collectors.toList());
// Can be faster for large lists with expensive per-element work

Not measuring before and after

Every parallel stream usage should be accompanied by a benchmark. The JMH microbenchmark framework is the standard tool:

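// pom.xml dependencies (the annotation processor generates the benchmark harness)
// <dependency>
//   <groupId>org.openjdk.jmh</groupId>
//   <artifactId>jmh-core</artifactId>
//   <version>1.37</version>
// </dependency>
// <dependency>
//   <groupId>org.openjdk.jmh</groupId>
//   <artifactId>jmh-generator-annprocess</artifactId>
//   <version>1.37</version>
// </dependency>

import java.util.stream.LongStream;
import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
public class ParallelSumBenchmark {
    @Param({"10000", "10000000"})   // compare a small and a large input
    long n;

    @Benchmark
    public long sequentialSum() {
        return LongStream.range(0, n).map(x -> x * x % 1_000_000_007L).sum();
    }

    @Benchmark
    public long parallelSum() {
        return LongStream.range(0, n).parallel().map(x -> x * x % 1_000_000_007L).sum();
    }
}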

Without measuring, you are guessing.

Summary

Concept	Key point
How it works	ForkJoin splits source → process in parallel → merge results
Default pool	`ForkJoinPool.commonPool()`, size = CPU count - 1
Spliterator	Enables splitting; `ArrayList` / arrays split best
Ordering	`collect(toList())` preserves order; `forEach` does not
Safe use	Large + CPU-bound + stateless + no ordering side effects
Avoid when	Small data, I/O-bound work, shared mutable state, order-dependent side effects
Always do	Measure with JMH before and after adding `.parallel()`


Part of the DevOps Monk Java tutorial series: Java 8 → Java 11 → Java 17 → Java 21