Part 6 of 16

Streams API: Lazy Pipelines and the Functional Data Model

The Problem Streams Solve

Java loops are not wrong — they are often the right tool. But when a computation is a multi-step pipeline of filter → transform → aggregate, a loop hides the structure of the computation in a tangle of temporary variables and mutated collections. The Streams API makes the structure explicit, lets the JVM optimise it, and composes naturally with lambdas and method references.

Consider filtering a list of orders to find the names of active premium customers who spent over $500, sorted alphabetically:

// Java 7
List<String> result = new ArrayList<>();
for (Order order : orders) {
    if (order.isActive() && order.isPremium() && order.getTotal() > 500) {
        result.add(order.getCustomerName());
    }
}
Collections.sort(result);

This code is correct but reveals nothing about the structure of the computation at a glance. You have to read every line to understand what’s happening.

// Java 8 Streams
List<String> result = orders.stream()
    .filter(Order::isActive)
    .filter(Order::isPremium)
    .filter(o -> o.getTotal() > 500)
    .map(Order::getCustomerName)
    .sorted()
    .collect(Collectors.toList());

The pipeline reads top-to-bottom like a specification: filter active, filter premium, filter by amount, extract names, sort, collect. The structure of the computation is immediately visible.
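To check that the two versions really compute the same thing, here is a minimal runnable sketch. The Order class is a hypothetical stand-in with only the accessors the example uses, and the sample orders are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

public class OrderPipeline {

    // Hypothetical minimal Order type: only the accessors used in the article's example
    static final class Order {
        private final String customerName;
        private final boolean active;
        private final boolean premium;
        private final double total;

        Order(String customerName, boolean active, boolean premium, double total) {
            this.customerName = customerName;
            this.active = active;
            this.premium = premium;
            this.total = total;
        }

        String getCustomerName() { return customerName; }
        boolean isActive()       { return active; }
        boolean isPremium()      { return premium; }
        double getTotal()        { return total; }
    }

    // Java 7 style: explicit loop, temporary list, in-place sort
    static List<String> withLoop(List<Order> orders) {
        List<String> result = new ArrayList<>();
        for (Order order : orders) {
            if (order.isActive() && order.isPremium() && order.getTotal() > 500) {
                result.add(order.getCustomerName());
            }
        }
        Collections.sort(result);
        return result;
    }

    // Java 8 style: the same computation as a declarative pipeline
    static List<String> withStream(List<Order> orders) {
        return orders.stream()
            .filter(Order::isActive)
            .filter(Order::isPremium)
            .filter(o -> o.getTotal() > 500)
            .map(Order::getCustomerName)
            .sorted()
            .collect(Collectors.toList());
    }

    // Invented sample data
    static List<Order> sampleOrders() {
        return Arrays.asList(
            new Order("Zoe",  true,  true,  750.0),   // kept
            new Order("Adam", true,  true,  620.0),   // kept
            new Order("Carl", true,  false, 900.0),   // not premium
            new Order("Beth", false, true,  800.0),   // not active
            new Order("Dina", true,  true,  499.0));  // total too small
    }

    public static void main(String[] args) {
        List<Order> orders = sampleOrders();
        System.out.println(withLoop(orders));   // [Adam, Zoe]
        System.out.println(withStream(orders)); // [Adam, Zoe]
    }
}
```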


Streams vs Collections

A Stream is not just a fancy Iterable. Five key differences:

Property | Collection | Stream
--- | --- | ---
Storage | Holds all elements in memory | Holds no elements; moves data through a pipeline
Nature | Eagerly computed; elements exist before you iterate | Lazily computed; elements are produced on demand
Reuse | Iterate as many times as you like | Single-use; consumed by the terminal operation
Modification | You can add, remove, or update elements | Never modifies the source; each operation produces a new stream
Size | Always finite | Can be infinite (e.g. Stream.iterate, Stream.generate)

The practical consequence: you cannot re-traverse a stream. If you call two terminal operations on the same stream, the second throws IllegalStateException. Always go back to the source collection for a new pipeline.


What Is a Stream?

A Stream<T> is a sequence of elements that supports sequential and parallel aggregate operations. Key properties:

  • Not a data structure — a stream does not hold data; it moves data through a pipeline
  • Lazy — intermediate operations are not executed until a terminal operation is called
  • Single-use — once a terminal operation is called, the stream is consumed and cannot be reused
  • Non-destructive — stream operations never modify the source collection
  • Optionally parallel — swap .stream() for .parallelStream() and the pipeline runs in parallel

Creating Streams

From Collections

List<String> list = Arrays.asList("a", "b", "c");
Stream<String> stream = list.stream();
Stream<String> parallel = list.parallelStream();

From Arrays

String[] arr = {"x", "y", "z"};
Stream<String> stream = Arrays.stream(arr);

// Partial array
Stream<String> partial = Arrays.stream(arr, 1, 3); // "y", "z"

From Values (Stream.of)

Stream<String> stream = Stream.of("a", "b", "c");
Stream<String> empty  = Stream.empty();
Stream<String> single = Stream.of("one");

From a Range (IntStream, LongStream)

IntStream range = IntStream.range(0, 10);      // 0..9
IntStream rangeClosed = IntStream.rangeClosed(1, 5); // 1..5
LongStream longRange = LongStream.range(0L, 100L);

Infinite Streams

// Stream.iterate: seed + unary operator
Stream<Integer> naturals = Stream.iterate(0, n -> n + 1);
// 0, 1, 2, 3, 4, ...

// Stream.generate: supplier
Stream<Double> randoms = Stream.generate(Math::random);

// Always limit infinite streams before collecting
naturals.limit(10).forEach(System.out::println);

Fibonacci sequence with Stream.iterate — pass a two-element array as the seed so each step can see both the previous and current value:

Stream.iterate(new int[] { 0, 1 }, a -> new int[] { a[1], a[0] + a[1] })
      .limit(10)
      .map(a -> a[0])
      .forEach(n -> System.out.print(n + " "));
// Output: 0 1 1 2 3 5 8 13 21 34

Each step maps the pair {a, b} to {b, a + b}, so index 0 always holds the current Fibonacci number and index 1 the next one. The map(a -> a[0]) extracts just the leading value for output.
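A small sketch that collects the same sequence into a list, plus a bounded variant. It assumes Java 9 or later for the three-argument Stream.iterate(seed, hasNext, next), which stops when the predicate fails instead of counting elements with limit. Note that int overflows after the 46th Fibonacci number.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FibonacciStream {

    // Same two-element-array trick as above, bounded with limit(n)
    static List<Integer> firstN(int n) {
        return Stream.iterate(new int[] { 0, 1 }, a -> new int[] { a[1], a[0] + a[1] })
            .limit(n)
            .map(a -> a[0])
            .collect(Collectors.toList());
    }

    // Java 9+: iterate while the current value is below the bound
    static List<Integer> below(int bound) {
        return Stream.iterate(new int[] { 0, 1 },
                              a -> a[0] < bound,                     // hasNext: checked before each element
                              a -> new int[] { a[1], a[0] + a[1] })  // next
            .map(a -> a[0])
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstN(10));  // [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
        System.out.println(below(100));  // [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
    }
}
```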

From Files and I/O

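// Lines of a file (Files.lines is Java 8+; the enclosing method must handle IOException)
try (Stream<String> lines = Files.lines(Paths.get("data.txt"))) {
    lines.filter(l -> !l.trim().isEmpty())   // Java 11+ alternative: !l.isBlank()
         .forEach(System.out::println);
}
// IMPORTANT: always use try-with-resources; streams backed by I/O hold an open file handle until closed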

// Files in a directory
try (Stream<Path> paths = Files.list(Paths.get("."))) {
    paths.filter(Files::isRegularFile)
         .forEach(System.out::println);
}
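A self-contained sketch of the Files.lines pattern that creates its own temporary file, so it runs without a data.txt on disk. The helper names and sample content are hypothetical; IOException is wrapped in UncheckedIOException only so the demo method can be called without a throws clause.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

public class FileLinesDemo {

    // try-with-resources closes the file handle behind the stream, even if the pipeline throws
    static long countNonBlankLines(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file)) {   // decodes as UTF-8
            return lines.filter(l -> !l.trim().isEmpty())  // Java 8-compatible blank check
                        .count();
        }
    }

    // Writes the lines to a temporary file, counts the non-blank ones, then deletes the file
    static long countInTempFile(List<String> content) {
        try {
            Path tmp = Files.createTempFile("streams-demo", ".txt");
            try {
                Files.write(tmp, content);                 // UTF-8, one element per line
                return countNonBlankLines(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countInTempFile(Arrays.asList("first", "", "   ", "second"))); // 2
    }
}
```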

From String Characters

// IntStream of char values
"Hello".chars().forEach(c -> System.out.print((char) c));

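Because chars() produces an IntStream rather than a Stream<Character>, predicates receive int code units and you cast back to char only when you need a character again. A short sketch with two hypothetical helpers:

```java
import java.util.Locale;
import java.util.stream.Collectors;

public class CharStreamDemo {

    // chars() yields an IntStream of UTF-16 code units, so the predicate receives ints
    static long countVowels(String text) {
        return text.toLowerCase(Locale.ROOT)
                   .chars()
                   .filter(c -> "aeiou".indexOf(c) >= 0)   // String.indexOf(int) takes the code unit directly
                   .count();
    }

    // mapToObj converts back to objects; the (char) cast restores the character
    static String distinctLetters(String text) {
        return text.chars()
                   .filter(Character::isLetter)             // resolves to Character.isLetter(int)
                   .distinct()                              // first occurrences, in encounter order
                   .mapToObj(c -> String.valueOf((char) c))
                   .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        System.out.println(countVowels("Hello World"));      // 3
        System.out.println(distinctLetters("Hello, World")); // HeloWrd
    }
}
```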
The Pipeline Model

A stream pipeline has three parts:

source → [intermediate ops]* → terminal op

  1. Source: a collection, array, or generator
  2. Intermediate operations: transform the stream; they return a new Stream and are lazy
  3. Terminal operation: produces a result or side effect; it triggers execution of the entire pipeline

List<String> result = names.stream()           // 1. source
    .filter(s -> s.length() > 3)               // 2. intermediate
    .map(String::toUpperCase)                  // 2. intermediate
    .sorted()                                  // 2. intermediate
    .collect(Collectors.toList());             // 3. terminal — triggers execution

Without the terminal operation, nothing runs. This is the key insight:

List<String> names = Arrays.asList("Alice", "Bob", "Charlie");

Stream<String> pipeline = names.stream()
    .filter(s -> { System.out.println("filter: " + s); return s.length() > 3; })
    .map(s -> { System.out.println("map: " + s); return s.toUpperCase(); });

// Nothing printed yet — pipeline not started
System.out.println("Before terminal");

List<String> result = pipeline.collect(Collectors.toList()); // NOW it runs

Output:

Before terminal
filter: Alice
map: Alice
filter: Bob
filter: Charlie
map: Charlie

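Note the interleaving: each element travels through every stage before the next element starts, rather than the whole list being filtered first and then mapped. "Bob" fails the filter, so map never sees it. Stateful operations such as sorted() are the exception, because they must see every element before passing any on.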


Intermediate Operations

filter — keep elements matching a Predicate

stream.filter(s -> s.startsWith("A"))
stream.filter(Predicate.not(String::isEmpty))  // Java 11+

map — transform elements with a Function

stream.map(String::toUpperCase)
stream.map(s -> s.length())  // Stream<String> → Stream<Integer>

mapToInt / mapToLong / mapToDouble

Avoid boxing overhead when mapping to primitives:

// Creates IntStream, not Stream<Integer> — no boxing
IntStream lengths = names.stream().mapToInt(String::length);
int totalLength = lengths.sum();

// Box back to object stream if needed
Stream<Integer> boxed = IntStream.range(0, 10).boxed();

flatMap — flatten nested streams

// Each order has a list of items
List<Item> allItems = orders.stream()
    .flatMap(order -> order.getItems().stream())
    .collect(Collectors.toList());

// Split sentences into words
List<String> words = sentences.stream()
    .flatMap(s -> Arrays.stream(s.split(" ")))
    .distinct()
    .collect(Collectors.toList());

flatMap maps each element to a stream, then flattens all those streams into one.
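One pitfall in the word-splitting example: split(" ") produces an empty "" token wherever two spaces appear in a row. A sketch comparing it with a split on the regex "\\s+"; the method names and sample sentences are illustrative only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class FlatMapWords {

    // Naive split on a single space: "lazy  dog" (two spaces) yields an empty "" token
    static List<String> naiveWords(List<String> sentences) {
        return sentences.stream()
            .flatMap(s -> Arrays.stream(s.split(" ")))
            .distinct()
            .collect(Collectors.toList());
    }

    // Splitting on "\\s+" treats any whitespace run as one separator
    static List<String> distinctWords(List<String> sentences) {
        return sentences.stream()
            .flatMap(s -> Arrays.stream(s.split("\\s+")))  // one word stream per sentence, flattened into one
            .filter(w -> !w.isEmpty())                      // leading whitespace still produces one "" token
            .distinct()                                     // keeps first occurrences in encounter order
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList("the quick fox", "the lazy  dog");
        System.out.println(naiveWords(sentences));    // [the, quick, fox, lazy, , dog]
        System.out.println(distinctWords(sentences)); // [the, quick, fox, lazy, dog]
    }
}
```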

distinct — remove duplicates

stream.distinct()  // uses equals/hashCode

sorted — sort elements

stream.sorted()                                    // natural order (Comparable)
stream.sorted(Comparator.reverseOrder())           // reverse natural
stream.sorted(Comparator.comparing(Person::getAge)) // by field
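Comparators compose, which is usually how sorting by more than one field is expressed. The sketch below assumes a hypothetical Person class with getName() and getAge(); it sorts by age with name as a tie-breaker, then by age descending with reversed(). Because sorted() is stable for ordered streams, ties in the second case keep their encounter order.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class SortingDemo {

    // Hypothetical Person type matching the Person::getAge reference above
    static final class Person {
        private final String name;
        private final int age;
        Person(String name, int age) { this.name = name; this.age = age; }
        String getName() { return name; }
        int getAge()     { return age; }
    }

    static List<Person> sample() {
        return Arrays.asList(new Person("Cid", 30), new Person("Dee", 25),
                             new Person("Ann", 30), new Person("Bob", 25));
    }

    // Age ascending, ties broken by name
    static List<String> byAgeThenName(List<Person> people) {
        return people.stream()
            .sorted(Comparator.comparingInt(Person::getAge)
                              .thenComparing(Person::getName))
            .map(Person::getName)
            .collect(Collectors.toList());
    }

    // Oldest first; reversed() flips the whole comparator, ties keep encounter order
    static List<String> oldestFirst(List<Person> people) {
        return people.stream()
            .sorted(Comparator.comparingInt(Person::getAge).reversed())
            .map(Person::getName)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(byAgeThenName(sample())); // [Bob, Dee, Ann, Cid]
        System.out.println(oldestFirst(sample()));   // [Cid, Ann, Dee, Bob]
    }
}
```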

limit — take the first N elements

stream.limit(5)

skip — skip the first N elements

stream.skip(10)  // useful for pagination
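Combining skip and limit gives simple in-memory pagination. A minimal sketch, assuming zero-based page numbers; the page helper is hypothetical, not a library method.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class Pagination {

    // Zero-based page number: skip the earlier pages, then take one page's worth
    static <T> List<T> page(List<T> items, int pageNumber, int pageSize) {
        if (pageNumber < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageNumber >= 0 and pageSize > 0 required");
        }
        return items.stream()
            .skip((long) pageNumber * pageSize)  // long arithmetic avoids int overflow
            .limit(pageSize)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> items = IntStream.rangeClosed(1, 10).boxed().collect(Collectors.toList());
        System.out.println(page(items, 0, 4)); // [1, 2, 3, 4]
        System.out.println(page(items, 2, 4)); // [9, 10]
        System.out.println(page(items, 3, 4)); // []
    }
}
```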

peek — inspect elements without consuming (debugging)

stream.peek(s -> System.out.println("Before map: " + s))
      .map(String::toUpperCase)
      .peek(s -> System.out.println("After map: " + s))

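Use peek only for debugging. Do not rely on it for side effects in production code: it runs only for elements that actually reach that stage, so short-circuiting operations skip it for unprocessed elements; parallel streams call it from arbitrary threads in no guaranteed order; and since Java 9, count() may skip the pipeline entirely when the size is known from the source.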


Terminal Operations

collect — accumulate into a collection

The most common terminal operation. See Article 7 for the full Collectors guide.

List<String> list   = names.stream().collect(Collectors.toList());
Set<String> set     = names.stream().collect(Collectors.toSet());
String joined       = names.stream().collect(Collectors.joining(", "));  // each line builds a fresh stream

forEach — execute a Consumer for each element

stream.forEach(System.out::println);

Order is not guaranteed for parallel streams. Use forEachOrdered if order matters.

count

long count = stream.filter(s -> s.length() > 3).count();

reduce — fold elements into one value

One-argument form (accumulator only; returns Optional because the stream may be empty) and two-argument form (identity + accumulator):

Optional<Integer> sum = numbers.stream().reduce((a, b) -> a + b);
int sumWithIdentity    = numbers.stream().reduce(0, Integer::sum);

Three-argument form (identity + accumulator + combiner), required whenever the result type differs from the stream's element type; the combiner is what lets the reduction run in parallel:

// Sum the lengths of all strings — parallel-safe
int totalLength = Stream.of("Java", "Streams", "API")
    .parallel()
    .reduce(
        0,                           // identity: starting value for each thread's partition
        (sum, s) -> sum + s.length(), // accumulator: fold one string into the running total
        Integer::sum                  // combiner: merge two partial totals from different threads
    );
// totalLength == 14  (4 + 7 + 3)

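The combiner merges partial results computed on different threads, so it matters only in parallel execution; in the OpenJDK implementation a sequential stream never calls it, although the contract still requires it to be compatible with the accumulator. The two-argument form cannot perform this reduction directly: its identity and result must have the stream's element type, so reducing a Stream<String> to an integer with it does not compile, whether the stream is sequential or parallel.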
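A runnable version of the three-argument reduce, run both sequentially and in parallel, alongside the mapToInt alternative that is often easier to read. The class and method names are illustrative.

```java
import java.util.stream.Stream;

public class ReduceDemo {

    // Three-argument reduce: the result type (Integer) differs from the element type (String)
    static int totalLength(boolean parallel) {
        Stream<String> words = Stream.of("Java", "Streams", "API");
        if (parallel) {
            words = words.parallel();
        }
        return words.reduce(
            0,                              // identity: must be neutral for the combiner (0 + x == x)
            (sum, s) -> sum + s.length(),   // accumulator: folds one String into a partial total
            Integer::sum);                  // combiner: merges two partial totals
    }

    // Often clearer: map to a primitive stream first, then use the built-in sum()
    static int totalLengthViaMapToInt() {
        return Stream.of("Java", "Streams", "API")
            .mapToInt(String::length)
            .sum();
    }

    public static void main(String[] args) {
        System.out.println(totalLength(false));        // 14
        System.out.println(totalLength(true));         // 14
        System.out.println(totalLengthViaMapToInt());  // 14
    }
}
```

The identity must be neutral for the combiner: starting each partition at 1 instead of 0, for instance, would inflate the parallel result by the number of partitions.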

findFirst / findAny

Optional<String> first = stream.filter(s -> s.startsWith("A")).findFirst();
Optional<String> any   = parallelStream.filter(s -> s.startsWith("A")).findAny();

findAny can be faster than findFirst in parallel pipelines because it does not have to respect encounter order; use it when you don't care which matching element you get.

anyMatch / allMatch / noneMatch

Short-circuit terminal operations:

boolean hasLong   = names.stream().anyMatch(s -> s.length() > 10);
boolean allShort  = names.stream().allMatch(s -> s.length() < 20);
boolean noneEmpty = names.stream().noneMatch(String::isEmpty);

anyMatch and noneMatch stop as soon as they find a match; allMatch stops as soon as it finds a non-match.
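Two behaviours worth verifying: what the match operations return on an empty stream, and how few elements allMatch actually inspects. The counter below is a demonstration-only side effect; the helper names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

public class MatchDemo {

    // On an empty stream: anyMatch is false, allMatch and noneMatch are (vacuously) true
    static boolean anyOnEmpty()  { return Stream.<String>empty().anyMatch(s -> true); }
    static boolean allOnEmpty()  { return Stream.<String>empty().allMatch(s -> false); }
    static boolean noneOnEmpty() { return Stream.<String>empty().noneMatch(s -> true); }

    // Counts how many elements allMatch evaluates before it short-circuits
    static int inspectedByAllMatch(List<Integer> numbers, int bound) {
        AtomicInteger inspected = new AtomicInteger();   // demonstration-only side effect
        numbers.stream().allMatch(n -> {
            inspected.incrementAndGet();
            return n < bound;
        });
        return inspected.get();
    }

    public static void main(String[] args) {
        System.out.println(anyOnEmpty());   // false
        System.out.println(allOnEmpty());   // true
        System.out.println(noneOnEmpty());  // true
        // Stops at 50, the first element that fails the predicate
        System.out.println(inspectedByAllMatch(Arrays.asList(1, 2, 50, 3, 4), 10)); // 3
    }
}
```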

min / max

Optional<String> shortest = names.stream().min(Comparator.comparingInt(String::length));
Optional<String> longest  = names.stream().max(Comparator.comparingInt(String::length));

toArray

Object[] arr     = names.stream().toArray();
String[] strArr  = names.stream().toArray(String[]::new);  // array constructor reference gives a typed array

sum / average / summaryStatistics (primitive streams)

int total = numbers.stream().mapToInt(Integer::intValue).sum();
OptionalDouble avg = numbers.stream().mapToInt(Integer::intValue).average();

IntSummaryStatistics stats = numbers.stream()
    .mapToInt(Integer::intValue)
    .summaryStatistics();
// stats.getCount(), getSum(), getMin(), getMax(), getAverage()
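A runnable sketch of the primitive-stream aggregates on invented sample data, including the empty-stream cases: average() returns an empty OptionalDouble, while summaryStatistics() does not throw and reports Integer.MAX_VALUE as the minimum.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.OptionalDouble;
import java.util.stream.IntStream;

public class NumericStats {

    // One pass over the data computes count, sum, min, max and average together
    static IntSummaryStatistics stats(List<Integer> numbers) {
        return numbers.stream()
            .mapToInt(Integer::intValue)
            .summaryStatistics();
    }

    // average() is an OptionalDouble because an empty stream has no average
    static OptionalDouble emptyAverage() {
        return IntStream.empty().average();
    }

    // On an empty stream min/max fall back to sentinel values instead of throwing
    static IntSummaryStatistics emptyStats() {
        return IntStream.empty().summaryStatistics();
    }

    public static void main(String[] args) {
        IntSummaryStatistics s = stats(Arrays.asList(3, 1, 4, 1, 5));
        System.out.println(s.getCount() + " " + s.getSum() + " " + s.getMin()
            + " " + s.getMax() + " " + s.getAverage());            // 5 14 1 5 2.8
        System.out.println(emptyAverage().isPresent());               // false
        System.out.println(emptyStats().getMin() == Integer.MAX_VALUE); // true
    }
}
```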

Laziness and Short-Circuiting

Laziness means intermediate operations accumulate a description of the pipeline without doing work. The terminal operation drives execution.

Short-circuiting means some operations stop processing early:

// Only processes elements until it finds the first match
Optional<String> first = Stream.iterate(0, n -> n + 1)
    .map(n -> n * n)
    .filter(n -> n > 100)
    .map(Object::toString)
    .findFirst();
// Does NOT process all integers — stops after finding 121

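Short-circuiting terminal operations: findFirst, findAny, anyMatch, allMatch, noneMatch. limit is also short-circuiting, but it is an intermediate operation: it is what lets a pipeline over an infinite source finish, as in naturals.limit(10) above.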
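To see how much work the findFirst example actually does, the sketch below counts the source elements pulled through the pipeline. The AtomicInteger counter is a demonstration-only side effect added for this purpose; the first square above 100 is 121 (11 squared), so only the integers 0 through 11 are consumed.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

public class ShortCircuitDemo {

    // Returns the first square above the threshold; 'pulled' counts source elements consumed
    static Optional<String> firstSquareAbove(int threshold, AtomicInteger pulled) {
        return Stream.iterate(0, n -> n + 1)
            .map(n -> { pulled.incrementAndGet(); return n * n; })  // side effect for demonstration only
            .filter(sq -> sq > threshold)
            .map(Object::toString)
            .findFirst();
    }

    public static void main(String[] args) {
        AtomicInteger pulled = new AtomicInteger();
        System.out.println(firstSquareAbove(100, pulled).get()); // 121
        System.out.println(pulled.get());                        // 12 (0 through 11)
    }
}
```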


Stateful vs. Stateless Operations

Stateless operations process each element independently: filter, map, flatMap, peek, mapToInt.

Stateful operations require seeing other elements to produce a result: sorted, distinct, limit, skip.

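Stateful operations can be expensive, especially in parallel streams: sorted() must buffer every element before emitting the first, and limit or skip on an ordered parallel stream must track encounter order across threads.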
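The buffering is easy to observe with peek, which exists for exactly this kind of debugging. This sketch logs when each element reaches the stage before and after an optional sorted(); the class name and sample data are illustrative. Without sorting, the log interleaves element by element; with sorting, every "before" entry comes first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StatefulTrace {

    // Logs the order in which elements reach the stages around an optional sorted()
    static List<String> trace(boolean sortInBetween) {
        List<String> log = new ArrayList<>();          // safe here only because the stream is sequential
        Stream<String> s = Stream.of("Charlie", "Alice", "Bob")
            .peek(x -> log.add("before:" + x));
        if (sortInBetween) {
            s = s.sorted();                            // stateful: sees every element before emitting any
        }
        s.peek(x -> log.add("after:" + x))
         .collect(Collectors.toList());
        return log;
    }

    public static void main(String[] args) {
        // [before:Charlie, after:Charlie, before:Alice, after:Alice, before:Bob, after:Bob]
        System.out.println(trace(false));
        // [before:Charlie, before:Alice, before:Bob, after:Alice, after:Bob, after:Charlie]
        System.out.println(trace(true));
    }
}
```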


Common Pitfalls

Reusing a Stream

Stream<String> stream = names.stream().filter(s -> s.length() > 3);
stream.collect(Collectors.toList()); // OK
stream.count(); // THROWS: IllegalStateException: stream has already been operated upon or closed

Always create a new stream from the source for each pipeline.
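When the same pipeline definition is needed more than once, a common idiom is to wrap it in a Supplier<Stream<T>> so each get() builds a fresh stream from the source. A minimal sketch; the sample names are invented.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StreamReuse {

    static final List<String> NAMES = Arrays.asList("Alice", "Bob", "Charlie", "Dan");

    // Each get() builds a brand-new pipeline over the same source
    static final Supplier<Stream<String>> LONG_NAMES =
        () -> NAMES.stream().filter(s -> s.length() > 3);

    // Shows the failure the Supplier avoids
    static boolean secondTerminalThrows() {
        Stream<String> once = NAMES.stream().filter(s -> s.length() > 3);
        once.count();                     // first terminal operation consumes the stream
        try {
            once.count();                 // second terminal operation on the same object
            return false;
        } catch (IllegalStateException e) {
            return true;                  // "stream has already been operated upon or closed"
        }
    }

    public static void main(String[] args) {
        List<String> list = LONG_NAMES.get().collect(Collectors.toList()); // [Alice, Charlie]
        long count = LONG_NAMES.get().count();                             // 2
        System.out.println(list + " " + count);
        System.out.println(secondTerminalThrows());                        // true
    }
}
```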

Forgetting the Terminal Operation

// Does nothing — no terminal op
names.stream().filter(s -> s.length() > 3).map(String::toUpperCase);

// Fix: add terminal op
names.stream().filter(s -> s.length() > 3).map(String::toUpperCase).forEach(System.out::println);

Using forEach When collect Is Better

// WRONG: side-effectful, not thread-safe, harder to reason about
List<String> result = new ArrayList<>();
names.stream().filter(s -> s.length() > 3).forEach(result::add);

// RIGHT
List<String> result = names.stream().filter(s -> s.length() > 3).collect(Collectors.toList());

Stream vs Loop Performance

The question “are streams slower than loops?” has a nuanced answer. Here is what benchmarks consistently show.

Sequential stream overhead is small and predictable

For a simple filter-map-collect over an in-memory ArrayList, a sequential stream is typically 5–15% slower than a plain for loop on a cold JVM. After JIT warm-up, the difference is negligible for most workloads. The overhead comes from the Sink indirection layer that connects pipeline stages.

Practical rule: If a pipeline runs fewer than ~100,000 times per second and is not on a CPU-critical hot path, the overhead is not worth measuring.

Boxing overhead is the real killer

The single biggest performance trap in streams is unintentional boxing:

List<Integer> numbers = ...; // List of 1 million integers

// SLOW: each Integer is unboxed to int for the operation, then boxed back
int sum = numbers.stream()
    .map(n -> n * 2)                // Stream<Integer> — every element is boxed
    .reduce(0, Integer::sum);

// FAST: no boxing at all
int sum = numbers.stream()
    .mapToInt(Integer::intValue)    // IntStream — primitive from here on
    .map(n -> n * 2)
    .sum();

Stream<Integer> can be 5–10x slower than IntStream on large numeric datasets because every element is wrapped in an Integer object, causing heap allocation and GC pressure. Always use mapToInt, mapToLong, or mapToDouble when processing numeric data.
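The three shapes side by side, as a sketch: boxed Stream<Integer>, unboxing once with mapToInt, and starting from an int[] so no Integer objects exist at all. It only checks that the results agree; it is not a benchmark, and any speed comparison should be measured with a proper harness such as JMH.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class BoxingDemo {

    // Boxed: map produces a Stream<Integer>, so each doubled value becomes an Integer object
    static int sumDoubledBoxed(List<Integer> numbers) {
        return numbers.stream()
            .map(n -> n * 2)
            .reduce(0, Integer::sum);
    }

    // Primitive: unbox once with mapToInt, then stay in int for map and sum
    static int sumDoubledPrimitive(List<Integer> numbers) {
        return numbers.stream()
            .mapToInt(Integer::intValue)
            .map(n -> n * 2)
            .sum();
    }

    // If the data can live in an int[] from the start, no Integer objects are created
    static int sumDoubledArray(int[] numbers) {
        return Arrays.stream(numbers).map(n -> n * 2).sum();
    }

    public static void main(String[] args) {
        int[] raw = IntStream.rangeClosed(1, 1_000).toArray();
        List<Integer> boxed = IntStream.rangeClosed(1, 1_000).boxed().collect(Collectors.toList());
        System.out.println(sumDoubledBoxed(boxed));     // 1001000
        System.out.println(sumDoubledPrimitive(boxed)); // 1001000
        System.out.println(sumDoubledArray(raw));       // 1001000
    }
}
```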

Pipeline fusion: where streams can beat loops

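A key advantage of streams is pipeline fusion: the Streams library chains the stages together so that each element flows through all the stateless operations in a single pass over the data. A pipeline with five chained stateless operations processes each element once, never producing an intermediate collection; stateful operations such as sorted() are the exception, because they must buffer elements.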

// Stream: filter and map run element by element in one pass; sorted() buffers the survivors before collect
List<String> result = orders.stream()
    .filter(Order::isActive)
    .filter(o -> o.getTotal() > 500)
    .map(Order::getCustomerName)
    .sorted()
    .collect(Collectors.toList());

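The equivalent hand-written loop (the Java 7 version at the top of this article) is also a single pass for the filtering and mapping, and it likewise needs the full result list in memory before Collections.sort can run.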

So fusion lets a stream match a well-written loop, and beat code that materialises an intermediate collection after each step. Short-circuiting terminals (findFirst, anyMatch) stop as soon as the answer is known, just as a loop with break does, so they never touch the rest of the source.

When to choose streams vs loops

Situation | Prefer
--- | ---
Multi-step filter + transform + collect | Stream
Simple index-based access (e.g. list.get(i)) | Loop
Accumulating into an external structure | Stream with collect
Breaking early on a condition | Stream with findFirst/anyMatch, or a loop with break
Numeric computation over large arrays | IntStream/LongStream
Checked exceptions that must propagate | Loop (or wrapper method)
Readability is the priority | Stream
Microsecond-level latency-critical path | Benchmark both; a loop is the safer default

The bottom line: streams are a composability tool first and a performance tool second. Write for clarity; only optimise for performance when a profiler tells you to.


When to Use Streams (Decision Guide)

Data operation | Right tool
--- | ---
Filter a collection, collect result | stream().filter().collect()
Transform each element | stream().map().collect()
Aggregate (sum, count, group) | stream().collect(Collectors.*)
Process each element for side effects | list.forEach() or plain loop
Find first/any match | stream().filter().findFirst()
Check if any/all/none match | stream().anyMatch() / allMatch() / noneMatch()
Numeric sum / average over primitives | stream().mapToInt().sum() / .average()
Multi-source data (flatten) | stream().flatMap()
Infinite sequence | Stream.iterate() / Stream.generate() with limit()
Index-based access to list elements | Plain for loop

Summary

Concept | Key point
--- | ---
Lazy evaluation | Intermediate ops accumulate; terminal op triggers execution
Pipeline model | source → intermediate ops → terminal op
Stateless vs stateful | filter/map are stateless; sorted/distinct are stateful
Short-circuiting | findFirst, anyMatch, limit stop early
Single use | A consumed stream cannot be reused
Performance | Sequential stream overhead is small; boxing is the real trap, so use IntStream for numeric work

Next Step

Advanced Streams: flatMap, Collectors, Grouping, and Partitioning →

Part of the DevOps Monk Java tutorial series: Java 8, Java 11, Java 17, Java 21