Streams API

Filter, map, reduce — Java's functional pipeline for collections. When streams are clearer than loops, and when they aren't.


Loops, but declarative

Before streams, processing a collection in Java meant explicit loops:

Java
List<String> activeNames = new ArrayList<>();
for (User u : users) {
    if (u.isActive()) {
        activeNames.add(u.getName().toUpperCase());
    }
}
Collections.sort(activeNames);

With streams (Java 8+; the .toList() call at the end needs Java 16+):

Java
List<String> activeNames = users.stream()
    .filter(User::isActive)
    .map(User::getName)
    .map(String::toUpperCase)
    .sorted()
    .toList();

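The stream version describes WHAT you want: active users' names, uppercased, sorted. The loop version describes HOW, step by step. Both produce the same elements in the same order, though .toList() returns an unmodifiable list where the loop builds a mutable ArrayList.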
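
To check the equivalence yourself, here is a minimal runnable sketch of both versions side by side. The User record is a hypothetical stand-in for whatever user class your code has; only isActive() and getName() are taken from the snippets above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class LoopVsStream {
    // Hypothetical stand-in for the chapter's User class (records need Java 16+).
    record User(String name, boolean active) {
        String getName() { return name; }
        boolean isActive() { return active; }
    }

    // The explicit-loop version: filter, transform, then sort in place.
    static List<String> withLoop(List<User> users) {
        List<String> activeNames = new ArrayList<>();
        for (User u : users) {
            if (u.isActive()) {
                activeNames.add(u.getName().toUpperCase());
            }
        }
        Collections.sort(activeNames);
        return activeNames;
    }

    // The stream version: same steps, expressed as a pipeline.
    static List<String> withStream(List<User> users) {
        return users.stream()
            .filter(User::isActive)
            .map(User::getName)
            .map(String::toUpperCase)
            .sorted()
            .toList();
    }

    public static void main(String[] args) {
        List<User> users = List.of(
            new User("carol", true), new User("bob", false), new User("alice", true));
        System.out.println(withLoop(users));
        System.out.println(withStream(users));
    }
}
```

Both calls print the same two names in the same order; List.equals compares contents, so the two results are equal even though one list is mutable and the other is not.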

Streams aren't always better. For simple operations on small collections, a plain loop is often clearer. But for multi-step pipelines, especially anything involving filter/transform/group/sort, streams win on readability once you're used to them.

This chapter covers the pipeline mental model, the main operations, and the practical limits.


The pipeline mental model

A stream has three parts:

[Figure: Stream pipeline. Source (List, Set, array, file lines, generator…) → .filter(...) drops inactive → .map(...) User → String → .sorted() alphabetical → .collect(...) runs the pipeline and yields "Alice", "Bob", "Carol". Intermediate operations are lazy and return a Stream; the terminal operation is eager. Nothing happens until the terminal operation runs; the pipeline then processes one element at a time, end-to-end, when possible.]
A stream pipeline has three parts: a source, zero or more intermediate operations (filter, map, sorted, …), and exactly one terminal operation (collect, count, forEach, …). Intermediate ops are lazy — they don't do anything until the terminal op triggers the actual work.

**Source.** A collection, an array, a file, a generator. Anything you can call .stream() on, plus Stream.of(...), Files.lines(...), IntStream.range(...), etc.

**Zero or more intermediate operations.** filter, map, sorted, distinct, limit, skip, flatMap, peek. Each returns a new Stream. They're **lazy** — they don't actually process anything until a terminal operation triggers the work.

**Exactly one terminal operation.** collect, count, forEach, reduce, findFirst, anyMatch, toList() (Java 16+). The terminal op consumes the stream and produces a result.
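
Laziness is easier to believe once you see it. The sketch below records a log entry from inside each lambda; the class and method names are illustrative only. Logging from lambdas like this is for demonstration, not something to copy into real code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class LazyDemo {
    // Returns the order in which things happened while building and running a pipeline.
    static List<String> trace() {
        List<String> log = new ArrayList<>();

        // Building the pipeline runs neither lambda.
        Stream<String> pipeline = Stream.of("a", "bb", "ccc")
            .filter(s -> { log.add("filter " + s); return s.length() > 1; })
            .map(s -> { log.add("map " + s); return s.toUpperCase(); });
        log.add("pipeline built");

        // The terminal operation pulls each element through filter and map in turn.
        List<String> result = pipeline.toList();
        log.add("result " + result);
        return log;
    }

    public static void main(String[] args) {
        trace().forEach(System.out::println);
    }
}
```

The first log entry is "pipeline built", and the filter and map entries are interleaved per element ("filter bb" is followed directly by "map bb"), which is the one-element-at-a-time behaviour of a sequential stream.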

**Streams are single-use.** Once you've called a terminal operation, the stream is exhausted. You can't go back to the start:

Java
Stream<String> s = list.stream();
s.count();         // 5
s.count();         // IllegalStateException — stream already used

If you need the same source twice, call .stream() twice on the original collection.
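
A small sketch of both halves of that rule: reusing a stream fails, while asking the collection for a fresh stream each time works. The Supplier is just one convenient way to package "give me a new stream"; the names here are illustrative.

```java
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Stream;

class SingleUse {
    // True if a second terminal operation on the same stream is rejected.
    static boolean secondUseFails(List<String> list) {
        Stream<String> s = list.stream();
        s.count();
        try {
            s.count();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Each get() calls list.stream() again, so every pipeline starts from a fresh stream.
    static long[] countTwice(List<String> list) {
        Supplier<Stream<String>> fresh = list::stream;
        long all = fresh.get().count();
        long nonEmpty = fresh.get().filter(x -> !x.isEmpty()).count();
        return new long[] { all, nonEmpty };
    }

    public static void main(String[] args) {
        System.out.println(secondUseFails(List.of("a", "b")));
        long[] counts = countTwice(List.of("a", "", "b"));
        System.out.println(counts[0] + " total, " + counts[1] + " non-empty");
    }
}
```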


Common intermediate operations

**filter(Predicate)** — keep elements that match:

Java
users.stream().filter(u -> u.getAge() >= 18)

**map(Function)** — transform each element:

Java
users.stream().map(User::getName)   // Stream<User> → Stream<String>

**flatMap(Function)** — transform each element into a stream, then flatten:

Java
List<List<String>> roles = ...;
roles.stream().flatMap(List::stream)   // Stream<List<String>> → Stream<String>

orders.stream().flatMap(o -> o.getItems().stream())   // all items across all orders

flatMap is the operation people miss most often. Use it whenever you have a stream of collections and want a stream of their elements.
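
The difference from map is easiest to see by running both on the same data. In this sketch the Order record and its items() accessor are hypothetical, standing in for the getItems() call above.

```java
import java.util.List;

class FlatMapDemo {
    // Hypothetical order type: an id plus the names of the items it contains.
    record Order(String id, List<String> items) {}

    // map keeps one output per input, so you get a list of lists.
    static List<List<String>> nested(List<Order> orders) {
        return orders.stream().map(Order::items).toList();
    }

    // flatMap turns each order into a stream of items and concatenates them.
    static List<String> allItems(List<Order> orders) {
        return orders.stream().flatMap(o -> o.items().stream()).toList();
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
            new Order("o1", List.of("pen", "ink")),
            new Order("o2", List.of("paper")));
        System.out.println(nested(orders));
        System.out.println(allItems(orders));
    }
}
```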

**distinct()** — remove duplicates (using .equals()).

**sorted()** / **sorted(Comparator)** — sort. With no argument, requires elements to be Comparable.

**limit(n) / skip(n)** — take first n / skip first n. Often used together for pagination.
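
One way pagination with skip and limit might look, as a sketch. The page helper and its zero-based pageIndex parameter are assumptions made for this example, not part of the Streams API.

```java
import java.util.List;

class Paging {
    // Returns the given zero-based page; skip must come before limit.
    static <T> List<T> page(List<T> items, int pageIndex, int pageSize) {
        return items.stream()
            .skip((long) pageIndex * pageSize)   // cast avoids int overflow on large indexes
            .limit(pageSize)
            .toList();
    }

    public static void main(String[] args) {
        List<Integer> items = List.of(1, 2, 3, 4, 5, 6, 7);
        System.out.println(page(items, 0, 3));
        System.out.println(page(items, 1, 3));
        System.out.println(page(items, 2, 3));
    }
}
```

A page past the end simply comes back empty, because skip consumes whatever is left and limit has nothing to take.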

**peek(Consumer)** — perform a side effect without consuming. Mostly useful for debugging:

Java
users.stream()
    .filter(User::isActive)
    .peek(u -> log.debug("checking {}", u))
    .map(User::getName)
    .toList();

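Don't use peek for real side effects in production code. It runs only for elements that actually reach that stage, so short-circuiting operations such as findFirst can skip it for most of the stream, and since Java 9 count() may skip the whole pipeline when it can compute the size directly from the source.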
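
Here is a small sketch of why peek surprises people: with a short-circuiting terminal operation, it only sees the elements that were pulled before the pipeline stopped. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class PeekDemo {
    // Records which elements peek observed before findFirst ended the pipeline.
    static List<Integer> seenBeforeFirstMatch() {
        List<Integer> seen = new ArrayList<>();
        Stream.of(1, 2, 3, 4, 5)
            .peek(seen::add)          // runs per element pulled, not per element in the source
            .filter(n -> n > 2)
            .findFirst();             // stops as soon as 3 passes the filter
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(seenBeforeFirstMatch());
    }
}
```

On a sequential stream this records 1, 2 and 3 only; elements 4 and 5 are never pulled, so any side effect attached to them never happens.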


Common terminal operations

**toList()** (Java 16+) — collect into an unmodifiable List:

Java
List<String> names = users.stream().map(User::getName).toList();

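This replaced the older .collect(Collectors.toList()) for most uses. The two are not identical: .toList() returns an unmodifiable list, while Collectors.toList() makes no guarantee about the mutability of the list it returns. If you need a list you can modify, ask for one explicitly.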
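
A short sketch of what "unmodifiable" means in practice, and of one way to get a mutable result when you need it. Collectors.toCollection(ArrayList::new) is a standard collector; the helper names are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ToListDemo {
    // True if adding to the list is rejected.
    static boolean isUnmodifiable(List<String> list) {
        try {
            list.add("x");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    // Collects into an ArrayList explicitly, so the caller may modify the result.
    static List<String> mutableCopy(List<String> names) {
        return names.stream()
            .map(String::trim)
            .collect(Collectors.toCollection(ArrayList::new));
    }

    public static void main(String[] args) {
        System.out.println(isUnmodifiable(Stream.of("a", "b").toList()));
        List<String> copy = mutableCopy(List.of(" a ", "b"));
        copy.add("c");
        System.out.println(copy);
    }
}
```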

**collect(Collector)** — flexible collection into various shapes:

Java
// Set
Set<String> nameSet = users.stream().map(User::getName).collect(Collectors.toSet());

// Map by key (throws IllegalStateException if two users share an id)
Map<Long, User> byId = users.stream().collect(Collectors.toMap(User::getId, u -> u));

// Group by attribute
Map<String, List<User>> byRole = users.stream()
    .collect(Collectors.groupingBy(User::getRole));

// Count by attribute
Map<String, Long> countByRole = users.stream()
    .collect(Collectors.groupingBy(User::getRole, Collectors.counting()));

// Join into a single string
String joined = users.stream().map(User::getName)
    .collect(Collectors.joining(", "));
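
One toMap detail worth seeing in action: the two-argument form rejects duplicate keys, and the three-argument form lets you say how to merge them. This is a sketch with a hypothetical User record keyed by role rather than id, so that duplicates actually occur.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ToMapDemo {
    // Hypothetical user type for this example.
    record User(long id, String name, String role) {}

    // Two users with the same role make the two-argument toMap throw.
    static boolean duplicateKeysThrow(List<User> users) {
        try {
            users.stream().collect(Collectors.toMap(User::role, User::name));
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // The third argument merges values that land on the same key.
    static Map<String, String> namesByRole(List<User> users) {
        return users.stream()
            .collect(Collectors.toMap(User::role, User::name, (a, b) -> a + ", " + b));
    }

    public static void main(String[] args) {
        List<User> users = List.of(
            new User(1, "Ann", "ADMIN"), new User(2, "Bob", "DEV"), new User(3, "Cy", "DEV"));
        System.out.println(duplicateKeysThrow(users));
        System.out.println(namesByRole(users));
    }
}
```

If what you want is a list per key rather than a merged value, groupingBy is usually the better fit.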

**count()** — number of elements:

Java
long activeCount = users.stream().filter(User::isActive).count();

**anyMatch / allMatch / noneMatch** — short-circuit boolean checks:

Java
boolean hasAdmin = users.stream().anyMatch(u -> u.hasRole("ADMIN"));
boolean allActive = users.stream().allMatch(User::isActive);

These short-circuit — anyMatch stops as soon as it finds a match.
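
Short-circuiting is what makes these operations safe on an infinite stream. The sketch below counts how many elements the predicate actually examined; the counter exists only to make that visible.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

class ShortCircuitDemo {
    // Returns how many elements anyMatch examined before finding a multiple of 7.
    static int elementsExamined() {
        AtomicInteger examined = new AtomicInteger();
        boolean found = Stream.iterate(1, n -> n + 1)    // 1, 2, 3, ... with no end
            .anyMatch(n -> {
                examined.incrementAndGet();
                return n % 7 == 0;
            });
        return found ? examined.get() : -1;
    }

    public static void main(String[] args) {
        System.out.println(elementsExamined());
    }
}
```

The stream never ends, but anyMatch stops at 7. The same pipeline with allMatch(n -> n > 0) would never return, because no element ever fails the test.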

**findFirst / findAny** — return an Optional containing one element:

Java
Optional<User> admin = users.stream().filter(u -> u.hasRole("ADMIN")).findFirst();

**min / max** — extreme by comparator:

Java
Optional<User> oldest = users.stream().max(Comparator.comparingInt(User::getAge));

**reduce** — combine all elements into one result:

Java
int totalAge = users.stream().mapToInt(User::getAge).sum();   // sum() is a ready-made reduction
String concat = words.stream().reduce("", (a, b) -> a + b);   // starts from "", combines pairwise

For numbers, prefer mapToInt/mapToLong/mapToDouble followed by sum() or average() — they avoid autoboxing.
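
A sketch comparing a general reduce with the primitive-stream shortcuts. Both sums give the same answer; the primitive version skips the Integer boxing, and average() returns an OptionalDouble because an empty stream has no average.

```java
import java.util.List;
import java.util.OptionalDouble;

class ReduceDemo {
    // General reduce: identity value plus a combining function, on boxed Integers.
    static int sumWithReduce(List<Integer> xs) {
        return xs.stream().reduce(0, Integer::sum);
    }

    // Primitive stream: unbox once, then use the built-in sum().
    static int sumPrimitive(List<Integer> xs) {
        return xs.stream().mapToInt(Integer::intValue).sum();
    }

    // Empty input yields an empty OptionalDouble rather than 0 or NaN.
    static OptionalDouble average(List<Integer> xs) {
        return xs.stream().mapToInt(Integer::intValue).average();
    }

    public static void main(String[] args) {
        List<Integer> xs = List.of(1, 2, 3, 4);
        System.out.println(sumWithReduce(xs));
        System.out.println(sumPrimitive(xs));
        System.out.println(average(xs));
        System.out.println(average(List.of()));
    }
}
```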

**forEach(Consumer)** — perform an action on each element. Use sparingly; if you're just doing side effects, a plain loop is often clearer.


Real-world patterns

**Group and count.** "How many orders per status?"

Java
Map<Status, Long> counts = orders.stream()
    .collect(Collectors.groupingBy(Order::getStatus, Collectors.counting()));

**Aggregate within groups.** "Total revenue per region."

Java
Map<String, BigDecimal> revenueByRegion = orders.stream()
    .collect(Collectors.groupingBy(
        Order::getRegion,
        Collectors.reducing(BigDecimal.ZERO, Order::getAmount, BigDecimal::add)
    ));
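
To try the two grouping patterns above on real data, here is a self-contained sketch. The Order record, its fields, and the sample values are all invented for the example; String stands in for the Status type.

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class RevenueDemo {
    // Hypothetical order type with just the fields these queries need.
    record Order(String region, String status, BigDecimal amount) {}

    // "How many orders per status?"
    static Map<String, Long> countByStatus(List<Order> orders) {
        return orders.stream()
            .collect(Collectors.groupingBy(Order::status, Collectors.counting()));
    }

    // "Total revenue per region": sum the amounts inside each group.
    static Map<String, BigDecimal> revenueByRegion(List<Order> orders) {
        return orders.stream()
            .collect(Collectors.groupingBy(
                Order::region,
                Collectors.reducing(BigDecimal.ZERO, Order::amount, BigDecimal::add)));
    }

    static List<Order> sample() {
        return List.of(
            new Order("EU", "SHIPPED", new BigDecimal("10.50")),
            new Order("EU", "PENDING", new BigDecimal("4.50")),
            new Order("US", "SHIPPED", new BigDecimal("20.00")));
    }

    public static void main(String[] args) {
        System.out.println(countByStatus(sample()));
        System.out.println(revenueByRegion(sample()));
    }
}
```

When checking BigDecimal results, compare with compareTo rather than equals, since equals also compares scale (15.00 and 15 are not equal by that definition).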

**Top N.** "The five oldest users."

Java
List<User> oldest = users.stream()
    .sorted(Comparator.comparingInt(User::getAge).reversed())
    .limit(5)
    .toList();

**Index pairs with IntStream.range.** "Pair each item with its index."

Java
IntStream.range(0, names.size())
    .mapToObj(i -> i + ": " + names.get(i))
    .forEach(System.out::println);

**Read lines from a file.**

Java
try (Stream<String> lines = Files.lines(Path.of("data.txt"))) {
    long count = lines.filter(l -> !l.isBlank()).count();
}

Note the try-with-resources — file streams need closing.
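
A runnable sketch of the same idea that creates its own temporary file, so you can try it without a data.txt on disk. The demo helper and the file contents are made up for the example; String.isBlank needs Java 11+.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Stream;

class FileLinesDemo {
    // Counts non-blank lines; try-with-resources closes the underlying file handle.
    static long countNonBlank(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file)) {
            return lines.filter(l -> !l.isBlank()).count();
        }
    }

    // Writes four lines (two of them blank) to a temp file and counts the rest.
    static long demo() {
        try {
            Path tmp = Files.createTempFile("streams-demo", ".txt");
            try {
                Files.write(tmp, List.of("alpha", "", "   ", "beta"));
                return countNonBlank(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Returning the count from inside the try block also sidesteps a scoping problem: a variable declared inside the try is not visible after it.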

**Filter then aggregate.** "Total revenue from active customers."

Java
BigDecimal total = orders.stream()
    .filter(o -> o.getCustomer().isActive())
    .map(Order::getAmount)
    .reduce(BigDecimal.ZERO, BigDecimal::add);

Parallel streams — and why you usually shouldn't

Calling .parallelStream() on a collection, or .parallel() on an existing stream, makes the pipeline process elements across multiple threads:

Java
long count = users.parallelStream().filter(User::isActive).count();

The JVM splits the work across threads in the common ForkJoinPool. For CPU-heavy operations on large collections, this can be a significant speedup.

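**Why you usually shouldn't:**
- Splitting the work and merging the results has a cost. On small collections or cheap per-element work, that cost often outweighs the gain.
- Parallel streams share the common ForkJoinPool by default, so a slow or blocking task in one pipeline can hold up unrelated work.
- Lambdas that touch shared mutable state turn into race conditions.
- Order-sensitive operations either lose their ordering (forEach) or lose much of the speedup (findFirst, limit).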

**When parallel streams genuinely help:**
- The collection is large (thousands+ elements).
- The per-element work is CPU-heavy (parsing, encoding, calculations).
- The operations are stateless and order-independent.
- You've profiled and confirmed sequential streams are the bottleneck.

For most application code, leave .parallel() off. If you need real parallelism, prefer CompletableFuture for concurrent I/O or explicit ExecutorService for controlled CPU parallelism.
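
For a sense of what a parallel-friendly pipeline looks like, here is a sketch that counts primes: CPU-heavy per element, stateless, and order-independent. The naive isPrime check is deliberately simple, and whether the parallel version is actually faster on your machine is something only measurement can tell you.

```java
import java.util.stream.IntStream;

class ParallelDemo {
    // Naive trial-division primality test: pure CPU work, no shared state.
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int i = 2; (long) i * i <= n; i++) {
            if (n % i == 0) return false;
        }
        return true;
    }

    static long countPrimesSequential(int limit) {
        return IntStream.rangeClosed(2, limit).filter(ParallelDemo::isPrime).count();
    }

    // Same pipeline with .parallel(); the result must match the sequential one.
    static long countPrimesParallel(int limit) {
        return IntStream.rangeClosed(2, limit).parallel().filter(ParallelDemo::isPrime).count();
    }

    public static void main(String[] args) {
        System.out.println(countPrimesSequential(10_000));
        System.out.println(countPrimesParallel(10_000));
    }
}
```

The two counts agree because nothing in the pipeline depends on encounter order or on state outside the lambda. If you swap the filter for one that appends to a shared ArrayList, that guarantee is gone.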


When NOT to use streams

Streams are a tool, not a religion. They make some code clearer, other code worse.

**Plain loops are often clearer for simple cases.**

Java
// Stream
int max = values.stream().mapToInt(Integer::intValue).max().orElse(0);

// Loop: arguably clearer (but returns 0 where the stream returns a negative max if every value is negative)
int max = 0;
for (int v : values) {
    if (v > max) max = v;
}

For one-step operations on a list, the loop is fine. Streams shine for multi-step pipelines.

**Performance-critical hot loops.** Streams have overhead: pipeline construction, lambda dispatch, possibly boxing. For tight numeric loops, a plain for loop is often faster. Profile if it matters.

**Side-effect-heavy logic.** Streams are designed for transformation pipelines. If your loop body does I/O, logging, updating multiple counters, or other side effects, a plain loop is more honest.

**Multi-collection coordination.** When you need to iterate two collections together, or use the index, plain for-loops or IntStream.range are often more readable than complex stream compositions.

**Debuggability.** Stack traces from inside a pipeline are dominated by java.util.stream internals and synthetic lambda frames, which bury your business logic. For deeply complex pipelines, this can hurt debugging. Use .peek() or extract sub-pipelines into named methods if it matters.
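
Extracting steps into named methods might look like the sketch below. The User record and the eligibility rule are invented for the example. The payoff is that an exception thrown in displayName shows up in the stack trace under that name rather than under a synthetic lambda frame, and each step can be unit-tested on its own.

```java
import java.util.List;

class NamedSteps {
    // Hypothetical user type for this example.
    record User(String name, int age, boolean active) {}

    // Each pipeline step is an ordinary static method with a meaningful name.
    static boolean isEligible(User u) {
        return u.active() && u.age() >= 18;
    }

    static String displayName(User u) {
        return u.name().trim().toUpperCase();
    }

    // The pipeline itself now reads as a list of named steps.
    static List<String> eligibleNames(List<User> users) {
        return users.stream()
            .filter(NamedSteps::isEligible)
            .map(NamedSteps::displayName)
            .sorted()
            .toList();
    }

    public static void main(String[] args) {
        List<User> users = List.of(
            new User(" bob ", 30, true),
            new User("alice", 17, true),
            new User("Carol", 40, false),
            new User("dave", 22, true));
        System.out.println(eligibleNames(users));
    }
}
```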

The rule of thumb: streams are great when you have a clear "from X transform to Y" shape. They get worse the more side effects or interactions creep in. Trust your instinct — if the stream version is harder to read out loud, write the loop.

