Streams API
Filter, map, reduce — Java's functional pipeline for collections. When streams are clearer than loops, and when they aren't.
Loops, but declarative
Before streams, processing a collection in Java meant explicit loops:
List<String> activeNames = new ArrayList<>();
for (User u : users) {
    if (u.isActive()) {
        activeNames.add(u.getName().toUpperCase());
    }
}
Collections.sort(activeNames);
With streams (Java 8+; the toList() call used here needs Java 16+):
List<String> activeNames = users.stream()
    .filter(User::isActive)
    .map(User::getName)
    .map(String::toUpperCase)
    .sorted()
    .toList();
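The stream version describes WHAT you want: active users' names, uppercased, sorted. The loop version describes HOW, step by step. Both produce the same names in the same order; the one difference is that toList() returns an unmodifiable list, while the loop fills a mutable ArrayList.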
Streams aren't always better. For simple operations on small collections, a plain loop is often clearer. But for multi-step pipelines, especially anything involving filter/transform/group/sort, streams win on readability once you're used to them.
This chapter covers the pipeline mental model, the main operations, and the practical limits.
The pipeline mental model
A stream has three parts:
**Source.** A collection, an array, a file, a generator. Anything you can call .stream() on, plus Stream.of(...), Files.lines(...), IntStream.range(...), etc.
**Zero or more intermediate operations.** filter, map, sorted, distinct, limit, skip, flatMap, peek. Each returns a new Stream. They're **lazy** — they don't actually process anything until a terminal operation triggers the work.
**Exactly one terminal operation.** collect, count, forEach, reduce, findFirst, anyMatch, toList() (Java 16+). The terminal op consumes the stream and produces a result.
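The laziness described above can be observed directly. The following is a minimal sketch, assuming Java 16+ for toList(); the class name LazinessDemo and the log list are made up for illustration. It records when each stage runs, showing that nothing executes until the terminal operation, and that elements then pass through the stages one at a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class LazinessDemo {
    // Records the order in which the pipeline stages actually run.
    static List<String> trace() {
        List<String> log = new ArrayList<>();

        Stream<String> pipeline = Stream.of("a", "b", "c")
                .filter(s -> { log.add("filter " + s); return !s.equals("b"); })
                .map(s -> { log.add("map " + s); return s.toUpperCase(); });

        // Nothing has run yet: building the pipeline only describes the work.
        log.add("pipeline built");

        // The terminal operation pulls each element through filter, then map.
        List<String> result = pipeline.toList();
        log.add("result " + result);
        return log;
    }

    public static void main(String[] args) {
        trace().forEach(System.out::println);
    }
}
```

The first log entry is "pipeline built", followed by interleaved filter and map entries per element rather than all filters and then all maps.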
**Streams are single-use.** Once you've called a terminal operation, the stream is exhausted. You can't go back to the start:
Stream<String> s = list.stream();
s.count(); // 5
s.count(); // IllegalStateException — stream already used
If you need the same source twice, call .stream() twice on the original collection.
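When the pipeline itself (not just the source) needs to be reused, one common approach is to wrap it in a Supplier so that every call builds a fresh stream. This is a sketch under that assumption; SingleUseDemo and its method names are hypothetical.

```java
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Stream;

class SingleUseDemo {
    // Returns true if a second terminal operation on the same stream is rejected.
    static boolean secondUseFails(List<String> list) {
        Stream<String> s = list.stream();
        s.count();
        try {
            s.count();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // A Supplier hands out a brand-new stream (with the same filter) on each get().
    static long[] countTwice(List<String> list) {
        Supplier<Stream<String>> nonBlank = () -> list.stream().filter(x -> !x.isBlank());
        return new long[] { nonBlank.get().count(), nonBlank.get().count() };
    }

    public static void main(String[] args) {
        List<String> data = List.of("a", " ", "b");
        System.out.println(secondUseFails(data));
        System.out.println(countTwice(data)[1]);
    }
}
```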
Common intermediate operations
**filter(Predicate)** — keep elements that match:
users.stream().filter(u -> u.getAge() >= 18)
**map(Function)** — transform each element:
users.stream().map(User::getName) // Stream<User> → Stream<String>
**flatMap(Function)** — transform each element into a stream, then flatten:
List<List<String>> roles = ...;
roles.stream().flatMap(List::stream) // Stream<List<String>> → Stream<String>
orders.stream().flatMap(o -> o.getItems().stream()) // all items across all orders
flatMap is the operation people miss most often. Use it whenever you have a stream of collections and want a stream of their elements.
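As a runnable illustration of the flattening step, here is a small sketch assuming Java 16+ (records and toList()). The Order record is a hypothetical stand-in for whatever order type the application has.

```java
import java.util.List;

class FlatMapDemo {
    // Hypothetical order type, for illustration only.
    record Order(String id, List<String> items) {}

    // Stream<Order> -> Stream<String>: every item of every order, in encounter order.
    static List<String> allItems(List<Order> orders) {
        return orders.stream()
                .flatMap(o -> o.items().stream())
                .toList();
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order("o1", List.of("pen", "ink")),
                new Order("o2", List.of()),          // empty orders simply contribute nothing
                new Order("o3", List.of("paper")));
        System.out.println(allItems(orders));
    }
}
```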
**distinct()** — remove duplicates (using .equals()).
**sorted()** / **sorted(Comparator)** — sort. With no argument, requires elements to be Comparable.
**limit(n) / skip(n)** — take first n / skip first n. Often used together for pagination.
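The pagination use mentioned above might look like the following sketch. The helper name page and the zero-based page numbering are assumptions made for the example; toList() needs Java 16+.

```java
import java.util.List;

class PaginationDemo {
    // Returns zero-based page `page` of `items`, with at most `pageSize` elements.
    static <T> List<T> page(List<T> items, int page, int pageSize) {
        return items.stream()
                .skip((long) page * pageSize)   // drop everything before this page
                .limit(pageSize)                // keep at most one page worth
                .toList();
    }

    public static void main(String[] args) {
        List<Integer> items = List.of(1, 2, 3, 4, 5, 6, 7);
        System.out.println(page(items, 1, 3));
        System.out.println(page(items, 2, 3));
    }
}
```

Note the order: skip comes before limit. Reversing them would first cut the stream down to one page and then skip past it.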
**peek(Consumer)** — perform a side effect without consuming. Mostly useful for debugging:
users.stream()
    .filter(User::isActive)
    .peek(u -> log.debug("checking {}", u))
    .map(User::getName)
    .toList();
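Don't use peek for real side effects in production code. It runs only for elements that actually flow through the pipeline, so short-circuiting operations can skip it for some elements, and since Java 9 a terminal operation such as count() may skip the pipeline entirely when it can compute the result from the source.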
Common terminal operations
**toList()** (Java 16+) — collect into an unmodifiable List:
List<String> names = users.stream().map(User::getName).toList();
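This replaces the older .collect(Collectors.toList()) in most modern code. The two are not identical: toList() returns an unmodifiable list, while Collectors.toList() makes no guarantee about mutability (in practice it has returned an ArrayList). Keep the collector, or use Collectors.toCollection(ArrayList::new), when callers need to modify the result.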
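The unmodifiable nature of the toList() result is easy to trip over when migrating older code. A minimal sketch, assuming Java 16+; the helper rejectsAdd exists only for this demonstration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ToListDemo {
    // Returns true if the list refuses modification.
    static boolean rejectsAdd(List<String> list) {
        try {
            list.add("x");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Stream.toList() returns an unmodifiable list.
        List<String> fixed = Stream.of("a", "b").toList();
        // toCollection(ArrayList::new) explicitly asks for a mutable list.
        List<String> mutable = Stream.of("a", "b")
                .collect(Collectors.toCollection(ArrayList::new));

        System.out.println(rejectsAdd(fixed));
        System.out.println(rejectsAdd(mutable));
    }
}
```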
**collect(Collector)** — flexible collection into various shapes:
// Set
Set<String> nameSet = users.stream().map(User::getName).collect(Collectors.toSet());
// Map by key
Map<Long, User> byId = users.stream().collect(Collectors.toMap(User::getId, u -> u));
// Group by attribute
Map<String, List<User>> byRole = users.stream()
    .collect(Collectors.groupingBy(User::getRole));
// Count by attribute
Map<String, Long> countByRole = users.stream()
    .collect(Collectors.groupingBy(User::getRole, Collectors.counting()));
// Join into a single string
String joined = users.stream().map(User::getName)
    .collect(Collectors.joining(", "));
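One detail worth knowing about Collectors.toMap: the two-argument form throws IllegalStateException when two elements map to the same key. The sketch below, assuming Java 16+ and a hypothetical User record, shows the failure and the three-argument form with a merge function that keeps the first value seen.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ToMapDemo {
    // Hypothetical user type, for illustration only.
    record User(long id, String name, String role) {}

    // Two-argument toMap: fails if two users share a role.
    static boolean plainToMapFails(List<User> users) {
        try {
            users.stream().collect(Collectors.toMap(User::role, User::name));
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Three-argument toMap: the merge function decides which value wins.
    static Map<String, String> firstNameByRole(List<User> users) {
        return users.stream()
                .collect(Collectors.toMap(User::role, User::name, (first, second) -> first));
    }

    public static void main(String[] args) {
        List<User> users = List.of(
                new User(1, "Ann", "ADMIN"),
                new User(2, "Bob", "USER"),
                new User(3, "Cy", "USER"));
        System.out.println(plainToMapFails(users));
        System.out.println(firstNameByRole(users));
    }
}
```

Keying by a unique field such as an id avoids the problem entirely; the merge function matters only when duplicates are possible.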
**count()** — number of elements:
long activeCount = users.stream().filter(User::isActive).count();
**anyMatch / allMatch / noneMatch** — short-circuit boolean checks:
boolean hasAdmin = users.stream().anyMatch(u -> u.hasRole("ADMIN"));
boolean allActive = users.stream().allMatch(User::isActive);
These short-circuit — anyMatch stops as soon as it finds a match.
**findFirst / findAny** — return an Optional containing one element:
Optional<User> admin = users.stream().filter(u -> u.hasRole("ADMIN")).findFirst();
**min / max** — extreme by comparator:
Optional<User> oldest = users.stream().max(Comparator.comparingInt(User::getAge));
**reduce** — combine all elements into one result:
int totalAge = users.stream().map(User::getAge).reduce(0, Integer::sum);
String concat = words.stream().reduce("", (a, b) -> a + b);
For numbers, prefer mapToInt/mapToLong/mapToDouble followed by sum() or average() — they avoid autoboxing.
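To compare the two styles side by side, here is a small sketch; the class and method names are illustrative. It sums with a general reduce, sums with a primitive IntStream, and shows that average() returns an OptionalDouble because an empty stream has no average.

```java
import java.util.List;
import java.util.OptionalDouble;

class ReduceDemo {
    // General reduce: identity value plus a combining function (boxes each Integer).
    static int sumWithReduce(List<Integer> xs) {
        return xs.stream().reduce(0, Integer::sum);
    }

    // Primitive stream: unboxes once, then works on ints.
    static int sumPrimitive(List<Integer> xs) {
        return xs.stream().mapToInt(Integer::intValue).sum();
    }

    // average() is empty when there are no elements.
    static OptionalDouble average(List<Integer> xs) {
        return xs.stream().mapToInt(Integer::intValue).average();
    }

    public static void main(String[] args) {
        List<Integer> xs = List.of(1, 2, 3);
        System.out.println(sumWithReduce(xs));
        System.out.println(sumPrimitive(xs));
        System.out.println(average(xs));
    }
}
```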
**forEach(Consumer)** — perform an action on each element. Use sparingly; if you're just doing side effects, a plain loop is often clearer.
Real-world patterns
**Group and count.** "How many orders per status?"
Map<Status, Long> counts = orders.stream()
    .collect(Collectors.groupingBy(Order::getStatus, Collectors.counting()));
**Aggregate within groups.** "Total revenue per region."
Map<String, BigDecimal> revenueByRegion = orders.stream()
    .collect(Collectors.groupingBy(
        Order::getRegion,
        Collectors.reducing(BigDecimal.ZERO, Order::getAmount, BigDecimal::add)
    ));
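A self-contained version of the per-group aggregation, as a sketch assuming Java 16+. The Order record here is a hypothetical two-field stand-in, so its accessors are region() and amount() rather than the getters used above.

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class RevenueDemo {
    // Hypothetical order type, for illustration only.
    record Order(String region, BigDecimal amount) {}

    // Groups by region, then folds each group's amounts into a single total.
    static Map<String, BigDecimal> revenueByRegion(List<Order> orders) {
        return orders.stream()
                .collect(Collectors.groupingBy(
                        Order::region,
                        Collectors.reducing(BigDecimal.ZERO, Order::amount, BigDecimal::add)));
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order("EU", new BigDecimal("10.50")),
                new Order("EU", new BigDecimal("4.50")),
                new Order("US", new BigDecimal("3.00")));
        System.out.println(revenueByRegion(orders));
    }
}
```

Regions with no orders do not appear in the map at all, so callers should use getOrDefault rather than assuming a zero entry.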
**Top N.** "The five oldest users."
List<User> oldest = users.stream()
    .sorted(Comparator.comparingInt(User::getAge).reversed())
    .limit(5)
    .toList();
**Index pairs with IntStream.range.** "Pair each item with its index."
IntStream.range(0, names.size())
    .mapToObj(i -> i + ": " + names.get(i))
    .forEach(System.out::println);
**Read lines from a file.**
try (Stream<String> lines = Files.lines(Path.of("data.txt"))) {
    long count = lines.filter(l -> !l.isBlank()).count();
}
Note the try-with-resources — file streams need closing.
**Filter then aggregate.** "Total revenue from active customers."
BigDecimal total = orders.stream()
    .filter(o -> o.getCustomer().isActive())
    .map(Order::getAmount)
    .reduce(BigDecimal.ZERO, BigDecimal::add);
Parallel streams — and why you usually shouldn't
Adding .parallel() to a pipeline, or calling parallelStream() on a collection, makes the stream process elements across multiple threads:
long count = users.parallelStream().filter(User::isActive).count();
The JVM splits the work across threads in the common ForkJoinPool. For CPU-heavy operations on large collections, this can be a significant speedup.
**Why you usually shouldn't:**
- **Most stream pipelines are I/O-bound or simple**, not CPU-bound. Parallelism adds overhead without gains.
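- **Order is no longer guaranteed** in some operations. forEach may visit elements in any order (forEachOrdered restores it), and findFirst has to do extra work to respect encounter order, so findAny is the cheaper choice when any match will do.
- **The common ForkJoinPool is shared.** All parallel streams in the JVM compete for the same threads. One badly-behaved stream affects everything.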
- **Side effects become race conditions.** If your lambdas mutate shared state, parallel streams will corrupt it.
- **Stateful operations like sorted and distinct work, but lose their efficiency in parallel mode.**
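The side-effect point in the list above is the one most likely to cause real bugs. The sketch below, assuming Java 16+, shows the safe shape: the lambdas touch no shared state and the stream assembles the result itself. The unsafe variant is left as a comment on purpose, because its behaviour is nondeterministic. The class and method names are illustrative.

```java
import java.util.List;
import java.util.stream.IntStream;

class ParallelCollectDemo {
    // Safe: no shared mutable state; toList() merges the per-thread
    // partial results and preserves encounter order.
    static List<Integer> evenSquares(int n) {
        return IntStream.rangeClosed(1, n)
                .parallel()
                .filter(i -> i % 2 == 0)
                .mapToObj(i -> i * i)
                .toList();
    }

    // Unsafe variant, deliberately not compiled:
    //   List<Integer> out = new ArrayList<>();
    //   IntStream.rangeClosed(1, n).parallel().forEach(i -> out.add(i * i));
    // ArrayList is not thread-safe, so elements can be lost or add() can throw.

    public static void main(String[] args) {
        System.out.println(evenSquares(6));
    }
}
```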
**When parallel streams genuinely help:**
- The collection is large (thousands+ elements).
- The per-element work is CPU-heavy (parsing, encoding, calculations).
- The operations are stateless and order-independent.
- You've profiled and confirmed sequential streams are the bottleneck.
For most application code, leave .parallel() off. If you need real parallelism, prefer CompletableFuture for concurrent I/O or explicit ExecutorService for controlled CPU parallelism.
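For the concurrent I/O case, a CompletableFuture-based version might look like the sketch below. The fetch method is a hypothetical stand-in for a blocking call, and the pool size of 4 is an arbitrary assumption. The point is that the work runs on a dedicated executor instead of the shared common pool.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ConcurrentFetchDemo {
    // Stand-in for a blocking call such as an HTTP request (hypothetical).
    static String fetch(String id) {
        return "payload-" + id;
    }

    static List<String> fetchAll(List<String> ids) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // Start every task first, so they run concurrently...
            List<CompletableFuture<String>> futures = ids.stream()
                    .map(id -> CompletableFuture.supplyAsync(() -> fetch(id), pool))
                    .toList();
            // ...then wait for the results, in the original order.
            return futures.stream().map(CompletableFuture::join).toList();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchAll(List.of("a", "b", "c")));
    }
}
```

Collecting the futures into a list before joining matters: because streams are lazy, mapping to supplyAsync and join in a single pipeline would start and await each task one at a time.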
When NOT to use streams
Streams are a tool, not a religion. They make some code clearer, other code worse.
**Plain loops are often clearer for simple cases.**
// Stream
int max = values.stream().mapToInt(Integer::intValue).max().orElse(0);
// Loop: arguably clearer (matches the stream version only when no value is negative)
int max = 0;
for (int v : values) {
    if (v > max) max = v;
}
For one-step operations on a list, the loop is fine. Streams shine for multi-step pipelines.
**Performance-critical hot loops.** Streams have overhead — pipeline construction, lambda dispatch, possibly boxing. For tight numeric loops, a plain for is faster. Profile if it matters.
**Side-effect-heavy logic.** Streams are designed for transformation pipelines. If your loop body does I/O, logging, updating multiple counters, or other side effects, a plain loop is more honest.
**Multi-collection coordination.** When you need to iterate two collections together, or use the index, plain for-loops or IntStream.range are often more readable than complex stream compositions.
**Debuggability.** An exception thrown inside a lambda produces a stack trace padded with java.util.stream internals and synthetic lambda frames, which can bury your business logic. For deeply complex pipelines, this can hurt debugging. Use .peek() or extract sub-pipelines into named methods if it matters.
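Extracting steps into named methods could look like the following sketch, assuming Java 16+. The User record and the eligibility rule are invented for the example. Each step becomes a method reference, so the method's own name shows up in stack traces and the step can be unit-tested on its own.

```java
import java.util.List;

class NamedStepsDemo {
    // Hypothetical user type, for illustration only.
    record User(String name, int age, boolean active) {}

    // Each pipeline step is an ordinary, individually testable method.
    static boolean isEligible(User u) {
        return u.active() && u.age() >= 18;
    }

    static String displayName(User u) {
        return u.name().trim().toUpperCase();
    }

    static List<String> eligibleNames(List<User> users) {
        return users.stream()
                .filter(NamedStepsDemo::isEligible)
                .map(NamedStepsDemo::displayName)
                .sorted()
                .toList();
    }

    public static void main(String[] args) {
        List<User> users = List.of(
                new User(" bob ", 30, true),
                new User("ann", 17, true),
                new User("cy", 40, false),
                new User("al", 20, true));
        System.out.println(eligibleNames(users));
    }
}
```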
The rule of thumb: streams are great when you have a clear "from X transform to Y" shape. They get worse the more side effects or interactions creep in. Trust your instinct — if the stream version is harder to read out loud, write the loop.