Stream API
Since Java 8, the Stream API has revolutionized the way we process collections and other data sources. By moving from imperative loops to a declarative, pipeline-based approach, streams enable concise, readable, and potentially parallel data workflows. This post covers everything from core concepts and operations to best practices, performance tips, and a full API reference.
What Is the Stream API?
- Not a data structure, but a view over data that you can filter, map, reduce, and collect.
- Lazy: intermediate operations build a pipeline; nothing executes until a terminal operation runs.
- Possibly infinite: e.g., `Stream.iterate(...)` can generate endless data; use `limit(...)` to cap it.
- One-time use: after a terminal operation, the stream is consumed and cannot be reused.
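The laziness and one-shot nature can be seen in a short sketch:

```java
import java.util.List;
import java.util.stream.Stream;

public class StreamLifecycle {
    public static void main(String[] args) {
        // Intermediate ops are lazy: nothing runs while the pipeline is built.
        Stream<String> pipeline = List.of("a", "bb", "ccc").stream()
                .filter(s -> s.length() > 1);

        pipeline.forEach(System.out::println); // terminal op triggers execution: bb, ccc

        // One-time use: a consumed stream throws IllegalStateException on reuse.
        try {
            pipeline.count();
        } catch (IllegalStateException e) {
            System.out.println("stream already consumed");
        }
    }
}
```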
Stream Pipeline Architecture
A stream pipeline has three stages:
1. Source: collections (`List`, `Set`), arrays, I/O (`Files.lines(...)`), generators.
2. Intermediate operations (zero or more):
   - Stateless: no memory between elements (`map`, `filter`, `peek`).
   - Stateful: require state (e.g., `sorted`, `distinct`, `limit`, `skip`, `takeWhile`, `dropWhile`).
3. Terminal operation (exactly one): produces a result or side effect (`collect`, `reduce`, `forEach`, `count`, `findFirst`, …).

Under the hood, each stage wraps the next; a `Spliterator` "pulls" elements through the chain only when needed.
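The pull model can be observed directly by asking a stream for its `Spliterator`; a minimal sketch:

```java
import java.util.Spliterator;
import java.util.stream.Stream;

public class PullDemo {
    public static void main(String[] args) {
        Spliterator<String> sp = Stream.of("a", "b").spliterator();

        // tryAdvance pulls exactly one element and returns true,
        // or returns false once the source is exhausted.
        sp.tryAdvance(s -> System.out.println("pulled: " + s)); // pulled: a
        sp.tryAdvance(s -> System.out.println("pulled: " + s)); // pulled: b
        System.out.println(sp.tryAdvance(s -> {}));             // false
    }
}
```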
Creating Streams
```java
// From a Collection
List<String> list = List.of("a", "b", "c");
Stream<String> s1 = list.stream();
Stream<String> p1 = list.parallelStream();

// From an array
Stream<Integer> s2 = Arrays.stream(new Integer[]{1, 2, 3});

// Of values or generators
Stream<String> s3 = Stream.of("x", "y", "z");
Stream<Double> randoms = Stream.generate(Math::random).limit(5);
Stream<long[]> fibSeq = Stream.iterate(new long[]{0, 1},
                p -> new long[]{p[1], p[0] + p[1]})
        .limit(10);

// Primitive streams
IntStream ints = IntStream.rangeClosed(1, 10);
LongStream longs = LongStream.of(100L, 200L, 300L);
DoubleStream dbls = DoubleStream.generate(Math::random).limit(3);
```
Intermediate Operations
(These build up the pipeline; nothing runs until a terminal operation.)

Stateless
- `filter(Predicate<? super T>)`
- `map(Function<? super T, ? extends R>)`
- `flatMap(Function<? super T, ? extends Stream<? extends R>>)`
- `peek(Consumer<? super T>)`

Stateful
- `distinct()`
- `sorted()` / `sorted(Comparator<? super T>)`
- `limit(long maxSize)`
- `skip(long n)`
- `takeWhile(Predicate<? super T>)` (Java 9+)
- `dropWhile(Predicate<? super T>)` (Java 9+)

Parallel & Ordering Controls
- `parallel()` / `sequential()`
- `unordered()`
```java
List<String> result =
        list.stream()
            .filter(s -> s.length() > 1)
            .distinct()
            .sorted()
            .map(String::toUpperCase)
            .peek(s -> System.out.println("Processed: " + s))
            .collect(Collectors.toList());
```
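The difference between the stateless `filter` and the Java 9+ `takeWhile`/`dropWhile` is easy to miss; a small comparison (assuming Java 16+ for `toList()`):

```java
import java.util.List;

public class TakeDropWhileDemo {
    public static void main(String[] args) {
        List<Integer> nums = List.of(1, 2, 3, 10, 4, 5);

        // filter examines every element:
        System.out.println(nums.stream().filter(n -> n < 5).toList());    // [1, 2, 3, 4]

        // takeWhile stops at the first element that fails the predicate:
        System.out.println(nums.stream().takeWhile(n -> n < 5).toList()); // [1, 2, 3]

        // dropWhile discards the leading run and keeps everything after it:
        System.out.println(nums.stream().dropWhile(n -> n < 5).toList()); // [10, 4, 5]
    }
}
```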
Terminal Operations
(Trigger execution)
Reduction & Matching
- `count()`
- `min(Comparator<? super T>)`, `max(...)`
- `anyMatch(Predicate<? super T>)`, `allMatch(...)`, `noneMatch(...)`
- `findFirst()`, `findAny()`
- `reduce(...)`

Collection & Aggregation
- `collect(Collector<? super T, A, R>)`
- `collect(Supplier<R>, BiConsumer<R, ? super T>, BiConsumer<R, R>)`

Traversal & Side-effects
- `forEach(Consumer<? super T>)`, `forEachOrdered(...)`

Array & Iteration
- `toArray()`, `toArray(IntFunction<A[]>)`
- `iterator()`, `spliterator()`
```java
// Sum of even numbers 1–10
int sumEven = IntStream.rangeClosed(1, 10)
        .filter(n -> n % 2 == 0)
        .sum();

// Top three most frequent words in a file
// (Files.lines(...) throws IOException; close the stream,
//  e.g. with try-with-resources, in production code)
List<String> top3Words = Files.lines(Path.of("data.txt"))
        .flatMap(line -> Stream.of(line.split("\\W+")))
        .map(String::toLowerCase)
        .filter(w -> !w.isBlank())
        .collect(Collectors.groupingBy(Function.identity(),
                Collectors.counting()))
        .entrySet().stream()
        .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()))
        .limit(3)
        .map(Map.Entry::getKey)
        .toList();
```
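`reduce(...)` and the three-argument `collect(...)` from the reference list deserve their own example; a minimal sketch:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class ReduceCollectDemo {
    public static void main(String[] args) {
        // reduce: fold elements with an identity, an accumulator,
        // and a combiner (used to merge partial results in parallel).
        int totalLength = Stream.of("a", "bb", "ccc")
                .reduce(0, (sum, s) -> sum + s.length(), Integer::sum);

        // Three-argument collect: supplier, accumulator, combiner.
        List<String> upper = Stream.of("a", "bb", "ccc")
                .collect(ArrayList::new,
                         (list, s) -> list.add(s.toUpperCase()),
                         ArrayList::addAll);

        System.out.println(totalLength); // 6
        System.out.println(upper);       // [A, BB, CCC]
    }
}
```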
Primitive Streams
Avoid boxing overhead with `IntStream`, `LongStream`, and `DoubleStream`. They also offer:
- `sum()`, `average()`, `summaryStatistics()`, `min()`, `max()`
- Converters: `boxed()` → `Stream<Integer>` (likewise `Stream<Long>`, `Stream<Double>`)
```java
IntSummaryStatistics stats =
        IntStream.of(4, 7, 1, 9, 3).summaryStatistics();
System.out.println("Avg: " + stats.getAverage());
```
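`boxed()` bridges from a primitive stream back to an object stream; a small sketch (using `toList()`, Java 16+):

```java
import java.util.List;
import java.util.stream.IntStream;

public class BoxedDemo {
    public static void main(String[] args) {
        // boxed() converts IntStream -> Stream<Integer>, e.g. to collect into a List.
        List<Integer> squares = IntStream.rangeClosed(1, 5)
                .map(n -> n * n)
                .boxed()
                .toList();
        System.out.println(squares); // [1, 4, 9, 16, 25]

        // Going the other way: mapToInt avoids re-boxing for numeric work.
        int sum = squares.stream().mapToInt(Integer::intValue).sum();
        System.out.println(sum); // 55
    }
}
```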
Advanced Collectors
- `groupingBy`:

```java
Map<Dept, List<Employee>> byDept = employees.stream()
        .collect(Collectors.groupingBy(Employee::getDept));
```

- `partitioningBy`:

```java
Map<Boolean, List<User>> adults = users.stream()
        .collect(Collectors.partitioningBy(u -> u.getAge() >= 18));
```

- `toMap` with merge:

```java
Map<String, Long> wordCounts = words.stream()
        .collect(Collectors.toMap(Function.identity(), w -> 1L, Long::sum));
```

- `collectingAndThen` for post-processing:

```java
List<User> sorted = users.stream()
        .collect(Collectors.collectingAndThen(
                Collectors.toList(),
                list -> { list.sort(...); return list; }
        ));
```
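`groupingBy` also accepts a downstream collector, which generalizes the word-count pattern shown earlier; a sketch using a hypothetical `Employee` record (a stand-in for the `Dept`/`Employee` types above):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DownstreamDemo {
    // Hypothetical stand-in for the Employee type used in the examples.
    record Employee(String name, String dept) {}

    public static void main(String[] args) {
        Stream<Employee> staff = Stream.of(new Employee("Ann", "IT"),
                                           new Employee("Bob", "IT"),
                                           new Employee("Cy", "HR"));

        // Names per department, via Collectors.mapping as the downstream collector.
        Map<String, List<String>> namesByDept = staff.collect(
                Collectors.groupingBy(Employee::dept,
                        Collectors.mapping(Employee::name, Collectors.toList())));

        System.out.println(namesByDept.get("IT")); // [Ann, Bob]
    }
}
```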
Parallel Streams
```java
long count = bigList.parallelStream()
        .filter(x -> x > threshold)
        .count();
```

- Uses the Fork/Join framework and all available CPU cores.
- Great for large, stateless, CPU-bound tasks.
- Avoid side effects, shared mutable state, and heavily stateful ops (`sorted`, `distinct`) in parallel.
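The shared-mutable-state warning is worth demonstrating; a sketch contrasting an unsafe accumulation with the collector-based alternative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelStateDemo {
    public static void main(String[] args) {
        // WRONG: ArrayList.add is not thread-safe; under parallel execution
        // elements can be lost or an exception thrown.
        List<Integer> unsafe = new ArrayList<>();
        try {
            IntStream.range(0, 10_000).parallel().forEach(unsafe::add);
        } catch (RuntimeException e) {
            // internal corruption is possible
        }
        System.out.println("unsafe size (may be < 10000): " + unsafe.size());

        // RIGHT: collect() gives each thread its own container and merges them.
        List<Integer> safe = IntStream.range(0, 10_000).parallel()
                .boxed()
                .collect(Collectors.toList());
        System.out.println("safe size: " + safe.size()); // always 10000
    }
}
```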
Lazy Evaluation & Short-Circuiting
- Lazy: the pipeline is built up front, but no work happens until the terminal operation.
- Short-circuiting: `anyMatch`, `findFirst`, and `limit` can stop early without traversing the whole source.

```java
boolean hasJava = words.stream()
        .map(String::toUpperCase)
        .anyMatch(w -> w.startsWith("J"));
```
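Short-circuiting can be made visible by counting how many elements `peek` sees before `anyMatch` stops:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ShortCircuitDemo {
    public static void main(String[] args) {
        List<String> words = List.of("Java", "Kotlin", "Scala", "Go");
        AtomicInteger examined = new AtomicInteger();

        boolean hasJ = words.stream()
                .peek(w -> examined.incrementAndGet())
                .anyMatch(w -> w.startsWith("J"));

        // "Java" is first, so anyMatch stops after a single element.
        System.out.println(hasJ + ", examined=" + examined.get()); // true, examined=1
    }
}
```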
When to Use Streams
- Data-transformation pipelines: filtering, mapping, grouping.
- Declarative style: specify what, not how.
- Easy parallelism: `parallelStream()`.
- One-off bulk operations: no need for index-based loops or manual exit.
Why Use Streams
- Conciseness: less boilerplate.
- Readability: pipelines resemble SQL.
- Lazy evaluation: minimal work done.
- Composability: chain reusable operations.
- Built-in parallel support.
When Not to Use Streams
- Tiny datasets: pipeline overhead outweighs the benefit.
- Performance-critical loops needing fine-grained control.
- Stateful or side-effecting logic inside the stream.
- Random/indexed access or early `break`/`continue`.
Advantages of Streams
- Immutable-friendly and thread-safe when operations are pure.
- Rich library of transformations and reductions.
- Uniform API for object and primitive streams.
- Clear separation of business logic from iteration mechanics.
Complete API Reference
| Category | Methods |
| --- | --- |
| Static builders | `builder()`, `empty()`, `of(T…)`, `ofNullable(T)`, `iterate(...)`, `generate(...)`, `concat(a, b)` |
| Intermediate (lazy) | `map`, `filter`, `flatMap`, `peek`, `distinct`, `sorted`, `limit`, `skip`, `takeWhile`, `dropWhile`, `parallel`, `sequential`, `unordered` |
| Terminal | `count`, `min`, `max`, `anyMatch`, `allMatch`, `noneMatch`, `findFirst`, `findAny`, `reduce(...)`, `collect(...)`, `forEach`, `forEachOrdered`, `toArray`, `iterator`, `spliterator` |
| Primitive (`IntStream`, `LongStream`, `DoubleStream`) | `range`, `rangeClosed`, `of`, `generate`, `iterate`, `filter`, `map`, `sum`, `average`, `min`, `max`, `summaryStatistics`, `boxed()` |
Conclusion
Java’s Stream API empowers you to write declarative, concise, and potentially parallel data-processing code. By mastering its pipeline architecture, operations, and best practices, you can transform messy loops into elegant workflows—boosting readability, maintainability, and performance when used judiciously.