Stream API
Since Java 8, the Stream API has revolutionized the way we process collections and other data sources. By moving from imperative loops to a declarative, pipeline-based approach, streams enable concise, readable, and potentially parallel data workflows. This post covers everything from core concepts and operations to best practices, performance tips, and a full API reference.
What Is the Stream API?
- Not a data structure, but a view over data that you can filter, map, reduce, and collect.
- Lazy: intermediate operations build a pipeline; nothing executes until a terminal operation runs.
- Possibly infinite: e.g., `Stream.iterate(...)` can generate endless data; use `limit(...)` to cap it.
- One-time use: after a terminal operation, the stream is consumed and cannot be reused.
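The laziness and one-shot nature can be seen in a short sketch:

```java
import java.util.List;
import java.util.stream.Stream;

public class StreamLifecycle {
    public static void main(String[] args) {
        // Intermediate ops are lazy: nothing runs while the pipeline is built.
        Stream<String> pipeline = List.of("a", "bb", "ccc").stream()
                .filter(s -> s.length() > 1);

        pipeline.forEach(System.out::println); // terminal op triggers execution: bb, ccc

        // One-time use: a consumed stream throws IllegalStateException on reuse.
        try {
            pipeline.count();
        } catch (IllegalStateException e) {
            System.out.println("stream already consumed");
        }
    }
}
```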
Stream Pipeline Architecture
A stream pipeline has three stages:
1. Source: collections (`List`, `Set`), arrays, I/O (`Files.lines(...)`), generators.
2. Intermediate operations (zero or more):
   - Stateless: no memory between elements (`map`, `filter`, `peek`).
   - Stateful: require state (e.g., `sorted`, `distinct`, `limit`, `skip`, `takeWhile`, `dropWhile`).
3. Terminal operation (exactly one): produces a result or side effect (`collect`, `reduce`, `forEach`, `count`, `findFirst`, …).

Under the hood, each stage wraps the next; a `Spliterator` "pulls" elements through the chain only when needed.
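The pull model can be observed directly by asking a stream for its `Spliterator`; a minimal sketch:

```java
import java.util.Spliterator;
import java.util.stream.Stream;

public class PullDemo {
    public static void main(String[] args) {
        Spliterator<String> sp = Stream.of("a", "b").spliterator();

        // tryAdvance pulls exactly one element and returns true,
        // or returns false once the source is exhausted.
        sp.tryAdvance(s -> System.out.println("pulled: " + s)); // pulled: a
        sp.tryAdvance(s -> System.out.println("pulled: " + s)); // pulled: b
        System.out.println(sp.tryAdvance(s -> {}));             // false
    }
}
```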
Creating Streams
```java
// From a Collection
List<String> list = List.of("a", "b", "c");
Stream<String> s1 = list.stream();
Stream<String> p1 = list.parallelStream();

// From an array
Stream<Integer> s2 = Arrays.stream(new Integer[]{1, 2, 3});

// Of values or generators
Stream<String> s3 = Stream.of("x", "y", "z");
Stream<Double> randoms = Stream.generate(Math::random).limit(5);
Stream<long[]> fibSeq = Stream.iterate(new long[]{0, 1},
                p -> new long[]{p[1], p[0] + p[1]})
        .limit(10);

// Primitive streams
IntStream ints = IntStream.rangeClosed(1, 10);
LongStream longs = LongStream.of(100L, 200L, 300L);
DoubleStream dbls = DoubleStream.generate(Math::random).limit(3);
```
Intermediate Operations
(These build up the pipeline; nothing runs until a terminal operation.)

Stateless
- `filter(Predicate<? super T>)`
- `map(Function<? super T, ? extends R>)`
- `flatMap(Function<? super T, ? extends Stream<? extends R>>)`
- `peek(Consumer<? super T>)`

Stateful
- `distinct()`
- `sorted()` / `sorted(Comparator<? super T>)`
- `limit(long maxSize)`
- `skip(long n)`
- `takeWhile(Predicate<? super T>)` (Java 9+)
- `dropWhile(Predicate<? super T>)` (Java 9+)

Parallel & Ordering Controls
- `parallel()` / `sequential()`
- `unordered()`
```java
List<String> result =
        list.stream()
            .filter(s -> s.length() > 1)
            .distinct()
            .sorted()
            .map(String::toUpperCase)
            .peek(s -> System.out.println("Processed: " + s))
            .collect(Collectors.toList());
```
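The difference between the stateless `filter` and the Java 9+ `takeWhile`/`dropWhile` is easy to miss; a small comparison (assuming Java 16+ for `toList()`):

```java
import java.util.List;

public class TakeDropWhileDemo {
    public static void main(String[] args) {
        List<Integer> nums = List.of(1, 2, 3, 10, 4, 5);

        // filter examines every element:
        System.out.println(nums.stream().filter(n -> n < 5).toList());    // [1, 2, 3, 4]

        // takeWhile stops at the first element that fails the predicate:
        System.out.println(nums.stream().takeWhile(n -> n < 5).toList()); // [1, 2, 3]

        // dropWhile discards the leading run and keeps everything after it:
        System.out.println(nums.stream().dropWhile(n -> n < 5).toList()); // [10, 4, 5]
    }
}
```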
Terminal Operations
(Trigger execution)
Reduction & Matching
- `count()`
- `min(Comparator<? super T>)`, `max(...)`
- `anyMatch(Predicate<? super T>)`, `allMatch(...)`, `noneMatch(...)`
- `findFirst()`, `findAny()`
- `reduce(...)`

Collection & Aggregation
- `collect(Collector<? super T, A, R>)`
- `collect(Supplier<R>, BiConsumer<R, ? super T>, BiConsumer<R, R>)`

Traversal & Side-effects
- `forEach(Consumer<? super T>)`, `forEachOrdered(...)`

Array & Iteration
- `toArray()`, `toArray(IntFunction<A[]>)`
- `iterator()`, `spliterator()`
```java
// Sum of even numbers 1–10
int sumEven = IntStream.rangeClosed(1, 10)
        .filter(n -> n % 2 == 0)
        .sum();

// Top three most frequent words in a file
// (Files.lines(...) throws IOException; close the stream,
//  e.g. with try-with-resources, in production code)
List<String> top3Words = Files.lines(Path.of("data.txt"))
        .flatMap(line -> Stream.of(line.split("\\W+")))
        .map(String::toLowerCase)
        .filter(w -> !w.isBlank())
        .collect(Collectors.groupingBy(Function.identity(),
                Collectors.counting()))
        .entrySet().stream()
        .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()))
        .limit(3)
        .map(Map.Entry::getKey)
        .toList();
```
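`reduce(...)` and the three-argument `collect(...)` from the reference list deserve their own example; a minimal sketch:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class ReduceCollectDemo {
    public static void main(String[] args) {
        // reduce: fold elements with an identity, an accumulator,
        // and a combiner (used to merge partial results in parallel).
        int totalLength = Stream.of("a", "bb", "ccc")
                .reduce(0, (sum, s) -> sum + s.length(), Integer::sum);

        // Three-argument collect: supplier, accumulator, combiner.
        List<String> upper = Stream.of("a", "bb", "ccc")
                .collect(ArrayList::new,
                         (list, s) -> list.add(s.toUpperCase()),
                         ArrayList::addAll);

        System.out.println(totalLength); // 6
        System.out.println(upper);       // [A, BB, CCC]
    }
}
```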
Primitive Streams
Avoid boxing overhead with `IntStream`, `LongStream`, and `DoubleStream`. They also offer:
- `sum()`, `average()`, `summaryStatistics()`, `min()`, `max()`
- Converters: `boxed()` → `Stream<Integer>` (likewise `Stream<Long>`, `Stream<Double>`)
```java
IntSummaryStatistics stats =
        IntStream.of(4, 7, 1, 9, 3).summaryStatistics();
System.out.println("Avg: " + stats.getAverage());
```
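`boxed()` bridges from a primitive stream back to an object stream; a small sketch (using `toList()`, Java 16+):

```java
import java.util.List;
import java.util.stream.IntStream;

public class BoxedDemo {
    public static void main(String[] args) {
        // boxed() converts IntStream -> Stream<Integer>, e.g. to collect into a List.
        List<Integer> squares = IntStream.rangeClosed(1, 5)
                .map(n -> n * n)
                .boxed()
                .toList();
        System.out.println(squares); // [1, 4, 9, 16, 25]

        // Going the other way: mapToInt avoids re-boxing for numeric work.
        int sum = squares.stream().mapToInt(Integer::intValue).sum();
        System.out.println(sum); // 55
    }
}
```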
Advanced Collectors
- `groupingBy`:

```java
Map<Dept, List<Employee>> byDept = employees.stream()
        .collect(Collectors.groupingBy(Employee::getDept));
```

- `partitioningBy`:

```java
Map<Boolean, List<User>> adults = users.stream()
        .collect(Collectors.partitioningBy(u -> u.getAge() >= 18));
```

- `toMap` with merge:

```java
Map<String, Long> wordCounts = words.stream()
        .collect(Collectors.toMap(Function.identity(), w -> 1L, Long::sum));
```

- `collectingAndThen` for post-processing:

```java
List<User> sorted = users.stream()
        .collect(Collectors.collectingAndThen(
                Collectors.toList(),
                list -> { list.sort(...); return list; }
        ));
```
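`groupingBy` also accepts a downstream collector, which generalizes the word-count pattern shown earlier; a sketch using a hypothetical `Employee` record (a stand-in for the `Dept`/`Employee` types above):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DownstreamDemo {
    // Hypothetical stand-in for the Employee type used in the examples.
    record Employee(String name, String dept) {}

    public static void main(String[] args) {
        Stream<Employee> staff = Stream.of(new Employee("Ann", "IT"),
                                           new Employee("Bob", "IT"),
                                           new Employee("Cy", "HR"));

        // Names per department, via Collectors.mapping as the downstream collector.
        Map<String, List<String>> namesByDept = staff.collect(
                Collectors.groupingBy(Employee::dept,
                        Collectors.mapping(Employee::name, Collectors.toList())));

        System.out.println(namesByDept.get("IT")); // [Ann, Bob]
    }
}
```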
Parallel Streams
```java
long count = bigList.parallelStream()
        .filter(x -> x > threshold)
        .count();
```

- Uses the Fork/Join framework and all available CPU cores.
- Great for large, stateless, CPU-bound tasks.
- Avoid side effects, shared mutable state, and heavily stateful ops (`sorted`, `distinct`) in parallel.
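The shared-mutable-state warning is worth demonstrating; a sketch contrasting an unsafe accumulation with the collector-based alternative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelStateDemo {
    public static void main(String[] args) {
        // WRONG: ArrayList.add is not thread-safe; under parallel execution
        // elements can be lost or an exception thrown.
        List<Integer> unsafe = new ArrayList<>();
        try {
            IntStream.range(0, 10_000).parallel().forEach(unsafe::add);
        } catch (RuntimeException e) {
            // internal corruption is possible
        }
        System.out.println("unsafe size (may be < 10000): " + unsafe.size());

        // RIGHT: collect() gives each thread its own container and merges them.
        List<Integer> safe = IntStream.range(0, 10_000).parallel()
                .boxed()
                .collect(Collectors.toList());
        System.out.println("safe size: " + safe.size()); // always 10000
    }
}
```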
Lazy Evaluation & Short-Circuiting
- Lazy: the pipeline is built up front, but no work happens until the terminal operation.
- Short-circuiting: `anyMatch`, `findFirst`, and `limit` can stop early without traversing the whole source.

```java
boolean hasJava = words.stream()
        .map(String::toUpperCase)
        .anyMatch(w -> w.startsWith("J"));
```
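Short-circuiting can be made visible by counting how many elements `peek` sees before `anyMatch` stops:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ShortCircuitDemo {
    public static void main(String[] args) {
        List<String> words = List.of("Java", "Kotlin", "Scala", "Go");
        AtomicInteger examined = new AtomicInteger();

        boolean hasJ = words.stream()
                .peek(w -> examined.incrementAndGet())
                .anyMatch(w -> w.startsWith("J"));

        // "Java" is first, so anyMatch stops after a single element.
        System.out.println(hasJ + ", examined=" + examined.get()); // true, examined=1
    }
}
```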
When to Use Streams
- Data-transformation pipelines: filtering, mapping, grouping.
- Declarative style: specify what, not how.
- Easy parallelism: `parallelStream()`.
- One-off bulk operations: no need for index-based loops or manual exit.
Why Use Streams
- Conciseness: less boilerplate.
- Readability: pipelines resemble SQL.
- Lazy evaluation: minimal work done.
- Composability: chain reusable operations.
- Built-in parallel support.
When Not to Use Streams
- Tiny datasets: pipeline overhead outweighs the benefit.
- Performance-critical loops needing fine-grained control.
- Stateful or side-effecting logic inside the stream.
- Random/indexed access or early `break`/`continue`.
Advantages of Streams
- Immutable-friendly and thread-safe when operations are pure.
- Rich library of transformations and reductions.
- Uniform API for object and primitive streams.
- Clear separation of business logic from iteration mechanics.
Complete API Reference
| Category | Methods |
| --- | --- |
| Static builders | `builder()`, `empty()`, `of(T…)`, `ofNullable(T)`, `iterate(...)`, `generate(...)`, `concat(a, b)` |
| Intermediate (lazy) | `map`, `filter`, `flatMap`, `peek`, `distinct`, `sorted`, `limit`, `skip`, `takeWhile`, `dropWhile`, `parallel`, `sequential`, `unordered` |
| Terminal | `count`, `min`, `max`, `anyMatch`, `allMatch`, `noneMatch`, `findFirst`, `findAny`, `reduce(...)`, `collect(...)`, `forEach`, `forEachOrdered`, `toArray`, `iterator`, `spliterator` |
| Primitive (`IntStream`, `LongStream`, `DoubleStream`) | `range`, `rangeClosed`, `of`, `generate`, `iterate`, `filter`, `map`, `sum`, `average`, `min`, `max`, `summaryStatistics`, `boxed()` |
Conclusion
Java’s Stream API empowers you to write declarative, concise, and potentially parallel data-processing code. By mastering its pipeline architecture, operations, and best practices, you can transform messy loops into elegant workflows—boosting readability, maintainability, and performance when used judiciously.