- Creating a Stream
- Stream operations
- Examples of Stream operations
- Specialized Streams
- Parallel Streams
- Other libraries
Even the simplest programs use some kind of collection of elements. Collections are an essential part of programming. Be it arrays, lists, dictionaries or maps, they’re used to store data so that it can be easily accessed. Depending on the use case, different data structures are chosen. Arrays are good for storing a sequence of elements. Key-value data can be stored in dictionaries or maps (some programming languages call them associative arrays). Up until Java 8, processing Collections in Java was inconvenient.
I must admit that some operations are relatively simple to achieve, but if you’re faced with a task to filter and then group data, you’re probably going to write several lines of code with multiple intermediary data containers. It would be better to declaratively define which operations you want to apply to the collection. Think of SQL: you can select, filter and group data from a table declaratively. The Java designers introduced the Streams API, which allows you to manipulate collections in a declarative and functional manner.
Streams are a new API in Java 8 which allows developers to declaratively manipulate collections of data. It promotes a functional programming style. Together with lambda expressions, Streams make the code more concise and readable. Additionally, Streams allow you to pipe multiple operations one after another. If you’re familiar with Unix command-line pipes, you might find composing stream operations simple to understand. All in all, it makes the code more composable.
I think the best way to demonstrate the conciseness of the Streams API is to show the following comparison. The first snippet uses Java 7 and the next uses Java 8 Streams to accomplish the same task.
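The two snippets are not included here, so the following is a reconstruction sketch rather than the author’s original code. It assumes a hypothetical Book class with getName(), getAuthor() and getPages(), plus a library list of made-up books. groupByAuthorJava7 uses the loop-and-map approach typical before Java 8, while groupByAuthorJava8 expresses the same grouping with Collectors.groupingBy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupBooksByAuthor {

    // Hypothetical Book class, assumed for illustration.
    static class Book {
        private final String name;
        private final String author;
        private final int pages;

        Book(String name, String author, int pages) {
            this.name = name;
            this.author = author;
            this.pages = pages;
        }

        String getName() { return name; }
        String getAuthor() { return author; }
        int getPages() { return pages; }
    }

    // Made-up sample data standing in for the post's library list.
    static final List<Book> library = Arrays.asList(
            new Book("Book A", "Author X", 250),
            new Book("Book B", "Author Y", 410),
            new Book("Book C", "Author X", 320));

    // Java 7 style: an explicit loop and a hand-managed intermediary map.
    static Map<String, List<Book>> groupByAuthorJava7(List<Book> books) {
        Map<String, List<Book>> result = new HashMap<>();
        for (Book book : books) {
            List<Book> byAuthor = result.get(book.getAuthor());
            if (byAuthor == null) {
                byAuthor = new ArrayList<>();
                result.put(book.getAuthor(), byAuthor);
            }
            byAuthor.add(book);
        }
        return result;
    }

    // Java 8 style: declare the grouping key and let the collector build the map.
    static Map<String, List<Book>> groupByAuthorJava8(List<Book> books) {
        return books.stream().collect(Collectors.groupingBy(Book::getAuthor));
    }

    public static void main(String[] args) {
        System.out.println(groupByAuthorJava7(library).keySet());
        System.out.println(groupByAuthorJava8(library).keySet());
    }
}
```

Both methods return a map from each author to that author’s books; in the Java 8 version the intermediary containers are created by the collector instead of by hand.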
Although they produce the same result (they group books by author), using Streams is easier to understand and requires significantly less work from the developer. Don’t worry if you don’t understand the code yet. I’ll go over the basics in the following paragraphs.
Creating a Stream
To get started with Streams, you need a method to create them. Java 8 provides several approaches and we’ll have a look at them one by one.
Creating a Stream from a collection
Collection is the interface implemented by lists, sets, queues and the like. With the introduction of default methods in Java 8, a method called stream() was added to the Collection interface. It returns a sequential Stream with the collection as its source.
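A minimal sketch of calling stream() on collections. The class and method names are illustrative; the only API involved is Collection.stream(), which List and Set both inherit.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.stream.Stream;

class StreamFromCollection {

    // Works for any Collection: List, Set, Queue and so on.
    static long countElements(Collection<String> source) {
        return source.stream().count();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Ann", "Bob", "Cy", "Bob");
        Stream<String> stream = names.stream();
        System.out.println(stream.isParallel());                 // prints false (sequential stream)
        System.out.println(countElements(names));               // prints 4
        System.out.println(countElements(new HashSet<>(names))); // prints 3 (the set drops the duplicate)
    }
}
```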
Creating a Stream from arrays
Arrays don’t have the same convenient methods that Collections have. Therefore, to create a stream from an array, you need to use the static method stream() in the Arrays class. In addition to general Streams, it contains overloaded methods for specialized streams as well. Specialized streams are discussed later in this post.
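A short sketch showing both flavours: Arrays.stream on a String[] returns a Stream<String>, while the int[] overload returns an IntStream (one of the specialized streams). The helper names are hypothetical.

```java
import java.util.Arrays;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class StreamFromArray {

    // Object arrays yield a general Stream<T>.
    static long countWords(String[] words) {
        Stream<String> stream = Arrays.stream(words);
        return stream.count();
    }

    // Primitive int arrays use an overload that yields a specialized IntStream.
    static int sum(int[] numbers) {
        IntStream stream = Arrays.stream(numbers);
        return stream.sum();
    }

    public static void main(String[] args) {
        System.out.println(countWords(new String[] {"a", "b", "c"})); // prints 3
        System.out.println(sum(new int[] {1, 2, 3, 4}));              // prints 10
    }
}
```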
Creating a Stream from values
Instead of creating an array or a collection and then converting it into a Stream, it is possible to create a Stream directly using the static method of() in the Stream interface.
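A minimal sketch of Stream.of, which takes a variable number of arguments; the sample values are arbitrary.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamFromValues {

    static List<String> languages() {
        // Stream.of is a static factory method on the Stream interface (varargs).
        Stream<String> stream = Stream.of("Java", "Scala", "Kotlin");
        return stream.collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(languages()); // prints [Java, Scala, Kotlin]
    }
}
```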
Creating a Stream from files
Java 8 allows you to create a Stream from a file. The java.nio.file.Files class contains several static methods which return a Stream of the file contents. The following example creates a Stream of Strings which represent the lines of the file.
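A sketch using Files.lines(Path), which reads the file as UTF-8. Because the returned Stream keeps the file open, it is wrapped in try-with-resources. The temporary file and its contents are stand-ins so the example runs on its own.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamFromFile {

    // One Stream element per line of the file.
    // The stream holds an open file handle, so close it with try-with-resources.
    static List<String> readLines(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file)) {
            return lines.collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for a real input file.
        Path file = Files.createTempFile("streams", ".txt");
        Files.write(file, Arrays.asList("first line", "second line"), StandardCharsets.UTF_8);
        System.out.println(readLines(file)); // prints [first line, second line]
        Files.delete(file);
    }
}
```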
Creating a Stream from functions
Two static methods in the Stream interface allow you to create infinite Streams. Yes, that’s right. Unlike a Collection or an array, a Stream can be unbounded. These methods are iterate() and generate().
Iterate accepts an initial element and a function which is applied to the previous element to produce the next element. The following example produces an infinite Stream of even numbers.
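A sketch of the even-number stream with Stream.iterate, starting from 0 and adding 2 each time. Since the stream never ends, limit (covered later in this post) is used to take the first few elements before collecting them.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class IterateExample {

    // Infinite stream 0, 2, 4, 6, ...: each element is computed from the previous one.
    static Stream<Integer> evenNumbers() {
        return Stream.iterate(0, n -> n + 2);
    }

    public static void main(String[] args) {
        // Without limit(), collecting an infinite stream would never finish.
        List<Integer> firstFive = evenNumbers().limit(5).collect(Collectors.toList());
        System.out.println(firstFive); // prints [0, 2, 4, 6, 8]
    }
}
```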
Generate accepts a Supplier to generate an infinite Stream of unordered elements. This method is good for generating a Stream of random elements, as can be seen in the following example.
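A sketch using Math::random as the Supplier. As with iterate, limit is needed to keep a terminal operation from running forever; the count of three is arbitrary.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class GenerateExample {

    // The Supplier is called once per element; no element depends on the previous one.
    static List<Double> randomNumbers(int count) {
        return Stream.generate(Math::random).limit(count).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(randomNumbers(3)); // three values in [0.0, 1.0)
    }
}
```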
Stream operations
The Stream interface defines operations that can be classified into two broad categories: intermediate and terminal operations. Intermediate operations return another stream. This makes it possible to chain multiple operations together to form a query. Terminal operations consume the stream and trigger the processing of the intermediate operations. In other words, Streams are lazy: intermediate operations are processed only when a terminal operation is invoked. Depending on the terminal operation used, the return value can be void or a non-stream value (Integer, List, Map etc.).
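Laziness can be observed with a filter whose predicate records every element it sees. This is a sketch for illustration only (side effects in stream lambdas are generally discouraged); the class and the visited list are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class LazinessExample {

    // Records which elements the filter predicate has been called with.
    static final List<String> visited = new ArrayList<>();

    static Stream<String> pipeline() {
        // Only an intermediate operation is declared here; nothing runs yet.
        return Stream.of("a", "bb", "ccc")
                .filter(s -> {
                    visited.add(s);
                    return s.length() > 1;
                });
    }

    public static void main(String[] args) {
        Stream<String> stream = pipeline();
        System.out.println(visited); // prints [] because no terminal operation has run
        long count = stream.count(); // the terminal operation triggers the filter
        System.out.println(visited); // prints [a, bb, ccc]
        System.out.println(count);   // prints 2
    }
}
```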
Examples of Stream operations
Let’s have a look at some of the most common stream operations and how to use them. Streams introduce many operations that are widely used in functional programming languages such as map, reduce and filter.
In many programming languages (especially functional programming languages) map is a higher-order function that applies a given function to each element of a list. In the context of Java 8 Streams, map accepts a Function and returns a Stream where the given function has been applied to each element of the Stream.
Although we have not yet looked at terminal operations in more detail, I’m using the forEach() terminal operation in the following example to illustrate what the Stream elements look like. The library variable is the same list that was created in the first code example of this post.
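A sketch of the map example. Since the post’s Book class and library list aren’t shown in this section, it assumes a hypothetical Book with getName() and getPages() and a few made-up books.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class MapExample {

    // Hypothetical Book class, assumed for illustration.
    static class Book {
        private final String name;
        private final int pages;
        Book(String name, int pages) { this.name = name; this.pages = pages; }
        String getName() { return name; }
        int getPages() { return pages; }
    }

    // Made-up sample data.
    static final List<Book> library = Arrays.asList(
            new Book("Book A", 250), new Book("Book B", 410), new Book("Book C", 320));

    // map replaces every Book with the String returned by getName().
    static List<String> names(List<Book> books) {
        return books.stream().map(Book::getName).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        library.stream()
               .map(Book::getName)
               .forEach(System.out::println); // prints Book A, Book B and Book C on separate lines
    }
}
```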
A method reference to getName() on the Book class is passed to map. The map operation returns a new Stream where Book objects have been replaced with Strings containing the book name.
Reduce is another higher-order function which is common in functional programming languages. Although the Java designers chose to use the name reduce, it is also widely known as fold or accumulate. Unlike map which returns a Stream, reduce returns a single value by applying an accumulation function to the elements of the Stream. Because a non-stream value is returned, reduce is classified as a terminal operation.
Reduce can be used to concatenate strings. Although Collectors.joining() is a better way to join a stream of strings, concatenation illustrates perfectly how reduce works. It accepts an initial value, which in this case is an empty string. If the stream contains no elements, the empty string is returned. The second argument is a BinaryOperator which takes the accumulated value and the next element in the stream and concatenates them. At first, the accumulated value is the empty string that was passed as the first argument.
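A sketch of string concatenation with the two-argument reduce, next to Collectors.joining() for comparison. The input words are arbitrary.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ReduceConcatExample {

    // "" is both the starting value and the result for an empty stream.
    static String concat(List<String> words) {
        return words.stream().reduce("", (acc, next) -> acc + next);
    }

    // The usual way to join strings, shown for comparison.
    static String join(List<String> words) {
        return words.stream().collect(Collectors.joining());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Java", " ", "Streams");
        System.out.println(concat(words)); // prints Java Streams
        System.out.println(join(words));   // prints Java Streams
    }
}
```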
The Java API specifies two more overloaded reduce methods. It is possible to omit the initial value and pass only a BinaryOperator to the reduce method. But in that case the return value is going to be an Optional since there is no way of knowing what to return if there are no elements in the stream.
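A sketch of the identity-less reduce summing integers. The Optional makes the empty case explicit, and orElse supplies a fallback value.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

class ReduceOptionalExample {

    // No identity value, so an empty stream yields an empty Optional.
    static Optional<Integer> sum(List<Integer> numbers) {
        return numbers.stream().reduce((a, b) -> a + b);
    }

    public static void main(String[] args) {
        System.out.println(sum(Arrays.asList(1, 2, 3)));                      // prints Optional[6]
        System.out.println(sum(Collections.<Integer>emptyList()));            // prints Optional.empty
        System.out.println(sum(Collections.<Integer>emptyList()).orElse(0));  // prints 0
    }
}
```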
The third and final overloaded reduce method is probably the most difficult to understand. It accepts an initial value called identity, a BiFunction and a BinaryOperator.
The accumulator function is used to transform the next element in the stream of type T into type U, which is then accumulated with the result of previous accumulations. In essence, it is a combined map and reduce operation. Most of the time it can be expressed more simply by explicitly calling map and then reduce.
Although it is not technically needed for sequential streams, the combiner function combines two values of type U into a single one. It is necessary for combining the intermediate results of a parallel stream (parallel streams are covered later in this post). The reduce method is the same for parallel and sequential streams, so in order to run correctly when executed in parallel, it needs to know how to combine intermediate results.
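A sketch of the three-argument reduce that sums page counts, assuming a hypothetical Book class with getPages() and made-up page numbers. Here T is Book and U is Integer: the accumulator adds a book’s pages to a running sum, and the combiner (Integer::sum) merges partial sums, which only matters when the stream runs in parallel.

```java
import java.util.Arrays;
import java.util.List;

class ReduceThreeArgsExample {

    // Hypothetical Book class with a page count.
    static class Book {
        private final String name;
        private final int pages;
        Book(String name, int pages) { this.name = name; this.pages = pages; }
        String getName() { return name; }
        int getPages() { return pages; }
    }

    // Made-up sample data.
    static final List<Book> library = Arrays.asList(
            new Book("Book A", 250), new Book("Book B", 410), new Book("Book C", 320));

    static int totalPages(List<Book> books, boolean parallel) {
        return (parallel ? books.parallelStream() : books.stream())
                .reduce(0,                                    // identity
                        (sum, book) -> sum + book.getPages(), // accumulator: (Integer, Book) -> Integer
                        Integer::sum);                        // combiner: merges partial sums
    }

    public static void main(String[] args) {
        System.out.println(totalPages(library, false)); // prints 980
        System.out.println(totalPages(library, true));  // prints 980
    }
}
```

Both calls print the same total because 0 is an identity for the combiner and addition is associative, which reduce requires for correct parallel results.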
As stated previously, to improve the readability of this reduce operation, we could explicitly call map and then reduce.
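The same total computed with an explicit map followed by reduce, under the same kind of assumptions (a hypothetical Book class and made-up page counts).

```java
import java.util.Arrays;
import java.util.List;

class MapThenReduceExample {

    // Hypothetical Book class with a page count.
    static class Book {
        private final String name;
        private final int pages;
        Book(String name, int pages) { this.name = name; this.pages = pages; }
        String getName() { return name; }
        int getPages() { return pages; }
    }

    // Made-up sample data.
    static final List<Book> library = Arrays.asList(
            new Book("Book A", 250), new Book("Book B", 410), new Book("Book C", 320));

    static int totalPages(List<Book> books) {
        return books.stream()
                .map(Book::getPages)      // Stream<Book> -> Stream<Integer>
                .reduce(0, Integer::sum); // identity 0, then add the page counts
    }

    public static void main(String[] args) {
        System.out.println(totalPages(library)); // prints 980
    }
}
```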
But why does the Streams API provide a method that could be expressed more clearly by pipelining multiple stream operations together? The answer is efficiency. The Oracle documentation states the following:
The accumulator function acts as a fused mapper and accumulator, which can sometimes be more efficient than separate mapping and reduction, such as when knowing the previously reduced value allows you to avoid some computation.
As the name implies, filter is used to filter a Stream. It accepts a Predicate and returns a new Stream consisting of the elements that match the given predicate.
Using the list of books that was created at the beginning of this post, I can find all Book objects whose page count is greater than 300.
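A sketch of the filter example, assuming a hypothetical Book class and made-up books; with these numbers, two of the three books pass the predicate.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class FilterExample {

    // Hypothetical Book class, assumed for illustration.
    static class Book {
        private final String name;
        private final int pages;
        Book(String name, int pages) { this.name = name; this.pages = pages; }
        String getName() { return name; }
        int getPages() { return pages; }
    }

    // Made-up sample data.
    static final List<Book> library = Arrays.asList(
            new Book("Book A", 250), new Book("Book B", 410), new Book("Book C", 320));

    // Keeps only the books whose page count is greater than 300.
    static List<Book> longBooks(List<Book> books) {
        return books.stream()
                .filter(book -> book.getPages() > 300)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        longBooks(library).forEach(book -> System.out.println(book.getName())); // prints Book B, then Book C
    }
}
```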
In a situation where you have duplicate elements in a Stream, you can use distinct to return a new Stream with only unique elements. The uniqueness is determined using the hashCode and equals methods of the objects in the Stream.
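A sketch with a stream that contains repeated ones and twos. For an ordered stream such as this one, distinct keeps the first occurrence of each element, so the output order is deterministic.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class DistinctExample {

    static List<Integer> unique(List<Integer> numbers) {
        return numbers.stream().distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Stream.of(1, 2, 1, 3, 2, 1)
              .distinct()
              .forEach(System.out::println); // prints 1, 2 and 3 on separate lines
        System.out.println(unique(Arrays.asList(2, 2, 1))); // prints [2, 1]
    }
}
```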
Although the stream contains multiple ones and twos, only unique integers are printed out.
A stream can be truncated with limit. This is handy when you don’t want to use the whole stream but only a part of it. Limit accepts a number specifying the maximum size of the returned stream. If the stream is created from an ordered collection (e.g. a List), the returned stream contains the first n elements of the initial stream. Limit also works if the stream is created from a collection whose order is not defined (e.g. a Set). In that case, don’t assume any particular order in the stream returned by limit.
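A sketch contrasting an ordered List with an unordered HashSet. The helper names and sample numbers are illustrative; for the set, the code relies only on the size of the result, not on which elements were kept.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class LimitExample {

    // Ordered source: limit keeps the first n elements.
    static List<Integer> firstN(List<Integer> numbers, long n) {
        return numbers.stream().limit(n).collect(Collectors.toList());
    }

    // Unordered source: limit still caps the size, but which elements survive is unspecified.
    static Set<Integer> anyN(Set<Integer> numbers, long n) {
        return numbers.stream().limit(n).collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        System.out.println(firstN(Arrays.asList(5, 6, 7, 8, 9), 3)); // prints [5, 6, 7]
        Set<Integer> set = new HashSet<>(Arrays.asList(5, 6, 7, 8, 9));
        System.out.println(anyN(set, 3).size());                      // prints 3
    }
}
```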
Java 8 Streams: an introduction to get you up and running was originally published by Indrek Ots at That which inspires awe on August 25, 2016.