
Java 8 Explained: Applying Lambdas to Java Collections

Lambdas are the main theme of Java 8, a very cool and long-awaited addition to the Java platform. However, lambdas alone would have been worthless if we had no means of applying them to collections. The problem of migrating the existing interfaces so that collections can use lambdas is solved with default methods, also referred to as defender methods. In this blog post we will dive into bulk data operations for Java collections.
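To make the default-method idea concrete, here is a minimal sketch. The Greeter interface and its implementation are hypothetical names invented for illustration; the only point is that a method with a body can be added to an interface without breaking classes that already implement it.

```java
// Hypothetical interface, not part of the JDK.
interface Greeter {
    String name();

    // Default method: imagine it was added later. Existing implementations
    // inherit it without having to change.
    default String greet() {
        return "Hello, " + name();
    }
}

class DefaultMethodExample {
    // An "old" implementation that knows nothing about greet().
    static class JoeGreeter implements Greeter {
        @Override
        public String name() {
            return "Joe";
        }
    }

    public static void main(String[] args) {
        System.out.println(new JoeGreeter().greet()); // prints "Hello, Joe"
    }
}
```

This is the same mechanism that lets Collection gain a stream() method while older Collection implementations keep compiling.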

Bulk operations – what’s in it?

As the original change spec says, the purpose of bulk operations is to “add functionality to the Java Collections Framework for bulk operations upon data. […] Operations upon data are generally expressed as lambda functions”. For me, this quote reveals the actual goal of lambdas and shifts the focus of the Java 8 release: the most important feature is the new way of using collections, expressing operations that could be executed concurrently, and lambdas are simply a better tool for expressing those operations.

Internal vs External Iteration

Historically, Java collections could not express internal iteration, as the only way to describe the iteration flow was a for (or while) loop. To describe internal iteration we would use libraries such as LambdaJ:

List<Person> persons = asList(new Person("Joe"), new Person("Jim"), new Person("John"));
forEach(persons).setLastName("Doe");

In the example above, we do not actually say how the last name should be set on each individual person; maybe the operation could be performed concurrently. Now we can express the operation in a similar way with Java 8:

persons.forEach(p -> p.setLastName("Doe"));
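For comparison, here is a small self-contained sketch that puts the classic loop next to the forEach call. The Person class is a minimal stand-in whose shape is assumed from the snippets above, not the actual class used in the post.

```java
import java.util.Arrays;
import java.util.List;

class IterationExample {
    // Minimal stand-in for the post's Person class (assumed shape).
    static class Person {
        final String firstName;
        String lastName;

        Person(String firstName) {
            this.firstName = firstName;
        }

        void setLastName(String lastName) {
            this.lastName = lastName;
        }
    }

    // External iteration: the caller drives the loop, element by element.
    static void renameExternally(List<Person> persons) {
        for (Person p : persons) {
            p.setLastName("Doe");
        }
    }

    // Internal iteration: we only say what to do; the collection decides how to traverse.
    static void renameInternally(List<Person> persons) {
        persons.forEach(p -> p.setLastName("Doe"));
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(new Person("Joe"), new Person("Jim"), new Person("John"));
        renameInternally(persons);
        persons.forEach(p -> System.out.println(p.firstName + " " + p.lastName));
    }
}
```

Both methods leave the list in the same state; the difference is only in who controls the traversal.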

Internal iteration isn’t that closely related to bulk operations over collections. It is rather a minor feature that gives us an idea of the effect of the syntax changes. What is actually interesting with regard to bulk data operations is the new Stream API.

Stream API

The new java.util.stream package has been added to the JDK, allowing us to perform filter/map/reduce-like operations on collections in Java 8.

The Stream API allows us to declare either sequential or parallel operations over a stream of data:

List<Person> persons = … 
 
// sequential version
Stream<Person> stream = persons.stream();
 
//parallel version 
Stream<Person> parallelStream = persons.parallelStream();
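As a quick check of the two factory methods, the sketch below asks each stream which mode it is in via isParallel(). The class and helper names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class StreamSourceExample {
    // Reports the mode a stream is currently in.
    static String mode(Stream<?> stream) {
        return stream.isParallel() ? "parallel" : "sequential";
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Joe", "Jim", "John");
        System.out.println(mode(names.stream()));         // sequential
        System.out.println(mode(names.parallelStream())); // parallel
    }
}
```

Note that the Javadoc describes parallelStream() as returning a "possibly parallel" stream, so a custom Collection implementation is allowed to hand back a sequential one.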

The java.util.stream.Stream interface serves as a gateway to the bulk data operations. Once we have a reference to a stream instance, we can perform the interesting tasks on the collection:

Filter

Filtering a stream of data is the first natural operation we would need. The Stream interface exposes a filter method that takes a Predicate (http://javadocs.techempower.com/jdk18/api/java/util/function/Predicate.html), a single-abstract-method (SAM) interface, so we can use a lambda expression to define the filtering criteria:

List<Person> persons = …
Stream<Person> personsOver18 = persons.stream().filter(p -> p.getAge() > 18);
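A self-contained sketch of the same filter, assuming a minimal Person class with a getAge() accessor (the real class is not shown in this post). It stores the lambda in a Predicate variable so it can be reused and negated, and uses count() as a simple terminal operation to get a result out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class FilterExample {
    // Minimal stand-in for the post's Person class (assumed shape).
    static class Person {
        private final String name;
        private final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        int getAge() {
            return age;
        }
    }

    // The lambda from the post, stored in a variable so it can be reused.
    static final Predicate<Person> OVER_18 = p -> p.getAge() > 18;

    static long countOver18(List<Person> persons) {
        return persons.stream().filter(OVER_18).count();
    }

    static long count18OrUnder(List<Person> persons) {
        // negate() is a default method on Predicate.
        return persons.stream().filter(OVER_18.negate()).count();
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
                new Person("Joe", 17), new Person("Jim", 18), new Person("John", 42));
        System.out.println(countOver18(persons));    // 1 (only John is strictly over 18)
        System.out.println(count18OrUnder(persons)); // 2
    }
}
```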

Map

Assume we now have filtered data that we can use for real operations, say transforming the objects. The map operation allows us to apply a function (http://javadocs.techempower.com/jdk18/api/java/util/function/Function.html) that takes a parameter of one type and returns something else. First, let’s see how it would have been written the good ol’ way, using an anonymous inner class:

Stream<Student> students = persons.stream()
      .filter(p -> p.getAge() > 18)
      .map(new Function<Person, Student>() {
                  @Override
                  public Student apply(Person person) {
                     return new Student(person);
                  }
              });

Now, converting this example into a lambda syntax we get the following:

Stream<Student> map = persons.stream()
        .filter(p -> p.getAge() > 18)
        .map(person -> new Student(person));

And since the lambda passed to the map method simply hands its parameter to the Student constructor without doing anything else with it, we can shorten it further to a method reference:

Stream<Student> map = persons.stream()
        .filter(p -> p.getAge() > 18)
        .map(Student::new);
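To see the three forms above as one runnable piece, here is a sketch with stand-in Person and Student classes (their fields and accessors are assumptions made for this example). It chains two map calls, Person to Student and then Student to String, and uses toArray to pull the values out of the stream.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class MapExample {
    // Stand-in classes; the real Person and Student are not shown in the post.
    static class Person {
        private final String name;
        private final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        String getName() { return name; }
        int getAge() { return age; }
    }

    static class Student {
        private final String name;

        Student(Person person) {
            this.name = person.getName();
        }

        String getName() { return name; }
    }

    static String[] studentNames(List<Person> persons) {
        // Same function as "person -> new Student(person)".
        Function<Person, Student> toStudent = Student::new;
        return persons.stream()
                .filter(p -> p.getAge() > 18)
                .map(toStudent)          // Stream<Person>  -> Stream<Student>
                .map(Student::getName)   // Stream<Student> -> Stream<String>
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(new Person("Joe", 17), new Person("John", 42));
        System.out.println(Arrays.toString(studentNames(persons))); // [John]
    }
}
```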

Collect

The stream abstraction is continuous by nature: we can describe operations on a stream, but to obtain the final results we have to collect the data somehow. The Stream API provides a number of “terminal” operations for this. The collect() method is one of those terminal operations; it lets us gather the results:

List<Student> students = persons.stream()
        .filter(p -> p.getAge() > 18)
        .map(Student::new)
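        // Collector has three type parameters: element, accumulation container and result
        .collect(new Collector<Student, List<Student>, List<Student>>() {
            /* supplier(), accumulator(), combiner(), finisher() and characteristics() must be implemented here */
        });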
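For reference, a hand-written collector does not have to be an anonymous class. The sketch below uses the Collector.of factory method to assemble one from its parts; the intoList name is made up for this example, and plain strings stand in for Student objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Stream;

class CustomCollectorExample {
    // A hand-rolled "collect into a list" collector, built from its three core functions.
    static <T> Collector<T, ?, List<T>> intoList() {
        return Collector.<T, List<T>>of(
                ArrayList::new,                  // supplier: creates an empty result container
                List::add,                       // accumulator: folds one element into the container
                (left, right) -> {               // combiner: merges partial results (used in parallel runs)
                    left.addAll(right);
                    return left;
                });
    }

    public static void main(String[] args) {
        List<String> names = Stream.of("Joe", "Jim", "John").collect(intoList());
        System.out.println(names); // [Joe, Jim, John]
    }
}
```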

Fortunately, in most cases you won’t need to implement the Collector interface yourself. Instead, there’s a Collectors utility class for convenience:

List<Student> students = persons.stream()
        .filter(p -> p.getAge() > 18)
        .map(Student::new)
        .collect(Collectors.toList());

Or, if we would like to use a specific collection implementation for collecting the results:

List<Student> students = persons.stream()
        .filter(p -> p.getAge() > 18)
        .map(Student::new)
        .collect(Collectors.toCollection(ArrayList::new));
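The choice between the two matters because Collectors.toList() makes no promise about which List implementation you get back, while toCollection lets you pick the container. A small sketch, using strings instead of Student objects and method names invented for this example:

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

class CollectorsExample {
    // toList(): some List, with no guarantee about the concrete type.
    static List<String> upperCased(List<String> names) {
        return names.stream()
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    // toCollection(...): we choose the container, here a sorted, duplicate-free TreeSet.
    static TreeSet<String> sortedUnique(List<String> names) {
        return names.stream()
                .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("John", "Joe", "Jim", "Joe");
        System.out.println(upperCased(names));   // [JOHN, JOE, JIM, JOE]
        System.out.println(sortedUnique(names)); // [Jim, Joe, John]
    }
}
```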

Parallel and Sequential

One interesting feature of the new Stream API is that a stream can be switched between parallel and sequential mode with a single call, parallel() or sequential(), at any point in the chain. Note that the mode applies to the pipeline as a whole rather than to individual stages: whichever call comes last before the terminal operation wins:

List<Student> students = persons.stream()
        .parallel()                    // requests parallel execution
        .filter(p -> p.getAge() > 18)
        .sequential()                  // last call wins: the whole pipeline runs sequentially
        .map(Student::new)
        .collect(Collectors.toCollection(ArrayList::new));

The hidden agenda here is that the concurrent part of the data processing flow will manage itself automatically, (hopefully) without requiring us to deal with concurrency issues.
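A minimal sketch for checking how parallel() and sequential() interact in the released Java 8 API. It relies on isParallel(), which reports the mode the pipeline would run in if a terminal operation were invoked at that point; the class and method names are made up for illustration.

```java
import java.util.stream.IntStream;
import java.util.stream.Stream;

class PipelineModeExample {
    // parallel() first, sequential() last: the pipeline ends up sequential.
    static boolean parallelThenSequential() {
        return Stream.of(1, 2, 3)
                .parallel()
                .filter(i -> i > 1)
                .sequential()
                .isParallel();
    }

    // sequential() first, parallel() last: the pipeline ends up parallel.
    static boolean sequentialThenParallel() {
        return Stream.of(1, 2, 3)
                .sequential()
                .filter(i -> i > 1)
                .parallel()
                .isParallel();
    }

    // The result does not depend on the mode, only the execution strategy does.
    static int parallelSum() {
        return IntStream.rangeClosed(1, 100).parallel().sum();
    }

    public static void main(String[] args) {
        System.out.println(parallelThenSequential()); // false
        System.out.println(sequentialThenParallel()); // true
        System.out.println(parallelSum());            // 5050
    }
}
```

In other words, the mode is a property of the whole pipeline, so mixing the two calls does not split the work into a concurrent stage and a sequential stage.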

Summary

Exciting times ahead! There are a lot of new things to learn with the new Stream API and lambdas coming in Java 8. There is of course a lot more to cover than we did in this blog post, but hopefully we will bring more awesome stuff to you soon.

  • Jakub Milkiewicz

    Hi Anton

    Shouldn’t you have map(Student::new) instead of map(Adult::new);

  • arhan

    indeed! thanks for noticing and reading :)

  • Pankaj Kumar

    These are some good features. I guess I need to look into these and update my Java collections tutorial.

  • Jen

    > The map operations allows us to apple a function

    OK, so you’re one of those Apple fanbois. What in the hell does that have to do with Java? We’re professionals here, not a bunch of screaming preteen girls with our iPhones. How about you stop trying to ruin Java with your crap.

  • arhan

    troll detected

  • Majkol

    Do you have any idea why changing parallel to sequential is like 3 times slower on a 4-core machine?
    public static void main(String[] args) {
        List<Integer> listToTest = new ArrayList<>();
        for (int i = 0; i < 1000000; i++) {
            listToTest.add(i);
        }

        long start = System.currentTimeMillis();
        List<Integer> list = listToTest.stream().parallel().filter(p -> p > 1).collect(Collectors.toList());

        long end = System.currentTimeMillis();
        System.out.println("time " + (end - start));
    }

  • Majkol

    I meant that parallel is slower than sequential here

  • Didier Mounoud

    It sounds like a simple copy of what we have been able to do with Scala for a long time…

  • Holger

    Nice example, but even the creation of the person List can be improved. Instead of asList(new Person("Joe"), new Person("Jim"), new Person("John")).stream() you can use asList("Joe", "Jim", "John").stream().map(Person::new) and you have your stream of persons. So putting it all together:

    List<Student> students = asList("Joe", "Jim", "John")
        .stream().map(Person::new).peek(p -> p.setLastName("Doe"))
        .parallel().filter(p -> p.getAge() > 18) // filtering concurrently
        .sequential().map(Student::new)
        .collect(Collectors.toList());

  • arhan

    nice!

  • Holger

    You didn’t get the spirit. Use

    long t0=System.nanoTime();
    int[] a=IntStream.range(0, 1_000_000).filter(p -> p > 1).toArray();
    long t1=System.nanoTime();
    int[] b=IntStream.range(0, 1_000_000).parallel().filter(p -> p > 1).toArray();
    long t2=System.nanoTime();
    System.out.printf("serial: %.2fs, parallel %.2fs%n", (t1-t0)*1e-9, (t2-t1)*1e-9);

    It would be even better if you had a reasonable final use for the stream’s items rather than storing them into a Collection or an array. The spirit is to *stream* not to store.

    On my machine it prints “serial: 0,06s, parallel 0,02s” just like I would expect. But benchmarking beta versions is questionable anyway.

  • Holger

    And even asList("Joe", "Jim", "John").stream() can be improved further to Stream.of("Joe", "Jim", "John"). It seems to take some time to get the spirit entirely.

  • Douglas Fernandes B. Arantes

    ohh, very good.
    Congratulations for the post.
    #Java is back!

  • Gabriel Basilio Brito

    java.util.stream is what i’ve been waiting for.

  • Giulliano

    Not only Scala, but C#, Ruby and Python also have these features. Java is only adding them now, but they have been needed for a long time.

  • A. C. Fenderson

    What I want to know is why streams were added. Why couldn’t collections have had a new method added instead?

    Instead of typing something like

    personsOver18 = persons.filter (p -> p.getAge() > 18);

    I have to type the more bulky

    personsOver18 = persons.stream().filter(p -> p.getAge() > 18).collect (Collectors.toList());

  • Lionel Tesolin

    Because of backward compatibility. Many existing projects extend existing implementations of Collection, and they would all have to be refactored if new abstract methods were added. The only way was to add a default method, stream(), so that existing code continues to work.