Three JPA 2.1 features that will boost your application’s performance

JPA Performance

Developers often complain about the subpar performance of JPA. However, if you take a closer look at the performance issues, you will quite often find similar root causes. These can include:

  • using too many SQL queries to fetch the required entities from the database, also known as the n+1 query problem
  • updating entities one by one instead of using a single statement
  • doing data heavy processing on the Java side, rather than the database side

Luckily, there is no need to suffer from these inefficiencies if you know what you’re doing. JPA has always offered ways to handle these kinds of issues, and JPA 2.1 introduced some additional features that can be used to gain significant performance improvements.

In this blog post, I’m going to explain how to use JPA 2.1 features to avoid the problems listed above.

By the way, if you want to learn more about the typical performance issues in Java projects, we have recently published an insightful report based on our performance survey findings. Or if you’re looking for a JPA resource, here’s a great cheatsheet of JPA 2.1 features.

Anyway, let’s get straight to fixing the performance issues found when using JPA.

Fixing the “too many SQL queries” problem

Performing too many SQL queries to fetch all required entities is, in my experience, the most common reason for performance issues.

Even the most innocent-looking query can, if implemented incorrectly, trigger dozens or hundreds of SQL queries to the database. And it doesn’t even have to be an explicit query, as you will see in this section; a couple of incorrectly configured annotations are enough. So if you think this problem won’t affect you, think again.

Imagine the following piece of code in your project. What are your thoughts?

List<Author> authors = this.em.createQuery("SELECT a FROM Author a",
		Author.class).getResultList();

for (Author a : authors) {
	System.out.println("Author "
			+ a.getFirstName()
			+ " "
			+ a.getLastName()
			+ " wrote "
			+ a.getBooks()
					.stream()
					.map(b -> b.getTitle() + "("
							+ b.getReviews().size() + " reviews)")
					.collect(Collectors.joining(", ")));
}

The code snippet above prints out the names of all authors, the titles of their books and the number of reviews of each book. The snippet looks really simple. How many queries do you think are sent to the database? One? Maybe two (one for each type of entity)?

The right answer is: it depends on the number of authors in the database. With my small example database of only 11 authors and 6 books, this code triggers 12 queries: one to get all authors and 11 more to get the books of each of the 11 authors. This issue is known as the n+1 query problem, and it can easily occur with any library that you use for database access. Worse, the number of queries grows with the dataset, so the problem is exacerbated in production.
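To make the arithmetic concrete, here is a small plain-Java sketch that only counts queries. It does not use JPA at all: FakeDatabase and its methods are hypothetical stand-ins that imitate the lazy-loading behaviour described above and a single joined query as the alternative.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NPlusOneDemo {

    /** Hypothetical in-memory stand-in for a database; it only counts queries. */
    static class FakeDatabase {
        private final int authorCount;
        private int queryCount = 0;

        FakeDatabase(int authorCount) {
            this.authorCount = authorCount;
        }

        /** Simulates "SELECT a FROM Author a": one query returning all author ids. */
        List<Integer> findAllAuthorIds() {
            queryCount++;
            List<Integer> ids = new ArrayList<>();
            for (int id = 1; id <= authorCount; id++) {
                ids.add(id);
            }
            return ids;
        }

        /** Simulates lazily initializing the books of one author: one query per call. */
        List<String> findBooksOfAuthor(int authorId) {
            queryCount++;
            return Collections.singletonList("Book of author " + authorId);
        }

        /** Simulates one joined query returning every author together with their books. */
        Map<Integer, List<String>> findAllAuthorsWithBooks() {
            queryCount++;
            Map<Integer, List<String>> result = new LinkedHashMap<>();
            for (int id = 1; id <= authorCount; id++) {
                result.put(id, Collections.singletonList("Book of author " + id));
            }
            return result;
        }

        int getQueryCount() {
            return queryCount;
        }
    }

    /** One query for the authors plus one query per author for the books. */
    public static int queriesWithLazyLoading(int authorCount) {
        FakeDatabase db = new FakeDatabase(authorCount);
        for (int authorId : db.findAllAuthorIds()) {
            db.findBooksOfAuthor(authorId);
        }
        return db.getQueryCount();
    }

    /** A single query, no matter how many authors exist. */
    public static int queriesWithJoinedFetch(int authorCount) {
        FakeDatabase db = new FakeDatabase(authorCount);
        db.findAllAuthorsWithBooks();
        return db.getQueryCount();
    }

    public static void main(String[] args) {
        System.out.println("lazy loading: " + queriesWithLazyLoading(11) + " queries"); // 12
        System.out.println("joined fetch: " + queriesWithJoinedFetch(11) + " queries"); // 1
    }
}
```

With 11 authors the lazy variant issues 1 + 11 = 12 queries, which matches the number observed above, while the joined variant stays at one query regardless of the number of authors.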

The good news is, we have multiple options to avoid this scenario by fetching all the required entities with one query. One of the newest and, from my point of view, the best way to solve this problem is to use a @NamedEntityGraph.

An entity graph specifies a graph of entities that shall be fetched from the database in a query-independent way. That means you create a standalone definition of an entity graph and combine it with a query when you need it. The snippet below shows how to define a @NamedEntityGraph that fetches the books of a given author.

@Entity
@NamedEntityGraph(name = "graph.AuthorBooks", attributeNodes = @NamedAttributeNode("books"))
public class Author implements Serializable {
…
}

You can now provide this graph as a hint to the entity manager and get the authors and all their books in one query. As you have seen in the definition of the graph, I only provided the name of the property that contains the related entities. Therefore I use the @NamedEntityGraph as a loadgraph, so that all the other attributes are fetched with their defined fetch type, as follows:

EntityGraph<?> graph = this.em.getEntityGraph("graph.AuthorBooks");

List<Author> authors = this.em
		.createQuery("SELECT DISTINCT a FROM Author a", Author.class)
		.setHint("javax.persistence.loadgraph", graph).getResultList();

This example shows a very simple entity graph, and you will probably use more complex graphs in a real application. That is not a problem: you can define multiple @NamedAttributeNodes, and you can use the @NamedSubgraph annotation to create a graph with multiple levels. You can find more information about @NamedEntityGraphs in this post explaining how to use Entity Graphs in detail.
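As a rough sketch of such a multi-level graph, the fragment below extends the Author entity so that the books and the reviews of each book are part of one graph. The graph name graph.AuthorBooksReviews is made up for this illustration, and the reviews attribute on Book is assumed from the b.getReviews() call in the first snippet; the entity body is left out, as above.

```java
import java.io.Serializable;
import javax.persistence.Entity;
import javax.persistence.NamedAttributeNode;
import javax.persistence.NamedEntityGraph;
import javax.persistence.NamedSubgraph;

@Entity
@NamedEntityGraph(
    name = "graph.AuthorBooksReviews", // hypothetical graph name
    // first level: the books of the author, continued in the subgraph named "books"
    attributeNodes = @NamedAttributeNode(value = "books", subgraph = "books"),
    // second level: for each book, also fetch its reviews (assumed attribute name)
    subgraphs = @NamedSubgraph(
        name = "books",
        attributeNodes = @NamedAttributeNode("reviews")))
public class Author implements Serializable {
    // id, attributes and relationship mappings as in the existing entity
}
```

Keep in mind that fetching several nested collections in one query can produce large result sets, and some providers restrict it depending on the collection types used, so it is worth checking the generated SQL.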

For some use cases you might also need a more dynamic way to define the entity graph, e.g. based on some input parameters. In these cases it makes more sense to use a Java API to programmatically define the EntityGraph.
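A minimal sketch of the programmatic variant could look like the following. It assumes the Author and Book entities from the earlier snippets, a reviews attribute on Book, and an injected EntityManager; the class name AuthorGraphQueries and the withReviews flag are hypothetical.

```java
import java.util.List;
import javax.persistence.EntityGraph;
import javax.persistence.EntityManager;
import javax.persistence.Subgraph;

public class AuthorGraphQueries {

    private final EntityManager em;

    public AuthorGraphQueries(EntityManager em) {
        this.em = em;
    }

    /** Loads all authors with their books and, if requested, the reviews of each book. */
    public List<Author> findAuthors(boolean withReviews) {
        // build the graph at runtime instead of declaring it with annotations
        EntityGraph<Author> graph = em.createEntityGraph(Author.class);
        if (withReviews) {
            // addSubgraph adds the "books" node and lets us descend one level
            Subgraph<Book> books = graph.addSubgraph("books");
            books.addAttributeNodes("reviews");
        } else {
            graph.addAttributeNodes("books");
        }

        return em.createQuery("SELECT DISTINCT a FROM Author a", Author.class)
                .setHint("javax.persistence.loadgraph", graph)
                .getResultList();
    }
}
```

The query and the hint are the same as in the annotation-based example; only the way the graph is built differs, which makes it easy to vary the graph based on input parameters.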

Fixing the “update entities one by one” problem

Updating entities one by one is another common reason for performance issues in JPA. As Java developers, we are used to working with objects and thinking in an object-oriented way. While this is a good way to implement complex logic and applications, it is also a common cause of performance degradation when working with a database.

From an object oriented point of view it is perfectly acceptable to perform update and delete operations on the entities. But this is very inefficient when you have to update a huge set of entities. The persistence provider will create one update statement for each updated entity and send them to the database with the next flush operation.

SQL provides a more efficient way to do this. It allows you to construct an update statement that updates multiple entities at once. And you can do the same with the CriteriaUpdate and CriteriaDelete statements introduced in JPA 2.1.

If you have used criteria queries before, you will feel very familiar with the new CriteriaUpdate and CriteriaDelete statements. The update and delete operations are created in nearly the same way as the criteria queries introduced in JPA 2.0.

As you can see in the following code snippet, you need to get a CriteriaBuilder from the entity manager and use it to create a CriteriaUpdate object. This is done in a similar way to the CriteriaQuery. The main differences are the set methods which are used to define the update operations.

CriteriaBuilder cb = this.em.getCriteriaBuilder();
// create update
CriteriaUpdate<Author> update = cb.createCriteriaUpdate(Author.class);
// set the root class
Root<Author> a = update.from(Author.class);
// set update and where clause
update.set(Author_.firstName, cb.concat(a.get(Author_.firstName), " - updated"));
update.where(cb.greaterThanOrEqualTo(a.get(Author_.id), 3L));

// perform update
Query q = this.em.createQuery(update);
q.executeUpdate();

For CriteriaDelete operations, you just need to call the createCriteriaDelete method on the CriteriaBuilder to get a CriteriaDelete object and use it to define the FROM and WHERE parts of the query, similar to the previous example.
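A delete built this way might look like the sketch below. It mirrors the update example and assumes the same Author entity and generated Author_ metamodel class; the class and method names are hypothetical. Note that createCriteriaDelete is a method of the CriteriaBuilder.

```java
import javax.persistence.EntityManager;
import javax.persistence.criteria.CriteriaBuilder;
import javax.persistence.criteria.CriteriaDelete;
import javax.persistence.criteria.Root;

public class AuthorBulkDelete {

    /**
     * Deletes all authors with an id greater than or equal to minId using a
     * single DELETE statement. Must be called within an active transaction.
     *
     * @return the number of deleted entities
     */
    public static int deleteAuthorsFromId(EntityManager em, Long minId) {
        CriteriaBuilder cb = em.getCriteriaBuilder();
        // create delete
        CriteriaDelete<Author> delete = cb.createCriteriaDelete(Author.class);
        // set the root class (FROM part)
        Root<Author> a = delete.from(Author.class);
        // set where clause
        delete.where(cb.greaterThanOrEqualTo(a.get(Author_.id), minId));
        // perform delete
        return em.createQuery(delete).executeUpdate();
    }
}
```

Like other bulk operations, this statement is executed directly in the database and does not go through the persistence context, so entities that are already loaded may be stale afterwards and should be refreshed or cleared if they are still needed.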

Processing data in the database

Another common source of performance problems is that we, as Java developers, tend to implement all the logic of our application in Java. Don’t get me wrong, there are lots of good reasons to do it this way. But there can also be good reasons to perform some part of the logic in the database and only send the result to the business tier.

There are multiple ways to perform logic in the database. You can do a lot of things with plain SQL, and if that is not enough, you can still call database-specific functions and stored procedures. Here I will have a closer look at stored procedures or, to be more precise, at the way you can call stored procedures.

There was no real support for it in JPA 2.0. Native queries were the only way you could call a stored procedure. This was changed with the introduction of @NamedStoredProcedureQuery and the more dynamic StoredProcedureQuery in JPA 2.1. In this post, I will focus on the annotation based definition of stored procedure calls with @NamedStoredProcedureQuery. I wrote more about the dynamic StoredProcedureQuery on my blog.

As you can see in the following code snippet, the definition of a @NamedStoredProcedureQuery is pretty straightforward. You need to define the name of the named query, the name of the stored procedure in the database, and the input and output parameters. In this example, I’m calling the stored procedure calculate with the input parameters x and y, and I expect the output parameter sum. The other supported parameter modes are INOUT, for parameters that are used for both input and output, and REF_CURSOR, to retrieve result sets.

@NamedStoredProcedureQuery(
	name = "calculate",
	procedureName = "calculate",
	parameters = {
		@StoredProcedureParameter(mode = ParameterMode.IN, type = Double.class, name = "x"),
		@StoredProcedureParameter(mode = ParameterMode.IN, type = Double.class, name = "y"),
		@StoredProcedureParameter(mode = ParameterMode.OUT, type = Double.class, name = "sum") })

The @NamedStoredProcedureQuery is used in a similar way to @NamedQuery. You need to provide the name of the query to the createNamedStoredProcedureQuery method of the entity manager to get a StoredProcedureQuery object for this query. This can then be used to set the input parameters with the setParameter methods and to call the stored procedure with the execute method afterwards.

StoredProcedureQuery query = this.em.createNamedStoredProcedureQuery("calculate");
query.setParameter("x", 1.23d);
query.setParameter("y", 4.56d);
query.execute();
Double sum = (Double) query.getOutputParameterValue("sum");
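For comparison, the dynamic StoredProcedureQuery mentioned earlier defines the same call in code instead of annotations. The following is only a sketch under the same assumptions as above (a stored procedure calculate with the parameters x, y and sum); the class and method names are hypothetical.

```java
import javax.persistence.EntityManager;
import javax.persistence.ParameterMode;
import javax.persistence.StoredProcedureQuery;

public class CalculateProcedure {

    /** Calls the stored procedure "calculate" and returns its output parameter "sum". */
    public static Double sum(EntityManager em, double x, double y) {
        // reference the procedure by its database name, no annotation required
        StoredProcedureQuery query = em.createStoredProcedureQuery("calculate");

        // register the parameters that the annotation would otherwise declare
        query.registerStoredProcedureParameter("x", Double.class, ParameterMode.IN);
        query.registerStoredProcedureParameter("y", Double.class, ParameterMode.IN);
        query.registerStoredProcedureParameter("sum", Double.class, ParameterMode.OUT);

        // set the input values and call the procedure
        query.setParameter("x", x);
        query.setParameter("y", y);
        query.execute();

        return (Double) query.getOutputParameterValue("sum");
    }
}
```

The annotation-based definition keeps the parameter list in one place next to the entity, while the dynamic form is handy when the procedure name or parameters are only known at runtime.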

Conclusion and special bonus

JPA makes it very easy to store and retrieve data from a database. While this is great to get a project started quickly and to solve the vast majority of its requirements, it also makes it easy to implement a very inefficient persistence tier. Some of the most common problems include using too many queries to get the required data, updating entities one by one and implementing all of the logic within Java.

The JPA 2.1 specification introduced several new features to address these inefficiencies, such as entity graphs, criteria updates and deletes, and stored procedure queries. My New features in JPA 2.1 cheat sheet, which you can download for free, describes these and other new features in JPA 2.1.

Have any comments or better ways to make sure JPA performance is great? Share your recipes in the comments below or chat with me on Twitter: @thjanssen123!


  • Roberto Cortez

    Great information! Congratulations on the article :)

  • Thorben Janssen

    Thanks Roberto :)

  • Marcelo Balloni

    What about the second level cache when you use CriteriaUpdate and CriteriaDelete? Haven’t found any information about this issue on the web.

  • Konrad Hauke

    You really do a great job, also on your blog.
    Is your test code available as a project on GitHub?

  • Jarrad Waterloo

    Sorry, EntityGraph, whether named or not, doesn’t fix the N+1 problem because it only limits columns, not rows. As far as rows go, it is still doing a fetch join, which means Cartesian products are going on. If you have numerous relationships, hundreds of rows from separate queries per relationship could explode to hundreds of thousands of rows. EntityGraph was the solution that fell short. Joins should be done for 1 to 1 relationships, separate queries for 1 to N and M to N relationships. These queries could be run in parallel, in separate threads or via multiple active result sets if the driver supports it. Unfortunately, the developers can now tell the end result we want, but what uses/runs EntityGraph is not using the information properly. Ebean has been doing this right for years. JPA maintainers, please take that final step to cross the racing line and fix the N+1 problem before giving something that doesn’t and saying you did. Fetch JOIN, which you already have, is the problem. That and lazy loading are both opposite ends of the spectrum and both bad for performance. Strike the balance between the two as described. Kill Ebean by making its functionality standard.