
Testing the performance of 4 Java runtime code generators: cglib, javassist, JDK proxy & Byte Buddy


In the first part of this series, we had a look at Java’s strong and static type system. We concluded that this type system allows for writing expressive and robust applications but limits the latitude of framework APIs to incorporate user types. We also saw how Java’s reflection API is not always the best choice for interacting with user types. To see this more clearly, we reasoned about the implementation of a simple security library where using the reflection API would break type safety while we could use code generation in order to retain user types.

In the second part of this article, we then learned about different libraries for code generation and had a closer look at Byte Buddy, a library of my own making. We then used this library to provide a simple implementation of the security framework proposed in the first part.

In this last part, we want to compare the different libraries in a performance benchmark. If you have not yet read the previous parts, make sure to check them out before reading on. I promise, we won’t go further until you are back.

Code generators speed dating

In the end, a shiny API will not be the only criterion for choosing the best code generation library. A library’s runtime performance might be an even more important factor, especially if the generated code takes up a crucial position within a running application. There exist numerous urban legends on the performance of different code generation libraries, but I’ve never found a proper benchmark to prove a claim in favor of any one specific technology.

Doing a micro-benchmark in Java is not an easy task. If you measure the execution time of a given code block, you do not normally know what it is that you are actually clocking. When Java code is executed, the just-in-time (JIT) compiler kicks in and, in the most extreme case, simply eliminates the measured code.

However, over the last few years, several smart people have come up with ways to trick the JIT compiler and implemented micro-benchmarking libraries based on these ideas. My personal favorite benchmarking library is the Java Microbenchmark Harness (JMH), a tool that comes out of the OpenJDK project.
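To make the pitfall more concrete, here is a minimal plain-Java sketch of a hand-rolled timer. It is not how JMH works internally and it is no substitute for a real harness; it only illustrates two of the ideas mentioned above: warming up the code before measuring, and consuming every result so that the JIT compiler cannot treat the measured code as dead. The names `NaiveTimer`, `averageNanos`, `sumUpTo` and `sink` are made up for this example.

```java
import java.util.function.IntSupplier;

// Hypothetical helper for illustration only; a real benchmark should use a harness.
final class NaiveTimer {

    // Results are written here so the JIT compiler cannot prove them unused.
    static volatile int sink;

    // Runs the task `iterations` times and returns the average time per call in nanoseconds.
    static double averageNanos(IntSupplier task, int iterations) {
        // Warm-up: give the JIT compiler a chance to compile the task before we start the clock.
        for (int i = 0; i < iterations; i++) {
            sink = task.getAsInt();
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            // The volatile write consumes the result of every call.
            sink = task.getAsInt();
        }
        long elapsed = System.nanoTime() - start;
        return (double) elapsed / iterations;
    }

    // A small piece of work to measure.
    static int sumUpTo(int n) {
        int sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        double nanos = averageNanos(() -> sumUpTo(1_000), 100_000);
        System.out.println("approx. ns per call: " + nanos);
    }
}
```

Even with these precautions, the number printed by such a loop should be read with suspicion: the JIT compiler may still inline, hoist or simplify the measured work in ways that are hard to predict from the source code.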

Before diving into the numbers themselves, it’s essential to answer one question: what is the purpose and the focus of the benchmark? Obviously, one library might handle some task more efficiently than its competitors while taking more time for a different task.

Beyond that, a code generation library can always trade time for creating a runtime class against the time that it takes for invoking its methods once it’s created. Keep all of this in mind while we discuss the subsequent numbers.

Just show me the numbers

So, with all of the above in mind, let us first look at the raw numbers of a JMH benchmark that directly compares the runtime for different tasks. All numbers in the following table are listed in nanoseconds per operation with the sample’s standard deviation attached in parentheses.

Task                                       Byte Buddy         cglib              javassist          JDK proxy
implement interface with stub methods      153.800 (0.394)    804.000 (1.899)    706.878 (4.929)    973.650 (1.624)
invoke a stub method                       0.001 (0.000)      0.002 (0.000)      0.009 (0.000)      0.005 (0.000)
extend class with super method invocation  172.126 (0.533)    1480.525 (2.911)   625.778 (1.954)    -
                                           2290.246 (7.034)
invoke a super method                      0.002 (0.000)      0.019 (0.000)      0.027 (0.000)      -
                                           0.003 (0.000)

The first line displays the time a library requires for implementing an interface with 18 different methods as no-operation stubs. Based on these runtime classes, the second line displays the time it takes to invoke the stub on an instance of the generated class.

In this measurement, Byte Buddy and cglib perform best because both libraries allow you to hardcode a fixed return value into the generated class while javassist and the JDK proxies only permit the registration of a suitable callback.
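The difference between the two styles can be illustrated without any code generation library. The following is only a sketch under stated assumptions: `Greeter`, `HardcodedGreeter` and `StubStyles` are hypothetical names, the hand-written class merely stands in for what a generated class with a hardcoded return value roughly amounts to, and the callback variant uses the standard `java.lang.reflect.Proxy` API.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// Hypothetical interface used only for this illustration.
interface Greeter {
    String greet();
}

// Roughly what a generated class with a hardcoded return value amounts to:
// the value is part of the method body, no indirection is involved.
final class HardcodedGreeter implements Greeter {
    @Override
    public String greet() {
        return "stub";
    }
}

final class StubStyles {

    // Callback style: every call on the proxy is routed through an InvocationHandler.
    static Greeter viaJdkProxy() {
        InvocationHandler handler = (proxy, method, arguments) -> {
            if ("greet".equals(method.getName())) {
                return "stub";
            }
            // Methods such as hashCode or toString also arrive here; this sketch ignores them.
            throw new UnsupportedOperationException(method.getName());
        };
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(),
                new Class<?>[] {Greeter.class},
                handler);
    }

    public static void main(String[] args) {
        System.out.println(new HardcodedGreeter().greet());
        System.out.println(viaJdkProxy().greet());
    }
}
```

Both objects return the same value, but the proxy has to pass through a reflective `Method` object and a handler call on every invocation, which is the kind of indirection the paragraph above refers to.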

This allows us to draw the first vague conclusion that a more specialized implementation of a runtime class’s methods results in better runtime. This might sound more obvious than it actually is, since we could have hoped that the JIT compiler had adapted the performance of both approaches for us.

What about class inheritance?

The third line of the above table displays the time required for extending a class with 18 methods. This time, instead of creating method stubs, any overridden method should call its super implementation.

You might have noticed that Byte Buddy is listed with two measurements, where the second number is significantly larger. The two numbers represent different approaches to implementing a super method invocation.

As mentioned last week, the JVM only permits a super method invocation from within the very same instance. Thus, the easiest way of invoking a super method is to simply hardcode the invocation into the intercepted method which is done for the first measurement.

This approach is, however, not too flexible as it does not, for example, permit a conditional invocation. To overcome this limitation, Byte Buddy allows the creation of something resembling an inner class. We saw this approach in action in the previous part of this article where we generated a proxy class that implemented the Callable interface.

For any invocation, an instance of this inner class is then injected into an interceptor method by using a corresponding annotation on one of its parameters. As you can see, the creation of such additional classes minimizes the runtime of invoking an exposed super method compared to other libraries which all follow a similar strategy.

At the same time, the creation of a dedicated class per method introduces an overhead for the creation of the actual subclass. Both cglib and javassist choose a middle ground for addressing this issue, which reduces the cost of their class creation at the price of an additional runtime overhead for each execution of a super method.
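The two ways of invoking a super method described above can be sketched in plain Java. This is not Byte Buddy's generated code; it is a hand-written analogy under stated assumptions. `Service`, `HardcodedSuperCall`, `GuardingInterceptor` and `DelegatingSuperCall` are hypothetical names, and the lambda merely stands in for the auxiliary class that a code generator would emit to expose the super method as a `Callable`.

```java
import java.util.concurrent.Callable;

// Hypothetical user class that a framework wants to subclass at runtime.
class Service {
    String load(String key) {
        return "value-of-" + key;
    }
}

// Variant 1: the super call is hardcoded into the overriding method.
// Cheap to create and to invoke, but the call cannot be made conditional from outside.
class HardcodedSuperCall extends Service {
    @Override
    String load(String key) {
        return super.load(key);
    }
}

// Hypothetical interceptor that decides whether the super method is invoked at all.
final class GuardingInterceptor {
    static String intercept(String key, Callable<String> superCall) throws Exception {
        if (key.startsWith("secret")) {
            return "denied";
        }
        return superCall.call();
    }
}

// Variant 2: the super call is wrapped in a Callable and handed to the interceptor.
class DelegatingSuperCall extends Service {
    @Override
    String load(String key) {
        try {
            // The lambda plays the role of the additional per-method class.
            return GuardingInterceptor.intercept(key, () -> super.load(key));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}

final class SuperCallDemo {
    public static void main(String[] args) {
        System.out.println(new HardcodedSuperCall().load("a"));
        System.out.println(new DelegatingSuperCall().load("a"));
        System.out.println(new DelegatingSuperCall().load("secret-x"));
    }
}
```

The second variant buys flexibility with an extra object and an extra class per intercepted method, which matches the trade-off between class creation time and invocation time visible in the table.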

Final words: it’s all about increasing performance

There is much more we could discuss from here but at the same time, this is a great time to complete this introductory digest on code generation. I hope that this overview helped you appreciate that code generation is nothing elitist; it’s not just reserved for the big frameworks. With a helping library, code generation is a handy tool for implementing cross-cutting concerns along with beautiful APIs and without requiring explicit dependencies, even for small code bases.

Now that Java 8 is getting off the starting blocks, Java’s new Metaspace no longer imposes the same tight boundaries on the number of classes a Java application can load in its default configuration. With all this, there is nothing holding you back, so fire ahead. If you have any more questions, just drop a comment or hit me up on Twitter where I reside at @rafaelcodes.

And make sure to check out my after-hours love child Byte Buddy, which you can find on GitHub!

If you want to know more about Java Bytecode, Mastering Java Bytecode at the Core of the JVM report by JRebel product lead Anton Arhipov is a great resource to get started!


  • Wojciech Kudla

    Rafael, thanks for the numbers but you should also share the benchmark code for all compared libraries. Testing methodology is at least as important as the results.

  • Aleksey Shipilёv

    Rafael, 0.001..0.027 ns/op for calling methods sounds like bullshit to my ears — this is a red flag that should encourage you to revisit what is actually being measured. See e.g. how to approach the problem in a less wrong way.

    This is yet another example that raw numbers do not really matter: it is just data, saying nothing. It matters why those numbers are that particular way, i.e. the analysis that processes data into insights. You allude to some insights after presenting the benchmark data, but I would expect readers would appreciate more thorough step-by-step re-tracking of how you have reached that conclusion.