The latest Virtual JUG session was all about performance: specifically, how to write Java code that leverages low-level binary representations to run faster. It was presented by the renowned Java performance expert and consultant John Davies.
In his own words, John is an entrepreneur, father, CTO/co-founder of C24 (a fast data company that helps enterprises rapidly adopt messaging standards and optimize in-memory computing solutions), a photographer, über-geek, traveller and a frequent conference speaker. His “old-school” software engineering background often helps resolve the most complicated performance issues, and he did an amazing job sharing this knowledge with us.
In this post I’ll try to share the highlights of the session and describe what I learned listening to John present. Here’s the recording of the session if you want to watch it in full.
Getting C/C++ performance in Java
John started with a friendly reminder that the little silicon friends we work with don’t see and work with data the same way we humans do. Computers love binary representations for everything. Indeed, what we see as the decimal 100 is also representable in hex as 0x64 or in binary as 0b1100100.
Data sits in chunks of N bytes. Typically an int is 4 bytes, a long is 8 bytes and so forth. This is the elementary school of Java and it shouldn’t be surprising to most developers out there, right? But the implications of that are far-reaching and are sometimes surprising. John did a brilliant job at explaining how we can exploit the fact that computers do things in binary and with this knowledge, boost the performance of certain operations.
To do so, we naturally have to manipulate the bits in the binary representations of data. Java offers a handful of bitwise operators:
& – binary and
| – binary or
^ – binary xor
~ – binary not
<<, >>, >>> – binary shifts: left, signed right and unsigned right
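As a quick refresher (my own sketch, not code from the session), here is what those operators do to the decimal value 100:

```java
// A quick tour of Java's bitwise operators applied to 100 (0b1100100).
class BitwiseDemo {
    public static void main(String[] args) {
        int x = 0b1100100;                 // decimal 100
        System.out.println(x & 0xF);       // and: keep the low 4 bits   -> 4
        System.out.println(x | 0b1);       // or: set the lowest bit     -> 101
        System.out.println(x ^ 0b1100000); // xor: flip bits 5 and 6     -> 4
        System.out.println(~x);            // not: flip every bit        -> -101
        System.out.println(x << 1);        // shift left: multiply by 2  -> 200
        System.out.println(-x >> 2);       // signed right shift         -> -25
        System.out.println(-x >>> 2);      // unsigned right shift       -> 1073741799
    }
}
```

Note the difference between the two right shifts on a negative number: `>>` keeps the sign bit, while `>>>` shifts zeroes in from the left.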
How many bytes does it take?
Now, the binary data format is really efficient. To calculate the number of bits it takes to store some data, take the binary logarithm of the number of possible values. For instance, if we want to store a particular date from a century, there can be 365 * 100 = 36500 different values, and log2(36500) = 15.15…, so 16 bits are enough to represent the exact day in a century.
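That calculation can be done without floating-point math at all; here is a small sketch of my own (not from the session) using `Long.numberOfLeadingZeros`:

```java
// Minimum number of bits needed to distinguish `values` distinct values,
// i.e. ceil(log2(values)) for values >= 2 -- an illustrative sketch.
class BitsNeeded {
    static int bitsFor(long values) {
        // Position of the highest set bit of (values - 1), plus one
        return 64 - Long.numberOfLeadingZeros(values - 1);
    }

    public static void main(String[] args) {
        // 36500 possible dates in a century fit into 16 bits
        System.out.println(bitsFor(36500)); // prints 16
    }
}
```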
It’s the same story with Strings and any other data. For instance, to store the string “abc” you need 3 bytes (or 4 if you terminate the string with a special value, like the null character in C).
In Java a representation of “abc” takes 48 bytes -- that’s forty-eight bytes! That’s quite an overhead if we intend to store and operate on a lot of data. It’s the same with other objects: a Java Date takes 48 bytes, but to actually store a year, a month and a day we only need around 8.
This is important because filling the memory with all this wrapping data costs additional computational time. It also takes time to manipulate the data, since we need to move all these bytes around. You get the idea, don’t you? Freeing memory takes additional time too; it’s automatic, and most Java developers pretend they have nothing to do with garbage collection, but the impact is there and is inevitable unless we change the pattern of how we program in Java.
In the session John showed an example of real data and a reasonable Java object that encapsulates it. He compared the memory required to store the raw bytes of a sample CSV file versus its Java object representation. You may be laughing, but sometimes Java’s representation is far hungrier than XML, memory-wise.
Let’s stop for a second and just let this sink in. Suppose you have millions of objects that look fairly reasonable, just like John’s example. It so happens that Java sometimes struggles with these scenarios, as it wasn’t designed to excel at these use cases. If you crunch millions or billions of records, you have to consider what happens to your application’s memory to achieve optimal performance. The two main performance-related factors are stressing the garbage collector and clogging the network with data, particularly when you run in a distributed system.
Java is quite verbose at the language-syntax level, and it’s quite unlikely to use the full capacity of your machine. If the internal representation of a piece of data is 4 times larger than the actual data itself, you’ll need 4 times more RAM, network capacity and hard drive space. That’s 4 times the ouch!
All of this wrapping exists at the JVM level (object overhead and JDK classes themselves), so all JVM languages will suffer from this same problem.
Don’t get me wrong, this isn’t a Java bashing session. John stated many times that the average developer is unlikely to notice the impact of these issues, but the companies John has worked with as a consultant (large financial institutions, for example) will certainly notice them. Let’s now see what solutions are available to us.
Solving the memory overhead problem
Naturally, the solution to the inefficient out-of-the-box object layout is not to compress the data. Compression is fine for making data smaller, but it inevitably adds computing overhead to every operation: we’d store the data more efficiently, but querying it would become the problem.
Compaction solves the issue much more efficiently, according to John. Since we know how we can store the data to minimize the number of bits we use when passing data around, we can implement a storage mechanism ourselves. The best part is, we don’t even need to break the interfaces and interoperability with other libraries and frameworks.
So the general approach would be to create tabular data, say an array of longs or a bitset, save the offsets to the parts of data written in the bitset and read or write the bytes directly into the right place using the binary operations mentioned previously.
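A minimal sketch of that idea (my own illustration, assuming no value straddles a 64-bit word boundary) could look like this:

```java
// Sketch: a long[] used as a flat bit store. Values are written at
// arbitrary bit offsets using shifts and masks -- not code from the session.
class BitStore {
    private final long[] words;

    BitStore(int bits) {
        words = new long[(bits + 63) / 64];
    }

    // Write the `width` low bits of `value` at bit `offset` (width < 64;
    // the value must fit within a single 64-bit word at that offset).
    void write(int offset, int width, long value) {
        long mask = (1L << width) - 1;
        int word = offset / 64, shift = offset % 64;
        words[word] = (words[word] & ~(mask << shift)) | ((value & mask) << shift);
    }

    // Read `width` bits back from bit `offset`.
    long read(int offset, int width) {
        long mask = (1L << width) - 1;
        return (words[offset / 64] >>> (offset % 64)) & mask;
    }
}
```

For example, the 16-bit day-of-century from earlier could be written with `store.write(16, 16, dayOfCentury)` and read back with `store.read(16, 16)`, with neighbouring fields packed tightly around it instead of each living in its own object.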
Imagine we have a component that works on dates. Compare these two approaches in the image below:
On the right we see the same date represented in the binary format. This is an implementation detail, and it’s effectively hidden behind the method signatures. You can freely implement a more efficient mechanism in any “hot” part of your system without a major refactoring.
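To make the point concrete, here is a hypothetical date type of my own (the class name and layout are assumptions, not John’s code) that exposes ordinary getters while keeping its state in a single packed int:

```java
// Hypothetical CompactDate: callers see the accessor API of a normal
// year/month/day object, but the state is one int (year:16 | month:8 | day:8).
final class CompactDate {
    private final int packed;

    CompactDate(int year, int month, int day) {
        this.packed = (year << 16) | (month << 8) | day;
    }

    int getYear()  { return packed >>> 16; }
    int getMonth() { return (packed >>> 8) & 0xFF; }
    int getDay()   { return packed & 0xFF; }
}
```

Whether the backing store is three int fields or four packed bytes is invisible to callers, which is exactly what lets you swap the binary representation into a hot path without touching the code around it.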
At this point, John went into a live demo: he showed the sample code for both the object version of the data and the binary version. And then came the most exciting part of any performance evaluation session: benchmarking!
For the sake of the session and brevity, John timed a couple of thousand runs of the application and showed the time differences. You should watch the session in its entirety to learn how big an impact binary encoding can have.
A word of caution: if you want to run any sort of benchmark yourself, use the proper tools. The Java Microbenchmark Harness (JMH) is the de facto benchmarking library that makes writing robust and meaningful benchmarks easier. It’s what Oracle performance engineers use when they benchmark parts of the JDK itself. JMH runs the code in a very clever way, avoiding common pitfalls a typical hand-written benchmark will fall into.
John also discussed Hadoop performance, compared it to the Spark engine for data processing, and showed the performance results they’ve seen during their consultancy jobs. We had a great deal of really interesting questions this time too. Don’t miss it!
John made the slides available even before the session itself, so if you want to have them at hand while watching, here you go: http://hubs.ly/H01rXwp0.
After the session, I had a chance to sit with John and chat with him on the @ZeroTurnaround interview.
We discussed Java performance, where and how to learn about it, and whether performance-tuning a single-threaded program is that different from scaling a multi-threaded one. Also, what tools a developer should master to call themselves a performance engineer without lying too much. Check it out!