
Java Profiling from the Ground Up by Nitsan Wakart

We continue our series of Virtual JUG session recaps with “Java Profiling from the Ground Up” by Nitsan Wakart. Nitsan is the lead performance engineer at Azul Systems, working on the Zing Java Virtual Machine. So if you have any questions about C4 pauseless garbage collection or any other JVM internals, Nitsan is a great person to chat to. He’s also an avid open source fan and the main contributor to JCTools, a project providing implementations of concurrent and lockless collections, queues and so on. Be sure to check out Nitsan’s blog and follow him on Twitter.

Without further ado, let’s dig into what we gleaned from this session about Java profiling. As always, you can find the full recording of the session below to watch in your own time. In this post I’ll also explain what I learnt from the session, emphasizing my highlights.

Java Profiling from the Ground Up

Performance work is less about squeezing the last bits of speed out of your hardware and more about understanding how your code works, so that you can make it run faster. It is about understanding what is happening in your system so you can meaningfully impact its behavior and speed. You can make code faster by coincidence, but that often doesn’t count: you’re nowhere near understanding what actually caused the boost, and you will inevitably lose the gains the next time you make changes.

So, to avoid embarrassment, measure before you make claims about things you do not fully understand. Nitsan referred to a tweet in his session that said it very well.

Try to understand everything, from the hardware to the operating system. Understand all the ins and outs of the JVM you’re running, and performance work will be easy. But, and there’s always a but, it’s really hard to know everything. Believe me, I only know 99.5% of everything. Since it’s not possible to know everything in one go, Nitsan explained exactly where profilers fit into the performance pipeline.

Profiling shows you inefficient code, which, according to our recent Java Performance Survey results report, is believed to haunt at least half of all Java projects. To profile your code you need, duh, a profiler! There are two main approaches to profiling: tracing and sampling. Tracing instruments the code to time how long the machine spends executing each method. Sampling takes a different, less straightforward approach, so here’s a short introduction to how many sampling profilers work.
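To make the tracing idea concrete, here is a minimal hand-written sketch of what an instrumenting profiler effectively does around each method call. It reads a clock before and after the call and accumulates the difference per method. The `Tracer` class and its `trace` method are hypothetical names used only for illustration. Real tracing profilers inject equivalent timing code automatically, for example through bytecode instrumentation. The timing calls themselves also add overhead, and that overhead can distort measurements of very short methods.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical, minimal tracing helper (not thread-safe): times each call it wraps.
final class Tracer {
    private final Map<String, Long> nanosByName = new HashMap<>();
    private final Map<String, Integer> callsByName = new HashMap<>();

    // Runs `body`, timing it with System.nanoTime() and recording the elapsed time under `name`.
    <T> T trace(String name, Supplier<T> body) {
        long start = System.nanoTime();
        try {
            return body.get();
        } finally {
            long elapsed = System.nanoTime() - start;
            nanosByName.merge(name, elapsed, Long::sum);
            callsByName.merge(name, 1, Integer::sum);
        }
    }

    // Total time recorded under `name`, or 0 if it was never traced.
    long totalNanos(String name) {
        return nanosByName.getOrDefault(name, 0L);
    }

    // Number of calls recorded under `name`, or 0 if it was never traced.
    int calls(String name) {
        return callsByName.getOrDefault(name, 0);
    }
}
```

As an example, `tracer.trace("parseOrder", () -> parseOrder(input))` records one call and its elapsed time under `parseOrder`. Here `parseOrder` is a placeholder for one of your own methods.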

Oh, by the way, if you haven’t already, check out XRebel. It’s a lightweight Java profiler that tells you at development time when you have too many SQL queries, slow SQL queries, inefficient code and much more. You can get a free trial and a free t-shirt just by giving it a go.


Sampling profilers

A sampling profiler takes a snapshot of the running threads’ stacks at a random moment in time (kind of). A single snapshot is essentially worthless. But when the profiler collects snapshots regularly at an interval, it can infer where the code spends its time. So if we randomly query the system and half of the samples show a thread executing the same method, we assume that the method accounts for about half of the execution time of the process.
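The extrapolation step itself is simple arithmetic. Each method’s estimated time is its share of the samples multiplied by the total elapsed time. The sketch below uses hypothetical class and method names. It deliberately encodes the assumption that samples are spread uniformly over time, which is exactly the assumption questioned next.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: turns per-method sample counts into estimated time per method.
final class SampleExtrapolation {
    // Assumes every sample is equally likely to land at any moment, so a method's share
    // of the samples is taken as its share of the elapsed wall-clock time.
    static Map<String, Double> estimateMillis(Map<String, Integer> sampleCounts, double elapsedMillis) {
        long total = 0;
        for (int count : sampleCounts.values()) {
            total += count;
        }
        Map<String, Double> estimates = new LinkedHashMap<>();
        if (total == 0) {
            return estimates; // no samples, nothing to extrapolate from
        }
        for (Map.Entry<String, Integer> entry : sampleCounts.entrySet()) {
            estimates.put(entry.getKey(), elapsedMillis * entry.getValue() / total);
        }
        return estimates;
    }
}
```

For example, suppose three methods collect 50, 30 and 20 samples over a 2,000 ms run. The estimates then come out as 1,000 ms, 600 ms and 400 ms.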

However, the assumption that samples are taken at random moments is not true. As a result, the samples give only a partly realistic picture of where the time is spent. Why? Because of safepoints.


A safepoint is a moment in time when a thread’s data, its internal state and its representation in the JVM are, well, safe for observation by other threads in the JVM. If you’re wondering where a thread can reach a safepoint, here is a nice summary:

  • Between every 2 bytecodes (interpreter mode)
  • Backedge of non-’counted’ loops
  • Method exit
  • JNI call exit
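To illustrate the “backedge of non-counted loops” entry, here are two loops that compute the same sum. The first is a typical “counted” loop, with an int index, a constant stride and a loop-invariant bound. HotSpot’s C2 compiler has historically been allowed to omit the safepoint poll on the backedge of such loops. Whether it does so depends on the JVM version and on flags such as -XX:+UseCountedLoopSafepoints. The second loop walks a linked list, so its trip count isn’t known up front. It therefore stays non-counted and keeps a poll on its backedge. The `Node` class is a hypothetical helper for this example.

```java
// Two loops computing the same sum; only their shape, as the JIT sees it, differs.
final class LoopShapes {
    // Counted loop: int induction variable, constant stride, loop-invariant bound.
    // Depending on JVM version and flags, the JIT may omit the safepoint poll on its backedge.
    static long sumArray(int[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
        }
        return sum;
    }

    // Hypothetical singly linked list node used only for this example.
    static final class Node {
        final int value;
        final Node next;

        Node(int value, Node next) {
            this.value = value;
            this.next = next;
        }
    }

    // Non-counted loop: the trip count depends on the data (pointer chasing),
    // so the backedge keeps a safepoint poll.
    static long sumList(Node head) {
        long sum = 0;
        for (Node node = head; node != null; node = node.next) {
            sum += node.value;
        }
        return sum;
    }
}
```

Suppose the poll is omitted in the first loop and the array is very large. A thread running that loop may then not reach a safepoint until the loop finishes. This is one way a safepoint-biased profiler can end up attributing time to the wrong place.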

So, in a nutshell, the typical sampling approach works like this: stop all the threads (at their safepoints), collect data, then resume the threads. Since threads can only be stopped at safepoints, the profiler doesn’t get a fair, uniform distribution of samples, so data extrapolated from these results may be skewed. How much this bias matters depends on the scale you’re working at. If, say, you’re profiling an operation that takes nanoseconds, it is well worth noting.
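The stop, collect, resume cycle can be approximated from plain Java with `Thread.getAllStackTraces()`. In HotSpot, collecting all stack traces this way has typically been done at a safepoint, so the sketch below inherits exactly the bias described above. The class and method names are hypothetical. Counting only the top frame of RUNNABLE threads is a simplifying assumption, not how any particular profiler works.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, naive sampling profiler built on Thread.getAllStackTraces().
final class NaiveSampler {
    // Takes `samples` snapshots, `intervalMillis` apart, and counts the top frame
    // (class.method) of every RUNNABLE thread other than the sampling thread itself.
    static Map<String, Integer> sampleTopFrames(int samples, long intervalMillis) {
        Map<String, Integer> counts = new HashMap<>();
        Thread self = Thread.currentThread();
        for (int i = 0; i < samples; i++) {
            // One snapshot of all live threads; HotSpot typically collects it at a safepoint.
            Map<Thread, StackTraceElement[]> snapshot = Thread.getAllStackTraces();
            for (Map.Entry<Thread, StackTraceElement[]> entry : snapshot.entrySet()) {
                Thread thread = entry.getKey();
                StackTraceElement[] stack = entry.getValue();
                // Note: getState() is read separately from the snapshot, so it is only approximate.
                if (thread == self || stack.length == 0 || thread.getState() != Thread.State.RUNNABLE) {
                    continue;
                }
                StackTraceElement top = stack[0];
                counts.merge(top.getClassName() + "." + top.getMethodName(), 1, Integer::sum);
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop sampling
                break;
            }
        }
        return counts;
    }
}
```

Turning these counts into time estimates is the extrapolation step described earlier. The safepoint skew carries straight through into those estimates.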

Honest profiler

Honest profiler is a sampling CPU profiler that tries to mitigate the bias of the typical sampling approach we talked about previously. We briefly mentioned Honest profiler in The Developer’s Guide to Understanding Performance Problems report, where we looked at various profilers and APM solutions for Java.

The main difference between Honest profiler and the common sampling approaches is how it obtains stack traces. It invokes AsyncGetCallTrace, which gets information about a thread’s state without requiring a safepoint. In a nutshell, the AsyncGetCallTrace approach works like this:

  • Register a handler for a signal, say SIGPROF.
  • Have that signal sent to the JVM process at regular intervals. Each signal interrupts a running thread and executes your handler code on it.
  • The handler calls AsyncGetCallTrace and, voilà, you get the data about that thread’s state.

The catch is that AsyncGetCallTrace gives you slightly different data than a typical Java stack trace: raw frames identified by a method ID and a bytecode index. The profiler has to process this data and map it to lines in the source code before it is useful.

And of course, if you’re not careful during that processing, you can easily add bias to the results, which is exactly what you’re trying to avoid! Inside the handler code you should not block, you should not allocate memory, and you should not even use ThreadLocals. Any of these might break the fragile equilibrium of the JVM state and bias your profiling results.
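Honest profiler’s signal handler is native code, but the design rule above can be sketched in Java:

  • Preallocate all storage up front.
  • Never block.
  • Drop a sample rather than wait when there’s no room.
  • Leave the expensive processing to a separate consumer thread.

The `SpscRingBuffer` below is a hypothetical, minimal single-producer/single-consumer ring buffer of `long` samples. It is not Honest profiler’s actual code. JCTools offers production-grade queues of this kind, such as `SpscArrayQueue`.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical single-producer/single-consumer ring buffer of long samples.
// The producer never blocks and never allocates: when the buffer is full, the sample is dropped.
final class SpscRingBuffer {
    // Returned by poll() when there is nothing to read, so it cannot itself be stored as a sample.
    static final long EMPTY = Long.MIN_VALUE;

    private final long[] slots; // preallocated once, up front
    private final int mask;
    private final AtomicLong producerIndex = new AtomicLong();
    private final AtomicLong consumerIndex = new AtomicLong();

    SpscRingBuffer(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a positive power of two");
        }
        slots = new long[capacity];
        mask = capacity - 1;
    }

    // Producer side (the "handler"): constant time, no locks, no allocation.
    boolean offer(long sample) {
        long p = producerIndex.get();
        if (p - consumerIndex.get() == slots.length) {
            return false; // full: drop the sample instead of waiting
        }
        slots[(int) (p & mask)] = sample;
        producerIndex.set(p + 1); // volatile write publishes the slot to the consumer
        return true;
    }

    // Consumer side (the processing thread): returns EMPTY when there is nothing to read.
    long poll() {
        long c = consumerIndex.get();
        if (c == producerIndex.get()) {
            return EMPTY;
        }
        long sample = slots[(int) (c & mask)];
        consumerIndex.set(c + 1); // frees the slot for the producer
        return sample;
    }
}
```

Dropping samples when the buffer is full loses some data. That is the price of keeping the producer side from ever waiting.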

The trade-off for getting more precise information without safepoint bias is that the profiler has to do this mapping time and time again. It translates the program counter information into a bytecode location, and then into a line of code. As developers, we want to know which lines of code are to blame for poor performance, rather than which CPU instructions. Also, only running threads are profiled, so waiting threads won’t appear on the map. And you only get the Java stack, not the native invocations.
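The last step of that mapping, from a bytecode index to a source line, is a lookup in the method’s line number table. The class file’s LineNumberTable attribute lists pairs of a starting bytecode index and a line number. The same data is exposed to native agents through JVMTI’s GetLineNumberTable. A bytecode index belongs to the entry with the largest start at or below it. The sketch below shows that lookup in Java. Its names are hypothetical, and it assumes the entries are sorted by start index with no duplicates.

```java
import java.util.Arrays;

// Hypothetical lookup over a method's line number table, modeled on the class file's
// LineNumberTable attribute: entry i says "bytecode from startBcis[i] onward is on lines[i]".
final class LineLookup {
    private final int[] startBcis; // must be strictly increasing
    private final int[] lines;

    LineLookup(int[] startBcis, int[] lines) {
        if (startBcis.length != lines.length) {
            throw new IllegalArgumentException("startBcis and lines must have the same length");
        }
        for (int i = 1; i < startBcis.length; i++) {
            if (startBcis[i] <= startBcis[i - 1]) {
                throw new IllegalArgumentException("startBcis must be strictly increasing");
            }
        }
        this.startBcis = startBcis.clone();
        this.lines = lines.clone();
    }

    // Returns the source line for a bytecode index: the entry with the largest start <= bci,
    // or -1 if bci precedes every entry (for example an unknown or negative index).
    int lineFor(int bci) {
        int idx = Arrays.binarySearch(startBcis, bci);
        if (idx >= 0) {
            return lines[idx]; // exact match on an entry's start
        }
        int insertionPoint = -idx - 1; // index of the first entry whose start is > bci
        return insertionPoint == 0 ? -1 : lines[insertionPoint - 1];
    }
}
```

Consider a table with the entries (0 → line 10), (4 → 11), (12 → 13) and (20 → 15). With it, bytecode index 7 maps to line 11 and index 25 maps to line 15.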

Is this good enough? To answer this question, Nitsan showed a great demo that explores the peculiarities of sampling profilers and shows how Honest profiler performs under the same conditions. For more insight than the default answer of “it depends”, watch the video of the session. If you have any questions, ask them in the comments below or directly to Nitsan.


After the session, I had a chance to ask Nitsan a couple of questions in our regular RebelLabs interview with Virtual JUG speakers. We talked about the performance tools he uses, the components of performance work that people often neglect, and where to look for the best resources about software performance.

If you have any more questions for Nitsan, ping him on Twitter at @nitsanw and ask him directly! Feel free to tell us what you think too: @ZeroTurnaround.


