This week’s Virtual JUG session, Java Memory Model Pragmatics, was an incredible two-hour treat by Aleksey Shipilëv, the world-famous performance guru and concurrency bug hunter working in the Java performance team at Oracle. Years of experience with the low-level details of the JVM and concurrency give him an excellent background for speaking about concurrency, optimisations and memory models.
In this post, I’ll try to recap the most important points that I learnt from the session and will try my best to convince you to take some time out of your busy schedule to watch it. Seriously, this was the most anticipated Virtual JUG session to date. More people RSVP’d than for the January session with the father of Java, James Gosling himself.
Here is the full recording of the session, followed by insightful comments and an interview with Aleksey about the performance issues he has seen in the wild, how a programmer should think about performance and why there are so many broken benchmarks in the world.
Even better, after the interview Aleksey took some time to walk us through a series of examples with fascinating concurrent code that prove how important it is to understand concurrency in general and the Java Memory Model.
So, let’s ride quickly through what we learned from this session.
Why do we need a Memory Model?
The most complicated thing in every program is to reason about what exactly is happening in the background and how exactly it will run on the hardware. For single-threaded programs this is a lot easier, because our brains can naturally grasp sequences of actions and predict how they will behave. When we leave the realm of these simpler sequences and add concurrency, the challenge gets much harder.
Every memory model tries to formalise and answer a single, very important question:
What values can a given read instruction obtain from the memory?
You’d think it’s simple enough, but given that the underlying platform can crunch and rearrange the original program during compilation, it gets complicated really quickly.
The Java memory model is formalised in detail in the famous chapter 17.4 of the Java Language Specification. You can safely bookmark this link, as you’ll be revisiting and rereading it many times.
During the session Aleksey led us through a series of requirements that a programmer would expect from a programming language and showed how these can be formalised and explained.
It would be really hard to reason about which values can be seen at a given point of the program execution without atomicity. Atomicity means that a value is written to memory, and read back from it, in a single indivisible step. Without this, you’d never know if a value that you read had actually been fully written, or if you got unlucky and caught a glimpse of an inconsistent state of the variable.
Luckily, in Java the volatile keyword gives us atomicity guarantees on any platform, even in a multithreaded environment, and even for 64-bit long and double values, which plain field accesses are otherwise allowed to split into two separate 32-bit operations.
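To make this concrete, here is a minimal sketch (my own, not from the session) of the atomicity guarantee for 64-bit values. A writer thread alternates between two long values whose 32-bit halves differ, while a reader checks that it only ever sees one of the two fully written values. The volatile modifier is what rules out a torn read:

```java
// A minimal sketch of long atomicity. Without volatile, the JLS
// permits a long write to be split into two 32-bit halves, so a
// reader could observe a "mixed" value. Volatile makes the write atomic.
public class AtomicLongWrite {
    // volatile guarantees atomic reads and writes of this long on every platform
    static volatile long value = 0L;

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 1_000_000; i++) {
                // alternate between two values whose 32-bit halves differ
                value = (i % 2 == 0) ? 0L : -1L;   // -1L is all ones
            }
        });
        Thread reader = new Thread(() -> {
            for (int i = 0; i < 1_000_000; i++) {
                long v = value;
                // with volatile we can only observe a fully written value
                if (v != 0L && v != -1L) {
                    throw new AssertionError("Torn read: " + Long.toHexString(v));
                }
            }
        });
        writer.start(); reader.start();
        writer.join(); reader.join();
        System.out.println("No torn reads observed");
    }
}
```

Note that if you drop the volatile modifier, the check may still pass on a 64-bit JVM, where long accesses happen to be atomic anyway; the point is that only volatile makes the guarantee portable.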
The next problem arises when we try to write and then read memory locations that are close to each other. It’s not a secret that there are many layers of memory involved in any modern program, cache-lines and so forth. Can we be sure that if we concurrently modify sequential array elements, then all writes will be visible and none will overwrite each other?
Aleksey introduced us to the word tearing phenomenon, which can happen when you pack data together, like using a long variable to hold a collection of 1-bit flags. Since no platform can read or write a single bit of data at a time, you really have to guard access to such data with external synchronisation.
A good example of when the word tearing can happen is a BitSet, which packs boolean flags into bits of a long value. So here is a quiz question, one of the many that Aleksey had in the session. What can the following program print when it gets the values from the BitSet?
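The original quiz snippet is not reproduced here, but a minimal sketch of the setup (assumed names, not Aleksey’s exact code) might look like this: two threads set different bits of a shared BitSet. Because BitSet.set() is a non-atomic read-modify-write of the underlying long word, one update can silently overwrite the other:

```java
import java.util.BitSet;

// A sketch of the word tearing quiz: two threads set different bits
// of the same BitSet. BitSet.set() reads the shared long word,
// flips one bit, and writes the whole word back, so one thread's
// update can overwrite the other's.
public class BitSetTearing {
    public static void main(String[] args) throws InterruptedException {
        BitSet bits = new BitSet();
        Thread t1 = new Thread(() -> bits.set(0));
        Thread t2 = new Thread(() -> bits.set(1));
        t1.start(); t2.start();
        t1.join(); t2.join();
        // The expected answer is {0, 1} -- but {0} or {1} are also
        // legal outcomes, when one read-modify-write loses the other.
        System.out.println(bits);
    }
}
```

On most runs you will see {0, 1}, which is exactly what makes this class of bug so hard to catch without a dedicated stress-testing harness.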
My previous blogpost about JCStress gives the example code that shows this word tearing phenomenon in action.
Also, after the interview with Aleksey, which you’ll find later in this post, we ran through a series of examples with JCStress, the test harness used at OpenJDK to run concurrency testing. Be sure to check it out, potential bugs are often really unexpected and it will teach you a lot about the underlying complexity of the JVM. And the same BitSet example was discussed in detail.
Now we’re getting really deep into the reasoning about concurrent programs. It’d be amazing if we could easily visualise a step-by-step execution of a multithreaded program, wouldn’t it? Sequentially consistent executions of a program are those which can be explained as if threads were executing one instruction at a time, in an interleaved fashion. Imagine reading code and thinking: this thread does action A, then another one does B and so forth. This makes it quite simple to follow what your program does.
However, imposing sequential consistency requirements onto the platform makes it really hard for the compiler to optimise the program, as even the smallest changes, like reordering two read instructions, can break sequential consistency. So we need to relax the model and consider only some actions in the program to be important, allowing the compiler greater freedom to make everything faster.
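The classic example of such a reordering (a sketch of the standard store-load test, not code from the session) shows what sequential consistency would forbid. Under sequential consistency, the outcome (0, 0) below is impossible, because some thread must perform its write first. With plain fields, though, the compiler and CPU are free to reorder the independent write and read in each thread, so (0, 0) is a perfectly legal result under the Java Memory Model:

```java
// The classic store-load reordering litmus test. Under sequential
// consistency (0, 0) cannot be printed; under the Java Memory Model
// with plain (non-volatile) fields, it can.
public class StoreLoadReordering {
    static int x = 0, y = 0;   // plain fields: reordering is allowed

    public static void main(String[] args) throws InterruptedException {
        final int[] r = new int[2];
        Thread t1 = new Thread(() -> { x = 1; r[0] = y; });
        Thread t2 = new Thread(() -> { y = 1; r[1] = x; });
        t1.start(); t2.start();
        t1.join(); t2.join();
        // Any of (0, 1), (1, 0), (1, 1) -- and, surprisingly, (0, 0) --
        // is a legal outcome of this race.
        System.out.println("(" + r[0] + ", " + r[1] + ")");
    }
}
```

Declaring x and y volatile rules out the (0, 0) outcome, which is precisely the kind of targeted guarantee the relaxed model lets you opt into.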
Happens Before relationship
One relaxed version of sequential consistency is the Happens-Before relationship model. The reason to introduce it into the memory model formalism is to precisely declare which points of the program are important to the concurrent code and when the platform has to synchronise what different threads think of the state of the shared memory.
The Happens-Before relationship allows a developer to understand how the actions of different threads can be ordered in time, including which writes precede certain reads, and consequently, where the observed values came from.
The volatile keyword or the use of synchronisation gives us visibility guarantees for written values and declares that the operations on them must be ordered in the execution. Your program will observe the last value written to a field in a given happens-before order, and you can keep your sanity without chasing concurrency bugs for days.
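The canonical pattern here is the "data plus ready flag" idiom (a minimal sketch of my own, with assumed field names). The write to the volatile flag happens-before the read that observes it, so once the reader sees the flag set, it is also guaranteed to see the plain write that preceded it:

```java
// A minimal sketch of a happens-before edge created by a volatile field.
// The volatile write to `ready` happens-before the volatile read that
// observes it, so the reader is guaranteed to see data == 42.
public class VolatileFlag {
    static int data = 0;                  // plain field
    static volatile boolean ready = false; // the synchronisation point

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> {
            data = 42;      // 1: ordinary write
            ready = true;   // 2: volatile write "publishes" it
        });
        Thread reader = new Thread(() -> {
            while (!ready) { /* spin until the volatile write becomes visible */ }
            // happens-before guarantees this read observes data == 42
            if (data != 42) throw new AssertionError("Saw stale data: " + data);
        });
        writer.start(); reader.start();
        writer.join(); reader.join();
        System.out.println("Reader saw data = 42");
    }
}
```

Remove the volatile modifier and the guarantee evaporates: the reader could spin forever, or see the flag set while data is still stale.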
One good takeaway from the session is that if you make all your fields volatile and correctly synchronise all accesses to shared data, your program will become sequentially consistent. It does bear some performance overhead, but as smart people put it: first make it work, then make it fast.
The final takeaway from this session that I want to turn your attention to is the safe publication of objects. Yeah, working with objects that are already created and initialised is governed by volatile and synchronisation, but what about object creation time? There’s another Java language keyword that can help us here: final.
What does declaring a field final give us? It introduces a freeze action at the end of the constructor, which makes the writes to final fields visible to all threads that can see the constructed object, as long as the object reference does not escape during construction. Without it, we could observe semi-initialised objects where some fields are already written and some are not.
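A short sketch (my own, with assumed names) of why the freeze matters. The Holder below is published through a plain static field, with no synchronisation at all. A racy reader may see the reference as null, but if it sees a non-null reference, the final field guarantees it also sees the field's value:

```java
// A sketch of final field semantics under unsafe publication.
// The freeze at the end of the constructor guarantees that any
// thread which sees a (non-escaped) Holder reference also sees x == 42.
// Were x a plain field, a racy reader could legally observe x == 0.
public class FinalFreeze {
    static class Holder {
        final int x;           // frozen at the end of the constructor
        Holder(int x) { this.x = x; }
    }

    static Holder holder;      // published without any synchronisation

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> holder = new Holder(42));
        Thread reader = new Thread(() -> {
            Holder h = holder;             // racy read: may be null
            if (h != null && h.x != 42) {  // but if non-null, x must be 42
                throw new AssertionError("Saw a half-initialised object");
            }
        });
        writer.start(); reader.start();
        writer.join(); reader.join();
        System.out.println("Done");
    }
}
```

On x86 hardware you won’t catch the broken non-final variant in practice, which matches Aleksey’s point below: the failures show up on weaker memory models like ARM.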
Can you imagine something like that happening to a SecurityManager?
You’d be surprised, but observing a half-initialised object can really happen in real life. After the session and the interview, Aleksey showed us JCStress tests that prove it can really happen, albeit on ARM hardware.
If you want to look deeper into the problem, here’s the presentation on Final Fields Semantics by Vladimir Sitnikov, which Aleksey mentioned several times during the vJUG talk.
Now at the end of the session Aleksey recommended some books and articles that you might like to read to understand things better.
Another source of knowledge you might want to try is the concurrency-interest mailing list. It has a great mix of high level questions and deep technical discussions about concurrency primitives.
I can’t believe it’s over, I want more
First of all, no single blogpost can do justice to the real experience of listening to this session, so be sure to check it out. Sometimes, it is hard to understand on the fly, but Aleksey is a world-class speaker and he did a great job at explaining all the details with code examples. Oh and he was simultaneously answering questions in the IRC chat (sic!). That is actually something I’ve never seen before and that is purely amazing. You can grab the IRC chat logs and see for yourself.
A transcript of the same session given some other time is available at Aleksey’s website: jmm-pragmatics.
Also after the session we caught up with Aleksey and asked him some questions about his work, performance, benchmarks, philosophy and much more. To find out which tools are essential to a performance engineer, what Aleksey thinks about JVM languages, and how to get into performance engineering, be sure to tune in to the interview below. Spoiler alert: there’s no easy way to become a performance guru, fortunately!
The code for the JCStress tests that you’ve seen in the video above can be found in the JCStress repository, here’s a link to the exact commit.
As always, I’d feel terrible if you missed any updates or announcements about upcoming interviews and blogposts, so here’s a convenient form to subscribe to episodic emails from me.