One of the most fascinating additions to Java 9 is JVMCI, the Java-Level JVM Compiler Interface: a Java-based interface that allows a dynamic compiler written in Java to be plugged into the JVM. One of the main motivations for including it in Java 9 was project Graal, a state-of-the-art dynamic compiler written in Java.
In this post we look at why Graal is such a fascinating project, its advantages, the general code-optimisation ideas behind it, some performance comparisons, and why you would even bother tinkering with a new compiler.
Like many others, we were inspired by the vJUG session on Graal by Chris Seaton. It looks like a great tool and technology, so we decided to play with it and share our findings with the community.
Introduction to Graal and Truffle
Graal is an open-source JIT compiler, itself written in Java, that does pretty much everything the HotSpot JIT does. It has been pulled out of the JVM as a separate component, and it directly generates machine code for a given platform.
The name comes from the words “Holy Grail”: Graal basically touts itself as the “Holy Grail of compilers”.
Truffle is the language-implementation framework that sits on top of Graal; it interfaces with (talks to on our behalf) Graal.
Graal can be plugged into the latest JDK 9 JVM (GraalVM builds based on JDK 8 are also available for various platforms) using command-line flags (-XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler), effectively allowing you to install your own JIT compiler into the JVM.
The advantages of Graal and Truffle
A compiler is a pretty well-hidden component of any system. You don’t work with it directly from your code. Some developers might not even know how important the compiler actually is. However, knowing some details about its inner workings can make your application development easier.
One of the main benefits of Graal and Truffle is that the pair is largely language agnostic. Truffle, together with the bindings for your chosen language, defines an abstract syntax tree (AST) representation of your program, and Graal works really well with ASTs. This means that you can seamlessly combine multiple languages in a single runtime process without the slow, expensive interactions (think of the language interop required to exchange data or execute code) between them.
As long as the programs in every language reach Graal in AST form, they are all the same source material to it, so all code optimisations apply across all source languages and there is no performance penalty for mixing them.
How does it actually work?
A typical flow would look like this:
Program Code → AST → Truffle → Graal → Machine code
(AST = Abstract Syntax Tree: explicit data structures in memory)
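As a sketch of those “explicit data structures in memory”, here is a toy AST for arithmetic written as plain Java classes. The names are made up for illustration; real Truffle node classes extend Truffle’s own Node class, but the shape of the tree is the same idea:

```java
// A toy AST: each node is an object in memory, and interpreting the
// program means walking the tree and calling execute() recursively.
interface Node {
    long execute();
}

// A leaf node holding a constant value.
final class Literal implements Node {
    private final long value;
    Literal(long value) { this.value = value; }
    public long execute() { return value; }
}

// An interior node: evaluates both children, then adds the results.
final class Add implements Node {
    private final Node left, right;
    Add(Node left, Node right) { this.left = left; this.right = right; }
    public long execute() { return left.execute() + right.execute(); }
}

public class AstDemo {
    public static void main(String[] args) {
        // Represents the program "(2 + 3) + 40" as a tree of objects.
        Node program = new Add(new Add(new Literal(2), new Literal(3)), new Literal(40));
        System.out.println(program.execute()); // prints 45
    }
}
```

A language front-end produces a tree like this, and it is this tree, not bytecode, that Truffle hands to Graal for compilation.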
We all know that the JIT is embedded inside HotSpot. It is old, complicated, written in C++ and assembly, and fairly hard to understand. It is a black box: there is no way to hook or link into the JIT, and all JVM languages have to go through the same route:
Program Code ⇒ AST ⇒ Bytecode ⇒ Machine code (ASM)
(ASM = assembly)
Graal has been written to provide JVM languages with better support. It gives all code direct access to the JVM internals.
It provides more options for languages running on the JVM, and for creating new ones, paving the way for polyglot developers.
The JIT is hidden from Java developers; Graal exposes it to us, helping us write our programs in the language that suits our use case best.
“Different languages do different things; each language has its own forte and does those things best. They each have different features and different approaches to solving problems.”
With Graal, we get the cycle shown below:
Program Code ⇒ JIT ⇒ Machine code
Program Code ⇒ AST ⇒ Machine code (ASM)
(notice that Graal skips the step that creates bytecode, directly generating platform-specific machine code)
Graal essentially moves the control flow from the code straight to the JIT, bypassing the rest of the JVM (HotSpot, in our case).
Performance: Is Graal faster or slower than my current JVM?
The traditional JVM implementation can be slower than GraalVM for polyglot workloads, because of the slow interop (conversions) between languages on the JVM. GraalVM, on the other hand, is faster thanks to inlining and other optimisation techniques applied across the native Truffle implementations of those languages; Truffle helps produce high-performance machine code.
Graal is faster, more efficient, and offers a revolutionary way to support different languages on the JVM. You can introduce more than one language into a project and still run your programs fast: you write in whichever language suits the problem, without sacrificing performance when the compiled code runs on the JVM.
The performance of programs running on Graal is often the same as, if not better than, that of a highly tuned native-language implementation.
Here’s an illustration of the flow from source code (program) to machine code using Truffle and Graal:
Truffle is a language-implementation framework built on top of Graal, written to talk to Graal on the program’s behalf.
Graal/Truffle performance optimisations
Note: the four diagrams in this section are re-used from the paper One VM to Rule Them All by Thomas Würthinger, Christian Wimmer, Andreas Wöß, Lukas Stadler, Gilles Duboscq, Christian Humer, Gregor Richards, Doug Simon and Mario Wolczko.
A number of performance optimisation techniques are used in order to improve the execution speed of a program, as follows:
Node rewriting by profiling the AST
This technique uses information gathered from profiling feedback while the AST is interpreted.
It involves converting dynamically typed nodes into statically typed nodes, which both improves interpreter performance and allows faster, safer conversion to optimised machine code.
Initially, the nodes in the AST are uninitialised; over time each node is rewritten (specialised) to handle the particular data types actually observed during program execution.
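Here is a hand-rolled sketch of the node-rewriting idea in plain Java. The class names are invented for illustration; the real Truffle API does this with its @Specialization DSL and node replacement. An add node starts out speculating that both operands are longs, and rewrites itself to a generic node the first time that speculation fails:

```java
import java.math.BigInteger;

abstract class AddNode {
    abstract Object execute(Object l, Object r);
}

// Speculative fast path: assumes both operands are Longs.
final class AddLongNode extends AddNode {
    Object execute(Object l, Object r) {
        if (l instanceof Long && r instanceof Long) {
            return (Long) l + (Long) r;          // specialised long arithmetic
        }
        throw new IllegalStateException("speculation failed");
    }
}

// Generic fallback: handles any numeric operands, slowly.
final class AddGenericNode extends AddNode {
    Object execute(Object l, Object r) {
        return new BigInteger(l.toString()).add(new BigInteger(r.toString()));
    }
}

// The tree holds a mutable slot so the node can rewrite itself in place.
final class AddSlot {
    AddNode node = new AddLongNode();            // start speculative

    Object execute(Object l, Object r) {
        try {
            return node.execute(l, r);
        } catch (IllegalStateException speculationFailed) {
            node = new AddGenericNode();         // rewrite the node in the tree
            return node.execute(l, r);
        }
    }
}

public class NodeRewritingDemo {
    public static void main(String[] args) {
        AddSlot slot = new AddSlot();
        System.out.println(slot.execute(2L, 3L));  // fast path: 5
        System.out.println(slot.execute("2", 3L)); // rewrites to generic: 5
    }
}
```

While the speculative node stays in place, the compiler can emit a plain machine-level add for it; the rewrite is what keeps that speculation safe.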
A second technique involves folding constants and inlining methods and types.
Methods are inlined by a copy-and-paste of the node: the callee’s subtree is copied into the call site.
Types are inlined similarly, by copying the type from source to destination.
Both these lead to the elimination of conversions and interop operations, leading to performance gains.
The AST-interpreter code is inlined into one block of compiled code, producing a single compilation unit that is then converted into optimised machine code for the target platform.
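To illustrate the constant-folding part, here is a minimal sketch over a toy AST (invented classes, not Graal’s real compiler IR): an addition whose operands are both constants collapses into a single constant node, so no add instruction would ever be emitted for it.

```java
// Each node knows how to produce a folded version of itself.
abstract class Expr {
    abstract Expr fold();
}

// A constant is already folded.
final class Const extends Expr {
    final long value;
    Const(long value) { this.value = value; }
    Expr fold() { return this; }
}

final class AddExpr extends Expr {
    final Expr left, right;
    AddExpr(Expr left, Expr right) { this.left = left; this.right = right; }
    Expr fold() {
        Expr l = left.fold(), r = right.fold();
        if (l instanceof Const && r instanceof Const) {
            // Both operands are compile-time constants: fold the addition.
            return new Const(((Const) l).value + ((Const) r).value);
        }
        return new AddExpr(l, r);                // keep the add, with folded children
    }
}

public class ConstantFoldingDemo {
    public static void main(String[] args) {
        // Add(Const(2), Const(3)) collapses to Const(5) before compilation.
        Expr folded = new AddExpr(new Const(2), new Const(3)).fold();
        System.out.println(((Const) folded).value); // prints 5
    }
}
```

Inlining works the same way at a larger scale: once a callee’s subtree is pasted into the caller, folds like this can run across the former call boundary.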
Truffle supports de-optimisation, which brings optimised machine code back to its original AST-interpreter equivalent, so that the runtime can keep acting on continuous profiling feedback.
De-optimisation, triggered by profiling feedback, offers a way to discard compiled code and create newly optimised code when an optimisation strategy does not deliver the intended performance benefits.
It gives the flexibility to change optimisation strategies based on state changes in the system, for example, a change in the data type of one or more nodes in the AST. This can be understood as the change in the data type of one or more variables or methods in the program.
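The de-optimisation cycle can be sketched in miniature as follows (a hypothetical model, not real Graal machinery): a call site keeps a guarded “compiled” fast path, and when the guard fails, for example because an argument’s type changes, the fast path is discarded and execution falls back to the generic interpreter path, where profiling can begin again.

```java
// Simulates a call site with optimised code specialised for String
// arguments. The boolean stands in for the presence of compiled code.
final class CallSite {
    private boolean compiledForString = true;

    String invoke(Object arg) {
        if (compiledForString) {
            if (arg instanceof String) {
                return "fast:" + arg;        // guarded, specialised fast path
            }
            compiledForString = false;       // guard failed: de-optimise
        }
        return "interp:" + arg;              // generic interpreter path
    }
}

public class DeoptDemo {
    public static void main(String[] args) {
        CallSite site = new CallSite();
        System.out.println(site.invoke("hi")); // fast:hi
        System.out.println(site.invoke(42));   // guard fails -> interp:42
        System.out.println(site.invoke("hi")); // interp:hi (code was discarded)
    }
}
```

In the real system the interpreter would keep profiling after the fall-back, and a new, differently specialised compilation could be produced later.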
In this post, we looked at some of the potential benefits of using Graal, a new dynamic compiler written in Java. We also tried to explain some of the general optimisation techniques Graal and Truffle employ to make the code run fast.
Graal is not yet production-ready, but you can entertain yourself by checking it out and trying to run your projects on it.
We have compiled a number of Graal and Truffle related resources, listed below, that we gathered during the course of writing this post. These can be found on GitHub in the awesome-graal repository.
-  (Oracle Code): Turning the JVM into a Polyglot VM with Graal – Chris Seaton
-  (Voxxed Days): One VM for all – Thomas Wuerthinger
-  (JCrete): Fastest VM on the planet
-  (SPLASH 2016): Truffle and Graal: Fast Programming Languages With Modest Effort – Chris Seaton
-  (vJUG): Turning the JVM into a Polyglot VM with Graal – Chris Seaton
-  Awesome Graal resources page