
How to ‘EZ bake’ your own lambdas in Java 8 with ASM and JiteScript


Ah, good old Java bytecode. We’ve discussed this in the past with a report called Mastering Java Bytecode at the Core of the JVM, but let’s jog your memory once again: Bytecode is the binary representation of your source code that the JVM is able to read and execute.

Bytecode libraries are widely used in Java today, especially within Java EE, where proxies–dynamically generated at runtime!–are used extensively. Transforming bytecode is also a common use case, for instance for supporting runtime-weaving of aspects in AOP, or to support the extended class-reloading technology that tools like JRebel offer. We also see the usage of the libraries for parsing and analysing bytecode, for instance in the code quality area.

There are multiple bytecode libraries to choose from if you want to start transforming a class’s bytecode; among the most widely used are ASM, Javassist and BCEL. In this post we’ll look a bit at ASM and at JiteScript, an API on top of ASM that offers a more fluent way of generating classes.

Is ASM short for “awesome”?

Um, probably not. ASM is a library by the ObjectWeb consortium providing a Java API for parsing, modifying and generating JVM bytecode. It’s widely used and often regarded as the fastest library out there for this purpose; the fact that parts of the underlying lambda implementation in Oracle’s JDK 8 use ASM should also attest to this.

A lot of other frameworks and tools also utilize the power of the ASM library, including many JVM language implementations such as JRuby, Jython and Clojure. Given this, picking ASM as a bytecode library is an easy choice!

ASM basics with the Visitor pattern

The ASM library is built around the Visitor pattern. Whether you are reading or writing bytecode, visitors are used to visit, in turn, the individual parts of a class file’s bytecode.

Parsing the bytecode of a class is as simple as implementing visitors for the parts you’re interested in, and then using a ClassReader to parse an array of bytes containing the bytecode.

Likewise, generating the bytecode for a class is done by using a ClassWriter, visiting what should be in the class, and then calling toByteArray() to combine it into a byte array containing the bytecode.

Modifying—or transforming—bytecode then just becomes the art of combining the two, having the ClassReader visit the ClassWriter, with other visitors in between to add/edit/remove the various parts.
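To make the reader, adapter and writer roles concrete, here is a toy model of that chain in plain Java. None of these types are ASM classes; MemberVisitor, ToyReader, RemoveMethodAdapter and RecordingWriter are hypothetical stand-ins that only mimic how events flow from a reader through a transforming visitor into a writer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy visitor interface (hypothetical, not part of ASM).
interface MemberVisitor {
    void visitMethod(String name);
    void visitEnd();
}

// Plays the role of the ClassWriter: records whatever it is asked to visit.
final class RecordingWriter implements MemberVisitor {
    final List<String> methods = new ArrayList<>();
    boolean ended;

    @Override public void visitMethod(String name) { methods.add(name); }
    @Override public void visitEnd() { ended = true; }
}

// Plays the role of a transforming visitor: drops one method, forwards the rest.
final class RemoveMethodAdapter implements MemberVisitor {
    private final MemberVisitor next;
    private final String toRemove;

    RemoveMethodAdapter(MemberVisitor next, String toRemove) {
        this.next = next;
        this.toRemove = toRemove;
    }

    @Override public void visitMethod(String name) {
        if (!name.equals(toRemove)) {
            next.visitMethod(name);
        }
    }

    @Override public void visitEnd() { next.visitEnd(); }
}

// Plays the role of the ClassReader: walks the "class" and fires visit events.
final class ToyReader {
    private final List<String> methods;

    ToyReader(List<String> methods) { this.methods = methods; }

    void accept(MemberVisitor visitor) {
        for (String method : methods) {
            visitor.visitMethod(method);
        }
        visitor.visitEnd();
    }
}

final class VisitorChainDemo {
    // Reader -> adapter -> writer, the same shape as a bytecode transformation.
    static List<String> transform(List<String> methods, String toRemove) {
        RecordingWriter writer = new RecordingWriter();
        new ToyReader(methods).accept(new RemoveMethodAdapter(writer, toRemove));
        return writer.methods;
    }

    public static void main(String[] args) {
        System.out.println(transform(Arrays.asList("<init>", "run", "debug"), "debug"));
    }
}
```

In real ASM code the adapter would extend ClassVisitor and delegate to the next visitor in the same way; the toy only shows why putting visitors between the reader and the writer is enough to add, edit or remove parts of a class.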

While using the API is fairly straightforward, it does still require a certain level of general knowledge about the class file format, the bytecode operations available, and stack machines. Also, some things that are hidden at the Java source level because the compiler takes care of them are now up to you to implement: you need to explicitly call a super constructor in constructors, you need to make sure there is a constructor in the class if you want to instantiate it, and you need to know that the bytecode representation of a constructor is a method named “<init>”.
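As a small taste of the class file format, every class file starts with the magic number 0xCAFEBABE followed by a minor and a major version. The sketch below reads just that header using only the JDK; ClassFileHeader is a hypothetical helper name, and the main method assumes the JDK exposes Object.class as a resource. The major version is what ASM's V1_5 and V1_7 constants stand for (49 and 51 respectively).

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

final class ClassFileHeader {
    final int magic;
    final int minor;
    final int major;

    private ClassFileHeader(int magic, int minor, int major) {
        this.magic = magic;
        this.minor = minor;
        this.major = major;
    }

    // Reads the first 8 bytes of a class file: u4 magic, u2 minor, u2 major.
    static ClassFileHeader read(byte[] classBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            int magic = in.readInt();
            int minor = in.readUnsignedShort();
            int major = in.readUnsignedShort();
            return new ClassFileHeader(magic, minor, major);
        } catch (IOException e) {
            throw new IllegalArgumentException("Not enough bytes for a class file header", e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the runtime lets us read java.lang.Object's class file as a resource.
        try (InputStream in = Object.class.getResourceAsStream("Object.class")) {
            if (in == null) {
                System.out.println("Object.class is not available as a resource here");
                return;
            }
            byte[] header = new byte[8];
            new DataInputStream(in).readFully(header);
            ClassFileHeader h = read(header);
            System.out.printf("magic=%08X major=%d minor=%d%n", h.magic, h.major, h.minor);
        }
    }
}
```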

A simple HelloWorld class implementing the Runnable interface that writes “Hello ASM!” to System.out when run() is invoked could be generated using the ASM API as follows:

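// Assumes these imports:
//   import org.objectweb.asm.ClassWriter;
//   import org.objectweb.asm.MethodVisitor;
//   import org.objectweb.asm.Type;
//   import static org.objectweb.asm.Opcodes.*;
//   import java.io.PrintStream;
ClassWriter cw = new ClassWriter(ClassWriter.COMPUTE_FRAMES);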
cw.visit(V1_5, ACC_PUBLIC, "HelloWorld", null,
   Type.getInternalName(Object.class),
   new String[] { Type.getInternalName(Runnable.class)});
 
// Constructors are methods named "<init>"; this one only calls Object's constructor
MethodVisitor consMv = cw.visitMethod(ACC_PUBLIC, "<init>", "()V", null, null);
consMv.visitCode();
consMv.visitVarInsn(ALOAD, 0);
consMv.visitMethodInsn(INVOKESPECIAL,
   Type.getInternalName(Object.class), "<init>", "()V", false);
consMv.visitInsn(RETURN);
consMv.visitMaxs(1, 1);
consMv.visitEnd();
 
MethodVisitor runMv = cw.visitMethod(ACC_PUBLIC, "run", "()V", null, null);
runMv.visitCode();
runMv.visitFieldInsn(GETSTATIC, Type.getInternalName(System.class),
   "out", Type.getDescriptor(PrintStream.class));
runMv.visitLdcInsn("Hello ASM!");
runMv.visitMethodInsn(INVOKEVIRTUAL,
   Type.getInternalName(PrintStream.class), "println",
   Type.getMethodDescriptor(Type.getType(void.class),
   Type.getType(String.class)), false);
runMv.visitInsn(RETURN);
runMv.visitMaxs(2, 1);
runMv.visitEnd();
 
cw.visitEnd(); 
byte[] bytes = cw.toByteArray();

As visible from the above, using the default Visitor pattern methods of the ASM API does–as mentioned–require some knowledge of which categories the individual opcodes belong to, in order to call the correct visitor method. A way to combat this is to use a GeneratorAdapter when generating methods, which exposes most of the opcodes through appropriately named methods instead, and can pick the right opcode when returning, for instance, a value from a method.
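The strings produced by Type.getInternalName() and Type.getMethodDescriptor() in the HelloWorld example follow the JVM's own naming rules, and it can help to see them spelled out. The following JDK-only sketch produces the same kind of strings without ASM; the Descriptors class and its method names are hypothetical helpers, not part of ASM or JiteScript.

```java
import java.io.PrintStream;
import java.lang.invoke.MethodType;

final class Descriptors {

    // Internal name: the binary class name with '/' instead of '.'.
    static String internalName(Class<?> type) {
        return type.getName().replace('.', '/');
    }

    // Method descriptor: parameter types in parentheses, followed by the return type.
    static String methodDescriptor(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes).toMethodDescriptorString();
    }

    // Field descriptor, derived from the descriptor of a void method taking only that type.
    static String fieldDescriptor(Class<?> type) {
        String wrapped = methodDescriptor(void.class, type); // "(<descriptor>)V"
        return wrapped.substring(1, wrapped.length() - 2);
    }

    public static void main(String[] args) {
        System.out.println(internalName(Object.class));
        System.out.println(methodDescriptor(void.class));
        System.out.println(methodDescriptor(void.class, String.class));
        System.out.println(fieldDescriptor(PrintStream.class));
    }
}
```

So the "()V" passed for the constructor reads as "no parameters, returns void", and the println call in run() uses the descriptor of a method taking a String and returning void.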

Daddy, can I go play with lambdas?

With Java 8, lambdas were introduced to the Java language; but at a bytecode level, nothing changed! We’re still utilizing the existing invokedynamic functionality added in Java 7. So does this mean we can run lambdas on Java 7?

Unfortunately not. The runtime support classes needed to create the call sites used by the invokedynamic calls aren’t present in Java 7; but it’s still a fun experiment to see what we can do with it:

We’re going to generate lambdas without the language-level support!

So what exactly is a lambda? Well, simply stated it’s a method invocation wrapped–at runtime–into a compatible interface. So let’s see if we can also do the wrapping at runtime, utilizing the reflection Method instance as an indication of which method to wrap, but without actually using reflection to do the invocation!

Looking at the bytecode generated for lambdas, we notice that the invokedynamic call and its bootstrap method arguments contain all the information about which method to wrap, which interface to wrap it in, and the descriptor of the interface’s method. So seemingly, it’s just a matter of creating the same bytecode, but with parameters matching our method and interface instead.
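Those same pieces of information can be handed to the JDK's LambdaMetafactory directly from plain Java, which makes it easier to see what each one is for before any bytecode is generated. This is only a JDK-only sketch and not the lambdafy code from this post: MetafactorySketch, wrapStatic and firstAbstractMethod are hypothetical names, and it assumes a non-generic functional interface whose method signature matches the static target method exactly.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.function.IntBinaryOperator;

final class MetafactorySketch {

    // Wraps a static method in the given functional interface without reflective invocation.
    static <T> T wrapStatic(Class<T> iface, Method target) {
        try {
            Method sam = firstAbstractMethod(iface);
            // Descriptor of the interface method, e.g. (int, int)int for IntBinaryOperator
            MethodType samType = MethodType.methodType(sam.getReturnType(), sam.getParameterTypes());
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            // Which method to wrap
            MethodHandle impl = lookup.unreflect(target);
            CallSite site = LambdaMetafactory.metafactory(
                    lookup,
                    sam.getName(),                // name of the interface method
                    MethodType.methodType(iface), // call site type: () -> iface, nothing captured
                    samType,                      // erased interface method signature
                    impl,                         // implementation method
                    samType);                     // signature enforced at invocation time
            return iface.cast(site.getTarget().invoke());
        } catch (Throwable t) {
            throw new IllegalStateException("Could not wrap " + target, t);
        }
    }

    // Simplification: assumes the interface declares exactly one abstract method.
    private static Method firstAbstractMethod(Class<?> iface) {
        for (Method m : iface.getMethods()) {
            if (Modifier.isAbstract(m.getModifiers())) {
                return m;
            }
        }
        throw new IllegalArgumentException(iface + " has no abstract method");
    }

    static IntBinaryOperator maxOperator() {
        try {
            return wrapStatic(IntBinaryOperator.class,
                    Math.class.getMethod("max", int.class, int.class));
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(maxOperator().applyAsInt(3, 7));
    }
}
```

Reflection is only used to find the method; the returned object calls Math.max through the class spun by the metafactory. The generated-bytecode approach in this post reaches the same metafactory through an invokedynamic instruction instead of a direct call.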

Bytecode generation you say? ASM to the rescue!

So we need the following input:

  • a reference to the Method we want to wrap
  • a reference to the functional Interface we want to wrap it in
  • if an instance method, a reference to a target object to invoke the method on

So let’s define some methods for this:

public <T> T lambdafyVirtual(Class<?> iface, Method method, Object object)
public <T> T lambdafyStatic(Class<?> iface, Method method)
public <T> T lambdafyConstructor(Class<?> iface, Constructor<?> constructor)

We need a way to convert these to something ASM can understand and that we can write to bytecode, in this case a MethodHandle that the LambdaMetafactory can read. MethodHandles are represented in ASM by the Handle type, and creating a Handle for a given method based on a reflection Method object is very simple (here for an instance method):

new Handle(H_INVOKEVIRTUAL, Type.getInternalName(method.getDeclaringClass()),
    method.getName(), Type.getMethodDescriptor(method));

So now that we have a method Handle that we can use in an invokedynamic bootstrap method, it’s time to actually generate the bytecode! So let’s generate a Factory class that provides a method for generating our lambda using the invokedynamic call.

Putting this together, we end up with a method like:

public <T> T lambdafyVirtual(Class<?> iface, Method method, Object object)
    throws Exception {
  Class<?> declaringClass = method.getDeclaringClass();
  // Methods declared on an interface need the invokeinterface handle kind
  int tag = declaringClass.isInterface() ? H_INVOKEINTERFACE : H_INVOKEVIRTUAL;
  Handle handle = new Handle(tag, Type.getInternalName(declaringClass),
      method.getName(), Type.getMethodDescriptor(method));
 
  Class<Function<Object, T>> lambdaGeneratorClass =
      generateLambdaGeneratorClass(iface, handle, declaringClass, true);
  return lambdaGeneratorClass.newInstance().apply(object);
}
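The tag chosen in lambdafyVirtual corresponds to the JVM's method handle reference kinds, and ASM's H_* constants use the same numbering as the JDK's MethodHandleInfo.REF_* constants. A JDK-only way to see which kind a given reflective Method maps to is to turn it into a MethodHandle and crack it open again; ReferenceKinds and kindOf below are hypothetical helper names for that experiment.

```java
import java.io.PrintStream;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandleInfo;
import java.lang.invoke.MethodHandles;
import java.lang.reflect.Method;

final class ReferenceKinds {

    // Returns the JVM reference kind (one of MethodHandleInfo.REF_*) for a public method.
    static int kindOf(Class<?> owner, String name, Class<?>... parameterTypes) {
        try {
            Method method = owner.getMethod(name, parameterTypes);
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle handle = lookup.unreflect(method);
            // revealDirect "cracks" the handle back into its symbolic description
            return lookup.revealDirect(handle).getReferenceKind();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int virtualKind = kindOf(PrintStream.class, "println", Object.class);
        int interfaceKind = kindOf(Runnable.class, "run");
        int staticKind = kindOf(Math.class, "max", int.class, int.class);
        System.out.println(MethodHandleInfo.referenceKindToString(virtualKind));
        System.out.println(MethodHandleInfo.referenceKindToString(interfaceKind));
        System.out.println(MethodHandleInfo.referenceKindToString(staticKind));
    }
}
```

A method declared on a class comes back as an invokeVirtual reference, one declared on an interface as invokeInterface, which is the distinction the isInterface() check makes when picking between H_INVOKEVIRTUAL and H_INVOKEINTERFACE.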

Once we have finally generated the bytecode (see below), we need to define the class from the bytes. For this purpose we utilize a small hack: calling defineClass on the JDK’s Proxy implementation in order to inject the Factory class into the same class loader as the class that declares the method we’re wrapping. Furthermore, we try to add it to the same package, so that we have access to protected and package-private methods as well! This needs to be figured out before generating the bytecode though, in order for the class to have the correct name and package. We simply generate a class name using Random; while this is not a great solution by any stretch, it’s acceptable for the purpose of this example.
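To show what "defining the bytes" means without the Proxy hack, here is a deliberately simpler, JDK-only sketch. It hand-writes the bytes of an empty class and defines them through a small ClassLoader subclass. MinimalClassDefiner and ByteLoader are hypothetical names; unlike the approach described above, the class ends up in a new class loader, so it gains no access to the target's package-private members.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Random;

final class MinimalClassDefiner {

    // Bytes for "public class <internalName> extends Object" with no members at all.
    static byte[] emptyClassBytes(String internalName) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(0xCAFEBABE);   // magic number
            out.writeShort(0);          // minor version
            out.writeShort(49);         // major version 49 (Java 5)
            out.writeShort(5);          // constant pool count = number of entries + 1
            out.writeByte(1); out.writeUTF(internalName);        // #1 Utf8: this class name
            out.writeByte(7); out.writeShort(1);                 // #2 Class -> #1
            out.writeByte(1); out.writeUTF("java/lang/Object");  // #3 Utf8: super class name
            out.writeByte(7); out.writeShort(3);                 // #4 Class -> #3
            out.writeShort(0x0021);     // access flags: ACC_PUBLIC | ACC_SUPER
            out.writeShort(2);          // this_class  -> #2
            out.writeShort(4);          // super_class -> #4
            out.writeShort(0);          // interfaces count
            out.writeShort(0);          // fields count
            out.writeShort(0);          // methods count
            out.writeShort(0);          // attributes count
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for an in-memory stream
        }
        return buffer.toByteArray();
    }

    // defineClass is protected, so a tiny subclass is needed to call it.
    static final class ByteLoader extends ClassLoader {
        ByteLoader(ClassLoader parent) { super(parent); }

        Class<?> define(String binaryName, byte[] bytes) {
            return defineClass(binaryName, bytes, 0, bytes.length);
        }
    }

    // The name must be fixed before the bytes are written, since it is part of them.
    static Class<?> defineEmptyClass(String internalName) {
        byte[] bytes = emptyClassBytes(internalName);
        ByteLoader loader = new ByteLoader(MinimalClassDefiner.class.getClassLoader());
        return loader.define(internalName.replace('/', '.'), bytes);
    }

    public static void main(String[] args) {
        String name = "demo/Generated" + new Random().nextInt(Integer.MAX_VALUE);
        Class<?> generated = defineEmptyClass(name);
        System.out.println(generated.getName() + " defined by " + generated.getClassLoader());
    }
}
```

Note how the class name appears inside the bytes in its internal form (with slashes), while defineClass expects the binary name (with dots). That is why the name and package have to be decided before the bytecode is generated.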

A battle of verbosity: ASM vs. JiteScript

So we used a classic TV-kitchen technique above, and quietly pulled out a pan from under the table with the finished product! But now is the time to actually look at the bytecode generation bits of this little experiment.

Let’s see how this looks when implementing using ASM:

protected byte[] generateLambdaGeneratorClass(
    final String className,
    final Class<?> iface, final Method interfaceMethod,
    final Handle bsmHandle, final Class<?> argumentType) throws Exception {
 
  ClassWriter cw = new ClassWriter(ClassWriter.COMPUTE_FRAMES);
  cw.visit(V1_7, ACC_PUBLIC, className, null,
      Type.getInternalName(Object.class),
      new String[]{Type.getInternalName(Function.class)});
 
  generateDefaultConstructor(cw);
  generateApplyMethod(cw, iface, interfaceMethod, bsmHandle, argumentType);
 
  cw.visitEnd();
  return cw.toByteArray();
}
 
private void generateDefaultConstructor(ClassVisitor cv) {
  // A constructor is a void method named "<init>" that must call its super constructor
  String desc = Type.getMethodDescriptor(Type.getType(void.class));
  GeneratorAdapter ga = createMethod(cv, ACC_PUBLIC, "<init>", desc);
  ga.loadThis();
  ga.invokeConstructor(Type.getType(Object.class),
      new org.objectweb.asm.commons.Method("<init>", desc));
  ga.returnValue();
  ga.endMethod();
}
 
private void generateApplyMethod(ClassVisitor cv, Class<?> iface,
  Method ifaceMethod, Handle bsmHandle, Class<?> argType) {
  final Object[] bsmArgs = new Object[]{Type.getType(ifaceMethod),
      bsmHandle, Type.getType(ifaceMethod)};
  final String bsmDesc = argType!= null ?
      Type.getMethodDescriptor(Type.getType(iface), Type.getType(argType)) :
      Type.getMethodDescriptor(Type.getType(iface));
 
  GeneratorAdapter ga = createMethod(cv, ACC_PUBLIC, "apply",
      Type.getMethodDescriptor(Type.getType(Object.class),
      Type.getType(Object.class)));
  if (argType != null) {
    ga.loadArg(0);
    ga.checkCast(Type.getType(argType));
  }
  ga.invokeDynamic(ifaceMethod.getName(), bsmDesc, metafactory, bsmArgs);
  ga.returnValue();
  ga.endMethod();
}
 
private static GeneratorAdapter createMethod(ClassVisitor cv,
    int access, String name, String desc) {
  return new GeneratorAdapter(
      cv.visitMethod(access, name, desc, null, null),
      access, name, desc);
}

And now let’s compare with the same thing using JiteScript, utilizing the instance initializer approach:

protected byte[] generateLambdaGeneratorClass(
    final String className, final Class<?> iface, final Method ifaceMethod,
    final Handle bsmHandle, final Class<?> argType) throws Exception {
 
  final Object[] bsmArgs = new Object[] {
      Type.getType(ifaceMethod), bsmHandle, Type.getType(ifaceMethod) };
  final String bsmDesc = argType != null ? sig(iface, argType) : sig(iface);
 
  return new JiteClass(className, p(Object.class), 
      new String[] { p(Function.class) }) {{
    defineDefaultConstructor();
    defineMethod("apply", ACC_PUBLIC, sig(Object.class, Object.class), 
        new CodeBlock() {{
      if (argType != null) {
        aload(1);
        checkcast(p(argType));
      }
      invokedynamic(ifaceMethod.getName(), bsmDesc, metafactory, bsmArgs);
      areturn();
    }});
  }}.toBytes(JDKVersion.V1_7);
}

It’s clear that when generating bytecode with predictable patterns like the above code, using JiteScript makes for a lot more reader-friendly and concise code. This is also thanks to its shorthand utility methods, like sig() instead of Type.getMethodDescriptor(), which were statically imported here.

Mixing it all together

So with the MethodHandle part implemented, as well as the bytecode generation part, let’s put it to the test and see if what we’ve made actually works!

IntStream.rangeClosed(1, 5).forEach(
   lamdafier.lambdafyVirtual(
     IntConsumer.class,
     System.out.getClass().getMethod("println", Object.class),
     System.out
   ));

And lo and behold, it works; producing the output we hoped for:

1
2
3
4
5

The above example also showcases one of the real strengths of the lambda implementation: its ability to convert/box/unbox types as needed, in this case wrapping a void(Object) method into the void(int) method defined in the IntConsumer interface!
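The same conversion can be reproduced with the JDK alone by calling LambdaMetafactory directly, which shows that the boxing really is done by the lambda implementation and not by the generated Factory class. This is a sketch under assumptions: BoxingAdaptationSketch and Sink are hypothetical names, with Sink standing in for System.out so that the result can be inspected.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;
import java.util.stream.IntStream;

final class BoxingAdaptationSketch {

    // Stand-in for System.out: an object with a void(Object) instance method.
    public static final class Sink {
        final List<Object> seen = new ArrayList<>();

        public void take(Object value) {
            seen.add(value);
        }
    }

    // Wraps sink.take(Object) as an IntConsumer; the receiver is captured by the call site.
    static IntConsumer asIntConsumer(Sink sink) {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle impl = lookup.findVirtual(Sink.class, "take",
                    MethodType.methodType(void.class, Object.class));
            // Signature of IntConsumer.accept: void(int)
            MethodType accept = MethodType.methodType(void.class, int.class);
            CallSite site = LambdaMetafactory.metafactory(
                    lookup,
                    "accept",
                    MethodType.methodType(IntConsumer.class, Sink.class), // (Sink) -> IntConsumer
                    accept,
                    impl,     // void(Object) on Sink: the int argument has to be boxed
                    accept);
            return (IntConsumer) site.getTarget().invoke(sink);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    static List<Object> collect(int from, int to) {
        Sink sink = new Sink();
        IntStream.rangeClosed(from, to).forEach(asIntConsumer(sink));
        return sink.seen;
    }

    public static void main(String[] args) {
        System.out.println(collect(1, 5));
    }
}
```

Each int passed to accept() arrives in take() as an Integer, even though no boxing code was written by hand.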

Conclusion: Use all the tools!

Getting started with ASM isn’t that hard; yes, some knowledge about bytecode is required, but once the basics are there, digging in and creating your own classes from the ground up can be a fun and satisfying experience. Furthermore, it can give you access to stuff you normally don’t have access to via Java code! Likewise, the knowledge that you, at runtime, can create your own classes specific to the current runtime environment opens up opportunities you perhaps never thought possible.

ASM is a very strong library when it comes to bytecode transformation, but JiteScript shows that a more concise API makes for very readable code. Luckily, you don’t have to choose one or the other: the two are compatible, since JiteScript is, after all, basically just a wrapper around the ASM API.

Try it for yourself!

By putting all the pieces mentioned here together, we’ve created some simple code for generating lambdas from reflection Method objects using ASM, utilizing the power of the JDK 8 lambda implementation to take care of all the necessary parameter and return type conversions!

So what are you waiting for? Go try it for yourself: https://bitbucket.org/michael_rasmussen/lambdafy



  • http://rafael.codes/ Rafael Winterhalter

    Note that JiteScript requires ASM to compute stack map frames and stack sizes. Therefore, JiteScript often takes double the time to create a class compared to pure ASM when you provide the information explicitly.