
Mutation Testing in Java with pitest by Henry Coles


The latest Virtual JUG session was about a somewhat mystical and intriguing subject: mutation testing! Luckily for us, our speaker is one of the world-class experts on the topic: Henry Coles.

Henry is the Head of Innovation at NCR, in Edinburgh. For the last 15 years he’s been working at becoming a better software engineer and has been developing award-winning systems in industries ranging from energy trading and smart metering through to life insurance and finance.

You can find him on Twitter and ask him any questions about the session, mutation testing, code and test quality, and software engineering in general.

He is also the author of the mutation testing library for Java, pitest. So, Henry has vast experience and knowledge of the matters involved and was generous enough to share his expertise with us in this session.

Without further ado, here’s the video of the presentation. Watch it at your leisure, or continue reading to get a grasp of what this session was about and what we learnt from it, and to watch our interview with Henry.


Mutation testing with pitest

To explain what mutation testing is about, Henry started the session with a few questions that developers often ponder:

  • How can you safely refactor tests?
  • How can you trust existing tests for an inherited code base?
  • How can you ensure that the tests you have written are effective?

All these questions boil down to the one most important question you can ask about any test suite:
How do you measure the quality of your test suite?
The common answers to this question often sound more like excuses than comprehensive answers. We either make it QA’s problem, overestimate our confidence in the tests we create, or just lean on test-driven development: we saw these tests fail once, and now they pass, so they must be effective.

While this may be a good approximation of an answer about quality, it’s not enough to have bullet-proof confidence that your tests won’t let you down, especially given that tests are also just code, which as we know tends to be broken all the time anyway. So either we write tests for our tests, which just defers the question of quality, or we pin our hopes on the developers being qualified enough to do the right thing.

There are of course more conservative measures to ensure the quality of tests, for instance peer reviews or pair programming. Alternatively, there are metrics that code coverage tools provide to show that we do indeed run through the actual code during test runs. But these measures are slow, labor-intensive to verify, and not universally applicable.

Mutation testing takes a different approach and provides you the mechanism to ensure that you know which parts of the code are executed by the tests AND which lines are actually tested.

Henry showed us examples where project code achieves perfect line coverage with a test suite that doesn’t contain a single meaningful assertion and so tests nothing of interest. The answer we’re looking for is quite straightforward: to assess the quality of a test suite, we intentionally introduce a bug into the code we want to test and see whether the existing test suite catches it. This idea is not new. In fact, it was explored in 1971, before I was even born, in a student paper by Richard Lipton: “Fault diagnosis of computer programs”.
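To make the coverage point concrete, here is a minimal sketch. The PriceCalculator class and both check methods are hypothetical and were not part of Henry’s demo; plain static methods stand in for JUnit tests so the snippet needs no dependencies. The first check executes every line of total() and would earn full line coverage, yet it can never fail.

```java
// Hypothetical example class, invented for illustration only.
final class PriceCalculator {
    /** Applies a 10% discount to amounts of 100 or more. */
    static int total(int amount) {
        if (amount >= 100) {
            return amount - amount / 10;
        }
        return amount;
    }
}

final class PriceCalculatorChecks {
    /** Runs both branches of total() but asserts nothing, so no bug can make it fail. */
    static void coverageOnly() {
        PriceCalculator.total(50);
        PriceCalculator.total(150);
    }

    /** Runs the same branches and also pins down the expected results, including the boundary. */
    static void withAssertions() {
        check(PriceCalculator.total(50) == 50);
        check(PriceCalculator.total(100) == 90);
        check(PriceCalculator.total(150) == 135);
    }

    private static void check(boolean condition) {
        if (!condition) {
            throw new AssertionError("expectation not met");
        }
    }
}
```

A coverage tool would likely report the same line coverage for both methods, but only the second one would notice if the discount rule were broken.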

Mutation testing answers our question of how we can assess test quality at scale. And the pitest library does exactly that for your code base: it changes the code to introduce bugs and runs the tests to see if they catch them.

Mutants, tests and quality assurance

Some terminology is required here before we go much further, but don’t worry, it’s not rocket science! Making a change to the code is called mutating it. There are many possible changes that should meaningfully alter the behavior of the program, for example:

  • Changing comparison operators: >= to >
  • Removing a method invocation: foo.aMethod() to //foo.aMethod()
  • Changing the method calls: foo.aMethod() to foo.anotherMethod()
  • Changing primitive return values: 0 to 1 and vice-versa
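As a rough illustration of the changes listed above, the sketch below places hand-written mutants next to their originals. All method names are hypothetical, and the mutants are written as source code purely for readability; a tool would apply such changes automatically rather than by hand.

```java
import java.util.List;

// Hypothetical methods, each followed by a hand-written mutant.
final class MutationOperatorExamples {

    // Original: amounts of 100 or more qualify.
    static boolean qualifies(int amount) {
        return amount >= 100;
    }

    // Mutant: comparison operator changed from >= to >.
    static boolean qualifiesBoundaryMutant(int amount) {
        return amount > 100;
    }

    // Original: records the amount and returns the number of recorded entries.
    static int record(List<Integer> log, int amount) {
        log.add(amount);
        return log.size();
    }

    // Mutant: the method invocation has been removed.
    static int recordCallRemovedMutant(List<Integer> log, int amount) {
        // log.add(amount);
        return log.size();
    }

    // Original: returns the primitive value 0.
    static int exitCode() {
        return 0;
    }

    // Mutant: primitive return value changed from 0 to 1.
    static int exitCodeReturnMutant() {
        return 1;
    }
}
```

Note that the boundary mutant only differs from the original for the input 100, so a test has to probe exactly that value to tell the two apart.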

The list goes on, but you get the idea. We can automatically pick a change from a long list of possible changes and introduce that individual change to the code. It’s fairly easy to do so on the JVM, where everything is compiled to bytecode that can then be manipulated without actually changing the source code of your program.

Applying a mutation operation to our code produces a mutant. The best part is that we can create a zombie apocalypse! Sure we can: we just create tons of different mutants automatically. Now we face the harder problem of determining whether a mutant is actually useful. By this we mean: is the change in the code tested?

The answer to that is very simple on paper: run the tests and see if they fail. In mutation testing terminology, a mutant that doesn’t cause the tests to fail has survived. Surviving mutants are the most important aspect of mutation testing, as they highlight where in the code your tests lack coverage. Mutants that make the tests fail are called killed mutants.
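The killed and survived terminology can be sketched with a toy harness. Everything here is an assumption made for illustration: the MutantRunner class, the discount rule being mutated, and the representation of mutants as lambdas. It shows the idea only and says nothing about how pitest is implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntUnaryOperator;
import java.util.function.Predicate;

// Toy harness, invented for illustration only.
final class MutantRunner {

    /** A mutant survives if the whole suite still passes against it; otherwise it is killed. */
    static String run(IntUnaryOperator mutant, Predicate<IntUnaryOperator> suitePasses) {
        return suitePasses.test(mutant) ? "SURVIVED" : "KILLED";
    }

    static Map<String, String> demo() {
        // Hypothetical original rule: a >= 100 ? a - a / 10 : a
        Map<String, IntUnaryOperator> mutants = new LinkedHashMap<>();
        mutants.put("boundary: >= to >", a -> a > 100 ? a - a / 10 : a);
        mutants.put("math: - to +", a -> a >= 100 ? a + a / 10 : a);

        // A weak suite that never probes the boundary value 100.
        Predicate<IntUnaryOperator> weakSuite =
                rule -> rule.applyAsInt(50) == 50 && rule.applyAsInt(150) == 135;

        Map<String, String> results = new LinkedHashMap<>();
        mutants.forEach((name, mutant) -> results.put(name, run(mutant, weakSuite)));
        return results;
    }
}
```

In this sketch the arithmetic mutant is killed because the suite checks the discounted value for 150, while the boundary mutant survives because no check uses the value 100. That surviving mutant points directly at the missing test case.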

Cool, right? Can we kill all the mutants though? Sounds a bit like a first-person shoot ’em up! Not quite: a mutant that doesn’t make any test fail doesn’t necessarily mean the program is untested. Some changes simply don’t alter the observable behavior of the program.

Of course there can be such changes. Software is a complex beast, and the behavior of our systems depends on more than the single line of code that we might be changing.

This is known as the equivalent mutant problem: the changed code is semantically equivalent to its prior version, so no test can ever distinguish the two. It is a problem because, in general, we cannot determine the equivalence of two programs automatically, which means human validation is necessary.
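A small, commonly used illustration of an equivalent mutant is a loop condition changed from < to !=. The class below is hypothetical. Because the index starts at 0 and grows by exactly one per iteration, both versions visit the same elements and return the same result for every input.

```java
// Hypothetical example of an equivalent mutant.
final class EquivalentMutantExample {

    /** Original: returns the first index of target, or -1 if it is absent. */
    static int indexOf(int[] values, int target) {
        for (int i = 0; i < values.length; i++) {
            if (values[i] == target) {
                return i;
            }
        }
        return -1;
    }

    /** Mutant: loop condition changed from < to !=, which behaves identically here. */
    static int indexOfMutant(int[] values, int target) {
        for (int i = 0; i != values.length; i++) {
            if (values[i] == target) {
                return i;
            }
        }
        return -1;
    }
}
```

No test can kill this mutant, so it will always be reported as surviving, and only a person reading the code can decide that it is harmless.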

However, the general approach of mutation testing is super efficient at exploring the quality of your tests. It offers the following benefits:

  • It shows which code is actually tested, not merely executed.
  • It gives you a very high degree of confidence in the test suite.
  • It can sometimes find bugs in the actual code.

Pretty sweet, eh? So now, 40 years later, we have the pitest library, which takes the idea of mutation testing and applies its principles to Java programs. Other languages have other implementations of the concept, but Henry focused on Java.

The session showed a demo of pitest running on the Joda-Time library and some sample projects to show different aspects of what you might expect from mutation testing. One aspect which Henry highlighted is that mutation testing might take a lot of time. Imagine having to run all your tests just to determine whether a single mutant survives. That can take ages on an average enterprise project.

Have no fear: Henry went on to explain how pitest prioritizes the code changes and the tests to run in order to get results back as quickly as possible. Certain assumptions help to reduce the overall duration of mutation testing significantly, for example that the tests which catch superficial bugs (these are the ones you hate to write the most, by the way) are also the most suitable for catching the real bugs.
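One plausible way to cut the cost per mutant is sketched below. It is an assumption for illustration, not a description of pitest’s internals: only tests known to execute the mutated line are run, the fastest ones go first, and the run stops as soon as one test fails. The class, the field names and the line numbers are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BooleanSupplier;
import java.util.stream.Collectors;

// Hypothetical sketch of test selection for a single mutant.
final class TestSelectionSketch {

    /** A test plus assumed metadata gathered from an earlier run of the unmutated code. */
    static final class CoveredTest {
        final String name;
        final Set<Integer> coveredLines;
        final long millis;
        final BooleanSupplier passesAgainstMutant;

        CoveredTest(String name, Set<Integer> coveredLines, long millis,
                    BooleanSupplier passesAgainstMutant) {
            this.name = name;
            this.coveredLines = coveredLines;
            this.millis = millis;
            this.passesAgainstMutant = passesAgainstMutant;
        }
    }

    /** Returns the names of the tests that were actually executed for the mutant. */
    static List<String> runAgainstMutant(int mutatedLine, List<CoveredTest> allTests) {
        // Keep only tests that execute the mutated line, fastest first.
        List<CoveredTest> candidates = allTests.stream()
                .filter(t -> t.coveredLines.contains(mutatedLine))
                .sorted(Comparator.comparingLong((CoveredTest t) -> t.millis))
                .collect(Collectors.toList());

        List<String> executed = new ArrayList<>();
        for (CoveredTest test : candidates) {
            executed.add(test.name);
            if (!test.passesAgainstMutant.getAsBoolean()) {
                break; // the mutant is killed, so no further tests are needed
            }
        }
        return executed;
    }

    static List<String> demo() {
        List<CoveredTest> tests = Arrays.asList(
                new CoveredTest("slowIntegrationTest", lines(10, 12), 900, () -> true),
                new CoveredTest("fastUnitTest", lines(12), 5, () -> false),
                new CoveredTest("unrelatedTest", lines(40), 1, () -> true));
        return runAgainstMutant(12, tests);
    }

    private static Set<Integer> lines(Integer... numbers) {
        return new HashSet<>(Arrays.asList(numbers));
    }
}
```

In the demo, the unrelated test is never considered and the slow integration test is never started, because the fast unit test kills the mutant first.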

All in all, there’s little point in describing the demo from the session; you should really watch it yourselves to get the benefit. Here’s a link to the GitHub repository for the sample project Henry used in the session, called truth, so you can follow along and catch a couple of mutants yourself.

Check out this project and consider adding a pitest run to a pet project of yours. Or, who are we trying to kid, go and try it on the main project you work on every day. It might help you a lot with trusting and refactoring your tests. But most importantly, watch the session in full. It was a very educational one, and Henry did an amazing job of explaining mutation testing, so we got super inspired about it. Seriously, go watch the video!

After the session, Henry was a great sport and joined us for the RebelLabs interview with a Virtual JUG speaker. We had a great time chatting about tooling and code quality, which JVM languages he likes and which he feels most productive with. We also talked about whether mutation testing is a necessary tool and how developers can sell it to their managers.
