I’m currently in the beautiful city of Prague, getting ready for GeeCON, which is planned for later this week. Being abroad is no reason to become lazy, so yesterday I went out to the local CrossFit gym, which was amazing, by the way, and I will gladly recommend it to you if you ping me at @shelajev. And so it happened that while I was hanging from a pull-up bar, trying miserably to touch my toes with my fingers, I started thinking about how we track our own performance and the performance of all sorts of things. The word popped into my head: Metrics!
Actually, it’s not just a question about performance. It goes further: how do we assess, in general, anything complex enough not to be immediately comprehensible by our limited minds? Metrics!
In this post, I want to investigate what properties good metrics share and what we want from them in order for them to be useful to us.
What can be a metric?
Mathematically speaking, a metric is a function that defines the distance between any two elements of a set.
What this means in practice is that a metric is a number or a collection of numbers that signifies where in the set an element sits. People use it as a simplified quantifier to heuristically judge the goodness of things.
If we think about it, anything can be a metric; we can record lengths of things, frequency, speed, amount of memory used by a system and so forth. These things represent scores of events about which we want to develop a simpler view.
In software engineering, we’re mostly interested in what I like to call “hard metrics”, which speak about the software itself. Some examples are:
- rate, frequency (count / time)
- throughput / latency
This is a short but solid list. However, by themselves these numbers don’t tell us anything. They only become valuable when we look at the trends they represent. Is my application growing? Can I serve more clients using the same hardware setup? These questions are not about a snapshot of the state; they imply progress over time.
Hopefully the progress and change mean improvement and not a regression. However, pure numbers cannot tell you which one it is.
Metrics simplify the view of things, describe your mental model of the system and can only be used together with this model.
The model you choose doesn’t have to be complex, but a basic understanding of what you’re measuring and how the real world acts is paramount.
Ok, give me some examples
It might be easier to grasp the importance of the right model based on the examples. In this part, I’ll describe what I find valuable in any given metric.
1. Good metrics are LINEAR
Metric value should preferably be linear, or at least quite intuitively convertible into a linear value. A larger number should correspond to a better / worse outcome and the difference in numbers should correspond to how much better or worse things are.
A good example of a linear metric could be the number of SQL queries your application makes to serve a request. Confession: this example is inspired by XRebel, I know. As you can guess, more database access requests make things slower. And you can quantify the difference exactly: 4 queries are exactly 2 queries worse than 2 queries.
The time that those SQL queries take to process is not so linear. It is a derived metric about the execution and it can vary from time to time and based on other conditions. For example, you probably run the production environment on different hardware and with a different network layout, which changes everything about measuring time.
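To make the query-count idea concrete, here is a minimal sketch of counting queries per request. It is only an illustration under my own assumptions: the `CountingConnection` wrapper, the `items` table and both handler functions are hypothetical, and an in-memory SQLite database stands in for a real one. This is not how XRebel does it.

```python
import sqlite3


class CountingConnection:
    """Hypothetical wrapper that counts every query sent through execute()."""

    def __init__(self, conn):
        self._conn = conn
        self.query_count = 0

    def execute(self, sql, params=()):
        self.query_count += 1
        return self._conn.execute(sql, params)


# Illustrative data set: three rows in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items (name) VALUES (?)", [("a",), ("b",), ("c",)])


def handle_request_one_by_one(db):
    # One query for the ids, then one more query per row.
    ids = [row[0] for row in db.execute("SELECT id FROM items ORDER BY id")]
    return [
        db.execute("SELECT name FROM items WHERE id = ?", (i,)).fetchone()[0]
        for i in ids
    ]


def handle_request_batched(db):
    # A single query returns the same data.
    return [row[0] for row in db.execute("SELECT name FROM items ORDER BY id")]


slow_db = CountingConnection(conn)
fast_db = CountingConnection(conn)
slow_result = handle_request_one_by_one(slow_db)
fast_result = handle_request_batched(fast_db)
print(slow_db.query_count, fast_db.query_count)
```

Both handlers return the same names, but the counts differ, and the difference between them reads directly as "this many queries worse", no matter which machine runs the code.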
2. Good metrics are REPEATABLE
A good metric has to be repeatable. The measurement should be formulated in such a way that you can later repeat it, perhaps against a changed system, and see the difference (or the lack of one). This is just good practice anyway, but if the results of the metric you collected are not repeatable, how can you be sure that they represent something true?
Your next measurements can neither confirm nor falsify them, so they don’t form a good foundation for the important decisions you were planning to make when you got this data.
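One rough way to check repeatability is to take the same measurement several times and look at how much the results scatter. The sketch below is my own illustration, not a proper benchmarking harness: `measure` and `relative_spread` are hypothetical helper names, and the coefficient of variation is just one possible way to express the scatter.

```python
import statistics
import time


def measure(fn, runs=5):
    """Run fn several times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def relative_spread(values):
    """Coefficient of variation: sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.fmean(values)


# Hypothetical workload, used only to have something to time.
durations = measure(lambda: sum(range(100_000)), runs=5)
print(f"relative spread: {relative_spread(durations):.2%}")
```

If the spread between identical runs is as large as the change you hope to detect, the measurement cannot tell you whether your change helped.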
3. Good metrics are ACCURATE
This is a very obvious point, but with measuring things you never know. The rule of thumb in benchmarking is if you see a benchmark you haven’t seen before, it is probably flawed.
A good example of an incorrect metric would be something that answers the incorrect question. Obviously if your metric works as the famous XKCD random, it wouldn’t be useful.
But let’s look just a bit further. Imagine you have a system serving web requests, like 80% of us do, and you want to see how long your clients wait for a response. What would you pick: average latency or maximum latency obtained in the experiment?
The correct answer is neither! Both of them hide lots of details about the actual application behavior and won’t necessarily be connected to the mental model of serving requests you have.
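To see how the average and the maximum can both mislead, here is a small sketch with made-up latency samples. The data is invented for illustration, and reporting percentiles alongside the mean is a common alternative that I am assuming here; `latency_summary` is a hypothetical helper name.

```python
import statistics


def latency_summary(samples_ms):
    """Summarise latency samples (milliseconds) with mean, max and percentiles."""
    ordered = sorted(samples_ms)
    # With n=100, quantiles() returns 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(ordered, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(ordered),
        "max": ordered[-1],
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
    }


# Invented data: 98 fast requests and 2 very slow ones.
samples = [10.0] * 98 + [2000.0] * 2
summary = latency_summary(samples)
for name, value in summary.items():
    print(f"{name}: {value:.1f} ms")
```

With this data the mean lands near 50 ms, a latency that no request actually experienced, while the maximum describes only the two slow requests. The percentiles show both stories at once: most clients wait 10 ms, and a small tail waits two seconds.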
4. Good metrics are INDEPENDENT
The independence of metrics is not usually a very big concern, but it also is a good practice. If you collect two metric numbers and one is influenced by the other, then someone, probably involuntarily, will start gaming your heuristic.
Take, for example, the infamous body mass index (BMI), the value used to determine whether an individual has “normal” weight or is overweight. It depends on both the height and the weight of the person in question. One way to influence it is, obviously, to change the weight. But if you could change your height at will as well, you’d be able to game the resulting number without any actual changes to the body weight.
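The usual BMI definition is weight in kilograms divided by the square of height in metres, which makes the dependence easy to show. The numbers below are arbitrary and only illustrate the point.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms over height in metres squared."""
    return weight_kg / height_m ** 2


# Same weight, two (hypothetical) heights.
before = bmi(81.0, 1.80)
after = bmi(81.0, 1.90)
print(f"{before:.1f} -> {after:.1f}")
```

The weight never changed, yet the number dropped, which is exactly the kind of gaming a composite metric invites.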
Can we make it even better?
Any metric that you’re going to introduce into your system and base your decisions on should be correct, repeatable and linear. However, there are a number of other properties that I like to list. If you want your metric to guide your decisions better than random chance would, it should be:
- Specific – a metric should preferably describe one and only one part of the system. Being all over the place means that there will be more moving parts and the result might depend on other conditions that you didn’t foresee.
- Actionable – a metric must represent a call to action. If a certain metric’s result is unacceptable, there should be a clear way to figure out what should be changed or when the optimisations should stop. Otherwise you’ll fall victim to the premature optimisation problem.
- Realistic – the metric should outline your actual problems. For example, it doesn’t make much sense to do CPU profiling on a much more powerful machine than the one used in your production environment. The numbers won’t have the full meaning and while you’ll be able to see trends, the actual outcome might differ later.
- Timely – having the metric results one year later doesn’t have the same value as seeing them immediately. Remember, you won’t be able to influence the change faster than you can spot the trends in how the numbers are changing. So, a one-year delay means that any action you take to improve the situation will have a visible impact only much, much later.
These are softer values of a good metric, but given how incredibly fast software engineering is moving, we should always keep them in mind when devising any monitoring solution or trying to use numbers to justify an action.
In this post, we looked at how some data is good enough to be called a metric and visited a few examples in which we wouldn’t dare to call the data a metric. The term “big data” doesn’t necessarily mean big or useful information. Our world is being consumed by software and it runs on numbers. And almost any of them can be used as a heuristic to show when things are running smoothly.
The 4 main principles of a good metric–linear, repeatable, accurate, independent–are well-defined properties that we want our metrics to have so that we can build abstractions on top of them and create a better future. Hopefully we can start doing just that as soon as possible!