
Attack of the logs! Consider building a smarter log handler


The difference between logging more and logging smarter

When it comes to firefighting almost any software issue, logs are essential (and here’s a great story about that). However, when everything is running smoothly, logging often just takes up space on your drives, decreases the performance of your system, and annoys your system administrators because log rotation is implemented incorrectly.

I’m not talking about the logs necessary to audit your system, but the ones that shed light on the internal state of the system when something breaks and you’re in charge of getting everything working again.

Naturally, your application uses logging levels in a very sane way: mostly irrelevant detail is tagged DEBUG or TRACE and might occasionally lead you to something interesting, but when the application reaches an erroneous state it uses ERROR, so the problem is visible even under the most restrictive logging settings.
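In SLF4J terms, that convention looks something like the following sketch (the class and the messages are made up for illustration):

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class PaymentService {
  private static final Logger log = LoggerFactory.getLogger(PaymentService.class);

  void charge(String orderId) {
    log.trace("entering charge() for order {}", orderId); // call-path noise
    log.debug("loaded payment details for order {}", orderId); // developer detail
    log.info("charging order {}", orderId); // normal operational event
    try {
      // ... perform the charge ...
    } catch (RuntimeException e) {
      // visible even under the most restrictive settings
      log.error("charge failed for order {}", orderId, e);
    }
  }
}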

So here is the challenge: how can you find out what occurred in the system before the actual error was encountered? Something definitely happened, and to understand the problem it’s extremely important to find out what started the issue in the first place. However, if you’re running the system with INFO-level logs only, everything below that level is swallowed.

There are two natural solutions to this problem: more logging and smarter logging. I’ve often seen more logging take the form of adding more INFO-level messages, or running the system with DEBUG enabled for a period of time after a new release.

But I want to talk about how we could log smarter, using what we can call “opportunistic log handling”. This is an approach that gets us all the logs we need, when we need them, and keeps them out of the way when we don’t. What if we could make our logger smarter, so that it could gather information opportunistically, based on time periods or special events, and create actual log files only when and if a need occurs?

I know what you’re thinking: yes, this will require some memory for each logger instance, but whether that matters is a question of scale. So before we look at how this can be done, let’s first see what advantages this approach can give us…

Advantages and applicability of opportunistic log handling

One of the clearest advantages to this approach is that we won’t populate log files with unnecessary information. It will make debugging and monitoring easier, as everything in the logs related to specific components will appear only when actual errors occur.

The obvious tradeoff here is the increased memory consumption, as we have to keep messages that are accumulated, but not yet flushed, in memory.

Let’s assume that an average log message is 160 characters long. This is quite an optimistic claim, but hey, if 140 characters is enough to convey a scientific discovery, add some padding for message source and 160 should be enough for everyone.

By taking the napkin from underneath my beer, I can quickly see that I can have over 6500 log messages at the expense of just 1 megabyte.
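For the record, the napkin math checks out: 1 MB is 1,048,576 bytes, and 1,048,576 / 160 ≈ 6,553 messages, assuming one byte per character. One caveat: Java strings are UTF-16 under the hood (two bytes per character, plus per-object overhead), so on the JVM the realistic figure is closer to half that.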

Opportunistic logging, napkin math

Obviously this is not acceptable for every system, but if you run something smaller than Yahoo or Reddit, you can probably consider it. I certainly thought about it, and decided to move forward. So let’s get our hands dirty and throw together a sample implementation to see how it feels.

Logbook’s FingersCrossedHandler

Here is an example of something along the lines of what we’re talking about. FingersCrossedHandler is a Python class from the Logbook library that actually got me thinking about optionally logging events that happened before the actual error. From its documentation, we get the following:

This handler wraps another handler and will log everything in memory until a certain level (action_level, defaults to ERROR) is exceeded. When that happens the fingers crossed handler will activate forever and log all buffered records as well as records yet to come into another handler which was passed to the constructor.

You can check out the source code for that on GitHub. A quick search reveals that the same concept is available in Monolog for PHP. However, I’m mainly interested in Java, so let’s try to port that to the JVM.

Applying the concept to Java

When it comes to logging, SLF4J appears to be the de facto standard at the abstraction level. A previous RebelLabs post that surveyed a small group of international developers found the following results:

Logging by abstraction layer

Naturally, I want my implementation to be compliant with the slf4j-api artifact, so the first thing is to satisfy that.

<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-api</artifactId>
  <version>1.7.5</version>
</dependency>

And then we just need to provide an implementation of the org.slf4j.Logger interface, which is the main user entry point of slf4j-api.

A straightforward implementation doesn’t look complicated at all; however, one thing bothers me a little. The API doesn’t expose log levels as values; instead, there are dedicated methods: info(), debug(), trace(), and so on. That works great for the API, because it makes it more portable across frameworks, but an opportunistic logger needs a specific marker to know when to stop collecting messages in memory and start flushing. I thought about using the LocationAwareLogger.XXX_INT constants, which are available in the same API jar, but that felt too unrelated, so I defined a dedicated Level enum instead.
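The enum itself can be trivial. Here is a minimal sketch (the exact numeric values are my assumption; what matters for the code below is that a lower number means higher severity):

public enum Level {
  // Lower value = higher severity, so the comparison in
  // tryTriggering() fires for messages at or above the trigger level.
  ERROR(0), WARN(10), INFO(20), DEBUG(30), TRACE(40);

  public final int level;

  Level(int level) {
    this.level = level;
  }
}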

All in all, the definition looks something like this.

import java.util.LinkedList;

import org.slf4j.Logger;
import org.slf4j.helpers.MarkerIgnoringBase;

public class OpportunisticLogger extends MarkerIgnoringBase {
  private final Logger delegate;
  public final Level triggerLevel;
  // Messages held back in memory until the trigger level is reached.
  private final LinkedList<String> messages = new LinkedList<String>();
  private volatile boolean triggered;

  public OpportunisticLogger(Logger delegate, Level triggerLevel) {
    if (delegate == null) {
      throw new IllegalArgumentException("Delegate logger shouldn't be null");
    }
    this.delegate = delegate;
    this.triggerLevel = triggerLevel;
  }

  private void tryTriggering(Level lvl) {
    // Lower numeric value = higher severity, so this fires for
    // messages at or above the configured trigger level.
    if (!triggered && lvl.level <= triggerLevel.level)
      trigger();
  }

  private void trigger() {
    // Flush everything held back so far; from now on messages
    // pass straight through to the delegate.
    for (String msg : messages) {
      delegate.info(msg);
    }
    triggered = true;
  }

  private void log(String msg) {
    if (triggered) {
      delegate.info(msg);
    } else {
      messages.add(msg);
    }
  }

  // Every org.slf4j.Logger method checks the trigger and then logs;
  // info() is shown as a representative, the remaining methods follow
  // the same pattern and are omitted for brevity (see GitHub).
  @Override
  public void info(String msg) {
    tryTriggering(Level.INFO);
    log(msg);
  }
}

The message store is just a linked list; when messages are replayed to the actual logger, they all go out as INFO messages. Alternatively, we could figure out which level is enabled in the delegate instance and use that. All in all, the implementation is very straightforward.
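To get a feel for the behavior, here is a hypothetical usage sketch (the logger name and messages are invented, and it assumes the omitted Logger methods follow the pattern shown above):

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class Demo {
  public static void main(String[] args) {
    Logger delegate = LoggerFactory.getLogger("com.example.demo");
    OpportunisticLogger log = new OpportunisticLogger(delegate, Level.WARN);

    log.info("connecting to payment service"); // buffered, nothing is written
    log.info("loaded 3 retry policies");       // buffered as well
    log.warn("connection refused, retrying");  // trigger level reached: the buffer
                                               // is flushed, then this goes through
    log.info("retry succeeded");               // passes straight through from now on
  }
}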

Check out the whole project on GitHub.

Ok, let’s try to improve performance

The one shortcoming of such an approach to logging is of course heavy memory use to store all the messages. One way to improve the implementation in that area is to use a smarter data structure for them.

A trie data structure immediately comes to mind. It is a tree in which a common ancestor of nodes represents a common prefix of those nodes. Why is it good? Storing many strings in a trie can take less memory, because common prefixes are stored only once and shared between entries.
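To make that concrete, here is a minimal string-storing trie sketch (my own illustration, not a production implementation):

import java.util.HashMap;
import java.util.Map;

// A minimal trie: each node maps a character to a child node, so
// strings sharing a prefix share the nodes along that prefix.
public class Trie {
  private static final class Node {
    final Map<Character, Node> children = new HashMap<Character, Node>();
    boolean terminal; // true if a stored string ends at this node
  }

  private final Node root = new Node();

  public void add(String s) {
    Node node = root;
    for (char c : s.toCharArray()) {
      Node child = node.children.get(c);
      if (child == null) {
        child = new Node();
        node.children.put(c, child);
      }
      node = child;
    }
    node.terminal = true;
  }

  public boolean contains(String s) {
    Node node = root;
    for (char c : s.toCharArray()) {
      node = node.children.get(c);
      if (node == null) {
        return false;
      }
    }
    return node.terminal;
  }
}

Two caveats: on the JVM, this naive version carries a HashMap and an object header per node, so it can easily use more memory than the strings it replaces; and a trie stores an unordered set of strings, while a log buffer needs ordering, so it could only back the message texts themselves.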

However, as far as I know, there is no trie implementation that is considered the default one. The Java standard library doesn’t have one at all, and although there was something in Google Collections, it didn’t end up making it into Guava.

Of course there are several easily “Googleable” implementations, but randomly-found data structure implementations are always tricky. It’s so easy to get the performance wrong that I personally tend to trust battle-tested libraries only.

On another note, you can always switch to a more performant logging implementation altogether, though that matters more for the performance of logging in general than for opportunistic loggers in particular.

For example, I’ve been considering trying out Async Loggers from Log4j 2 for some time now, and as I’m dealing with logs anyway, it feels like the right time to do so. For lazy readers who don’t want to click through links: Async Loggers use the LMAX Disruptor to achieve higher throughput in a multi-threaded setting.
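For reference, turning on all-asynchronous loggers in Log4j 2 mostly amounts to putting the LMAX Disruptor on the classpath and setting a single system property (the version number below is illustrative):

<dependency>
  <groupId>com.lmax</groupId>
  <artifactId>disruptor</artifactId>
  <version>3.2.0</version>
</dependency>

Then start the JVM with -DLog4jContextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector.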

It probably won’t make your opportunistic logs better or more usable if their performance turns out to be unacceptable for your needs, but it’s always good to know there are options.

Conclusion and open questions

When starting this post, I just wanted to share the idea of FingersCrossedHandler because of its cool name. But it turned out to lead me to other interesting questions:

  • Have you thought about something like an “Opportunistic Log Handler” in your projects?
  • Do you log information per process if something goes wrong or do you prefer to log everything?
  • How would you implement a trie in Java?
  • What is your general go-to source for all kinds of algorithmic snippets?

In general, it pays to take some time to figure out your logging framework, even if dealing with logging performance doesn’t seem worthwhile in a small application. When the need does arise, it’s all too easy to just switch logging off in production rather than figure out whether it can be made usable, and by then it’s already too late.

Please leave your comments below, or tweet me @shelajev or @RebelLabs.