Recently I had to take a look at an application that was suffering from thread starvation and memory leaks. At least, these were the claims–and yes, they were partially true.
What I found was some misuse of the old Apache Commons HTTP Client (3.x). When you see multiple threads accessing the same HTTP Connection with a thread unsafe connection manager, it tells me that someone wasn’t reading the documentation of the libraries they are using! I believe that sometimes you have to play the blame game (diplomatically, of course) or else a person won’t learn–not a time for the good old blameless post mortem. Even more annoying is that the http client documentation has at least one full page dedicated to this; the solution boils down to ‘use a thread safe connection manager’, something that is even indicated in the Javadoc, if I recall correctly.
So as you can guess, updating to the new version HttpComponents and using it correctly fixed thread starvation issues. No more deadlocks.
But that wasn’t the only issue with the application. For some reason there were numerous
IllegalStateExceptions being thrown at me–around 5% of all requests made resulted in an
That is worth a decent code inspection, don’t you think? So debugging goes on.
So I started climbing the way up a stack trace that started at a
ReadWriteReentrantLock#unlock method throwing the
IllegalStateException. Over an EHCache
Cache#get method, at which point you start wondering: a bug in EHCache, no it can’t be. Let’s try to Google it: oh, here are a couple Jira tickets on ehcache Jira. Let’s read it: damn, all have been closed with ‘works as intended’, ‘not a bug’ and gives no real pointers as to what could cause it.
So I keep working the way up the stack trace, where I see nothing abnormal happening…and THEN I SEE IT: suddenly, a strangely named javax.servlet.Filter implementation called the
MemoryFixFilter enters the scene.
Apparently this filter uses reflection to allow access to parameters of
java.lang.Thread and clears all new ThreadLocal variables on the current thread after
filter.doChain. Seems very dangerous. And indeed, if you look at the
ReadWriteReentrantLock it uses ThreadLocal which, in some cases, has been cleared by the time it should be used and thus causes a
I figured the filter was written to clean up ThreadLocal instances that are misused somewhere. But I would much rather have some misused filter be the cause of the issue than writing some strange hack that could have all sorts of unexpected side effects. So now the app no longer throws an
IllegalStateException for 5% of the requests and thus, more happy users!
What did I learn, or rather what did I realize I should emphasize to others more often?
- Don’t write code to fix an issue before you know what is causing it!
- Don’t know what’s happening? DEBUG sessions for the win!
- …RTFM for the libraries and frameworks you use! If you don’t read the manual, don’t be afraid to read the code! I’ve partially read the sources from most open source frameworks I’ve used