Blog

When we ran our last productivity survey, some critics pointed out that although we revealed some interesting stats on Java development productivity (e.g. tools used and % of total coding time forfeited to redeploys), we touched only a very narrow slice of developers’ day-to-day lives. What do we really know about how developers spend their work week?

So this time around, we made an all-out effort to put together a survey that will be useful not only to the Java community, but to all kinds of coders, team leads, project managers and CxOs around the world. Some of the questions that we hope to answer are:

  • What do developers spend their days doing?
  • How efficient are those days?
  • What keeps them up at night?
  • How do processes, tools and technologies help or hinder them?

This is an ambitious undertaking, especially because we wanted to keep all of this under 5 minutes. The resulting 20-question survey has only 5 Java-specific questions, so feel free to forward it to all your developer friends.

Our Ultimate Goal: To gain a better collective insight into the biggest productivity challenges that developers face today, as well as into some of the tools and practices that keep us sane. Seems pretty simple, right?

So, without further ado — jump right in!
http://0t.ee/prosurve12

Thanks!

We just opened a Boston office for sales, we are looking for a CFO and engineers in Estonia, marketing is expanding in Prague, and we’ll find a place for you in our hearts and minds if you’re awesome enough :) Check out the Jobs 2.0 page for more.

Live from the JavaOne 2011 conference we are happy to announce the first major update to our production update management tool, LiveRebel.

With this release we take another step toward becoming the go-to management tool for all your Java application update needs. For starters, LiveRebel now supports the initial deployment of applications through the LiveRebel Command Center. To make your life simpler, we no longer require you to add a liverebel.xml file to the archive if it’s deployed through us.

To allow greater flexibility, we have reworked the command line interface to the Command Center. For example, it can pause the application to wait for database updates, which can now be scripted as part of the update with no manual intervention. We also allow more flexible workflows, so that the developer can review and prepare an update and ship only one application archive to the operations team in charge of the production environment.

We are also happy to report that the first production deployments and customer case studies are going very well, and we will soon start posting the accounts on our blog. LiveRebel can quickly and painlessly enable the fully automated, one-button delivery of changes to production that is coveted by the proponents of continuous delivery and similar practices.

Get the updated release to make use of these features, as well as numerous bug fixes and minor improvements as fully described in the changelog.

We are very passionate about our work and our trade. So as the 0x100th day was nearing, we decided to celebrate it with a special challenge to all you developers out there (and especially Java developers)! And if you can solve it, you will automatically pass the first round of interviews at ZeroTurnaround :)

For bonus points, write in the comments how long it took you and what tools you used.

Have fun!

Foreword

A few weeks ago I ran a survey asking a dozen or so questions about Java EE production updates. I’d like to thank the 607 individuals who took some time from their busy lives to help me out. Today, I’ll return the favor by analyzing the resulting data for your pleasure. If for some reason you don’t trust me, feel free to check the calculations on the original data in this spreadsheet.

I’m sorry I couldn’t do this before, but life was too busy to do a proper analysis in the meantime. As it is, I’m writing these words from the CDG airport in Paris and will likely finish the article in the Porto-Lisbon train (or as it turned out in the Lisbon-Porto train).

Act I: The Pool

Running surveys is always a tricky business. Some amount of bias in a non-random selection is inevitable, so the first thing we need to do is to understand the selection pool. The survey included some questions that help with that.

One of the survey questions was “What industry are the applications you’re working on for?” (for those who cringe at the sight of prepositions at the end of a sentence — ha!). Luckily the answers spread pretty evenly across all the options with Technology, Finance, Online, Banking, Media and Telco having over 10% of representation each, so we can assume that the industries are well represented across the board.

Another question concerned the size of deployment: “About how many servers/instances do you have in production?”. The answers are represented by this graph:

About 93% have 50 servers or less, so the large deployment community is represented by only 43 respondents, only 4 of whom have more than a thousand servers. It’s up to you to decide whether this represents the community-at-large, but at the very least it’s highly relevant if you have 50 servers or less in your organization or department.

I asked “What application servers (or other servers) do you use in production?”. This may be interpreted as the popularity lineup of production servers (at least in the 50 servers or less category) or just a categorization of respondents, unrepresentative of the industry – interpret as you like.

Unlike the surveys that ZeroTurnaround ran for the development environment, I allowed respondents to select multiple servers and got a somewhat different lineup. It’s clear that OSS servers are leading the pack, but segmenting the respondents by OSS vs. non-OSS servers didn’t produce any interesting results.

Act II: Redeploys Considered Broken?

As for the data itself, it revealed some interesting patterns, and some even more interesting points where it lacked any pattern whatsoever. One of the hypotheses I was testing was that the “Update” or “Redeployment” functionality provided by containers is used little or not at all. My thinking was that because it’s very hard, if not impossible, to stop redeployment from leaking memory (check out this article & conference presentation), a redeployed app will quickly run into an OutOfMemoryError. However, only 52% of respondents reported OutOfMemoryErrors.

On the other hand only 24% answered that they allow redeploy in production at all and only 13% use it as the primary means for production updates. It turns out that a score of additional problems prevent respondents from using container redeployment, including:

  • Lack of proper facilities for doing database updates
  • Thread races and deadlocks in Java EE containers
  • Thread and resource leaks
  • Security concerns
  • Native library issues
  • Various app server caching issues
  • Performance overhead when redeploying is on (I’m not sure what this could be, unless hot redeployment in production with class/resource scanning was meant)
  • Rollback difficulties
  • Process or organizational issues
  • My favourite: Fear. There is very little trust in the reliability of redeployment, based on multiple past issues
  • Bonus quote: “Java: Write once, run away”

The majority of these issues point to the same underlying problem — the apps running in the container and new app versions created by redeploying are not well isolated from each other. Apps can leak other apps or versions, take locks on global monitors, leak resources and produce other unwanted side effects. In that context, asking how soon the OutOfMemoryError happens isn’t so relevant anymore, as other issues both overshadow and mask it.

Act III: The Update Process

The next few questions concerned the update process, the importance of downtime and the amount of automation. Some very interesting patterns emerged.

Nearly 73% replied that they allow downtime during their production update. For them, restarting all servers in the off hours is the simplest and cheapest method of update. Only 54% of respondents answered that downtime doesn’t cost them anything, leaving 19% losing money on every update.

How much money? 37% replied that downtime is “Priceless!”. From those who gave a number, the average price per minute was $30,016. However this included two extreme responses, where respondents gave a per minute price of $1M and $500K. Excluding those two, the average is only $3,230 per minute.
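As a rough cross-check, the two averages above pin down approximately how many respondents gave a number. The sketch below assumes that both averages were computed over the same set of numeric answers and that the published figures are rounded; the function name is made up for illustration.

```python
def implied_response_count(mean_all, mean_trimmed, outliers):
    """Estimate how many answers went into mean_all.

    Solves  mean_all * n = mean_trimmed * (n - k) + sum(outliers)  for n,
    where k is the number of outliers that were excluded.
    """
    k = len(outliers)
    return (sum(outliers) - k * mean_trimmed) / (mean_all - mean_trimmed)


# Figures quoted in the text: $30,016 with the two extremes, $3,230 without.
n = implied_response_count(30_016, 3_230, [1_000_000, 500_000])
print(round(n, 1))  # prints 55.8
```

In other words, the averages are consistent with somewhere around 56 numeric answers, and the two extreme responses account for most of the $30,016 figure. The result is not a whole number, which is expected when working backwards from rounded averages.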

I asked how long it usually takes for a production update to complete. About 10% replied that it takes “Forever!”. The average time was 1.6h, but the maximum was 60h and the standard deviation was 4.3h. Interestingly enough, the correlation between the time it takes to update and the number of servers was very near zero, so the length of an update doesn’t depend much on the size of the deployment.
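For readers who want to repeat this kind of check on the spreadsheet, here is a minimal sketch of the Pearson correlation coefficient. The sample numbers are made up purely for illustration and are not the survey responses; a value near zero means there is no linear relationship between the two columns.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up illustration data, NOT the survey responses.
servers = [2, 4, 6, 8]          # number of servers per respondent
hours = [2.0, 1.0, 1.0, 2.0]    # update duration per respondent

print(pearson(servers, hours))  # prints 0.0
```

Keep in mind that a coefficient near zero only rules out a linear dependency; it says nothing about other kinds of relationships.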

I asked the respondents to rate how much of a hack their update process is. In the survey the actual numbers weren’t present, but the answers implied a 1 (hack) to 5 (automated) rating.

The average rating was 3.7, which implies that for the majority the process is pretty well understood and mostly automated, with some manual labor required. This is a better result than I expected and explains why it takes only 1.6h on average to complete the update.

In another question I asked if the update process in use was ideal, and the replies were overwhelmingly negative with only 27% replying affirmatively; downtime and lack of automation being quoted as the largest areas of concern. And quite a few have mentioned that although downtime is ok in the off hours, it also means that the team has to be up at night.

The next set of questions concerned the method and tools used to update the production. I asked which way the production update happens and the layout was as follows:

Under 12% of respondents do updates that are risk free and completely unnoticed by the users. By far the most popular method of update, at 46%, is just taking the servers down. There is a large disconnect here, as this means that 88% do updates in a manner that is likely to impact the users, whereas only 54% allow downtime during production updates. This means that 34% of respondents at the moment have to violate their own policies to some extent.

This is also illustrated by the tools used to update production.

By far the most popular tools used to update production apps are Unix shell commands. App servers are used more than I expected, considering the previous answers, but perhaps starting/stopping the servers was included in this category. Hudson is gathering popularity, and although not included in the chart, Maven and Ant were the top choices in the Other category.

This means that even though the production updates are fairly automated on average, the automation is mostly scripted manually and little out-of-the-box support is provided.

Intermission

So what conclusions can I make from this data?

  • Container redeploys are indeed mostly broken, but for a much wider range of reasons than I supposed
  • The most popular way to update production, at least in deployments of 50 servers or less, is to take all servers down in the off hours.
  • Downtime during updates and lack of automation are the biggest concerns with the current processes and tools.
  • Command line utilities are the tools of choice for doing an update. This points to a lack of ready-made solutions in the area.
  • Although it is not directly visible in the charts, from analyzing the text of replies and looking at the (lack of) correlations between various data points it seems that there is a good amount of chaos in the area. There is a lack of easy-to-use and standardized terminology, processes and tools to support the update process. Everyone comes up with their own solution, and often terminology and roles.

In total the survey points out a range of technical issues with the current production update methods and tools.

Afterword

There are some solutions on the horizon that might help to make it better. I’ll describe here the two main categories, which I’ll call Autobot and Decepticon.

Autobot solutions aim to automate rolling restarts with no downtime, making them easy as pie to do. They are still mostly in their infancy, but solutions like Puppet, Chef, RunDeck, DeployIt and JClouds are actively working in this direction. There are still a lot of challenges ahead, but they are making great strides.
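To make the idea of a rolling restart concrete, here is a minimal sketch of the general technique: update one node at a time while the others keep serving traffic. It does not reflect how any of the tools named above work internally; the `Node` class, the `healthy` callback and the version strings are all hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for one app server behind a load balancer."""
    name: str
    version: str
    in_rotation: bool = True


def rolling_update(nodes, new_version, healthy=lambda node: True):
    """Update nodes one by one so the remaining nodes keep serving."""
    for node in nodes:
        if sum(n.in_rotation for n in nodes) < 2:
            raise RuntimeError("refusing to drain the last serving node")
        node.in_rotation = False        # drain: stop routing requests here
        previous = node.version
        node.version = new_version      # stand-in for stop, deploy, start
        if not healthy(node):
            node.version = previous     # put the old version back
            node.in_rotation = True
            raise RuntimeError(f"{node.name} failed its health check")
        node.in_rotation = True         # back into the pool
    return nodes


cluster = [Node("app1", "1.0"), Node("app2", "1.0"), Node("app3", "1.0")]
rolling_update(cluster, "1.1")
print([node.version for node in cluster])
```

Note that while the loop runs, the cluster serves two versions at once, which is exactly where the state migration and database compatibility challenges come from.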

On a side note, the ideal solution in that area would be the one employed by .NET and many dynamic languages, which isolate each app in a separate OS process. The almost-perfect isolation provided by the OS process model would mean that any app version can be terminated at any moment, without any lasting side effects.

Unfortunately one issue the Autobots are still subject to is the migration of application and user state, which can be challenging and time-consuming. Another challenge is updating the database and other remote dependencies without downtime, which can be hard to achieve when the db/remote changes are incompatible with the current app version and so gradual transition isn’t possible.

The Decepticon solutions improve hot redeployment and at the moment are represented solely by ZeroTurnaround’s LiveRebel. Instead of creating a new instance of the app on every redeploy, LiveRebel applies the code, resource and configuration updates inside the app, preserving all state and avoiding side effects. It also allows instant rollback of broken updates and will even automatically wait for the database or other remote updates to complete. The goal is to make small updates super cheap and to drastically decrease time-to-production for minor changes and fixes. This approach is actually (unlike its cartoon counterpart) complementary to the Autobots.

Updating live apps is currently quite challenging and will probably never be trivial, but both Autobots and Decepticons are on the way to a little blue planet near you and together they’ll be able to handle any danger threatening your production environment.

It’s hard to believe that it’s been more than 3 years since the JRebel 1.0 release. It seems like it was only yesterday that we were putting together the first public beta, then opening the first champagne to celebrate when Nathan Hamblen bought the first license a week later. Thanks to him and all of you, our loyal users, today the JRebel team proudly presents the 4.0 release.

The major features are:

  • Full support for reloading changes to EJB 3.x components. This includes adding new components and adding @EJB references on-the-fly, across WebLogic, WebSphere, JBoss and GlassFish.
  • Support for anonymous class reloading. Previously, adding a new anonymous class would cause the other ones to be renamed (Class$3 -> Class$4) and JRebel would complain that a superclass has changed and fail to reload. Never again.
  • Instrumentation/HotSwap integration. Although JRebel has always used a -javaagent to bootstrap, it didn’t actually use the Instrumentation API before. Now, on Java 5 or later, we make use of this functionality to minimize the runtime performance overhead and to further improve the debugging behaviour. This also lays the groundwork for some future improvements.
  • Full Seam 2.x support. Now you can add new components and wire them in on-the-fly. Enjoy!
  • Better integration across the board. Hibernate Validator and Spring Security are the biggest names, but we have significantly expanded our test suite with support for 35 frameworks, not counting the server, standard and miscellaneous integrations.

And of course, as usual, a score of smaller features and fixes that you can find in the changelog.

Well, what are you waiting for? Grab it now!

What began two years ago in the far northern country of Estonia as an attempt to re-invent production updates has culminated in today’s release of LiveRebel 1.0. Already managing a multitude of production environments, it’s best described by the following tagline:

Java EE Hot Update Done Right. No downtime. No lost sessions. No OutOfMemoryErrors. Fully automated. Instant.

LiveRebel represents a quantum leap over all currently available technology for updating Java EE applications.

  • A fully scriptable server and web console that can manage single-node, clustered or cloud Java EE applications of any size on any container.
  • Versions each class and resource individually instead of reloading the whole application, avoiding the problems associated with container redeployment and rolling upgrades.
  • Rolls out updates instantly and invisibly to the users. Code is updated in-place, preserving all existing state.
  • Uses an all-Java JVM plugin (-javaagent) on the nodes causing a 3-5% performance overhead.

In a recent survey we did, we saw that only 27.4% of over 600 respondents were satisfied with the update process of Java EE applications. The rest cited multiple issues with container redeployment, lack of tooling and automation as well as lack of industry-standard processes. With the release of LiveRebel 1.0 we can finally offer a solution to this industry-wide problem.

To learn more about LiveRebel you can see the 5-minute screencast, read the overview or just download the free 90-day evaluation. LiveRebel 1.0 is an annual subscription that costs $200 per JVM instance in pre-production and $600 per JVM instance per year in a production environment. For mission critical operations, please contact sales@zeroturnaround.com for a customized quote.

Hi guys!

Last week I asked you to help me find out how Java EE apps are updated in production. I’d like to thank everyone who has replied to the survey. I’ve received just over 600 responses so far, and I’m hoping to get a few hundred more to be sure of the data. Here are some interesting things that came out:

  • Only 27.4% of respondents consider the update process ideal. The grievances include lack of tooling support, lack of automation, erratic behavior and many others.
  • Redeployment in production is allowed by as many as 24.2% of respondents, but only 12.2% use it as the primary means of application update. When asked for the reasons, 31.9% of respondents quoted the memory leaks and resulting OutOfMemoryErrors, whereas 46.8% quoted memory problems in addition to other issues with the redeployment.

To find out more, fill in the survey today and you’ll see the running tally right away.

Thanks!

Last year while watching “The Social Network”, I couldn’t help but think that the world at large just doesn’t get what makes geeks and geek-trepreneurs tick. Aaron Sorkin wrote a great screenplay, but he had to invent both a non-existent love story and an unconfirmed desire to make it in the prestigious Harvard clubs to account for Zuckerberg’s willingness to spend nights and days hacking on Facebook.

Of course it’s been pointed out many times that a geek needs no greater motivation than a chance to create something, earn the respect of his peers and feel the sheer ecstasy of delivering the first version to the world and seeing it being used, valued and eventually bought.

Something else just hit me last month. As I was putting together a job ad for a Technology Evangelist position, I had to figure out what to put there. And then I started thinking about the meaning of “Evangelist”. I’m from a generation and a country where I learned the meaning of the word in a technological context rather than a religious one. And then I couldn’t stop myself from finding more examples of borrowed religious terms for the startup/technology world, like visionary, cult following, spread the word and so on.

My feeling is that, for many, technology has replaced or supplemented the behavior and psychology usually associated with religion. The well-known drive to work 14 hours a day, sometimes unpaid, can be compared to the belief and piousness expressed by dedicated monks or nuns (with a very similar lack of social life).

The cult-like following that some tech leaders enjoy can be compared, well, with something like a group of followers in a cult. I could go on, but the most interesting things about analogies are always the differences and the unexpected similarities.

I guess the biggest difference is that technology, unlike religion, is based on solid facts and science. Except not really :)

Nobody really knows why Facebook succeeded where so many others have failed. Nobody has reproduced the magic of Apple products. Creating new companies, products and technologies is still largely an artform, with little in the way of solid guidance. And largely it’s the art of getting co-founders, investors and users to believe passionately in the product or technology you are building, which takes us back to religion again.

Another difference is that unlike religion, the technology world is not organized. While the Catholic Church still shepherds over a billion people, the startup world remains blissfully an independent smattering of random people, organizations and communities in different countries.

Another interesting question is: Will it stay this way forever?

In the last 10 years, a great number of networking organizations sprang to life right before my eyes. It is not unthinkable that as time goes by, they’ll become more and more organized, until like the Church they will shepherd without governing. Just an idea.

So what’s the last thing I want to leave you with? In any religion there are always people who are so pious and so full of vision that they turn the world around. They are called prophets; in the startup world their role is filled by entrepreneurs.

I’m still looking for that Big Difference. What do you think it is?

At ZeroTurnaround, we like to build products that challenge the status quo. We have a strong history in supporting Java development teams, but today we’re announcing a step towards supporting Java operations and production teams as well. It goes like this:

A very common issue in software development is the risk that when you apply a change to a live application, somehow that change may break it – and create a major headache. As deployment in a Java cluster environment is often an expensive process, this commonly leads to lengthy testing and rigid processes that decrease this risk.

We envision managing this risk differently, and we’re now taking our ideas public: today we’re announcing the LiveRebel Public Beta.

LiveRebel is a web console, command line utility and REST API that can version multiple servers and deployed applications at the same time. Instead of the usual deployment process, LiveRebel versions classes and resources individually, allowing instant updates across the cluster while preserving all state, including user sessions. If you don’t like the change you made for any reason, you can roll back instantly.

Under the hood, LiveRebel uses RebelTech, first developed for JRebel (which has been tried, tested, and proven solid by over 10,000 customers), to version the application classes and resources directly. RebelTech allows code and resources to be updated in place, instead of starting a fresh version of the application side-by-side. Due to RebelTech’s integration with JVMs, application servers and frameworks, all application state and user sessions are preserved, without the slightest pause.

Since RebelTech has some limitations when updating classes and cannot create new state (e.g. new fields on existing objects will be null), we intend LiveRebel to be used for smaller updates and fixes. To ensure that every update is compatible, we bundle a sophisticated comparison tool that checks whether you can roll out and roll back an update safely.
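The “cannot create new state” limitation applies to any environment that swaps code underneath live objects. The Python sketch below is only an analogy and says nothing about how RebelTech is implemented: the `Session` class and its fields are invented for illustration, and where Java would leave a newly added field null, Python leaves the attribute missing altogether.

```python
class Session:
    """Hypothetical class standing in for application state."""

    def __init__(self, user):
        self.user = user


live = Session("alice")            # object created before the update


def updated_init(self, user):      # "new version" of the constructor
    self.user = user
    self.locale = "en"             # field added by the update


def greeting(self):                # method added by the update
    return f"hi {self.user}"


# Update the code in place; existing objects are left untouched.
Session.__init__ = updated_init
Session.greeting = greeting

print(live.greeting())             # prints: hi alice
print(hasattr(live, "locale"))     # prints: False
print(Session("bob").locale)       # prints: en
```

New code is immediately visible on the old object, but the constructor never ran again for it, so the new field was never initialised. Updates that rely on such fresh state are the ones that need extra care.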

Overall, if you’re interested in updating your live application instantly, then read more about LiveRebel or try it right away.
