Part II – Failures and Recoveries: DevOps and Traditional IT Ops Compared
In this section, we look at what causes failures, how DevOps and traditional IT Ops teams compare on recovery time, and whether either group bothers to test its recovery processes before initiating an urgent, mission-critical fix. Oh yeah, we said it!
A large minority (40%) of traditional IT Ops teams need more than one hour to fully recover from a failure, compared with only 22% of DevOps-oriented teams. At the other end, 41% of DevOps-oriented teams can recover from a failure within half an hour, while only 30% of traditional IT Ops teams recover that quickly.
Other findings in this section:
The top two causes of production failures are software quality issues (64%) and human error (60%)
The most common way teams learn of a production failure is an automated message to an on-call person (63%), but the second most common is an end-user complaint to the helpdesk (55%)
On average, teams recover from app failures more than twice per month, but the median is once per month, which suggests that a minority of teams with frequent failures pulls the average up
DevOps-oriented teams are nearly twice as likely to recover from a failure in less than 10 minutes
Nearly two-thirds of all organizations do not test their recovery processes in advance to make sure they work when needed
Shameless plug: ZeroTurnaround’s own LiveRebel automates whole app releases (code, database and configuration), all in sync, onto physical, virtual or cloud environments quickly, safely and consistently, with no downtime or overhead. Releases with LiveRebel are versioned, automated, fully reversible and testable. It helps achieve the kind of automation DevOps calls for. Shameless plug plugged.