
Your boring, old data is actually sexier than your newest technology [guest post]

Sometimes we meet another technology blogger who fits so well with RebelLabs that we ask whether we can re-post their original content on our site for the benefit of our readers. One of our newest friends is Lukas Eder, formerly of jOOQ and now founder/CEO of Data Geekery. We wanted to share his original post from the jOOQ blog, “Why Your Boring Data Will Outlast Your Sexy New Technology”, with you and see what you think. To be clear: Lukas is the original author of this content, and we’re re-posting it with his permission.



So you’re playing around with all those sexy new technologies, enjoying yourself, and getting inspiration from state-of-the-art closures / lambdas / monads and other concepts-du-jour.

Now that I have your attention (and have provoked a little anger, a smirk, or indifference), let’s think about the following. I recently revisited a great article written by Ken Downs in 2010. There’s an excellent observation in the middle of it:

[...] in fact there are two things that are truly real for a developer:

  • The users, who create the paycheck, and
  • The data, which those users seemed to think was supposed to be correct 100% of the time.

He says this in the context of a sophisticated rant against ORMs. It can also be seen as a rant against any recent abstraction or database technology with a mixture of nostalgia and cynicism. But the essence of his article lies in another observation that I’ve made with so many software systems:

  • Developers tend to get bored with “legacy”
  • Systems tend to outlast their developers
  • Data tends to outlast the systems operating on it

Having said that, we can quickly understand why Java has become the new COBOL to some, who are bored with its lack of sexiness (especially in the JEE corner). Yet, according to TIOBE, Java and C are still the most widely used programming languages, and many systems are therefore implemented in Java.

On the other hand, many inspiring new technologies have emerged in the last couple of years, including on the JVM. ZeroTurnaround’s The Adventurous Developer’s Guide to JVM Languages gives a nice overview. And all of that is fine, to a certain extent. As Bruno Borges from Oracle recently and aptly put it:

Anything not mainstream has more odds to be “sexy” [than JSF]

Now, let’s map this observation back to a subsequent section of Ken’s article:

[...] the application code suddenly becomes a go-between, the necessary appliance that gets data from the db to the user [...] and takes instructions back from the user and puts them in the database [...]. No matter how beautiful the code was, the user would only ever see the screen [...] and you only heard about it if it was wrong. Nobody cares about my code, nobody cares about yours.

Think about an e-banking system. None of your users really cares about the code you wrote to get it running. What they care about is their data (i.e. their money) and that it is correct. This is true for many, many systems: the business logic and UI layers can easily be replaced with fancy new technology, whereas the data stays around.

In other words, you are free to choose whatever sexy new UI technology you like as long as it helps your users get access to their data.

So what about sexy new database technology?

Now, that’s an entirely different story, compared to sexy new UI technology.

You might be considering some NoSQL-solution-du-jour to “persist” your data, because it’s so easy and because it costs so much less. Granted, the cost factor may seem very tempting at first. But have you considered the fact that:

  • Systems tend to outlast their developers
  • Data tends to outlast the systems operating on it

Once your data goes into that NoSQL store, it may stay there much longer than you had wanted it to. Your developers and architects (who originally chose this particular NoSQL solution) may have left long ago. Parts of your system may have been replaced, too, because now you’re doing everything in HTML5. JavaScript is the new UI technology.

And all this time, you have “persisted” UI / user / domain model data in your database, from various systems, from various developers, through various programming paradigms. And then, you realise:

We’re not saying that there aren’t use cases where NoSQL databases really provide a better solution than the “legacy” alternatives. Most notably, graph databases solve a problem that no RDBMS has really solved well yet.

But consider one thing. You will have to migrate your data. Time and again. You will have to archive it. And maybe migrate the archive. And maybe provide reports on the archive. Correct reports (which means: ACID!). And be transactional (which means: ACID!). And do all the other things that people do with data.

In fact, your system will grow like any other system before it, and it will place high demands on your database. Some NoSQL databases have started to gain decent traction, enough that it is probably safe to say their vendors will still be around in 5-10 years, when the systems will have been replaced by developers who replaced other developers. But not every vendor will be.

In other words, there is one more bullet to this list:

  • Developers tend to get bored with “legacy”
  • Systems tend to outlast their developers
  • Data tends to outlast the systems operating on it
  • Data might tend to outlast the vendors providing tools for operating the data

Beware of the fact that data will probably outlast your sexy new technology. So, choose wisely when thinking about sexy new technology to operate your data.


Thanks to Lukas for this great post! You can contact him on Twitter or GitHub, or just search for his name and see where he is.

 

  • David Leppik

    I totally agree, and yet I don’t quite agree. Yes, data will (usually) outlast the code and the developer, and even when the data doesn’t, the new data will by necessity be structured in response to the old data model. And the data will need to be migrated. Frequently. And usually painfully. Hell, even going from MySQL to PostgreSQL is painful once you’ve built up a meaningful amount of code. (I speak from experience.)

    But that isn’t necessarily an argument for ACID and/or SQL or any one technology. It’s an argument for having a single repository (or data flow) for your source of truth. And it’s especially an argument for having tools that make migrating and transforming data safe and easy.

    (Okay, I realize that I just agreed with your main point: namely that you have to think rather than choose the most trendy technology.)

    Here’s an example from my company (www.vocalabs.com). We get data from our clients, in whatever format they choose, from XML to Excel to formatted email. They are the ultimate source of truth. We enter that data into a PostgreSQL database. That is the practical source of truth, and it’s ACID. Our online reports use a custom read-only, in-memory OLAP database generated several times an hour from PostgreSQL queries.

    The fact that our reporting database is read-only means it isn’t slowed down by locks to support ACID. And it’s more consistent than an SQL database. To generate a report page, we’re doing dozens of unrelated queries. You can’t afford to lock the highest-traffic tables in the database every time someone loads a report page, nor can you have inconsistent data within a single report page. And there isn’t a single SQL database that will give you full ACID and high performance aggregation queries at the same time. Not to mention that even the fastest database is limited by the 10ms it takes for the query to travel over the LAN. Multiply that by dozens of queries in a live report page, and you’ve just busted your response time target.

    The point is, even for a small company like Vocalabs, you need multiple databases to handle multiple use cases. You need appropriate consistency within each database, and you need to know what your source of truth is.

    And if you have data that really will outlast you, then the source of truth is some combination of a live database and its backups. Because there will be a failure every few decades. (Okay, so we’re going from “10ms is too slow” to multiple decade retention– just proof that old data really is sexy.)

    Where all of this leads is that you need good tools for safely and reliably transforming data, so you’re not scared to use the right database for each job. That’s why, when I recently started an open source OLAP project (https://github.com/dleppik/ozone/), I didn’t start with the sexy stuff (e.g. bitmaps) but with the data transformation model.

  • balder

    Thus, the only thing that keeps ‘standing’ is data (and, hopefully, the users). Hence:
    the only sexy thing is data! (bullet point 5 :-)