In general a trade show is a dangerous place to gauge sentiment. Full of marketing and sales, backslapping and handshakes, and marketecture rather than architecture, the world is indeed viewed through rose-tinted spectacles. Still, Strata, the Hadoop big data conference in New York last week, was very interesting, albeit through my rose-tinted spectacles.
Firstly, the sheer volume of people, over 3,500, is telling. This show used to be a few hundred attendees, primarily techies inventing the future. The show is now bigger, much bigger. A cursory glance at the exhibit hall revealed a mix of the biggest tech companies and hot start-ups. The keynotes, to the disappointment of those original techies, were primarily press-driven product releases lacking real technical substance. This is not such a bad thing, though. It’s a sign that Hadoop is coming of age. It’s what happens when technology moves into the mainstream.
Second, the agenda has changed quite dramatically. Companies are no longer trying to figure out how Hadoop might fit into their data centers; they are trying to figure out how to deploy it. 2014 will indeed be the end of trials and the beginning of full-scale enterprise roll-out. The use cases are all over the place. Analysts yearn for clues and clusters to explain this: “Are you seeing mainly telcos or financial services?” Analysts, of course, must try to enumerate in order to explain, but the shift is seismic, and the only explanation is a fundamental change in the very nature of enterprise applications.
My third theme is the discussion around why Hadoop is driving this move to rewrite enterprise applications. As someone at the show told me, “the average age of an enterprise application is 19 years.” Hence, this is part of a classic business cycle. Hadoop is a major technological shift that takes advantage of dramatic changes in the capabilities and economics of hardware. Expensive spinning disks, slow processors, and limited bandwidth and networks were the constraints, and therefore the assumptions, that the last generation of enterprise applications had to deal with. Commodity hardware and massive in-memory processing are the new assumptions that Hadoop takes advantage of. In a few years we will not be talking about “Big Data”; we will simply say “Data”, because it will no longer be unusual for data to be so large in relative terms.
My fourth observation was that Hadoop 2 has changed the agenda for the types of use case. In very rough terms, Hadoop 1 was primarily about storage and batch processing. Hadoop 2 is about YARN and run-time applications. In other words, processing can now take place on top of Hadoop, rather than storing data in Hadoop but processing it somewhere else. This change is highly disruptive because software vendors can no longer rely on customers merely using their products in conjunction with Hadoop; customers are now talking about building on top of Hadoop. To them Hadoop is a new type of operating system. This disruption is very good news for the new breed of companies building pure applications from the ground up, and really bad news for those who believe they can loosely integrate, or even store data in two places. That’s not going to happen. Some of the traditional companies had a token presence at Strata that suggests they are still unsure of exactly what they are going to do: they are neither fully embracing nor ignoring this new trend.
My final observation is about confusion. There’s a lot of money at stake here, so naturally everyone wants a piece of the action. There are a lot of flashing lights and noise from vendors, lavish claims, and a lack of substance. Forking core open source is nearly always a disaster. As open-source guru Karl Fogel observes, forks happen because of irreconcilable disagreements, whether technical or interpersonal, and are something developers should fear and try hard to avoid. Forking creates natural barriers to using third-party products, and with an open-source project moving as quickly as this one, you have to stay super-close to the de facto open-source project.
A forked version of core Hadoop is not Hadoop; it’s something else. If customers go down a forked path it’s difficult to get back, and they will lose competitive edge because they will be unable to use the ecosystem of products being built by the wider community. Customers should think of Hadoop like an operating system or a database: if it’s merely embedded and heavily modified, then it’s not Hadoop.
So 2014 it is, then. As the Wall Street Journal headline put it: “Elephant in the Room to Weigh on Growth for Oracle, Teradata”.