Over the past few weeks I’ve been reviewing a number of case studies of real-world Hadoop use, including stories from name-brand companies in almost every major industry. One thing that impressed me is the number of applications that are providing operational data in near-real time, with Hadoop applications providing analysis that’s no more than an hour out of date. These aren’t just toy applications either – one case study discussed a major retailer that is analyzing pricing for more than 73 million items in response to marketing campaign effectiveness, web site trends, and even in-store customer behavior.
That’s quite a significant achievement. As recently as last year I often heard Hadoop described as an interesting technology for batch processing large volumes of data, but one for which the practical applications weren’t quite clear. It was still seen as a Silicon Valley technology in some circles.
This observation is backed up by two other trends in the Hadoop community right now. First, companies like Revolution Analytics are making great strides in making the analytical tools more familiar to data scientists, while Spark is making those tools run faster. Second, vendors (including WANdisco) are focusing on Hadoop operational robustness – high availability, better cluster utilization, security, and so on. A couple of years ago you might have planned on a few hours of cluster downtime if something went wrong, but now the expectation is clearly that Hadoop clusters will get closer to nine-nines of reliability.
If you haven’t figured out your Hadoop strategy yet, or have concerns about operational reliability, be sure to give us a call. We’ve got some serious Hadoop expertise on staff.