Gartner just released a new research note comparing Hadoop distributions. Although the note itself is behind a paywall, some of the key findings are posted openly. I find it interesting that when Gartner shares its thoughts on Hadoop architecture and distributions, it tends to focus much more on the big picture of how to design the best Hadoop deployment for your business.
The item that stood out most was the finding that Hadoop is becoming the default cluster management solution. YARN changed the focus of Hadoop from a batch-processing system to a general-purpose platform for large-scale data management and computation. The Hadoop ecosystem is evolving so quickly that it can be frightening, but you do get some ‘future proofing’ as well: whenever the next big thing comes along, chances are it will run on Hadoop, just like Spark does.
On a related note, Gartner also recommends focusing on your ideal architecture rather than on the nuts and bolts of any particular distribution. That’s just good sense: if you know what you want to do with your data, chances are Hadoop is now mature enough to accommodate you. And of course, WANdisco provides some clever solutions to help all of those Hadoop clusters work better together.
Anyway, the research note is a nice read, particularly if you’re feeling overwhelmed by how complicated Hadoop is getting.