Top 5 lists are always fun, and here’s another top 5 list on Hadoop. It’s fairly familiar to anyone who follows the space, but it does highlight a few important trends. A few comments and quibbles:
- The fact that open source is the foundation of Big Data software shouldn’t be surprising even to the government anymore. After all, even the secretive NSA has publicly acknowledged use of Hadoop.
- The only controversial claim is that Hadoop is set to replace Enterprise Data Warehouses (EDWs). I’ve heard a lot of arguments for and against that point over the last year. It seems that Hadoop will at least complement EDWs and allow them to be used more efficiently, but complete replacement will depend on Hadoop maturing in a couple of key areas. First, it will have to handle low-latency queries more efficiently. Second, it will have to be as reliable and flexible as mature EDWs. Keep an eye on projects like Apache Spark and, of course, Non-stop Hadoop in this area.
- I agree that the Internet of Things (IoT) will be a new and important source of data for Hadoop in the future. However, just a point of terminology: no one will “embed Hadoop” into small devices. Rather, data from these devices will be streamed into Hadoop.
- Siri and the other smart assistants like Cortana are making waves, but IBM’s Watson seems to be years ahead in terms of analyzing complex unstructured situations. Watson does use Hadoop for distributed processing, but it follows a very different paradigm from traditional MapReduce processing, and it needs to keep a good chunk of its data in RAM. That’s another sign that the brightest future for Hadoop will require new and exciting analytics frameworks.