I was lucky to be at the Hadoop Summit in San Jose last week, splitting my time between the WANdisco booth and some of the meetups and sessions. I didn’t keep a conference diary, but here are a few quick impressions:
1. Hadoop seems to be maturing operationally. Between WANdisco’s solutions for making key parts of Hadoop failure-proof and the core improvements coming in Hadoop 2.4.0 and beyond, the community is focusing a lot of effort on uptime and resiliency.
2. Security is still an open question. Although technologies like Knox and Kerberos integration provide good gateway and authentication support, there is no standard approach for more granular authorization. This was a consistent theme in several presentations, including a case study from Booz Allen.
3. Making analytics faster and more accessible will receive a lot of attention this year. Hive 0.13 is showing dramatic performance improvements, Microsoft demonstrated accessing Hadoop data through Excel Power Query, and there are several initiatives to make R run better in the Hadoop sphere – the list goes on and on.
4. The power of active-active replication continues to surprise people. Almost everyone I talked to at our booth asked the same questions: “So this is like a standby NameNode? You have a hot backup? It’s kind of like distcp?” No, no, and no – WANdisco’s NonStop NameNode lets you run several active (fully writable) NameNodes at once, even in different data centers, as part of a unified HDFS namespace. (In case you haven’t read our product briefs: that gives you a full HA/DR solution with better performance and much better utilization of the secondary data center. Better yet, skip the product briefs and just ask us for a demo.)
5. Beach balls are a fantastic giveaway. Kudos to our marketing team. 🙂
See you at the next one!