I left Hadoop Summit last month very excited to see the traction the market is gaining. The number of Hadoop vendors, practitioners and customers continues to grow, and knowledge of the technology continues to deepen.
One of the key topics of discussion on the trade show floor was the design of the namenode and its limitations.
In Apache Hadoop 1.x, the namenode was a single point of failure (SPoF).
This SPoF has become such a significant issue that the community has accelerated its efforts to mitigate this earlier design choice. While the community has started to develop a solution that improves on earlier attempts, the overall system is still what we call an active-passive implementation.
Active-passive solutions have been around for many years and were designed to provide recovery where disruption of services was resolved at a different layer of the stack. For example, active-passive security solutions like firewalls have traditionally been deployed with a primary and a standby unit. In the event the primary failed, the standby would take over and client communications (TCP) would retry and retransmit until a connection could be established. With services like HTTP, these active-passive solutions are sufficient and widely deployed.
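To make the failover behaviour concrete, here is a minimal Python sketch of an active-passive pair with a retrying client. It is an illustration only: the class names, the `takeover_attempts` threshold and the `send_with_retry` helper are all hypothetical and do not model any real firewall product or TCP stack.

```python
import time


class Unit:
    """A hypothetical server or firewall unit that is either up or down."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up

    def handle(self, request):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} served {request}"


class ActivePassivePair:
    """Only one unit serves traffic; the other waits to take over."""

    def __init__(self, primary, standby, takeover_attempts=2):
        self.active = primary
        self.standby = standby
        # Assumption: the standby is promoted after this many failed requests.
        self.takeover_attempts = takeover_attempts
        self._failures = 0

    def handle(self, request):
        try:
            return self.active.handle(request)
        except ConnectionError:
            self._failures += 1
            if self._failures >= self.takeover_attempts:
                # Failover: the standby becomes the active unit.
                self.active, self.standby = self.standby, self.active
                self._failures = 0
            # The request that hit the outage is still lost; the client must retry.
            raise


def send_with_retry(pair, request, max_attempts=5, delay=0.0):
    """Client side: keep retrying until some unit answers."""
    for attempt in range(1, max_attempts + 1):
        try:
            return pair.handle(request), attempt
        except ConnectionError:
            time.sleep(delay)
    raise ConnectionError("no unit became available")


pair = ActivePassivePair(Unit("primary", up=False), Unit("standby"))
reply, attempts = send_with_retry(pair, "GET /")
print(reply, "after", attempts, "attempts")
```

The point of the sketch is that the client sees failed requests during the takeover window and only recovers because it retries, which is the disruption being absorbed at a different layer of the stack.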
However, when we start to discuss components of an architecture that are key to availability and access, a new term starts to emerge: Continuous Availability.
In the past, active-passive solutions were enough to deliver what the industry accepted as “Highly Available”. However, today’s architectures and technologies have evolved and created a new need, which is being described as “Continuously Available”.
One thing we found ourselves explaining was the difference between our 100% Continuous Availability™ solution for HDFS and the design changes being implemented in the Apache Hadoop 2.x branch. As you can see from the Apache documentation, the new Quorum Journal Manager is an active-passive solution.
“This guide discusses how to configure and use HDFS HA using the Quorum Journal Manager (QJM) to share edit logs between the Active and Standby NameNodes.” REF: http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html
This is the main difference between open-source HA and the technology developed by WANdisco. WANdisco’s Non-Stop NameNode is an Active-Active solution built using a widely accepted family of protocols, known as Paxos, for solving consensus in a network of unreliable processors.
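For readers unfamiliar with Paxos, the following is a minimal sketch of textbook single-decree Paxos in Python. It is not DConE and says nothing about how WANdisco's product is implemented; the `Acceptor` class, the `propose` function and the "edit-A"/"edit-B" values are hypothetical names chosen for illustration, and message loss is simulated simply by listing which acceptors a proposer can reach.

```python
class Acceptor:
    """One voting node; it remembers what it has promised and accepted."""

    def __init__(self):
        self.promised = -1           # highest proposal number promised so far
        self.accepted_n = -1         # number of the proposal accepted, if any
        self.accepted_value = None

    def prepare(self, n):
        """Phase 1: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted_n, self.accepted_value
        return False, None, None

    def accept(self, n, value):
        """Phase 2: accept unless a higher-numbered promise was made."""
        if n >= self.promised:
            self.promised = n
            self.accepted_n = n
            self.accepted_value = value
            return True
        return False


def propose(cluster, reachable, n, value):
    """Run one proposal round; return the chosen value, or None on failure.

    cluster   -- all acceptors (defines the majority size)
    reachable -- indices of acceptors this proposer can currently contact
    """
    majority = len(cluster) // 2 + 1
    nodes = [cluster[i] for i in reachable]

    # Phase 1: collect promises from the reachable acceptors.
    replies = [a.prepare(n) for a in nodes]
    granted = [(an, av) for ok, an, av in replies if ok]
    if len(granted) < majority:
        return None

    # If any acceptor already accepted a value, the proposer must adopt the
    # one with the highest proposal number instead of its own value.
    prev_n, prev_value = max(granted, key=lambda p: p[0])
    if prev_n >= 0:
        value = prev_value

    # Phase 2: ask the same acceptors to accept the value.
    accepts = sum(a.accept(n, value) for a in nodes)
    return value if accepts >= majority else None


cluster = [Acceptor() for _ in range(3)]
first = propose(cluster, [0, 1], n=1, value="edit-A")   # node 2 unreachable
second = propose(cluster, [1, 2], n=2, value="edit-B")  # node 0 unreachable
print(first, second)
```

In this run both proposers end up with the same value even though each could reach only two of the three nodes, and the second proposer abandons its own value in favour of the one already chosen. That agreement among equal peers, with no designated standby, is the property an active-active design relies on.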
WANdisco’s implementation, known as DConE, is the core IP used to ensure 100% uptime of the Apache Hadoop 2.0 namenode processes. It therefore provides continuous availability of, and access to, HDFS in critical infrastructures during both planned and unplanned outages.
After spending two days speaking to many attendees, it became very clear to me that WANdisco’s DConE technology, and the way it has been applied to the Apache Hadoop namenode, is no trivial achievement.
We had the opportunity to talk with representatives from some of the largest Web 2.0 companies in Silicon Valley, including Yahoo, LinkedIn, Facebook and eBay. Being able to demonstrate our active-active solution to key industry technologists, architects and Hadoop developers was the highlight for our team, and we are excited about the official release of our WAN solution.