Following up on the recent blog post about Hadoop Summit 2014, I wanted to share an update on the state of consensus-based replication (CBR) for HBase. As some of our readers might know, we are developing this technology directly in the Apache HBase project. As you may also know, we are big fans and proponents of strong consistency in distributed systems; however, I think the phrase “strong consistency” is a made-up tautology, since anything weaker should not be called “consistency” at all.
When we first looked into availability in HBase, we noticed that it relies heavily on the ZooKeeper (ZK) layer. There’s nothing wrong with ZK per se, but the current implementation makes ZK an integral part of the HBase source code. This makes sense from a historical perspective, since for most of HBase’s lifespan ZK has been virtually the only technology providing shared memory storage and distributed coordination capabilities. Jini, developed back in the day by Bill Joy, is worth mentioning in this regard, but I digress and will leave that discussion for another time.
The idea behind CBR is pretty simple: instead of trying to guarantee that all replicas of a node are synchronized after an operation has already been applied (post factum), such a system coordinates the intent of an operation before it is applied. If consensus on the feasibility of the operation is reached, each node applies it independently. If consensus is not reached, the operation simply doesn’t happen. That’s pretty much the whole philosophy.
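To make the idea concrete, here is a toy, single-process Java sketch. The names `Op`, `Replica`, and `ToyCoordinationEngine` are hypothetical and do not come from HBase. The "agreement" step is a deliberately simplified stand-in (a strict majority of replicas must be reachable, and none of them may find the operation infeasible) rather than a real protocol such as Paxos, and it ignores failures in the middle of a round. What it does show is the core of CBR: an operation is agreed upon first, appended to a single ordered log, and only then applied by every replica on its own; an operation that fails to reach agreement is never applied anywhere.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical operation: add `delta` to the value stored under `key`.
final class Op {
    final String key;
    final long delta;
    Op(String key, long delta) { this.key = key; this.delta = delta; }
}

// Hypothetical replica: a deterministic state machine fed only by agreed operations.
final class Replica {
    boolean up = true;                                // simulated reachability
    final Map<String, Long> state = new HashMap<>();
    int applied = 0;                                  // how many agreed ops this replica has applied

    // Apply, in global order, every agreed op this replica has not applied yet.
    void catchUp(List<Op> agreed) {
        while (applied < agreed.size()) {
            Op op = agreed.get(applied++);
            state.merge(op.key, op.delta, Long::sum);
        }
    }
}

// Toy coordination engine (NOT Paxos): an op is agreed only if a strict majority of
// replicas is reachable and no reachable replica rejects it as infeasible.
final class ToyCoordinationEngine {
    final List<Replica> replicas;
    final List<Op> agreed = new ArrayList<>();        // single, globally ordered log of agreed ops

    ToyCoordinationEngine(List<Replica> replicas) { this.replicas = replicas; }

    // Coordinate the *intent* first; returns false if the op must not happen at all.
    boolean propose(Op op) {
        int votes = 0;
        for (Replica r : replicas) {
            if (!r.up) continue;                      // unreachable replicas cannot vote
            r.catchUp(agreed);                        // vote against an up-to-date state
            long current = r.state.getOrDefault(op.key, 0L);
            if (current + op.delta < 0) return false; // infeasible: value would go negative
            votes++;
        }
        if (votes * 2 <= replicas.size()) return false; // no majority: nothing is applied
        agreed.add(op);
        // Agreement reached: each reachable replica applies the op independently.
        for (Replica r : replicas) if (r.up) r.catchUp(agreed);
        return true;
    }
}
```

Note that a replica that was unreachable is not patched up after the fact: once it rejoins, it replays the agreed operations it missed, in the same order, and converges to the same state as its peers.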
Now, the details are more intricate, of course. We think that CBR is beneficial for any distributed system that requires strong consistency (learn more on the topic from the recent Michael Stonebraker interview on Software Engineering Radio). In the Hadoop ecosystem, this means that HDFS, HBase, and possibly other components can benefit from a common API for expressing coordination semantics. Such an approach will help accommodate a variety of coordination engine (CE) implementations specifically tuned for network throughput, performance, or low latency. Introducing this concept to HBase is somewhat more challenging, however, because unlike HDFS, HBase doesn’t have a single high-availability (HA) architecture: the HMaster fail-over process relies solely on ZK, whereas HRegionServer recovery additionally depends on write-ahead log (WAL) splitting. Hence, before any meaningful progress on CBR can be made, we need to abstract most, if not all, concrete implementations of ZK-based functionality behind a well-defined set of interfaces. This will make it possible to plug in alternative CE implementations as the community sees fit.
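The refactoring described above boils down to programming HBase against interfaces rather than against ZooKeeper directly. The following Java sketch illustrates the pattern for HMaster fail-over only. `MasterElection`, `InMemoryMasterElection`, `CoordinationEngines`, and the engine name `"in-memory"` are all hypothetical; they are not the interfaces actually proposed in the HBase JIRA tickets. A ZooKeeper-backed or consensus-based implementation would sit behind the same interface, and a comparable abstraction would be needed for WAL-splitting coordination.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical abstraction for HMaster fail-over. HBase code would call this interface
// instead of manipulating ZooKeeper znodes directly.
interface MasterElection {
    // Try to become the active master; returns true if this server now holds the role.
    boolean tryBecomeActive(String serverName);
    // Name of the current active master, if any.
    Optional<String> activeMaster();
    // Give up the active role (e.g. on shutdown) so a backup master can take over.
    void resign(String serverName);
}

// In-process implementation, handy for unit tests. A ZooKeeper-backed variant might use
// an ephemeral znode; a consensus-based variant might agree on the leader through a CE.
// Neither is shown here.
final class InMemoryMasterElection implements MasterElection {
    private String active; // null while no master holds the role

    @Override public synchronized boolean tryBecomeActive(String serverName) {
        if (active == null) active = serverName;
        return serverName.equals(active);
    }

    @Override public synchronized Optional<String> activeMaster() {
        return Optional.ofNullable(active);
    }

    @Override public synchronized void resign(String serverName) {
        if (serverName.equals(active)) active = null;
    }
}

// Hypothetical registry mapping a configured engine name to an implementation,
// so the choice of coordination engine becomes a deployment decision.
final class CoordinationEngines {
    private static final Map<String, Supplier<MasterElection>> REGISTRY = new HashMap<>();
    static { REGISTRY.put("in-memory", InMemoryMasterElection::new); }

    static void register(String name, Supplier<MasterElection> factory) {
        REGISTRY.put(name, factory);
    }

    static MasterElection create(String name) {
        Supplier<MasterElection> factory = REGISTRY.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("unknown coordination engine: " + name);
        }
        return factory.get();
    }
}
```

Keeping the lookup in one place means callers such as master start-up code never name a concrete engine, which is what lets alternative CEs be swapped in without touching the call sites.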
Below you can find the slides from my recent talk at the HBase Birds of a Feather (BoF) session during Hadoop Summit, which covers the current state of development. References [2-5] will lead you directly to the ASF JIRA tickets that track the project’s progress.
- HBase Consensus BOF 2014
- Michael Stonebraker on distributed and parallel DBs