Hortonworks, a WANdisco partner and another member of the Open Data Platform, recently published a list of best practices for Hadoop infrastructure management. One of the top recommendations is configuring multiple zones in Hadoop. Having development, test, and production environments gives you a safe way to test upgrades and new applications without disturbing a production system.
One of the challenges with creating multiple similar zones is sharing data between them. Whether you’re testing backup procedures and application functionality or prototyping a new data analysis algorithm, you need to see similar data in all the zones. Otherwise, you’re not really testing in a production-like environment.
But in a large cluster, transferring terabytes of data between zones can be time-consuming, and it’s tough to tell how stale the data really is. That’s where WANdisco Fusion becomes an essential part of your operational toolkit. WANdisco Fusion provides active-active data replication between Hadoop clusters. You can use it to share part of your Hadoop data between dev/test/prod zones in real time. All of the zones can make full use of the data, although you can of course use your normal access control system to prevent updates from certain zones.
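To put the cost of periodic bulk copies in perspective, here is a rough back-of-the-envelope sketch in Python. It is not tied to any particular copy tool; the function name `bulk_copy_hours`, the 10 Gbps link, and the 70% link-efficiency figure are illustrative assumptions, so substitute numbers from your own network.

```python
def bulk_copy_hours(data_tb: float, link_gbps: float, efficiency: float = 0.7) -> float:
    """Estimate the hours needed to copy `data_tb` terabytes between zones.

    Assumptions (illustrative only): decimal units (1 TB = 1e12 bytes),
    a dedicated link of `link_gbps` gigabits per second, and a usable
    fraction `efficiency` of that link after protocol and cluster overhead.
    """
    if data_tb < 0 or link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("data_tb must be >= 0, link_gbps > 0, 0 < efficiency <= 1")
    total_bits = data_tb * 1e12 * 8               # terabytes -> bits
    usable_bits_per_sec = link_gbps * 1e9 * efficiency
    return total_bits / usable_bits_per_sec / 3600  # seconds -> hours


if __name__ == "__main__":
    # Hypothetical data set sizes copied over an assumed 10 Gbps link.
    for size_tb in (1, 10, 50):
        hours = bulk_copy_hours(size_tb, link_gbps=10)
        print(f"{size_tb:>3} TB: about {hours:.1f} hours per full copy")
```

Under these assumptions, a 50 TB copy takes roughly 16 hours, which means the test zone's data is already at least that old by the time the copy finishes. That gap is the staleness problem that continuous replication is meant to avoid.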
DevOps principles are coming to Hadoop, so contact one of our solutions architects today to see how WANdisco Fusion can help you maintain multiple zones in your Hadoop deployment.