Solving the 3 biggest Hadoop challenges

A colleague recently pointed me to this great article on the 3 biggest Hadoop challenges, written by Sean Suchter, the CEO of Pepperdata. It offers a practical perspective on how these challenges show up in the field and how they are typically managed through workarounds.

Ultimately, none of those workarounds is very satisfying. Fortunately, Non-Stop Hadoop offers a compelling way to solve these challenges, in whole or in part.

Resource contention due to mixed workloads and multi-tenant environments

This problem seems to be the biggest driver of day-to-day Hadoop pain. Of the many workarounds Suchter discusses, all seem either manually intensive (hand-tuning Hadoop parameters for better performance) or limiting from a business perspective (gating production jobs or designing workflows around the bottlenecks).

As I’ve written before, the concept of a logical data lake with a unified HDFS namespace largely overcomes this challenge. Non-Stop Hadoop lets you set up multiple clusters at one or several locations, all sharing the same data – unless you choose to restrict the sharing through selective replication. Now you can run jobs on the most appropriate cluster (e.g. using high-memory nodes for in-memory processing) and avoid the worst of the resource contention.
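
To make that concrete, here is a minimal sketch of what job routing can look like when every cluster sees the same namespace. The hostnames, paths, and job are invented for illustration, and Non-Stop Hadoop's actual configuration may differ; the point is that only the ResourceManager address changes, never the data paths.

```java
// Minimal sketch (hypothetical hostnames/paths): route a memory-hungry job
// to a high-memory cluster while reading and writing the shared namespace.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ClusterRouting {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Every cluster sees the same logical namespace, so input and
        // output paths are identical no matter where the job runs.
        conf.set("fs.defaultFS", "hdfs://shared-namespace");
        // The only routing decision: which cluster's ResourceManager
        // receives the job (a hypothetical high-memory cluster here).
        conf.set("mapreduce.framework.name", "yarn");
        conf.set("yarn.resourcemanager.address", "rm.high-mem.example.com:8032");

        Job job = Job.getInstance(conf, "in-memory-analytics");
        // ... set mapper, reducer, and key/value classes as usual ...
        FileInputFormat.addInputPath(job, new Path("/data/shared/events"));
        FileOutputFormat.setOutputPath(job, new Path("/data/shared/events-agg"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```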

Difficult troubleshooting

We all know the feeling of being under the gun while an important production system is offline. While Hadoop's troubleshooting tools will surely mature in the coming years, Non-Stop Hadoop gives you built-in redundancy today, so you can diagnose a problem without racing against an outage. Lose a NameNode? You've got 8 more. The whole cluster is shot? You've got two others that can fill the gap immediately.
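
Non-Stop Hadoop handles that failover transparently, so the following is only an illustration of the concept rather than its API: a client that probes a list of redundant clusters (invented hostnames) and reads the same logical path from the first one that answers.

```java
// Illustrative sketch only (hypothetical hostnames): with redundant
// clusters sharing the same data, a client can fall back to the next
// cluster when one is unreachable.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FallbackRead {
    static FileSystem firstReachable(String... uris) {
        for (String uri : uris) {
            try {
                FileSystem fs = FileSystem.get(URI.create(uri), new Configuration());
                fs.getStatus(); // cheap liveness probe; throws if unreachable
                return fs;
            } catch (Exception e) {
                System.err.println("Unreachable, trying next cluster: " + uri);
            }
        }
        throw new IllegalStateException("No cluster reachable");
    }

    public static void main(String[] args) throws Exception {
        FileSystem fs = firstReachable(
                "hdfs://cluster-a.example.com:8020",
                "hdfs://cluster-b.example.com:8020",
                "hdfs://cluster-c.example.com:8020");
        // The same logical path exists on every replica.
        System.out.println("Found: " + fs.exists(new Path("/data/shared/events")));
    }
}
```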

Inefficient use of hardware

It's a genuinely tough problem: you need enough hardware to handle peak bursts of activity, but much of it sits idle during off-peak hours. Non-Stop Hadoop gives you a clever solution: put your backup cluster to work. With Non-Stop Hadoop, the backup cluster is effectively just an extension of the primary cluster. Point some jobs at the second cluster during periods of peak workload and you get easy load balancing, as in the sketch below.
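
Here is one way that load balancing could be scripted, assuming two ResourceManagers with invented hostnames; this is a sketch of the idea, not a Non-Stop Hadoop API. It asks each cluster how many applications are running and points the next job at the quieter one.

```java
// Sketch (hypothetical hostnames): pick whichever cluster is running
// fewer applications and direct the next job there.
import java.util.EnumSet;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class PeakRouter {
    static int runningApps(String rmAddress) throws Exception {
        YarnConfiguration conf = new YarnConfiguration();
        conf.set("yarn.resourcemanager.address", rmAddress);
        YarnClient client = YarnClient.createYarnClient();
        client.init(conf);
        client.start();
        try {
            // A crude load signal: the count of currently running applications.
            return client.getApplications(EnumSet.of(YarnApplicationState.RUNNING)).size();
        } finally {
            client.stop();
        }
    }

    public static void main(String[] args) throws Exception {
        String primary = "rm.primary.example.com:8032";
        String backup = "rm.backup.example.com:8032";
        String target = runningApps(primary) <= runningApps(backup) ? primary : backup;
        System.out.println("Submit the next job to: " + target);
    }
}
```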

To borrow an analogy from the electric power industry, do you want to maintain expensive and inefficient peaker units for the two hours when the air-conditioning load is straining the grid? Or do you want to invest in distributed power setups like solar, wind, and neighborhood generation?

A better Hadoop

Non-Stop Hadoop is Hadoop…just better. Let’s solve your problems together.
