Good grief: Spark has barely hit a 1.0 release, and already several projects are vying to improve on it and perhaps become the next big thing. I think this is another sign that Spark is here to stay – everyone is focusing on how to beat it! In fact, even the Berkeley lab that developed Spark has come up with an alternative that is supposedly a couple of orders of magnitude faster than Spark for some types of machine learning.
The bigger lesson here for CIOs and data architects is that your Hadoop infrastructure has to be flexible enough to deploy the latest and greatest tools as they arrive. Your ‘customers’ – data scientists, marketers, managers – will keep asking for faster processing times.
Of course, here at WANdisco we’ve got some of the best minds in Big Data working on exactly this problem. Our principal scientists have been working on the innards of Hadoop almost since day one, and they’re evolving our Hadoop products to support very sophisticated deployments. For instance, Non-stop Hadoop lets you run several Hadoop clusters that share the same HDFS namespace but otherwise operate independently. That means you can allocate distinct clusters (or carve off part of a cluster) to run dedicated processing pipelines that might need a different hardware or job management profile to support low-latency big data operations.
Sound interesting? It’s a fast-moving field and we’re ready to help!