Why 100% Availability is Critical for Big Data Applications

Jeremy Howard, the former president of Kaggle, opined recently that few people outside the machine learning space “have yet grasped how astonishingly quickly it’s progressing.” This is no exaggeration. And as a Big Data company, WANdisco is right in the intellectual middle of this coming revolution.

No human could program an algorithm that lets a car safely drive itself; there are simply too many edge cases. Only a learning computer can do this, and no human knows the complete algorithm. The computer actually learns how to drive by watching a human do it. Imagine this being repeated across any number of activities that today we take for granted only humans can perform.

There’s a critical component of these machine learning robots that might be less flashy but is essential to their success: Big Data. In the case of the Google driverless car, an extremely detailed recreation of the world is built first. The car then needs only to detect the differences between what’s actually happening and its internal model. That’s where Big Data comes in: petabytes of data about the world, and the ability to merge them with a stream of incoming data in real time, make this miracle work.
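The “only see the difference” idea can be illustrated with a toy model. The sketch below is purely hypothetical and is not how Google’s car works: it assumes the world model and a live observation are both just sets of occupied grid cells, and the function name `map_differences` is invented for illustration. Set differences then give what is new in the scene and what the model expected but did not see.

```python
# Toy illustration only: the prior map and a live observation are modeled
# as sets of occupied (x, y) grid cells. Real systems are far richer.
Cell = tuple[int, int]

def map_differences(prior: set[Cell], observed: set[Cell]) -> tuple[set[Cell], set[Cell]]:
    """Return (new_obstacles, missing_features) relative to the prior map."""
    new_obstacles = observed - prior   # seen now, absent from the stored model
    missing = prior - observed         # expected from the model, not seen now
    return new_obstacles, missing

if __name__ == "__main__":
    prior_map = {(0, 0), (0, 1), (5, 5)}           # e.g. curb and sign positions
    live_scan = {(0, 0), (0, 1), (5, 5), (2, 3)}   # same scene plus a pedestrian
    print(map_differences(prior_map, live_scan))
```

Because the static world is already stored, only the small difference set has to be reasoned about in real time; the large prior map is what makes that cheap.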

That’s also where these systems take a sharp turn from many computing systems of the past: this Big Data must always be available and working. Clearly, a system that drives a car must be available more than 99.99% of the time. An uptime of 99.99% would mean roughly 8.6 seconds of failure for every 24 hours of driving, nowhere near acceptable.
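The downtime figure follows from simple arithmetic: a day has 86,400 seconds, and 99.99% uptime leaves 0.01% of them, or 8.64 seconds, for failure. A minimal sketch of that calculation, where `downtime_seconds` is a hypothetical helper name:

```python
# Downtime permitted per period by a given availability target.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in 24 hours

def downtime_seconds(availability: float, period_s: float = SECONDS_PER_DAY) -> float:
    """Seconds of failure allowed in period_s when the system is up `availability` of the time."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be in [0, 1]")
    return (1.0 - availability) * period_s

if __name__ == "__main__":
    # Each additional "nine" cuts the daily downtime budget by a factor of ten.
    for target in (0.99, 0.999, 0.9999, 0.99999):
        print(f"{target:.3%} uptime -> {downtime_seconds(target):.2f} s down per 24 h")
```

Even five nines still leaves under a second of failure every day of driving, which is why a safety-critical system cannot treat availability as a percentage to be traded off.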

Of course, computers have been critical components in cars for many years. But there’s a big difference between those computers and today’s machine learning, Big Data driverless car. Unlike an embedded system that is self-contained in a controlled environment, today’s Big Data technology must work in the high-failure environment of distributed systems.

As Leslie Lamport, the inventor of Paxos, put it:

“A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.”
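Lamport’s quip can be put in numbers. If a service needs every one of n machines to be up at once, and each machine is independently available a fraction a of the time, the service is available only a^n of the time. The sketch below assumes independent, identical failures (a simplification), and `series_availability` is a hypothetical name:

```python
def series_availability(node_availability: float, n_nodes: int) -> float:
    """Availability of a service that needs all n_nodes up at the same time.

    Assumes every node fails independently with the same availability.
    """
    if n_nodes < 1:
        raise ValueError("need at least one node")
    if not 0.0 <= node_availability <= 1.0:
        raise ValueError("availability must be in [0, 1]")
    return node_availability ** n_nodes

if __name__ == "__main__":
    # Every extra hidden dependency multiplies in another chance to fail.
    for n in (1, 10, 100):
        print(f"{n:>3} nodes at 99.9% each -> {series_availability(0.999, n):.4%}")
```

The more machines a request silently depends on, the lower its availability, even when each machine on its own looks highly reliable.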

Given this challenging environment, how does one obtain the kind of guaranteed availability that’s required for critical functions such as driving a car? WANdisco’s core WAN-capable Paxos technology is the answer, removing single points of failure from existing technology and providing seamless redundancy with 100% data safety in high-failure environments.
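Why removing a single point of failure helps can be seen with a textbook model of majority quorums, the general approach of Paxos-style replication. This is not a description of WANdisco’s implementation. It assumes n independent replicas, each up with probability a, and counts the system as available whenever a strict majority is up, since a majority is what such protocols need to keep making progress. The name `quorum_availability` is hypothetical.

```python
from math import comb

def quorum_availability(node_availability: float, n_replicas: int) -> float:
    """Probability that a strict majority of n_replicas is up.

    Assumes independent, identically distributed replica failures; under a
    majority-quorum protocol the system can make progress while this holds.
    """
    if n_replicas < 1:
        raise ValueError("need at least one replica")
    a = node_availability
    majority = n_replicas // 2 + 1
    # Sum the binomial probabilities of having k replicas up, for every k >= majority.
    return sum(
        comb(n_replicas, k) * a**k * (1 - a) ** (n_replicas - k)
        for k in range(majority, n_replicas + 1)
    )

if __name__ == "__main__":
    for n in (1, 3, 5):
        print(f"{n} replica(s) at 99% each -> {quorum_availability(0.99, n):.6%}")
```

With one replica the result is simply 99%; with three replicas at 99% each it rises to about 99.97%, because any single replica can fail without stopping the system.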

So while the future promises a Big Data driven revolution of new capabilities, those capabilities rely on systems that must always work. That’s Why WANdisco.
