You’ll encounter the word “shared” often in the computing world, and I’ve begun to think the word is sometimes a clue to finding SPOFs (Single Points of Failure). As you likely know, a SPOF is one thing that, if it breaks, brings down a whole system.
For Want of a Nail
“For Want of a Nail…” is a proverb with a rich history. It describes how the loss of a seemingly unimportant thing can snowball into monumental consequences. I’d call the nail in the proverb a SPOF for losing the Kingdom. But as Jez Humble points out in his article “On Antifragility in Systems and Organizational Architecture,” commenting on Nassim Taleb’s book Antifragile, it’s not always easy to recognize the SPOF, or to see how multiple redundant components might collectively form one. Incidentally, a good example of that might be Netflix’s 2012 Christmas Eve outage. In this case, the whole of Amazon EC2 was and remains a Netflix single point of failure.
Smoke that leads to fire
Since WANdisco’s DConE technology operates without a single point of failure, bringing cloud-like capabilities to existing applications, I’m interested in where this capability can be put to good use. When hunting product opportunities, it’s always nice to have help knowing where to look. In this case, I think the word “shared” is a red flag for SPOFs, and is the smoke that helps lead to the fire of SPOF fragility.
I didn’t have to go far for an example: the Hadoop NameNode, the subject of WANdisco’s recent AltoStor acquisition, relies on a shared edits directory, which is a single point of failure for a Hadoop deployment.
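To make the “follow the smoke” idea concrete, here is a minimal sketch of how one might scan an inventory for dependencies that supposedly redundant components have in common. The component and dependency names are hypothetical and chosen only to echo the NameNode example; a real inventory would look different, and a shared dependency is only a candidate SPOF, not proof of one.

```python
from collections import defaultdict


def shared_dependencies(deps):
    """Map each dependency used by more than one component to its users.

    `deps` maps a component name to the set of things it depends on.
    Anything that shows up under several components is "shared" and
    therefore worth a closer look as a possible SPOF.
    """
    users = defaultdict(set)
    for component, needs in deps.items():
        for need in needs:
            users[need].add(component)
    return {dep: sorted(comps) for dep, comps in users.items() if len(comps) > 1}


# Hypothetical inventory: two redundant NameNodes (names are illustrative).
deps = {
    "namenode-active": {"shared-edits-dir", "rack-a-power"},
    "namenode-standby": {"shared-edits-dir", "rack-b-power"},
}

print(shared_dependencies(deps))
```

Here the two NameNodes are redundant on paper, yet both lean on the same edits directory, which is exactly the kind of “shared” item the scan surfaces.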
What examples can you find of “shared” unmasking a SPOF that needs to be addressed?