The recent cyberattack on Sony’s network was a CIO’s nightmare come true. The Wall Street Journal had a good summary of some of the initial findings and recommendations. One of the important points was that data integration, although a huge win for productivity, increases the exposure from a single security breach.
That started me thinking about the use of isolated security tiers in Hadoop clusters. I’m as excited as anyone by the prospect of Hadoop data lakes; in general, the more data you have available, the happier your data scientists will be. When it comes to your most sensitive data, however, it may be worth applying greater rigor.
Hadoop security has come a long way in recent releases, with better integration with Kerberos and more powerful role-based controls, but there is no substitute for the protection that comes with isolating sensitive data on a separate cluster.
But how do you do that while still allowing privileged users full access to the entire data set for analysis? Non-stop Hadoop offers an answer: you can share the HDFS namespace across the less secure and more secure clusters, and use selective replication to ensure that the sensitive data never moves into the less secure cluster. The picture below illustrates the concept.
Users on the ‘open’ cluster can only see the generally available data. Users on the ‘secure’ cluster can access all of the data.
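To make the idea concrete, here is a minimal sketch of how a selective replication policy might decide which HDFS paths are mirrored to the open cluster. This is illustrative only, not the product’s actual API; the prefix list, path layout, and function name are all hypothetical.

```python
# Hypothetical selective replication policy: paths under any "sensitive"
# prefix stay on the secure cluster only; everything else is eligible
# for replication to the open cluster.
SENSITIVE_PREFIXES = [
    "/data/pii/",        # personally identifiable information
    "/data/finance/",    # financial records
]

def replicate_to_open_cluster(hdfs_path: str) -> bool:
    """Return True if the path may be mirrored to the open cluster."""
    # Normalize so "/data/pii" and "/data/pii/" are treated alike.
    path = hdfs_path if hdfs_path.endswith("/") else hdfs_path + "/"
    return not any(path.startswith(p) for p in SENSITIVE_PREFIXES)

print(replicate_to_open_cluster("/data/clickstream/2014/12/01"))  # True
print(replicate_to_open_cluster("/data/pii/customers"))           # False
```

Because the namespace is shared, a job on the secure cluster still sees `/data/pii/customers` alongside the replicated data; a job on the open cluster simply never has a local copy of the sensitive paths.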
Feel free to get in touch if you have questions about how to add this extra layer of defense into your Hadoop infrastructure.