Open Source and Analytics

A recent InformationWeek article predicted that open source analytics tools would continue to gain ground over commercial competitors in the Big Data arena in 2014. That may seem surprising. After all, you’ve made an investment in moving some unwieldy data into Hadoop. Why not just hook up your traditional data analytics and business intelligence tools?

To see why this prediction makes sense, let’s review some of the advantages of Hadoop Big Data infrastructure:

  • Cost efficiency: Hadoop’s storage costs per terabyte are about one-fifth to one-twentieth the cost of legacy enterprise data warehouse (EDW) solutions. Once you have a Hadoop cluster up and running, scaling it out is economical.

  • Visibility: Hadoop lets you store, manage, and analyze wildly disparate data sets with no penalty. Silos that existed due to storage costs or technical incompatibility start to disappear.

  • Future proofing: Hadoop is an open platform with a vibrant community. There’s no risk of lock-in to obsolete tools and vendors.

These same reasons explain why open analysis platforms will continue to see wide adoption.

First, let’s consider cost efficiency and visibility. You’ll find that both tools and talent are more affordable and easier to find when you use open platforms, which means you’ll have a lot more people looking for the gems in your data.

Recall that one feature of Big Data is that you probably don’t know how you’re going to use all of the data you collect. In other words, you don’t know now what questions you’ll be asking next year. You need to unleash your analysts and data scientists to explore this data, and open analysis platforms have a much lower cost barrier than commercial tools. Any budding data scientist can get started without consuming scarce licenses.

What’s more, the next generation of data scientists will be trained on open platforms like R. R is gaining traction rapidly and is the key tool in a new data science MOOC offered by Johns Hopkins. Not only will recruiting be easier, but anyone on your team who needs to start working with data can acquire some basic skills easily. Visibility matters: after all, if data is stored in Hadoop and no one is there to analyze it, why bother?


Now, back to future proofing: data science is a rapidly evolving field. New tools and methods spring up almost every day. Much of that research is done and published on open platforms like R. You’ll be able to take advantage of that cutting-edge knowledge without waiting for a vendor to support it in a closed framework.

Embracing this wave of open source analytics tools will help you start to see real ROI from your Big Data investment.
