Metadata: Big Data's Secret Superpower

When I heard the President of the United States repeatedly saying the word “metadata” in a speech recently, I realized just how seamlessly the term had made it into our common vernacular. As someone who has long worked with SCM (Software Configuration Management) technology, however, I have had metadata as a primary focus for over a decade. That’s because SCM is about the creation, management, querying and archival of metadata about the changes in a software codebase.

What is metadata? I describe it as “data about data”. In SCM, we talk of “integration credit” for a merge. This credit is data that’s created and stored about the merge of a file from one branch into another, a common software engineering task. I have likely performed thousands of merges during my career in software development, and each of them has created a piece of data recording my actions.

Note that this metadata does not store any of the content of the merged files. That’s what makes it “data about data”, i.e., metadata. Some industries even consider their SCM metadata a trade secret because it reveals the methods used to build something.
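
To make “integration credit” concrete, here is a minimal sketch in Python. It is not how any particular SCM product stores its metadata. The `IntegrationCredit` class, its fields, and both helper functions are hypothetical names chosen for illustration, and the revision comparison is a deliberate simplification.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class IntegrationCredit:
    """Hypothetical record of one merge. Note: no file content is stored."""
    file_path: str
    source_branch: str
    target_branch: str
    source_revision: int
    author: str
    merged_at: datetime


def record_merge(file_path, source_branch, target_branch, source_revision, author):
    """Create the metadata record for a merge that has just been performed."""
    return IntegrationCredit(
        file_path=file_path,
        source_branch=source_branch,
        target_branch=target_branch,
        source_revision=source_revision,
        author=author,
        merged_at=datetime.now(timezone.utc),
    )


def already_integrated(history, file_path, source_branch, target_branch, source_revision):
    """Return True if a recorded merge already carried this revision across.

    Simplifying assumption: revisions are increasing integers, so a merge of
    revision N is treated as covering every earlier revision of the same file.
    """
    return any(
        credit.file_path == file_path
        and credit.source_branch == source_branch
        and credit.target_branch == target_branch
        and credit.source_revision >= source_revision
        for credit in history
    )
```

Everything the record holds describes the merge: who did it, when, and between which branches. None of it is the merged file itself, yet a query such as `already_integrated` can still answer a useful question from the metadata alone.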

Enter Big Data. “Big Data” is “big” not just in terms of size-on-disk, but also in the massive breadth of data that’s typically accumulated from many different sources and made available in a single database.

It’s precisely this breadth of data that ignites Big Data’s secret superpower of metadata. That’s partly because metadata is often linked to the intent of an action, and partly because interpreting the raw data itself often requires specialized knowledge. Linking together metadata from a broad range of sources can reveal connections that would otherwise stay hidden, powering the Holy Grail that is Predictive Analytics.
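
As a rough illustration of what linking metadata across sources can look like, the sketch below groups records from several sources by a shared key and reports the keys that show up in more than one source. The function names, the `issue` key, and the sample records are all invented for this example. A real Big Data pipeline would do this at a very different scale and with different tools.

```python
from collections import defaultdict


def link_by_key(key, **sources):
    """Group metadata records from several named sources by a shared key.

    Returns a mapping: key value -> source name -> list of records.
    Records that lack the key are skipped.
    """
    linked = defaultdict(lambda: defaultdict(list))
    for source_name, records in sources.items():
        for record in records:
            if key in record:
                linked[record[key]][source_name].append(record)
    return linked


def connections(linked, min_sources=2):
    """Return, sorted, the key values seen in at least `min_sources` sources."""
    return sorted(
        value for value, by_source in linked.items() if len(by_source) >= min_sources
    )


# Made-up sample metadata; none of these records contain the underlying content.
merges = [
    {"issue": "BUG-1", "author": "alice", "target_branch": "main"},
    {"issue": "BUG-2", "author": "bob", "target_branch": "release"},
]
tickets = [
    {"issue": "BUG-1", "customer": "acme", "severity": "high"},
]
builds = [
    {"issue": "BUG-3", "status": "failed"},
]

linked = link_by_key("issue", merges=merges, tickets=tickets, builds=builds)
shared = connections(linked)
```

Here `shared` contains only the issue that appears in both the merge records and the ticket records, a connection that neither source would reveal on its own.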

We’re still at the beginning of Big Data’s disruption of all manner of markets and systems. Learning which data sources are valuable, discovering sources of data within a system and exporting them in real time, and finding the new kinds of questions we can ask and have answered by Big Data are all works in progress.

Here at WANdisco, we are hard at work paving the road ahead for the new paradigms and promise of Big Data. So we’re always interested in challenges faced by our present and future customers. What are your plans for using Big Data in your business?

