Looking back at the past 10 years of software, the word ‘boring’ comes to mind. The buzzwords were things like ‘web services’ and ‘SOA’. CIOs loved the promise of these things, but the technologies could not deliver. The idea of build once and reuse everywhere really was the ‘nirvana’.
Well, it now seems like we can do all of that stuff.
As I’ve said before, Big Data is not a great name because it implies that all we are talking about is a big database with tons of data. Actually, that’s only part of the story. Hadoop is the new enterprise applications platform. The key word there is platform. If you could have a single general-purpose data store that could service ‘n’ applications, then the whole notion of database design is over. Think about the new breed of apps on a cell phone, the social media platforms and web search engines. Most of these do this today: data is stored in a general-purpose, non-specific data store and then used by a wide variety of applications. The new phrase for this data store is a ‘data lake’, implying a large quantity of ever-growing and changing data stored without any specific structure.
The CIOs I have talked to recently are very excited by the prospect of both amalgamating data so it can be used and bringing into play data that previously could not be used: unstructured data in a wide variety of formats, like Word documents and PDF files. This also means the barriers to entry are low. Many people believe that adopting Hadoop requires a massive re-skilling of the workforce. It does, but not in the way most people think. Actually getting the data into Hadoop is the easy bit (‘data ingestion’ is the new buzzword). It’s not like the old relational database days, where you first had to model the data using data normalization techniques and then use ETL to get the data into a usable format. With a data lake you simply set up a server cluster and load the data. Creating a data model and using ETL is simply not required.
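To make the ‘load first, decide the structure later’ idea concrete, here is a minimal sketch in plain Python. It is not Hadoop code: the list `raw_lake` stands in for the cluster, and the record layouts and the function names (`read_records`, `revenue_by_customer`, `mentions`) are all made up for illustration. The point it tries to show is that nothing is modelled at load time, and each application imposes only the structure it needs when it reads.

```python
import json

# Hypothetical "lake": raw items kept exactly as they arrived, with no upfront data model.
raw_lake = [
    '{"type": "order", "customer": "acme", "total": 120.5}',
    '{"type": "tweet", "user": "someone", "text": "Great service from Acme!"}',
    '{"type": "order", "customer": "globex", "total": 80.0}',
    'free-text note scanned from a PDF about acme',
]

def read_records(lake):
    """Interpret each raw item at read time; non-JSON items are kept as plain text."""
    for item in lake:
        try:
            yield json.loads(item)
        except json.JSONDecodeError:
            yield {"type": "text", "text": item}

def revenue_by_customer(lake):
    """Application 1 (assumed): only cares about order records."""
    totals = {}
    for rec in read_records(lake):
        if rec.get("type") == "order":
            totals[rec["customer"]] = totals.get(rec["customer"], 0.0) + rec["total"]
    return totals

def mentions(lake, name):
    """Application 2 (assumed): searches any record that carries text."""
    return [
        rec["text"]
        for rec in read_records(lake)
        if "text" in rec and name in rec["text"].lower()
    ]

print(revenue_by_customer(raw_lake))
print(mentions(raw_lake, "acme"))
```

Both functions read the same untouched store, which is the ‘one data store, n applications’ pattern in miniature. The interpretation work has not disappeared; it has moved out of the loading step and into the applications.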
The real transformation and re-skilling is in application development. Applications are moving to the data; in today’s client-server world it’s the other way around. We have seen this type of re-skilling before, such as the move from COBOL to object-oriented programming.
In the same way that client-server technology disrupted mainframe computer systems, big data will disrupt client-server. We’re already seeing this in the market today. It’s no surprise that the most successful companies in the world today (Google, Amazon, Facebook, etc.) are all actually big data companies. This isn’t a ‘might be’; it’s already happened.