Busting Myths About Data
Written by CK Bhatia, Chief Data and Integrations Officer
"Data is the new oil." Stop me if you have heard the phrase. I wince every time I hear this overused, overblown, and overhyped statement. (No, I am not over it!) How dare anyone compare data with a mere commodity like oil.
Sure, like oil, data needs to be refined and processed into information before it has any value. Similar to the negative environmental impact of oil extraction, spills, and pollution, there can be serious consequences from the improper use of data. Security, privacy, and ethical usage of data are serious concerns. But that is where the similarities end.
Oil is a finite resource. It will eventually run out, while we are going to continue to generate exabytes of data forever. Systems and applications will continue to evolve and be replaced. However, the data generated by these systems will continue to exist.
Oil has a variety of uses, but compared to the uses of data, they are fairly limited. Countries around the world are trying to reduce their dependency on oil. We have barely begun to utilize data effectively. We are just beginning to see effective personalization by building better predictive models and natural language processing. We have yet to apply it successfully to autonomous driving, personalized medicine, and use cases that we haven’t even thought of yet! We are only going to increase our dependency on data.
The other statement that we hear all the time is, “Data is an asset.”
In my opinion, data is a liability. Think about it. Data needs to be captured, keyed into some application, stored, backed up, secured, and protected, all before you can do anything with it. We need to extract raw data (like oil), then convert and refine it (like oil) into useful information! Thus, information is the real asset.
For example, I present the number 27 as raw data. But it could mean any number of things. Until the context is provided, the raw data is useless. What does it mean? (No, stop trying to flatter me, it’s not my age).
Raw data needs an associated context (sometimes called metadata), and then it becomes useful information.
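To make that concrete, here is a minimal sketch of the same raw number becoming two different pieces of information once context is attached. The `Information` class, its `label` and `unit` fields, and both example contexts are hypothetical, made up purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Information:
    """A raw value paired with the context (metadata) that gives it meaning."""
    value: int   # the raw data
    label: str   # what the value measures (hypothetical metadata field)
    unit: str    # unit of measure (hypothetical metadata field)

    def describe(self) -> str:
        return f"{self.label}: {self.value} {self.unit}"


raw = 27  # raw data on its own: no context, so no meaning yet

# The same raw value under two hypothetical contexts
as_temperature = Information(raw, "Outdoor temperature", "degrees Celsius")
as_inventory = Information(raw, "Units in stock", "items")

print(as_temperature.describe())
print(as_inventory.describe())
```

The raw value is identical in both cases. Only the attached context differs, and that context is what turns the number into something you can act on.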
How will you use raw data to fuel innovation?
(By the way, 27 is the number of World Series titles my beloved Yankees have won!)