You may have heard that “data is the new oil”. You might also sense a bit of marketing hoopla in this phrase. There are a few ways to define “big data” but none of them fits with this analogy. My definition comes from my experience in transforming disparate data sets into business intelligence for the purpose of enabling an organization to accomplish its tasks. The data by itself is useless. Data is the medium from which value is extracted, not the value itself. In this view it’s more accurate to say that data is the new dirt.
It is true that data needs to be transformed just like oil needs to be refined, but the key difference is that oil is a finite resource and data is infinite. The scarcity of oil determines its value. Oil in raw form has value. A barrel of crude is worth about US$60 today (I would’ve guessed much higher!) but raw data is worthless. In fact it’s less than worthless. It cuts into the bottom line.
Imagine you’re a CIO of a big organization. You’re standing in a state-of-the-art data center humming away with endless rows of sleek cabinets packed with the latest server hardware, each hosting tens of thousands of virtual machines running database server software on untold petabytes of storage. The lighting is low and the place pulses with high-tech power. It’s all very bad-ass. But as you walk down the center aisle you approach a wall where there’s a huge LED display with a seven-digit number spinning out of control – an amount representing the net cost in millions of dollars per year spent storing all this data. Your gut tightens as you fathom the volumes of data flooding the data center and the costs spinning straight to hell. Those numbers are burning into your retinas as you stare up at the LED. Your face is about to melt off like the Nazis in the climactic scene in Raiders of the Lost Ark. But you make your wisdom saving throw and recover just in time, calling HR and telling them to hire some data professionals now.
The point is it’s not the data that’s valuable; it’s truth, and these days there’s a scarcity of truth. The number of things into which oil can be refined is limited. On the other hand data can be molded into any information that is somewhere on the scale between insanely valuable and totally useless. Transforming data must be a flexible, adaptive process or its value is never realized. Somewhere in this mountain of data are facts, the rarest nuggets in the world.
Admittedly, my professional view spells big data with a little “b”. What I work with is nothing compared to the vast oceans of Big Data processed by Silicon Valley powerhouses and the internet of things. Big Data in this sense may very well be the new oil for a handful of tech companies, but anyone who has flown from Houston to Galveston knows the impact oil refineries can have on the environment. The data centers that store all this data use a lot of juice, the production of which also affects the climate. Social media also puts out a massive amount of social pollution, unchecked.
Looking at the bigger picture, “Data is the New Oil” can also imply that everyone’s data is valuable and every individual should be getting rich, too. In the future there will be stronger, well-defined data rights to support this, but for now it’s delusional and dangerous thinking, a point made in the excellent blog post, “Data isn’t the new oil, it’s the new CO2,” by Martin Tisné, managing director at Luminate, a philanthropic organization I follow. I plan on writing more about Luminate in the future, as well as Mr. Tisné’s article in the MIT Technology Review, “It’s time for a Bill of Data Rights”.