In Part 1 of this three-part blog I’ll examine the “Data is the new oil” analogy. While the analogy holds in many respects, there are some fascinating differences. This post addresses the difference in how oil and data are created and what it means for the evolution of the data market.
Many countries won the geological lottery and are blessed with tremendous amounts of oil. Others are less fortunate, and the transfer of oil from the former to the latter has been the basis of much economic activity, and many wars, over the last 100 years.
But unlike oil, data is not a game of chance set in motion millions of years ago. Data is generated by everything, and the ability to generate and capture it, rather than hoping to find it, is a key difference between data and oil.
With oil, it’s estimated that 80% of the world’s reserves are owned by governments or government-controlled entities. This means that oil becomes an instrument of foreign policy as well as social development. Periodic restrictions on the flow of oil from exporting to importing nations influenced how the world developed. So, following the oil analogy, could governments or large companies turn off data spigots to influence others?
In the near term, perhaps, but longer term I think not. Data is far more decentralized than oil and, unlike oil, can be created to address one’s needs. That means companies developing business models around data face less risk than companies that based their business models around oil.
For example, consider the airline industry. The availability of a lightweight energy source enabled flight but, as airlines’ wild swings in profitability have shown, airlines continue to be at the mercy of oil prices, which were often set by a handful of entities colluding or by one state acting in a disruptive manner. Even to this day, airlines’ options are fairly limited should someone decide to squeeze the oil supply.
However, airlines generate their own data, and passengers volunteer much more for free through loyalty programs. And should an airline decide new data elements are important, it can formulate new ways to generate and capture that data. Today, data is as important to an airline as oil, but it’s less risky because, unlike oil, the airline controls much of its data creation. And for data it does not create internally, there are a variety of markets to source it from, making oil-style collusion harder.
Another implication of oil being found, not created, concerns the importance of freshness. It takes approximately 90 days from oil pumping to consumption. But apart from the working capital tied up, that is not a significant issue: oil spent millions of years forming, so a few months for pumping, refining and transport are irrelevant. The “freshness” of oil is not a selling point and has no pricing implications.
Data is entirely different.
With data, freshness matters a lot. In fact, the speed at which data is generated, assessed and then utilized can help offset a scarcity of data. Having a lot of data is ideal, but in many situations a very rich, current dataset is as valuable as a large dataset.
So this brings us to another difference with oil: the value of data changes as a function of its age.
In general, fresher data is worth more, and its value decreases as it ages. Many companies that start capturing data today will, in a short time, begin seeing the same advantages as competitors who started before them. It’s possible to fairly quickly offset the value of a “long” dataset by having a “current and feature-rich” dataset.
Consider the data needed to develop self-driving cars. Google started generating data from their autonomous driving program well before Tesla burst on the scene.
But who has better data today?
Likely Tesla, because they are creating and capturing their own data from vehicles distributed across thousands of drivers.
To be blunt, whose dataset would you prefer: Google’s “longer” dataset, which continues to be refreshed by a limited number of cars, or Tesla’s “shorter” dataset, which is refreshed daily by thousands of drivers?
Clearly, whose data is more valuable can change quickly thanks to the ability to “make fresher data”. While Google has street-level imagery and autonomous driving data dating back many years, Tesla’s data value probably passed Google’s shortly after the launch of the Model S.
However, it is also important to note that, because some data will be lost, the value of the data we retain will eventually begin to increase again. So there is some hope for Google! But in general, newer data is more valuable data, and its value decreases from its moment of creation until its peer datasets become scarce.
(Why information will be lost even in the digital age will be a future blog post. As a preview, César Hidalgo’s excellent book “Why Information Grows: The Evolution of Order, from Atoms to Economies” explains it.)
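To make the shape of that value curve concrete, here is a toy sketch in Python. It is only an illustration under invented assumptions: the exponential half-life, the age at which peer datasets start to become scarce, and the linear scarcity premium are all made-up parameters, not measurements of any real dataset.

```python
def data_value(age_days, half_life_days=180.0,
               scarcity_start_days=3650.0, scarcity_rate=0.0005):
    """Toy relative value of a dataset; 1.0 means its value at creation.

    All parameters are hypothetical and chosen only to show the shape:
    - freshness halves every `half_life_days`
    - after `scarcity_start_days`, surviving data earns a premium that
      grows linearly at `scarcity_rate` per day
    """
    freshness = 0.5 ** (age_days / half_life_days)
    scarcity = scarcity_rate * max(0.0, age_days - scarcity_start_days)
    return freshness + scarcity


if __name__ == "__main__":
    # Print the curve at a few ages: it falls at first, then recovers.
    for age in (0, 30, 180, 365, 3650, 7300):
        print(age, round(data_value(age), 4))
```

Under these assumptions the value drops quickly in the first year and only climbs again once the scarcity term kicks in, which mirrors the "decreases until its peer datasets become scarce" argument above.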
The final implication of data being created is the difference between how the oil ecosystem works and how a data ecosystem works. Many entities strive to be data’s equivalents of oil’s drillers, refiners and distributors. But in the age of data they will not evolve as their oil peers did.
In the oil ecosystem, participants derived great power from the scarcity of oil sources, the financial might needed to build refineries and the scale needed to be a global distributor. But in the data world each company can be its own fully integrated “oil” (data) company. Everyone is their own driller, refiner, distributor and potentially even trader!
On the surface it would seem Facebook, Amazon and others are well on their way to being the “Standard Oil” of our data age — they have a huge lead in data and use it well so the only way to stop them will be through strong government action.
But the points raised in this blog could well be the reasons we do not see the same antitrust activity as the oil age did. Here is what might keep today’s leading data companies in check:
- Companies can produce their own data
- There are a variety of data sources minimizing “single control points”
- In many cases data freshness can overcome historical data size
- The same insight can often be derived from different data sources
- New generation & capture methods will disrupt today’s leaders (Google v Tesla)
- Some government regulations are emerging early as data importance is realized
Is it possible we’ll see increased legal action to control how companies use their data? We’re already seeing examples of that as Google and the EU discuss placing ads in search results. So there is a chance of “fair access” clauses emerging or even complete decoupling of large companies’ data operations from their other business activities.
But in addition to regulation keeping data fairly distributed, we’re equally likely to see companies of all sizes become far more sophisticated in how they diversify their data sources. While today we seem headed toward a few large data companies, we may yet pivot to a broad set of companies, where the diversity of marketplace data sources coupled with companies’ internal data creation prevents data monopolies.
Consider the following examples:
- People are pushing data as an “open source asset”: Kaggle, data.world
- Data marketplaces allow companies to sell their data and buy what they need: BDex
- Consumers may begin realizing their data’s value and monetize it themselves: Datacoup
Another important trend will be the evolution of “trust”. How consumers and companies manage and revoke their data consents will be a future blog topic, and a great control point too. As a teaser: a company losing “marketplace trust” and having to devalue or discontinue use of its data investments will be far worse than the “non-cash goodwill write-offs” we’ve seen in the past.
So in summary, data is created, not found. That means:
- Unlike oil, data’s origination points are diverse and expanding
- Everyone can be in the data discovery, refining and distribution business
- Data freshness matters and can offset someone else’s data lead
- Data monopolies are a real risk but will be harder to form than oil monopolies
- The data marketplace is just starting to form but it will minimize monopolies