Big Data: The Aftermath

Memes and fads roll like waves throughout enterprise IT. Some live up to expectations. Others end up disappointing many of those who invested. The “Big Data” meme was exceptional in magnitude: exceptional investments, exceptional expectations, and — for many — an exceptional disappointment.

What’s interesting to me is that, despite that massive collective investment, the thirst to better understand data hasn’t been slaked in the least. If anything, people are thirstier than ever.

Where initial big data efforts were successful, they blossomed into robust data lakes used by specialists building process flows with rich toolchains. They’re obviously delivering business value; otherwise, there’d be no investment.

But what about where these efforts didn’t live up to expectations? What can we learn?

You Had to Be There

Please take a moment to recall the unbridled enthusiasm around Big Data. Here was a new, geeky way to use data and do some pretty impressive things. There was much excitement in the air.

In some well-documented cases, the new insights led to new ways of making decisions. The HIPPO methodology (Highest Paid Person’s Opinion) was being driven out by inarguable data-driven conclusions that pointed to a different path. It reminded me of “Revenge of the Nerds”.

Fear and greed can be powerful motivators. A wave of FOMO developed — fear of missing out — which led to a seriously large number of “big data” environments being stood up as labs for data experimentation. A lot of these didn’t live up to expectations. Why?

My armchair observations:

Big Data wasn’t easy to do. Setting up the environment. Recruiting data specialists. Sourcing and moving data. Coming up with insights that people care about. This was a big hill for any IT group, even with a dedicated team.
For those that made the climb, the value of the new insights delivered tended to decrease over time. After a few big ones, it became more “meh” from a business perspective.
The type of insights delivered didn’t match what business functions really wanted from the collective data available.

To summarize, those that succeeded in climbing the Big Data hill usually saw some early wins, followed by a long period of increasing irrelevance, with more thirsty people than before.

I think that’s for a very specific reason.

Analyzing Vs. Connecting

Analytics and machine learning have been with us for a while. Math meets data. If you’ve ever sat through it, most of the math looks pretty straightforward. A good chunk of it can even be automated, e.g., determining the most relevant variables or the best predictive model.

Feeding math with useful data can be much harder, requiring the production of large (yet simple) well-cleansed and well-formed data sets. That’s where the bottleneck typically occurs — it’s in the supply chain.

But, with enough effort, progress can be made, and it’s been more than a few years. In a world that’s seen big data, data warehouses, data lakes, and data marts — why are many business people still so thirsty? IDC says that the industry will spend $250B on big data and analytics tech during 2021, so plenty of money is being spent.

My observation is simple — in many of these cases, these people desperately want to better exploit the connections between different sources of data, which can be a hard problem. Analytics don’t really do that. Once these people can better exploit connections, the whole process of using data more effectively — including analytics — becomes much, much easier.

There are lots of simple examples. What customer information do I have across all these different sources? How can I connect it better to provide new insights? Build new applications? Better inform what I’m doing with analytics and machine learning?
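To make the customer question a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the three source systems (crm, billing, support), their field names, and the choice of email address as the shared key are all hypothetical, and real-world data is rarely this tidy. The point is only that the connection is one the business already knows about, not one that has to be discovered.

```python
from collections import defaultdict

# Hypothetical records from three separate systems; all field names are made up.
crm = [
    {"email": "Ann@Example.com", "name": "Ann Lee"},
    {"email": "bob@example.com", "name": "Bob Ruiz"},
]
billing = [
    {"email": "ann@example.com ", "plan": "gold"},
]
support = [
    {"email": "ann@example.com", "ticket": "T-1"},
    {"email": "bob@example.com", "ticket": "T-2"},
    {"email": "ann@example.com", "ticket": "T-3"},
]


def normalize(email):
    """Normalize the shared key so trivially different spellings still match."""
    return email.strip().lower()


def connect(sources):
    """Group records from every named source under one customer key."""
    customers = defaultdict(lambda: defaultdict(list))
    for source_name, records in sources.items():
        for record in records:
            key = normalize(record["email"])
            # Keep everything except the key itself, tagged by where it came from.
            details = {k: v for k, v in record.items() if k != "email"}
            customers[key][source_name].append(details)
    return customers


customers = connect({"crm": crm, "billing": billing, "support": support})
ann = customers["ann@example.com"]
print(sorted(ann))           # which systems hold something about this customer
print(len(ann["support"]))   # how many support records are connected to her
```

Even in this toy version, most of the effort goes into agreeing on the key and cleaning it up, not into any math.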

Here’s the core idea: big data tries to discover non-intuitive connections from simplified data, i.e., let the data speak. Many business users want something quite different: the ability to better exploit the connections they are already aware of, usually across complex data.

This pattern shows up in a fascinating number of situations. Bioscience wants to connect everything it knows about a therapy or a condition. Insurance wants to deliver a great product yet minimize fraud. Financial services wants to get to all the customer-relevant information across multiple product lines. And on and on and on.

My take? The entire investment we make in analytics, machine learning, big data, data warehouses, data marts, data lakes — and really smart people — is all great. Up to a point, and then returns can diminish.

We can learn great things from simpler forms of data. We also can learn great things from more complex forms of data — but we’re going to have to think about things differently.


Chuck Hollis

Chuck joined the MarkLogic team in 2021, coming from Oracle as SVP Portfolio Management. Prior to Oracle, he was at VMware working on virtual storage. Chuck came to VMware after almost 20 years at EMC, working in a variety of field, product, and alliance leadership roles.

Chuck lives in Vero Beach, Florida with his wife and three dogs. He enjoys discussing the big ideas that are shaping the IT industry.