Data fabrics and data meshes are popular memes in our corner of the data world. The motivation is clear – can we connect all this data in some way that people might find more useful?
Gartner has the most formal definition of a data fabric. It does a reasonable job of describing why you might want one – to connect data in a more useful way – but doesn’t yet come with an instruction manual on how to actually build and use one.
I also search occasionally to see who’s talking about what. There’s a nice cluster of startups tackling this from different angles, and growing investment capital means growing interest.
Let’s agree that connecting data in more useful ways is a good thing – what’s left is to figure out the best way to do that. We’ve been collectively connecting data for many decades, yet people clearly aren’t satisfied with the current approaches.
From my perspective, it’s pretty clear why that is – more on that later.
We at MarkLogic have clear evidence that the most useful way to connect complex data is by using active metadata. We are not alone in that regard. If it helps, think of MarkLogic as a metadata-centric database, different from all others in that regard.
Data and everything we know about it is kept together.
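To make that concrete, here is a minimal sketch – plain Python with made-up field names, not any particular MarkLogic API – of what it looks like when a record’s data, its metadata and its meaning travel together as one self-describing envelope.

```python
# Illustrative only: one self-describing record that carries the data and
# everything we know about it in a single envelope. Field names are hypothetical.
import json

record = {
    "data": {                        # the raw facts as captured
        "customer_id": "C-1001",
        "name": "Acme Corp",
        "region": "EMEA",
    },
    "metadata": {                    # what we know about those facts
        "source_system": "crm-export",
        "ingested_at": "2021-06-01T12:00:00Z",
        "steward": "sales-ops",
    },
    "semantics": [                   # machine-readable meaning, as simple triples
        ["C-1001", "rdf:type", "ex:Customer"],
        ["C-1001", "ex:operatesIn", "ex:EMEA"],
    ],
}

print(json.dumps(record, indent=2))
```

Because nothing about the record lives somewhere else, nothing can drift apart or get lost along the way.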
If we step back from the jargon for a moment, it helps to understand what’s really going on here.
We have groups of smart people trying to reason over data. They can share knowledge informally, but that approach doesn’t scale well. At some point, there is a strong motivation to formalize what is known – that is, to create reusable organizational knowledge.
The metadata-centric approach (metadata management, ontologies, knowledge graphs, etc.) tries to formalize that knowledge.
Unfortunately, these representations are often disconnected from the data that created the knowledge, and end up being of limited use.
If we were describing the situation in the physical world, we’d say that our packages don’t have labels, or that the labels have been separated from them along the way. What we’d really like is to have data and everything we know about it travel as one entity, not scattered everywhere.
Data management people often think a lot about data (the physical packages), but not nearly as much about metadata (the labels) and what metadata really needs to do in order to be truly useful.
In reality, metadata is what encodes human knowledge about the data, so its role is far more important than it is typically given credit for.
There is wide agreement that an optimal way of encoding human knowledge is a semantic knowledge graph, or SKG. The reason is simple: its underlying representation is powerful enough to generate nearly all of the other representations we need – rules, tables, graphs and so on – as required.
There are a few edge cases where this is not true, but they are well understood. Put that way, the SKG is the best grand unifier for representing what we know.
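As a small, hedged illustration of that unifying property, here is a sketch using the open-source rdflib package: a handful of triples, queried back out as an ordinary table. The vocabulary (ex:Customer, ex:operatesIn and so on) is invented for the example.

```python
# A sketch of the SKG as "grand unifier": the same triples project into a table.
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/")

g = Graph()
g.add((EX.C1001, RDF.type, EX.Customer))
g.add((EX.C1001, EX.companyName, Literal("Acme Corp")))
g.add((EX.C1001, EX.operatesIn, EX.EMEA))

# A SPARQL SELECT turns the graph representation into a tabular one on demand.
rows = g.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?customer ?name ?region
    WHERE { ?customer a ex:Customer ; ex:companyName ?name ; ex:operatesIn ?region . }
""")
for customer, name, region in rows:
    print(customer, name, region)
```

The same triples could just as easily feed a rule engine or be walked as a graph; the table is simply one projection of them.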
But it’s also useful to know how we know something. Since most of our knowledge is derived from data, it makes sense to keep the two together. A semantic knowledge graph that is bound to the data that created it is far more useful than one that isn’t.
Packages and labels in the physical world need to be together for many of the same reasons.
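Here is an equally small sketch of that binding, again with rdflib and invented identifiers, using the W3C PROV vocabulary to link a derived fact back to the source record it came from.

```python
# A sketch of keeping knowledge bound to the data that created it:
# each derived fact carries a provenance link back to its source record.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")
PROV = Namespace("http://www.w3.org/ns/prov#")

g = Graph()
g.add((EX.fact42, EX.states, EX.AcmeOperatesInEMEA))
g.add((EX.fact42, PROV.wasDerivedFrom, EX.crmRecord1001))  # the source data

# Later, anyone reading the fact can ask how we know it.
for source in g.objects(EX.fact42, PROV.wasDerivedFrom):
    print("derived from:", source)
```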
Human knowledge is ideally an agile thing: new facts, new interpretations, new actions. Organizational knowledge is no different in that regard. We call intellectually agile people “smart”; we want our organizations to be the same way.
For that reason, I would judge any proposed approach by that metric first and foremost.
Data agility – which underpins the ability to learn and apply quickly – is why we’re doing this in the first place. Put differently, if it’s not data agility we want, then what is it we might want instead?
Metadata-centric approaches can achieve active metadata: how we classify things can change. But in so many situations, metadata has been divorced from data and is therefore difficult to act on. There is a compelling case for connecting data with everything we know about it.
Don’t forget, meaning – semantics – can change quickly as well. Recent events might mean we are now looking at things in a new light. The case for active semantics is quite strong.
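Here is a tiny sketch of what “active” means in practice, with the same caveats as the earlier examples: reclassify one entity and the same question immediately returns a different answer, with no rebuild of the data underneath.

```python
# A sketch of active semantics: reclassifying an entity changes what the
# same query returns, without touching the underlying data.
from rdflib import Graph, Namespace, RDF

EX = Namespace("http://example.org/")

g = Graph()
g.add((EX.supplier7, RDF.type, EX.ApprovedSupplier))

def approved_suppliers(graph):
    return set(graph.subjects(RDF.type, EX.ApprovedSupplier))

print(approved_suppliers(g))   # supplier7 shows up as approved

# Recent events put things in a new light: reclassify rather than rebuild.
g.remove((EX.supplier7, RDF.type, EX.ApprovedSupplier))
g.add((EX.supplier7, RDF.type, EX.SuspendedSupplier))

print(approved_suppliers(g))   # now empty
```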
We will argue that the only way that data agility can be achieved is by deeply integrating active data, active metadata, and active meaning. We have many fascinating examples of this if you are interested.
Any integration-centric approach – combining bits and pieces, sometimes in the cloud – fails to deliver data agility, and thus the initiative fails. We have many examples of this as well. They are interesting too, but in a different way, as they make the case for a metadata-centric approach even more clearly.
It’s not only about connecting data with everything we know about it – it’s also about connecting the people involved.
At MarkLogic, we tend to have a classic engagement point with our newer customers. In one corner are the business people who have a serious problem and are actively looking for a better answer. I think of them as the “data consumers”.
In another corner, we have the “data producers”: the teams responsible for surfacing facts and events captured elsewhere in the organization.
Somewhere in the middle, we have “logic providers”: people with a more application-centric focus who are trying to bridge the gap between the available data and the business need.
Everyone involved is usually somewhat frustrated, as they’ve been trying to solve their complex data problem for a while and it hasn’t been going well.
What we propose is a platform that meets the needs of all three primary stakeholder groups – and more – and enables them to start moving forward with much faster answers to their challenges: data agility.
In that regard, we are making connections. We try to reframe the problem as “reusable organizational knowledge and the data that created it.” Two out of the three groups usually get that concept, but one typically doesn’t. For the data producers, it’s best explained in terms of surfacing data – and what we know about it – in a more usable, consumable fashion that does a better job with the problem at hand.
Along the way, there are security people, governance people, compliance and audit people, technology portfolio people, infrastructure people and maybe a few folks I’ve forgotten. We’ve met them before, and know what their concerns are.
As an armchair fan of IT architectures, I love distributed, encapsulated and scalable application architectures. I’ve spent my time in the modern-dev-meets-cloud-native world, and I can appreciate the benefits.
But, at the same time, I would argue that intentionally fragmenting and attempting to reassemble shared organizational knowledge is a strong anti-pattern. I am not alone in that regard.
Start with the SKG – that’s your core data structure – along with the data it needs to reason over. Only then encapsulate and apply your patterns – and not the other way around.
You’ll find that keeping data and metadata together gives you enormous agility advantages, both data agility and application lifecycle agility.
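As a rough illustration of that ordering – an illustration only, with invented names – the shared graph and the data bound to it come first, and application services encapsulate access to that core rather than each carrying its own disconnected fragment.

```python
# A sketch: one shared knowledge core behind a narrow interface,
# rather than fragments of it copied into every service.
from rdflib import Graph, Namespace, RDF

EX = Namespace("http://example.org/")

class SharedKnowledge:
    """Single point of access to the organization's SKG and the data bound to it."""

    def __init__(self):
        self.graph = Graph()

    def customers_in(self, region):
        return set(self.graph.subjects(EX.operatesIn, region))

# Services compose on top of the shared core instead of fragmenting it.
knowledge = SharedKnowledge()
knowledge.graph.add((EX.C1001, RDF.type, EX.Customer))
knowledge.graph.add((EX.C1001, EX.operatesIn, EX.EMEA))
print(knowledge.customers_in(EX.EMEA))
```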
The interest in data fabrics, data meshes, etc. is growing. Clearly, people are starting to cast about for better answers. From my perspective, this is positive, as clearly better answers are needed.
I would argue that – at present – it’s a disconnected industry discussion.
There’s a cluster of interest in knowledge graphs, with semantic ones being the preferred variety. There’s a well-defined metadata management world, with a semantic approach increasingly preferred. And there’s a database world that is becoming more interested in metadata, but hasn’t yet figured out how to do anything interesting with it.
This is against a backdrop of what I call “more cubed”, or more³: more data, more complexity, more demand for useful insights. So there’s plenty of incentive to move forward.
My perspective? Taking a metadata-centric approach to complex data problems yields outsized results. Clearly, it’s a different way of working with information that inevitably leads to different results.
Which brings to mind something Einstein said about doing the same thing over and over again and expecting different results.
Chuck joined the MarkLogic team in 2021, coming from Oracle as SVP Portfolio Management. Prior to Oracle, he was at VMware working on virtual storage. Chuck came to VMware after almost 20 years at EMC, working in a variety of field, product, and alliance leadership roles.
Chuck lives in Vero Beach, Florida with his wife and three dogs. He enjoys discussing the big ideas that are shaping the IT industry.