We’re all familiar with small teams of smart people working together to understand data and what it might mean. These teams might be working in finance, life sciences, logistics, government – any pursuit that involves interpreting digital facts in context.
But the familiar model of using small teams of smart people doesn’t scale well at all.
What works for three doesn’t work as well when there are ten involved. Get to thirty participants and you’ll see serious problems.
Get to hundreds or thousands of participants, and the task becomes formidable – especially if the smart people are scattered across multiple organizations.
Nor can small teams keep up as data complexity scales: more data sources, more complex sources, or simply the complexity that comes with sheer volume.
Ad hoc tools like spreadsheets, data marts, and analytic workbenches can only get you so far. We have plenty of good tools for managing, sharing, and analyzing data.
But we don’t have good tools to do the same with what we know about it.
In a previous article, we discussed the importance of facts and what they mean. Evaluating information – and what it means in context – is vitally important to the organizations that invest in these teams.
We’ve also discussed the notion of data agility. We want our teams of smart people to be agile with complex data: able to quickly form new interpretations of any aspect of the information, and to act on them just as quickly.
Sharing data – relatively easy. Sharing our specialized knowledge about data – somewhat harder.
When teams of smart people work together, they will typically develop a language of precise terms and definitions to explain ideas and concepts.
To outsiders it appears as dense jargon, but to the people who use the language, it’s a very efficiently encoded way of describing important things.
That unique and specialized language also encodes what is known about the topic: definitions, rules, relationships, and so on.
Specialized knowledge is created and shared through specialized language.
When any two experts get into a discussion, there’s often a “definition of terms” phase. Once you’re past that – it’s off to the races. Friction is removed.
Conversely, if those experts can’t agree on terms and definitions, it often gets stuck there. Friction overcomes progress.
When people formalize their language, they are also formalizing their knowledge. Formalized knowledge scales much better than the unformalized kind, and it makes knowledge sharing easier.
This is true in both the digital and analog worlds.
In today’s world, the majority of the facts we evaluate are digital ones: an event, an observation, a data feed, or something similar.
We ingest the data and evaluate it. Is it important? Can it safely be ignored? What should be done with it? What does it mean?
An obscure transaction could be relevant to someone managing risk across a large portfolio. The publication of a new research report could be of immediate importance to a life sciences firm – but thousands of such reports are published every month.
In any security or intelligence function, many millions of scraps of new and potentially relevant information become available – what do they mean in context? In complex manufacturing or logistics functions, small problems can quickly morph into serious ones if important connections aren’t understood.
What is the formalized language we use to describe and share our specialized knowledge of these and other important things? The answer is – typically there isn’t a good one.
Richard Feynman’s first notable achievement was organizing humans into a parallel computer before electronic computers existed. The enormously complex calculations required by the WWII war effort led to the recruitment of hundreds of mathematics PhDs and other specialists.
But they were taking way too long to get their work done.
He broke the math problem into components, assigned each to different teams to work in parallel, and created a novel way to rejoin their results at the end, greatly accelerating the outcome at a critical juncture in history.
The principle he discovered still holds true today: adding more smart people alone won’t solve the problem any faster or better unless there’s a way to connect their combined knowledge effectively. How does this apply today?
Simply put: without shared, formalized language about digital facts and what they mean, teams of smart people won’t scale their specialized knowledge about data effectively.
In the example above, the specialized language was mathematical descriptions of known facts about nuclear physics. With that in hand, effective knowledge sharing became possible.
In our current digital world, if there isn’t a formalized language around data and what it means, specialized knowledge about the data can’t be efficiently shared and scaled.
Small teams working together can do this intuitively; larger teams don’t have that luxury.
If we were to share this concept from a systems architecture perspective, perhaps using a whiteboard, we’d draw a picture of each person involved – each with their knowledge about the data, what it means, what to do with it, etc. – as a “node” in an idealized knowledge-sharing network, often across organizational boundaries.
You’d want those people “nodes” to be able to publish and subscribe against a shared pool of data elements along with their full, richly described meaning: the terms used and their precise definitions, the rules and why they exist, the important concepts and how they relate, and so on.
Ideally, each data element would be fully self-described by metadata that encodes what is known about it, how its concepts and meanings relate to others, and so on – a semantic knowledge graph of complex data and its informed interpretations.
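To make that less abstract, here is a minimal sketch of what such a self-describing data element might look like, using Python and the rdflib library. The example.org namespace and the Transaction and FinancialEvent concepts are hypothetical, chosen purely for illustration rather than taken from any particular product or vocabulary.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, SKOS

# Hypothetical vocabulary, for illustration only.
EX = Namespace("http://example.org/risk/")

g = Graph()
g.bind("ex", EX)
g.bind("skos", SKOS)

# A digital fact: an observed transaction (the data element itself).
txn = URIRef("http://example.org/data/txn-42")
g.add((txn, RDF.type, EX.Transaction))
g.add((txn, EX.amount, Literal(1250000)))

# The shared, formalized meaning of the concept that fact belongs to:
# a precise definition and its relationship to a broader concept.
g.add((EX.Transaction, RDF.type, SKOS.Concept))
g.add((EX.Transaction, SKOS.prefLabel, Literal("Transaction", lang="en")))
g.add((EX.Transaction, SKOS.definition,
       Literal("A transfer of value between two counterparties.", lang="en")))
g.add((EX.Transaction, SKOS.broader, EX.FinancialEvent))

# The fact and its interpretation now live in the same graph.
print(g.serialize(format="turtle"))
```

Because the definition travels in the same graph as the fact it describes, any subscriber that receives the data element also receives its meaning.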
One wouldn’t want to allow data to be separated from its interpretation – especially data subject to strict compliance or security requirements. Uninterpreted data is not only of very low value; it also creates risk in several different ways.
As above, a formalized and codified language about terms and concepts relevant to the data at hand greatly improves the speed and fidelity of knowledge sharing, whether in the digital or analog world.
When everyone is using the same language, friction is removed.
Once you recognize the pattern, you will see it almost anywhere you care to look, in any organization large or small: teams of smart people trying to keep up with a torrent of digital facts in the form of complex data.
Without the right tools, it can be difficult, frustrating work.
Any dynamic risk management function has this challenge. Any research or applied R&D pursuit has this challenge. Any complex coordination function has this challenge. Any organization that takes compliance or security seriously has this challenge. Any organization that wants to differentiate itself with unique know-how and expertise has this challenge.
The underlying reason that this specialized knowledge is so difficult to scale is that we aren’t using the right tools to formalize and share what we know about the data.
The people involved don’t share a machine-interpretable and human-interpretable language describing the digital facts and what they mean. If they are fortunate enough to have such a language in some form, it is typically divorced from the data it describes.
As a result, any organization with this challenge will find it very difficult indeed to collectively scale their shared, specialized knowledge about data and what it means.
Data agility is the ability to quickly react to new digital facts and new interpretations.
Specialized knowledge about digital facts – data – is hard to scale because there is no shared, formalized language for those facts: their precise definitions and interpretations. Where such a language does exist, it is typically divorced from the data it describes.
Ideally, we would keep data and what we know about it – in the form of encoded interpretations – together at all times. Doing so creates data agility.
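As a brief, self-contained sketch of what keeping them together might look like in practice – again using rdflib and the same illustrative example.org vocabulary – a single query can return the data element and the formal definition of its concept, so neither travels without the other.

```python
from rdflib import Graph

# The data element and its encoded interpretation stored side by side
# (hypothetical example.org vocabulary, for illustration only).
g = Graph()
g.parse(data="""
    @prefix ex:   <http://example.org/risk/> .
    @prefix skos: <http://www.w3.org/2004/02/skos/core#> .

    ex:txn-42 a ex:Transaction ; ex:amount 1250000 .

    ex:Transaction a skos:Concept ;
        skos:definition "A transfer of value between two counterparties."@en .
""", format="turtle")

# One query retrieves the fact and its meaning together.
for row in g.query("""
    PREFIX ex:   <http://example.org/risk/>
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    SELECT ?element ?amount ?definition WHERE {
        ?element a ex:Transaction ; ex:amount ?amount .
        ex:Transaction skos:definition ?definition .
    }
"""):
    print(row.element, row.amount, row.definition)
```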
What we’ll see next is that – surprisingly – the pieces and parts we need for this task have been around for a while, and are somewhat familiar.
We’re just learning to use the tools in new ways.
Download our white paper, Data Agility with a Semantic Knowledge Graph
Jeremy Bentley is the founder of Semaphore, creators of the Semaphore semantic AI platform, and joined MarkLogic as part of the Semaphore acquisition. He is an engineer by training and has spent much of his career solving enterprise information management problems. His clients are innovators who build new products using Semaphore’s modeling, auto-classification, text analytics, and visualization capabilities. They are in many industries – including banking and finance, publishing, oil and gas, government, and life sciences – and have in common a dependence on their information assets and a need to monetize, manage, and unify them. Prior to Semaphore, Jeremy was Managing Director of Microbank Software, a New York-based fintech firm acquired by Sungard Data Systems. Jeremy has a BSc with honors in mechanical engineering from Edinburgh University.