In the second blog post in our Operational Data Hub (ODH) series, we discussed what technical debt is and how it manifests. Now, let’s talk about some root causes of technical debt and how the ODH helps address them.
When it comes to the technical debt associated with integrating data from silos, many seemingly unrelated business problems often arise from a common root cause. Consider the following two examples that most companies can likely relate to:
Example 1: Customer Service Call Center Problem:
Customer service representatives spend a very large amount of time servicing customer requests.
- Why? They may need to search across 16 different systems to find the information they’re looking for.
- Why? The systems all have data about a customer (some of which overlaps) yet they each have different data models.
- Why? They serve different operational functions, and the data has yet to be integrated.
- Why? Combining that data in a cohesive way to allow for a customer 360 has proven to be difficult and error-prone.
- Why? When creating a relational database model, you must consider all data-model variances upfront, and the schema must be created before development can begin.
In the above case, the root cause lies in the limitations of modeling with a relational database management system (RDBMS), where schema modeling is a prerequisite to development. Moreover, as the next example shows, because nearly every model change in a relational database is accompanied by non-trivial code and back-testing changes, modelers attempt to design schemas that account for as many scenarios as possible, which can make the modeling exercise very complex and time-consuming. In many cases, that complexity leads to compromises in the modeling process in an attempt to meet a deadline or otherwise “save time.”
Example 2: Investment Bank–Derivatives Post-trade Processing, Project Delivery Problem:
After more than 18 months, the project team still has not started development and does not expect to start for another 3-6 months.
- Why? The data model is not finished.
- Why? They haven’t accounted for all of the asset classes.
- Why? Every time they look at a new asset class, the model has to be redone.
- Why? The source models of each entity are very different, causing difficulty for the modeling team.
- Why? When creating a relational database model, you must consider all data-model variances upfront, and the schema must be created before development can begin.
Again, despite the different business-use cases, the root cause lies with the limitations of an RDBMS schema, where integrating multiple valid models is difficult, time-consuming and brittle.
ODH to the Rescue
So how can the ODH solve issues like those above? Although ODH implementations vary from one to the next, they all share certain foundational principles that address data management challenges.
Use of document/object models to represent business entities. Self-describing documents (such as XML or JSON) are a natural way to represent business entities or objects. They do not suffer from the so-called “impedance mismatch” associated with object-to-relational mapping and come with many benefits such as:
a) The ability to treat schema as data, given that every payload may also include schema information. This is what gives schemas and models the same level of agility as the data itself.
b) The ability to allow for multiple models that represent the same class of business entity. For example, multiple systems may model customer data in different ways. In an ODH architecture, all of those models may be represented concurrently.
c) The ability to store metadata and data together. This allows provenance and lineage to be captured and provides a strong foundation for data governance.
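As a rough illustration of points (a) through (c), the sketch below stores two differently shaped customer records side by side, each wrapped with metadata about where it came from. The field names, the `wrap` helper and the `metadata`/`data` layout are assumptions made for this example, not a prescribed ODH format.

```python
import json

# Hypothetical records: two source systems model the same customer differently.
crm_customer = {"custId": "C-1001", "fullName": "Ada Lovelace", "phone": "555-0100"}
billing_customer = {"account_no": 88231, "name": {"first": "Ada", "last": "Lovelace"}}


def wrap(source_system, record):
    """Wrap a source record, unchanged, together with descriptive metadata."""
    return {
        "metadata": {
            "source": source_system,       # provenance: which system produced it
            "fields": sorted(record),      # schema information carried as data
        },
        "data": record,                    # the source model, retained as is
    }


# Both models are kept concurrently; neither is forced into the other's shape.
documents = [wrap("crm", crm_customer), wrap("billing", billing_customer)]

for doc in documents:
    print(json.dumps(doc, indent=2))
```

Because each document describes its own shape, adding a third source system in this sketch would mean adding one more document, not redesigning a shared schema first.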
Data harmonization. Most approaches to integrating disparate data models involve coming up with a new model followed by attempts to “force fit” (by way of ETL) all source data into the new model. Data harmonization, on the other hand, starts with the premise that all source models are not only valid, but also valuable, and hence should be retained as is in an integrated context. These source models are then leveraged to create an integrated canonical model (or models) on an as-needed basis, all the while recording valuable provenance and lineage metadata inside the ODH itself. The result is that instead of a lowest-common-denominator subset of integrated data, the ODH creates an agile superset of the source data.
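One way to picture harmonization is a small mapping step that derives a canonical field from each source model while leaving the source data in place. The sketch below assumes two hypothetical sources (`crm` and `billing`) and a single canonical attribute, `customer_name`; all names are illustrative.

```python
# Hypothetical stored documents, each keeping its original source shape.
stored = [
    {"source": "crm", "data": {"custId": "C-1001", "fullName": "Ada Lovelace"}},
    {
        "source": "billing",
        "data": {"account_no": 88231, "name": {"first": "Ada", "last": "Lovelace"}},
    },
]


def name_from_crm(data):
    return data["fullName"]


def name_from_billing(data):
    return f'{data["name"]["first"]} {data["name"]["last"]}'


# One mapper per source model; new canonical fields can be added as needed.
MAPPERS = {"crm": name_from_crm, "billing": name_from_billing}


def harmonize(doc):
    """Add a canonical view and provenance without discarding the source data."""
    mapper = MAPPERS[doc["source"]]
    return {
        "canonical": {"customer_name": mapper(doc["data"])},
        "provenance": {"source": doc["source"], "mapper": mapper.__name__},
        "data": doc["data"],  # source model retained as is
    }


harmonized = [harmonize(doc) for doc in stored]
for doc in harmonized:
    print(doc["canonical"], doc["provenance"])
```

The result is a superset in the sense the text describes: each harmonized document carries the canonical field, the lineage of how it was derived, and the full original record.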
Use of semantic RDF triples to represent relationships. The Resource Description Framework (RDF) is a set of W3C standards for representing machine-readable concepts about things and relationships between things. It also forms the basis for the concept of the Semantic Web. The unit of representation is called a triple, which consists of a subject, a predicate and an object, collectively comprising a fact/concept or a relationship (e.g., “Euro typeOf currency”). In an ODH, RDF triples provide a myriad of capabilities with respect to managing data and the complexities of data integration.
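The subject-predicate-object structure can be sketched with plain tuples and a pattern-matching lookup. This is a simplification: real RDF identifies resources with IRIs and is usually queried with SPARQL, and the facts other than “Euro typeOf currency” are made up for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# A tiny in-memory set of facts; only the first comes from the text above.
triples = [
    Triple("Euro", "typeOf", "currency"),
    Triple("USDollar", "typeOf", "currency"),
    Triple("Trade42", "settlesIn", "Euro"),
]


def match(store, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t
        for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# "Which things are currencies?"
currencies = [t.subject for t in match(triples, predicate="typeOf", obj="currency")]
print(currencies)
```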
Indexing to support real-time ad hoc queries and searches. Unlike a data lake that depends on subsequent brute-force processing for data querying, an ODH indexes all data on ingestion to ensure that data is “queryable” as soon as it is loaded.
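The idea of indexing on ingestion can be sketched with a toy in-memory store that updates an inverted index at load time, so a lookup never has to scan every document. The `Hub` class and its methods are hypothetical names for this example and do not reflect any product API.

```python
from collections import defaultdict


class Hub:
    """Toy store that indexes every top-level field value as documents arrive."""

    def __init__(self):
        self.documents = {}
        self.index = defaultdict(set)  # (field, normalized value) -> doc ids

    def ingest(self, doc_id, doc):
        self.documents[doc_id] = doc
        # Index at load time, so the document is queryable immediately.
        for field, value in doc.items():
            self.index[(field, str(value).lower())].add(doc_id)

    def query(self, field, value):
        # A dictionary lookup instead of a brute-force scan over all documents.
        return sorted(self.index.get((field, str(value).lower()), set()))


hub = Hub()
hub.ingest("doc1", {"name": "Ada Lovelace", "city": "London"})
hub.ingest("doc2", {"name": "Alan Turing", "city": "London"})

print(hub.query("city", "london"))
```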
Support for bidirectional data access. Unlike patterns that support either “run-the-business” or “observe-the-business” functions, the ODH supports both. By allowing real-time updates with transactional support, alongside the ability to impact schemas and data in a way that may be tracked and audited, the ODH is a safe place in which direct updates may be made to integrated data without negatively impacting data governance and accuracy.
In the next post in this series, we will dig deeper into semantics and RDF triples and discuss what they can do for organizations’ data integration and management.
Kate Ranta
Kate Ranta is a Solutions Marketing Manager at MarkLogic. She is a communications and marketing professional with a focus on digital content strategy, inbound marketing, social media campaign management, SEO, and project management.