Using Metadata Repository to Improve MDM Success

June 22, 2021 Data & AI, MarkLogic

In banking, every business line is the master of its own domain, and its data is subject to its rules. Master data management needs to consider all these masters (pardon the pun), but strong barriers prevent unifying these silos. A metadata repository that bridges all these sources of truth is key to making your MDM project a success.

The primary goal of master data management (MDM) is to provide a single unified version of the truth. This prevents users from relying on overlapping and possibly invalid or inconsistent versions of the data. Examples of the problems caused by not having a single, unified version of the truth include:

  • Users need data and are not able to find it.
  • Customers explicitly tell a bank not to call or email them with marketing messages, but because that information is isolated in a silo, the mortgage sales group sends them mortgage solicitations anyway, irritating and alienating them.
  • Data duplicated across silos causes the same query, asked of different systems, to return different results, creating endless problems, including with regulators.
  • Duplicate and siloed data (including copies in Excel spreadsheets) makes it almost impossible to secure and track the usage of the data, increasing the possibility of breaches and exposure of sensitive information.
  • The data transformations needed to move data from one silo to another lead to errors and make it difficult to compare what should be the same data stored in different sources.

Master data management projects are often large, expensive undertakings that can take years to implement, cost millions of dollars, and usually fail. That’s not good for CIOs, who often lose their jobs as a result. Worse, since most MDM projects follow a waterfall methodology, in which you get no results until all or most of the underlying work has been completed, much of the effort that goes into a failed MDM project is lost. If the project fails midway, there may be no deliverables complete enough to stand on their own.

Finding a better approach to MDM is a critical need for all industries, but particularly for the mortgage industry.

WHY DO MDM PROJECTS FAIL?

Common processes used in master data management include: source identification, data collection, data transformation, normalization, rule administration, error detection and correction, data consolidation, data storage, data distribution, data classification, taxonomy services, item master creation, schema mapping, product codification, data enrichment and data governance.

This is a large and difficult set of tasks, and the sheer scope of MDM makes success hard to achieve.

Making it even harder is that MDM projects, by their nature, involve rationalizing data across business lines. Many business lines are happy with the state of their data and processes and do not see diverting scarce internal resources toward an external goal as a priority. MDM projects can disrupt their operations, while business line owners feel the benefits accrue to others. Worse, MDM projects can make data owners feel they are losing authority over their data and seeing it controlled by rules issued by a central authority.

The wonder is that any MDM projects succeed at all!

INCREASING THE ODDS OF MDM SUCCESS

While MDM is inherently difficult, using a metadata or document repository as the starting point for creating a single version of the truth can give an organization immediate benefits and ease the path forward. With a MarkLogic-based metadata repository, effort can be minimized and the probability of success greatly enhanced, because you see improvements incrementally. Long delays before seeing any benefits are a death knell for most MDM projects.

It is important to keep in mind that a mortgage metadata or document repository offers many benefits on its own. See “How to Drive the Mortgage Industry Forward” for a more in-depth look at how such repositories are improving everything from loan servicing costs to customer satisfaction and securitization.

The key to facilitating MDM with a universal repository is that the repository consolidates the siloed information in a central store that can be accessed in an integrated fashion. The mechanics of this are discussed in “Building a Universal Mortgage Repository”.

A consolidated view of the data provides immediate benefits by enabling easy and accurate access to the data, a prime goal of MDM. It also provides a powerful aid for MDM activities like error detection, data classification, enrichment, and governance. If all the data can be understood from a single view, and individual data items can easily be compared and contrasted, these activities can be performed with far less effort than would otherwise be necessary.
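As a minimal, language-neutral sketch of that kind of error detection, the Python below compares the same fields across records pulled from several silos into one view and flags any customer whose sources disagree. It is not MarkLogic code: the records, the source names, the `customer_id` join key, and the `find_conflicts` helper are all hypothetical, and it assumes field names have already been aligned across sources.

```python
from collections import defaultdict

# Hypothetical records from three silos, gathered into one consolidated view.
# Assumption: field names are already aligned and "customer_id" is a shared
# key, which real silos often lack.
RECORDS = [
    {"source": "mortgage", "customer_id": "C1", "email": "ann@example.com", "do_not_contact": False},
    {"source": "cards", "customer_id": "C1", "email": "ann@example.com", "do_not_contact": True},
    {"source": "deposits", "customer_id": "C1", "email": "ann.b@example.com", "do_not_contact": True},
    {"source": "mortgage", "customer_id": "C2", "email": "bob@example.com", "do_not_contact": False},
]


def find_conflicts(records, fields):
    """Return {(customer_id, field): {value: [sources]}} for every field whose
    value differs between sources for the same customer."""
    seen = defaultdict(lambda: defaultdict(list))
    for rec in records:
        for field in fields:
            if field in rec:
                seen[(rec["customer_id"], field)][rec[field]].append(rec["source"])
    # Keep only the (customer, field) pairs where sources disagree.
    return {key: dict(values) for key, values in seen.items() if len(values) > 1}


if __name__ == "__main__":
    conflicts = find_conflicts(RECORDS, ["email", "do_not_contact"])
    for (cid, field), values in sorted(conflicts.items()):
        print(cid, field, values)
```

On the sample data it flags customer C1's email address and do-not-contact flag, the kind of silent disagreement behind the mortgage-solicitation example at the start of this article; customer C2, who appears in only one source, raises nothing.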

In many MDM projects, providing a central view of the data is considered an outcome of the project. A major reason is that different data sources have different schemas, and relational-based MDM (which is most MDM) has difficulty pulling together diverse data sources.

With relational-based MDM, the early phases of a project typically involve modeling the underlying data sources and then creating a single data model that all of them must fit into. To make this work, extensive ETL is required to transform the data in the silos into a form compatible with the central model. Meanwhile, the world keeps changing and the schemas of the primary systems keep evolving, with every change requiring further analysis and work.

All of this modeling and ETL requires extensive effort before a central repository becomes available and the project can benefit from it. With a MarkLogic-based repository, things are quite different: data can be loaded and accessed as is, without up-front modeling or ETL. This greatly simplifies and speeds up MDM activities like data classification and error detection. Modeling can then be done incrementally, with the modeling for specific needs prioritized and done first, rather than following the waterfall model that relational approaches push you toward.
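The load-as-is, model-incrementally approach can be sketched in plain Python, assuming a simple in-memory list stands in for the repository. Documents from each silo are stored unchanged, and a mapping is written only for the one field the current task needs. The document shapes, the `EMAIL_PATHS` mapping, and the helper names are hypothetical illustrations, not MarkLogic APIs.

```python
# In-memory stand-in for the repository (an assumption for this sketch).
STORE = []


def load_as_is(source, doc):
    """Append a source document unchanged, wrapped with minimal metadata."""
    STORE.append({"source": source, "content": doc})


# Incremental modeling: a mapping exists only for the field the current task
# needs (a customer's email). These per-source paths are hypothetical.
EMAIL_PATHS = {
    "mortgage": ("borrower", "contact", "email"),
    "cards": ("cardholderEmail",),
}


def get_path(doc, path):
    """Follow a tuple of keys into nested dicts; return None if any is missing."""
    for key in path:
        if not isinstance(doc, dict) or key not in doc:
            return None
        doc = doc[key]
    return doc


def emails_across_sources():
    """Query every stored document through its source's mapping, if one exists."""
    results = []
    for entry in STORE:
        path = EMAIL_PATHS.get(entry["source"])
        if path is not None:
            value = get_path(entry["content"], path)
            if value is not None:
                results.append((entry["source"], value))
    return results


# Each silo keeps its own shape; nothing is transformed on the way in.
load_as_is("mortgage", {"loanId": "L-1", "borrower": {"name": "Ann", "contact": {"email": "ann@example.com"}}})
load_as_is("cards", {"cardNo": "****1234", "cardholderEmail": "ann@example.com"})
load_as_is("deposits", {"acct": "D-9", "owner": "Ann"})  # no mapping yet, but still stored

print(emails_across_sources())
```

The deposits document is kept even though no mapping covers it yet; adding an entry to the mapping later makes it queryable without reloading or transforming anything, which is the incremental alternative to modeling everything up front.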

When you have data needs like duplicate detection, product and client disambiguation rules, source priority rules, duplicate merging, and address disambiguation, a universal repository makes life a lot easier. For example:

  • When detecting duplicate data, a metadata repository that pulls the data together makes duplicates much easier to find.
  • When handling multiple addresses for a single individual, MarkLogic semantics allows each of the different versions of the data to exist, while providing links and rules for pulling a “golden record” from the mass of data. Semantic triples and bitemporality can also capture data lineage and provenance, as well as other links between documents.
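The duplicate detection, source priority, and golden-record ideas above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not MarkLogic's semantics API: duplicates are matched on a normalized email address (real matching rules are usually richer), the `PRIORITY` ranking is a hypothetical source priority rule, and the provenance links are plain `(subject, predicate, object)` tuples standing in for semantic triples. Bitemporality is not modeled.

```python
from collections import defaultdict

# Hypothetical source records, kept side by side rather than overwritten.
RECORDS = [
    {"id": "mtg-17", "source": "mortgage", "name": "Ann Smith", "email": "Ann@Example.com ", "address": "12 Oak St"},
    {"id": "crd-88", "source": "cards", "name": "ANN SMITH", "email": "ann@example.com", "address": "12 Oak Street, Apt 2"},
    {"id": "dep-05", "source": "deposits", "name": "Bob Jones", "email": "bob@example.com", "address": "9 Elm Rd"},
]

# Hypothetical source priority rule: a lower number wins when sources disagree.
PRIORITY = {"cards": 1, "mortgage": 2, "deposits": 3}


def match_key(rec):
    """Simplistic duplicate-detection key: the normalized email address."""
    return rec["email"].strip().lower()


def build_golden_records(records):
    """Group duplicates, then take each field from the highest-priority source
    that has a value for it.

    Returns (golden, triples): golden maps match key -> merged record; triples
    are (subject, predicate, object) provenance links, so every source version
    stays reachable from its golden record."""
    groups = defaultdict(list)
    for rec in records:
        groups[match_key(rec)].append(rec)

    golden, triples = {}, []
    for key, dupes in groups.items():
        ranked = sorted(dupes, key=lambda r: PRIORITY[r["source"]])
        merged = {"key": key}
        for field in ("name", "address"):
            # First non-empty value in priority order (assumes at least one exists).
            best = next(r for r in ranked if r.get(field))
            merged[field] = best[field]
            triples.append((f"golden:{key}", f"{field}From", best["id"]))
        for rec in dupes:
            triples.append((f"golden:{key}", "derivedFrom", rec["id"]))
        golden[key] = merged
    return golden, triples


if __name__ == "__main__":
    golden, triples = build_golden_records(RECORDS)
    print(golden["ann@example.com"])
    # → {'key': 'ann@example.com', 'name': 'ANN SMITH', 'address': '12 Oak Street, Apt 2'}
    for triple in triples:
        print(triple)
```

Because every source record survives and is linked to its golden record, a changed priority rule can be re-applied at any time without losing the original versions of the data.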

With all these capabilities available from the beginning of the project rather than only as a final deliverable, MDM projects that use universal metadata repositories can be far more effective than traditional projects.

David Kaaret

David Kaaret has worked with major investment banks, mutual funds, and online brokerages for over 15 years in technical and sales roles.

He has helped clients design and build high-performance, cutting-edge database systems and has provided guidance on issues including performance, optimal schema design, security, failover, messaging, and master data management.