The Importance of Metadata to Life Sciences

Polystructured data holds great promise for life sciences as it brings new insights to bear on key business challenges. But delivering on this promise takes a robust data strategy to connect petabytes of clinical, regulatory, and real-world data in order to solve these critical challenges. Enter metadata.

What’s metadata?

Metadata is simply “data about data”—data that describes, explains, locates, or in general makes it easier to find an information resource. Metadata can be structural (e.g., where is it contained?), descriptive (e.g., who authored the document?), or administrative (e.g., what is the file type?). It helps to think of metadata like the glue that harmonizes, links and adds context to data. If you have ever used an old-fashioned library catalog to find a book (remember the Dewey Decimal System?), then you have used (a non-digital example of) metadata.

What are the advantages of metadata?

Developing a data strategy that prioritizes metadata management offers life sciences companies three key advantages:

Metadata can optimize reuse of existing content and resources. Life sciences companies manage a lot of content—from periodic safety reports and scientific research to clinical trial protocols and regulatory compliance submissions, among other content types. Metadata helps to profile attributes of key assets so that users can find, for example, standardized informed consent language from a prior clinical trial to reuse in a new one. Particularly for life sciences companies that are expanding real-world evidence repositories, it’s important to easily profile and reuse real-world data based on where it has the most impact in the product lifecycle.
Metadata can simplify retrieval of content for various operational business functions. By cataloging key attributes, such as author, date and file type with metadata, users can search for and retrieve an asset based on the attributes that matter for the task at hand. For example, one of the key regulatory challenges for life sciences companies is the identification of medicinal products (IDMP) because it requires accurate reporting to harmonize product identifiers against four International Organization for Standardization (ISO) domains. Metadata can ease the burden of retrieving relevant data that’s spread across source systems to align with the ISO domains.
Metadata can enhance tracking of many different types of content. By knowing key attributes of a particular piece of content (e.g., a submission date), you can track its movement without needing the content itself. Life sciences companies regularly execute complex processes that often require real-time status updates, including checking on regulatory approval submissions, drafting cross-functional reports, and monitoring clinical trial procedures. When combined with real-time alerting, tracking can streamline workflows and reduce unnecessary resource expenditures over time.

Leveraging metadata is difficult without search and semantics

Implementing a data strategy that maximizes the value of metadata is—in non-technical terms—all about easily finding what you’re looking for. In more technical terms, this means you can run complex queries across powerful search indexes without needing to shred the data first. These indexes make your data strategy function like a search engine in which you can search across data and metadata stored together.

Think about retrieving real-world evidence to study patient adherence around a particular drug. By searching data and metadata across various source systems, you can retrieve key evidence of the cause of the non-adherence. Did certain patient segments experience greater side-effects than others? Could the treatment protocols be changed from one to two doses daily, or from injectibles to pills? Is the cost sharing for the drug too high under certain health plans?

In addition to finding what you’re looking for, it’s nice to get a little help connecting the dots. In more technical terms, this is semantics. Semantic data—called “triples”—link together related entities (people, places, or things) to depict a relationship. A robust data strategy should include natively storing triples to provide valuable context around both data and metadata. Leveraging triples with semantics accelerates exploration, categorization, and analytics during drug discovery, among other key enterprise processes.

In short, life sciences companies should prioritize metadata management when developing their data strategy. After all, an old library full of books might contain plenty of knowledge, but it’s hard to navigate without the Dewey Decimal System.

MarkLogic Semaphore

Nick Diamond

Nick develops market strategy and messaging for MarkLogic's healthcare and life sciences business. Before joining MarkLogic, Nick worked in advisory and consulting capacities at the MITRE Corporation and Booz Allen Hamilton, respectively, where he supported the U.S. Department of Health & Human Services with policy and operational challenges around implementation of the Affordable Care Act.

Nick has deep subject matter expertise across the health care domain, with particular strengths in health law, bioethics, and public health. His work has appeared in law reviews, leading peer-reviewed journals, and popular media outlets. He has taught health care ethics at the university level, lectured in both medical schools and schools of public health, and appeared multiple times on MSNBC as an expert in public health law.

Nick studied philosophy and theology as an undergraduate at Georgetown, law at Charleston Law, and bioethics at the University of Pennsylvania. He is currently finishing a master of laws (LL.M.) in global health law at Georgetown Law, where he also serves as an Articles Editor and member of the Article Review Committee for The Food and Drug Law Journal.