Element Level Security Protects Information Within Documents

May 15, 2018 Data & AI, MarkLogic

The inescapable headlines about data breaches have made all of us aware of how widespread data security problems have become, and of industry and government efforts to stem them. Breaches at Equifax, Anthem, eBay, JP Morgan Chase, Yahoo, and the U.S. Office of Personnel Management show that being a large organization with ample resources does not rule out security vulnerabilities. At a time when integrating and sharing data is increasingly critical to risk management in healthcare, financial services, and national security, protecting data means taking measures at the company, database, document, and element levels.

Element level security, also known as field-level security or granular-level security (and as cell-level security in relational databases), allows you to identify and protect sensitive information within documents. Consider a medical record. We certainly need to protect the entire record from external parties. But we can also protect personally identifiable information (PII) within the record, such as a social security number. That lets medical staff, and even researchers, see health information while the information relevant only to the billing process stays hidden. The reverse is also true: we can limit how much health information billing personnel see.

Schema-Driven Security

Some database platforms, particularly relational databases, achieve this level of security using a schema. Think of all the forms we fill out at a doctor's office, and it's easy to see how a schema helps label the information. Last name, first name, date of birth, gender, marital status, insurance number, social security number: each has a label, and each fits neatly into rows and columns. It is easy to add a flag or tag to one piece of that information, or to a series of pieces, and then declare that person A can see the information in column H. That makes sense, right?
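Column-level tagging of this kind can be sketched with Python's standard-library sqlite3 module. SQLite has no GRANT statement, so this minimal sketch enforces a hypothetical set of column labels (COLUMN_ROLES) in application code; server databases such as PostgreSQL offer column-level privileges for the same purpose. The table, columns, and roles are invented for illustration.

```python
import sqlite3

# Hypothetical column labels: the roles allowed to read each column of `patients`.
COLUMN_ROLES: dict[str, set[str]] = {
    "last_name": {"clinician", "billing"},
    "date_of_birth": {"clinician", "billing"},
    "ssn": {"billing"},
}


def readable_columns(roles: set[str]) -> list[str]:
    """Columns whose label shares at least one role with the caller."""
    return [col for col, allowed in COLUMN_ROLES.items() if allowed & roles]


def fetch_patients(conn: sqlite3.Connection, roles: set[str]) -> list[tuple]:
    cols = readable_columns(roles)
    if not cols:
        return []
    # Column names come only from COLUMN_ROLES, never from user input,
    # so building the SELECT list with join() is safe in this sketch.
    return conn.execute(f"SELECT {', '.join(cols)} FROM patients").fetchall()


def demo_connection() -> sqlite3.Connection:
    """In-memory database holding one hypothetical patient row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patients (last_name TEXT, date_of_birth TEXT, ssn TEXT)")
    conn.execute("INSERT INTO patients VALUES ('Doe', '1980-01-01', '123-45-6789')")
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(fetch_patients(conn, {"clinician"}))  # [('Doe', '1980-01-01')]
    print(fetch_patients(conn, {"billing"}))    # [('Doe', '1980-01-01', '123-45-6789')]
```

The labels live in the schema, not in the data, which is exactly why this approach works well for fixed columns and poorly for free-form text.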

But schemas can be rigid and difficult to change, and so far we haven't mentioned the free-form parts of those forms, such as the “please explain” fields, or the narrative text in the rest of the record. Cell-level security on a relational platform is much harder to apply to that kind of textual information.

Element Level Security

A better approach is to use XML or JSON to identify pieces of information within a document or other entity, since both formats are self-describing. That is, every field carries a name, an element name in XML or a property name in JSON, assigned when the data is imported, alongside its value. Just by looking at that name, you have context for every piece of data.

An electronic medical record likely comprises data of all sorts: Word documents, PDFs, and relational data spring easily to mind. Access to a Word document gives you access to all of the information in it, because Word does not define elements within documents. Converting Word documents to XML or JSON gives us an opportunity to label their contents, and those labels become the elements and properties. In relational terms, the information that would live in the rows and columns of a relational database is stored, together with its labels, inside each XML or JSON document.

As a result, each document carries the information needed to interpret it, without our having to define a schema up front. For security, this means we can write rules against any element defined in the XML and JSON documents. Those rules can be added and changed without altering the information already stored in the documents.

These XML and JSON code snippets show how you can describe data within documents. It is easy to identify—and secure—the social security number in both.
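As an illustration, here is a minimal, hypothetical patient record in both formats (the field names and values are invented), with Python's standard library locating the social security number in each. In XML the path `.//SSN` matches an SSN element at any depth; in JSON the property is reached by name.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical patient record in XML: each value sits inside a named element.
XML_RECORD = """
<patient>
  <name>Jane Doe</name>
  <SSN>123-45-6789</SSN>
  <diagnosis>Seasonal allergies</diagnosis>
</patient>
"""

# The same hypothetical record in JSON: each value has a property name.
JSON_RECORD = """
{
  "patient": {
    "name": "Jane Doe",
    "SSN": "123-45-6789",
    "diagnosis": "Seasonal allergies"
  }
}
"""


def ssn_from_xml(text: str) -> str:
    root = ET.fromstring(text.strip())
    # ".//SSN" matches an <SSN> element anywhere below the root.
    return root.find(".//SSN").text


def ssn_from_json(text: str) -> str:
    # The property name "SSN" identifies the value inside the patient object.
    return json.loads(text)["patient"]["SSN"]


if __name__ == "__main__":
    print(ssn_from_xml(XML_RECORD))    # 123-45-6789
    print(ssn_from_json(JSON_RECORD))  # 123-45-6789
```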

XML and JSON Are Key

So how does this help with element level security? With XML and JSON, every element and property carries a label, so a social security number might be stored as SSN: 123-45-6789. We can then write a rule that says all information with this label is sensitive, and that label can appear within any record or entity.
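A label-based rule can be sketched as a recursive walk over a JSON-like document that drops every property whose name is on a sensitive list, however deeply it is nested. SENSITIVE_LABELS and redact are hypothetical names for this sketch, not part of any MarkLogic API.

```python
from typing import Any

# Hypothetical rule set: property names treated as sensitive wherever they appear.
SENSITIVE_LABELS = frozenset({"SSN"})


def redact(node: Any, labels: frozenset = SENSITIVE_LABELS) -> Any:
    """Return a copy of a JSON-like value with every sensitive property removed."""
    if isinstance(node, dict):
        return {
            key: redact(value, labels)
            for key, value in node.items()
            if key not in labels
        }
    if isinstance(node, list):
        return [redact(item, labels) for item in node]
    return node  # strings, numbers, booleans and None pass through unchanged


if __name__ == "__main__":
    record = {
        "patient": {"name": "Jane Doe", "SSN": "123-45-6789"},
        "claims": [{"id": 1, "SSN": "123-45-6789", "amount": 120}],
    }
    print(redact(record))
    # {'patient': {'name': 'Jane Doe'}, 'claims': [{'id': 1, 'amount': 120}]}
```

Because the rule keys on the label rather than on a position, the same one-line rule covers the SSN in the patient section and the copy inside each claim.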

Another advantage of XML and JSON is that even if the document structure changes, you can still find elements within it, as long as the element or property name stays the same. So different versions of a document, and the elements within them, remain protected no matter where they are stored or whether they are in use.
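The sketch below uses two hypothetical versions of the same XML record: in the second, SSN has moved under a new identifiers element. A name-based removal that searches every level handles both versions without change. The function strip_elements is an illustrative helper, not a library call.

```python
import xml.etree.ElementTree as ET

# Version 1: SSN is a direct child of <patient>.
V1 = "<patient><name>Jane Doe</name><SSN>123-45-6789</SSN></patient>"

# Version 2: a later revision nests SSN under <identifiers>.
V2 = ("<patient><name>Jane Doe</name>"
      "<identifiers><SSN>123-45-6789</SSN></identifiers></patient>")


def strip_elements(xml_text: str, name: str) -> str:
    """Remove every element called `name` below the root, at any depth."""
    root = ET.fromstring(xml_text)
    # ElementTree elements have no parent pointer, so visit each element
    # (snapshotted with list()) and remove its children that match by name.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == name:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(strip_elements(V1, "SSN"))  # <patient><name>Jane Doe</name></patient>
    print(strip_elements(V2, "SSN"))  # SSN removed from inside <identifiers>
```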

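This even works if different sources use different names for the same information, such as SS_Number, SSN, and social_security. You would simply harmonize these labels using the envelope pattern, which wraps each source record in an envelope that holds a standardized version of its data.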
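A minimal sketch of that harmonization step follows, assuming a simple envelope layout with a "headers" section for provenance and an "instance" section for the harmonized record. The layout, SSN_ALIASES, and wrap_in_envelope are assumptions for illustration, not MarkLogic's own envelope format or API. This sketch deliberately keeps no raw copy of the source; if you retain one, its original labels need their own protection rules.

```python
from typing import Any

# Hypothetical aliases that different source systems use for the same field.
SSN_ALIASES = frozenset({"SS_Number", "SSN", "social_security"})


def wrap_in_envelope(source: dict[str, Any], source_name: str) -> dict[str, Any]:
    """Wrap a flat source record in an envelope with harmonized labels.

    Assumed layout: "headers" holds provenance metadata and "instance" holds
    the record with every SSN alias renamed to the canonical "SSN".
    """
    instance = {
        ("SSN" if key in SSN_ALIASES else key): value
        for key, value in source.items()
    }
    return {"envelope": {"headers": {"source": source_name}, "instance": instance}}


if __name__ == "__main__":
    print(wrap_in_envelope({"SS_Number": "123-45-6789", "name": "Jane Doe"}, "billing"))
    # {'envelope': {'headers': {'source': 'billing'},
    #               'instance': {'SSN': '123-45-6789', 'name': 'Jane Doe'}}}
```

Once every source is enveloped this way, a single rule on the canonical SSN label covers records that arrived as SS_Number or social_security.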

This role-based access control works exactly as it does for document-level access, but now it governs access to every defined element in the document.
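The per-element check can be sketched as a mapping from property names to the roles allowed to read them, applied to a flat record after document-level access has already been granted. ELEMENT_RULES, the role names, and visible_view are hypothetical; MarkLogic itself configures element level security on the server (its documentation calls these protected paths), not in client code like this.

```python
from typing import Any

# Hypothetical protection rules: property name -> roles allowed to read it.
ELEMENT_RULES: dict[str, set[str]] = {
    "SSN": {"billing"},
    "diagnosis": {"clinician", "researcher"},
}


def visible_view(doc: dict[str, Any], user_roles: set[str]) -> dict[str, Any]:
    """Return only the properties of a flat record that the user's roles permit.

    Properties without a rule are readable by anyone who can read the
    document; document-level checks are assumed to have passed already.
    """
    view: dict[str, Any] = {}
    for key, value in doc.items():
        allowed = ELEMENT_RULES.get(key)
        if allowed is not None and not (allowed & user_roles):
            continue  # protected property, and the user holds none of its roles
        view[key] = value
    return view


if __name__ == "__main__":
    record = {"name": "Jane Doe", "SSN": "123-45-6789", "diagnosis": "Seasonal allergies"}
    print(visible_view(record, {"researcher"}))  # name and diagnosis only
    print(visible_view(record, {"billing"}))     # name and SSN only
```

This mirrors the medical-record scenario: researchers see health information without the SSN, and billing staff see the SSN without the diagnosis.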

Government Case Study

A government case study illustrates this beautifully. Let’s assume that analysts are sharing information in spreadsheets like the one we see here.

The U.S. government classifies documents and entities at the highest level of the information they contain. That means that even if only one piece of information in a document is classified as secret, the entire document is considered secret. In our sample spreadsheet, one cell is highlighted in yellow to indicate that this single piece of information is classified as secret, which gives the entire spreadsheet the same classification.

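The "highest level wins" rule is simply a maximum over the levels of everything a document contains. This sketch assumes an ordered scale with UNCLASS below the U.S. levels Confidential, Secret, and Top Secret; the Level enum and document_level function are illustrative names.

```python
from enum import IntEnum


class Level(IntEnum):
    # Ordered so that a larger value means a more restrictive classification.
    UNCLASS = 0
    CONFIDENTIAL = 1
    SECRET = 2
    TOP_SECRET = 3


def document_level(cell_levels: list[Level]) -> Level:
    """A document takes the highest classification of anything it contains."""
    return max(cell_levels, default=Level.UNCLASS)


if __name__ == "__main__":
    # A hypothetical 100-cell spreadsheet with a single SECRET cell.
    cells = [Level.UNCLASS] * 99 + [Level.SECRET]
    print(document_level(cells).name)  # SECRET
```

One SECRET cell out of a hundred is enough to make the whole spreadsheet SECRET, which is the sharing problem described next.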
Once this document is marked as secret, it is illegal for anyone, even someone with the appropriate clearance level, to open the spreadsheet, delete the one cell containing the secret information, save the document, and share it with others who do not have that clearance. There is a formal process for declassifying a document, but applying it to large numbers of documents tends to be prohibitively expensive.

The same problem applies to the many tables in a relational database. A whole database must be designated secret or top secret, even if 99.99 percent of it is unclassified data and only one or two items carry a higher classification. This is a major problem when intelligence agencies try to share information with Department of Defense, law enforcement, or disaster relief partners that do not hold those higher clearances.

By applying element level metadata, we can associate a classification label (UNCLASS, SECRET, etc.) with each value in the spreadsheet. Analysts can then run a query that exports only the data each user is allowed to see, based on that user's clearance level.
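Here is a minimal sketch of such an export, assuming each cell is stored as a (value, level) pair on the same ordered scale as the U.S. levels. The rows, column names, and export_rows function are hypothetical.

```python
from enum import IntEnum


class Level(IntEnum):
    # Ordered so that a larger value means a more restrictive classification.
    UNCLASS = 0
    CONFIDENTIAL = 1
    SECRET = 2
    TOP_SECRET = 3


# Each cell carries its own classification label alongside its value.
Cell = tuple[str, Level]


def export_rows(rows: list[dict[str, Cell]], clearance: Level) -> list[dict[str, str]]:
    """Keep only the cell values at or below the user's clearance."""
    return [
        {column: value for column, (value, level) in row.items() if level <= clearance}
        for row in rows
    ]


if __name__ == "__main__":
    rows = [
        {"location": ("Site A", Level.UNCLASS), "source": ("Source 1", Level.SECRET)},
        {"location": ("Site B", Level.UNCLASS), "source": ("Public report", Level.UNCLASS)},
    ]
    print(export_rows(rows, Level.UNCLASS))
    # [{'location': 'Site A'}, {'location': 'Site B', 'source': 'Public report'}]
```

Instead of the whole sheet being locked at SECRET, a partner with no clearance still receives every unclassified value.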

Granular Data Protection

In short, element-level security goes beyond the existing document-level security to allow specific elements of a document to be hidden from particular users. The increased granularity means greater data protection.

This is particularly important in light of a recent Market Connections survey of federal agencies. More than one-third of respondents said their agency had experienced unauthorized examination of agency files or databases, and two in ten had experienced modification of agency files or databases by an insider. One quarter said they had experienced this from an outsider.

So no matter the industry, element-level security can help you continue to protect your data—even if the unthinkable happens.

For More Information

Here’s more technical information on element level security

And find out how government agencies are using element level security

Evelyn Kent