MarkLogic design patterns are reusable solutions for many of the commonly occurring problems encountered when designing MarkLogic applications. These patterns may be unique to applications on MarkLogic or may be industry patterns that have MarkLogic-specific considerations. Unlike recipes, MarkLogic design patterns are generally more abstract and applicable in multiple scenarios.
Separate data intended for consumption by external processes from data intended to make the MarkLogic database system more powerful and flexible. Create an overall envelope parent element or object that contains a “headers” section and an “instance” data section, which are separate within the document. This aligns with how the MarkLogic Data Hub Framework and MarkLogic Entity Services create their envelopes.
In MarkLogic, the JSON or XML document that is stored becomes the “interface to the indexes.” This means that when you add an element to an XML document, or add a nested object to a JSON structure, you are also causing that data and its relationship to parents and values to be indexed. Thus, the primary mechanism for adding indexed data is simply adding elements and nested objects.
It is useful to separate out the data that is used to do something in MarkLogic from the data that is stored because your data services API needs it.
Data services may want data that:
In contrast, MarkLogic processing and indexing may improve if data:
While the same goal can be achieved using extensive transforms on the data as it is ingested, and then again as it is accessed by the data services, it is much more efficient and much clearer to have the externally accessed “core” data stored as-is.
For the sake of simplicity, we will discuss this pattern in XML terms (documents, elements and nodes) going forward, but all concepts also apply to JSON.
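To make the JSON case concrete, here is a minimal sketch in plain Python (run outside MarkLogic) of the shape such an envelope might take. The property names "envelope", "headers", and "instance" follow the sections described above. The `make_envelope` helper and the `revisionDate` header are hypothetical names chosen only for illustration.

```python
import json

def make_envelope(instance, headers=None):
    """Wrap a record in an envelope with separate headers and instance sections.

    Hypothetical helper: the instance is stored untouched, while anything
    added purely for indexing or processing goes under "headers".
    """
    return {"envelope": {"headers": dict(headers or {}), "instance": instance}}

# The record exactly as an external data service expects it.
article = {
    "title": "How to build a fence",
    "revision": {"date": "1/15/2002", "revnumber": "1.0"},
}

# An index-friendly value lives in the headers, not in the instance.
doc = make_envelope(article, {"revisionDate": "2002-01-15"})
print(json.dumps(doc, indent=2))
```

The instance section is byte-for-byte what the caller supplied, so headers can be added or changed later without affecting consumers of the instance data.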
In summary, we address two conflicting goals by using the envelope pattern:
Some systems have additional goals as well, such as multiple APIs that consume data in radically different formats (e.g., CSV vs. JSON, XML vs. RDF). In that case, there may be more sections than the headers and instance sections (such as “triples” or “html-preformatted”).
This pattern is often used to quickly integrate large sets of data together into MarkLogic. With this approach, raw or “good enough” data is directly ingested into MarkLogic, and a relatively small number of elements are initially included in the “headers” section to maintain uniform indexing, retrieval, and analysis across many data sets.
All data in the “instance” section can be accessed, rendered using default rendering, exported, and managed, and the system accessing the data can be developed very quickly using the most valuable data first.
When used with the Data Hub Framework—where raw content is initially ingested into a Staging database—the “instance” section would include more uniform or harmonized data.
Keeping data used purely by MarkLogic processes separate from data accessed by data services allows developers to add data to the “headers” section as needed without breaking external layers or sub-systems. This can reduce time to analyze, re-code, test, and coordinate on large projects.
The envelope design pattern should generally be used in all designs. You should have a specific and compelling reason not to use this pattern before omitting it from your design.
We recommend using this pattern when:
“All access through a service” is a pattern that ensures that all updates add the “headers” section and that all queries remove it. This makes the “headers” section invisible to callers, preserving flexibility within the MarkLogic data layer (within .js and .xqy code inside MarkLogic itself).
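The idea of adding headers on every update and stripping them on every read can be sketched outside MarkLogic as follows. This is only an illustration in plain Python under stated assumptions: `EnvelopeStore`, `build_headers`, and the `titleLower` header are hypothetical names, and a dictionary stands in for the database.

```python
from copy import deepcopy

class EnvelopeStore:
    """Hypothetical in-memory stand-in for a database behind a service layer."""

    def __init__(self, build_headers):
        self._docs = {}
        self._build_headers = build_headers  # derives headers from an instance

    def put(self, uri, instance):
        # Every write wraps the caller's data in an envelope with headers.
        self._docs[uri] = {
            "headers": self._build_headers(instance),
            "instance": deepcopy(instance),
        }

    def get(self, uri):
        # Every read returns only the instance; headers never leave the store.
        return deepcopy(self._docs[uri]["instance"])

# Assumed header rule for illustration: a lower-cased title for matching.
store = EnvelopeStore(lambda inst: {"titleLower": inst["title"].lower()})
store.put("/articles/fence.json", {"title": "How to build a fence"})
print(store.get("/articles/fence.json"))
```

Because callers only ever see what `get` returns, the header-building rule can change without any externally visible difference.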
Adding “index-able” data is separated from returning data formats. A change to headers will not be externally visible to clients depending on the “instance” data.
Consider the following issues when implementing the envelope pattern:
Consider a set of articles like this one in XML format that need to be stored, searched, and accessed:
<article>
  <abstract>
    <para>You can build a fence by deciding the areas to separate, and then making a barrier from wood or metal that sits between them.</para>
  </abstract>
  <para>It is often said that good fences make good neighbors.</para>
  <para>Choosing areas to divide with your fence is the first step. Jim Smith has built a lot of fences, and says that in Paris, France, people divide garden areas from other areas, but in Cleveland, OH, people divide children's play areas from the street most often.</para>
  <articleInfo>
    <title>How to build a fence</title>
    <revision>
      <date>1/15/2002</date>
      <revnumber>1.0</revnumber>
    </revision>
    <author><firstname>Nihal</firstname><surname>Jain</surname></author>
  </articleInfo>
</article>
Figure 1 shows a simplified approximation of the DocBook schema. Let’s assume that callers need this data in this exact format or it will be considered invalid. There are two problems you should consider if you want to search or facet using a range index on the revision date. First, the desired data is in a non-specific <date> element; therefore, adding a range index on “date” is likely to also include other dates if <date> is ever used in other contexts. Second, the date is in a format that is not a valid XML Schema xs:date or xs:dateTime. To solve these two issues, we run this transform on ingest:
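declare namespace meta = "http://marklogic.com/patternExample/meta";
(: the incoming <article> element, supplied by the caller :)
declare variable $article external;

(: the revision date as text, e.g. "1/15/2002" :)
let $textDate := $article/articleInfo/revision/date/text()
(: the picture uses XSLT date-picture syntax; verify it against your own date strings.
   xdmp:parse-dateTime returns an xs:dateTime, so cast it to xs:date for the date range index :)
let $xsDate := xs:date(xdmp:parse-dateTime("[M]/[D]/[Y0001]", $textDate))
let $internalDate := <meta:revisionDate>{$xsDate}</meta:revisionDate>
return
  <envelope xmlns="http://marklogic.com/entity-services">
    <headers>
      {$internalDate}
    </headers>
    <instance>
      {$article}
    </instance>
  </envelope>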
The code in Figure 2 extracts a transformed/formatted version of the date and creates a more specifically named element in another namespace, <meta:revisionDate>, which allows for unambiguous indexing and access to the desired xs:date value.
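The date normalization step can be illustrated on its own, independent of MarkLogic. The following is a small Python sketch, assuming the source dates always follow the month/day/year layout of the sample article. The function name `to_iso_date` is hypothetical.

```python
from datetime import datetime

def to_iso_date(text):
    """Convert a US-style 'M/D/YYYY' string to an ISO 8601 date string.

    Assumes month/day/year order, as in the sample article; other layouts
    raise ValueError rather than being guessed at.
    """
    return datetime.strptime(text.strip(), "%m/%d/%Y").date().isoformat()

print(to_iso_date("1/15/2002"))  # prints 2002-01-15
```

The ISO form is what an xs:date value looks like lexically, which is why the normalized copy belongs in the headers while the original text stays in the instance.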
Now, to search for all articles in January of 2002, we would add a date range index to <meta:revisionDate> and query like this:
declare namespace es = "http://marklogic.com/entity-services";
declare namespace meta = "http://marklogic.com/patternExample/meta";

(: generic function to query documents, including headers, but return only the instance data :)
declare function es:queryData($q) {
  for $envelope in cts:search(/es:envelope, $q)
  return $envelope/es:instance/element()
};

let $fromQ := cts:element-range-query(xs:QName("meta:revisionDate"),
  ">=", xs:date("2002-01-01"))
let $toQ := cts:element-range-query(xs:QName("meta:revisionDate"),
  "<=", xs:date("2002-01-31"))
let $jan2002Q := cts:and-query(($fromQ, $toQ))
return es:queryData($jan2002Q)
Note that the function es:queryData($q) returns any child element of the <es:instance> element, so it is not specific to articles.
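The same "match on headers, return only the instance" behavior can be mimicked over in-memory data. This plain Python sketch is an analogy rather than MarkLogic code: the list `envelopes`, the function `query_instances`, and the second sample record are hypothetical, and a linear scan stands in for the range index.

```python
from datetime import date

# Hypothetical in-memory envelopes; headers carry the normalized date.
envelopes = [
    {"headers": {"revisionDate": "2002-01-15"},
     "instance": {"title": "How to build a fence"}},
    {"headers": {"revisionDate": "2003-06-01"},
     "instance": {"title": "How to paint a fence"}},
]

def query_instances(docs, start, end):
    """Filter on the header date (inclusive range), return instance data only."""
    return [
        doc["instance"]
        for doc in docs
        if start <= date.fromisoformat(doc["headers"]["revisionDate"]) <= end
    ]

jan_2002 = query_instances(envelopes, date(2002, 1, 1), date(2002, 1, 31))
print(jan_2002)
```

As in the XQuery version, the filter never inspects the instance itself, so it would work equally well for any record type that carries the same header.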
For data representing profiles in a social network, such as LinkedIn or Facebook, we may store a person’s profile as XML, but their relationships as RDF. The RDF may go in the “triples” section.
Here is a hypothetical person profile in a social network application:
declare namespace sn = "http://marklogic.com/patterns/example/social-network";
<sn:person>
  <sn:name>Alfred</sn:name>
  <sn:uniqueUserName>Alfred_Jones_1974</sn:uniqueUserName>
  <sn:interests>
    <sn:interest levelofinterest="7">Semantics</sn:interest>
    <sn:interest levelofinterest="10">MarkLogic</sn:interest>
    <sn:interest levelofinterest="3">Polyglot Persistence</sn:interest>
  </sn:interests>
  <sn:friends>
    <sn:friend>Sally2227</sn:friend>
    <sn:friend>MargaretTheProgrammer</sn:friend>
    <sn:friend>Neeraj</sn:friend>
  </sn:friends>
</sn:person>
Each user is ideally modeled as a document, because it is self-contained and hierarchical. However, the social network itself is a graph, so the relationship data is ideally modeled using RDF triples:
Alfred <foaf:knows> Sally
Alfred <foaf:knows> Margaret
Alfred <foaf:knows> Neeraj
To augment the profile in Figure 3 with semantic triple information about the social network “Alfred” is part of, run this code when each document is inserted or updated:
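declare namespace sn = "http://marklogic.com/patterns/example/social-network";
declare namespace es = "http://marklogic.com/entity-services";
(: the <sn:person> element being inserted or updated, supplied by the caller :)
declare variable $newPerson external;

let $thisPersonName := $newPerson/sn:uniqueUserName/text()
(: one foaf:knows triple per friend listed in the profile :)
let $knowsGraph :=
  for $friendName in $newPerson/sn:friends/sn:friend/text()
  return sem:triple(
    sem:iri($thisPersonName),
    sem:iri("http://xmlns.com/foaf/0.1/knows"),
    sem:iri($friendName))
let $envelope :=
  <es:envelope>
    <es:triples>
      {$knowsGraph}
    </es:triples>
    <es:instance>
      {$newPerson}
    </es:instance>
  </es:envelope>
return $envelope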
Running the code in Figure 4 results in the structure we want: the “person” record is left as-is, bundled into an envelope with semantic triples that describe the social network derived from this profile:
<es:envelope xmlns:es="http://marklogic.com/entity-services">
  <es:triples>
    <sem:triple xmlns:sem="http://marklogic.com/semantics">
      <sem:subject>Alfred_Jones_1974</sem:subject>
      <sem:predicate>http://xmlns.com/foaf/0.1/knows</sem:predicate>
      <sem:object>Sally2227</sem:object>
    </sem:triple>
    <sem:triple xmlns:sem="http://marklogic.com/semantics">
      <sem:subject>Alfred_Jones_1974</sem:subject>
      <sem:predicate>http://xmlns.com/foaf/0.1/knows</sem:predicate>
      <sem:object>MargaretTheProgrammer</sem:object>
    </sem:triple>
    <sem:triple xmlns:sem="http://marklogic.com/semantics">
      <sem:subject>Alfred_Jones_1974</sem:subject>
      <sem:predicate>http://xmlns.com/foaf/0.1/knows</sem:predicate>
      <sem:object>Neeraj</sem:object>
    </sem:triple>
  </es:triples>
  <es:instance>
    <sn:person xmlns:sn="http://marklogic.com/patterns/example/social-network">
      <sn:name>Alfred</sn:name>
      <sn:uniqueUserName>Alfred_Jones_1974</sn:uniqueUserName>
      <sn:interests>
        <sn:interest levelofinterest="7">Semantics</sn:interest>
        <sn:interest levelofinterest="10">MarkLogic</sn:interest>
        <sn:interest levelofinterest="3">Polyglot Persistence</sn:interest>
      </sn:interests>
      <sn:friends>
        <sn:friend>Sally2227</sn:friend>
        <sn:friend>MargaretTheProgrammer</sn:friend>
        <sn:friend>Neeraj</sn:friend>
      </sn:friends>
    </sn:person>
  </es:instance>
</es:envelope>
This example is slightly different from the article repository example in that we introduce a triples section to highlight its purpose. The instance section is simply the original “person” record.
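The derivation of "knows" triples from the friends list can also be sketched without MarkLogic. The following Python example uses only the standard library and a shortened copy of the sample profile. The function `knows_triples` is a hypothetical name, and plain tuples stand in for real RDF triples.

```python
import xml.etree.ElementTree as ET

SN = "http://marklogic.com/patterns/example/social-network"
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"

# Shortened copy of the sample profile (interests omitted for brevity).
person_xml = f"""<sn:person xmlns:sn="{SN}">
  <sn:name>Alfred</sn:name>
  <sn:uniqueUserName>Alfred_Jones_1974</sn:uniqueUserName>
  <sn:friends>
    <sn:friend>Sally2227</sn:friend>
    <sn:friend>MargaretTheProgrammer</sn:friend>
    <sn:friend>Neeraj</sn:friend>
  </sn:friends>
</sn:person>"""

def knows_triples(person):
    """Return one (subject, predicate, object) tuple per listed friend."""
    ns = {"sn": SN}
    subject = person.findtext("sn:uniqueUserName", namespaces=ns)
    return [
        (subject, FOAF_KNOWS, friend.text)
        for friend in person.findall("sn:friends/sn:friend", ns)
    ]

triples = knows_triples(ET.fromstring(person_xml))
for triple in triples:
    print(triple)
```

The profile element is only read, never modified, which mirrors how the instance section is left as-is while the derived triples go into their own section.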
Related patterns (TBD) include all patterns to add data outside of the actual documents being inserted and returned. These include patterns to store additional information in the URI scheme, collections, properties fragments, or RDF triples.
The envelope pattern has become ubiquitous in MarkLogic implementations. The pattern is leveraged heavily in the MarkLogic Data Hub Framework, and is likely found in any MarkLogic implementation that involves data integration.
Damon is a passionate “Mark-Logician,” having been with the company for over 7 years as it has evolved into the company it is today. He has worked on or led some of the largest MarkLogic projects for customers ranging from the US Intelligence Community to HealthCare.gov to private insurance companies.
Prior to joining MarkLogic, Damon held positions spanning product development for multiple startups, founding of one startup, consulting for a semantic technology company, and leading the architecture for the IMSMA humanitarian landmine remediation and tracking system.
He holds a BA in Mathematics from the University of Chicago and a Ph.D. in Computer Science from Tulane University.