This article is a comprehensive discussion of maps and map operators. In a previous post, Returning Lexicon Values using XPath Expressions, I made a quick reference to map:map operators and demonstrated how to use them to compute the difference of two maps in order to filter results for processing. Here, I’ll provide a more in-depth discussion of how exactly maps work and delve deeper into the powerful features they provide.
To begin, let’s formally define some basic constructs for how a map:map works. Maps are in-memory key/value structures, introduced in MarkLogic version 5. Out of the box, maps provide fast in-memory inserts, updates, lookups, and existence checks. Maps are also mutable structures, so you can change them without creating copies, as you would have to when changing XML node types. This allows all operations to execute very efficiently, at the cost of side effects, which are not common in functional programming languages like XQuery.
To review more of the basic operations you can perform with maps, refer to the MarkLogic map functions page.
Here are a few basic operations you can do with map functions:
map:map – Creates a new map, or creates a map with data from an XML serialization of a map:map.
map:put – Puts a value by key into a map.
map:get – Gets a value from a map by key.
map:keys – Returns all the keys present in a map.
map:delete – Removes a value by key from a map.
map:count – Returns the count of the keys in the map.
map:clear – Clears the map of all keys/values.
map:new – Creates a new map:map, but accepts a sequence of existing maps or map:entry(k,v) values. This is a very composable and convenient way to join multiple maps together.
map:entry – Creates a map:map with a single key/value structure.
map:contains – Returns true if the key exists in the map.
map:with – Updates a map, inserting a value into it at the given key.

Maps are also supported as output for many lexicon-based functions, including:
cts:element-*-values, cts:values – Return a map where the key and the value are the same.
cts:element-*-value-co-occurrences, cts:value-co-occurrences – Return a map where the key is equal to the first tuple and the value is a sequence of the second tuple.

Let’s now walk through various examples of using maps, so we can get a better understanding of how and why to use them. This first example sticks a series of different value types inside a map, then walks the keys to describe each value.
xquery version "1.0-ml";

let $map := map:map()
(: Store a string, a node, a sequence, and a function under separate keys :)
let $puts := (
  map:put($map, "a", "a"),
  map:put($map, "b", <node>Some node</node>),
  map:put($map, "c", (1,2,3,4,5)),
  map:put($map, "d", function() {"Hello World"})
)
for $key in map:keys($map)
return fn:concat("Key:", $key, " is ", xdmp:describe(map:get($map, $key)))
When executed in Query Console, the code above returns:
Key:c is (1, 2, 3, ...)
Key:b is <node>Some node</node>
Key:d is function() as item()*
Key:a is "a"
As you can see, a map can flexibly store values, nodes, and even functions.
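The remaining basic functions from the list above follow the same pattern. The following is a minimal sketch rather than a definitive recipe; the keys and values are made up for illustration, and the comments describe the results expected if the let clauses are evaluated in order, as in the example above.

```xquery
xquery version "1.0-ml";

(: Illustrative data only: build a map from entries, then mutate it in place :)
let $map := map:new((
  map:entry("a", 1),
  map:entry("b", 2)
))
let $_ := (
  map:put($map, "c", 3),   (: add a third key :)
  map:delete($map, "a")    (: remove the first key :)
)
return (
  map:count($map),          (: expected: 2 keys remain, "b" and "c" :)
  map:contains($map, "a"),  (: expected: false, the key was deleted :)
  map:contains($map, "b")   (: expected: true :)
)
```

Because the map is mutable, map:put and map:delete change $map itself instead of returning a modified copy.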
Another useful feature of maps is that they can be passed around by reference, which allows information to be shared between different modules/transactions while maintaining a single instance across them. In the example below, we are going to take attendance by allowing multiple spawned functions to add entries to a global map across separate transactions. In the final function, we will check if a value is present and answer accordingly.
xquery version "1.0-ml";

let $map := map:map()
let $foo := xdmp:spawn-function(function() {
  map:put($map, "foo", "Foo is here")
})
let $bar := xdmp:spawn-function(function() {
  map:put($map,"bar", "Bar is hear(yawn)")
})
let $baz := xdmp:spawn-function(function() {
  if(map:contains($map, "bar"))
  then map:put($map, "baz", "Baz is here, only if bar is here.")
  else map:put($map, "baz", "Baz is here, but why is bar always late")
})
return $map
And this returns:
<map:map xmlns:map="http://marklogic.com/xdmp/map" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="foo">
    <map:value xsi:type="xs:string">Foo is here</map:value>
  </map:entry>
  <map:entry key="bar">
    <map:value xsi:type="xs:string">Bar is hear(yawn)</map:value>
  </map:entry>
  <map:entry key="baz">
    <map:value xsi:type="xs:string">Baz is here, only if bar is here.</map:value>
  </map:entry>
</map:map>
Be aware, however, that each xdmp:spawn-function call is “non-blocking”, so the query can return before all of the spawned functions have finished. In the next example, we will have the “bar” function sleep for one second before it executes its map:put.
...
let $bar := xdmp:spawn-function(function() {
  xdmp:sleep(1000),
  map:put($map,"bar", "I am lazy bar")
})
...
return $map
Which returns…
<map:map xmlns:map="http://marklogic.com/xdmp/map" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="foo">
    <map:value xsi:type="xs:string">Foo is here</map:value>
  </map:entry>
  <map:entry key="baz">
    <map:value xsi:type="xs:string">Baz is here, but why is bar always late</map:value>
  </map:entry>
</map:map>
(As you can see “baz” is pretty upset bar is not present.)
To provide “blocking”, you can pass the result=true option to xdmp:spawn-function or use xdmp:invoke-function in its stead.
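As a rough sketch of the xdmp:invoke-function alternative, assuming the map is shared by reference with the invoked function in the same way as in the spawn examples above, the “bar” step could be written so that the caller waits for it:

```xquery
xquery version "1.0-ml";

let $map := map:map()
(: xdmp:invoke-function runs the function and waits for it to finish,
   so the one-second sleep delays the caller instead of being skipped :)
let $bar := xdmp:invoke-function(function() {
  xdmp:sleep(1000),
  map:put($map, "bar", "I am lazy bar")
})
return map:contains($map, "bar")
```

Under that assumption, the check at the end should see the “bar” entry, because the invocation has completed by the time the map is inspected.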
Maps are also directly serializable to JSON using xdmp:to-json. In fact, map:map and json:object in MarkLogic are like cousins: they can be used interchangeably and support identity/casting between types. A fundamental difference, however, is that json:object maintains key order, but map:map does not. So in cases where you care about the ordering of the keys, you can use a json:object and all puts will preserve the order. As seen in the example below, you can compose a json:object, use map functions to populate it with data, and render the output as JSON. The following examples compose the same JSON structure using map:map and json:object:
let $json-object := map:map()
let $puts := (
  map:put($json-object, "name", "Gary Vidal"),
  map:put($json-object, "age", 40),
  map:put($json-object, "birthdate", xs:date("1974-09-09"))
)
return xdmp:to-json($json-object)

Returns:
{"birthdate":"1974-09-09", "name":"Gary Vidal", "age":40}

let $json-object := json:object()
...
return xdmp:to-json($json-object)

Returns:
{"name":"Gary Vidal", "age":40, "birthdate":"1974-09-09"}
In the example above, the key order is preserved with json:object, whereas with map:map it is not.
The documentation for map operators can be found at Map Operators.
Operator | Description |
---|---|
+ | Computes the union (distinct) of two maps (ex. $map1 + $map2 ). |
- | Computes the difference of two maps; think of it as set difference (ex. $map1 - $map2 ). This operator also works as a unary operator, so -$map1 has the keys and values of $map1 reversed. |
* | Computes the intersection of two maps (ex. $map1 * $map2 ), where only the keys present in both maps are returned. |
div | Computes inference, or simply a join. A div B consists of keys from map A and values from map B, where A’s value is equal to B’s key. |
mod | Equivalent to -A div B (ex. $map1 mod $map2 ). |
Now that we have reviewed some of the basic operators, let’s see them in action with some examples!
xquery version "1.0-ml";

let $map1 := map:new((
  map:entry("a", "a"),
  map:entry("b", "b")
))
let $map2 := map:new((
  map:entry("a", "b"),
  map:entry("b", "b"),
  map:entry("c", "c")
))
return $map1 + $map2
Returns:
<map:map xmlns:map="http://marklogic.com/xdmp/map" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="b">
    <map:value xsi:type="xs:string">b</map:value>
  </map:entry>
  <map:entry key="c">
    <map:value xsi:type="xs:string">c</map:value>
  </map:entry>
  <map:entry key="a">
    <map:value xsi:type="xs:string">a</map:value>
    <map:value xsi:type="xs:string">b</map:value>
  </map:entry>
</map:map>
As you can see, all keys from $map1 and $map2 are combined and only the distinct values are returned. It’s important to understand this distinction, because if you are counting the values after you perform the union, you will get the distinct union’s count, not a merge of the two maps where the duplicate values were repeated.
In the example below, we want to compute the difference between two maps.
let $map1 := map:new((
  map:entry("a", "a"),
  map:entry("b", "b"),
  map:entry("e", "e")
))
let $map2 := map:new((
  map:entry("a", "a"),
  map:entry("b", "b"),
  map:entry("c", "c"),
  map:entry("d", "d")
))
return $map1 - $map2
Returns:
<map:map xmlns:map="http://marklogic.com/xdmp/map" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="e">
    <map:value xsi:type="xs:string">e</map:value>
  </map:entry>
</map:map>
Wait! Why did it return only the entry for key 'e'? This is due to the ordering of the difference, which only returns the entries in $map1 that are not present in $map2. To compute all differences, you must do a bit more math, but the answer is quite simple.
($map1 - $map2) + ($map2 - $map1)
This returns keys (c, d, e).
Inversion of a map is what it sounds like: you are simply inverting the map:map so each value becomes a key and every key becomes a value. Since all keys are strings, you will lose the type if the values are non-string types. The string function is applied to all non-string values during inversion.
xquery version "1.0-ml";

let $map := map:new((
  map:entry("a", 1),
  map:entry("b", ("v1", "v2")),
  map:entry("c", function() {"Hello World"}),
  map:entry("d", <node>Some node</node>)
))
return -$map
Returns:
<map:map xmlns:map="http://marklogic.com/xdmp/map" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="v2">
    <map:value xsi:type="xs:string">b</map:value>
  </map:entry>
  <map:entry key="function() as item()*">
    <map:value xsi:type="xs:string">c</map:value>
  </map:entry>
  <map:entry key="<node>Some node</node>">
    <map:value xsi:type="xs:string">d</map:value>
  </map:entry>
  <map:entry key="1">
    <map:value xsi:type="xs:string">a</map:value>
  </map:entry>
  <map:entry key="v1">
    <map:value xsi:type="xs:string">b</map:value>
  </map:entry>
</map:map>
In the following example, we are assuming we have key/values present in both maps, and we only want those key/values that intersect.
let $map1 := map:new((
  map:entry("a", "a"),
  map:entry("b", "b"),
  map:entry("e", "e")
))
let $map2 := map:new((
  map:entry("a", "a"),
  map:entry("b", "b"),
  map:entry("c", "c"),
  map:entry("d", "d")
))
return $map1 * $map2
Returns keys (a, b).
It is important to note that the intersects operation is computed on key and value, so in cases where both maps share the same key, but not the same value for that key, then the keys do not intersect.
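To make that rule concrete, here is a small sketch with made-up data in which both maps share the keys a and b, but only b carries the same value in both. Following the rule described above, the intersection should contain only the key b.

```xquery
xquery version "1.0-ml";

let $map1 := map:new((
  map:entry("a", "a"),
  map:entry("b", "b")
))
(: Same keys as $map1, but key "a" holds a different value :)
let $map2 := map:new((
  map:entry("a", "x"),
  map:entry("b", "b")
))
(: List the keys that survive the key-and-value intersection :)
return map:keys($map1 * $map2)
```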
For inferencing/join, we will focus on a more practical example of joining students’ names to test scores. In the example below, each user is assigned an id noted as the value of the map:entry. Another map stores the id and all the scores for each test. You can see the scores are now joined directly to the name via its id value.
For a real-world use case, this data could easily be stored in MarkLogic as XML/JSON fragments, with range indexes enabled for (name, id, score). That is where cts:value-co-occurrences should be used to return a map as output for (name, id) and (id, score).
let $map1 := map:new((
  map:entry("jenny", "a1"),
  map:entry("bob"  , "b1"),
  map:entry("tom"  , "c1"),
  map:entry("rick" , "d1")
))
let $map2 := map:new((
  map:entry("a1", (90,95,100,88)),
  map:entry("b1", (77,68,82,60)),
  map:entry("c1", (0,0,85,89))
))
return $map1 div $map2
Returns
<map:map xmlns:map="http://marklogic.com/xdmp/map" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="jenny">
    <map:value xsi:type="xs:integer">90</map:value>
    <map:value xsi:type="xs:integer">95</map:value>
    <map:value xsi:type="xs:integer">100</map:value>
    <map:value xsi:type="xs:integer">88</map:value>
  </map:entry>
  <map:entry key="tom">
    <map:value xsi:type="xs:integer">0</map:value>
    <map:value xsi:type="xs:integer">0</map:value>
    <map:value xsi:type="xs:integer">85</map:value>
    <map:value xsi:type="xs:integer">89</map:value>
  </map:entry>
  <map:entry key="bob">
    <map:value xsi:type="xs:integer">77</map:value>
    <map:value xsi:type="xs:integer">68</map:value>
    <map:value xsi:type="xs:integer">82</map:value>
    <map:value xsi:type="xs:integer">60</map:value>
  </map:entry>
</map:map>
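The operator table describes mod as equivalent to -A div B. The following is a minimal sketch of that relationship, assuming the equivalence holds exactly as the table states it. The maps $ids-to-names and $ids-to-scores are hypothetical and simply restate part of the student data keyed by id. Under that reading, both expressions should produce a map from each name to the scores stored under the same id.

```xquery
xquery version "1.0-ml";

(: Hypothetical data: id -> name and id -> scores :)
let $ids-to-names := map:new((
  map:entry("a1", "jenny"),
  map:entry("b1", "bob")
))
let $ids-to-scores := map:new((
  map:entry("a1", (90, 95, 100, 88)),
  map:entry("b1", (77, 68, 82, 60))
))
return (
  (: Shorthand form :)
  $ids-to-names mod $ids-to-scores,
  (: Expanded form from the operator table: invert, then join :)
  (-$ids-to-names) div $ids-to-scores
)
```

The parentheses around the inversion are only there to make the evaluation order explicit.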
Now that we have learned a little bit about all the operators, let’s start to combine them to solve more complicated problems.
One problem encountered often is determining the difference between two structures. Let’s consider two documents that have similar structures, but have differences in ordering or values. We want to create a diff-gram that determines what inserts or updates need to occur between two documents. Using the difference and intersects operators, we can compose a complete diff-gram that runs quite efficiently. In the example below, we take two nodes, iterate each one, put each path inside a map:map, and then compute the difference using map operators.
xquery version "1.0-ml";

let $version1 :=
  <node>
    <last-modifed>2001-01-01</last-modifed>
    <title>Here is a title</title>
    <subtitle>Same ole title</subtitle>
    <author>Bob Franco</author>
    <author>Billy Bob Thornton</author>
    <added>I am added</added>
  </node>
let $version2 :=
  <node>
    <last-modifed>2001-01-12</last-modifed>
    <title>Here is a title</title>
    <subtitle>Same ole title</subtitle>
    <author>Billy Bob Thornton</author>
    <author>James Franco</author>
    <added1>I am added1</added1>
  </node>
let $version1-map := map:map()
let $version2-map := map:map()
let $_ := (
  (: Map each child element's path to its value :)
  $version1/element() ! map:put($version1-map, xdmp:path(.), fn:data(.)),
  $version2/element() ! map:put($version2-map, xdmp:path(.), fn:data(.))
)
let $same := $version1-map * $version2-map
let $inserts := $version1-map - $version2-map
let $deletes := $version2-map - $version1-map
return
  <diff>{(
    map:keys($same) ! <same path="{.}">{map:get($same,.)}</same>,
    map:keys($deletes) ! <delete path="{.}">{map:get($deletes,.)}</delete>,
    map:keys($inserts) ! <insert path="{.}">{map:get($inserts,.)}</insert>
  )}</diff>
Returns:
<diff>
  <same path="/node/subtitle">Same ole title</same>
  <same path="/node/title">Here is a title</same>
  <delete path="/node/last-modifed">2001-01-12</delete>
  <delete path="/node/added1">I am added1</delete>
  <delete path="/node/author[2]">James Franco</delete>
  <delete path="/node/author[1]">Billy Bob Thornton</delete>
  <insert path="/node/last-modifed">2001-01-01</insert>
  <insert path="/node/added">I am added</insert>
  <insert path="/node/author[2]">Billy Bob Thornton</insert>
  <insert path="/node/author[1]">Bob Franco</insert>
</diff>
When working with Excel (2007+) in MarkLogic, you will often need to convert between R1C1 and the index value of a column, and vice-versa. Calculating the column name over 255 columns for every row can be expensive, so computing this lookup table only once can drastically improve the performance of an application. In the example below, the table is only computed once, using the inversion (-) operator to create the reverse direction.
xquery version "1.0-ml";

(: Lookup table from column name (A, B, ..., AA, ...) to column index :)
declare variable $ALPHA-INDEX-MAP :=
  let $map := map:map()
  let $alpha := ("", (65 to 65+25 ) ! fn:codepoints-to-string(.))
  let $calcs :=
    for $c1 in $alpha
    for $c2 in $alpha
    for $c3 in $alpha[2 to fn:last()]
    where $c1 = "" or fn:not($c2 = "")
    return ($c1 || $c2 || $c3)
  let $_ :=
    for $col at $pos in $calcs
    return map:put($map, $col, $pos)
  return $map
;

(: Reverse lookup (index to column name), built by inverting the map :)
declare variable $INDEX-ALPHA-MAP := -$ALPHA-INDEX-MAP;

(: Which index corresponds to ZA? :)
map:get($ALPHA-INDEX-MAP, "ZA"),
(: Which alpha corresponds to 32? :)
map:get($INDEX-ALPHA-MAP, "32")
Since MarkLogic 7, maps support aggregate functions such as min/max/sum/avg. To perform aggregates, we will use built-in functions corresponding to the aggregate we want to apply and pass the map:map as the argument. Let’s look at our student scores example and assume that we are getting values from MarkLogic lexicon functions like cts:value-co-occurrences, illustrated in the sample code below:
let $student-scores := map:new((
  map:entry("jenny", (90,95,100,88)),
  map:entry("bobby", (77,68,82,60)),
  map:entry("rick", (0,0,85,89))
))
return $student-scores
Now we want to compute the average score per student:
fn:avg($student-scores)

<map:map xmlns:map="http://marklogic.com/xdmp/map" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="rick">
    <map:value xsi:type="xs:decimal">43.5</map:value>
  </map:entry>
  <map:entry key="jenny">
    <map:value xsi:type="xs:decimal">93.25</map:value>
  </map:entry>
  <map:entry key="bobby">
    <map:value xsi:type="xs:decimal">71.75</map:value>
  </map:entry>
</map:map>
Now let’s compute the max score per student:
fn:max($student-scores)

<map:map xmlns:map="http://marklogic.com/xdmp/map" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="rick">
    <map:value xsi:type="xs:integer">89</map:value>
  </map:entry>
  <map:entry key="jenny">
    <map:value xsi:type="xs:integer">100</map:value>
  </map:entry>
  <map:entry key="bobby">
    <map:value xsi:type="xs:integer">82</map:value>
  </map:entry>
</map:map>
The same can be done for min:
fn:min($student-scores)

<map:map xmlns:map="http://marklogic.com/xdmp/map" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="rick">
    <map:value xsi:type="xs:integer">0</map:value>
  </map:entry>
  <map:entry key="jenny">
    <map:value xsi:type="xs:integer">88</map:value>
  </map:entry>
  <map:entry key="bobby">
    <map:value xsi:type="xs:integer">60</map:value>
  </map:entry>
</map:map>
… and sum:
fn:sum($student-scores)

<map:map xmlns:map="http://marklogic.com/xdmp/map" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="rick">
    <map:value xsi:type="xs:integer">174</map:value>
  </map:entry>
  <map:entry key="jenny">
    <map:value xsi:type="xs:integer">373</map:value>
  </map:entry>
  <map:entry key="bobby">
    <map:value xsi:type="xs:integer">287</map:value>
  </map:entry>
</map:map>
And finally, count:
fn:count($student-scores) returns 1
Well that was not expected … or was it? The count function is ambiguous about what it should count. Should it count the values of the map by key, or the map itself? A simple solution is shown below; it satisfies our need to count the number of tests (although it is not as efficient as using an aggregate function):
map:new(
  map:keys($student-scores) ! map:entry(., fn:count(map:get($student-scores, .)))
)
Returns:
<map:map xmlns:map="http://marklogic.com/xdmp/map" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="rick">
    <map:value xsi:type="xs:unsignedLong">4</map:value>
  </map:entry>
  <map:entry key="jenny">
    <map:value xsi:type="xs:unsignedLong">4</map:value>
  </map:entry>
  <map:entry key="bobby">
    <map:value xsi:type="xs:unsignedLong">4</map:value>
  </map:entry>
</map:map>
I hope this helps you understand the potential power of using map:map operators and aggregates.
View all posts from Gary Vidal on the Progress blog.