Understanding map:map operators, aggregates and use cases

November 25, 2014 Data & AI, MarkLogic

This article contains a comprehensive discussion of maps and map operators. In a previous post, Returning Lexicon Values using XPath Expressions, I made a quick reference to map:map operators and demonstrated how to use them to compute the difference of two maps in order to filter results for processing. Here, I’ll provide a more in-depth discussion of how exactly maps work and delve deeper into the powerful features they provide.

What is a Map?

To begin, let’s formally define some basic constructs for how a map:map works. Maps are in-memory key/value structures, introduced in MarkLogic version 5. Out of the box, maps provide fast in-memory inserts, updates, lookups, and existence checks. Maps are also mutable structures, so you can change them in place without creating copies, as you would when changing XML node types. This lets operations execute very efficiently, but in-place mutation is a side effect, which is not common in functional programming languages like XQuery.

To review more of the basic operations you can perform with maps, refer to the MarkLogic map functions page.

Basic Map Operations

Here are a few basic operations you can do with map operators:

  • map:map – Creates a new map, or creates a map with data from an XML serialization of a map:map.
  • map:put – Puts a value by key into a map.
  • map:get – Gets a value from a map by key.
  • map:keys – Returns all the keys present in a map.
  • map:delete – Removes a value by key from a map.
  • map:count – Returns the count of the keys in the map.
  • map:clear – Clears the map of all key/values.
  • map:new – Creates a new map:map, but accepts a sequence of existing maps or map:entry(k,v) results. This is a very composable and convenient way to join multiple maps together.
  • map:entry – Creates a map:map with a single key/value entry.
  • map:contains – Returns true if the key exists in the map.
  • map:with – Updates a map, inserting a value into it at the given key.
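
As a minimal sketch of how several of these functions fit together (the keys and values here are made up purely for illustration):

```xquery
xquery version "1.0-ml";
(: Build a map from two entries, then mutate it in place. :)
let $m := map:new((
  map:entry("a", 1),
  map:entry("b", 2)
))
let $_ := (
  map:put($m, "c", 3),    (: add a new key :)
  map:delete($m, "a")     (: remove an existing key :)
)
return (
  map:contains($m, "a"),  (: is "a" still present? :)
  map:get($m, "c"),       (: the value stored under "c" :)
  map:count($m),          (: the number of keys left :)
  map:keys($m)            (: the keys, in no guaranteed order :)
)
```

Because the map is mutable, the puts and deletes change $m itself rather than producing a new copy.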

Lexicon Support for Maps

Maps are also supported as output for many lexicon-based functions, including:

  • scalar lexicon functions (cts:element-*-values, cts:values) – Return a map where the key and the value are the same.
  • value co-occurrence functions (cts:element-*-value-co-occurrences, cts:value-co-occurrences) – Return a map where the key is the first item of each tuple and the value is a sequence of the matching second items.
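
A hedged sketch of requesting map output from the lexicon functions by passing the "map" option. The element names name and id are hypothetical, and the query assumes matching element range indexes are already configured in your database:

```xquery
xquery version "1.0-ml";
(: Assumes element range indexes exist on the hypothetical
   elements <name> and <id>; adjust the QNames to your data. :)
let $names := cts:element-values(xs:QName("name"), (), "map")
let $name-to-id := cts:element-value-co-occurrences(
  xs:QName("name"), xs:QName("id"), "map")
return (
  map:count($names),       (: number of distinct name values :)
  map:keys($name-to-id)    (: each name that co-occurs with an id :)
)
```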

Example Using Maps

Storing Values, Nodes, and Functions

Let’s now walk through various examples of using maps, so we can get a better understanding of how and why to use them. This first example puts a series of different value types inside a map, then walks the keys to describe each value.

xquery version "1.0-ml";
let $map := map:map()
let $puts := (
  map:put($map, "a", "a"),
  map:put($map, "b", <node>Some node</node>),
  map:put($map, "c", (1,2,3,4,5)),
  map:put($map, "d", function() {"Hello World"})
)
for $key in map:keys($map)
return
  fn:concat("Key:", $key, " is ", xdmp:describe(map:get($map, $key)))

When executed in Query Console, the code above returns:

Key:c is (1, 2, 3, ...)
Key:b is <node>Some node</node>
Key:d is function() as item()*
Key:a is "a"

As you can see, a map can flexibly store values, nodes, and even functions.

Passing Maps by Reference

Another useful feature of maps is that they can be passed around by reference, which allows information to be shared between different modules/transactions while maintaining a single instance across them. In the example below, we are going to take attendance by allowing multiple spawned functions to add entries to a global map across separate transactions. In the final function, we will check if a value is present and answer accordingly.

xquery version "1.0-ml";
let $map := map:map()
let $foo := xdmp:spawn-function(function() {
  map:put($map, "foo", "Foo is here")
})
let $bar := xdmp:spawn-function(function() {
  map:put($map,"bar", "Bar is hear(yawn)")
})
let $baz := xdmp:spawn-function(function() {
  if(map:contains($map, "bar")) then 
    map:put($map, "baz", "Baz is here, only if bar is here.")
  else map:put($map, "baz", "Baz is here, but why is bar always late")
})
return
  $map

And this returns:

<map:map xmlns:map="http://marklogic.com/xdmp/map" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="foo">
    <map:value xsi:type="xs:string">Foo is here</map:value>
  </map:entry>
  <map:entry key="bar">
    <map:value xsi:type="xs:string">Bar is hear(yawn)</map:value>
  </map:entry>
  <map:entry key="baz">
    <map:value xsi:type="xs:string">Baz is here, only if bar is here.</map:value>
  </map:entry>
</map:map>

Be aware, however, that each spawn-function call is “non-blocking”, so the main query can return before all of the spawned functions have completed. In the next example, we will have the “bar” function sleep for one second before it executes its map:put.

...
let $bar := xdmp:spawn-function(function() {
  xdmp:sleep(1000),
  map:put($map,"bar", "I am lazy bar")
})
...
return
  $map

Which returns…

<map:map xmlns:map="http://marklogic.com/xdmp/map" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="foo">
    <map:value xsi:type="xs:string">Foo is here</map:value>
  </map:entry>
  <map:entry key="baz">
    <map:value xsi:type="xs:string">Baz is here, but why is bar always late</map:value>
  </map:entry>
</map:map>

(As you can see, “baz” is pretty upset that “bar” is not present.)

To provide “blocking” behavior, you can pass the result=true option to xdmp:spawn-function or use xdmp:invoke-function in its stead.
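
A minimal sketch of the xdmp:invoke-function variant, assuming (as with the spawned functions above) that the invoked function sees the same map instance. Since the call does not return until the function has finished, the later check should find “bar” even though it sleeps first:

```xquery
xquery version "1.0-ml";
let $map := map:map()
(: xdmp:invoke-function waits for the function to complete. :)
let $bar := xdmp:invoke-function(function() {
  xdmp:sleep(1000),
  map:put($map, "bar", "I am lazy bar")
})
return
  (: checked only after the invoked function has returned :)
  map:contains($map, "bar")
```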

Maps and JSON

Maps are also directly serializable to JSON using xdmp:to-json. In fact, map:map and json:object in MarkLogic are like cousins: they can be used interchangeably and support casting between the two types. A fundamental difference, however, is that json:object maintains key order, but map:map does not. So in cases where you care about the ordering of the keys, you can use a json:object and all puts will preserve the order. As seen in the example below, you can compose a json:object, use map functions to populate it with data, and render the output to JSON. The following example composes the same JSON structure using map:map and then json:object:

let $json-object := map:map()
let $puts := (
  map:put($json-object, "name", "Gary Vidal"),
  map:put($json-object, "age", 40),
  map:put($json-object, "birthdate", xs:date("1974-09-09"))
)
return
  xdmp:to-json($json-object)

Returns:

{"birthdate":"1974-09-09", "name":"Gary Vidal", "age":40}

let $json-object := json:object()
...
return
  xdmp:to-json($json-object)

Returns:
{"name":"Gary Vidal", "age":40, "birthdate":"1974-09-09"}

In the example above, the key order is preserved when using json:object, whereas with map:map it is not.

Map Operators

The documentation for map operators can be found at Map Operators.

Operator | Description
+ | Computes the union (distinct) of two maps (ex. $map1 + $map2).
- | Computes the difference of two maps; think of it as a set difference (ex. $map1 - $map2). This operator also works as a unary operator, so -$map1 returns $map1 with its keys and values reversed.
* | Computes the intersection of two maps (ex. $map1 * $map2), where only the keys present in both maps are returned.
div | Computes inference, or simply a join (ex. $map1 div $map2). The result consists of keys from $map1 and values from $map2, where a value in $map1 is equal to a key in $map2.
mod | Equivalent to -$map1 div $map2 (ex. $map1 mod $map2).

Now that we have reviewed some of the basic operators, let’s see them in action with some examples!

Union (Distinct) ($map + $map)

xquery version "1.0-ml";
let $map1 := map:new((
   map:entry("a", "a"),
   map:entry("b", "b")
))
let $map2 := map:new((
  map:entry("a", "b"),
  map:entry("b", "b"),
  map:entry("c", "c")
))
return
  $map1 + $map2

Returns:

<map:map xmlns:map="http://marklogic.com/xdmp/map" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="b">
    <map:value xsi:type="xs:string">b</map:value>
  </map:entry>
  <map:entry key="c">
    <map:value xsi:type="xs:string">c</map:value>
  </map:entry>
  <map:entry key="a">
    <map:value xsi:type="xs:string">a</map:value>
    <map:value xsi:type="xs:string">b</map:value>
  </map:entry>
</map:map>

As you can see, all keys from $map1 and $map2 are combined and only the distinct values are returned. It’s important to understand this distinction, because if you are counting the values after you perform the union, you will get the distinct union’s count, not a merge of the two maps where the duplicate values were repeated.
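
To make the counting caveat concrete, here is a small sketch that reuses the two maps above and counts the values stored under individual keys after the union:

```xquery
xquery version "1.0-ml";
let $map1 := map:new((
  map:entry("a", "a"),
  map:entry("b", "b")
))
let $map2 := map:new((
  map:entry("a", "b"),
  map:entry("b", "b"),
  map:entry("c", "c")
))
let $union := $map1 + $map2
return (
  fn:count(map:get($union, "a")),  (: "a" and "b" are distinct values :)
  fn:count(map:get($union, "b"))   (: "b" is in both maps but kept once :)
)
```

Going by the union output shown above, key a should report two values and key b only one, even though “b” was supplied for key b by both maps.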

Difference ($map - $map)

In the example below, we want to compute the difference between two maps.

let $map1 := map:new((
   map:entry("a", "a"),
   map:entry("b", "b"),
   map:entry("e", "e")
))
let $map2 := map:new((
  map:entry("a", "a"),
  map:entry("b", "b"),
  map:entry("c", "c"),
  map:entry("d", "d")

))
return
  $map1 - $map2

Returns:

<map:map xmlns:map="http://marklogic.com/xdmp/map" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="e">
    <map:value xsi:type="xs:string">e</map:value>
  </map:entry>
</map:map>

Wait! Why did it return only the entry for key 'e'? The difference operator is order-sensitive: it returns only the keys in $map1 that are not present in $map2. To compute all differences in both directions, you need a bit more math, but the answer is quite simple:

($map1 - $map2) + ($map2 - $map1)

This returns keys (c, d, e).
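
Put together as a complete query, a sketch of the two-way difference looks like this (the variable names $only-in-1 and $only-in-2 are just illustrative):

```xquery
xquery version "1.0-ml";
let $map1 := map:new((
  map:entry("a", "a"),
  map:entry("b", "b"),
  map:entry("e", "e")
))
let $map2 := map:new((
  map:entry("a", "a"),
  map:entry("b", "b"),
  map:entry("c", "c"),
  map:entry("d", "d")
))
let $only-in-1 := $map1 - $map2   (: entries unique to $map1 :)
let $only-in-2 := $map2 - $map1   (: entries unique to $map2 :)
return
  (: union of both one-way differences; key order is not guaranteed :)
  map:keys($only-in-1 + $only-in-2)
```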

Inversion (-$map)

Inversion of a map is what it sounds like: you are simply inverting the map:map so that each value becomes a key and every key becomes a value. Since all keys are strings, you will lose the type if the values are non-string types. A string representation will be computed for all non-string values during inversion.

xquery version "1.0-ml";
let $map := map:new((
  map:entry("a", 1),
  map:entry("b", ("v1", "v2")),
  map:entry("c", function() {"Hello World"}),
  map:entry("d", <node>Some node</node>)
))
return -$map

Returns:

<map:map xmlns:map="http://marklogic.com/xdmp/map"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="v2">
    <map:value xsi:type="xs:string">b</map:value>
  </map:entry>
  <map:entry key="function() as item()*">
    <map:value xsi:type="xs:string">c</map:value>
  </map:entry>
  <map:entry key="&lt;node&gt;Some node&lt;/node&gt;">
    <map:value xsi:type="xs:string">d</map:value>
  </map:entry>
  <map:entry key="1">
    <map:value xsi:type="xs:string">a</map:value>
  </map:entry>
  <map:entry key="v1">
    <map:value xsi:type="xs:string">b</map:value>
  </map:entry>
</map:map>

Intersects Operator ($map * $map)

In the following example, we are assuming we have key/values present in both maps, and we only want those key/values that intersect.

let $map1 := map:new((
   map:entry("a", "a"),
   map:entry("b", "b"),
   map:entry("e", "e")
))
let $map2 := map:new((
  map:entry("a", "a"),
  map:entry("b", "b"),
  map:entry("c", "c"),
  map:entry("d", "d")

))
return
   $map1 * $map2

Returns keys (a, b).

It is important to note that the intersects operation is computed on key and value, so in cases where both maps share the same key, but not the same value for that key, then the keys do not intersect.
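
A small sketch of that case, using made-up values where both maps share key a but store different values under it:

```xquery
xquery version "1.0-ml";
let $map1 := map:new((
  map:entry("a", "a"),
  map:entry("b", "b")
))
let $map2 := map:new((
  map:entry("a", "x"),   (: same key as $map1, different value :)
  map:entry("b", "b")    (: same key and same value :)
))
return
  map:keys($map1 * $map2)
```

Based on the behavior described above, only key b would be expected in the result, because key a does not match on value.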

Inference/Join Operator ($map1 div $map2)

For inferencing/join, we will focus on a more practical example of joining students’ names to test scores. In the example below, each user is assigned an id noted as the value of the map:entry. Another map stores the id and all the scores for each test. You can see the scores are now joined directly to the name via its id value.

For a real-world use case, this data could easily be stored in MarkLogic as XML/JSON fragments, with range indexes enabled for (name, id, score). This is where cts:value-co-occurrences should be used to return a map as output for (name, id) and (id, score).

let $map1 := map:new((
   map:entry("jenny", "a1"),
   map:entry("bob"  , "b1"),
   map:entry("tom"  , "c1"),
   map:entry("rick" , "d1")
))

let $map2 := map:new((
  map:entry("a1", (90,95,100,88)),
  map:entry("b1", (77,68,82,60)),
  map:entry("c1", (0,0,85,89))
))
return
 $map1 div $map2

Returns:

<map:map xmlns:map="http://marklogic.com/xdmp/map" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="jenny">
    <map:value xsi:type="xs:integer">90</map:value>
    <map:value xsi:type="xs:integer">95</map:value>
    <map:value xsi:type="xs:integer">100</map:value>
    <map:value xsi:type="xs:integer">88</map:value>
  </map:entry>
  <map:entry key="tom">
    <map:value xsi:type="xs:integer">0</map:value>
    <map:value xsi:type="xs:integer">0</map:value>
    <map:value xsi:type="xs:integer">85</map:value>
    <map:value xsi:type="xs:integer">89</map:value>
  </map:entry>
  <map:entry key="bob">
    <map:value xsi:type="xs:integer">77</map:value>
    <map:value xsi:type="xs:integer">68</map:value>
    <map:value xsi:type="xs:integer">82</map:value>
    <map:value xsi:type="xs:integer">60</map:value>
  </map:entry>
</map:map>

Solving More Complex Issues

Now that we have learned a little about each of the operators, let’s start combining them to solve more complicated problems.

Diffing Data

One problem encountered often is determining the difference between two structures. Let’s consider two documents that have similar structures, but have differences in ordering or values. We want to create a diff-gram that determines what inserts or updates need to occur between two documents. Using the difference and intersects operators, we can compose a complete diff-gram that runs quite efficiently. In the example below, we take two nodes, iterate over the child elements of each one, put each path inside a map:map, and then compute the difference using map operators.

xquery version "1.0-ml";
let $version1 := 
  <node>
    <last-modifed>2001-01-01</last-modifed>
    <title>Here is a title</title>
    <subtitle>Same ole title</subtitle>
    <author>Bob Franco</author>
    <author>Billy Bob Thornton</author>
    <added>I am added</added>  
  </node>
let $version2 := 
  <node>
    <last-modifed>2001-01-12</last-modifed>
    <title>Here is a title</title>
    <subtitle>Same ole title</subtitle>
    <author>Billy Bob Thornton</author>
    <author>James Franco</author>
    <added1>I am added1</added1>
  </node>
let $version1-map := map:map()
let $version2-map := map:map()
let $_ := ( 
  (:Map values paths to maps:)
  $version1/element() ! map:put($version1-map, xdmp:path(.), fn:data(.)),
  $version2/element() ! map:put($version2-map, xdmp:path(.), fn:data(.))
)
let $same    := $version1-map * $version2-map
let $inserts := $version1-map - $version2-map
let $deletes := $version2-map - $version1-map
return 
  <diff>{(
    map:keys($same)    ! <same path="{.}">{map:get($same,.)}</same>,
    map:keys($deletes) ! <delete path="{.}">{map:get($deletes,.)}</delete>,
    map:keys($inserts) ! <insert path="{.}">{map:get($inserts,.)}</insert>
  )}</diff>

Returns:

<diff>
  <same path="/node/subtitle">Same ole title</same>
  <same path="/node/title">Here is a title</same>
  <delete path="/node/last-modifed">2001-01-12</delete>
  <delete path="/node/added1">I am added1</delete>
  <delete path="/node/author[2]">James Franco</delete>
  <delete path="/node/author[1]">Billy Bob Thornton</delete>
  <insert path="/node/last-modifed">2001-01-01</insert>
  <insert path="/node/added">I am added</insert>
  <insert path="/node/author[2]">Billy Bob Thornton</insert>
  <insert path="/node/author[1]">Bob Franco</insert>
</diff>
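
Paths that show up as both an insert and a delete are really updates. As a hedged sketch of one way to fold them together, the query below starts from two small hand-built maps that mirror part of the $inserts and $deletes results above; the update element and its version1/version2 attributes are hypothetical names chosen for illustration:

```xquery
xquery version "1.0-ml";
(: Hand-built stand-ins for part of the $inserts and $deletes maps. :)
let $inserts := map:new((
  map:entry("/node/last-modifed", "2001-01-01"),
  map:entry("/node/added", "I am added")
))
let $deletes := map:new((
  map:entry("/node/last-modifed", "2001-01-12"),
  map:entry("/node/added1", "I am added1")
))
(: A path present in both maps changed value between the versions. :)
for $path in map:keys($inserts)[map:contains($deletes, .)]
return
  <update path="{$path}"
          version1="{map:get($inserts, $path)}"
          version2="{map:get($deletes, $path)}"/>
```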

Creating Fast 2-Way Lookup Tables

When working with Excel (2007+) in MarkLogic, you will often need to convert between R1C1 and the index value of a column, and vice-versa. Calculating the column name over 255 columns for every row can be expensive, so computing this lookup table only once can drastically improve the performance of an application. In the example below, the table is only computed once, using the inversion (-) operator to create the reverse direction.

xquery version "1.0-ml";
declare variable $ALPHA-INDEX-MAP := 
  let $map := map:map()
  let $alpha := ("", (65 to 65+25 ) ! fn:codepoints-to-string(.))
  let $calcs := 
    for $c1 in $alpha
    for $c2 in $alpha
    for $c3 in $alpha[2 to fn:last()]
    where $c1 = "" or fn:not($c2 = "")
    return 
      ($c1 || $c2|| $c3)
  let $_ := for $col at $pos in $calcs return map:put($map, $col, $pos)
  return
    $map
;  
declare variable $INDEX-ALPHA-MAP := -$ALPHA-INDEX-MAP;

(: Which index corresponds to ZA? :)
map:get($ALPHA-INDEX-MAP, "ZA"),
(: Which alpha corresponds to 32? :)
map:get($INDEX-ALPHA-MAP, "32")

Aggregating map:map Data

Since MarkLogic 7, maps support aggregate functions such as min/max/sum/avg. To perform aggregates, we will use built-in functions corresponding to the aggregate we want to apply and pass the map:map as the argument. Let’s look at our student scores example and assume that we are getting values from MarkLogic lexicon functions like cts:value-co-occurrences, illustrated in the sample code below:

let $student-scores := map:new((
  map:entry("jenny", (90,95,100,88)),
  map:entry("bobby", (77,68,82,60)),
  map:entry("rick",  (0,0,85,89))
))
return
  $student-scores

Now we want to compute the average score per student:

fn:avg($student-scores)

<map:map xmlns:map="http://marklogic.com/xdmp/map"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="rick">
    <map:value xsi:type="xs:decimal">43.5</map:value>
  </map:entry>
  <map:entry key="jenny">
    <map:value xsi:type="xs:decimal">93.25</map:value>
  </map:entry>
  <map:entry key="bobby">
    <map:value xsi:type="xs:decimal">71.75</map:value>
  </map:entry>
</map:map>

Now let’s compute the max score per student:

fn:max($student-scores)

<map:map xmlns:map="http://marklogic.com/xdmp/map"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="rick">
    <map:value xsi:type="xs:integer">89</map:value>
  </map:entry>
  <map:entry key="jenny">
    <map:value xsi:type="xs:integer">100</map:value>
  </map:entry>
  <map:entry key="bobby">
    <map:value xsi:type="xs:integer">82</map:value>
  </map:entry>
</map:map>

The same can be done for min:

fn:min($student-scores)

<map:map xmlns:map="http://marklogic.com/xdmp/map"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="rick">
    <map:value xsi:type="xs:integer">0</map:value>
  </map:entry>
  <map:entry key="jenny">
    <map:value xsi:type="xs:integer">88</map:value>
  </map:entry>
  <map:entry key="bobby">
    <map:value xsi:type="xs:integer">60</map:value>
  </map:entry>
</map:map>

… and sum:

fn:sum($student-scores)

<map:map xmlns:map="http://marklogic.com/xdmp/map"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="rick">
    <map:value xsi:type="xs:integer">174</map:value>
  </map:entry>
  <map:entry key="jenny">
    <map:value xsi:type="xs:integer">373</map:value>
  </map:entry>
  <map:entry key="bobby">
    <map:value xsi:type="xs:integer">287</map:value>
  </map:entry>
</map:map>

And finally, count:

fn:count($student-scores)

Returns:

1

Well, that was not expected … or was it? The count function is ambiguous about what it should count. Should it count the values of the map by key, or the map itself? A simple solution is shown below. It satisfies our need to count the number of tests per student, although it is not as efficient as using an aggregate function:

map:new(
  map:keys($student-scores) ! map:entry(., fn:count(map:get($student-scores, .)))
)

Returns:

<map:map xmlns:map="http://marklogic.com/xdmp/map"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="rick">
    <map:value xsi:type="xs:unsignedLong">4</map:value>
  </map:entry>
  <map:entry key="jenny">
    <map:value xsi:type="xs:unsignedLong">4</map:value>
  </map:entry>
  <map:entry key="bobby">
    <map:value xsi:type="xs:unsignedLong">4</map:value>
  </map:entry>
</map:map>
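
Aggregates can also be combined with the operators covered earlier. The sketch below joins names to scores with div and then averages the joined map; it assumes that fn:avg accepts the result of div just like any other map:map, and the variable names are illustrative:

```xquery
xquery version "1.0-ml";
let $student-ids := map:new((
  map:entry("jenny", "a1"),
  map:entry("bob",   "b1"),
  map:entry("tom",   "c1")
))
let $scores-by-id := map:new((
  map:entry("a1", (90,95,100,88)),
  map:entry("b1", (77,68,82,60)),
  map:entry("c1", (0,0,85,89))
))
(: Join: keys from $student-ids, values from $scores-by-id. :)
let $scores-by-name := $student-ids div $scores-by-id
return
  (: one average per student name :)
  fn:avg($scores-by-name)
```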

I hope this helps you understand the potential power of using map:map operators and aggregates.

Gary Vidal