Punctuation in XPath, part 2: slash (“/”)

May 31, 2011 Data & AI, MarkLogic

Posts in this series:

  • Punctuation in XPath, part 1: dot (“.”)
  • Punctuation in XPath, part 2: slash (“/”)
  • Punctuation in XPath, part 3: “@” and “..”
  • Punctuation in XPath, part 4: predicates (“[…]”)
  • Punctuation in XPath, part 5: “//”

The slash operator (“/”) in XPath connects the steps of a path expression. A path expression can return a sequence of nodes or a sequence of atomic values (but not a mix of both). Let’s look at some examples, based on the following simple document (go ahead and enter this into CQ):

declare variable $doc := document {
  <groups>
    <group>
      <item>first item</item>
      <item>second item</item>
    </group>
    <group>
      <item>third item</item>
      <item>fourth item</item>
    </group>
  </groups>
};

Given the above declaration, the following four-step expression will return a node sequence containing four <item> elements (which you can verify if you add this to the text box in CQ and then click the “Text” button):

$doc/groups/group/item

Path expressions can also return sequences of atomic values. The following expression returns a sequence of strings (“first”, “second”, “third”, “fourth”). Its last step does this by calling a function on “.”, the context item expression.

$doc/groups/group/item/substring-before(.,' item')

Restrictions on “/”

There are a few restrictions on the “/” operator you should be aware of. Firstly, you can’t apply “/” to a sequence of atomic values. For example, if you wanted to convert the above sequence to upper-case, you might be tempted to write the following (try this in CQ and see what happens):

$doc/groups/group/item/substring-before(.,' item')/upper-case(.) (:illegal:)

To do this, you’d need to use a “for” expression instead:

for $str in $doc/groups/group/item/substring-before(.,' item')
return upper-case($str)

The upshot is that only the last (rightmost) step in a path expression can return a sequence of atomic values (as opposed to nodes). Similarly, you’ll get an error if you try to do this:

(1 to 3)/concat('#',.)

This returns an error: XDMP-NOTANODE: (err:XPTY0019) (1 to 3)/fn:concat("#", .) -- 1 is not a node

Again, you’d have to use a “for” expression instead:

for $n in (1 to 3) return concat('#',$n)
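If your processor supports XQuery 3.0 or later, the simple map operator (“!”) is another option. This is a hedged aside, not part of the XPath 2.0 / XQuery 1.0 rules described here, and whether your MarkLogic version accepts it is an assumption you should verify. A minimal sketch:

```xquery
xquery version "3.0";

(: "!" evaluates the right-hand side once per item on the left. :)
(: Unlike "/", the left-hand items may be atomic values, and the :)
(: results are neither re-sorted nor de-duplicated.              :)
(1 to 3) ! concat('#', .)
```

On a 3.0 processor this yields the same three strings as the “for” expression above: “#1”, “#2”, “#3”.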

Another restriction on “/” is that the expression as a whole may not return a mix of nodes and atomic values; it has to be one or the other. For example, you can’t use a path expression to return a sequence of pairs of items and their string-values:

$doc/groups/group/item/(., string(.))    (:illegal:)

As before, you need to use a “for” expression instead:

for $item in $doc/groups/group/item return ($item, string($item))

What does “/” actually do?

Each path expression step (everything between the “/” marks) is evaluated once for each of the nodes returned by the expression to its left. Given our original example:

$doc/groups/group/item

You can expand this out to an equivalent expression using “for” (in this case, we’re still using “/” but only against one node at a time):

for $step1 in $doc return                     (: $step1 bound once :)
  for $step2 in $step1/groups return          (: $step2 bound once :)
    for $step3 in $step2/group return         (: $step3 bound twice :)
      for $step4 in $step3/item return $step4 (: $step4 bound four times :)

The “$doc”, “groups”, and “group” steps are each evaluated once, while the “item” step is evaluated twice (once for each <group>), yielding a total of four <item> elements.
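To convince yourself that the expansion really is equivalent for this document, you can compare the two results node by node. The sketch below is self-contained; the variable names $path and $expanded are just illustrative:

```xquery
xquery version "1.0";

declare variable $doc := document {
  <groups>
    <group>
      <item>first item</item>
      <item>second item</item>
    </group>
    <group>
      <item>third item</item>
      <item>fourth item</item>
    </group>
  </groups>
};

let $path := $doc/groups/group/item
let $expanded :=
  for $step1 in $doc return
    for $step2 in $step1/groups return
      for $step3 in $step2/group return
        for $step4 in $step3/item return $step4
return (
  count($path),       (: 4 :)
  count($expanded),   (: 4 :)
  (: "is" tests node identity, so this checks same nodes, same order :)
  every $i in 1 to count($path) satisfies $path[$i] is $expanded[$i]
)
```

The last item is true here only because the nested “for” happens to visit the nodes in document order and never visits one twice; the next paragraphs explain why that is not guaranteed in general.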

However, “/” cannot simply be thought of as shorthand for something else. Not only does it have restrictions on its use (noted above), it also has some additional behavior:

  • it sorts each node sequence in document order, and
  • it removes duplicate node references.

We can create arbitrary node sequences in XPath, using the comma operator. The following sequence contains duplicates and is not in document order:

($doc/groups/group[2],
 $doc/groups/group[1],
 $doc/groups/group[1])

If we do nothing more than apply the “/” operator to this sequence, the result will be re-sorted in document order, with duplicates removed:

($doc/groups/group[2],
 $doc/groups/group[1],
 $doc/groups/group[1])/.

The innocuous-looking “/.” at the end has the effect of removing duplicates and sorting the result in document order.

The takeaway is that if you use “/” and the result is a node sequence, you can be assured that the sequence will never contain the same node more than once. Also, the nodes will be sorted in document order. (The exception in XQuery is when you have set the ordering mode to “unordered”, in which case the resulting order is implementation-dependent. MarkLogic Server’s default ordering mode is “ordered”.)
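The re-sorting and de-duplication can be checked directly by counting the nodes before and after applying “/.”. This is a small sketch under the default “ordered” mode; $seq and $sorted are illustrative names:

```xquery
xquery version "1.0";

declare variable $doc := document {
  <groups>
    <group>
      <item>first item</item>
      <item>second item</item>
    </group>
    <group>
      <item>third item</item>
      <item>fourth item</item>
    </group>
  </groups>
};

let $seq := ($doc/groups/group[2],
             $doc/groups/group[1],
             $doc/groups/group[1])
let $sorted := $seq/.
return (
  count($seq),                          (: 3: the comma operator keeps duplicates :)
  count($sorted),                       (: 2: "/" removed the duplicate :)
  $sorted[1] is $doc/groups/group[1],   (: true: document order restored :)
  $sorted[2] is $doc/groups/group[2]    (: true :)
)
```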

“/” as an alternative to “for”

Since arbitrary expressions can occur as path expression steps, you can use “/” to iterate over a sequence of nodes and perform some action, such as creating new nodes or updating the database. The following expression creates four new <result> elements, one for each <item> element:

$doc/groups/group/item/(<result>{string(.)}</result>)

Here’s a pattern I’ve used several times:

xdmp:xslt-invoke("create-docs.xsl",$doc)/xdmp:document-insert(base-uri(.),.)

The xdmp:xslt-invoke() function returns a sequence of document nodes (which can each be associated with an output URI using <xsl:result-document>). These are then each inserted into the database using xdmp:document-insert().

In both of the above cases, a “for” expression could have been used instead, but it’s nice to know that “/” can be used too.
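For comparison, here is a sketch of the “for” form of the <result> example, written as a self-contained query:

```xquery
xquery version "1.0";

declare variable $doc := document {
  <groups>
    <group>
      <item>first item</item>
      <item>second item</item>
    </group>
    <group>
      <item>third item</item>
      <item>fourth item</item>
    </group>
  </groups>
};

(: One new <result> element per <item>, returned in iteration order :)
for $item in $doc/groups/group/item
return <result>{string($item)}</result>
```

The xdmp:xslt-invoke() pattern can be rewritten the same way, binding each returned document to a variable and passing it to xdmp:document-insert(). One difference worth keeping in mind: the “/” form returns the new elements in document order, and because each <result> belongs to its own tree, the XQuery specification leaves their relative order implementation-dependent. The “for” form returns them in iteration order.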

Leading slash

The “/” operator can also occur by itself or at the beginning of a path expression. In either of these cases, it has a special meaning: “the root of the current document” a.k.a. “the document node of the document containing the context node.” (This makes sense when you consider that “/” means the root of the file system in Unix-style file paths, the original inspiration for XPath.)

To be even more precise, “/” by itself can be thought of as an abbreviation for this:

(fn:root(self::node()) treat as document-node())

This means that if the context node is not defined (or if it doesn’t have a document node ancestor), then it’s an error to use “/” by itself (but keep reading for an exception to this that’s specific to MarkLogic Server).

And “/” at the beginning of an expression is short for the following (same as above except with a trailing slash):

(fn:root(self::node()) treat as document-node())/
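These expansions can be checked with the “is” operator, which tests node identity. The following sketch picks one <item> as the context node ($item is an illustrative name) and compares three ways of reaching its document node:

```xquery
xquery version "1.0";

declare variable $doc := document {
  <groups>
    <group>
      <item>first item</item>
      <item>second item</item>
    </group>
  </groups>
};

let $item := ($doc/groups/group/item)[1]
return (
  $item/(/) is $doc,      (: true: "/" as a step, parenthesized :)
  $item/root(.) is $doc,  (: true: fn:root() on the context node :)
  (: true: the long form that "/" abbreviates :)
  $item/(fn:root(self::node()) treat as document-node()) is $doc
)
```

If the context node lived in a tree whose root is an element rather than a document node, the “treat as” would fail, which is the error case described above.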

Another way of thinking about this is that /foo is short for (/)/foo, and thus it’s also an error to use “/” at the beginning of an expression when the context node is not defined. To prove this, enter the following in CQ:

xquery version "1.0";
/foo

However, MarkLogic Server, in its “1.0-ml” implementation of XQuery, provides a convenient shorthand that makes the above expression legal. If we change the XQuery version declaration, we’ll see different behavior:

xquery version "1.0-ml";
/foo

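With no context node, “1.0-ml” evaluates the leading slash against all documents in the database instead of raising an error.

A leading slash can also appear as a step inside a longer path, in an expression such as $doc/groups/group/item/(/). That is not very useful, but it’s a long-winded way of returning the same result as “$doc” by itself. In this case, the expression “/” (parenthesized so that it can occur as a step expression) is evaluated four times (once for each <item>), yielding the same document node each time; since duplicates are removed, the result contains just the one document node that $doc is bound to.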

Summary

To summarize, “/” is what you use to build a path expression. A path expression:

  • can return a sequence of nodes
  • can return a sequence of atomic values
  • may not return a mixture of both
  • returns nodes with duplicates removed, and in document order (unless the XQuery ordering mode is “unordered”)

A leading slash returns the root of the current document, unless there is no current document (context node) and you’re using 1.0-ml, in which case it returns all documents in the database.

Evan Lenz