Posts in this series:
Punctuation in XPath, part 1: dot (“.”)
Punctuation in XPath, part 2: slash (“/”)
Punctuation in XPath, part 3: “@” and “..”
Punctuation in XPath, part 4: predicates (“[…]”)
Punctuation in XPath, part 5: “//”
We’ve already seen some examples of predicates that use square brackets (“[…]”). In this post, we’ll look at exactly how they work, using the following sample document:
declare variable $doc := document {
  <people>
    <group>
      <person>Peter</person>
      <person>Paul</person>
      <person>Mary</person>
    </group>
    <group>
      <person>June</person>
      <person>Ward</person>
      <person>Beaver</person>
    </group>
  </people>
};
Predicates are used to filter a sequence based on some test. Consider the following expression:
$doc/people/group/person[. eq 'June']
This expression selects all <person> elements and then filters out those whose string-value is not equal to “June”. The test expression . eq 'June' must return true for the node to be included in the final result.
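As a rough illustration of the filtering described above, the following sketch compares the predicate with a FLWOR expression that applies the same test to each item. The variable names $filtered and $looped are made up for this example, and the equivalence only holds for a non-positional test like this one.

```xquery
declare variable $doc := document {
  <people>
    <group>
      <person>Peter</person>
      <person>Paul</person>
      <person>Mary</person>
    </group>
    <group>
      <person>June</person>
      <person>Ward</person>
      <person>Beaver</person>
    </group>
  </people>
};

(: Predicate form from the text: keeps only the items for which the test is true :)
let $filtered := $doc/people/group/person[. eq 'June']

(: Hypothetical FLWOR counterpart: the same test, applied item by item :)
let $looped :=
  for $p in $doc/people/group/person
  where $p eq 'June'
  return $p

(: Both select the single <person>June</person> element :)
return (count($filtered), count($looped), string($filtered))
```

Inside the predicate, “.” refers to whichever <person> is currently being tested, which is the role $p plays in the FLWOR version.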
Positional Predicates
Predicates can also be used to select nodes at a particular position within the sequence. For example, this expression selects each first <person> child of its parent:
$doc/people/group/person[1]
In this case, since there are two <group> elements, we end up with two people in the result: Peter and June. As you can see, a number value in a predicate is treated differently than a boolean. If the test expression returns a number (as in the above case), then the predicate is interpreted like this:
$doc/people/group/person[position() eq 1]
However, you shouldn’t think of “[1]” merely as syntax sugar for “[position() eq 1]”. Any expression that returns a number is evaluated this way. For example, the number could be returned by a function call or stored in a variable, as in this case:
$doc/people/group/person[$var]
If the value of “$var” is a number, then it is treated as a positional predicate. If it’s anything else, then it’s treated like a normal test expression, using the normal rules for converting values to a boolean (its effective boolean value). For example, an empty string and an empty sequence are both converted to false.
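To make the distinction concrete, here is a small sketch that binds a few different values and uses each one as a predicate. The variable names ($n, $s, $empty, $none) are assumptions chosen for this example; the comments note how many <person> elements each predicate keeps.

```xquery
declare variable $doc := document {
  <people>
    <group>
      <person>Peter</person>
      <person>Paul</person>
      <person>Mary</person>
    </group>
    <group>
      <person>June</person>
      <person>Ward</person>
      <person>Beaver</person>
    </group>
  </people>
};

let $n := 2        (: a number: treated as a position :)
let $s := '2'      (: a non-empty string, not a number: converts to true :)
let $empty := ''   (: an empty string: converts to false :)
let $none := ()    (: an empty sequence: converts to false :)
return (
  count($doc/people/group/person[$n]),      (: 2, namely Paul and Ward :)
  count($doc/people/group/person[$s]),      (: 6, every person passes the test :)
  count($doc/people/group/person[$empty]),  (: 0 :)
  count($doc/people/group/person[$none])    (: 0 :)
)
```

Note that the string '2' is not treated as a position. Only a value of a numeric type triggers the positional interpretation.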
What if you only want the first <person> among all the <person> elements in the document, rather than every first child? In that case, you’d have to apply the predicate to the whole expression to its left ($doc/people/group/person), rather than just the last step (person). This can be done by using parentheses:
($doc/people/group/person)[1]
In this case, the predicate is no longer a part of the “person” axis step. Instead, it filters the entire expression to its left, returning only Peter.
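The two forms can be compared side by side. This is only a sketch; the variable names $each-first and $overall-first are invented for the example.

```xquery
declare variable $doc := document {
  <people>
    <group>
      <person>Peter</person>
      <person>Paul</person>
      <person>Mary</person>
    </group>
    <group>
      <person>June</person>
      <person>Ward</person>
      <person>Beaver</person>
    </group>
  </people>
};

(: Predicate belongs to the "person" step: first <person> child of each <group> :)
let $each-first := $doc/people/group/person[1]

(: Predicate applies to the whole parenthesized sequence: first <person> overall :)
let $overall-first := ($doc/people/group/person)[1]

return (
  string-join($each-first/string(), ', '),  (: "Peter, June" :)
  string($overall-first)                    (: "Peter" :)
)
```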
Forward and Reverse Axes
Whenever a predicate is part of an axis step, it is treated specially depending on which axis is being used. In particular, what position() returns inside a predicate is dependent on whether a forward or reverse axis is being used. For forward axes, positions are assigned using document order. For reverse axes, positions are assigned using reverse document order. As you may recall from the last article, $doc/people/group/person is actually short for:
$doc/child::people/child::group/child::person
Since the child:: axis is one of the forward axes, position() is assigned in document order. For the document above, that means the context positions for the elements returned by the last step (person) are assigned as follows:
Node | Context position |
---|---|
<person>Peter</person> | 1 |
<person>Paul</person> | 2 |
<person>Mary</person> | 3 |
<person>June</person> | 1 |
<person>Ward</person> | 2 |
<person>Beaver</person> | 3 |
Hence $doc/people/group/person[1] returns both Peter and June, as we saw above. The “person[1]” step is evaluated twice (once for each <group>), which is why the numbering restarts for June in the above table.
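One way to see the numbering restart is to ask for a different position, or to print the position next to each name. The sketch below does both. The per-group loop and the variable names are assumptions made for illustration: the loop evaluates the child step once per <group>, which mirrors how the table above was built.

```xquery
declare variable $doc := document {
  <people>
    <group>
      <person>Peter</person>
      <person>Paul</person>
      <person>Mary</person>
    </group>
    <group>
      <person>June</person>
      <person>Ward</person>
      <person>Beaver</person>
    </group>
  </people>
};

(: The second <person> child of each <group> :)
let $seconds := $doc/people/group/person[position() eq 2]

(: List each name with its position among the <person> children of its own group :)
let $listing :=
  for $g in $doc/people/group
  return $g/person/concat(., ' = ', position())

return (
  string-join($seconds/string(), ', '),  (: "Paul, Ward" :)
  string-join($listing, '; ')
  (: "Peter = 1; Paul = 2; Mary = 3; June = 1; Ward = 2; Beaver = 3" :)
)
```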
Things are different if we use one of the five reverse axes (the other eight axes are all forward axes):
parent::
ancestor::
ancestor-or-self::
preceding::
preceding-sibling::
In axis steps that use one of these axes, the context positions are assigned in reverse document order. Let’s start with a node deep within the document:
declare variable $beaver := $doc/people/group/person[. eq 'Beaver'];
Starting from <person>Beaver</person>, we can select some node sequences that come before it, using the reverse axes:
Expression | What/“who” it selects |
---|---|
$beaver/preceding::person | Peter, Paul, Mary, June, and Ward |
$beaver/preceding-sibling::person | June and Ward |
$beaver/ancestor::* | <people> and <group> |
If you were to then add a positional predicate to the step, it would select the first one in reverse document order. In other words, “[1]” selects the last node in document order.
Expression | What/“who” it selects |
---|---|
$beaver/preceding::person[1] | Ward |
$beaver/preceding-sibling::person[1] | Ward |
$beaver/ancestor::*[1] | <group> |
Taking the first example, using the “preceding” axis, here are the context positions as they’re assigned, working backwards from <person>Beaver</person>:
Node | Context position |
---|---|
<person>Ward</person> | 1 |
<person>June</person> | 2 |
<person>Mary</person> | 3 |
<person>Paul</person> | 4 |
<person>Peter</person> | 5 |
It’s easy to see from this that $beaver/preceding::person[1] returns Ward, $beaver/preceding::person[2] returns June, and so on.
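The table can be checked with a short query along these lines. It is a sketch that reuses the $doc and $beaver declarations from earlier in this post and assumes a processor that supports the preceding:: axis.

```xquery
declare variable $doc := document {
  <people>
    <group>
      <person>Peter</person>
      <person>Paul</person>
      <person>Mary</person>
    </group>
    <group>
      <person>June</person>
      <person>Ward</person>
      <person>Beaver</person>
    </group>
  </people>
};

declare variable $beaver := $doc/people/group/person[. eq 'Beaver'];

(: Positions on a reverse axis count backwards from the context node :)
(
  string($beaver/preceding::person[1]),  (: "Ward", the nearest preceding person :)
  string($beaver/preceding::person[2]),  (: "June" :)
  string($beaver/preceding::person[5])   (: "Peter", the farthest one :)
)
```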
Now, here’s the surprising part: axis steps always return nodes in document order. What? Didn’t we just see an example of them being returned in reverse document order? Well, no. What we saw was the context positions being assigned in reverse document order. The node sequence that is actually returned will still always be in document order. To prove this, we can take the predicate outside the step (again, by adding parentheses):
Expression | What/“who” it selects |
---|---|
($beaver/preceding::person)[1] | Peter |
($beaver/preceding-sibling::person)[1] | June |
($beaver/ancestor::*)[1] | <people> |
In the above cases, the predicate is not a part of an axis step, so it doesn’t matter which axis the expression to its left uses. The resulting sequence is simply filtered in sequence order. In each case, the parenthesized expression returns a sequence of nodes in document order (because path expressions that return nodes always return them in document order).
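A predicate that keeps more than one node makes the difference between "how positions are assigned" and "what order is returned" easier to see. The following is a sketch under the same assumptions as before ($doc and $beaver as declared earlier, a processor that supports preceding::); the variable names are invented for the example.

```xquery
declare variable $doc := document {
  <people>
    <group>
      <person>Peter</person>
      <person>Paul</person>
      <person>Mary</person>
    </group>
    <group>
      <person>June</person>
      <person>Ward</person>
      <person>Beaver</person>
    </group>
  </people>
};

declare variable $beaver := $doc/people/group/person[. eq 'Beaver'];

(: Inside the step: positions 1 and 2 are Ward and June (reverse document order) :)
let $nearest-two := $beaver/preceding::person[position() le 2]

(: Outside the step: positions 1 and 2 are Peter and Paul (sequence order) :)
let $first-two := ($beaver/preceding::person)[position() le 2]

return (
  string-join($nearest-two/string(), ', '),  (: "June, Ward" :)
  string-join($first-two/string(), ', ')     (: "Peter, Paul" :)
)
```

The first result selects the two nearest people, yet they come back as June and then Ward. The reverse numbering decides which nodes are kept, not the order in which they are returned.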
This is true for axis steps in general, even if “/” isn’t used. If a context node is defined (as it normally is in XSLT), then (ancestor::*)[1] is a legal expression and always returns the outermost element ancestor of the context node (first in document order), whereas ancestor::*[1] always returns the parent element of the context node (first in reverse document order).
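In XQuery, one way to try this without XSLT is to supply the context node through a path, so that both expressions are evaluated relative to $beaver. This is a sketch based on the sample document and the $beaver variable from earlier in this post.

```xquery
declare variable $doc := document {
  <people>
    <group>
      <person>Peter</person>
      <person>Paul</person>
      <person>Mary</person>
    </group>
    <group>
      <person>June</person>
      <person>Ward</person>
      <person>Beaver</person>
    </group>
  </people>
};

declare variable $beaver := $doc/people/group/person[. eq 'Beaver'];

(
  (: Filter on the parenthesized sequence: outermost element ancestor :)
  local-name($beaver/(ancestor::*)[1]),  (: "people" :)

  (: Predicate on the reverse-axis step: nearest element ancestor, the parent :)
  local-name($beaver/ancestor::*[1])     (: "group" :)
)
```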
Summary
To understand positional predicates, you need to be clear about what sequence of nodes is being filtered and how the context positions are assigned. In the general case, context positions are assigned according to the order of the sequence being filtered. The exception to this is when the predicate is part of an axis step that uses a reverse axis.
I’ll leave you with a teaser for the next and final post in this series (about what “//” means): what nodes does the following expression select, and why?
$doc//person[1]
Ready for the last part of the series? Go to Punctuation in XPath, part 5: “//” to learn what double-slashes mean in XPath.