W3C XQuery is a query language for XML. The most common use cases for XQuery involve XML publishing to create XML for Web messages, dynamic web sites, or publishing applications. The original data may be found in XML files, or it may be in a data store such as a relational database.
Some of the queries in this tutorial operate on XML stored in files, some on an XML view of a relational database, and some work on both. All of the examples in this tutorial have been tested with DataDirect XQuery. Because all XQuery implementations do not access relational data in the same way, this tutorial uses fn:collection(), which DataDirect XQuery uses to access relational tables.
Most XQuery functionality, such as arithmetic operators, comparisons, function calls, and functions, is familiar to most programmers. This tutorial focuses on the three major capabilities of XQuery that make it distinctive, and which are fundamental to processing and creating XML.
Together, these capabilities make it easier to process and create XML with XQuery than with other languages when using data from XML or relational sources.
The three major capabilities of XQuery that make it distinctive are the ability to:
For a more extensive XQuery tutorial, see XQuery: A Guided Tour.
DataDirect XQuery, an implementation of XQuery, allows you to query both relational and XML sources and combine the data into one result.
Just as SQL needs to be able to access any row or column in a relational table, XQuery needs to be able to access any node in an XML document. XML structures have both hierarchy and sequence, and can contain complex structure. Path expressions directly support hierarchy and sequence, and allow you to navigate any XML structure.
In this section, we discuss path expressions using an XML document, and then show path expressions used on an XML view of a relational table.
Let's explore path expressions using the following XML document.
<?
xml
version
=
"1.0"
?>
<
portfolio
id
=
"Maggie"
>
<
name
>
<
first
>Maggie</
first
>
<
last
>Pristera</
last
>
</
name
>
<
stocks
>
<
stock
>
<
ticker
>AMZN</
ticker
>
<
shares
>3000</
shares
>
</
stock
>
<
stock
>
<
ticker
>EBAY</
ticker
>
<
shares
>4000</
shares
>
</
stock
>
<
stock
>
<
ticker
>IBM</
ticker
>
<
shares
>2500</
shares
>
</
stock
>
<
stock
>
<
ticker
>PRGS</
ticker
>
<
shares
>23</
shares
>
</
stock
>
</
stocks
>
</
portfolio
>
fn:doc() returns a document. The following example shows how to use fn:doc() with an absolute URL.
doc("file:///c:/data/xml/portfolio.xml")
The following example shows how to use fn:doc() with a relative URL.
doc("portfolio.xml")
By setting the Base URI, you can set the directory that is used to resolve relative URLs.
declare base-uri "file:///c:/data/xml/";
doc("portfolio.xml")
A path expression consists of a series of one or more "steps", separated by a slash (/ ) or double slash (//). Every step evaluates to a sequence of nodes. For example, consider the expression:
doc("portfolio.xml")/portfolio/name
The first step, doc("portfolio.xml")
, returns a document node that represents the portfolio document.
The second step, portfolio
, is a name test that specifies the name of an element; it returns the portfolio element at the top of the document, which is a child of the document node.
The third step, name
, returns the element named "name", which is a child of the portfolio element.
Here is the result of the preceding query expression.
<
name
>
<
first
>Maggie</
first
>
<
last
>Pristera</
last
>
</
name
>
If a name test is preceded by the @ character, the name test matches an attribute rather than an element. For example, the expression doc("portfolio.xml")/portfolio/@id
returns the id attribute of the portfolio element.
The double slash (//) allows steps to operate on any descendant of a node. For example, the expression doc("portfolio.xml")//name
matches any element named name, anywhere in the portfolio document.
A predicate can be added to a step to set conditions for matching nodes. Predicates often set a condition on the children of a node. For example, the following path matches stock elements that contain a ticker element with the value "AMZN".
doc("portfolio.xml")//stock[ticker='AMZN']
Using the sample data, this expression produces the following result:
<
stock
>
<
ticker
>AMZN</
ticker
>
<
shares
>3000</
shares
>
</
stock
>
Conditions in a predicate can be combined using "and" and "or", as in the following expression.
doc("portfolio.xml")//stock[ticker='AMZN' or ticker='EBAY']
Conditions can be negated using fn:not(); for example, the following expression matches stock elements that do not have a ticker element with the value "AMZN":
doc("portfolio.xml")//stock[not(ticker='AMZN')]
One type of predicate is a numeric predicate, which sets a condition on the position of a node in a sequence. For example, the following expression finds the first stock element in a portfolio.
doc("portfolio.xml")//stocks/stock[1]
To understand how numeric predicates work in XQuery, you must know how XQuery evaluates a slash (/), as described in the following steps:
For example, in the preceding expression, when the numeric predicate is evaluated, the context node is a stocks element, the name test stock evaluates to a sequence of stock elements, and the numeric predicate matches the first stock in this sequence.
The following expression matches the first ticker element on each stock element.
doc("portfolio.xml")//stock/ticker[1]
To get the first ticker element in the document, use parentheses to make the expression on the left of the numeric predicate evaluate to the sequence of all ticker elements in the document.
(doc("portfolio.xml")//stock/ticker)[1]
When XQuery is used to query relational data, relational tables are treated as though they are XML documents, and path expressions work the same way as they do for XML. Because relational tables have a simple structure, path expressions used for tables are usually simple.
No standard way exists to access a relational table in XQuery, so each XQuery implementation has its own way of doing this. In DataDirect XQuery, we use fn:collection() to access a relational table. For example, the following expression accesses the holdings table.
collection('holdings')
Each XQuery implementation must also decide how to map relational tables into XML in the XML view. The SQL 2003 standard has defined a standard set of mappings for this purpose as part of SQL/XML. Here is a SQL/XML mapping of the holdings table; this mapping represents each row as a holdings element, and represents each column of the table (userid, stockticker, shares) as an element that is a child of the holdings element.
<
holdings
>
<
userid
>Jonathan</
userid
>
<
stockticker
>AMZN</
stockticker
>
<
shares
>3000</
shares
>
</
holdings
>
...
<
holdings
>
<
userid
>Minollo</
userid
>
<
stockticker
>AMZN</
stockticker
>
<
shares
>3000</
shares
>
</
holdings
>
...
Once you understand the structure of the XML view, you can easily see how path expressions are applied to it. For example, the following expression finds holdings for the user whose userid is "Minollo".
collection('holdings')/holdings[userid='Minollo']
Because relational data is queried as if it were XML, some people think that relational tables are actually extracted from the database, turned into XML documents, and then queried, but this would be very inefficient. To the user, DataDirect XQuery makes all data look like XML, but to a SQL database, the implementation speaks SQL. Before evaluating the preceding expression, DataDirect XQuery converts it to a SQL expression similar to this one:
SELECT userid, stockticker, shares
FROM holdings
WHERE userid='Minollo'
Now that we have seen how to locate anything in an XML document or a relational table, let's learn how to create new XML structures using XML constructors.
The most simple constructors are literal XML constructors, which use the same syntax as XML. For example, the following XML text is also an XQuery expression that creates the equivalent XML structure.
<
stock
role
=
'eg'
>
<
ticker
>AMZN</
ticker
>
<
shares
>3000</
shares
>
</
stock
>
This example uses only elements and attributes, but processing instructions, comments, and CDATA sections can also be used in XML constructors.
In literal XML constructors, you can use curly braces ({ }) to add content that is computed when the query is run. This is called an enclosed expression. For example, the following expression creates a date element whose content is the current date, which is computed when the query is run.
<
date
>{ current-date() }</
date
>
The result would be an element named date with the current date.
To see why enclosed expressions are necessary, consider the following expression:
<
date
> current-date() </
date
>
This expression evaluates to the following XML element:
<
date
> current-date() </
date
>
Path expressions are frequently used in enclosed expressions. The following expression creates a portfolio element for Minollo, and then extracts Minollo's holdings from the holdings table.
<
portfolio
name
=
'Minollo'
>
{ collection('holdings')/holdings[userid='Minollo'] }
</
portfolio
>
The following document describes how to install, configure and run DataDirect XQuery® examples. Download the example for free today and learn how easy it is to integrate, query, and publish heterogeneous data sources using DataDirect XQuery®.
XQuery has an expression called a FLWOR expression, which is similar to a SQL Select statement that that has From and Where clauses. FLWOR is pronounced "flower", and is an acronym for the keywords used to introduce each clause (for, let, where, order by, and return).
Here is a FLWOR expression that returns holdings for AMZN.
for $h in collection('holdings')/holdings
where $h/stockticker = 'AMZN'
order by $h/shares
return $h
In the preceding query, the FLWOR expression performs the following functions:
FLWOR expressions are frequently used to combine related information. The possible combinations are generated by using variables in the for clause and using a where clause to filter out combinations that are not useful. This is known as a "join". Consider the following expression:
for $u in collection('users')/users,
$h in collection('holdings')/holdings
where $u/userid=$h/userid
order by $u/lastname, $u/lastname
return
<
holding
>
{
$u/firstname,
$u/lastname,
$h/stockticker,
$h/shares
}
</
holding
>
This expression finds every pair of users elements and holdings elements whose userid child element has the same value, and then builds a holding element that describes the user and his holdings.
Now, let's look at a FLWOR expression that uses a let clause:
let $h := collection('holdings')/holdings
return count($h)
A let clause binds a variable to a sequence, which often contains more than one item. In the preceding query, $h is bound to all of the holdings elements in the collection, and the return clause is evaluated. Note the difference between a for clause and a let clause: A for clause always iterates over a sequence, binding a variable to each item; a let clause simply binds a variable to the entire sequence.
In the preceding expression, the result is 8
. In contrast, if you use the following for clause:
for $h in collection('holdings')/holdings
return count($h)
The result is a sequence of numbers: 1 1 1 1 1 1 1 1
.
In some cases, you may find it useful to combine for and let clauses. In the following expression, these two clauses are combined to produce a result that counts the number of stock holdings for each user.
for $u in collection('users')/users
let $h := collection('holdings')/holdings[userid=$u/userid]
order by $u/lastname, $u/firstname
return
<
user
nstocks
=
"{count($h)}"
>
{
$u/firstname,
$u/lastname
}
</
user
>
Many applications need to create rich XML structures from relational sources. For example, Web sites generally create hierarchical displays of the data found in a relational database, and web messages are often very complex hierarchical structures. For these applications, XQuery can act as an "XML report writer".
The database tables used in this section are as follows:
users Table
userid firstname lastname othername
Minollo Carlo Innocenti
Maggie Maggie Pristera William
holdings Table
userid stockticker shares
Maggie PRGS 23
Minollo PRGS 4000000
.
.
statistical Table
id companyname ticker percentagechange annualrevenues location
1 Apple Computer, Inc. AAPL -40.80% 5250 Cupertino
2 Accrue Software, Inc. ACRU -57.60% 4.21 Freemont
.
.
.
Here is a query that creates a portfolio for each user.
<
portfolios
>
{
for $u in collection('users')/users
order by $u/userid
return
<
portfolio
id
=
"{$u/userid}"
>
<
name
>
<
first
>{data($u/firstname)}</
first
>
<
last
>{data($u/lastname)}</
last
>
</
name
>
<
stocks
>
{
for $h in collection('holdings')/holdings
where $h/userid = $u/userid
order by $h/stockticker
return
<
stock
>
<
ticker
>{data($h/stockticker)}</
ticker
>
<
shares
>{data($h/shares)}</
shares
>
</
stock
>
}
</
stocks
>
</
portfolio
>
}
</
portfolios
>
Here is the result of the preceding query.
<
portfolios
>
<
portfolio
id
=
"Maggie"
>
<
name
>
<
first
>Maggie</
first
>
<
last
>Pristera</
last
>
</
name
>
<
stocks
>
<
stock
>
<
ticker
>AMZN</
ticker
>
<
shares
>3000</
shares
>
</
stock
>
<
stock
>
<
ticker
>EBAY</
ticker
>
<
shares
>4000</
shares
>
</
stock
>
<
stock
>
<
ticker
>IBM</
ticker
>
<
shares
>2500</
shares
>
</
stock
>
<
stock
>
<
ticker
>PRGS</
ticker
>
<
shares
>23</
shares
>
</
stock
>
</
stocks
>
</
portfolio
>
<
portfolio
id
=
"Minollo"
>
<
name
>
<
first
>Carlo</
first
>
<
last
>Innocenti</
last
>
</
name
>
<
stocks
>
<
stock
>
<
ticker
>AMZN</
ticker
>
<
shares
>3000</
shares
>
</
stock
>
<
stock
>
<
ticker
>EBAY</
ticker
>
<
shares
>4000</
shares
>
</
stock
>
<
stock
>
<
ticker
>LU</
ticker
>
<
shares
>40000</
shares
>
</
stock
>
<
stock
>
<
ticker
>PRGS</
ticker
>
<
shares
>4000000</
shares
>
</
stock
>
</
stocks
>
</
portfolio
>
</
portfolios
>
NOTE: In the preceding query the data function is used, which returns only the value of the stockticker column. Without the data function, the value would be surrounded with an element named stockticker, resulting in, for example:
<
ticker
>
<
stockticker
>AMZN</
stockticker
>
</
ticker
>
In some applications, you may need to use XML and relational data together. For example, a configuration file or an incoming web message might provide information needed to parameterize a query. Suppose you have an XML file that contains a request for a particular kind of report, and your query is to produce that report. For example, the following XML file, request.xml, contains a request to show the performance of Maggie's stocks during the period from 2003-01-01 to 2004-06-01.
<?
xml
version
=
"1.0"
?>
<
request
>
<
performance
>
<
UserId
>Maggie</
UserId
>
<
start
>2003-01-01</
start
>
<
end
>2004-06-01</
end
>
</
performance
>
</
request
>
Here is a query that creates a portfolio for the user specified in a request file, during the requested period.
declare base-uri "file:///c:/programs/examples/JoinXMLToRelational/";
declare variable $request := doc('request.xml')/request;
for $user in $request/performance/UserId,
$start in $request/performance/start,
$end in $request/performance/end
return
<
portfolio
UserId
=
"{$user}"
>
{ $request }
{
for $st in collection('holdings')/holdings,
$stats in collection('statistical')/statistical
where $st/userid = $user
and $stats/ticker = $st/stockticker
return
<
stock
>
{ $stats/companyname }
{ $st/stockticker }
{ $st/shares }
{ $stats/annualrevenues }
{
let $hist :=
for $h in collection('historical')/historical
where $h/ticker = $st/stockticker
and xs:date($h/datetraded) gt xs:date($start)
and xs:date($h/datetraded) lt xs:date($end)
return $h
return
<
performance
>
<
min
>{min($hist/adjustedclose)}</
min
>
<
max
>{max($hist/adjustedclose)}</
max
>
<
daily
>
{
for $h in $hist
return <
day
>{$h/datetraded, $h/adjustedclose }</
day
>
}
</
daily
>
</
performance
>
}
</
stock
>
}
</
portfolio
>