Increasingly, XML is stored in relational databases as structured XML. Some applications will continue to shred XML into a relational format or store XML as CLOB values. Other applications can now take advantage of relational databases that support a native XML data type to represent an XML document or XML document fragment. This paper shows how you can use DataDirect XQuery® to query XML stored as character data and query XML stored as XML.
When XML is stored as character data, it must be parsed before it can be queried as XML. You can parse XML stored as character data using Java external functions.
The SQL standard is evolving to allow XQuery to be used directly in a SQL query, providing a convenient way to query XML, but many relational databases that are currently shipping do not support this ability or implement it with limited functionality. Because DataDirect XQuery® uses SQL to query relational databases, this means that DataDirect XQuery® cannot directly evaluate XQuery expressions against XML stored in a column of a relational database.
Fortunately, you can always return an entire column that contains XML, and workarounds exist that allow you to write almost any query you need. Unfortunately, these workarounds force you to choose between database independence and performance.
Future versions of DataDirect XQuery® will solve this problem. In the meantime, you can execute XQuery queries against XML columns using one of the following three approaches:
USERID |
XMLCOL |
Jonathan |
< HOLDINGS >
< SHARE COMPANY = "Amazon.com, Inc."
USERID = "Jonathan" >3000</ SHARE >
< SHARE COMPANY = "eBay Inc." USERID = "Jonathan" >4000</ SHARE >
< SHARE COMPANY = "Int'l Business Machines C"
USERID = "Jonathan" >2500</ SHARE >
< SHARE COMPANY = "Progress Software"
USERID = "Jonathan" >23</ SHARE > </ HOLDINGS > |
Minollo |
< HOLDINGS >
< SHARE COMPANY = "Amazon.com, Inc."
USERID = "Minollo" >3000</ SHARE >
< SHARE COMPANY = "eBay Inc." USERID = "Minollo" >4000</ SHARE >
< SHARE COMPANY = "Lucent Technologies, Inc."
USERID = "Minollo" >40000</ SHARE >
< SHARE COMPANY = "Progress Software"
USERID = "Minollo" >4000000</ SHARE > </ HOLDINGS > |
... |
... |
If your database does not support the XML type and you store XML documents as character data, you must parse the XML before it can be queried. This can be done with a Java external function called from within an XQuery query. The following Java external function creates a DOM tree by parsing its input:
public static Document txt2xml(String txt) {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder;
try
{
builder = factory.newDocumentBuilder();
}
catch
(ParserConfigurationException e) {
e.printStackTrace();
return
null
;
}
Document doc =
null
;
try
{
doc = builder.parse(
new
InputSource(
new
StringReader(txt)));
}
catch
(SAXException e1) {
e1.printStackTrace();
return
null
;
}
catch
(IOException e1) {
e1.printStackTrace();
}
return
doc;
}
Using DataDirect XQuery®, you can call this function by declaring it in the prolog and using it in a query. For example:
declare namespace p='ddtekjava:txt2xml';
declare function p:txt2xml($inp as xs:string) as document-node() external;
for $row in collection('HOLDINGSXML')/HOLDINGSXML/XMLCOL
return
p:txt2xml($row/XMLCOL)/HOLDINGS/SHARE[@COMPANY='Amazon.com, Inc.']
Using DataDirect XQuery®, it's straightforward to return an entire XML column because it does not require evaluation of the contents of the column. For the HOLDINGSXML table, the following query returns a column:
collection('HOLDINGSXML')/HOLDINGSXML/XMLCOL
The result looks like this:
<
XMLCOL
>
<
HOLDINGS
>
<
SHARE
COMPANY
=
"Amazon.com, Inc."
USERID
=
"Jonathan"
>3000</
SHARE
>
<
SHARE
COMPANY
=
"eBay Inc."
USERID
=
"Jonathan"
>4000</
SHARE
>
<
SHARE
COMPANY
=
"Int'l Business Machines C"
USERID
=
"Jonathan"
>2500</
SHARE
>
<
SHARE
COMPANY
=
"Progress Software"
USERID
=
"Jonathan"
>23</
SHARE
>
</
HOLDINGS
>
</
XMLCOL
>
<
XMLCOL
>
<
HOLDINGS
>
<
SHARE
COMPANY
=
"Amazon.com, Inc."
USERID
=
"Minollo"
>3000</
SHARE
>
<
SHARE
COMPANY
=
"eBay Inc."
USERID
=
"Minollo"
>4000</
SHARE
>
<
SHARE
COMPANY
=
"Lucent Technologies Inc."
USERID
=
"Minollo"
>40000</
SHARE
>
<
SHARE
COMPANY
=
"Progress Software"
USERID
=
"Minollo"
>4000000</
SHARE
>
</
HOLDINGS
>
</
XMLCOL
>
In contrast, any query that evaluates path expressions against the contents of an XML column fails. For example, the following query fails and raises an error:
collection('HOLDINGSXML')/HOLDINGSXML/XMLCOL/HOLDINGS
Because the // operator requires all descendant nodes to be evaluated, this operation fails if applied to an XML column. For example, the following query fails and raises an error:
collection('HOLDINGSXML')//HOLDINGSXML
DataDirect XQuery® supports an extension expression named evaluate in memory that forces an expression to be evaluated in memory using XQuery directly, not using SQL. This is a simple way to use XQuery functionality that is not available in SQL.
For example, the following query fails and raises an error because DataDirect XQuery® cannot evaluate path expressions against XML in the database:
for $h in collection('HOLDINGSXML')/HOLDINGSXML
where $h/USERID eq 'Minollo'
return
$h/HOLDINGS[@COMPANY = 'Progress Software']
If we rewrite the query, using the evaluate-in-memory extension expression to force the XML to be evaluated in memory, the query succeeds:
for $h in collection('HOLDINGSXML')/HOLDINGSXML
let $holdings := (# ddtek:evaluate-in-memory #) {$h/XMLCOL}
where $h/USERID eq 'Minollo'
return
$holdings/SHARE[xs:string(@COMPANY) eq 'Progress
Software']
The result looks like this:
<
HOLDINGS
>
<
SHARE
COMPANY
=
"Progress Software"
USERID
=
"Minollo"
>4000000</
SHARE
>
</
HOLDINGS
>
DataDirect XQuery® can call any SQL function as an external function. DB2 and Oracle provide SQL functions that allow you to return parts of an XML value stored in the database using path expressions similar to XPath 1.0.
For example, the DB2 extractCLOB function accepts two parameters; the first parameter is the XML to be queried, and the second parameter is a path expression to be evaluated against the XML.
To call an external function, you must first declare it in the query prolog:
declare function ddtek-sql:extractCLOB(
$inp as node(), $xp as xs:string) as node() external;
Once the function is declared, it can be called in a query:
for $h in collection('HOLDINGSXML')/HOLDINGSXML/XMLCOL
return
ddtek-sql:extractCLOB(
$h/node(),
"/HOLDINGS")
In the preceding query, note that the first parameter is a path expression that points to the XML to be queried. The second parameter is a string containing the path expression to be used.
For more information about declaring and invoking SQL functions, refer to the DataDirect XQuery® User's Guide and Reference.
The following examples show how to use some common DB2 SQL functions to query XML stored in the database.
When the DB2 function extractCLOB returns an XML value, the result can only be returned without further processing as part of the result of the complete XQuery expression. DataDirect XQuery® allows the following exception: the result of an extract function can be used as the content expression of an element constructor that is returned as-is from the XQuery expression. You can use the evaluate-in-memory extension expression to work around this limitation. This extension expression forces DataDirect XQuery® to evaluate the XQuery expression in memory.
In addition, DataDirect XQuery® does not support DB2 functions that return sequences of values, including extractIntegers, extractCLOBs, and so on.
NOTE: DB2 XML Extender functionality is not enabled by default; it must be explicitly enabled. In addition, the user ID executing DB2 SQL functions must have the correct permissions to use XML Extender functionality. Refer to your DB2 documentation for more information about enabling XML Extender functionality and setting permissions.
This example returns the value of all HOLDINGS elements.
declare function ddtek-sql:DB2XML.extractCLOB(
$inp as node(), $xp as xs:string) as node() external;
for $h in collection('HOLDINGSXML')/HOLDINGSXML/XMLCOL
return
ddtek-sql:DB2XML.extractCLOB(
$h/node(),
"/HOLDINGS")
In this example, notice how the argument of fn:node-name is the result of a SQL extractCLOB function that normally would have to be returned as-is from the expression. In this case, you must use the evaluate-in-memory extension expression to force DataDirect XQuery® to evaluate the XQuery expression in memory; otherwise, this query fails and raises an error.
declare function ddtek-sql:DB2XML.extractCLOB(
$inp as node(), $xp as xs:string) as node() external;
for $h in collection('HOLDINGSXML')/HOLDINGSXML/XMLCOL
let $holdings := ddtek-sql:DB2XML.extractCLOB($h/node(),
"/HOLDINGS")
let $holdings2 := (#ddtek:evaluate-in-memory#) {$holdings}
for $n in $holdings2//(*|@*)
return
fn:node-name($n)
Now let's look at using Oracle SQL functions in the same way.
Similar to the DB2 extractCLOB function, when the Oracle function extract returns an XML value, the result can only be returned without further processing as part of the result of the complete XQuery expression. DataDirect XQuery® allows the following exception: the result of an extract function can be used as the content expression of an element constructor that is returned as-is from the XQuery expression. You can use the evaluate in-memory extension expression to force DataDirect XQuery® to evaluate the XML in memory.
Using our sample data, the following examples show how to use some common Oracle SQL functions to query XML stored in the database.
This example returns all SHARE elements.
declare function ddtek-sql:extract(
$inp as node(), $xp as xs:string) as node() external;
for $h in collection('HOLDINGSXML')/HOLDINGSXML/XMLCOL
return
<
extract
path
=
'/SHARE'
>{
ddtek-sql:extract(
$h/node(),
"/HOLDINGS/SHARE")
}</
extract
>
This example returns the value of all SHARE elements for which the COMPANY attribute is set to "Progress Software".
declare function ddtek-sql:extractValue(
$inp as node(), $xp as xs:string) as xs:string external;
for $h in collection('HOLDINGSXML')/HOLDINGSXML/XMLCOL
return
ddtek-sql:extractValue(
$h/node(),
"/HOLDINGS/SHARE[@COMPANY = 'Progress Software']")
In this example, notice how the argument of the fn:node-name is the result of a SQL extract function that normally would have to be returned as-is from the expression. In this case, you must use the evaluate-in-memory extension expression to force DataDirect XQuery® to evaluate the XML in memory; otherwise, this query fails and raises an error.
declare function ddtek-sql:extract(
$inp as node(), $xp as xs:string) as node() external;
for $h in collection('HOLDINGSXML')/HOLDINGSXML/XMLCOL
let $SHARE := ddtek-sql:extract($h/node(),"//SHARE")
return
(# ddtek:evaluate-in-memory #) {fn:node-name($SHARE)}
DataDirect XQuery® can query XML stored as character data, but you must provide a Java external function to parse the XML.
Because DataDirect XQuery® uses SQL to query relational databases, this means that DataDirect XQuery® cannot directly evaluate XQuery expressions against XML stored in a column of a relational database. As an alternative, DataDirect XQuery® provides three ways to query XML that is stored in a column of a database as XML:
Future versions of DataDirect XQuery® will provide this functionality in a transparent, portable, and efficient way.
We used the following methods to insert the XML data into the XML column named XMLCOL using the example data shipped with DataDirect XQuery®.
CREATE TABLE HOLDINGSXML (userid varchar(50) not null primary key,
XMLCOL DB2XML.XMLCLOB NOT LOGGED)
INSERT INTO HOLDINGSXML
(SELECT
h.USERID,
xml2clob(XMLELEMENT(name "HOLDINGS",
XMLAGG(
XMLELEMENT(name "SHARE",
XMLATTRIBUTES(s.COMPANYNAME AS COMPANY,h.USERID AS USERID),
h.SHARES))))
FROM HOLDINGS h, STATISTICAL s
WHERE h.STOCKTICKER = s.TICKER
GROUP BY h.USERID)
CREATE TABLE HOLDINGSXML(userid varchar2(50) primary key, XMLCOL XMLTYPE)
INSERT INTO HOLDINGSXML
(SELECT
h.USERID,
XMLELEMENT("HOLDINGS",
XMLAGG(
XMLELEMENT("SHARE",
XMLATTRIBUTES(s.COMPANYNAME AS COMPANY,h.USERID AS USERID),
h.SHARES)))
FROM HOLDINGS h, STATISTICAL s
WHERE h.STOCKTICKER = s.TICKER
GROUP BY h.USERID)