Previous tutorials explained how to use XQuery to deal with EDI messages, for example, how to transform a directory of EDI messages, create EDI messages out of your database, converting tab delimited files into EDI, etc. But sometimes you're not dealing with EDI; companies still frequently rely on proprietary flat file formats to exchange or circulate information.
Consider the following example of a user who needed to convert a file in a proprietary flat file format into a specific XML structure. The sample file looks like this:
10 0123456789MR JOHN SMITH
20 0123456789ACCT1
30 0123456789SALARY 500000
30 0123456789BONUS 1000
20 0123456789ACCT2
30 0123456789OTHER 100
10 1234566790MR DAVID DOE
20 1234567890ACCT1
30 1234567890SALARY 10000
This isn't EDI, CSV, tab delimited or any of the other standard or semi-standard ways to represent information in text files; it's a proprietary format based on a specific interpretation of the size and position of the fields. The following is the description of records and fields in the proprietary flat file format:
one block
is
made
of
:
one record 10 (customer)
1
to
N record 20 per record 10 (accounts)
1
to
N record 30 per record 20 (transactions)
record 10
record type
on
10 characters
customer id
on
10 characters
customer lastname
on
20 characters
customer firstname
on
30 characters
record 20
record type
on
10 characters
customer id
on
10 characters
account id
on
10 characters
record 30
record type
on
10 characters
customer id
on
10 characters
label
on
10 characters
value
on
10 characters
How can you deal with such a flat file, without building our own parser? DataDirect XML Converters and Stylus Studio can definitely help! XML Converters support "custom conversions"; a custom conversion defines how files belonging to the same format family should be converted to XML by XML Converters. As the definition of a custom conversion can be a tedious task, Stylus Studio provides an editor that allows you to create custom conversion through an intuitive graphical interface. The easiest way to explain how that would work is to watch short online video tutorial which describes
<
root
>
<
record_10
>
<
type
>10</
type
>
<
customer_id
>0123456789</
customer_id
>
<
lastname
>MR JOHN</
lastname
>
<
firstname
>SMITH</
firstname
>
</
record_10
>
<
record_20
>
<
type
>20</
type
>
<
customer_id
>0123456789</
customer_id
>
<
account_id
>ACCT1</
account_id
>
</
record_20
>
<
record_30
>
<
type
>30</
type
>
<
customer_id
>0123456789</
customer_id
>
<
label
>SALARY</
label
>
<
value
>500000</
value
>
</
record_30
>
<
record_30
>
<
type
>30</
type
>
<
customer_id
>0123456789</
customer_id
>
<
label
>BONUS</
label
>
<
value
>1000</
value
>
</
record_30
>
<
record_20
>
<
type
>20</
type
>
<
customer_id
>0123456789</
customer_id
>
<
account_id
>ACCT2</
account_id
>
</
record_20
>
<
record_30
>
<
type
>30</
type
>
<
customer_id
>0123456789</
customer_id
>
<
label
>OTHER</
label
>
<
value
>100</
value
>
</
record_30
>
<
record_10
>
<
type
>10</
type
>
<
customer_id
>1234566790</
customer_id
>
<
lastname
>MR DAVID</
lastname
>
<
firstname
>DOE</
firstname
>
</
record_10
>
<
record_20
>
<
type
>20</
type
>
<
customer_id
>1234567890</
customer_id
>
<
account_id
>ACCT1</
account_id
>
</
record_20
>
<
record_30
>
<
type
>30</
type
>
<
customer_id
>1234567890</
customer_id
>
<
label
>SALARY</
label
>
<
value
>10000</
value
>
</
record_30
></
root
>
Now that we've converted the proprietary flat file into a specific XML structure, the question becomes, what next?
Once a flat file to XML custom conversion has been configured, the combination of DataDirect XQuery and XML Converters proves particularly helpful: not only we are able to move from a proprietary flat file format to well formed XML, but we also have all the power, performance and scalability of XQuery to deal with such XML to transform it (potentially, even augment it!) into the final format that we need. The following XQuery — which implicitly relies on the custom converter described above — does the trick; notice that we created a couple of functions to take care of the positional grouping which is required to generate the desired output; that makes the XQuery much easier to read and more reusable:
declare
function
local:getRelated20s($item) {
let $nextItem :=$item/following-sibling::*[local-name()!=
"record_20"
and local-name()!=
"record_30"
][1]
for
$related
in
$item/following-sibling::*[local-name()=
"record_20"
]
where
if
($nextItem) then $related << $nextItem
else
true
()
return
$related
};
declare
function
local:getRelated30s($item) {
let $nextItem := $item/following-sibling::*[local-name()!=
"record_30"
][1]
for
$related
in
$item/following-sibling::*[local-name()=
"record_30"
]
where
if
($nextItem) then $related << $nextItem
else
true
()
return
$related
};
<DataSets>
{
for
$record_10
in
doc(
'converter:file:///c:/blog1/sample.conv?sampleinput.txt'
)/root/record_10
return
<DataSet>
<CustomerId>
{$record_10/customer_id/text()}
</CustomerId>
<CustomerName>
{$record_10/lastname/text()}
</CustomerName>
<CustomerFirstName>
{$record_10/firstname/text()}
</CustomerFirstName>
<Accounts>
{
for
$record_20
in
local:getRelated20s($record_10)
return
<Account>
<AccountId>
{$record_20/account_id/text()}
</AccountId>
<Transactions> {
for
$record_30
in
local:getRelated30s($record_20)
return
<Transaction>
<Label>{$record_30/label/text()}</Label>
<Value>{$record_30/value/text()}</Value>
</Transaction>
} </Transactions>
</Account>
}
</Accounts>
</DataSet>
}
</DataSets>
Once again, thanks to the fact that DataDirect XQuery and XML Converters make it possible to manipulate a wide variety of data sources (even proprietary flat files!), you are able to deal with transformation (and possibly aggregation) problems from an XML point of view, shielding as much as possible those tasks from the low level details of the data sources.
In conclusion, DataDirect XML Converters and/or DataDirect XQuery greatly simplifies accessing and working with any non-XML data sources directly from your Java or .NET applications.
Marc Van Cappellen is a Director of Development at DataDirect Technologies. Marc has more than 15 years of experience in various data access technologies including ODBC, JDBC, ADO, XQJ, SQL and XQuery.
Over the past year we have seen office documents -- spreadsheets, word processing documents, and the like -- taking a more prominent role in business processes, workflows, and vertical applications. While office documents have always been common place, their proprietary binary formats have made them all but unusable to businesses which have sought to integrate office documents in their processes. New office document standards like the OpenDocument Format(ODF) and Office Open XML (OOXML), however, are making office document integration in business processes a reality.
A key benefit of ODF and OOXML for developers is the reuse of existing standards -- in essence, your office documents are XML documents, which makes available a complete palette of tools for manipulating these documents and the information they contain. Using tools and technologies available today, you can transform office documents to HTML or PDF, store them in an XML database, shred their information and store it in a relation databases, embed SOAP messages, enrich them with external information, and so on.
Read the full article: XQuery Your Office Documents
There has been a point of controversy over whether JSON or XML is better at dealing with communication between endpoints. For documents, XML is certainly the leader, supporting namespaces and mixed content. But for communications between live programs, JSON with its implicit hash and array support.
Where the message content can be handled equally well by either, JSON is more succinct, but not as much as one might think. But because the syntax is simpler, it parses more quickly.
But it doesn't matter, because with DataDirect XML Converters, you can transform JSON or XML with equal facility.
Here is a retrieval http call of calendar data from Google's developer calendar.
doc("converter:JSON? http://www.google.com/calendar/feeds/ developer-calendar@google.com/public/full ?alt=json &orderby=starttime &singleevents=true &sortorder=ascending &futureevents=true &max-results=4")
(In this and several of the following examples, spaces and line breaks are added for clarity.)
The portions in this color are the actual XQuery syntax, which includes escaping the "&" characters. The portion in this color is the call to the converter.
What the above mini XQuery program (yes, it is actually a complete program) does is fetch the calendar data as JSON and convert it to XML. Because we haven't specified any transformations, the content will just be output.
Taking the resulting JSON from the raw http: query would result in something like this (formatted for ease of reading):
...
"entry"
: [
{
"id"
: {
"_t"
: "http:
//www.google.com/calendar/feeds/
developer-calendar%40google.com/
public
/full/
r53qk5j7ps71oe0u1mim9rcvq0_20080102T030000Z"},
"published"
: {
"_t"
:
"2007-07-16T16:55:58.000Z"
},
"updated"
: {
"_t"
:
"2007-11-15T18:19:05.000Z"
},
"category"
: {
"scheme"
:
"http://schemas.google.com/g/2005#kind"
,
"term"
:
"http://schemas.google.com/g/2005#event"
},
"title"
: {
"type"
:
"text"
,
"_t"
:
"Joomla Bay Area User Group Meeting"
},
...
which is clearly some sort of nested tree structure. Since XML is also a nested tree structure,* and XQuery actually operates on the XQuery Data Model and not on just raw XML bits, using the JSON XML Converter we can transform this on-the-fly to XML input for another application, including the DataDirect XQuery Engine.
(* actually, this looks like XML converted to JSON, which we're going to deconvert.)
This means the XQuery engine effectively "sees" something like the following — but remember, this is a virtual translation, which means there is no separate step that saves something as JSON and then reloads it as XML. It all happens inside of the JSON XML Converter library.
...
<entry>
<id>
<_t>http:
//www.google.com/calendar/feeds/
developer-calendar%40google.com/
public
/full/
r53qk5j7ps71oe0u1mim9rcvq0_20080102T030000Z</_t>
</id>
<published>
<_t>2007-07-16T16:55:58.000Z</_t>
</published>
<updated>
<_t>2007-11-15T18:19:05.000Z</_t>
</updated>
<category>
<scheme>http:
//schemas.google.com/g/2005#kind</scheme>
<term>http:
//schemas.google.com/g/2005#event</term>
</category>
<title>
<type>text</type>
<_t>Joomla Bay Area User Group Meeting</_t>
...
Once the data "looks" like XML, XQuery can transform it. So to see all of the names and dates of the next four upcoming entries, the following XQuery program will do just that:
<
html
><
body
><
table
border
=
"1"
>{(
<
tr
><
th
>Meeting</
th
><
th
>Starts</
th
><
th
>Ends</
th
></
tr
>,
for $entry in doc("converter:JSON?http://www.google.com/calendar/feeds/
developer-calendar@google.com/public/full ?alt=json&orderby=starttime
&singleevents=true &sortorder=ascending&
futureevents=true&max-results=4")/feed/entry
return <
tr
><
td
>{ data($entry/title/_t) }</
td
>
<
td
>{ xs:dateTime($entry/gd_when/startTime) }</
td
>
<
td
>{ xs:dateTime($entry/gd_when/endTime) }</
td
>
</
tr
>)
}</
table
>
</
body
></
html
>
Which of course produces the following schedule of upcoming Google-related events:
Meeting |
Starts |
Ends |
Joomla Bay Area User Group Meeting |
2008-01-01T19:00:00-08:00 |
2008-01-01T21:00:00-08:00 |
Kernel Walkthroughs |
2008-01-01T19:00:00-08:00 |
2008-01-01T21:30:00-08:00 |
Kernel Walkthroughs |
2008-01-08T19:00:00-08:00 |
2008-01-08T21:30:00-08:00 |
MySQL Community Meetup |
2008-01-14T18:30:00-08:00 |
2008-01-14T20:30:00-08:00 |
The real advantage of the DataDirect XML Converters is that they make file format issues irrelevant. Just as DataDirect XQuery is a highly conformant W3C XQuery processor that works the same against any of the databases it is connected to — whether Oracle, IBM DB2, Sybase, MySQL, SQL Server or whatever — so do the XML Converters mask the difference between different storage formats.
So the JSON adherents can have their way, and the XML adherents theirs, but you can bridge the gap and process data from either or both sources without difficulty.
By: Ivan Pedruzzi, Product Architect, The DataDirect XML Converters™ Team.
If you have processed XML using Java, you should be familiar with JAXP (Java API for XML Processing). Here is a simple program that processes an XML document using XSLT:
import javax.xml.transform.*;
public class MyTransform {
public static void main(String args[]) throws Throwable
{
StreamSource input = new StreamSource("file:///c:/input.xml");
StreamSource xslt = new StreamSource("file:///c:/tranform.xsl");
StreamResult output = new StreamResult("file:///c:/result.xml");
TransformerFactory tf = TransformerFactory.newInstance();
tf.newTransformer(xslt).transform(input, output);
}
}
Nice and easy if you are dealing with XML documents, but what happens if your data is not all XML? Assume for instance that we need to exchange data with a business partner who requires files using a specific XML dialect, but that our data is tabular and is stored as CSV files.
Here's a sample from our CSV file of used motorcycles inventory:
make,model,year,mileage
BMW,R1150RS,2004,14274
Kawasaki,GPz1100,1996,60234
Ducati,ST2,1997,24000
Moto Guzzi,LeMans,2001,12393
BMW,R1150R,2002,17439
Ducati,Monster,2000,15682
Aprilia,Futura,2001,17320
Our partner requires an XML document that looks like what follows, and has provided an XML Schema so that we can ensure that the XML we provide complies with his requirements:
<?
xml
version
=
"1.0"
?>
<
inventory662
;
<bike>
<
make
>BMW</
make
>
<
model
>R1150RS</
model
>
<
year
>2004</
year
>
<
mileage
>14274</
mileage
>
</
bike
>
</
inventory
>
The first step is simply to create an XML representation of our CSV data.
01.
<?
xml
version
=
"1.0"
?>
02.
<
table
>
03.
<
row
>
04.
<
make
>BMW</
make
>
05.
<
model
>R1150RS</
model
>
06.
<
year
>2004</
year
>
07.
<
mileage
>14274</
mileage
>
08.
</
row
>
09.
<
row
>
10.
<
make
>Kawasaki</
make
>
11.
<
model
>GPz1100</
model
>
12.
<
year
>1996</
year
>
13.
<
mileage
>60234</
mileage
>
14.
</
row
>
15.
<
row
>
16.
<
make
>Ducati</
make
>
17.
<
model
>ST2</
model
>
18.
<
year
>1997</
year
>
19.
<
mileage
>24000</
mileage
>
20.
</
row
>
21.
<
row
>
22.
<
make
>Moto Guzzi</
make
>
23.
<
model
>LeMans</
model
>
24.
<
year
>2001</
year
>
25.
<
mileage
>12393</
mileage
>
26.
</
row
>
27.
<
row
>
28.
<
make
>BMW</
make
>
29.
<
model
>R1150R</
model
>
30.
<
year
>2002</
year
>
31.
<
mileage
>17439</
mileage
>
32.
</
row
>
33.
<
row
>
34.
<
make
>Ducati</
make
>
35.
<
model
>Monster</
model
>
36.
<
year
>2000</
year
>
37.
<
mileage
>15682</
mileage
>
38.
</
row
>
39.
<
row
>
40.
<
make
>Aprilia</
make
>
41.
<
model
>Futura</
model
>
42.
<
year
>2001</
year
>
43.
<
mileage
>17320</
mileage
>
44.
</
row
>
45.
</
table
>
Though some of the specifics vary (additional parent elements, different element names, and so on), because the data is now in XML format, XSLT can be used to create the final XML result.
So, that's the theory (CSV to> XML to> XSLT to> XML), but how do you make it all work? Even assuming you have an XML document that represents data stored in CSV format, the prospect of writing XSLT can be daunting. It might sound intimidating, but using DataDirect XML Converters™ we can create solid, complex, bug-free XSLT in just a few steps.
You know what they say ... a picture is worth 1000 words.
DataDirect XML Converters™ did most of the work for us — converting CSV to XML, writing XSLT code based on simple mappings — but what happens if we need to perform the same steps periodically, perhaps as part of larger application so that we can automate the process of exchanging inventory information with our partner? The answer lies in common URLs, and DataDirect XML Converters™'s uncommon Java runtime library.
URL stands for Uniform Resource Locator. URLs allow us to use a symbolic name to reach a document that either exists somewhere or gets created dynamically when the address represented in the URL is hit. If you look closely inside the DataDirect XML Converters™ Project window, you quickly realize how adapters work — an adapter is represented as a URL, like FTP, HTTP, FILE, and so on.
Let's take a closer look at an adapter's components, using the following adapter as an example:
adapter:CSV:newline=crlf:sep=,:first=yes:escape=\:quotes='"?file:///c:/one.csv
Let's go back to the original JAXP example. Wouldn't it be cool if we could use the adapter URL to specify the argument for the Java StreamSource method — that is, to effectively use a CSV file as a source for an XSLT transformation, and do it all programmatically? Thanks to the DataDirect XML Converters™ runtime library, we can!
01.
import javax.xml.transform.*;
02.
import javax.xml.transform.stream.*;
03.
04.
// here, we have included the DataDirect XML Converters™ runtime library
05.
import com.exln.stylus.io.*;
06.
07.
08.
public class MyTransform {
09.
public static void main(String args[]) throws Throwable
10.
{
11.
// we use the createSource method from the DataDirect
12.
// XML Converters™ runtime library, which understands
13.
// the DataDirect XML Converters™ adapter URL syntax
14.
StreamSource input = StylusFileFactory.createSource
15.
(
"adapter:CSV:newline=crlf:sep=,:first=yes:escape=\\:quotes='\"
?
16.
file:
///c:/one.csv", "", true);
17.
StreamSource xslt =
new
StreamSource(
"file:///c:/tranform.xsl"
);
18.
StreamResult output =
new
StreamResult(
"file:///c:/result.xml"
);
19.
TransformerFactory tf = TransformerFactory.newInstance();
20.
tf.newTransformer(xslt).transform(input, output);
21.
}
22.
}
JAXP defines a clever mechanism that allows XSLT processors to delegate URL resolution to an external component called theURIResolver
. The XSLT processor invokes the URIResolver
to turn a URL used in document()
, xsl:import
, or xsl:include
instructions into a Source object.
The StylusFileFactory
class in the DataDirect XML Converters™ runtime library implements the URIResolver
interface, allowing adapter URLs to be resolved in the same fashion as standard URLs. Here is an XSLT fragment that shows how a CSV file is accessed as an XML document through the document()
function:
1.
<xsl:
for
-each select="document('
2.
adapter:CSV:newline=crlf:sep=,:first=yes:
3.
escape=\:quotes=
'"?file:///c:/one.csv'
)/table/row"> <xsl:copy-of select=
"."
/> </xsl:
for
-each>
To register a URIResolver
, we use the following code:
1.
TransformerFactory tf = TransformerFactory.newInstance();
2.
Transformer t = tf.newTransformer(xslt);
3.
t.setURIResolver(StylusFileFactory.getFactory());
4.
t.transform(input, output);
We hope you enjoyed this overview of DataDirect XML Converters™'s runtime library. Happy Java programming!
Ivan Pedruzzi, The DataDirect XML Converters™ Team