Converting proprietary file formats to specific XML structures

February 18, 2008 Data & AI

We have talked in the past about how you can use XQuery to deal with EDI message (transform a directory of EDI messages, create EDI messages out of your database, converting tab delimited files into EDI). But sometimes you are not dealing with EDI; it's not infrequent that companies still rely on proprietary file formats to exchange or circulate information. For example, a few days ago we received a request from a user who needed to convert a file in a proprietary format into a specific XML structure. The sample file he sent us looked something like this:

[cc lang="text" theme="xquery"] 10 0123456789MR JOHN SMITH 20 0123456789ACCT1 30 0123456789SALARY 500000 30 0123456789BONUS 1000 20 0123456789ACCT2 30 0123456789OTHER 100 10 1234566790MR DAVID DOE 20 1234567890ACCT1 30 1234567890SALARY 10000[/cc]

This isn't EDI, CSV, tab delimited or any of the other standard or semi-standard ways to represent information in text files; it's a proprietary format based on a specific interpretation of the size and position of the fields. This is the description of records and fields that the user shared with us:

[cc lang="text" theme="xquery"] one block is made of: one record 10 (customer) 1 to N record 20 per record 10 (accounts) 1 to N record 30 per record 20 (transactions)

record 10 record type on 10 characters customer id on 10 characters customer lastname on 20 characters customer firstname on 30 characters

record 20 record type on 10 characters customer id on 10 characters account id on 10 characters

record 30 record type on 10 characters customer id on 10 characters label on 10 characters value on 10 characters[/cc]

How can we deal with something like this without building our own parser? DataDirect XML Converters and Stylus Studio can definitely help! XML Converters support what we call "custom conversions"; a custom conversion defines how files belonging to the same format family should be converted to XML by XML Converters. As the definition of a custom conversion can be a tedious task, Stylus Studio provides an editor that allows you to create custom conversion through an intuitive graphical interface. Explaining how that would work in this case is probably more complicated than actually doing it... so, here is a short video which describes all the operations you would do in Stylus Studio to convert the file provided by the user into XML.

Thanks to the custom conversion created using Stylus Studio, we are now able to manipulate the proprietary format file as XML; but, as mentioned above, the user needs to convert the file into a specific XML format. Using the example above, the desired XML result should look like this:

[cc lang="xquery"]

10 0123456789 MR JOHN SMITH

20 0123456789 ACCT1

30 0123456789 500000

30 0123456789 1000

20 0123456789 ACCT2

30 0123456789 100

10 1234566790 MR DAVID DOE

20 1234567890 ACCT1

30 1234567890 10000 [/cc]

This is where the combination of DataDirect XQuery and XML Converters proves particularly helpful: not only we are able to move from a proprietary file format to well formed XML, but we also have all the power, performance and scalability of XQuery to deal with such XML to transform it (potentially, even augment it!) into the final format that we need. The following XQuery - which implicitly relies on the custom converter described above - does the trick; notice that we created a couple of functions to take care of the positional grouping which is required to generate the desired output; that makes the XQuery much easier to read and more reusable:

[cc lang="xquery"] declare function local:getRelated20s($item) { let $nextItem :=$item/following-sibling::*[local-name()!="record_20"and local-name()!="record_30"][1] for $related in $item/following-sibling::*[local-name()="record_20"] where if ($nextItem) then $related << $nextItem else true() return $related }; declare function local:getRelated30s($item) { let $nextItem := $item/following-sibling::*[local-name()!="record_30"][1] for $related in $item/following-sibling::*[local-name()="record_30"] where if($nextItem) then $related << $nextItem else true() return $related };

{ for $record_10 in doc(‘converter:file:///c:/blog1/sample.conv?sampleinput.txt’)/root/record_10 return

{$record_10/customer_id/text()}

{$record_10/lastname/text()}

{$record_10/firstname/text()}

{ for $record_20 in local:getRelated20s($record_10) return

{$record_20/account_id/text()}

{ for $record_30 in local:getRelated30s($record_20) return

{$record_30/value/text()}

}

}

} [/cc]

Once again, thanks to the fact that DataDirect XQuery and XML Converters make it possible to manipulate a wide variety of data sources (even proprietary ones!), you are able to deal with transformation (and possibly aggregation) problems from an XML point of view, shielding as much as possible those tasks from the low level details of the data sources.

Minollo