How to split a large XML document in many smaller ones?

September 19, 2008 Data & AI

We often receive this kind of question: "I have a large XML document and I need to split it in many smaller documents; my document looks like this:[cc lang="xquery"] field1.1 field2.1

[/cc]...and I need to documents that don't contain more than N records each." Using DataDirect XQuery this task is quite simple; leveraging the ddtek:serialize-to-url() function, you can do something like this: [cc lang="xquery"] declare variable $recordsPerDocument := 10; let $records := doc("c:/books.xml")/records let $groupCount := xs:integer(fn:ceiling(count($records) div $recordsPerDocument)) for $g in 1 to $groupCount let $group := $records[fn:position() gt ($g - 1) * $recordsPerDocument and fn:position() le $g * $recordsPerDocument] return ddtek:serialize-to-url( <records>{ $group }</records>, concat("file:///c:/split-", $g, ".xml"), "indent=yes") [/cc]What if you want to do something similar, but applied to RDBMS tables? How do I split the content of a RDBMS table across multiple XML documents? Well, as we are talking about DataDirect XQuery, it shouldn't surprise you that basically the same XQuery can be applied to a table:[cc lang="xquery"] declare variable $recordsPerDocument := 10; let $records := collection("myTable")/myTable let $groupCount := xs:integer(fn:ceiling(count($records) div $recordsPerDocument)) for $g in 1 to $groupCount let $group := $records[fn:position() gt ($g - 1) * $recordsPerDocument and fn:position() le $g * $recordsPerDocument] return ddtek:serialize-to-url( <records>{ $group }</records>, concat("file:///c:/split-", $g, ".xml"), "indent=yes") [/cc]Once again XQuery offers a simple, flexible solution for a problem that comes up pretty frequently.

Minollo