Well, the temptation to keep reviewing license agreements rather than playing with XQuery was strong... but I resisted! I had to help this guy! :)
I was happy to notice that the spreadsheets had been saved using the Office Open XML (OOXML) format; OOXML is basically a package of XML files describing the document (text, spreadsheet, slides) that it represents. Being XML-based, it's quite easy to inspect it using an XML query language; the package is a zip format, so individual documents can be retrieved using the standard "jar:" file scheme, for example.
So, coming back to the guy asking for help joining two spreadsheets; what can we do for him? If you look at the structure of the two spreadsheets he sent me (attached to this post), you'll see two XML documents that are relevant to us:
To retrieve the whole content of xl/worksheets/sheet1.xml from c:doc1.xlsx using a Java-based XQuery processor like DataDirect XQuery, you can just us the standard fn:doc() function:
[cc lang="xquery"]fn:doc(“jar:file:///c:/doc1.xlsx!/xl/worksheets/sheet1.xml”)[/cc]
The index corresponding to the character string contained in the first cell of the first row in the spreadsheet can be retrieved doing:
[cc lang="xquery"]fn:doc(“jar:file:///c:/doc1.xlsx!/xl/worksheets/sheet1.xml”)// ooxml:sheetData/ooxml:row[1]/ooxml:c[1]/ooxml:v[/cc]
...where ooxml is a namespace prefix associated to the URI http://schemas.openxmlformats.org/spreadsheetml/2006/main .
And if you have the index of the character string, the character string itself can be retrieved as:
[cc lang="xquery"]fn:doc(“jar:file:///c:/doc1.xlsx!/xl/sharedStrings.xml”)// ooxml:si[$theIndex+1]/ooxml:t/text()[/cc]
Easy, isn't it!?
Add some syntax, a couple of FLWOR expressions, a few HTML tags, and here you go! This XQuery just joins the two spreadsheets and creates an easy to read HTML report:
[cc lang="xquery"]declare namespace ooxml= “http://schemas.openxmlformats.org/spreadsheetml/2006/main”; let $doc1Strings := doc(“jar:file:///c:/doc1.xlsx!/xl/sharedStrings.xml”)//ooxml:si let $doc2Strings := doc(“jar:file:///c:/doc2.xlsx!/xl/sharedStrings.xml”)//ooxml:si return
{
NameDivisionCity
, for $row1 in doc(“jar:file:///c:/doc1.xlsx!/xl/worksheets/sheet1.xml”)// ooxml:sheetData/ooxml:row, $row2 in doc(“jar:file:///c:/doc2.xlsx!/xl/worksheets/sheet1.xml”)// ooxml:sheetData/ooxml:row let $col1-1 := $doc1Strings[$row1/ooxml:c[1]/ooxml:v/number()+1]/ooxml:t/text() let $col2-1 := $doc2Strings[$row2/ooxml:c[1]/ooxml:v/number()+1]/ooxml:t/text() where $col1-1 = $col2-1 return let $col1-2 := $doc1Strings[$row1/ooxml:c[2]/ooxml:v/number()+1]/ooxml:t/text() let $col2-2 := $doc2Strings[$row2/ooxml:c[2]/ooxml:v/number()+1]/ooxml:t/text() return
{$col1-1}{$col2-2}{$col1-2}
}
[/cc]
It's that simple! Here are the two spreadsheets, if you want to try it yourself: doc1.xlsx, doc2.xlsx
Subscribe to get all the news, info and tutorials you need to build better business apps and sites