This second post in the series Integrating Office applications with Salesforce.com talks about querying Microsoft Office Word forms.
This is in no way a complete introduction on WordProcessingML, or the .docx file format used by Microsoft Word. Such great introduction is <a href="http://openxmldeveloper.org/articles/1970.aspx"?OpenXML Explained, Wouter van Vugt explains in this e-book the different file formats in OOXML with concrete examples. And if not sufficient, you may start to digest the Office Open XML (OOXML) specification.
Before making our hands dirty with XQuery coding, first a word on creating Microsoft Office Word forms. You can create a form in Microsoft Office Word, by adding “content controls” to your document. As one of the key benefits of such approach, you can lock down the document, and as such only allow the user to edit the data in the controls. This guarantees the structure of the document will not be altered, and finally this will simplify querying the document as we will discuss later on.
Adding “content controls” is done via the Developer tab, which is not enabled by default. Go to the “Word options”, and select “Show Developer tab in the Ribbon”.
Clicking “OK”, the Developer tab is now visible.
To add controls and define the layout of our form, we need to go into “Design Mode”.
There are various types of controls. From plain text controls, to more structured controls to pick a date or choose a value from a drop down list.
Dragging the controls on our document, we end up with the following form. Check out our first post in this series how the document looks like from a user’s perspective.
Every control has a tag property, as we will explain right away, this is key in order to query the different content controls in our document. The tag is specified in the properties dialog box.
We have now our form, we can protect it and have our users only be able to fill in the forms.
As we have discussed in the past, an OOXML document is based on the ZIP format. A ZIP file, containing multiple XML documents.
For today only the document /word/document.xml matters. Let's first have a closed look and try to understand the structure of the XML. Here is a fragment of the document.xml, which contains the user’s data.[cc lang="xml" theme="xquery"]
1st Avenue 999
[/cc]The XML document includes a Structured Document element, , for each content control. The element has 2 sections, a properties section (line 2), and the content section (line 11). The element (line 4) is of interest as it allows us to locate a specific content control in the document through the w:val attribute. Remember we specified the tag for each control when designing the document. The element contains the actual data entered by the user. We can gather that data from the single element (line 18).
Let’s define a utility function to retrieve the data for a specific content control. The parameters are the document and the tag (name) of the content control. The functions returns a single string with the user’s data.[cc lang="xquery"]declare function local:getField( $document, $fieldName as xs:string) as xs:string { $document//w:sdt[w:sdtPr/w:tag/@w:val = $fieldName]/w:sdtContent/data(.) };[/cc]Using this function we can now query the various content controls in our Microsoft Office Word document. But before doing so, we have to note that text controls have a more complex . And this because they may span multiple lines. Here is an example.[cc lang="xml" theme="xquery"]
Our first trip to Great Company was a success.
Mr. Roxy was excited after our product presentation. And we have scheduled a follow-up conference call to discuss some of the outstanding questions.
[/cc]We can still gather that data from the text elements . But in addition we need to take the elements into account, they indicate a new line. We define a second utility function to query controls supporting multiple lines.[cc lang="xquery"]declare function local:getMultiLineField( $document, $fieldName as xs:string) as xs:string { $document//w:sdt[w:sdtPr/w:tag/@w:val = $fieldName]/w:sdtContent//(w:t|w:br)/(if (fn:local-name() = "t") then fn:data(.) else " ") };[/cc]We have now all the necessary utility functions. The following query retrieves all data out of our Microsoft Office Word form. We use the greatcompany.docx document, shown in Part I of this series.[cc lang="xquery"]declare namespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
declare function local:getField( $document, $fieldName as xs:string) as xs:string { $document//w:sdt[w:sdtPr/w:tag/@w:val = $fieldName]/w:sdtContent/data(.) };
declare function local:getMultiLineField( $document, $fieldName as xs:string) as xs:string { $document//w:sdt[w:sdtPr/w:tag/@w:val = $fieldName]/w:sdtContent//(w:t|w:br)/(if (fn:local-name() = "t") then fn:data(.) else " ") };
let $word := fn:doc("zip:file:///C:/example/greatcompany.docx!/word/document.xml") return
{local:getField($word, "date")} {local:getField($word, "name")} {local:getField($word, "description")} {local:getField($word, "contact")} {local:getField($word, "email")} {local:getField($word, "phone")} {local:getField($word, "street")} {local:getField($word, "city")} {local:getField($word, "state")} {local:getField($word, "country")} {local:getMultiLineField($word, "report")} [/cc]This query results in the following XML fragment.[cc lang="xml" theme="xquery"] 12/31/08 Great Company This is a great company Roxy joe.roxy@greatcompany.com 123-456-789 1st Avenue 999 San Francisco CA 94103 United States Our first trip to Great Company was a success. Mr. Roxy was excited after our product presentation. And we have scheduled a follow-up conference call to discuss some of the outstanding questions. [/cc]We have shown how to query Microsoft Office Word documents, only in a few lines of XQuery. In our next post we’ll explain how to interact with your Salesforce.com system using DataDirect XQuery.