Integrating Office applications with Salesforce.com - Part II

Default Blog Top Image
by Marc Van Cappellen Posted on May 31, 2009

This second post in the series Integrating Office applications with Salesforce.com talks about querying Microsoft Office Word forms.

 

This is in no way a complete introduction on WordProcessingML, or the .docx file format used by Microsoft Word. Such great introduction is <a href="http://openxmldeveloper.org/articles/1970.aspx"?OpenXML Explained, Wouter van Vugt explains in this e-book the different file formats in OOXML with concrete examples. And if not sufficient, you may start to digest the Office Open XML (OOXML) specification.

 

Before making our hands dirty with XQuery coding, first a word on creating Microsoft Office Word forms. You can create a form in Microsoft Office Word, by adding “content controls” to your document. As one of the key benefits of such approach, you can lock down the document, and as such only allow the user to edit the data in the controls. This guarantees the structure of the document will not be altered, and finally this will simplify querying the document as we will discuss later on.

 

Adding “content controls” is done via the Developer tab, which is not enabled by default. Go to the “Word options”, and select “Show Developer tab in the Ribbon”.

wordoptions

 

Clicking “OK”, the Developer tab is now visible.

developertab

 

To add controls and define the layout of our form, we need to go into “Design Mode”.

designmode

 

There are various types of controls. From plain text controls, to more structured controls to pick a date or choose a value from a drop down list.

controltypes

 

Dragging the controls on our document, we end up with the following form. Check out our first post in this series how the document looks like from a user’s perspective.

tripreportmaster

 

Every control has a tag property, as we will explain right away, this is key in order to query the different content controls in our document. The tag is specified in the properties dialog box.

controlproperties

 

We have now our form, we can protect it and have our users only be able to fill in the forms.

restrictformatting

 

As we have discussed in the past, an OOXML document is based on the ZIP format. A ZIP file, containing multiple XML documents.

examplezip

 

For today only the document /word/document.xml matters. Let's first have a closed look and try to understand the structure of the XML. Here is a fragment of the document.xml, which contains the user’s data.[cc lang="xml" theme="xquery"]

1st Avenue 999

[/cc]The XML document includes a Structured Document element, , for each content control. The element has 2 sections, a properties section (line 2), and the content section (line 11). The element (line 4) is of interest as it allows us to locate a specific content control in the document through the w:val attribute. Remember we specified the tag for each control when designing the document. The element contains the actual data entered by the user. We can gather that data from the single element (line 18).

 

Let’s define a utility function to retrieve the data for a specific content control. The parameters are the document and the tag (name) of the content control. The functions returns a single string with the user’s data.[cc lang="xquery"]declare function local:getField( $document, $fieldName as xs:string) as xs:string { $document//w:sdt[w:sdtPr/w:tag/@w:val = $fieldName]/w:sdtContent/data(.) };[/cc]Using this function we can now query the various content controls in our Microsoft Office Word document. But before doing so, we have to note that text controls have a more complex . And this because they may span multiple lines. Here is an example.[cc lang="xml" theme="xquery"]

Our first trip to Great Company was a success.

Mr. Roxy was excited after our product presentation. And we have scheduled a follow-up conference call to discuss some of the outstanding questions.

[/cc]We can still gather that data from the text elements . But in addition we need to take the elements into account, they indicate a new line. We define a second utility function to query controls supporting multiple lines.[cc lang="xquery"]declare function local:getMultiLineField( $document, $fieldName as xs:string) as xs:string { $document//w:sdt[w:sdtPr/w:tag/@w:val = $fieldName]/w:sdtContent//(w:t|w:br)/(if (fn:local-name() = "t") then fn:data(.) else " ") };[/cc]We have now all the necessary utility functions. The following query retrieves all data out of our Microsoft Office Word form. We use the greatcompany.docx document, shown in Part I of this series.[cc lang="xquery"]declare namespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

declare function local:getField( $document, $fieldName as xs:string) as xs:string { $document//w:sdt[w:sdtPr/w:tag/@w:val = $fieldName]/w:sdtContent/data(.) };

declare function local:getMultiLineField( $document, $fieldName as xs:string) as xs:string { $document//w:sdt[w:sdtPr/w:tag/@w:val = $fieldName]/w:sdtContent//(w:t|w:br)/(if (fn:local-name() = "t") then fn:data(.) else " ") };

let $word := fn:doc("zip:file:///C:/example/greatcompany.docx!/word/document.xml") return

{local:getField($word, "date")} {local:getField($word, "name")} {local:getField($word, "description")} {local:getField($word, "contact")} {local:getField($word, "email")} {local:getField($word, "phone")} {local:getField($word, "street")} {local:getField($word, "city")} {local:getField($word, "state")} {local:getField($word, "country")} {local:getMultiLineField($word, "report")} [/cc]This query results in the following XML fragment.[cc lang="xml" theme="xquery"] 12/31/08 Great Company This is a great company Roxy joe.roxy@greatcompany.com 123-456-789 1st Avenue 999 San Francisco CA 94103 United States Our first trip to Great Company was a success. Mr. Roxy was excited after our product presentation. And we have scheduled a follow-up conference call to discuss some of the outstanding questions. [/cc]We have shown how to query Microsoft Office Word documents, only in a few lines of XQuery. In our next post we’ll explain how to interact with your Salesforce.com system using DataDirect XQuery.


Marc Van Cappellen
View all posts from Marc Van Cappellen on the Progress blog. Connect with us about all things application development and deployment, data integration and digital business.
More from the author

Related Tags

Related Articles

Boost Your Post M&A Success: Embrace Integration
The period after landing a deal is an important time to build connections, establish trust and implement an integration plan.
Top 5 Reasons to Use DataDirect with Salesforce
Customers pick Progress DataDirect for Salesforce connectivity because of its security, performance, high availability and more.
Delivering Relevant Notifications When Monitoring Complex Systems and Applications
Corticon.js helps deliver relevant notifications in complex systems and applications monitoring.

Thierry Ciot January 12, 2023
Prefooter Dots
Subscribe Icon

Latest Stories in Your Inbox

Subscribe to get all the news, info and tutorials you need to build better business apps and sites

Loading animation