The MarkLogic Sample Authoring App for PowerPoint

December 16, 2010 Data & AI, MarkLogic

This series of posts pertaining to the Office OpenXML formats provides an excellent introduction on how to create and manipulate the XML components that compose Word, PowerPoint, and Excel documents.  To get started with OpenXML and MarkLogic, review the document components as well as general XML and code examples. However, please note that each version of Microsoft Office introduces new namespaces and new XML elements, and Office applications can change how they produce and consume XML as well. As a result, you may need to update the examples to work with versions of Office after 2010. Since we first published this series, MarkLogic has continued to introduce new tools and programming languages that are useful in working with these documents as well, but the fundamentals demonstrated here remain the same.

The MarkLogic Sample Authoring App for PowerPoint was designed as a jump start for anyone doing Microsoft Office development in MarkLogic, showcasing the use of the MarkLogic Toolkit for PowerPoint in MarkLogic version 8 with Office 2007 or 2010. The toolkit uses tags — name and value for a property — to let authors enrich PowerPoint presentations, and associate and manage metadata. In addition, users can search and reuse existing tagged components and their metadata in new PowerPoint presentations.

Getting Started

In order to use this application, install the add-in and supporting XQuery library from the toolkit. The toolkit provides a guide for installation and comes with its own separate sample application.

To get started using this sample application, first update three areas with the URL of the application. See the README.txt, as well as the Sample Authoring App Developer’s Guide, both included in the download for the application.

Following is an overview of the Sample Authoring App functionality, but we can also configure this application to meet additional requirements. A “files of interest” section is also included in the guide, in case we just want to get in there and start hacking.

Enriching Content

In the Authoring screenshot below, we see the Tags palette. In PowerPoint, we can tag presentations, slides, and slide components (shapes within a slide). For the initial release of the Sample App, the components we can tag and roundtrip from MarkLogic include presentations, slides, textboxes, and images. We can select what we’d like to tag from the navigation bar located on the Tags tab. Below, the “presentation” tag object is currently selected.

We can enrich the presentation by simply selecting the presentation icon in the navigator bar of the Tags tab and clicking a button on the Tags palette to apply the tag. To enrich a slide, select the slide to tag in PowerPoint, select slides in the navigator bar of the Tags tab, and click the button in the Tags palette to apply the tag.

The tags that have been applied to the selected presentation, slide, or slide component are listed in the Properties panel below the Tags palette. To the left of each tag name is a delete button for removing the tag from the selection.

Notice that the tag labels for presentations and slides in the above screenshots differ. These tags are configurable. The tag labels and the number of tags made available for enrichment of presentations, slides, and slide components are determined by the configuration. See the Developer’s Guide for more detail.

The configuration is a simple XML file and it generates the HTML for the buttons on the palette as well as the associated JavaScript functions required for inserting the tags. The names used for enrichment will be the values used for Search once the document is saved in MarkLogic Server.
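To make the idea of a configuration-driven palette concrete, here is a minimal Python sketch. It is not the Sample App's actual configuration format or code: the element names (tag-config, tags, tag), the scope and label attributes, and the applyTag JavaScript function are all assumptions made for illustration. See the Developer’s Guide for the real schema.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical configuration; element and attribute names are illustrative only.
CONFIG = """
<tag-config>
  <tags scope="presentation">
    <tag label="Confidential"/>
    <tag label="Q&amp;A Deck"/>
  </tags>
  <tags scope="slide">
    <tag label="Roadmap"/>
  </tags>
</tag-config>
"""


def render_buttons(config_xml: str, scope: str) -> str:
    """Return one HTML button per tag configured for the given scope."""
    root = ET.fromstring(config_xml.strip())
    buttons = []
    for group in root.findall("tags"):
        if group.get("scope") != scope:
            continue
        for tag in group.findall("tag"):
            # Escape the label so it is safe in both attribute and text positions.
            safe = escape(tag.get("label", ""), quote=True)
            # applyTag is a hypothetical JavaScript handler, not a toolkit function.
            buttons.append(
                f'<button data-tag="{safe}" '
                f'onclick="applyTag(this.dataset.tag)">{safe}</button>'
            )
    return "\n".join(buttons)


if __name__ == "__main__":
    print(render_buttons(CONFIG, "presentation"))
```

Because the label is the only source for both the button text and the value passed to the handler, the name an author clicks is the same name that is later available for search.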

When we click a slide or slide component within an active presentation, the Properties section beneath the Tags palette updates automatically with information about the tags currently applied to the selected object, including:

  • the name of the tag
  • a delete button to remove the tag
  • if the tag belongs to a slide component (shape), a Save All button

If the content in a tagged component changes, click Save to successfully roundtrip the component from MarkLogic to a new presentation once the presentation is saved in MarkLogic Server.

Tags are only exposed through PowerPoint’s object model. There is no native functionality in the application for applying or displaying tags. We’ve done our best to provide all the visual cues you’ll need to navigate tags within the Authoring pane. As you select slides and components, tags and messages are displayed in the Properties panel to show what’s been tagged and what hasn’t.

Working with Metadata

Each time we add a tag to the presentation, a custom XML metadata part is added to the .pptx being authored. This custom metadata part is associated by ID with the tag added. On the metadata pane, we can edit the metadata values for associated tags.
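Since a .pptx is a zip package, the effect of adding a metadata part can be sketched with Python's standard library. This is only an illustration, not the toolkit's implementation: the metadata root element and its id attribute are hypothetical, and a real package would also need its relationship and content-type entries updated, which is omitted here. The customXml/itemN.xml naming follows the usual Office convention for custom XML parts.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace
ITEM_RE = re.compile(r"customXml/item(\d+)\.xml")


def build_metadata(tag_id: str, title: str) -> bytes:
    """Build a small metadata document; the <metadata id=...> wrapper is hypothetical."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata", {"id": tag_id})
    ET.SubElement(root, f"{{{DC_NS}}}title").text = title
    return ET.tostring(root, encoding="utf-8")


def add_metadata_part(package: bytes, tag_id: str, title: str) -> bytes:
    """Return a copy of the package with one more customXml/itemN.xml part."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        highest = 0
        for name in src.namelist():
            dst.writestr(name, src.read(name))
            match = ITEM_RE.fullmatch(name)
            if match:
                highest = max(highest, int(match.group(1)))
        dst.writestr(f"customXml/item{highest + 1}.xml",
                     build_metadata(tag_id, title))
    return out.getvalue()


def find_metadata(package: bytes, tag_id: str):
    """Return the name of the part whose id matches the tag, or None."""
    with zipfile.ZipFile(io.BytesIO(package)) as src:
        for name in src.namelist():
            if ITEM_RE.fullmatch(name):
                if ET.fromstring(src.read(name)).get("id") == tag_id:
                    return name
    return None
```

Looking the part up by ID rather than by file name mirrors the association described above: the tag carries the ID, and the part that shares it holds the editable values.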

The metadata pane provides a list of the tags applied to the presentations, slides, and slide components. Clicking a label in the list displays the form for the associated metadata beneath the list view. As we edit field values, the changes are saved automatically to the metadata form.

Notice that for this example, there are three types of metadata forms associated with the tags. We are also using Dublin Core metadata, which is configurable. We can use other XML for the metadata, and can even have a different custom XML metadata form for each tag defined in the tag palette. Please refer to the Developer’s Guide included with the Sample Authoring App.

Deleting a tag from the document (via the delete button on the Properties panel of the enrich tab) also removes the tag’s associated metadata part from the .pptx.

Searching within MarkLogic

When we save a .pptx to MarkLogic Server, it is automatically unzipped and made available for search and reuse. On this pane, we can search for text found within any slides saved in MarkLogic.
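As a rough picture of what "unzipped and made available for search" implies, the Python sketch below opens a .pptx package and collects the text runs from each slide part. It runs outside MarkLogic and is not how the server does it; it only shows that slide text lives in a:t elements of the DrawingML namespace inside ppt/slides/slideN.xml, which is what makes the extracted parts searchable.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET
from typing import Dict

# DrawingML namespace used for text runs inside slide parts.
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
SLIDE_RE = re.compile(r"ppt/slides/slide\d+\.xml")


def slide_texts(package: bytes) -> Dict[str, str]:
    """Map each slide part name to the concatenated text of its runs."""
    texts = {}
    with zipfile.ZipFile(io.BytesIO(package)) as pkg:
        for name in pkg.namelist():
            if not SLIDE_RE.fullmatch(name):
                continue
            root = ET.fromstring(pkg.read(name))
            runs = [t.text or "" for t in root.iter(f"{{{A_NS}}}t")]
            texts[name] = " ".join(runs)
    return texts
```

Given the bytes of a saved deck, slide_texts(data) returns one entry per slide, which is the granularity the Slides search option works at.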

Under the search box, we can select Components, Slides, or Presentations. This selection will filter the types of results returned, which also have different options available for reuse.

The results provide a count of search results as well as pagination.

When we search and have selected Presentations, all presentations that contain a slide that includes the search text value will be returned.

When we search and have selected Slides, any slide from any presentation including the search text value will be returned.

When we search and have selected Components, any component from any slide, from any presentation including the search text value will be returned.
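The three scopes differ only in what counts as one result. The following Python sketch models that over a small in-memory structure; the Presentation, Slide, and Component classes and the search function are hypothetical stand-ins, not the Sample App's data model or its MarkLogic queries.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    kind: str   # e.g. "textbox" or "image"
    text: str


@dataclass
class Slide:
    components: List[Component] = field(default_factory=list)


@dataclass
class Presentation:
    uri: str
    slides: List[Slide] = field(default_factory=list)


def search(decks: List[Presentation], term: str, scope: str) -> list:
    """Return matches at the granularity chosen under the search box."""
    if scope not in ("presentations", "slides", "components"):
        raise ValueError(f"unknown scope: {scope}")
    needle = term.lower()
    results = []
    for deck in decks:
        for number, slide in enumerate(deck.slides, start=1):
            hits = [c for c in slide.components if needle in c.text.lower()]
            if not hits:
                continue
            if scope == "presentations":
                results.append(deck.uri)  # one result per deck
                break                     # no need to scan further slides
            if scope == "slides":
                results.append((deck.uri, number))
            else:
                results.extend((deck.uri, number, c.kind) for c in hits)
    return results
```

The same search term can therefore return one presentation, several slides, or many components, depending on the selection.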

For each result, we see:

  • the document title (if present in the extracted document properties part) or the URI of the source document in MarkLogic
  • the last modified date and last modified by for the source document
  • an icon indicating which type of component we are inserting (either slide, textbox, or image)
  • a snippet of text, with our search text in bold
  • more text for the result, displayed when we roll over the snippet
  • a thumbnail image of the slide (representing the Slide or Slide containing Components)
  • an insert button (Components, Slides)
  • an undo button (Components, Slides)

The thumbnail image for Presentation results will be of the first slide in the deck.

The options of ‘undo’ and ‘insert’ are not available for Presentation search results, but are provided when searching for Components and Slides.

Clicking the document title will open the source presentation from MarkLogic Server.

Clicking insert will insert the slide into the active presentation at the current slide position. The search result may have tags with associated metadata parts. The slide may also include components that have been tagged and also include associated custom metadata parts. If any of the search results has tags and associated metadata parts in their source document in MarkLogic, those tags and metadata parts are retained and added to the presentation being authored. View and edit any metadata values by examining them in the metadata tab.

In the case that we’ve inserted something we didn’t intend to, click ‘undo’ and the inserted Slide or Component will be removed, as well as any associated custom metadata parts from the active .pptx package.

Finally, restrict searches through use of the filter tab. Click the drop down arrow to see a list of tag labels available to apply to the search, which is also configurable. To apply the criteria, click the Filter button. Closing the filter selection will keep the filter applied.

A Note on Slide Images

You may be asking: where did the thumbnails for slide images come from? Well, in the MarkLogic Toolkit for PowerPoint, we provided the ability to ‘Save To MarkLogic’ directly from the button in Office 2007 (You’ll also find the same option available from ‘Add-ins’ in the backstage view for Office 2010). When we save to MarkLogic, we also save the deck’s slide images, so we have them available for display in search results.

For Components, we reuse the slide images and provide cues through icons, snippets, and component labels to help guide authors in finding and inserting the content they seek to reuse in new presentations.

Conclusion

The Sample Authoring App is intended to give authors a way to enrich content in PowerPoint, as well as define and identify PowerPoint presentations, slides, and slide components for search, reuse, auditing, and tracking.

Because authors can reuse components of content across documents, identified by rich metadata, they can then use MarkLogic to query where document components are being reused. Also, if a document component includes text that needs to be updated across documents, the metadata can be used to run an update of that component across all PowerPoint presentations in MarkLogic that include it.

Pete Aven