In my previous blog post, I explained why upfront high-level modeling is essential. I recommend using the Unified Modeling Language (UML), as it helps to visually depict your model for greater clarity. UML can feed into MarkLogic’s Entity Services, which is a shockingly low-effort route to model-driven data management in MarkLogic. When I first played with it, I was surprised how little input I had to provide to reap a treasure chest of outputs.
My UML-to-Entity-Services toolkit transforms a UML data model into a MarkLogic Entity Services model. To use it, you’ll need MarkLogic 9.0-3 or later plus your preferred third-party UML modeling tool. The UML tool you select must support UML 2.x, must be able to export UML models to XML Metadata Interchange (XMI) 2.1, must be able to import UML profiles, and must support stereotypes and tagged values. In the suite of examples featured in the toolkit, I used two such tools: MagicDraw 18.5 and Eclipse Modeling Framework 2.x. The toolkit includes several UML examples that demonstrate the model-driven workflow process.
Let’s use one of these examples – the movie model – to walk through the process. For comparison, refer to the toolkit’s documentation of this example, which both describes the recipe and provides the finished product. In this post, we’ll follow along with the recipe.
The first step is to open your favorite UML editor, create a new UML model, and import into the model the toolkit’s UML profile for MarkLogic Entity Services. The profile is an XMI file. Follow the approach specific to your UML tool to add this profile to your model.
Next, draw the movie model. You will be composing a UML class diagram consisting of classes, their attributes, and class relationships. I used MagicDraw, but any UML tool that meets the requirements will suffice. Here is what the final model looks like:
Figure 1: UML class diagram of movie data
At a high level, the model describes two main types of data: movies and contributors. Contributors are of two types: persons (actors, directors, writers, etc.) and companies (production companies, special effects companies, etc.). There is a many-to-many relationship between movie and contributor, and we express that relationship as a role. A contributor performs a role (or perhaps several roles) in a movie; the set of roles for a contributor is that contributor’s filmography. A movie’s cast is the set of roles in that movie: director roles, actor roles, writer roles, production company roles, and others. A movie also has a set of parental certificates, i.e., the parental ratings per country for the movie. A movie and a person contributor can have user documents. These are user-contributed posts, such as actor biographies and movie plot summaries.
The model has three levels of structure. At the highest level is the package, which describes the overall model and maps to the Entity Services notion of a model. In MagicDraw, the package details are configured in a separate dialog window, shown in Figure 2. We name our package MovieModel and tag it with two properties that are needed by Entity Services: baseUri and version. These tags belong to the esModel stereotype from the custom profile.
Figure 2: Package details with two properties tagged
At the next level are classes. Our model has seven classes: Movie, MovieContributor, PersonContributor, CompanyContributor, UserDocument, ParentalCertificate, and Role. These map to Entity Services entities. Notice that two of the classes are stereotyped:
Each class contains one or more attributes, which map to Entity Services properties. An attribute has a name, a type, multiplicity, and can be stereotyped with Entity Services configuration. Here are a few examples from the class Movie:
Especially interesting in this model are the class relationships:
From the UML tool, export the class diagram to an XMI file. It is now time to transform the XMI to an Entity Services model descriptor. The toolkit provides a gradle-based utility to do this. The basic steps are the following:
The README file in the toolkit explains these steps in detail.
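To get a feel for what the transformation has to read, here is a minimal Python sketch that lists the classes and attribute names found in an XMI export. It is not the toolkit's code (the toolkit does this work for you through gradle). The inline XMI fragment is a hand-written, heavily simplified stand-in for what a UML tool exports, the namespace URIs are the ones I would expect for XMI 2.1, and list_classes is a name I made up for illustration, so check all of these against your own export. Types, multiplicities, and stereotypes are left out.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified stand-in for a UML tool's XMI export (assumption).
SAMPLE_XMI = """<xmi:XMI xmlns:xmi="http://schema.omg.org/spec/XMI/2.1"
         xmlns:uml="http://schema.omg.org/spec/UML/2.1">
  <uml:Model name="MovieModel">
    <packagedElement xmi:type="uml:Class" name="Movie">
      <ownedAttribute name="movieId"/>
      <ownedAttribute name="countries"/>
    </packagedElement>
    <packagedElement xmi:type="uml:Class" name="Role">
      <ownedAttribute name="roleType"/>
    </packagedElement>
  </uml:Model>
</xmi:XMI>"""

# Assumed XMI 2.1 namespace; verify against the header of your own export.
XMI_NS = "http://schema.omg.org/spec/XMI/2.1"


def list_classes(xmi_text):
    """Return {class name: [attribute names]} for every uml:Class element."""
    root = ET.fromstring(xmi_text)
    classes = {}
    for elem in root.iter("packagedElement"):
        # The xmi:type attribute distinguishes classes from other elements.
        if elem.get(f"{{{XMI_NS}}}type") == "uml:Class":
            classes[elem.get("name")] = [
                attr.get("name") for attr in elem.findall("ownedAttribute")
            ]
    return classes


if __name__ == "__main__":
    for name, attrs in list_classes(SAMPLE_XMI).items():
        print(name, attrs)
```

Each class found this way corresponds to an entity in the Entity Services model, and each attribute to a property.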
Let’s review the mapping for our movie model. The following code listing is an excerpt of the model descriptor produced by the transformation. (If you compare it to the UML diagram in the previous section, you see how the mapping worked. Refer to the next section for a general reference guide to the mapping.)
{
  "info": {
    "title": "MovieModel",
    "version": "0.0.1",
    "baseUri": "http://com.marklogic.es.uml.movie"
  },
  "definitions": {
    "Movie": {
      "properties": {
        "movieId": {"datatype": "string"},
        "seriesId": {"datatype": "string"},
        "countries": {"datatype": "array", "items": {"datatype": "string"}},
        "imdbUserRating": {"datatype": "float"},
        "parentalCerts": {"datatype": "array", "items": {"$ref": "#/definitions/ParentalCertificate"}},
        "cast": {"datatype": "array", "items": {"$ref": "#/definitions/Role"}}
      },
      "required": ["movieId", "seriesType", "releaseYear", "runningTime", "imdbUserRating"],
      "primaryKey": "movieId",
      "elementRangeIndex": ["seriesType", "releaseYear", "genres", "runningTime", "imdbUserRating"]
    },
    "Role": {
      "properties": {
        "roleType": {"datatype": "string"},
        "roleNames": {"datatype": "array", "items": {"datatype": "string"}},
        "contribClass": {"datatype": "string"},
        "refMovieContributor": {"datatype": "string"},
        "refMovie": {"datatype": "string"}
      },
      "required": ["roleType", "contribClass"]
    }
  }
}
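The property shapes in this excerpt follow a simple pattern: a single-valued attribute becomes a plain datatype, a multi-valued attribute becomes an array with items, and an attribute whose type is another class becomes a $ref to that class's definition. The Python sketch below restates that pattern for four of the Movie properties. It is only an illustration of the pattern visible in the listing, not the toolkit's transformation, and the function name to_es_property and its parameters are my own invention.

```python
import json


def to_es_property(type_name, many=False, is_class=False):
    """Build an Entity Services property descriptor for one UML attribute.

    type_name: a scalar datatype (e.g. "string") or the name of another class
    many:      True when the UML multiplicity allows more than one value
    is_class:  True when type_name names a class in the model
    """
    if is_class:
        item = {"$ref": f"#/definitions/{type_name}"}
    else:
        item = {"datatype": type_name}
    # Multi-valued attributes are wrapped in an array descriptor.
    return {"datatype": "array", "items": item} if many else item


# Hypothetical reconstruction of four properties from the excerpt above.
movie_properties = {
    "movieId": to_es_property("string"),
    "countries": to_es_property("string", many=True),
    "imdbUserRating": to_es_property("float"),
    "cast": to_es_property("Role", many=True, is_class=True),
}

if __name__ == "__main__":
    print(json.dumps(movie_properties, indent=2))
```

Settings such as required, primaryKey, and elementRangeIndex sit beside the properties in the descriptor and are not covered by this sketch.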
The most important artifact that the Entity Services library generates is the conversion module. It is expected that the developer will modify this generated code. We modify the movie conversion module as follows:
The modified conversion module is here.
With these changes in place, we proceed to ingest data. The gradle toolkit provides sample movie data. It shows how to use the gradle MarkLogic Content Pump (MLCP) plugin to ingest data from JSON files to MarkLogic. We use our conversion module as an MLCP transform, mapping the JSON source files to XML envelopes whose structure follows that of the model.
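To make the target shape concrete, here is a rough Python sketch of wrapping a flat source record in an envelope. The real work is done by the generated XQuery conversion module running as an MLCP transform. This sketch only mimics the envelope layout seen in the query output later in this post, and it assumes the source JSON field names match the model's property names. The function name to_envelope is hypothetical.

```python
import xml.etree.ElementTree as ET

ES_NS = "http://marklogic.com/entity-services"
ET.register_namespace("es", ES_NS)


def to_envelope(entity_type, version, record):
    """Wrap a flat source record (dict) in an Entity Services style envelope."""
    envelope = ET.Element(f"{{{ES_NS}}}envelope")
    instance = ET.SubElement(envelope, f"{{{ES_NS}}}instance")
    info = ET.SubElement(instance, f"{{{ES_NS}}}info")
    ET.SubElement(info, f"{{{ES_NS}}}title").text = entity_type
    ET.SubElement(info, f"{{{ES_NS}}}version").text = version
    entity = ET.SubElement(instance, entity_type)
    for name, value in record.items():
        if isinstance(value, list):
            # Array-valued properties repeat the element, one per item.
            for item in value:
                child = ET.SubElement(entity, name, {"datatype": "array"})
                child.text = str(item)
        else:
            ET.SubElement(entity, name).text = str(value)
    return envelope


if __name__ == "__main__":
    source = {"movieId": "Gut Fellas", "releaseYear": 1987, "countries": ["USA", "UK"]}
    print(ET.tostring(to_envelope("Movie", "0.0.1", source), encoding="unicode"))
```

Nested entities such as parental certificates and roles would need a recursive step, which I have left out to keep the sketch short.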
We conclude by running a few queries to explore the ingested movie data to verify that it meets the design goals of our UML model. We use the Query Console workspace.
The “Movie Parentals, Cast, Docs” tab has a query to retrieve the details of a movie, its parental certificates, its roles (i.e., cast), and its user documents. Notice the parental certificates and roles are contained within the movie. For the user documents, we use cts:search() to find user documents that refer to the movie.
(: Load the movie envelope :)
let $movie := fn:doc("/xmi2es/imdb/movie/movies1.xml")
(: Find user documents whose movieDoc value matches this movie's ID :)
let $docs := cts:search(fn:doc(), cts:and-query((
    cts:collection-query("movieDoc"),
    cts:element-value-query(xs:QName("movieDoc"), $movie//movieId)
  )))
return ("Movie", $movie, "Parental", $movie//ParentalCertificate, "Cast", $movie//Role, "Docs", $docs)
Here is an excerpt of the output:
Movie:
<es:envelope xmlns:es="http://marklogic.com/entity-services">
  <es:instance>
    <es:info>
      <es:title>Movie</es:title>
      <es:version>0.0.1</es:version>
    </es:info>
    <Movie>
      <movieId>Gut Fellas</movieId>
      <seriesType>feature</seriesType>
      <releaseYear>1987</releaseYear>
      <countries datatype="array">USA</countries>
      <countries datatype="array">UK</countries>
      <imdbUserRating>1.8</imdbUserRating>
      <parentalCerts datatype="array">
        <ParentalCertificate>
          <country>Chile</country>
          <currentCertificate>scandalous</currentCertificate>
        </ParentalCertificate>
      </parentalCerts>
      <cast datatype="array">
        <Role>
          <roleType>actor</roleType>
          <roleNames datatype="array">Tony Blair</roleNames>
          <contribClass>person</contribClass>
          <refMovieContributor>Billy Wonka</refMovieContributor>
          <refMovie>Gut Fellas</refMovie>
        </Role>
      </cast>
    </Movie>
  </es:instance>
</es:envelope>
Parental:
<ParentalCertificate xmlns:es="http://marklogic.com/entity-services">
  <country>Chile</country>
  <currentCertificate>scandalous</currentCertificate>
</ParentalCertificate>
Cast:
<Role xmlns:es="http://marklogic.com/entity-services">
  <roleType>actor</roleType>
  <roleNames datatype="array">Tony Blair</roleNames>
  <contribClass>person</contribClass>
  <refMovieContributor>Billy Wonka</refMovieContributor>
  <refMovie>Gut Fellas</refMovie>
</Role>
Docs:
<?xml version="1.0" encoding="UTF-8"?>
<es:envelope xmlns:es="http://marklogic.com/entity-services">
  <es:instance>
    <es:info>
      <es:title>UserDocument</es:title>
      <es:version>0.0.1</es:version>
    </es:info>
    <UserDocument>
      <docId>92d38bed-275b-4074-9e92-5adcdef175aa</docId>
      <authorId>Happy Cross</authorId>
      <docText>A satire of politics in a post-truth world</docText>
      <docType>plot</docType>
      <docSubType></docSubType>
      <movieDoc>Gut Fellas</movieDoc>
    </UserDocument>
  </es:instance>
</es:envelope>
We leverage the TDE template generated when we deployed the model to run SQL queries against our data. The “Company and Filmography SQL” tab has a query to find a company and its filmography. The SQL is a join of CompanyContributor and its contained filmography. Under the covers, there is nothing to join: a single document holds everything. Document-structured data is made to look relational!
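The idea of projecting rows out of a nested document can be sketched in a few lines of Python, with SQLite standing in for MarkLogic's SQL engine. This is a conceptual analogy only: in MarkLogic the TDE template does the projection for you, and I am not showing its template syntax here. The company document, the contribId and name fields, and the table names are all hypothetical; only roleType and refMovie come from the Role definition in the model descriptor.

```python
import sqlite3

# One hypothetical company document with its contained filmography.
# "contribId" and "name" are illustrative field names; the Role fields
# (roleType, refMovie) follow the model descriptor.
company_doc = {
    "contribId": "c1",
    "name": "Example Pictures",
    "filmography": [
        {"roleType": "production", "refMovie": "Gut Fellas"},
        {"roleType": "distribution", "refMovie": "Another Movie"},
    ],
}


def extract_rows(doc):
    """Project one nested document into a parent row and its child rows."""
    company = (doc["contribId"], doc["name"])
    roles = [
        (doc["contribId"], role["roleType"], role["refMovie"])
        for role in doc["filmography"]
    ]
    return company, roles


def filmography(doc):
    """Load the projected rows into SQLite and join them back together."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE company (contribId TEXT, name TEXT)")
    conn.execute("CREATE TABLE role (contribId TEXT, roleType TEXT, refMovie TEXT)")
    company, roles = extract_rows(doc)
    conn.execute("INSERT INTO company VALUES (?, ?)", company)
    conn.executemany("INSERT INTO role VALUES (?, ?, ?)", roles)
    rows = conn.execute(
        "SELECT c.name, r.roleType, r.refMovie "
        "FROM company c JOIN role r ON r.contribId = c.contribId "
        "ORDER BY r.refMovie"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in filmography(company_doc):
        print(row)
```

Both "tables" come from the same document, which is why the join in the real query costs so little: the data was never split apart in the first place.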
Finally, the “Person and Bios SQL” tab has a SQL query to show person contributors and their bios. This query joins PersonContributor and UserDocument, which really are separate documents. Recall UserDocument has a reference to PersonContributor.
Next, I walk through how to use the toolkit for UML modeling with the data hub using semantics.
View all posts from Mike Havey on the Progress blog.