
MarkLogic Data Curation

Instructor-Led Training


Course Description

Learn to build a MarkLogic Data Hub, powered by the MarkLogic database, to accelerate data integration projects and deliver value to your customers faster. This course is recommended only if you use the MarkLogic Data Hub or want to learn the Hub Central interface.

By completing this course you will be able to:

  • Describe what the MarkLogic Data Hub is
  • Describe when and why you would use the MarkLogic Data Hub
  • Create a MarkLogic Data Hub project
  • Implement a security model
  • Deploy project configuration using ml-gradle
  • Configure entities
  • Configure indexes
  • Control access to sensitive PII (personally identifiable information)
  • Create flow pipelines to ingest, curate, and master data
  • Run and debug flows
  • Configure ingestion steps
  • Use MarkLogic processors for Apache NiFi
  • Configure mapping steps
  • Use pre-built mapping functions
  • Develop and deploy custom mapping functions
  • Integrate RDF data (semantic triples) in a hub
  • Program, deploy and run a custom data harmonization step
  • Configure Smart Mastering matching and merging steps
  • Access curated data from the hub using JavaScript and SPARQL


Audience: Data Architect, MarkLogic Developer, Data Engineer


Duration: 8 hours

Course Outline

Data Services First

  • Understand the high-level approach to data integration projects using the MarkLogic Data Hub
  • Understand the customer and business requirements for the course hands-on project
  • Understand the user stories and technical requirements for the course hands-on project
  • Understand the data sources available for the course hands-on project

The MarkLogic Data Hub

  • Understand what it is
  • Understand what it does
  • Initialize and install a new MarkLogic Data Hub project

Implement Security

  • Create users and roles for both business users and members of the technical project team
  • Understand how to use Data Hub specific roles
  • Implement role hierarchies
  • Assign execute privileges necessary to meet project requirements
  • Deploy security configuration using QuickStart and ml-gradle

Create an Entity

  • Create a new entity
  • Define properties
  • Configure indexes
  • Protect access to PII (personally identifiable information)

Ingest Data

  • Create flow pipelines
  • Configure ingestion steps
  • Understand the purpose and use of the staging and final databases in a MarkLogic Data Hub
  • Implement key data modeling concepts including document URIs, collections, document permissions, property naming best practices, geospatial data modeling patterns, denormalization, and the use of the envelope pattern

Curate Data

  • Configure mapping steps
  • Use pre-built mapping functions
  • Program, deploy and use a custom mapping function
  • Test and debug mapping steps

Use Semantics

  • Understand key semantic data modeling concepts including triples, IRIs, ontology triples, managed and unmanaged triples
  • Load triples to a MarkLogic Data Hub
  • Program, deploy and use a custom harmonization step to add triples to the envelope of a document

Access Data

  • Explore the use of JavaScript APIs
  • Explore the use of SPARQL
  • Validate that the curated data from the hub can be used to meet the business and technical requirements for the hands-on project

Adapt to Change: Perform Another Iteration of Ingest | Curate | Access

  • Ingest a new data source
  • Curate the new data so that it can be consumed in the same way as existing data

Use Smart Mastering

  • Configure a matching step
  • Configure a merging step
  • Test Smart Mastering
  • Explore mastered data

How to Enroll

Instructor-Led Option

This course is available as a free, publicly scheduled instructor-led course! Please refer to our schedule to select the most suitable date for you.

See dates
