Build a Geospatial Data Hub with the MarkLogic Esri Connector

January 16, 2020 Data & AI, MarkLogic

Many enterprises and government agencies need to combine location data with other critical data assets to drive important business decisions. Esri is a global leader in Geographic Information Systems (GIS), and with their ArcGIS® geospatial platform, they offer a wide range of GIS capabilities to build analytical intelligence and enable these decisions. Combining Esri’s best-of-breed geospatial capabilities with the MarkLogic Data Hub platform allows organizations to analyze and visualize location information alongside any shape of data – whether structured, unstructured, or semantic – all in real time and within a highly secure data platform. This means organizations can leverage location analytics for all of their data faster and in new, innovative ways.

Integration with Esri

Most Esri products, such as ArcGIS Pro and ArcGIS Online, support the GeoServices API, a REST API specification originally developed by Esri. The MarkLogic Esri Connector is a set of components designed to search and deliver data following the GeoServices specification.

One of the core concepts of the GeoServices specification is the idea of feature services, where data is queried and retrieved as a set of features and feature layers – a data model that translates well to how most geospatial applications are designed. Data for these features can be scattered across multiple data sources, which the MarkLogic Data Hub platform is designed to integrate. The MarkLogic Esri Connector thus acts as a bridge between the data hub and an Esri application by providing a GeoServices REST API for Esri products to consume.
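For example, an Esri client retrieves features from a layer by issuing a GeoServices-style query request against the feature service. The `where`, `geometry`, `geometryType`, `outFields`, and `f` parameters below come from the GeoServices specification; the host and service name are hypothetical:

```
GET /marklogic/GeoLocation/FeatureServer/0/query
    ?where=STATE='CA'
    &geometry={"xmin":-125,"ymin":32,"xmax":-114,"ymax":42}
    &geometryType=esriGeometryEnvelope
    &outFields=*
    &f=json
```

The connector answers requests like this one by translating them into queries against MarkLogic and returning the results in the JSON shape Esri tools expect.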

Figure 1: The MarkLogic Esri Connector Overview

The GeoServices API provides a SQL-style approach to querying and filtering data. The key to making this work with multi-model data in MarkLogic is a set of capabilities introduced in version 9 – the Optic API and Template-Driven Extraction (TDE). TDE allows you to design templates that “project” structured or unstructured data into a row-based view and/or RDF triples, allowing it to be queried using MarkLogic’s Optic API, SQL, and SPARQL. With these capabilities, you can combine different querying mechanisms – such as full-text search – with queries against other indexes supported by MarkLogic, such as the geospatial, row, and triple indexes. This can all be done via the Optic API.
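As an illustration, here is a minimal sketch of a TDE template that projects a hypothetical point-of-interest document into a row view. The schema, view, column names, and document paths are invented for this example; a real template would reflect your own document model:

```json
{
  "template": {
    "context": "/pointOfInterest",
    "rows": [
      {
        "schemaName": "geo",
        "viewName": "poi",
        "columns": [
          { "name": "name",      "scalarType": "string", "val": "name" },
          { "name": "longitude", "scalarType": "double", "val": "location/longitude" },
          { "name": "latitude",  "scalarType": "double", "val": "location/latitude" }
        ]
      }
    ]
  }
}
```

Once installed, every matching document contributes a row to the `geo.poi` view, which the Optic API can then query and join like any other view.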

Data Architecture

All data in MarkLogic, including geospatial data, is stored in the Universal Index. We can use TDE to map that data into rows in the row index and/or into the triple index, while the geospatial data is indexed in the geospatial index. The Optic API can then join all of that data together and expose it as map layers containing features – points or polygons – that link back to the source documents in the MarkLogic database.

Figure 2: Data Architecture Overview

MarkLogic Esri Connector Components

The MarkLogic Esri Connector is made up of three components: Geospatial Data Services, the MarkLogic Koop Provider, and the MarkLogic Add-in for ArcGIS Pro®:

  1. The backend component is Geospatial Data Services, which uses the Optic API and TDE to get geospatial data out of MarkLogic as GeoJSON. It exposes the Query API and manages metadata.
  2. The second component is the MarkLogic Koop Provider, which is a plugin to a platform called Koop – an open-source Node.js Express project that Esri maintains. The Koop Provider allows you to map the geospatial JSON data that comes out of Geospatial Data Services into JSON that all the Esri tools can consume.
  3. The third component is the MarkLogic Add-in for ArcGIS Pro®, a native add-in that runs in the ArcGIS Pro desktop application and provides users a search experience against their data stored in MarkLogic, from within ArcGIS Pro.
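To make the Koop Provider's role concrete, the sketch below follows Koop's documented provider contract – a model class exposing `getData(request, callback)` that yields GeoJSON. The call to MarkLogic is replaced by an in-memory stub here; the feature data and metadata values are illustrative only:

```javascript
// Minimal sketch of a Koop provider model. In the real MarkLogic Koop
// Provider, getData would call Geospatial Data Services and translate
// its response; here the response is stubbed for illustration.
class Model {
  getData (req, callback) {
    const geojson = {
      type: 'FeatureCollection',
      features: [
        {
          type: 'Feature',
          geometry: { type: 'Point', coordinates: [-122.68, 45.52] },
          properties: { name: 'Example feature' }
        }
      ],
      // Koop reads service metadata from this property
      metadata: { name: 'Example MarkLogic Layer', geometryType: 'Point' }
    }
    callback(null, geojson)
  }
}
// A real provider would be registered with Koop, e.g. via module.exports.
```

Koop takes the GeoJSON returned by the model and serves it through the GeoServices REST endpoints that Esri clients consume.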

The Geospatial Data Services component uses the concept of service descriptors to define and manage the feature services and layers you wish to be exposed. Service descriptors allow you to:

  • Define feature services
  • Configure layers
  • Set the data source for each layer
  • Set the bounding query for each layer
  • Set what geospatial indexes to use

Fundamentally, service descriptors are a declarative way to generate Optic API plans that can be executed by the connector.
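To give a feel for this, here is a hypothetical service descriptor fragment. The exact schema is defined by Geospatial Data Services, and the field names and values below are illustrative assumptions, not the authoritative format:

```json
{
  "info": {
    "name": "ExampleService",
    "description": "Feature service backed by MarkLogic"
  },
  "layers": [
    {
      "id": 0,
      "name": "Points of Interest",
      "geometryType": "Point",
      "schema": "geo",
      "view": "poi",
      "boundingQuery": { "collection": "points-of-interest" }
    }
  ]
}
```

Each layer names the row view that supplies its attributes and a bounding query that scopes which documents contribute features.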

Figure 3: The MarkLogic Esri Connector Architecture

Optic Query Pipeline

Under the covers, the Optic API uses “query plans” or “query pipelines,” which define what pieces of data are needed using different views. The pipeline starts with the data sources: you can pull in multiple views, whether row-based or SPARQL-based, and join that data together within one pipeline. The documents included are initially limited by the “bounding query.” The bounding query is a combination of a query specified in the layer definition in the service descriptor and any geospatial constraints that the client has specified in the call to the feature service. The bounding query in the layer definition is a powerful tool that allows a service to expose only the features from documents that meet potentially complex criteria. Users of the connector can use this mechanism to create services and layers for focused analysis tasks, potentially reducing a database of millions of documents down to just the documents they want to expose for geospatial analysis in the Esri tools.

Once the bounding query has been applied, the rows are limited by the optional SQL WHERE condition that is included in the call to the Esri feature service. This allows users to specify what features they want to come back based on the conditions applied to attributes of the features. There are additional stages that allow you to tell the API what you want to order by, offset, and limit so you can do pagination. You can join information from the document into the final result, and then transform it into the geospatial JSON data that is produced by the Geospatial Data Services.
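The stages described above can be sketched with plain JavaScript over an in-memory feature array. This stands in for the Optic API pipeline, which actually runs inside MarkLogic against the indexes; the data and field names are invented for illustration:

```javascript
// Features as they might come out of the row index (illustrative data).
const features = [
  { id: 1, name: 'Dock A', city: 'Oakland', lon: -122.3, lat: 37.8 },
  { id: 2, name: 'Dock B', city: 'Seattle', lon: -122.3, lat: 47.6 },
  { id: 3, name: 'Dock C', city: 'Oakland', lon: -122.2, lat: 37.7 }
]

// Stage 1: bounding query – the layer's query combined with the
// client's geospatial constraint (an envelope in this sketch).
const envelope = { xmin: -123, ymin: 37, xmax: -122, ymax: 38 }
const bounded = features.filter(f =>
  f.lon >= envelope.xmin && f.lon <= envelope.xmax &&
  f.lat >= envelope.ymin && f.lat <= envelope.ymax)

// Stage 2: SQL-style WHERE condition on feature attributes.
const filtered = bounded.filter(f => f.city === 'Oakland')

// Stage 3: order by, offset, and limit for pagination.
const page = filtered
  .sort((a, b) => a.name.localeCompare(b.name))
  .slice(0, 10)

// Stage 4: transform into the GeoJSON shape the Koop provider consumes.
const geojson = {
  type: 'FeatureCollection',
  features: page.map(f => ({
    type: 'Feature',
    geometry: { type: 'Point', coordinates: [f.lon, f.lat] },
    properties: { id: f.id, name: f.name, city: f.city }
  }))
}
```

Each stage narrows or reshapes the result set, mirroring how the connector composes the bounding query, WHERE clause, pagination, and final GeoJSON transform into one plan.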

Esri feature services also allow you to do aggregations. The aggregation pipeline looks the same, except that a “group by” stage in the pipeline lets you compute aggregate calculations against the attributes of the features.
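Continuing the plain-JavaScript sketch above, a group-by stage might count features per attribute value (again standing in for the Optic pipeline, with invented data):

```javascript
// "Group by" stage sketch: count features per city.
const rows = [
  { city: 'Oakland', status: 'open' },
  { city: 'Oakland', status: 'closed' },
  { city: 'Seattle', status: 'open' }
]

const countsByCity = new Map()
for (const r of rows) {
  countsByCity.set(r.city, (countsByCity.get(r.city) || 0) + 1)
}
```

The result – one aggregate row per group – is what an Esri client receives when it requests statistics from a feature service.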

Figure 4: Optic Query Pipeline

Summary

To summarize, the MarkLogic Esri Connector provides a configuration-based approach to exposing data from existing or new applications built on MarkLogic, bringing the power of full-text, geospatial, and row-based indexing to Esri tools through one standard interface. Read more about the advantages of the MarkLogic Esri Connector in the Doing More with Your GIS Information Sheet.

These advantages are especially critical for defense, intelligence, and national security organizations, as highlighted in MarkLogic’s demonstrations with Esri at the 2019 Esri User Conference. You can also watch demonstrations that walk through how the MarkLogic Esri Connector works in the Building a Geospatial Data Hub with MarkLogic presentation, which also covers many different kinds of use cases.

Related Resources

MarkLogic Esri Connector Technical Resources — Access all the technical resources related to the MarkLogic Esri Connector, including documentation, blogs, written tutorials, GitHub repositories, videos, and more.

Esri User Conference 2019 — Find out more about MarkLogic’s role at the Esri User Conference in 2019.

Doing More with Your GIS — Read the PDF resource that quickly summarizes how MarkLogic and Esri integration delivers location analytics for all of your data.

Building a Geospatial Data Hub with MarkLogic — Presentation that demonstrates how organizations get a 360 view of their data, including the geospatial data and its context, using a MarkLogic Data Hub together with Esri.

James Kerr

James Kerr is a Software Fellow at Progress and the Product Manager for MarkLogic Server. He has been with the MarkLogic team for nearly 15 years, holding positions in consulting, alliances, cloud enablement, performance engineering and product management. During this time, James has worked with customers to build some of the largest, most complex data management systems on MarkLogic. He now brings that extensive customer experience and operational knowledge from the field to help guide the product into its next chapter as part of Progress.