Are you more comfortable working in a graphical user interface (GUI) than writing code? Do you want a visual representation of your data transformation pipelines? What if there were a way to let users visually enrich content and drive data pipelines without writing code?
With the community tool Pipes for MarkLogic Data Hub, you can. Pipes lets you create a custom step for your Data Hub without writing code; instead, you connect blocks. Pipes provides a low-code solution for designing and running logic within a MarkLogic Data Hub.
What is Pipes?
Pipes is a tool for the MarkLogic Data Hub that uses a GUI to produce the code for a custom step. When you need to extend out-of-the-box Data Hub functionality with a custom step, Pipes lets you create Data Hub 5.x custom steps without coding.
Note: Pipes for MarkLogic Data Hub is a community tool. As such, it is not supported by MarkLogic Corporation and is updated and corrected on a best-effort basis only. Contributions and feedback that make the tool better are welcome. Pipes is designed to run on MarkLogic 10.0-2 with DHF 5.1.0 installed.
Who is Pipes for?
Pipes is aimed at data analysts and Data Hub developers. For data analysts, Pipes provides a GUI to drive logic inside the Data Hub and to build and tweak flows on their own. Instead of writing code for a custom step, you define complex data transformations with mouse clicks, much like drawing a diagram on a whiteboard.
For developers, Pipes gives you a starting point to accomplish tasks very quickly using building blocks so you don’t have to start from scratch. In addition, using building blocks in a GUI allows you to better communicate custom step functionality to business users, ensuring everyone is on the same page.
How Does It Work?
Simply put, Pipes converts visual blocks in a GUI into JavaScript for a custom step. The GUI uses LiteGraph, an open-source, node-based programming framework that provides a UI and an engine to design and execute a visual graph in JavaScript. The graph is composed of building blocks, each associated with code executed in MarkLogic shared libraries. You design your step using these blocks; in the current version, the LiteGraph engine then executes the graph inside MarkLogic. In the upcoming version, Pipes will directly produce plain JavaScript code to be executed in MarkLogic.
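To make the idea of "blocks executed by an engine" more concrete, here is a minimal sketch of how a node-based graph can be evaluated. It is not the LiteGraph API or the Pipes engine: the block functions, the `links` format, and `executeGraph` are hypothetical. Each block is a plain function, each link connects one block's output to a named input of another block, and blocks run in topological order so every input is ready before a block executes.

```javascript
// Hypothetical sketch of a node-graph engine; NOT the LiteGraph API or the Pipes engine.

// Order blocks so that each runs after the blocks feeding it (Kahn's topological sort).
function topoSort(blocks, links) {
  const indegree = new Map(Object.keys(blocks).map((id) => [id, 0]));
  for (const link of links) indegree.set(link.to, indegree.get(link.to) + 1);
  const queue = [...indegree].filter(([, d]) => d === 0).map(([id]) => id);
  const order = [];
  while (queue.length > 0) {
    const id = queue.shift();
    order.push(id);
    for (const link of links) {
      if (link.from !== id) continue;
      indegree.set(link.to, indegree.get(link.to) - 1);
      if (indegree.get(link.to) === 0) queue.push(link.to);
    }
  }
  if (order.length !== indegree.size) throw new Error("Graph contains a cycle");
  return order;
}

// Run every block once, passing upstream outputs to the named inputs of downstream blocks.
function executeGraph(blocks, links, doc) {
  const outputs = new Map();
  for (const id of topoSort(blocks, links)) {
    const incoming = links.filter((link) => link.to === id);
    const args = {};
    for (const link of incoming) args[link.input] = outputs.get(link.from);
    // Blocks without incoming links read the source document directly.
    outputs.set(id, blocks[id](incoming.length > 0 ? args : { doc }));
  }
  return outputs;
}

// Example graph: read two fields, concatenate them, and wrap the result.
const blocks = {
  getFirst: ({ doc }) => doc.firstName,
  getLast: ({ doc }) => doc.lastName,
  concat: ({ a, b }) => `${a} ${b}`,
  output: ({ name }) => ({ fullName: name }),
};
const links = [
  { from: "getFirst", to: "concat", input: "a" },
  { from: "getLast", to: "concat", input: "b" },
  { from: "concat", to: "output", input: "name" },
];

const result = executeGraph(blocks, links, { firstName: "Ada", lastName: "Lovelace" });
console.log(result.get("output")); // { fullName: 'Ada Lovelace' }
```

Interpreting the graph like this keeps every block reusable and easy to preview, at the cost of some dispatch overhead per block and per document.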
Pipes can be used to build out complex scenarios, such as:
- Transforming multiple different source documents into multiple harmonized entities in one go
- Producing an array of entities from a single source file
- Harmonizing an entity from different source files (e.g. document + reference / meta-data)
- Documenting the provenance (origin) of every harmonized data point
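As an illustration of producing several entities from one source file, the sketch below mimics the shape of a Data Hub 5.x custom step: a `main(content, options)` function that returns content objects of the form `{ uri, value, context }`. Treat it as an assumption-laden sketch rather than Pipes output: the order fields (`orderId`, `lines`, `sku`, `qty`), the URI pattern, and returning an array from `main` should all be checked against the Data Hub documentation for your version. It also runs as plain JavaScript outside MarkLogic.

```javascript
// Sketch only: mimics the shape of a Data Hub 5.x custom step, but runs as plain JavaScript.
// Field names and the URI pattern are hypothetical.
function main(content, options) {
  // In MarkLogic, content.value is typically a document node (call .toObject() first);
  // here it is assumed to already be a plain object.
  const order = content.value;
  // Emit one content object per line item instead of one per source document.
  return order.lines.map((line, index) => ({
    uri: `/orderLine/${order.orderId}/${index + 1}.json`,
    value: {
      OrderLine: { orderId: order.orderId, sku: line.sku, quantity: line.qty },
    },
    context: { collections: ["OrderLine"] },
  }));
}

module.exports = { main };

const entities = main(
  { uri: "/orders/42.json", value: { orderId: "42", lines: [{ sku: "A1", qty: 2 }, { sku: "B7", qty: 1 }] } },
  {}
);
console.log(entities.map((e) => e.uri)); // [ '/orderLine/42/1.json', '/orderLine/42/2.json' ]
```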
In addition, Pipes has a live preview function that shows exactly what your custom step will output, using either a random document from the source collection or a specific URI.
Create Your Own Blocks
You can extend Pipes with your own blocks to add features and functionality. If you need a specific computation or transformation that you plan to reuse in multiple places, you can create a custom block for it. Because a block can implement any logic and have its own settings, it's also possible to provide high-level features such as:
- Value mapping: Map values from input to output based on a dictionary configured in the UI.
- Customer transactions: The block takes the customer ID as input and returns the transaction statistics configured in the block (e.g., the sum of the transactions performed during the last year).
- PROV-O: The graph can generate provenance data to be stored alongside the data.
- Custom transformations: Use existing or purpose-built JavaScript libraries to transform data, for example, for coordinate conversion or buffering.
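The value-mapping idea can be sketched as a block whose behavior is driven entirely by its settings. This is a hypothetical illustration, not the Pipes block API: `createValueMapBlock` and the `dictionary`, `defaultValue`, and `passThrough` settings are made-up names standing in for what a user would configure in the UI.

```javascript
// Hypothetical "value mapping" block; not the Pipes block API.
// The settings object stands in for the configuration a user would enter in the UI.
function createValueMapBlock(settings) {
  const dictionary = new Map(Object.entries(settings.dictionary));
  return function valueMap(input) {
    if (dictionary.has(input)) return dictionary.get(input);
    // No match: either pass the input through unchanged or use the configured default.
    return settings.passThrough ? input : settings.defaultValue;
  };
}

const countryBlock = createValueMapBlock({
  dictionary: { FR: "France", UK: "United Kingdom" },
  defaultValue: "Unknown",
  passThrough: false,
});

console.log(countryBlock("FR")); // France
console.log(countryBlock("DE")); // Unknown
```

Keeping the dictionary in settings rather than in code is what lets an analyst change the mapping without touching the block's logic.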
Use Cases for Pipes
While working on Pipes, I have come across several situations where the tool helped speed up development, including:
- Customizing envelopes: Add information to headers and triples, as well as dynamically add collections (e.g., based on some lookup, logic, computation, etc.). If you don’t want to include the attachment or want to put only URIs (no content) or some other lineage information, you can do that with Pipes.
- Customizing URIs in final database: Define your own URI for documents in Final, instead of using the one from Staging.
- Nesting and harmonizing: Re-use “values” in multiple contexts by computing sub-objects, which can then feed into multiple different elements of the final entities.
- Handling data without 1:1 mapping: Handle mismatches between input and output records when raw input records don’t match your entity model (e.g., you have 10 CSV “rows” and need to join them).
- Combining operations into a single step: Combine multiple “steps” into one transaction to avoid dealing with the partial states in between two steps, in case one fails.
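The first two use cases, customizing envelopes and URIs, can be sketched in the same Data Hub 5.x custom-step shape. This is an illustration under assumptions, not what Pipes generates: the `customerId` and `region` fields, the `/customer/` URI scheme, and the collection naming are hypothetical, and the `envelope` layout (`headers`, `triples`, `instance`, `attachments`) follows the general Data Hub envelope convention rather than any specific version.

```javascript
// Sketch only: plain JavaScript that mimics the Data Hub 5.x custom-step shape.
// Field names, URI scheme, and collection names are hypothetical.
function main(content, options) {
  // In MarkLogic, content.value is typically a node; call .toObject() first there.
  const customer = content.value;
  const collections = ["Customer"];
  // Add a collection dynamically, based on a value in the record.
  if (customer.region) collections.push(`region-${customer.region.toLowerCase()}`);
  return {
    // Use a business key for the final URI instead of reusing the staging URI.
    uri: `/customer/${encodeURIComponent(customer.customerId)}.json`,
    value: {
      envelope: {
        // Keep only lineage information (the source URI) instead of the raw content.
        headers: { sourceUri: content.uri, createdOn: new Date().toISOString() },
        triples: [],
        instance: { Customer: customer },
        attachments: null,
      },
    },
    context: { collections },
  };
}

module.exports = { main };

const customerDoc = main({ uri: "/staging/c1.json", value: { customerId: "C-001", region: "EMEA" } }, {});
console.log(customerDoc.uri); // /customer/C-001.json
```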
Future Plans
Pipes currently uses the LiteGraph engine to execute the graph. We are preparing a new engine that generates JavaScript code within MarkLogic, so that Pipes produces the code directly for better performance. The new engine is in beta and will soon be able to handle all existing Pipes blocks.
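Why would generating code be faster than executing the graph? A toy illustration: instead of dispatching to each block at run time, a generator can emit straight-line JavaScript once and reuse it for every document. This is a hypothetical sketch, not the new Pipes engine; `compilePipeline` and the `toExpression` templates are made up, and `new Function` is used only to keep the demo self-contained, whereas a real generator would more likely write out a module.

```javascript
// Hypothetical code generator; not the Pipes engine.
// Each block contributes a JavaScript expression that transforms the variable "value".
function compilePipeline(blocks) {
  const lines = ["let value = input;"];
  for (const block of blocks) {
    lines.push(`value = ${block.toExpression("value")};`);
  }
  lines.push("return value;");
  // Compile the generated source once; calling the result runs plain straight-line code.
  return new Function("input", lines.join("\n"));
}

const pipelineBlocks = [
  { name: "trim", toExpression: (v) => `String(${v}).trim()` },
  { name: "upperCase", toExpression: (v) => `${v}.toUpperCase()` },
];

const normalizeCountry = compilePipeline(pipelineBlocks);
console.log(normalizeCountry.toString()); // shows the generated source
console.log(normalizeCountry("  fr ")); // FR
```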
Get Started
Play around with the tool and let us know your thoughts so we can improve it. We'd like to hear your suggestions for further improvements and understand how your projects benefit from the tool. Or just pinpoint what's missing (and add it yourself).
Get started today with this GitHub Wiki guide for Pipes.
Related Resources
Pipes GitHub Wiki Documentation — Get started with your first Pipes project.
Pipes GitHub Repository — Clone or download the tool today. Explore documentation and videos. Submit issues or tickets using GitHub issues.
Pipes Technical Resources — Explore the technical resources related to Pipes. Find documentation, blogs, demos, and more.
Eric Poilvet
Eric is Director of Solutions Architecture at MarkLogic. He supports the company's customers from the pre-sales phase through to the release of solutions. Among other things, he works with customers in the manufacturing, media, and insurance industries on the design of innovative solutions based on the MarkLogic Operational Data Hub.
Eric is currently based in France and previously spent two years in MarkLogic's London office.