Establishing and maintaining a data warehouse is critical for enterprise organizations that want to use quality data to make informed business decisions. ETL tools such as Matillion ETL are often used to access data and pull it into a warehouse.
Matillion ETL is a cloud-native data integration platform designed to work with cloud data warehouses, including Azure Synapse Analytics, Amazon Redshift, Google BigQuery, and Snowflake. Its drag-and-drop interface lets users quickly move data from various sources into these warehouses. While Matillion ETL supports many data sources, it doesn’t offer an easy way to integrate on-premises data; bridging the gap typically requires complicated VPN configurations. Hybrid Data Pipeline from Progress DataDirect offers a solution to this challenge.
Hybrid Data Pipeline is a lightweight connectivity service that enables secure connectivity to cloud and on-premises data sources, and it works with a variety of data sources and platforms. Rather than relying on complicated VPN setups or SSH tunnels, as the Matillion use case would otherwise require, Hybrid Data Pipeline includes an on-premises agent that allows cloud platforms to securely access data behind a firewall.
This tutorial shows how to connect Matillion ETL to on-premises data using Hybrid Data Pipeline. Here is a high-level view of an integration:
The following items are required to complete the tutorial:
The following steps describe how to download Hybrid Data Pipeline components.
Note: After each download, you may have to use your browser's back button to return to the download page and download the next component.
The following guides show a number of ways Hybrid Data Pipeline may be deployed.
Note: To use the On-Premises Connector to connect to on-premises data sources, the Hybrid Data Pipeline server must be configured for SSL during deployment.
Hybrid Data Pipeline uses the On-Premises Connector to enable connectivity to on-premises data. The On-Premises Connector must be installed on a Windows host on the same network in which the data resides. The On-Premises Connector may only be installed after the Hybrid Data Pipeline server has been installed. During installation of the Hybrid Data Pipeline server, four configuration and certificate files are generated. These files must be copied to the directory from which the On-Premises Connector installation program will be run. For step-by-step instructions on installing the On-Premises Connector, refer to Installing the On-Premises Connector in the Hybrid Data Pipeline Deployment Guide.
To confirm that the connection between the Hybrid Data Pipeline server and the On-Premises Connector is active, open the Configuration Tool program on the Windows host machine. Then, from the Status tab, click Test. All the tests should return green:
A Hybrid Data Pipeline data source defines the connection parameters to your on-premises data and enables you to access it. Take the following steps to create the data source.
https://MyServer:8443/hdpui
Important: From the Connector ID dropdown, select the On-Premises Connector.
You must install the JDBC driver on a machine where you can access the Matillion web interface. As with the On-Premises Connector, the JDBC driver may only be installed after the Hybrid Data Pipeline server has been installed. During installation of the Hybrid Data Pipeline server, four configuration and certificate files are generated. These files must be copied to the directory from which the JDBC driver installation program will be run. For step-by-step instructions on installing the driver, refer to Installing the JDBC Driver in the Hybrid Data Pipeline Deployment Guide.
Take the following steps to add the driver profile to Matillion ETL.
{
  "name": "Hybrid Data Pipeline",
  "driver": "com.ddtek.jdbc.ddhybrid.DDHybridDriver",
  "url": "jdbc:datadirect:DDhybrid://example.com:443",
  "fetchSize": "500",
  "limit": "top-n",
  "allowUpload": "true"
}
Note: The driver jar file will be located in the lib folder of the driver installation directory. For example:
C:\Program Files\Progress\DataDirect\Hybrid_Data_Pipeline_for_JDBC\lib
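Before pointing Matillion ETL at the jar, you may want to confirm that the driver class named in the profile (com.ddtek.jdbc.ddhybrid.DDHybridDriver) can actually be loaded from it. The following is a minimal Java sketch, assuming the jar from the lib folder is on the classpath when you run it, for example with java -cp "C:\Program Files\Progress\DataDirect\Hybrid_Data_Pipeline_for_JDBC\lib\*;." DriverCheck. The class name DriverCheck is hypothetical, and the check only covers class loading; it does not contact the Hybrid Data Pipeline server.

```java
// DriverCheck.java: hypothetical helper that checks whether a JDBC driver
// class can be loaded from the current classpath. It does not open a connection.
public class DriverCheck {

    // Driver class name used in the Matillion driver profile.
    static final String HDP_DRIVER = "com.ddtek.jdbc.ddhybrid.DDHybridDriver";

    // Returns true if the named class can be found and initialized
    // by the class loader; false if it is missing or fails to link.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Check the Hybrid Data Pipeline driver by default, or a class
        // name passed as the first argument.
        String className = args.length > 0 ? args[0] : HDP_DRIVER;
        if (isOnClasspath(className)) {
            System.out.println("Found driver class: " + className);
        } else {
            System.out.println("Driver class not found: " + className
                    + " (is the driver jar on the classpath?)");
        }
    }
}
```

If the class is not found, double-check the classpath entry before troubleshooting the driver profile in Matillion ETL.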
Result: You have successfully integrated the Hybrid Data Pipeline JDBC driver with Matillion ETL. You may now create database queries against Hybrid Data Pipeline data sources.
After adding the Hybrid Data Pipeline JDBC driver, you may proceed with a database query in Matillion ETL. The following screenshot captures the parameters of a query using a Hybrid Data Pipeline data source.
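Note: In the connection URL, use the hybridDataPipelineDataSource property to specify the Hybrid Data Pipeline data source you created. For example: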
jdbc:datadirect:DDhybrid://example.com:8443;hybridDataPipelineDataSource=MyHDPOnPremDataSource
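To check a connection URL like this one outside Matillion ETL, you can exercise it from plain Java through the standard java.sql API. The sketch below is an assumption-laden illustration, not a Progress-documented procedure: it assumes the driver jar is on the classpath and that your Hybrid Data Pipeline user name and password are accepted as the standard user and password connection properties. The host, port, data source name, and credentials are placeholders supplied on the command line, and the class name HdpQuery is hypothetical. Instead of running a query (whose SQL dialect depends on the back-end database), it prints the product name reported by the connection's metadata.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;

// HdpQuery.java: hypothetical sketch that opens a JDBC connection to a
// Hybrid Data Pipeline data source and prints basic database metadata.
public class HdpQuery {

    // Builds a URL of the same form as the example above:
    // jdbc:datadirect:DDhybrid://<host>:<port>;hybridDataPipelineDataSource=<name>
    static String buildUrl(String host, int port, String dataSource) {
        return "jdbc:datadirect:DDhybrid://" + host + ":" + port
                + ";hybridDataPipelineDataSource=" + dataSource;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 5) {
            System.err.println(
                "usage: HdpQuery <host> <port> <dataSource> <user> <password>");
            return;
        }
        String url = buildUrl(args[0], Integer.parseInt(args[1]), args[2]);

        // Load the driver explicitly in case it does not self-register.
        Class.forName("com.ddtek.jdbc.ddhybrid.DDHybridDriver");

        // getConnection(url, user, password) passes the credentials as the
        // standard "user" and "password" connection properties.
        try (Connection conn = DriverManager.getConnection(url, args[3], args[4])) {
            DatabaseMetaData md = conn.getMetaData();
            System.out.println("Connected to: " + md.getDatabaseProductName()
                    + " " + md.getDatabaseProductVersion());
        }
    }
}
```

Passing a password on the command line is convenient for a one-off test but can expose it in process listings; for anything longer-lived, read it from a more secure source.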
Thank you for taking the time to consider the Progress DataDirect solution for connecting Matillion ETL to your on-premises data. Please contact us for additional information.