Apr 1, 2022
My name is Aaron Burg and I'm a sales engineer here at Progress DataDirect. Thanks for joining us for this overview and brief demo of Hybrid Data Pipeline. I often talk about Hybrid Data Pipeline as a Swiss Army knife: it's designed to ease your connectivity to a wide range of data sources and then give you a single set of APIs to access that data. Whether it's data in the cloud, data on-prem, or REST APIs, you can use one ODBC or JDBC driver, or OData, a standards-based REST API, to connect to it.
It also helps you solve challenges in the hybrid world, where some of your data might sit behind a firewall. We have a small agent, much like a OneDrive or Dropbox agent, that provides access to that data, so you don't need to worry about setting up VPNs or SSH tunnels. And really, the whole goal at the end is to unify all of your data into this one platform, a data hub, so you only need to hand your data users a single set of APIs and a single set of credentials to get their work done and access the data in your environment.
When you dig a little deeper into Hybrid Data Pipeline, we support a wide range of data sources: cloud sources like Salesforce, Oracle Marketing Cloud, or Google BigQuery, plus the relational sources you would expect, like SQL Server, DB2, or Oracle. We also support dropping in third-party JDBC drivers for anything we may not cover out of the box. Hybrid Data Pipeline is a self-installed, Linux-based platform that can be installed on-prem or in the cloud, and you can run it in Docker as well. So it's really up to you to run it wherever it makes the most sense in your environment.
The use cases we see most frequently for Hybrid Data Pipeline start with the external data gateway. This is where you have data living in your SaaS environment, in the cloud, and you want to securely provide access to customers or other people who may not be on your secure network. The gateway gives you a way to do that over HTTPS, so you're not having to open up direct access to your databases. You can also put throttling controls in place to help protect your backend data, and with OData you can get even more granular, limiting, for example, which tables they can see. That gives you quite a bit of control for sharing data with customers or people outside the network. The internal data hub, a data gateway for internal use, fits where you have a wide range of BI and analytics users in your organization who need access to data, but you don't want to distribute a bunch of different drivers and credentials to them. With Hybrid Data Pipeline, you can give them a single set of drivers.
So that's one ODBC driver or JDBC driver, or even OData REST API access, and with that they can reach any of the data sources that have been virtualized through Hybrid Data Pipeline. And then there's the cloud-to-ground access that I mentioned, using that small on-premises connector agent, which helps you solve cloud-to-ground problems without needing a VPN. Architecturally, as I mentioned, Hybrid Data Pipeline runs on Linux and can be installed in the cloud, on-prem, or in Docker. It can be a single-node deployment or a multi-node cluster deployment for high availability, sitting behind a load balancer of your choosing; we support the cloud load balancers that handle WebSockets, like the AWS ALB or the Azure Application Gateway.
On the backend, Hybrid Data Pipeline connects to your data sources, and you consume them on the front end using the ODBC, JDBC, or OData APIs. It also reaches out to cloud sources like Salesforce, which you access with those same APIs, and likewise you can do the same thing with the on-premises connector at remote locations. Regardless of whether your data is at a remote site, in the cloud, or on the network local to Hybrid Data Pipeline, you connect to it with that single set of APIs.
Let's get into a quick demo of this. When you log into Hybrid Data Pipeline, you'll see our UI. Everything in the UI can also be accomplished through our management APIs, which are REST APIs, so if you want to integrate it into an existing platform, you can do that. We have the concept of tenants. Within those tenants there are users, and those users have their associated roles. A user's role can be as granular as limiting them to a single API for data access, all the way up to system administrator, so you can get very precise control over user access. When you come down to managing data sources, each user has their own set of data sources, and they can be shared between users. In this case, for instance, I have a connection set up to a SQL Server.
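Before we go further, since everything in the UI maps onto those management REST APIs, here is a minimal sketch of listing data sources from Java. This is an illustration rather than the definitive API: the host and credentials are placeholders, and the /api/mgmt/datasources path is an assumption you should confirm against the management API reference for your HDP version.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

public class HdpMgmtDemo {
    public static void main(String[] args) throws Exception {
        // Assumed endpoint path -- confirm against the HDP management API docs.
        String url = "https://hdp.example.com/api/mgmt/datasources";
        // Placeholder HDP credentials, sent as HTTP Basic auth.
        String auth = Base64.getEncoder()
                .encodeToString("hdpUser:hdpPassword".getBytes());
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(url))
                .header("Authorization", "Basic " + auth)
                .header("Accept", "application/json")
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // JSON list of this user's data sources
    }
}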
Back to the demo: this SQL Server is actually reached through that on-prem agent, since it isn't local to my demo instance. I put in my credentials to the database, click Test Connect, and just like that we've enabled access using our ODBC or JDBC connectors. If I come over to a SQL test tool where I've installed our hybrid JDBC driver, I connect over HTTPS on port 443 to my Hybrid Data Pipeline instance and put in my data source name. In this case it's SQL Server, but it could be an Oracle server or Salesforce; it doesn't matter, it's the same driver, and the same would hold true for ODBC. Once we click Connect, it authenticates to Hybrid Data Pipeline using my Hybrid Data Pipeline credentials. I don't need to provide database credentials here; you could supply them from ODBC or JDBC, but in this case I'm storing them in Hybrid Data Pipeline.
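If you'd rather script that same JDBC connection than use a test tool, here is a minimal sketch in Java. The host, data source name, credentials, and table are placeholders, and the ddhybrid URL format and its options should be verified against the documentation for your version of the driver.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HdpJdbcDemo {
    public static void main(String[] args) throws Exception {
        // Placeholder host and data source name; the URL option names here
        // follow the HDP JDBC driver docs, but verify them for your version.
        String url = "jdbc:datadirect:ddhybrid://hdp.example.com:443;"
                   + "hybridDataPipelineDataSource=SQLServerDemo;"
                   + "encryptionMethod=SSL";
        // Authenticate with Hybrid Data Pipeline credentials; the database
        // credentials themselves are stored server-side in the data source.
        try (Connection conn = DriverManager.getConnection(url, "hdpUser", "hdpPassword");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT TOP 5 Title FROM Album")) {
            while (rs.next()) {
                System.out.println(rs.getString("Title"));
            }
        }
    }
}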
The user doesn't even need to know how to access that backend source. Just like that, it looks as though I'm connected directly to a SQL Server; I can run a query and we have our data. If we want to OData-enable, or REST-enable, that data source, we can do that very easily from the OData tab in the HDP UI. You'll see the tables here; in this case I'll take the Album and Artist tables and add them to the OData API. Once we do that, I can grab this URL, put it in my browser, and you'll see that we now have access. I authenticate directly to Hybrid Data Pipeline, and now we can bring that data back as a JSON document, just like we saw using the JDBC driver.
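The same OData request works from code as well as the browser. Here is a minimal sketch using Java's built-in HTTP client; the endpoint URL is illustrative (in practice, copy the real one from the OData tab in the HDP UI), and the credentials are placeholders.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

public class HdpODataGet {
    public static void main(String[] args) throws Exception {
        // Illustrative URL -- copy the actual endpoint from the OData tab.
        String url = "https://hdp.example.com/api/odata4/SQLServerDemo/Album";
        String auth = Base64.getEncoder()
                .encodeToString("hdpUser:hdpPassword".getBytes());
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(url))
                .header("Authorization", "Basic " + auth)
                .header("Accept", "application/json") // ask for JSON output
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // the Album rows as a JSON document
    }
}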
This is all full CRUD as well, for the sources that support it (there's a short code sketch of that below). Coming back over to Hybrid Data Pipeline, a few other things I wanted to point out: we have a SQL testing tool, which makes it very easy to test connectivity when you set up new data sources. In this case, I'm quickly connecting to the SQL Server and bringing back that same data inside the product. We also support a few different types of authentication: you can set up authentication using a Java plugin that you build, and we support LDAP, SAML, and OIDC.
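Picking the CRUD point back up, a write looks like a standard OData v4 create: POST a JSON entity to the entity set. The sketch below reuses the illustrative endpoint and placeholder credentials from above, and the Album field names are assumptions based on the demo table.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

public class HdpODataInsert {
    public static void main(String[] args) throws Exception {
        // Illustrative endpoint; substitute the real OData URL from the HDP UI.
        String url = "https://hdp.example.com/api/odata4/SQLServerDemo/Album";
        String auth = Base64.getEncoder()
                .encodeToString("hdpUser:hdpPassword".getBytes());
        // Assumed column names for the demo's Album table.
        String newRow = "{\"AlbumId\": 500, \"Title\": \"Demo Album\", \"ArtistId\": 1}";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(url))
                .header("Authorization", "Basic " + auth)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(newRow))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode()); // expect 201 Created on success
    }
}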
Thank you for watching this presentation. We hope you found it useful. Don't hesitate to reach out with any questions. You can download a fully featured trial of Hybrid Data Pipeline from our website at www.progress.com/hdp.