Tutorial: Querying External Data in Hive Using the JDBC Storage Handler

November 08, 2018 Data & AI, DataDirect

It’s now easier than ever to work with external data in Apache Hive. Learn how you can quickly connect Hive to Salesforce with Progress DataDirect.

With the addition of the JDBC Storage Handler, Hive makes it easier to access and query data that lives in external sources. In this tutorial, we’ll walk through the steps of connecting Hive to an external Salesforce instance using a Progress DataDirect JDBC connector.

What is Apache Hive?

Apache Hive is one of the most popular open source data warehouses in use today. Built on Hadoop to handle big-data workloads, and offering a familiar SQL-like query language, Hive is a fantastic resource for managing and analyzing large datasets. Hive’s early users include Facebook, Netflix and Amazon. If it can handle the amount of data these companies generate, it will likely handle yours as well.

What is the Apache Hive JDBC Storage Handler?

Hive 2.3 introduced a powerful new feature called the JDBC Storage Handler. It lets you connect to and query any data source that has a JDBC driver. This is immensely helpful because you will almost certainly need to analyze more than just the data that resides in your warehouse. Hive has long offered limited support for external (as opposed to managed) data, but the JDBC Storage Handler makes working with data in other systems much easier and more seamless.

Using the Apache Hive JDBC Storage Handler with Progress DataDirect JDBC Drivers

It’s great to talk about this new feature, but it’s better to actually start working with it! My colleague Saikrishna Bobba has assembled instructions to get you up and running quickly. In this example, he walks you through the steps of connecting Apache Hive to your Salesforce instance using the Progress DataDirect Salesforce JDBC Connector.

Once you’ve walked through it, you’ll be able to use this process to connect Hive to any external source for which you have a JDBC connector. Get started today with a free trial download of our DataDirect JDBC drivers and see what data you can bring into Apache Hive!

Read the Hive Tutorial


James Goodfellow

James Goodfellow is a Senior Product Marketing Manager at Progress and focuses his efforts on the DataDirect suite of solutions. Through his tenure at companies like Progress and SAS, he has spent the bulk of his career launching successful marketing campaigns for data and analytics products. James blogs here and around the web on topics such as data connectivity, analytics, IoT, visualization and machine learning. You can follow him on Twitter at @jcgoodfellow.
