AWS Glue is an Extract, Transform, Load (ETL) service offered as part of Amazon Web Services. Glue is designed to make it easy to connect to data in a variety of data stores, clean and transform it as needed, and load it into an AWS data store for a unified view.
Many organizations now use REST APIs to expose and consume data, and they often want to store the data coming from those APIs to power real-time business intelligence and analytics. The problem is that every REST API is built differently. Authentication schemes differ and response structures differ. So when you want to bring this data into Amazon Redshift, S3 or EMR Hive using AWS Glue, you end up writing a lot of custom code for each of these services. That is a lot of unnecessary effort.
With Progress DataDirect Autonomous REST Connector, you can connect to any REST API without writing a single line of code, and then run SQL queries against the data through a JDBC interface. In this tutorial, we will show how to use Autonomous REST Connector with AWS Glue to ingest data from any REST API into Amazon Redshift, S3, EMR Hive, RDS and other stores. We will use the Yelp API as the source and read its data in AWS Glue through Autonomous REST Connector. Finally, we will write the data to S3.
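After you install Autonomous REST Connector, you will find the driver JAR in the lib folder of the install directory, for example:
C:\Program Files\Progress\DataDirect\JDBC_60\lib\autorest.jar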
To use the Yelp Fusion API, you’ll need to register as a developer and create an app on the Yelp developer website.
Provide the App Name, Industry, Contact Email and Description to create your App. You should now see the ClientID and API Key on your screen, allowing you to authenticate with Yelp’s API.
To connect to the Yelp API, you can simply point Autonomous REST Connector at an endpoint and leave the remaining settings at their default values.
For this tutorial, we will connect to Yelp’s Business Search endpoint. It returns businesses in a given area and can also search for specific business categories.
To configure the driver to connect to this endpoint, use the following JDBC URL:
jdbc:datadirect:autorest:sample=http://api.yelp.com/v3/businesses/search?location=27617;AuthenticationMethod=HttpHeader;AuthHeader=Authorization;SecurityToken='Bearer <Your API Key>'
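You may prefer not to paste the API key into the URL by hand. The following minimal Python sketch assembles the same connection string. The helper name `build_yelp_jdbc_url` and the `YELP_API_KEY` environment variable are hypothetical. The connection properties (`sample`, `AuthenticationMethod`, `AuthHeader`, `SecurityToken`) are copied from the URL above.

```python
import os
from urllib.parse import urlencode

# Yelp Business Search endpoint used in this tutorial.
YELP_SEARCH_ENDPOINT = "http://api.yelp.com/v3/businesses/search"


def build_yelp_jdbc_url(api_key, **query):
    """Build an Autonomous REST Connector JDBC URL for Yelp Business Search.

    Hypothetical helper: the connection properties mirror the tutorial's URL,
    and any keyword arguments become extra query parameters of the sampled
    endpoint (location defaults to the tutorial's ZIP code, 27617).
    """
    if not api_key:
        raise ValueError("a Yelp API key is required")
    params = {"location": "27617", **query}
    # urlencode percent-encodes ';', so a query value cannot split a property.
    sample_url = f"{YELP_SEARCH_ENDPOINT}?{urlencode(params)}"
    properties = [
        f"sample={sample_url}",
        "AuthenticationMethod=HttpHeader",
        "AuthHeader=Authorization",
        # The bearer token is wrapped in single quotes, as in the tutorial.
        f"SecurityToken='Bearer {api_key}'",
    ]
    return "jdbc:datadirect:autorest:" + ";".join(properties)


if __name__ == "__main__":
    # Hypothetical environment variable; keeps the key out of source code.
    print(build_yelp_jdbc_url(os.environ.get("YELP_API_KEY", "<Your API Key>")))
```

For example, `build_yelp_jdbc_url(key, categories="coffee")` would sample only coffee businesses near the same ZIP code. `categories` is a Business Search query parameter, but check Yelp's API reference for the values it accepts.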
When you connect to a REST API, Autonomous REST Connector automatically samples the API and creates a configuration, which you can access by querying the _CONFIGURATION table. You can retrieve this configuration by running that query through Autonomous REST Connector in any SQL tool, such as DBeaver or SQuirreL SQL.
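You may want to pull the generated configuration from a script instead of a GUI tool. A sketch along these lines could work, but it rests on three assumptions. First, it uses the third-party JayDeBeApi package (a Python DB-API bridge to JDBC drivers). Second, it assumes a driver class name of `com.ddtek.jdbc.autorest.AutoRESTDriver`. Third, it assumes `_CONFIGURATION` returns the configuration text in the first column of the first row. Verify all three against your driver version. `fetch_rest_config` is a hypothetical helper.

```python
# Assumed driver class name for Autonomous REST Connector; verify it against
# the documentation for your driver version.
DRIVER_CLASS = "com.ddtek.jdbc.autorest.AutoRESTDriver"


def fetch_rest_config(jdbc_url, jar_path, connect=None):
    """Query the connector's _CONFIGURATION table and return the configuration.

    `connect` defaults to jaydebeapi.connect; it can be replaced, for example
    by a stub in tests. Assumes the configuration text is returned in the
    first column of the first row.
    """
    if connect is None:
        import jaydebeapi  # third-party; needs a local Java runtime
        connect = jaydebeapi.connect
    conn = connect(DRIVER_CLASS, jdbc_url, jars=jar_path)
    try:
        cursor = conn.cursor()
        cursor.execute("SELECT * FROM _CONFIGURATION")
        rows = cursor.fetchall()
    finally:
        # Always release the JDBC connection, even if the query fails.
        conn.close()
    if not rows:
        raise RuntimeError("_CONFIGURATION returned no rows")
    return str(rows[0][0])
```

You would pass the `sample=` URL from the previous step and the path to autorest.jar, then write the returned text to yelp.rest. The `connect` parameter exists only so the function can be exercised without a JVM.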
For this tutorial, download this config file from GitHub and save it as yelp.rest.
If you review the configuration, you will notice that Autonomous REST Connector has detected all the objects and their data types.
To learn more about Autonomous REST Connector and how you can configure it to connect to multiple endpoints, we recommend you go through these other tutorials after you have finished this one.
Before writing the Glue ETL job script, you will need to upload the Autonomous REST Connector driver file, autorest.jar, and the yelp.rest file to S3.
You can find the autorest.jar in the lib folder of the install location you chose in the previous section.
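The upload can be scripted with boto3's `upload_file` method. In this sketch, `upload_glue_artifacts` is a hypothetical helper, and the bucket and prefix are placeholders you would replace with your own. The S3 client is passed in, so the function does not depend on a particular credential setup.

```python
from pathlib import PureWindowsPath


def upload_glue_artifacts(s3_client, bucket, prefix, local_paths):
    """Upload local files to s3://bucket/prefix/ and return their S3 URIs."""
    folder = prefix.strip("/")
    uris = []
    for local_path in local_paths:
        # PureWindowsPath accepts both '\' and '/' separators, so this
        # extracts the file name from Windows and POSIX-style paths alike.
        name = PureWindowsPath(local_path).name
        key = f"{folder}/{name}" if folder else name
        # boto3 signature: upload_file(Filename, Bucket, Key)
        s3_client.upload_file(str(local_path), bucket, key)
        uris.append(f"s3://{bucket}/{key}")
    return uris
```

With boto3 installed and AWS credentials configured, you might call `upload_glue_artifacts(boto3.client("s3"), "my-glue-bucket", "autorest", [r"C:\Program Files\Progress\DataDirect\JDBC_60\lib\autorest.jar", "yelp.rest"])`. The returned S3 URIs are what you would reference from the Glue job.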
Note: Don’t forget to provide a valid API key in the JDBC connection URL.
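The Glue job script itself can be sketched in PySpark as a JDBC read followed by a Parquet write. Several details here are assumptions rather than documented steps from this tutorial:

- The connector's `Config` property points the driver at yelp.rest instead of sampling an endpoint.
- The driver class name is `com.ddtek.jdbc.autorest.AutoRESTDriver`.
- The table is named `BUSINESSES`. This is hypothetical, so check yelp.rest for the names the connector actually generated.
- The output path is a placeholder.

```python
# Assumed driver class name; verify it against your driver's documentation.
DRIVER_CLASS = "com.ddtek.jdbc.autorest.AutoRESTDriver"


def autorest_read_options(api_key, config_path, table):
    """Build Spark JDBC options for reading one table exposed by the connector."""
    url = (
        "jdbc:datadirect:autorest:"
        f"Config={config_path};"  # assumed property naming the .rest file
        "AuthenticationMethod=HttpHeader;"
        "AuthHeader=Authorization;"
        f"SecurityToken='Bearer {api_key}'"
    )
    return {"url": url, "driver": DRIVER_CLASS, "dbtable": table}


def copy_table_to_s3(spark, options, output_path):
    """Read a connector table over JDBC and write it to S3 as Parquet."""
    df = spark.read.format("jdbc").options(**options).load()
    df.write.mode("overwrite").parquet(output_path)
    return df
```

Inside a Glue job, `spark` would typically come from `GlueContext(SparkContext.getOrCreate()).spark_session`. You would supply autorest.jar through the job's `--extra-jars` parameter and yelp.rest through `--extra-files`. The call would then look like `copy_table_to_s3(spark, autorest_read_options(api_key, "yelp.rest", "BUSINESSES"), "s3://my-glue-bucket/yelp/businesses/")`. Whether a relative `Config` path resolves depends on where the file lands in your Glue environment, so confirm that before relying on it.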
This is just one example of how easily you can pull data from any REST API into AWS Glue with Progress DataDirect Autonomous REST Connector. Feel free to try the connector with any application you want. If you have any questions, please contact us or comment below.