One of the most popular interfaces for exposing and consuming data is a REST API, and one of the best ways to store data at low cost for later analysis is HDFS.
Data architects often encounter scenarios where the business wants to store data from REST APIs as updated information arrives, so that the BI/analytics team can run analytics on top of it. Today, doing this requires a custom plugin or solution to move the data from the REST API to HDFS. Wouldn't it be great if you could use an existing tool like Sqoop to bring in the data from a REST API without having to write a single line of code?
That is exactly what we are going to do with the help of Sqoop and Progress DataDirect Autonomous REST Connector. Autonomous REST Connector allows you to query API data using standard SQL by normalizing the semi-structured JSON response data into structured data that can be used with SQL.
In this tutorial we will use Autonomous REST Connector with Sqoop, where Autonomous REST Connector will connect to Alpha Vantage REST API (this is an example, you can use any REST API) to get stock exchange data. Sqoop will ingest this data into HDFS for your future analytical needs. Let’s take a look at how you can do this.
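To make the normalization idea concrete, here is a small Python sketch (purely illustrative, not part of the connector) that flattens a trimmed Alpha Vantage intraday response into flat rows like those the connector exposes as a SQL table. The sample values are made up for illustration.

```python
import json

# A trimmed example of Alpha Vantage's TIME_SERIES_INTRADAY response shape.
# (Illustrative sample values, not live market data.)
response = json.loads("""
{
  "Meta Data": {"2. Symbol": "PRGS", "4. Interval": "1min"},
  "Time Series (1min)": {
    "2019-05-01 16:00:00": {"1. open": "38.90", "2. high": "38.95",
                            "3. low": "38.85", "4. close": "38.92",
                            "5. volume": "120300"},
    "2019-05-01 15:59:00": {"1. open": "38.88", "2. high": "38.91",
                            "3. low": "38.86", "4. close": "38.90",
                            "5. volume": "80450"}
  }
}
""")

symbol = response["Meta Data"]["2. Symbol"]
rows = []
for ts, bar in response["Time Series (1min)"].items():
    # Each nested per-minute object becomes one flat, SQL-friendly row.
    rows.append({
        "symbol": symbol,
        "timestamp": ts,
        "open": float(bar["1. open"]),
        "high": float(bar["2. high"]),
        "low": float(bar["3. low"]),
        "close": float(bar["4. close"]),
        "volume": int(bar["5. volume"]),
    })

for row in rows:
    print(row)
```

Autonomous REST Connector performs this kind of normalization for you, so the nested time-series keys become rows and columns you can query with standard SQL.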
Before you start the tutorial, we assume you have the following environment and tools set up: a working Hadoop cluster with Sqoop and Hive, and Progress DataDirect Autonomous REST Connector installed. After installation, you will find the connector's JDBC driver at:

/home/<user>/Progress/DataDirect/JDBC_60/lib/autorest.jar

Copy this jar into Sqoop's lib directory so Sqoop can load the driver:

cp /home/<user>/Progress/DataDirect/JDBC_60/lib/autorest.jar $SQOOP_HOME/lib/autorest.jar
To connect to the Alpha Vantage endpoint, use a JDBC connection URL like the following (the connector samples the endpoint to infer a schema):

jdbc:datadirect:autorest:sample=https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY&symbol=PRGS&interval=1min&apikey=<apikey>
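The sample endpoint above is simply the Alpha Vantage query URL with its parameters inline. As a quick sanity check, you can build the same URL programmatically (a throwaway Python sketch; the `<apikey>` placeholder stays a placeholder until you substitute your real key):

```python
from urllib.parse import urlencode

base = "https://www.alphavantage.co/query"
params = {
    "function": "TIME_SERIES_INTRADAY",  # which API operation to call
    "symbol": "PRGS",                    # stock ticker to fetch
    "interval": "1min",                  # bar granularity
    "apikey": "<apikey>",                # replace with your real key
}
url = base + "?" + urlencode(params)
print(url)
```

These four parameters are exactly the ones that show up as virtual columns in the generated REST configuration file below.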
Save the connector's REST configuration file (referenced later as AlphaVantage.rest):

{
  "query": {
    "#path": [],
    "Meta Data": {
      "1. Information": "VarChar(84)",
      "2. Symbol": "VarChar(64),#key",
      "3. Last Refreshed": "Timestamp(0)",
      "4. Interval": "VarChar(64)",
      "5. Output Size": "VarChar(64)",
      "6. Time Zone": "VarChar(64)"
    },
    "Time Series (1min){Timestamp(0)}": {
      "1. open": "Double",
      "2. high": "Double",
      "3. low": "Double",
      "4. close": "Double",
      "5. volume": "Integer"
    },
    "function": {
      "#type": "VarChar(64),#key",
      "#virtual": true,
      "#default": "TIME_SERIES_INTRADAY",
      "#eq": "function"
    },
    "symbol": {
      "#type": "VarChar(64),#key",
      "#virtual": true,
      "#default": "MSFT",
      "#eq": "symbol"
    },
    "interval": {
      "#type": "VarChar(64),#key",
      "#virtual": true,
      "#default": "1min",
      "#eq": "interval"
    },
    "apikey": {
      "#type": "VarChar(64),#key",
      "#virtual": true,
      "#default": "<your api key>",
      "#eq": "apikey"
    }
  }
}
Note: Replace the placeholder in the #default property of the apikey JSON object with your own API key.
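Conceptually, each #virtual column in this file maps to a query parameter on the HTTP request: when a SQL WHERE clause supplies a value for the column, that value is sent; otherwise the #default applies. The following Python sketch illustrates that mapping (my own simplified model of the idea, not the connector's actual code):

```python
# Virtual-column definitions, mirroring the config above:
# {column name: (#eq request parameter, #default value)}
virtual_columns = {
    "function": ("function", "TIME_SERIES_INTRADAY"),
    "symbol":   ("symbol",   "MSFT"),
    "interval": ("interval", "1min"),
}

def build_params(where):
    """Map SQL-style equality filters to request parameters,
    falling back to each column's #default when unfiltered."""
    params = {}
    for column, (param, default) in virtual_columns.items():
        params[param] = where.get(column, default)
    return params

# SELECT * FROM TIMESERIES1MIN WHERE symbol = 'AAPL'
print(build_params({"symbol": "AAPL"}))

# No WHERE clause: every parameter falls back to its #default
print(build_params({}))
```

This is why the second Sqoop command below can filter on a symbol column: the filter value is pushed down into the API request rather than applied after the fact.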
Now run Sqoop to import the TIMESERIES1MIN table into Hive:

sqoop import \
  --connect 'jdbc:datadirect:autorest:config=/db/hadoop/scripts/AlphaVantage.rest;TransactionMode=ignore' \
  --table TIMESERIES1MIN \
  --hive-import --create-hive-table --hive-table TIMESERIES1MIN \
  --driver com.ddtek.jdbc.autorest.AutoRESTDriver
You can also import the results of a SQL query instead of a whole table:

sqoop import \
  --connect 'jdbc:datadirect:autorest:config=/db/hadoop/scripts/AlphaVantage.rest;TransactionMode=ignore' \
  --query "select * from TIMESERIES1MIN where QUERY_SYMBOL='AAPL' AND \$CONDITIONS" \
  --hive-import --create-hive-table --hive-table TIMESERIES1MIN \
  --driver com.ddtek.jdbc.autorest.AutoRESTDriver \
  --target-dir /usr/hadoop/timeseriesquery --split-by key
That's how easy it is to import data from any REST API into HDFS using Progress DataDirect Autonomous REST Connector, without writing a single line of code. Feel free to try Autonomous REST Connector with any API, and if you have any questions, feel free to talk to us.