Amazon Redshift is a scalable, fully managed, fast data warehouse that organizations use to analyze data at petabyte scale, with advanced security features built in. R is a language used by statisticians and data scientists for statistical computing, graphics, and data analysis, which makes it a natural fit for analyzing data stored in Redshift.
So, how do you start? You can use the Progress DataDirect Amazon Redshift JDBC driver to connect to Redshift from R, and this tutorial walks you through the steps.
install.packages("RJDBC", dependencies = TRUE)
library(RJDBC)
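# Backslashes in a Windows path must be doubled inside an R string; Redshift quotes identifiers with double quotes
drv <- JDBC("com.ddtek.jdbc.redshift.RedshiftDriver", "C:\\Program Files\\Progress\\DataDirect\\JDBC_51\\lib\\redshift.jar", identifier.quote = "\"")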
conn <- dbConnect(drv, "jdbc:datadirect:redshift://<hostname>:5439;DatabaseName=dev", "<user>", "<password>")
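Hard-coding the host and credentials in the script is easy to get wrong and easy to leak. As a minimal sketch, the connection URL can be assembled by a small helper and the credentials read from environment variables. The helper name `build_redshift_url` and the variable names `REDSHIFT_HOST`, `REDSHIFT_USER`, and `REDSHIFT_PASSWORD` are assumptions for illustration, not part of the driver or RJDBC; the URL layout is the one used in the `dbConnect` call above.

```r
# Hypothetical helper: builds the DataDirect Redshift JDBC URL used in this tutorial.
build_redshift_url <- function(host, port = 5439, database = "dev") {
  sprintf("jdbc:datadirect:redshift://%s:%d;DatabaseName=%s",
          host, as.integer(port), database)
}

# Environment variable names are assumptions; set them outside the script.
# The fallbacks are the same placeholders used in the tutorial.
host     <- Sys.getenv("REDSHIFT_HOST", unset = "<hostname>")
user     <- Sys.getenv("REDSHIFT_USER", unset = "<user>")
password <- Sys.getenv("REDSHIFT_PASSWORD", unset = "<password>")

url <- build_redshift_url(host)

# With `drv` created as shown earlier, the connection would then be opened with:
# conn <- dbConnect(drv, url, user, password)
```

Keeping the URL construction in one function also makes it simple to point the same script at a different port or database.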
# List all tables
dbListTables(conn)
# List user tables in the public schema
dbGetQuery(conn, "SELECT table_name FROM information_schema.tables WHERE table_schema = 'public'")
# Execute simple queries
dbGetQuery(conn, "select count(*) from venue")
dbGetQuery(conn, "select * from venue where venueseats > 30000")
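When the threshold comes from a variable or user input, pasting it into the SQL string invites quoting mistakes. RJDBC treats extra arguments to `dbGetQuery` as positional parameters of a prepared statement, so the filter above could be sketched as follows. The wrapper name `venues_with_min_seats` is hypothetical; the `venue` table and `venueseats` column are the ones used in the queries above.

```r
# Hypothetical wrapper around the query above; `conn` is an open RJDBC connection.
venues_with_min_seats <- function(conn, min_seats) {
  # The "?" placeholder is bound to min_seats by the JDBC prepared statement.
  DBI::dbGetQuery(conn, "select * from venue where venueseats > ?", min_seats)
}

# Usage, once connected:
# big_venues <- venues_with_min_seats(conn, 30000)
```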
# Read a whole table into a data frame
venues <- dbReadTable(conn, "venue")
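Once the data is in a local data frame, it is good practice to release the connection. A minimal sketch, assuming `conn` is the connection opened earlier; the helper name `read_table_and_close` is hypothetical.

```r
# Hypothetical helper: reads a whole table, then always closes the connection.
read_table_and_close <- function(conn, table_name) {
  # on.exit runs even if dbReadTable fails, so the connection is not leaked.
  on.exit(DBI::dbDisconnect(conn), add = TRUE)
  DBI::dbReadTable(conn, table_name)
}

# Usage, once connected:
# venues <- read_table_and_close(conn, "venue")
# str(venues)  # inspect the column types of the resulting data frame
```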