Using AWS Glue and Spark with MongoDB via JDBC

Posted on April 02, 2018

Note: You can now connect to various data sources easily from AWS Glue using the Progress DataDirect Cloud Connectors that are available in the marketplace. Learn how to connect to Salesforce from AWS Glue Connectors in this new tutorial.


Learn how to access MongoDB using a DataDirect JDBC driver with AWS Glue.

What is AWS Glue?

AWS Glue is an Extract, Transform, Load (ETL) service available as part of Amazon's hosted web services. Glue is intended to make it easy for users to connect to data in a variety of data stores, edit and clean the data as needed, and load the data into an AWS-provisioned store for a unified view. Announced in 2016 and officially launched in summer 2017, Glue greatly simplifies the cumbersome process of setting up and maintaining ETL jobs.

Why MongoDB?

MongoDB is an open-source, NoSQL data store. Rather than the tabular rows and columns of relational databases, MongoDB stores data as JSON-like documents grouped into collections, with a flexible schema. MongoDB has grown in popularity and is generally ranked among the top 5 most popular data stores. At Progress, we've seen increased interest in learning how to use MongoDB in an AWS Glue environment.

JDBC and Glue

Glue supports accessing data via JDBC. Currently, the databases Glue supports through JDBC are PostgreSQL, MySQL, Redshift and Aurora. Of course, JDBC drivers exist for many other data sources besides these four. To access any other database, you can load its JDBC driver into the job and connect through Spark's JDBC data source. The data can then be processed in Spark or joined with other data sources, and AWS Glue can fully leverage the data in Spark.
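As a rough illustration of what a Spark JDBC connection looks like, here is a minimal PySpark sketch. The JDBC URL format and the driver class name are assumptions based on DataDirect naming conventions, so check them against the documentation for the driver you install. The helper names (mongodb_jdbc_options, read_collection, to_dynamic_frame) are hypothetical and exist only for this example.

```python
from typing import Dict


def mongodb_jdbc_options(host: str, port: int, database: str,
                         user: str, password: str, table: str) -> Dict[str, str]:
    """Build the option map for Spark's generic JDBC data source."""
    return {
        # Assumed DataDirect MongoDB URL format; verify against the driver docs.
        "url": f"jdbc:datadirect:mongodb://{host}:{port};DatabaseName={database}",
        # Assumed driver class name; verify against the driver docs.
        "driver": "com.ddtek.jdbc.mongodb.MongoDBDriver",
        # The table (MongoDB collection as exposed by the driver) to read.
        "dbtable": table,
        "user": user,
        "password": password,
    }


def read_collection(spark, options: Dict[str, str]):
    """Load one table exposed by the JDBC driver into a Spark DataFrame."""
    return spark.read.format("jdbc").options(**options).load()


def to_dynamic_frame(dataframe, glue_context, name: str):
    """Wrap a Spark DataFrame as a Glue DynamicFrame (Glue runtime only)."""
    # awsglue is only importable inside a Glue job, so import it lazily.
    from awsglue.dynamicframe import DynamicFrame
    return DynamicFrame.fromDF(dataframe, glue_context, name)
```

Inside a Glue job script, the spark argument would typically come from GlueContext(SparkContext.getOrCreate()).spark_session. Once the data is in a DataFrame, it can be joined with other sources using ordinary Spark operations, or converted to a DynamicFrame for Glue's own transforms.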

Using JDBC connectors, you can access many other data sources via Spark for use in AWS Glue. For example, this AWS blog demonstrates the use of Amazon QuickSight for BI against data in an AWS Glue catalog. QuickSight supports Amazon data stores and a few other sources like MySQL and PostgreSQL.

With DataDirect JDBC through Spark, you can open up any JDBC-capable BI tool to the full breadth of databases supported by DataDirect drivers, including MongoDB, Salesforce, Oracle and many others.

Accessing JDBC Data through Spark with DataDirect

So, how do you set up a connection to access data through Spark using a JDBC driver? Here is a quick overview of the steps to get started.

  • Download and locally install the DataDirect JDBC driver, then copy the driver jar to Amazon Simple Storage Service (S3). The drivers have a free 15-day trial license period, so you'll easily be able to get this set up and tested in your environment.
  • Create your AWS Glue job in the AWS Glue Console.
  • Follow our detailed tutorial for an example using the DataDirect Salesforce driver. The same steps will apply for MongoDB or any other DataDirect JDBC driver.
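The first two steps can also be scripted instead of done through the console. The sketch below uses boto3 to upload the driver jar to S3 and create a Glue job that references it through the --extra-jars job argument. This is a swapped-in alternative to the console workflow described above, not the tutorial's method. The function names are hypothetical, and the bucket, key, role ARN and script location are placeholders you would supply yourself.

```python
from typing import Any, Dict


def glue_job_definition(job_name: str, role_arn: str,
                        script_s3_uri: str, driver_jar_s3_uri: str) -> Dict[str, Any]:
    """Build keyword arguments for the Glue CreateJob API call."""
    return {
        "Name": job_name,
        "Role": role_arn,
        # "glueetl" selects a Spark ETL job; ScriptLocation is the job script in S3.
        "Command": {"Name": "glueetl", "ScriptLocation": script_s3_uri},
        # --extra-jars adds the JDBC driver jar to the job's classpath.
        "DefaultArguments": {"--extra-jars": driver_jar_s3_uri},
    }


def upload_driver_and_create_job(local_jar: str, bucket: str, key: str,
                                 job_name: str, role_arn: str,
                                 script_s3_uri: str) -> str:
    """Copy the driver jar to S3, then create a Glue job that loads it."""
    # Imported lazily so glue_job_definition has no AWS dependency.
    import boto3

    boto3.client("s3").upload_file(local_jar, bucket, key)
    definition = glue_job_definition(
        job_name, role_arn, script_s3_uri, f"s3://{bucket}/{key}"
    )
    response = boto3.client("glue").create_job(**definition)
    return response["Name"]
```

Keeping the job definition in a plain function makes it easy to inspect or version the settings before anything is sent to AWS.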

Get Started with DataDirect JDBC and AWS Glue

The industry standard for JDBC database connectivity, the Progress DataDirect JDBC drivers solve the limitations of Type 4 JDBC drivers, delivering the fastest, most scalable Java application performance. The DataDirect line of JDBC drivers supports all major databases and includes advanced enterprise functionality such as application failover, bulk load, SSL data encryption, and operating system authentication using the Kerberos protocol. DataDirect also publishes a Security Vulnerability Response Policy to address vulnerabilities across all supported data sources in a timely manner, including SaaS, big data and relational sources.

Download a DataDirect JDBC driver today and get started with AWS Glue.

Nishanth Kadiyala

Nishanth Kadiyala is a Technical Marketing Manager at Progress. He got his B.Tech degree from IIT Guwahati and his MBA from UNC Chapel Hill. He has worked on several technologies in the past, including database design, SQL querying and cloud computing. Currently, he is committed to educating enterprises about standards-based connectivity via ODBC, JDBC, ADO.NET and OData. He is also proficient with DataDirect Hybrid Connectivity Services (DataDirect Cloud and Hybrid Data Pipeline). You can stay in touch with him through Twitter.
