Apache Spark Answers Life’s Biggest Questions

July 15, 2015 Data & AI

Michael Coutsoftides, Principal Solutions Engineer at Progress DataDirect

Michael Coutsoftides discusses the applications of Apache Spark and why you should pay attention to “the most important new open source project in a decade.”

 

Apache Spark has come a long way from its humble beginnings, becoming one of the most important technologies in the Big Data space.

It’s so big that even Big Blue is taking notice.

Why? Because Apache Spark promises to increase performance by up to 100 times for certain applications.

Big Blue and Big Data

On June 15, 2015, IBM announced a major commitment to Apache Spark, calling it “potentially the most important new open source project in a decade.”

IBM has committed to:

  1. Embed Spark into their industry-leading Analytics and Commerce Platforms
  2. Offer Spark as a service on the IBM Cloud
  3. Apply more than 3,500 researchers and developers to work on Spark projects worldwide
  4. Educate more than one million data scientists and engineers on Spark

IBM presented a number of compelling use cases illustrating how Spark is transforming business and driving innovation:

  1. Optibus uses Spark to improve public transportation with real-time planning software
  2. Findability Sciences uses Spark and IBM Watson to increase processing capabilities for streaming data generated by IoT devices
  3. Independence Blue Cross improves healthcare by using Spark to analyze clinical data and radiologic imaging associated with hip implants, making it easier to predict which patients are at risk for clinical complications
  4. NASA and the SETI Institute use Spark to analyze terabytes of complex, deep space radio signals in their search for extraterrestrial life

Sparking the Search for Life

As a geek at heart, I was most excited about the partnership between IBM, NASA and SETI. Carl Sagan fans may recall that the SETI Institute is an organization whose mission is to “explore, understand and explain the origin, nature and prevalence of life in the universe.” For more than 35 years, it has tirelessly searched the cosmos for signs of intelligent life. Hopefully, that search is going to get easier now that SETI is using Spark running on IBM Bluemix to help its hunt.

As you read this, Spark is analyzing over 100 million radio signals collected by the Allen Telescope Array. SETI is using Spark to determine whether signals come from the same location, even if they are spread out over a period of years or differ in composition. It’s a bit like searching for a needle in a billion haystacks, but the machine-learning capabilities of Spark will act like a high-powered magnet, pulling out the important data.
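To make the "same location, different years" idea concrete, here is a toy sketch in plain Python. It is not SETI's pipeline and it does not use Spark's API; the `Signal` fields, the half-degree grid and the `recurring_locations` helper are all assumptions invented for illustration. On a real cluster the same group-by-key step would presumably be distributed across many machines.

```python
from collections import defaultdict
from typing import NamedTuple


class Signal(NamedTuple):
    """One hypothetical detection; the field names are illustrative only."""
    year: int
    ra_deg: float    # right ascension, in degrees
    dec_deg: float   # declination, in degrees
    freq_mhz: float  # signal frequency; may differ between detections


def sky_cell(signal, cell_deg=0.5):
    """Map a sky position onto a coarse grid cell so nearby positions share a key."""
    return (int(signal.ra_deg // cell_deg), int(signal.dec_deg // cell_deg))


def recurring_locations(signals, min_years=2, cell_deg=0.5):
    """Return grid cells where signals were seen in at least `min_years` distinct years."""
    by_cell = defaultdict(list)
    for s in signals:
        by_cell[sky_cell(s, cell_deg)].append(s)
    return {
        cell: hits
        for cell, hits in by_cell.items()
        if len({s.year for s in hits}) >= min_years
    }


# Made-up sample data: two detections close together on the sky in different
# years, plus one unrelated detection elsewhere.
sample = [
    Signal(2012, 10.1, 20.2, 1420.0),
    Signal(2014, 10.3, 20.4, 1421.5),
    Signal(2013, 200.0, -5.0, 1420.0),
]
candidates = recurring_locations(sample)
```

In this sketch, `candidates` keeps only the grid cell that was hit in more than one year, regardless of the differing frequencies. A fixed grid is the simplest possible choice; it can miss two nearby detections that fall on opposite sides of a cell boundary.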

Thanks to Spark, we may soon learn the answer to one of life’s biggest questions: Are we alone in the universe?

Big Data Analytics for Everyone

Spark’s applications are much broader than looking for aliens, however. (It can also be used to analyze data about fictional aliens!) Any kind of analytics process can benefit from Spark’s high performance. Spark SQL makes it very easy to connect to existing business intelligence (BI) and analytics software like Tableau or Microsoft’s Power BI using ODBC or JDBC data connectivity. To make the most of this connection, you need a high-performance driver from Progress® DataDirect®.

Get Connected with Spark SQL

On June 2, DataDirect, the leader in ODBC and JDBC connectivity across relational, NoSQL, Big Data and SaaS application access, announced the release of our enterprise-class SparkSQL driver. Our drivers enable you to fully leverage the speed of Apache Spark for the fastest Big Data analytics possible. Don’t hesitate: you can get a free trial of our SparkSQL driver today!

Michael Coutsoftides

As a Principal Solutions Engineer at Progress DataDirect, Michael eats, sleeps and breathes data connectivity. He is dedicated to developing and implementing proven, high-performance data connectivity solutions, empowering enterprises to better manage and integrate data across Big Data, Cloud and Relational data sources. Follow him at https://twitter.com/DataSherpa
