Different databases are designed to solve different problems, and understanding the difference is critical to making the right decision for your data. In this guide, Brian Bowman details the options you may want to consider for your in-memory computing solution.
When it comes to databases and data access, in-memory computing is a hot topic these days. Users are constantly accessing more data while demanding that data at faster speeds. Embedded analytics and predictive analytics will only increase these data access requirements.
There are a multitude of niche players in the market today that have, or claim to have, in-memory databases (IMDBs). Most of them focus on specific markets or verticals, and each solves specific business challenges in its own way. The focus of many of these niche players is performance and massive data retrieval for analytics, either real-time or predictive.
There is also a lot of confusion around what IMDBs really are and how they fit into today’s modern application. To lay the groundwork for future discussions, let me start by defining IMDBs and columnar databases.
In-Memory Databases
In-memory refers to a database architecture that loads the complete database into memory for much faster access to massive amounts of data. This mostly removes the disk I/O and ACID transaction requirements that come with Online Transaction Processing (OLTP) applications.
It is important to understand that moving the data into memory is only part of the solution. IMDBs usually also organize and access data differently to improve access speed, and many employ either an object-oriented or a columnar data approach. In-memory computing (IMC) refers to the use cases of IMDBs, though the two acronyms are often used interchangeably.
There are generally two types of IMDB vendors emerging in the market today. There are the traditional databases, like Progress OpenEdge, that are being extended to support IMC. There are also pure-play vendors that are offering stand-alone IMC database solutions.
Architecturally, IMDB solutions can lend themselves to horizontal scalability across commodity hardware (usually Linux).
Technically, the OpenEdge database could be transformed into an in-memory database, but this is not the right approach for OpenEdge. First, online transaction processing (OLTP) applications require ACID transactions and need to guarantee data consistency. Second, OLTP workloads usually consist of row-based updates and do not require massive data reads and summarization. Putting a database in memory does not help meet these needs.
There are many IMDBs on the market today. The most notable include IBM DB2 with BLU Acceleration, Oracle TimesTen, MemSQL and SAP HANA.
Columnar Databases
A columnar database stores the values of each column together rather than storing each row together. The goal is to write and read data to and from hard disk storage efficiently in order to reduce the time it takes to return a query result. Columnar databases are often highly compressed, which further reduces the I/O needed to retrieve data from the hard drive.
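To illustrate why column-oriented data tends to compress well, here is a minimal Python sketch. The sample table, the helper names (`to_columns`, `run_length_encode`) and the choice of run-length encoding are assumptions made for illustration; real columnar engines use their own storage formats and compression schemes.

```python
from itertools import groupby

# Hypothetical sample table: each tuple is (order_id, region, status).
rows = [
    (1, "EMEA", "shipped"),
    (2, "EMEA", "shipped"),
    (3, "EMEA", "open"),
    (4, "APAC", "open"),
    (5, "APAC", "open"),
    (6, "APAC", "shipped"),
]


def to_columns(rows):
    """Pivot row tuples into one list per column."""
    return [list(column) for column in zip(*rows)]


def run_length_encode(values):
    """Collapse consecutive repeated values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


order_ids, regions, statuses = to_columns(rows)

# Values of the same column sit next to each other, so repeats collapse.
encoded_regions = run_length_encode(regions)
print(encoded_regions)  # [('EMEA', 3), ('APAC', 3)]
```

In this toy example, six stored region values shrink to two pairs. In a row layout the same values are interleaved with order IDs and statuses, so there are no adjacent repeats to collapse.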
There are many different columnar databases on the market today. Many of them are bundled with in-memory solutions to perform better and avoid disk I/O, which blurs the line between in-memory and columnar. Examples of columnar databases are Vertica, Sybase IQ and Oracle Exadata.
In-Memory vs. Columnar
It is important to define the differences between these two solutions. Any database today can become an in-memory database, but not all databases are columnar in nature. Traditional OLTP databases are typically row-based rather than columnar. This allows for the better transactional consistency and granular update capabilities that OLTP applications require. Online Analytical Processing (OLAP) has different requirements: OLAP solutions are typically analytical, accessing data in large batches and requiring fast access to, and summarization of, massive amounts of data.
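The difference between the two access patterns can be sketched in a few lines of Python. The table contents and the function names are hypothetical, and the "fields visited" counter is only a rough stand-in for the amount of data an engine would have to read, not a measurement of any real product.

```python
# Hypothetical sales table with four fields per record.
records = [
    {"id": 1, "customer": "Acme", "region": "EMEA", "amount": 120.0},
    {"id": 2, "customer": "Bolt", "region": "APAC", "amount": 80.0},
    {"id": 3, "customer": "Core", "region": "EMEA", "amount": 200.0},
]

# Row layout: one complete record per key (suits OLTP lookups and updates).
row_store = {record["id"]: record for record in records}

# Column layout: one list per field (suits OLAP scans and summaries).
column_store = {
    field: [record[field] for record in records] for field in records[0]
}


def total_from_rows(store):
    """Sum 'amount' record by record; return (total, fields_visited)."""
    total, visited = 0.0, 0
    for record in store.values():
        visited += len(record)  # the whole record is read to reach one field
        total += record["amount"]
    return total, visited


def total_from_columns(store):
    """Sum 'amount' from its own column; return (total, fields_visited)."""
    amounts = store["amount"]
    return sum(amounts), len(amounts)


print(row_store[2]["customer"])          # OLTP-style lookup: Bolt
print(total_from_rows(row_store))        # (400.0, 12)
print(total_from_columns(column_store))  # (400.0, 3)
```

Both layouts live entirely in memory here, yet the summary over the column layout visits a quarter of the values. That is the sense in which in-memory and columnar are separate design choices.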
In-memory databases can provide much faster access to the data but don’t necessarily solve the OLAP requirements. The Forrester Wave for in-memory databases for enterprises lists 11 different database vendors. It also divides them into two categories: in-memory databases and in-memory data grids. According to many analysts, the OLAP database market is very fragmented today and will remain so for the next 5+ years.
Columnar databases have strengths in OLAP scenarios that row-based databases cannot match, but this often comes at a price. Columnar databases typically do not do well in OLTP environments, because row-level updates are expensive. They are designed for the large data reads and summarizations that analytics applications require.
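A rough way to see why row-level writes cost more in a column layout is to count how many separate structures a single insert has to touch. The `RowStore` and `ColumnStore` classes below are hypothetical toy models written for this illustration; they do not represent how any particular database is implemented.

```python
class RowStore:
    """Toy row layout: each record is stored as one contiguous tuple."""

    def __init__(self):
        self.rows = []
        self.writes = 0

    def insert(self, record):
        self.rows.append(tuple(record))
        self.writes += 1  # one write for the whole record


class ColumnStore:
    """Toy column layout: each field lives in its own list."""

    def __init__(self, n_columns):
        self.columns = [[] for _ in range(n_columns)]
        self.writes = 0

    def insert(self, record):
        for column, value in zip(self.columns, record):
            column.append(value)
            self.writes += 1  # one write per column touched


# Hypothetical records: (id, customer, region, amount).
new_records = [
    (1, "Acme", "EMEA", 120.0),
    (2, "Bolt", "APAC", 80.0),
]

row_store = RowStore()
column_store = ColumnStore(n_columns=4)
for record in new_records:
    row_store.insert(record)
    column_store.insert(record)

print(row_store.writes, column_store.writes)  # 2 8
```

In this model the gap widens with the number of columns, and any compression applied to a column would add further work per write.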
Although many columnar databases are also in-memory databases, it’s important to remember that they are not the same. Columnar does not equal in-memory.
Summary
The database market continues to grow and fragment. Many new niche players are building on top of core engines like Hadoop or Cassandra, and this will continue for the foreseeable future. The focus should be on what problem you are trying to solve and what technology best meets that need. Once you have determined that, sifting through the myriad possible vendors becomes simpler.
Brian Bowman
Brian Bowman has worked for Progress for over 20 years. He has performed database tuning and disaster planning for customers of all sizes around the world. Brian started in technical support and has also worked in product development and in pre-sales for direct and indirect customers. He is currently a Senior Principal Product Manager for OpenEdge.