ACID, BASE and NoSQL

May 10, 2013 Data & AI, MarkLogic

My last post talked about Enterprise NoSQL and ACID vs. BASE in the context of handling data variety. In this one I’d like to delve deeper into transactional, Enterprise NoSQL.

Let’s start by focusing on the main question: How can one guarantee cross-record ACID transactions in a horizontally scalable, schema-agnostic database?

The short answer is an architectural pattern called Multi-Version Concurrency Control, or MVCC.

The basic notion behind MVCC is that records are never modified in place; instead, a new version is created every time a record changes. The system eventually deletes old versions after a configurable period of time, but within that time window it’s simple to roll back a transaction. Moreover, it’s also straightforward to roll back the entire database to an earlier point in time (a.k.a. point-in-time recovery), a key requirement of enterprise databases.
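To make the idea concrete, here is a minimal, single-process sketch of a versioned store. It is an illustration under stated assumptions, not how any particular product implements MVCC: the names `VersionedStore`, `commit`, `read`, `rollback_to` and `purge` are hypothetical, timestamps are a simple counter, and concurrency, durability and conflict detection are left out.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Version:
    value: Any
    created_at: int                    # timestamp of the commit that wrote this version
    deleted_at: Optional[int] = None   # timestamp of the commit that superseded it


class VersionedStore:
    """Toy MVCC store (hypothetical API): every change appends a new version."""

    def __init__(self) -> None:
        self._clock = 0
        self._versions: Dict[str, List[Version]] = {}  # key -> versions, oldest first

    def commit(self, changes: Dict[str, Any]) -> int:
        """Write several records under one timestamp (a cross-record transaction)."""
        ts = self._clock + 1
        for key, value in changes.items():
            history = self._versions.setdefault(key, [])
            if history and history[-1].deleted_at is None:
                history[-1].deleted_at = ts      # mark the old version, do not overwrite it
            history.append(Version(value, ts))
        self._clock = ts                         # readers only see the commit from here on
        return ts

    def read(self, key: str, at: Optional[int] = None) -> Any:
        """Return the version visible at timestamp `at` (default: latest)."""
        ts = self._clock if at is None else at
        for v in self._versions.get(key, []):
            if v.created_at <= ts and (v.deleted_at is None or v.deleted_at > ts):
                return v.value
        return None

    def rollback_to(self, ts: int) -> None:
        """Point-in-time recovery: discard every version written after `ts`."""
        for key in list(self._versions):
            kept = [v for v in self._versions[key] if v.created_at <= ts]
            for v in kept:
                if v.deleted_at is not None and v.deleted_at > ts:
                    v.deleted_at = None          # the superseding write is gone again
            if kept:
                self._versions[key] = kept
            else:
                del self._versions[key]
        self._clock = ts

    def purge(self, before: int) -> None:
        """Drop versions superseded at or before `before` (the retention window)."""
        for key, history in self._versions.items():
            self._versions[key] = [
                v for v in history if v.deleted_at is None or v.deleted_at > before
            ]


store = VersionedStore()
t1 = store.commit({"acct:a": 100, "acct:b": 0})
t2 = store.commit({"acct:a": 70, "acct:b": 30})   # both records change together
print(store.read("acct:a"), store.read("acct:a", at=t1))
```

Because old versions stay around until `purge` removes them, a reader pinned to `t1` keeps seeing a consistent pair of records while later commits proceed, and rolling back is just a matter of forgetting the newer versions.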

Interestingly enough, the availability of Enterprise NoSQL (a schema-agnostic technology that satisfies these requirements) is now starting to blur the boundaries between the traditional Data Warehouse, Operational Data Store and Data Mart, and to converge them into a single store. The enabler is schema-on-read (as opposed to the traditional schema-on-write): data can be loaded without a pre-defined schema, and multiple schemas can be applied when it is read. As a result, the categories mentioned above can be merged into a single platform that serves many data consumers without intensive modeling and transformation ahead of time.
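A small sketch may help show what schema-on-read looks like in practice. The documents, field names (`counterparty`, `cpty`, `notional`, `amount` and so on) and the two view functions below are all made up for illustration; the point is only that nothing is validated at load time, and each consumer projects its own shape at read time.

```python
import json

# Two documents with different shapes, loaded as-is (no schema enforced on write).
raw_docs = [
    '{"trade_id": 1, "counterparty": "ACME", "notional": 1000000, "ccy": "USD"}',
    '{"trade_id": 2, "cpty": {"name": "Globex"}, "amount": "250000",'
    ' "currency": "EUR", "note": "amended"}',
]
store = [json.loads(doc) for doc in raw_docs]


def risk_view(doc):
    """Schema applied by a (hypothetical) risk consumer at read time."""
    return {
        "trade_id": doc["trade_id"],
        "notional": float(doc.get("notional", doc.get("amount", 0))),
        "currency": doc.get("ccy") or doc.get("currency"),
    }


def party_view(doc):
    """A different schema over the same stored documents."""
    name = doc.get("counterparty") or doc.get("cpty", {}).get("name")
    return {"trade_id": doc["trade_id"], "counterparty": name}


risk_rows = [risk_view(doc) for doc in store]
party_rows = [party_view(doc) for doc in store]
```

With schema-on-write, the second document would have had to be transformed (or rejected) before loading; here the reconciliation lives in the read-side views, and a new consumer can add a third view without touching the stored data.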

In addition to schema-on-read, the unification of data management and search is key to handling data diversity. In fact, it was the immense success of search engines that paved the way to this new data management paradigm. Search technologies established the use of a rich set of indexes as a means of querying non-relational data. From there it was a small leap to apply this notion to a database, converging it with database indexing. But unlike in a traditional RDBMS, indexes in the NoSQL world do not have to be pre-defined, nor rebuilt as the data changes.
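As a rough illustration of a search-style index applied to schema-less documents, the sketch below indexes every word and every field path it encounters, with no index definitions declared up front, and updates its postings incrementally when a document changes. The class `UniversalIndex` and its term format are assumptions made for this example and do not describe any specific product's index structures.

```python
from collections import defaultdict


class UniversalIndex:
    """Toy inverted index over arbitrary nested documents (hypothetical design)."""

    def __init__(self):
        self._postings = defaultdict(set)   # term -> ids of documents containing it
        self._terms_by_doc = {}             # doc id -> terms, for incremental removal

    def _terms(self, node, path=""):
        """Yield index terms for whatever structure the document happens to have."""
        if isinstance(node, dict):
            for key, value in node.items():
                yield from self._terms(value, f"{path}/{key}")
        elif isinstance(node, list):
            for item in node:
                yield from self._terms(item, path)
        else:
            yield ("path", path)                   # the field exists
            for word in str(node).lower().split():
                yield ("word", word)               # full-text term, any field
                yield ("value", path, word)        # word within a specific field

    def put(self, doc_id, doc):
        """Insert or update one document; only its own postings are touched."""
        self.remove(doc_id)
        terms = set(self._terms(doc))
        for term in terms:
            self._postings[term].add(doc_id)
        self._terms_by_doc[doc_id] = terms

    def remove(self, doc_id):
        for term in self._terms_by_doc.pop(doc_id, ()):
            self._postings[term].discard(doc_id)

    def search(self, *terms):
        """Return ids of documents matching all of the given terms."""
        sets = [self._postings.get(term, set()) for term in terms]
        return set.intersection(*sets) if sets else set()


index = UniversalIndex()
index.put("d1", {"title": "MVCC basics", "author": {"name": "Amir"}})
index.put("d2", {"headline": "Search and MVCC", "tags": ["nosql", "search"]})
hits = index.search(("word", "mvcc"))
```

Note that the two documents share no fields, yet both are searchable immediately, and replacing `d2` later only rewrites the postings for `d2` rather than rebuilding the index.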

So we’re witnessing some related convergence trends – the convergence of structured and unstructured data, that of database and search technologies, and of traditional data management tiers into a single platform.

My next post will tie these concepts back to the related industry use-cases that benefit from them.

Amir Halfon