To Be or Not to Be: The Truth about Schemas

December 11, 2014 Data & AI, MarkLogic

As a MarkLogic newbie, I became curious after happening upon an article on Jaxenter.com that attempts to diminish the value differentiators of a schema-agnostic database. In the post, the author asserts “there’s absolutely nothing that is really easier with ‘schemaless’ databases than with ‘schemaful’ ones.” Say what!? This must be crazy talk, I think to myself, as the author’s claims very specifically fly in the face of everything I’m learning about enterprise NoSQL. To get to the bottom of it, I reached out to a subject matter expert on the topic for clarification and edification.

What follows is an interview with my colleague, Ed Delacruz, who in addition to being a savvy enterprise NoSQL crusader is also a Principal Consultant at MarkLogic.

Fiona: The author of the article in question makes it sound as if it’s easier to change a schema in a RDBMS than in a schema-agnostic NoSQL database. I’m a bit taken aback by these assertions as this upends what I thought I knew about enterprise NoSQL. Can you shed some light on the author’s claims and the reasoning behind them?

Ed: It seems that the author is being a little hypocritical here, telling us how simple it is to change a schema in an RDBMS while he completely ignores the obvious difficulty of ensuring that the layers of the application on top of the database actually adhere to the changes made. In fact, the author’s project, jOOQ, apparently stems from this very challenge. On his website he’s even quoted as saying …

You’re probably asking yourself why we need yet another database abstraction software in Java. Fair question. In our experience when writing applications against large and complex Oracle databases, Hibernate was not a good fit because we wanted to stay close to SQL: 2008 and to Oracle’s extensions. JDBC, on the other hand is verbose and causes a lot of quality and security headaches. So we rolled our own tailor-made little SQL builder. In fact, every company I have ever met rolled their own tailor-made SQL builder. But our business was not to write SQL builders, our business was to write brokerage logic.

Fiona: So, the article’s assertion that making a change to an RDBMS schema is no more difficult than making a change to a schema-agnostic database is incompatible with the fact that the author is actually in the business of developing a SQL builder to remedy these recognized RDBMS shortcomings. Is that the gist?

Ed: Well, it’s clear that he believes that the layer dealing with relational databases is difficult at best and requires his additional software solution just to manage it successfully.

Fiona: What about making changes to a schema in a schema agnostic database? How would you compare the difficulty?

Ed: In an enterprise NoSQL database platform, additions or deletions to an XML or JSON document might only have to be changed in two places, the data (no schema to change) and the view layer.

On the other hand, in the traditional RDBMS n-tier stack, you’d have to change the:

  1. Database schema, as demonstrated in the original article but if you were to normalize most values, an additional lookup table is probably needed (including foreign-key constraints)
  2. Database data, you would have to ETL the data into the new schema, similar to the first step in a “schemaless” database
  3. Programming language object, if it’s not a dynamically-typed object like JSON or XML, you might also have to define an enumeration to map to lookup table values
  4. Object-relational-mapping (ORM), you will have to modify or extend this layer so that the database schema marries up with the programming language object changes
  5. View layer, as you would for a schema-agnostic database

And if that’s not messy enough, it gets worse for RDBMS if you add a complex structure like a shipping address that has a street number, city, state and zip code.

Fiona: Thanks for clearing that up, Ed. My last question is in regard to the author’s statement that dynamically typed schemas, like NoSQL, are lacking because “there’s no compiler (or IDE) that can help you infer the types with 100% certainty.” How do you respond to that?

Ed: IDEs (integrated development environments) do exist for interpreted languages, but neither IDEs nor compilers do anything when it comes to confirming that your code maps correctly with the database. It’s very debatable to imply that statically-typed schemas RDBMSs provide any quality improvements – especially when utilizing automated testing for QA (as is good practice). When pressed for the truth, anyone who has modified an RDBMS schema of an existing system doesn’t do it often because it is shifting the foundation of a whole stack of layers that sit on top of it. Further, I have seen firsthand how our agile enterprise NoSQL platform is able to adjust to changing business needs when rigid RBDMS-based platforms could not.

###

Fiona Ehret-Kayser