For decades now the database world has been oriented towards the schema-on-write approach. First you define your schema, then you write your data, then you read your data and it comes back in the schema you defined up front. This approach is so deeply ingrained in our thinking that many people would ask, “how else would you do it?” The answer is schema-on-read.
Schema-on-read follows a different sequence: just load the data as-is and apply your own lens to the data when you read it back out. You might say, “OK, fine. But why would you want to do that?” There are several really compelling reasons. I’ll cover the main ones here.
At this point people often say, “Well sure, but you need a predefined schema or it will be slow.” That’s absolutely true for traditional technologies, but not for an Enterprise NoSQL database like MarkLogic. We are built from the ground up to excel at this approach. [Ed. There’s not enough room to go into how we accomplish that here, but if you’re curious, we’ve got a great paper you can read on the topic.]
The other important thing to keep in mind is that just because we don’t force you to do an extensive data-modeling task up front doesn’t mean that you can’t learn from your data over time. Get your data loaded, start using it, and get value from it. As you go, you may well find that you want to normalize certain aspects of your data or otherwise optimize your representation. With MarkLogic, that evolution can happen as you gain real-world experience with your use cases and datasets. Imposing too much structure too soon and trying to optimize before you really understand the bottlenecks is a common trap. Schema-on-read can help you avoid it.
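To make the sequence concrete, here is a minimal sketch of the general idea in plain Python. It is not MarkLogic code and does not use any MarkLogic API; the sample records, the `load` function, and the `contact_lens` function are all hypothetical names invented for illustration. Documents are stored exactly as they arrive, and the reader decides at read time which fields it cares about and where to find them.

```python
import json

# Hypothetical raw records, kept exactly as they arrived (no up-front schema).
RAW_STORE = [
    '{"name": "Ada Lovelace", "email": "ada@example.com"}',
    '{"full_name": "Alan Turing", "contact": {"email": "alan@example.com"}}',
    '{"name": "Grace Hopper"}',
]


def load(raw_documents):
    """Parse documents as-is; nothing is validated or reshaped on write."""
    return [json.loads(doc) for doc in raw_documents]


def contact_lens(doc):
    """A read-time 'lens': map whatever shape was stored onto the
    fields this particular reader wants, tolerating missing values."""
    name = doc.get("name") or doc.get("full_name")
    email = doc.get("email") or doc.get("contact", {}).get("email")
    return {"name": name, "email": email}


documents = load(RAW_STORE)
contacts = [contact_lens(doc) for doc in documents]

for contact in contacts:
    print(contact)
```

If a new document shape shows up later, only the lens changes; the stored data stays as it was. That is the kind of gradual evolution described above, although a real database would also need indexing to make such reads fast, which this sketch does not attempt.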
Schema-on-read is just one of the ways that MarkLogic can help you solve problems that are a major challenge with traditional technologies.
Joe Pasqua brings over three decades of experience as both an engineer and a leader. He has personally contributed to several game-changing initiatives, including the first personal computer at Xerox, the rise of RDBMS in the early days of Oracle, and the desktop publishing revolution at Adobe. In addition to his individual contributions, Joe has been a leader at companies ranging from small startups to the Fortune 500.
Most recently, Joe established Neustar Labs, which is responsible for creating strategies, technologies, and services that enable entirely new markets. Prior to that, Joe held a number of leadership roles at Symantec and Veritas Software, including VP of Strategy, VP of Global Research, and CTO of the $2B Data Center Management business.
Joe’s technical interests include system software, knowledge representation, and rights management. He has more than 10 issued patents, with others pending. Joe earned simultaneous Bachelor of Science degrees in Computer Science and Mathematics from California Polytechnic State University, San Luis Obispo, where he is a member of the Computer Science Advisory Board.