A few weeks ago, I blogged about Data as a Service – “DaaS” – what it is, its pitfalls, and why it is valuable. Today I’m following that up with strategies and techniques for getting it right.
My summary from the previous blog post
I defined DaaS as an architectural pattern where your most valuable services and data formats are developed first and intended for re-use over the long term. Two parts of that post are worth remembering here: the definition of DaaS and the value you can expect from it.
DaaS is an approach to make data available whenever it is needed, and fits into the larger “SOA” Service-oriented Architecture design pattern. DaaS is an approach, within SOA, that values, shares and focuses on data.
On to the main point of this follow-up blog: What are some things you can do to have a successful Data as a Service implementation that includes a set of harmonized, high-value, durable data services?
A key to success is to have the right team roles and focus. As a mentor from my early years pointed out, “every organization is destined to build their org chart into all their software systems.” For DaaS, this means that if you don’t define data service modeling as a separate role you won’t get Data Services as a high-quality deliverable.
Rather than expecting your System Architects and Designers to fold some notion of re-use and data services into their day jobs, empower a separate group of Data Service Architects to model data formats and services, with an eye toward what will provide lasting value for the enterprise.
Wire formats are the data structures that integrate systems across your enterprise. They are usually XML payloads that travel in REST or SOAP messages, but lately also include JSON and RDF payloads sent over REST.
Don’t build a comprehensive, relational model up front, in 3rd normal form. These models are complex, tightly coupled to a relational database, resistant to change and offer no abstraction as they are a physical model. Instead, focus on the payloads in data services. Big modeling up front is typical of an enterprise data modeling approach, which is slow to yield benefits and prone to failure.
Another way to think of it is that every wire format is a de facto “Interface Control Document” (ICD) that specifies the contract between data providers and consumers, enabling stability as the systems on both sides evolve.
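To make the “wire format as contract” idea concrete, here is a minimal Python sketch of a check that a provider or consumer could run against a JSON payload. The `CUSTOMER_CONTRACT` fields and the `validate_payload` helper are hypothetical names invented for illustration; a real project would more likely use XML Schema or JSON Schema tooling.

```python
import json

# Hypothetical wire-format contract for a "customer" message:
# field name -> (expected Python type, whether the field is required).
CUSTOMER_CONTRACT = {
    "id": (str, True),
    "name": (str, True),
    "email": (str, False),
}


def validate_payload(raw, contract):
    """Return a list of contract violations for a JSON payload.

    An empty list means the payload satisfies the contract.
    """
    payload = json.loads(raw)
    errors = []
    for field, (expected_type, required) in contract.items():
        if field not in payload:
            if required:
                errors.append(f"missing required field: {field}")
            continue
        if not isinstance(payload[field], expected_type):
            errors.append(f"wrong type for field: {field}")
    return errors
```

Note that this sketch deliberately ignores fields the contract does not mention, so a provider can add new fields without breaking existing consumers. That is one way a contract can stay stable while both sides evolve.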
Where possible, align your internal Data Service formats with industry and global standards such as Dublin Core, HL7, DocBook, NIEM, DDMS, and the like. These standards are compiled by working groups or companies who have done a lot of hard-fought data modeling for you. It will also be easier to transform your data into standard formats for integration with other systems or internal components if your formats are at least close to existing standard formats. Note that this does not mean building a full implementation of a complex standard, since many of your Data Services will only use a small subset of a larger standard.
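As a rough illustration of using only a small subset of a standard, the Python sketch below maps a few fields of an internal record onto Dublin Core elements. The internal field names (`doc_id`, `headline`, `author`, `published`), the `record` wrapper element, and the `to_dublin_core` helper are assumptions made up for this example; only the Dublin Core element names and namespace come from the standard itself.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical mapping from internal field names to Dublin Core elements.
# Only four of the standard's elements are used here.
FIELD_TO_DC = {
    "doc_id": "identifier",
    "headline": "title",
    "author": "creator",
    "published": "date",
}


def to_dublin_core(record):
    """Build an XML element holding the Dublin Core view of a record."""
    root = ET.Element("record")
    for field, dc_name in FIELD_TO_DC.items():
        if field in record:
            child = ET.SubElement(root, f"{{{DC_NS}}}{dc_name}")
            child.text = str(record[field])
    return root
```

Calling `ET.tostring(to_dublin_core(my_record), encoding="unicode")` would serialize the result with the `dc:` prefix. Fields that are not in the mapping are simply left out, which keeps the wire format small.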
Just as data formats and access patterns should be uniform, so should the ways you expose and query metadata (what is available, what formats exist, sources) and RDF (semantic data and relationships about your data). There’s a lot to talk about around metadata and RDF that won’t fit in this blog post, though.
Don’t allow services to be exposed without security. Ultimately, that would create an obstacle to data sharing within your enterprise, complicate and weaken your enterprise security posture, and slow down applications at runtime by forcing every calling application to implement its own security filtering.
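One way to picture security enforced inside the data service, rather than in each caller, is a field-level filter applied before a payload leaves the service. The Python sketch below is only an illustration: the role names, the `FIELD_READ_ROLES` policy table, and the `filter_for_roles` helper are all hypothetical, and a real deployment would rely on the security model of the underlying platform.

```python
# Hypothetical field-level policy: which roles may read each field.
FIELD_READ_ROLES = {
    "name": {"clerk", "analyst"},
    "diagnosis": {"clinician"},
    "ssn": {"auditor"},
}


def filter_for_roles(record, caller_roles):
    """Return only the fields that at least one of the caller's roles may read."""
    roles = set(caller_roles)
    return {
        field: value
        for field, value in record.items()
        # Fields with no policy entry are denied by default.
        if roles & FIELD_READ_ROLES.get(field, set())
    }
```

Because the filter lives in the service, every consumer gets the same enforcement, and fields without an explicit policy are withheld rather than leaked.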
So those are some general tips for data modeling toward DaaS. But this is a MarkLogic Blog too, so here is how the MarkLogic Server product can be used to facilitate this.
Key features of what we now call “DaaS” have been baked into MarkLogic for over a decade. Things like built-in security, data transforms (using XSLT, JavaScript, XQuery), text search, geospatial search, alerting/monitoring, clustering, high availability, DR, elasticity, and data adapters are all included. At this point, MarkLogic natively stores almost any data format without mapping it to relational tables: XML, RDF, JSON, Text, and Binary formats, including metadata or text extraction adapters for most binaries. RESTful and SOAP services are provided out of the box to expose it all.
So as an organization starts to focus on the messages that move around the enterprise, and the wire formats that define those messages, MarkLogic becomes a cheap and appealing place to store that data, together with metadata, provenance and relationships. Better yet, MarkLogic persists it all natively with zero modeling.
This makes MarkLogic a natural component of most DaaS solutions, but by no means the only component. Existing, legacy systems, relational databases, and almost anything that contributes value and has data can and should be integrated into a DaaS architectural approach over time.
To sum it up – create a team that is empowered to advocate for data as a valuable, secure asset that will last many years and is exposed in a coherent, secure way.
This team should focus on the “wire formats” of the messages flying around your enterprise – these data formats define how your data will be understood and re-used throughout your enterprise. The alternative is to allow direct access to underlying databases, which is complex, lacks a good security model, and quickly becomes an obstacle to change as dependencies accumulate between multiple applications and their underlying databases.
Damon is a passionate “Mark-Logician,” having been with the company for over 7 years as it has evolved into the company it is today. He has worked on or led some of the largest MarkLogic projects for customers ranging from the US Intelligence Community to HealthCare.gov to private insurance companies.
Prior to joining MarkLogic, Damon held positions spanning product development for multiple startups, founding of one startup, consulting for a semantic technology company, and leading the architecture for the IMSMA humanitarian landmine remediation and tracking system.
He holds a BA in Mathematics from the University of Chicago and a Ph.D. in Computer Science from Tulane University.