In the IT world we lead duplicitous lives. Perhaps duplicitous is not the exact word I am looking for because it implies deception, but then again maybe that is the right word because we are deceiving ourselves.
Think about the stuff that runs your business; not the machines and servers, or even the people, but the stuff inside the people: ideas. Machinery, servers and raw materials are just a reflection of the ideas contained within your company. The deception comes when we pretend that the only information that really matters to us is the data.
Sure, data is important and must be protected, but data is merely a raw material. Records are like Legos® (or for purposes of copyright protection “plastic building blocks”); they have a predictable shape and size, allowing them to fit together nicely. We guard our blocks and use them to build things. We keep our blocks and we brag to each other about how many blocks we have and what we are able to build with our blocks.
But how many building blocks (relational records) do you have on your laptop? Probably not many. What you have on your laptop is all of the stuff that surrounds those building blocks: pictures, e-mail, IMs, documents, social media, etc. Last week we talked about a New Normal where we are now able to capture all of the information around the transaction and present a complete picture; not just the raw record, but the rich set of information that surrounds it. Think about a customer, a piece of equipment or a transaction; there is a lot more information about those things than just the data that exists in your ERP system.
This information is not like a plastic building block, but more like a drop of water. It does not have a defined and predictable shape. In fact, it can change shape over time: information might start out as an email and morph into a document. That document might be translated into a brochure or a drawing. That drawing might power a production line to produce a product, before the product can be sold to a consumer. Today you have access to the records that reflect pieces of that journey: an order for supplies, a production work order, a sales order, a bill of lading. But do those things tell the story you want to hear?
You have seen this pitch before. The building block salesman came and showed you the 250-piece set that built the police station or the spaceship. He told you all of the great insight you could get from your data and he showed you the picture of the spaceship and the new world that spaceships allow you to explore. But that spaceship can’t take your business where you want to go, because it is not built on the information that powers your business; it is just a representation of that information.
XML and JSON Give You a True Picture
To get a true picture of your business you need access to all of the additional information: email, documents and other unstructured data. As the Internet was gaining steam, folks came up with ways to store and contextualize that information in forms like XML. XML and its cute young friend, JSON, take data and put it into a structure that is self-documenting, human-readable and machine-readable.
You might not be a code junkie, but you can likely make the translation between that box of index cards in your mom’s cupboard and the same information stored in XML.
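To make that translation concrete, here is a minimal sketch in Python. The card itself (a recipe with a title, a serving count and a list of ingredients) is invented purely for illustration, as are the element and field names; the point is only that the same card can be written as JSON or as XML, and that both forms label their own contents.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical index card. The fields and values are made up for illustration.
card = {
    "title": "Apple Pie",
    "serves": 8,
    "ingredients": ["apples", "sugar", "flour", "butter"],
}


def card_to_json(card: dict) -> str:
    """Serialize the card as indented, human-readable JSON."""
    return json.dumps(card, indent=2)


def card_to_xml(card: dict) -> str:
    """Serialize the same card as XML, one element per field."""
    root = ET.Element("card")
    ET.SubElement(root, "title").text = card["title"]
    ET.SubElement(root, "serves").text = str(card["serves"])
    ingredients = ET.SubElement(root, "ingredients")
    for item in card["ingredients"]:
        ET.SubElement(ingredients, "ingredient").text = item
    return ET.tostring(root, encoding="unicode")


json_text = card_to_json(card)
xml_text = card_to_xml(card)
print(json_text)
print(xml_text)
```

Either output can be read by a person and parsed back by a program without a separate schema document, which is what "self-documenting" means in practice.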
In fact, these data formats are so powerful that Microsoft, starting with Office 2007, began storing all Office content in XML (that is why you now have a .docx instead of a .doc). Even things like photos and videos, which are not stored natively in XML, can have their metadata stored that way to make them easier to index.
Industries that have embraced this technology were forced into it by seminal events: 9/11 dictated that the intelligence community look at everything; healthcare reform forced companies to trade data in predictable ways for interoperability; the financial services industry had to look at all the supporting information for a trade in order to satisfy regulators.
Outside of those industries we are starting to wake up as well. I am talking to folks on a daily basis who need to grab hold of the information in their organization and use it in a more natural and business-friendly way.
This is what puts the *Big* in Big Data. The amount of relational data in most organizations is comparable in size to the amount of plastic building blocks you had as a percentage of your total toy collection growing up. As a nod to the size of the unstructured data collections being built by companies, some vendors have begun referring to them as “data lakes.” But storing the information is not enough; you have to be able to work with the information. Technology now exists to index this unstructured data and combine it with the traditional relational data.
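As a rough sketch of what "index and combine" can mean, the Python below builds a tiny word index over a few emails and uses it to look up matching rows in a relational table. Everything here is an assumption made for illustration: the `orders` table, the sample emails, and the helper names `build_index` and `orders_mentioning` are hypothetical, and a real product would use a far more capable index than a dictionary of words.

```python
import sqlite3
from collections import defaultdict

# Relational side: a hypothetical orders table held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Acme", 1200.0), (2, "Globex", 450.0)],
)

# Unstructured side: hypothetical emails, each tagged with the order it concerns.
emails = [
    {"order_id": 1, "body": "Shipment delayed, customer is upset about the delay"},
    {"order_id": 2, "body": "Customer happy with delivery"},
    {"order_id": 1, "body": "Offered discount to make up for delayed shipment"},
]


def build_index(docs):
    """Map each lowercased word to the positions of the documents containing it."""
    index = defaultdict(set)
    for pos, doc in enumerate(docs):
        for word in doc["body"].lower().split():
            index[word.strip(".,")].add(pos)
    return index


def orders_mentioning(word, index, docs, conn):
    """Return the relational order rows whose emails mention the given word."""
    order_ids = sorted({docs[pos]["order_id"] for pos in index.get(word.lower(), set())})
    if not order_ids:
        return []
    placeholders = ",".join("?" * len(order_ids))
    query = (
        "SELECT order_id, customer, total FROM orders "
        f"WHERE order_id IN ({placeholders}) ORDER BY order_id"
    )
    return conn.execute(query, order_ids).fetchall()


index = build_index(emails)
result = orders_mentioning("delayed", index, emails, conn)
print(result)
```

The order record alone says that Acme spent 1,200; the emails around it say the shipment was late and a discount was offered. Joining the two is what turns a raw record into a fuller picture of the transaction.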
Throughout our careers we have focused on the building blocks (the 20 percent of data in our organizations) because that is what technology allowed us to do. Now we have the ability to address the other 80 percent that went into creating that 20 percent. More importantly, addressing the 80 percent can be done for a fraction of what you spent (and continue to spend) on the original data.
No one is arguing that relational data is not important and won’t stay important, just that it is not the only thing we must focus on. I am sure that you can build an impressive and lifelike engine with those plastic building blocks. But long ago, some other people took drops of water and heated them to make steam, and made an engine that transformed the world. And it is about to happen again.
(Cool spaceship, by the way.)
In my next post I’ll tell you what makes up a great database — for all those drops of water.
John Biedebach
John Biedebach has 25 years of experience in data warehousing and business intelligence. John lives in Dallas, TX and manages pre-sales for the South region. In his spare time John works as a paramedic for Collin County EMS.