In the past few years, the number of organizations using Hadoop, or contemplating using it, has grown astronomically. These organizations share common questions about whether they are really ready to implement Hadoop, and what the best practices are for being successful.
For these reasons, TDWI developed an Online Hadoop Readiness Assessment and Guide to help organizations as they start working with Hadoop. TDWI is an organization that provides research and advice for everything data related. The Assessment they created is free and it provides a great way to analyze each dimension of readiness, including organizational readiness, Big Data readiness, data management readiness, analytics readiness, and IT readiness.
Example of how the TDWI Hadoop Assessment scores results
One of the initial challenges that people have when getting started with Hadoop is simply navigating the myriad of components that have popped up in recent years. I was at the Strata Hadoop conference in New York a month ago and based on what I saw, I can understand the confusion around Hadoop with all of the crazy names being advertised: Mahout, Ambari, Avro, Datafu, Oozie, Tez, Chukwa, Trafodion, etc.
A few of the more popular Hadoop projects shown here
The quickly changing landscape of the Hadoop ecosystem is what makes Hadoop planning ever more critical today. Hadoop is no longer just HDFS and MapReduce (MapReduce actually seems to be falling quite a bit in popularity), but a family of tools that all fall under the broad umbrella of Hadoop and are at various levels of maturity, ranging from “University lab side-project” to production use at large companies.
We need resources to navigate the growing complexity in the Hadoop ecosystem
Many of the customers we talk to are already using Hadoop, so the question comes up quite frequently: “Why do we need MarkLogic if we’re already using Hadoop?”
To put it simply, MarkLogic provides an enterprise-class, operational database and Hadoop does not. Hadoop has many benefits, but it currently lacks some enterprise features that organizations require for production environments (e.g., Hadoop does not have robust security, and it does not carry the necessary integrity constraints for ACID transactions).
Typically, customers rely on MarkLogic to provide a persistent, operational database for low-latency transactions, and they use Hadoop as a low-cost place to store data and do batch analytics. Integrating the two systems is quite easy because there is a MarkLogic Connector for Hadoop. There is also a lot of parity in how MarkLogic and Hadoop handle data, and both systems actually rely on MapReduce for loading data and doing analytics.
Customers such as KPMG, McGraw Financial, and a top investment bank have all found this division of labor between MarkLogic and Hadoop to work quite well. Below is a graphic that shows at a high level how these customers are using MarkLogic and Hadoop. Actual production systems vary greatly due to the number of different Hadoop components, but the general architectural pattern is shown here: MarkLogic is the database, and Hadoop provides a low-cost storage option for structured and unstructured data. More info on MarkLogic and Hadoop can be found here.
The MarkLogic Connector for Hadoop provides a seamless integration
So, with that introduction, we encourage you to try out the online TDWI Assessment Tool, download the Guide, and see whether your organization is ready for Hadoop.
Matt Allen is a VP of Product Marketing Manager responsible for marketing all the features and benefits of MarkLogic across all verticals. In this role, Matt interfaces with the product and engineering team and with sales and marketing to create content and events that educate and inspire adoption of the technology. Matt is based at MarkLogic headquarters in San Carlos, CA and in his free time he is an artist who specializes in large oil paintings.