Best Lil’ Operational Warehouse

At the start of this year we began a campaign to once again make data the center of the data center. We believe that your data should be available for whatever purpose you want, not locked up in purpose-built silos, such as one for BI analysis, another for operational systems, and so on. Our idea of a ‘Data Centered Data Center’ is a big step towards that: all your data in one place, delivered as it’s needed, for whatever purpose.

But for most organizations, that is hard. It is not that anyone ever wanted data to be pushed into a corner, but the relational DBMSes that organizations have relied on for over 30 years made it easier to put applications at the center of data centers. Ironic, right? Stacks of infrastructure were built to support single applications or siloed warehouses of data, because the reliance on a single, predefined schema made anything else darned near impossible.

But in recent years we have seen signs of change. Our customers have been building operational data warehouses, and succeeding at it. These systems are transactional and operational, powering websites, ecommerce and other applications, while also being used for analytics and for storing lots of data. And when I say lots, I mean 50TB or more: we have customers running systems with hundreds of terabytes, one of them larger than 500TB.

And yet Gartner, the keeper of such knowledge as “how many organizations are using an operational data warehouse,” says the number is remarkably small. Why are operational data warehouses such an anomaly? Simply put, because traditional databases weren’t built for them. You need a system that can scale out to handle massive loads, something most RDBMSes are not good at. But really, the biggest challenge is the dependence on a schema! By the time you had run ETL on all your data to get it into a single operational data warehouse that could store it and support analysis across multiple systems, you could kiss real-time goodbye.

[Figure: Gartner, Operational Data Warehouse]

Source: Critical Capabilities for Data Warehouse Database Management Systems, Roxane Edjlali and Mark A. Beyer, Gartner, 18 August 2014

That is why, in a recent Gartner report, Critical Capabilities for Data Warehouse Database Management Systems, MarkLogic came out ahead of all other vendors in the customer rating for the Operational Data Warehouse use case. We believe it is the trifecta of schema-agnosticism, scale-out and enterprise readiness that put us far ahead of traditional data warehouse systems from Teradata, Microsoft, Oracle, IBM and SAP.

MarkLogic was specifically built to solve the ETL problem. Okay, maybe that wasn’t really the problem our founder was trying to solve from day one, but in building a document database that was schema-agnostic and could scale out for massive workloads and large amounts of data, that is essentially what he built. (And when you think of documents, don’t just think of the literal meaning of a document: a row of data in a table can also be thought of as a document.) Because MarkLogic is schema-agnostic, you just load data into the database and it is indexed on ingestion. That means you get to skip the T, Transform, and just Extract and Load data continuously into the database. Since the data is indexed, you can search and query it immediately. As a scale-out system, MarkLogic lets you grow your system automatically and elastically on commodity hardware, so you can handle the demands put on the database when it is used both for operational systems AND for ad hoc queries, data mining, operational analytics, search, and longer-term storage and management of your data. This lets you skip the expensive, time-consuming and risky movement of data from this system, to that data mart, to yet another data warehouse. Not only do you get to manage the data reliably; you also get stronger governance and better use of your data.
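To make the “skip the T” idea concrete, here is a toy Python sketch. It is not MarkLogic’s implementation or API: `ToyDocumentStore`, its `load` and `query` methods, and the dotted-path query syntax are all hypothetical. It shows a store that accepts documents of any shape, indexes every path/value pair as each document is loaded, and can answer queries the moment data arrives, with no schema declared up front and no transform step.

```python
from collections import defaultdict
from typing import Any, Dict, Iterator, List, Set, Tuple

Path = Tuple[str, ...]


def _leaves(doc: Any, path: Path = ()) -> Iterator[Tuple[Path, Any]]:
    """Yield (path, value) pairs for every scalar leaf in a nested dict/list."""
    if isinstance(doc, dict):
        for key, child in doc.items():
            yield from _leaves(child, path + (key,))
    elif isinstance(doc, list):
        for child in doc:
            # Array elements share their parent's path, so a query on that
            # path matches any element of the array.
            yield from _leaves(child, path)
    else:
        yield path, doc


class ToyDocumentStore:
    """Hypothetical schema-agnostic store: no schema is declared up front."""

    def __init__(self) -> None:
        self.docs: Dict[str, Any] = {}
        # Inverted index: (path, value) -> ids of documents containing it.
        self.index: Dict[Tuple[Path, Any], Set[str]] = defaultdict(set)

    def load(self, doc_id: str, doc: Any) -> None:
        """Extract + Load: store the document as-is and index it on ingestion."""
        if doc_id in self.docs:
            # Re-loading a document: drop its old index entries first.
            for path, value in _leaves(self.docs[doc_id]):
                self.index[(path, value)].discard(doc_id)
        self.docs[doc_id] = doc
        for path, value in _leaves(doc):
            self.index[(path, value)].add(doc_id)

    def query(self, path: str, value: Any) -> List[str]:
        """Return sorted ids of documents whose dotted `path` holds `value`."""
        key = (tuple(path.split(".")), value)
        return sorted(self.index.get(key, set()))


# Two records with different shapes: an order document and a flattened table row.
store = ToyDocumentStore()
store.load("order-1", {"customer": {"id": "c42"}, "items": [{"sku": "A1"}, {"sku": "B2"}]})
store.load("row-7", {"customer": {"id": "c42"}, "total": 19.99})
print(store.query("customer.id", "c42"))  # ['order-1', 'row-7']
print(store.query("items.sku", "B2"))     # ['order-1']
```

The two documents never had to be mapped onto a shared schema before they could be queried together. A real database would also need full-text and range indexes and would have to keep the index transactionally consistent with the stored documents; this sketch only illustrates why indexing on ingestion lets you query data as soon as it is loaded.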

There’s been lots of talk about ‘data lakes’ and ‘data hubs’ in the upstart, open source database market, which might make you think an operational data warehouse is a possible use case for those products. But to be assured that a system is ready to power your mission-critical applications, you need a database with hardened, proven high availability and disaster recovery. This is where the new upstarts struggle. MarkLogic was built over a decade ago with the single intent of running in government or enterprise data centers, so ACID transactions, security, high availability and disaster recovery were built in from the start. Our customers have proven that our database can withstand the rigors and trials of heavy loads, operator errors and acts of God (and treachery, too) without losing data.

Can data, once again, be the center of the data center? Our customers answer with a resounding “yes.” They’re doing it. And when we tell prospects the story of MarkLogic, and how our modern database manages data in a way that is perhaps unexpected and completely different from traditional relational technology, they get it, too.

Want to learn how you can build your own Data Centered Data Center? Check out what our VP of Engineering, David Gorbet, had to say at this year’s MarkLogic World. He tells the story best.


All statements in this report attributable to Gartner represent MarkLogic’s interpretation of data, research opinion or viewpoints published as part of a syndicated subscription service by Gartner, Inc., and have not been reviewed by Gartner. Each Gartner publication speaks as of its original publication date (and not as of the date of this blog). The opinions expressed in Gartner publications are not representations of fact, and are subject to change without notice.