Recently, when asked what the “next big thing” was, Ann Winblad, the renowned venture capital investor, responded:

“Data is the new oil.”

I agree with Ann, but to get value from crude oil, it must be processed. That is often what is lost in the buzz surrounding the Volume, Velocity, and Variety (3Vs) attributes of Big Data requirements. The application of “complex workloads” is what turns the “crude oil” of Big Data into something consumable. After you have ingested data sources that may contain Big Data’s 3Vs, a suitable environment is required to process and leverage the data. Otherwise, all you have accomplished is the creation of a new form of long-term storage and another information silo. Addressing “complex workloads” allows Big Data to be integrated into a wider enterprise environment, whether operational or analytical.

This raises some interesting questions:

  • Should text analytics be performed with SQL or NoSQL?
  • How do you most economically utilize an EDW’s processing and storage environment?
  • Should time-based analysis be performed on a Hadoop platform or a NoSQL key/value platform?
  • What is the best way to utilize a graph data store?
  • Where should an analytical platform be used?

In each use case, a decision must be made as to which aspect of the Big Data environment is used to facilitate operational or analytical action. While Hadoop and other ingestion technologies are mastering the “science” of getting Big Data into a platform, deciding where “complex workloads” are allocated and performed will be the “art” of gaining value from Big Data initiatives.

EMA has defined this intersection of Big Data platforms as the Hybrid Data Ecosystem.

The EMA Hybrid Data Ecosystem (see below) includes the following components: Operational Systems; Enterprise Data Warehouses (EDW) and Data Marts (DM); Analytical platforms (ADBMS); Hadoop, Key/Value, Graph data stores (NoSQL); and Cloud-based sources.
