These days, Big Data technologies seem to be developing so fast that they are causing downstream effects we may not have seen coming. It almost seems as if somebody passionate about Big Data found Aladdin’s Wonderful Lamp, rubbed it, and asked the Genie for “Multiple Stacks of Technologies,” because new products seem to be released constantly. There are so many choices that many find it overwhelming. Some have a great Big Data use case but have decided against starting on it because they’re too busy trying to figure out what platform to use.

It reminds me of my childhood days when gaming consoles came with one or two games – my friends and I played them constantly. Then we were introduced to the cartridges that were loaded with 60-70 games. We spent as much time exploring what games we could play as we did actually playing them. This is what may be happening with Big Data.

Yesterday was Presto.  Today is Kylin.  And tomorrow?

Yesterday, it was Presto – an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes. Presto was originally contributed by Facebook and has since been adopted by Teradata.

Today, it is Apache Kylin™ – also an open source distributed analytics engine, but one designed to provide an SQL interface and multi-dimensional analysis (OLAP) on Hadoop for extremely large datasets. Kylin was originally contributed by eBay.

What’s it going to be tomorrow?

The question to consider is this: when several technologies do similar things, how do we know which one will prevail? Think back to the Betamax and VHS war in the 1970s – many consumers chose Beta because it was Sony’s, and how could you lose with Sony? But the market eventually embraced VHS, and people who bet on Beta soon had a 30-pound paperweight in their entertainment center.

Fast forward to today: there is a huge Big Data ecosystem that includes tools such as Spark, Kafka, Tez, HBase, Cassandra, Drill, Impala, Pig, Storm and a ton of others, including third-party providers. Each is designed to solve or enhance something specific. The problem is that, like VHS and Beta, many of them have similar capabilities. Spark and Storm, for example, both perform real-time computation and large-scale data processing. Each has its pros and cons, but how do we know which one deserves our time, money, and effort? Will one be the next Beta? If that happens, will those who made the wrong choice have to start all over again?

Is uncertainty holding you back?

Uncertainty may be the biggest reason Big Data has not yet seen widespread adoption. Remember when the mobile phone market was in its infancy? Consolidation soon followed. Consumers no longer view Android or iPhone as a risk, and 96% of them now choose one or the other.

The market appears to be responding with a trend. Many companies – perhaps still too many – have adopted bits and pieces of the Big Data ecosystem to build their own products. Even the “Big Boys” are developing a streamlined strategy for this ever-growing stack of technologies. If history repeats itself, we may find the market settling on a subset of Big Data platforms. If that happens, we may see Big Data adoption rates explode.

So if you happen to stumble across Aladdin’s Wonderful Lamp and are granted a wish from the Genie, would you consider asking him for a consolidation of these Big Data technologies?