Make Data Great Again

By Krishna Kallakuri - August 31, 2016

Have you ever heard or seen a phrase or title like this in your entire career? We assume the answer is a big ‘NO’.

We were at one of the major big data events in California a month ago, and I personally was not very happy with it for many reasons. But one thing quickly grabbed my attention and put a big smile on my face: a guy walking around in a cap that said “Make Data Great Again”.

I’m sure you have heard something similar in the current political atmosphere, but this phrase got me thinking seriously. The real question is: “When and how did any enterprise make data great in the first place, so that it can be made great again?”

Let us be pragmatic in answering this question. If we rewind a decade, the computing world had neither the resources nor the technology to make data great!

But in the modern world it is a completely different story, and we have a variety of data architectures that can be leveraged to achieve this feat. So how many enterprises have taken full advantage of them, and what is the status? Below are a few scenarios that we come across all the time in the effort to make data great.

Scenario 1: Our Enterprise is becoming Data Driven.

“We have now adopted Hadoop and are creating a data lake to gain insights from the data. Hadoop is the panacea for all applications and we need a couple of Hadoop developers who are experts in the following skills: Sqoop, Flume, MapReduce, Solr, HBase, Spark, AWS, etc.”

And then, if we fast forward six months to a year to check the status, my favorite answer is: “MapReduce is slow and our data warehouse does not perform on Hive. So we have decided not to pursue the efforts at this time.” Ooh! What a way to Make Data Great Again!

Scenario 2: We have recently hired a Data Scientist and he will deliver Insights from Data.

This is another great scenario, and the best one. So what is the real definition of a Data Scientist? Here is my favorite joke from Twitter:

“A data scientist is a data analyst who lives in California.”

Let us cut to the chase and look at some real-world definitions from the internet.

  1. A data scientist is a person who excels at analyzing data, particularly large amounts of data, to help a business gain a competitive edge.
  2. Data scientists are “analytically-minded”, statistically and mathematically sophisticated data engineers who can infer insights into business and other complex systems out of large quantities of data.
  3. A data scientist represents an evolution from the business or data analyst role. The formal training is similar, with a solid foundation typically in computer science and applications, modeling, statistics, analytics and math.

If this is the case, why can’t we find them? We don’t think they are on Mars. In our opinion this is not a new skill set, and data scientists have been living amongst us for ages. It is just that the models they developed never came to life or were never operationalized because of limited computing power. How many times have we heard that the SAS job has been running for the last 5-6 days?

Now we have all the resources, and yet the question still remains unanswered. The reason is simple: many enterprises that adopted these modern architectures are still operating with a traditional mindset.

Let us be realistic! Can a business wait 6-7 months to run a marketing campaign based on the outcome of a predictive model? That time to delivery is not acceptable, and it comes from traditional methods that cannot scale, either in the time they take to run or in the training of models.

For example: when we know that base R runs on a single thread by default, keeps its data in memory, and often fails on data sets in the 500-600 GB range, what is the point in taking this approach? And did we ever consider converting SAS and SPSS models to PySpark, Spark ML, etc., which can take full advantage of distributed computing?
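To make the distributed computing point concrete, here is a minimal sketch in plain Python of the data-parallel pattern behind this kind of model training: each partition of the data computes its own piece of the gradient, and the pieces are summed before the weights are updated. Everything in it is our own illustration and an assumption, not code from any real project: the function names, the toy campaign data, and the thread pool that stands in for cluster workers. It is not Spark code, and a real conversion would lean on Spark ML rather than hand-written loops.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def partition_gradient(args):
    """Map step: log-loss gradient summed over the rows of one partition."""
    weights, rows = args
    grad = [0.0] * len(weights)
    for features, label in rows:
        pred = sigmoid(sum(w * x for w, x in zip(weights, features)))
        for j, x in enumerate(features):
            grad[j] += (pred - label) * x
    return grad


def train(partitions, n_features, lr=0.5, epochs=200):
    """Gradient descent for logistic regression over partitioned data."""
    weights = [0.0] * n_features
    n_rows = sum(len(p) for p in partitions)
    # The thread pool is only a stand-in for workers on a cluster.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        for _ in range(epochs):
            # Each worker sees only its own partition of the rows.
            grads = pool.map(partition_gradient,
                             [(weights, p) for p in partitions])
            # Reduce step: add the partial gradients together.
            total = [sum(column) for column in zip(*grads)]
            weights = [w - lr * g / n_rows for w, g in zip(weights, total)]
    return weights


def score(weights, features):
    """Predicted probability of a response for one row."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)))


# Hypothetical toy data: (bias term, scaled days since last purchase) -> responded
partitions = [
    [((1.0, 0.1), 1), ((1.0, 0.3), 1)],
    [((1.0, 0.7), 0), ((1.0, 0.9), 0)],
    [((1.0, 0.2), 1), ((1.0, 0.8), 0)],
]
weights = train(partitions, n_features=2)
print(score(weights, (1.0, 0.1)), score(weights, (1.0, 0.9)))
```

Because the partial gradients simply add up, splitting the rows across more workers does not change the model that comes out, only how long each pass takes. That is the property that lets this style of training scale out where a single-threaded, in-memory job cannot.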

We now live in a world where these models can be trained and scored in real time. It is evident that enterprises have been lost in a maze trying to connect the dots with technology, or have not put the technology to the right use. And of course, in case you were wondering, the business context is completely out of the picture.

So what needs to change to Make Data Great?

  • Get away from the traditional mindset
  • Avoid ambiguous usage of technology and adopt the business context
  • Build an appetite to learn and explore new possibilities
  • Start with a couple of use cases whose business value can be quantified, instead of focusing the effort solely on building a Data Lake

What do you say, my friends? Can we Make Data Great using these simple principles?

We look forward to your comments.