
Big Data

I remember in the early 90s trying to explain to a business leader, who was a smart woman, how adding memory to our PCs was going to solve the performance problems with our newly developed client/server application, which she had just spent over $1M developing. No matter how I explained it, she wasn’t understanding, and finally said with frustration, “Salter, tell me like I’m a kindergartner!” I quickly thought of an analogy:

“Picture memory as your desk space and the hard drive as a file cabinet down the hall. Every time you need data from the papers on your desk you just look down, but when you need data from the papers in the file cabinet you have to walk down the hall, find the paper, walk back to your desk, then look at the paper. Adding memory is like making the desk space larger.”

She looked at me, nodded, and said: “Order the memory – I don’t care what it costs.”

Right then I realized there is brilliance in simplicity. It wasn’t her fault she didn’t understand what I was saying, it was mine because I wasn’t explaining it in a way she understood. Those of us with a technical background do this all too often – we get down in the weeds and confuse people. We lose the forest for the trees.

“If you can’t explain it to a six-year-old, you don’t understand it yourself.” (often attributed to Albert Einstein)

The purpose of this article is to explain 3 things about big data that I think everyone should know. One, to understand what it is – well enough to be able to explain it to a six-year-old. Two, to understand how technology has enabled it. Three, to understand conceptually why it’s important for business. First, allow me to provide some insight into data as it relates to the world today:

90% of the world’s data was created in the last 2 years

60% per year is the average growth rate of data in business

0.5% of data generated is analyzed – and that number is shrinking

[Sources: IBM and Business Insider]

I’ve heard lots and lots of definitions for Big Data. Here’s how I would explain it to a six year old:

First, let’s make sure we understand what data is. It’s really anything that can be stored on a computer or even a smart phone. A number, picture, video, name, address, computer game, document, list of what you bought from a store or iTunes, what people put into Facebook, a car’s GPS, anything like that. Big Data is a massive amount of a variety of data.

Putting that into business context, companies have had lots of data for decades. Typically, they store years of data from business systems like accounts payable, payroll, benefits, CRM, ERP, etc. Most of the time, that type of data can be stored in rows and columns. This type of data is called structured data.

What makes Big Data different is that it also includes unstructured data, which is data that cannot be stored in traditional rows and columns. We’re talking about data such as video, audio, pictures, social media, weather, etc. Traditional hardware and software platforms were not built to handle the volume, variety or velocity (aka the 3 Vs of data) of Big Data.

Enter Hadoop. Hadoop is often credited as the technology that really enabled Big Data. Prior to the release of Hadoop, there was no widely adopted way to store, ingest or process data at the scale we are talking about with Big Data. Hadoop, which was named after the stuffed toy elephant of the founder’s son, is an open source software platform for storing and processing data at that scale. [Open source simply means the source code is freely available for anyone to use and modify, and it is developed by a community of people and companies. The free version doesn’t come with vendor support, although some companies sell supported distributions.] Here are 3 key things you should know about Hadoop.

  1. Stores huge files. Before Hadoop, storing massive files was a problem because most file systems could not handle them. Now, the Hadoop Distributed File System (HDFS) enables that.
  2. Processes huge files. Storage alone isn’t enough: without a way to open and process massive files, HDFS would be of little use. Hadoop provides that processing through a model called MapReduce.
  3. Distributed. Running Hadoop on a single server would be pointless because that one machine’s hardware would become the bottleneck. Hadoop allows you to “distribute” the load to as many servers as you want, whether it’s 5, 50, 500 or 5,000. And it can run on commodity hardware, which means virtually any computer with a processor, memory and storage.
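For readers who want to see the idea rather than just hear it, here is a minimal sketch in Python of the pattern behind those three points: cut a big input into blocks, let several workers each count words in their own block, then merge the partial answers. This is not Hadoop code and does not use any Hadoop API. The function names are made up for illustration, threads on one machine stand in for separate servers, and a four-line string stands in for a massive file.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(text, block_size):
    """Cut one input into blocks of `block_size` lines each.

    Illustrative stand-in for how a distributed file system
    spreads pieces of a large file across many machines.
    """
    lines = text.splitlines()
    return [lines[i:i + block_size] for i in range(0, len(lines), block_size)]


def map_block(block):
    """'Map' step: a worker counts words in its own block only."""
    counts = Counter()
    for line in block:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partial_counts):
    """'Reduce' step: merge every worker's partial counts into one total."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


def distributed_word_count(text, block_size=2, workers=3):
    """Run the map step in parallel, then reduce the results."""
    blocks = split_into_blocks(text, block_size)
    # Threads are an assumption standing in for separate servers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(map_block, blocks))
    return reduce_counts(partial_counts)


if __name__ == "__main__":
    sample = (
        "big data is big\n"
        "data is everywhere\n"
        "big ideas need data\n"
        "simple is better"
    )
    for word, count in sorted(distributed_word_count(sample).items()):
        print(word, count)
```

The point of the design is that no single worker ever needs to see the whole input. Adding more workers (or, in a real cluster, more servers) lets the same approach handle more data.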

Lastly, let’s discuss why business, or anybody for that matter, should care about Big Data. Simply put, it can help every area of an organization function better. It can help businesses solve problems, create opportunities, better understand customers and predict what they want and need – even before they ask for it. It can help reduce costs and increase revenue, better leverage marketing dollars and streamline operations. Big Data and the Analytics it enables truly help organizations find gold in their data. Here are some examples:

  • Package delivery companies ingest route, traffic and driving data to better plan routes to save millions on fuel and reduce emissions.
  • Retail companies are segmenting customers and predicting what they want and when they’ll want it. You may have heard about a controversy one large retailer created by predicting pregnancies and targeting marketing to those households – one dad wasn’t too happy when his daughter was receiving mail like that.
  • Commercial airliners have thousands of sensors that constantly generate huge amounts of data, and we should all be thankful because those companies use it to predict equipment failures before they happen.
  • Energy companies are using smart meters in homes and buildings to read energy usage as it’s happening, then integrating it with their systems to predict usage and manage load distribution.
  • Health care companies are doing all sorts of things that are saving lives and reducing costs, like real-time heart monitoring of chronic patients that enables proactive cardiologist care.

These are all things companies are doing now with Big Data. And the list goes on and on.

So if a six year old asks you what Big Data is, how technology has enabled it, and why anybody cares, you’re now equipped to tell them!

Jim Salter, DataFactZ