"We have lots of data – now what?"

(How can we unlock real value from our data?)

We live in the age of the data big bang. Driven by the internet economy, mobile phones, cheaper hardware and the Internet of Things (IoT), we are all, largely unconsciously, touched by data every day: when we search on Google, shop on Amazon or eBay, monitor our health, buy car insurance, and much more.

Over the past few years, there’s been a lot of hype in the media about “data science” and “big data.” A reasonable first reaction to all of this might be some combination of skepticism and confusion.

A New Field Emerges

Data science is a multidisciplinary blend of data inference, algorithm development, and technology, used to solve analytically complex problems.

There is significant and growing demand for data-savvy professionals in businesses, public agencies, and nonprofits. The supply of professionals who can work effectively with data at scale is limited, and this scarcity is reflected in rapidly rising salaries for data engineers, data scientists, statisticians, and data analysts.

The Current Landscape (with a Little History)

So, what is data science? Is it new, or is it just statistics or analytics rebranded? Is it real, or is it pure hype? And if it’s new and if it’s real, what does that mean?

This is an ongoing discussion, but one way to understand what’s going on in this industry is to look online at the discussions currently taking place. This doesn’t necessarily tell us what data science is, but it at least tells us what other people think it is, or how they perceive it. For example, on Quora there’s a discussion from 2010 about “What is Data Science?”, and here’s Metamarkets CEO Mike Driscoll’s answer:

Data science, as it’s practiced, is a blend of Red-Bull-fueled hacking and espresso-inspired statistics.

But data science is not merely hacking—because when hackers finish debugging their Bash one-liners and Pig scripts, few of them care about non-Euclidean distance metrics.

And data science is not merely statistics, because when statisticians finish theorizing the perfect model, few could read a tab-delimited file into R if their job depended on it.

Data science is the civil engineering of data. Its acolytes possess a practical knowledge of tools and materials, coupled with a theoretical understanding of what’s possible.

How to Do Data Science

The three components involved in data science are organizing, packaging and delivering data (the OPD of data). Organizing is where the physical location and structure of the data are planned and put in place. Packaging is where the prototypes are built, the statistical analysis is performed and the visualizations are created. Delivering is where the story gets told and the value is realized. However, what separates data scientists from other existing roles is that they also need a continual awareness of the What, How, Who and Why.
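As a rough illustration only, the three OPD stages can be pictured as a tiny pipeline of three Python functions. Everything concrete here is a hypothetical assumption for the sketch, not a prescribed method: the function names `organize`, `package` and `deliver`, the small tab-delimited sample data, and the choice of "mean sales per region" as the statistic. Real projects would involve databases, modeling and visualization tools at each stage.

```python
import csv
import io
import statistics

# Hypothetical tab-delimited sample data; in practice this would come
# from a file, a database or an API.
RAW_DATA = """region\tsales
north\t120
south\t95
north\t130
south\t105
"""


def organize(raw_text):
    """Organize: parse raw tab-delimited text into typed records."""
    reader = csv.DictReader(io.StringIO(raw_text), delimiter="\t")
    return [{"region": row["region"], "sales": float(row["sales"])} for row in reader]


def package(records):
    """Package: compute a simple statistic (mean sales) for each region."""
    by_region = {}
    for record in records:
        by_region.setdefault(record["region"], []).append(record["sales"])
    return {region: statistics.mean(values) for region, values in by_region.items()}


def deliver(summary):
    """Deliver: turn the numbers into a short, readable message."""
    best = max(summary, key=summary.get)
    lines = [f"{region}: mean sales {mean:.1f}" for region, mean in sorted(summary.items())]
    lines.append(f"Highest average sales: {best}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(deliver(package(organize(RAW_DATA))))
```

Running the script prints one line per region followed by the region with the highest average. Keeping each stage as a separate function mirrors the idea that organizing, packaging and delivering are distinct activities, each of which can be replaced by heavier tooling without touching the others.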