There are two branches of analytics that are used widely to describe ways we can summarise large amounts of data. The first way simply describes the data. We call this descriptive analytics.
Descriptive analytics has a wonderful use. It helps us summarise the data into readable chunks so we can understand what happened in the big pile of messy data. With descriptive analytics, we can begin to make decisions because we can analyse what we know.
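As a rough illustration of what "summarising into readable chunks" can look like, here is a minimal Python sketch. The daily_orders figures and the describe helper are made up for this example and are not taken from any real data set.

```python
import statistics

# Hypothetical daily order counts for one week (made up for illustration).
daily_orders = [12, 15, 11, 30, 14, 13, 17]

def describe(values):
    """Collapse a list of numbers into a handful of summary figures."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }

summary = describe(daily_orders)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Notice that the mean sits above the median here because one unusually busy day pulls it up. That is the kind of thing a summary lets you spot without reading every row.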
But what happens when we want to know something we haven’t seen yet, or cannot ask directly because gathering that data would be impossible or too costly? Questions like…
- Will my product sales be higher next year than they are this year?
- What kind of product will my customers buy tomorrow?
- What is the best way to respond to a brand crisis?
- How much should I spend on advertising next year?
The second form of analytics is much more interesting and, in my opinion, more useful. It is called predictive analytics.
By definition, we’re inferring something we do not know (our prediction) from something we know (data we’ve collected). We turn to predictive analytics when the cost of acquiring some information directly is too high.
Either the question is impossible to ask or it would take us too long to acquire that information. With predictive analytics, you can make better decisions because you have a model that helps you understand something you didn’t know before.
To be clear, predictive analytics cannot reduce all risk. It’s impossible to know what WILL happen. The goal of predictive analytics is to understand what MIGHT happen and all of the caveats that went into the analysis.
Let’s talk about the two types of predictive analytics.
Extrapolate — Time Series Forecasting
This version of predictive analytics is relatively straightforward. Most decision makers are familiar with it and how to use it. In time series forecasting, we want to know what may happen in the future given the trends of the past.
This trend line, which is typically a time series model, summarises the past trajectory of the data. The magic happens when you take that model summary and use it to extrapolate into future time where data doesn’t exist.
In the case of extrapolation, the reason we do not have data is that it doesn’t exist yet! We cannot measure future events (at least not yet, that we know of).
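To make the idea of a trend line concrete, here is a minimal Python sketch that fits a straight line to past values and extends it into periods with no data. It assumes the simplest possible model (a linear trend fitted by least squares). Real time series models usually also handle seasonality and noise. The sales figures and the function names fit_trend and forecast are hypothetical.

```python
# Hypothetical monthly sales figures (made up for illustration).
sales = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]

def fit_trend(values):
    """Fit y = intercept + slope * t by least squares, with t = 0, 1, 2, ..."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = y_mean - slope * t_mean
    return intercept, slope

def forecast(values, steps_ahead):
    """Extend the fitted trend line past the last observed period."""
    intercept, slope = fit_trend(values)
    n = len(values)
    return [intercept + slope * (n + k) for k in range(steps_ahead)]

next_three = forecast(sales, 3)
print(next_three)
```

The forecast is only as good as the assumption that the past trend continues, which is exactly the caveat a decision maker should keep in mind.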
The idea of non-temporal predictive analytics is that the specific model is irrelevant. We could build a model in many different ways. The main idea is to make sure you have a way to understand its accuracy. We understood the accuracy of the model by splitting our training data set and holding some of it back, then checking the model’s predictions against that held-back data, which it hadn’t seen yet.
Anyone can model anything based on what they think they know about the world. Successful predictive models are those that remain accurate on data they were not trained with. A model that falls apart outside of its training data set is completely useless or, worse, will give you inaccurate predictions that you may act on.
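Here is a minimal Python sketch of that accuracy check: hold back part of the data, fit on the rest, and compare the error on both parts. The (ad_spend, sales) pairs are made up, the straight-line model is just one arbitrary choice, and the helper names are hypothetical. Any model could be swapped in.

```python
import random

# Hypothetical (ad_spend, sales) pairs, made up for illustration.
data = [(x, 3.0 * x + 5.0 + noise)
        for x, noise in zip(range(1, 21), [0.5, -0.5] * 10)]

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and hold back a fraction the model never sees."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]

def fit_line(rows):
    """Fit y = intercept + slope * x by least squares."""
    n = len(rows)
    x_mean = sum(x for x, _ in rows) / n
    y_mean = sum(y for _, y in rows) / n
    cov = sum((x - x_mean) * (y - y_mean) for x, y in rows)
    var = sum((x - x_mean) ** 2 for x, _ in rows)
    slope = cov / var
    return y_mean - slope * x_mean, slope

def mean_absolute_error(rows, intercept, slope):
    """Average size of the prediction error over the given rows."""
    return sum(abs(y - (intercept + slope * x)) for x, y in rows) / len(rows)

train, test = train_test_split(data)
intercept, slope = fit_line(train)
train_mae = mean_absolute_error(train, intercept, slope)
test_mae = mean_absolute_error(test, intercept, slope)
print(f"train error: {train_mae:.2f}, held-out error: {test_mae:.2f}")
```

If the held-out error is much larger than the training error, the model has likely memorised its training data rather than learned something that generalises.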
In the case of non-temporal predictive analytics, the reason we do not know something is that the data is really difficult or impossible to ask for. Could you imagine if Amazon had to ask you what products you like every time you visited their website? Non-temporal predictive analytics to the rescue!
Predictive analytics, then, means using data we have to understand what MIGHT happen in the future. We also know how it differs from descriptive analytics, which uses data to summarise what happened in the past.
The main advantage of having a predictive model is that we can extend it to areas where data doesn’t exist and make predictions there. This is extremely useful because knowing everything, past, present and future, is impossible, at least in the real world.
Consider knowing which products each individual consumer is most likely to buy next, and clinching that sale by making sure the product is offered to them. That feat is impossible without predictive analytics.