0:05

All right, in this session we're going to dig a little bit deeper into the idea of

forecasting.

When I say dig deeper, I actually mean, you know, we're going to drill down and

look at predicting behavior when it comes to individual customers.

And that's what customer analytics is about.

So the promise of marketing is I'm going to be able to deliver the right

message to the right customer.

Well, we want to be able to influence the behavior of individual customers.

Which customer gets a particular coupon,

the depth of the coupon, which customer do I send which message to?

In order to answer those questions we really have to have an understanding

of not just the average behavior of customers, but

the behavior of individual customers.

And that's what we're going to be able to look at with some

simple models that can be built within Excel.

All right,

so where's this idea of customer centric analytics going to come into play?

Well, the behavior of your existing customers,

behavior of prospective customers.

How are they different from each other?

What's the right message to send them?

When is the right time to send that out?

If we need to know what products we should launch, well, that requires

an understanding of the customer segments that are in the marketplace.

Not just the average customer, but how many different types of

customers are there, how many customers are there of each type?

If I want to look at effectiveness when it comes to promotional activity,

promotions aren't going to be equally effective for everyone.

On some customers, they're going to have absolutely no impact;

on other customers, they're going to be very effective.

And ultimately, if we're looking to make allocation decisions,

if I have a fixed marketing budget,

how do I allocate resources across all of my customers and prospective customers?

1:50

So if we look at the toolbox that's available to us,

the statistical methods that we have,

the right model is really going to depend on the type of data that we're looking at.

We've got truly four different types; I've included continuous data in this set.

But what kind of customer level data might we have?

Well, we're going to have a lot of choice data.

I'm choosing between Coca-Cola and Pepsi.

I'm choosing between AT&T, Verizon, and Sprint.

I'm making the choice in that context.

We might have count data.

How much am I going to purchase?

How much quantity am I going to purchase on a particular shopping occasion?

We might look at timing data.

2:34

When do I become a customer?

How long do I stay as a customer?

How long is it between visits to my favorite website?

How long is it between purchases?

All of those are timing or duration observations.

And if I were to combine multiple pieces of data, it might be,

let's say, choice and count data.

Well, I choose a brand, and then how much of it do I buy?

Well, that would be multivariate data.

Or, I have interpurchase times, and then how much do I buy?

Combining count and timing data.

Well, those are the different types of data sources that we might encounter, and

all of those are going to have different methods associated with dealing with them.

In terms of marketing we might be interested in ROI,

understanding marketing effectiveness.

We might be interested in understanding clickstream behavior of customers

online and targeting advertisements at them.

If you're looking at loyalty programs or

social media activity, all of this is producing individual level data.

It is not just being produced at the level of how many total purchases do we have,

but it's who are the individuals conducting these purchases?

And as we have access to those individual histories,

that's what's going to allow us to make those individual forecasts,

to conduct customer evaluation at the level of the individual customer.

3:56

So, let me give you a brief refresher from what we talked about last class

with our forecasting models and building out regression models.

We came up with a prediction; our Y variable is our prediction.

So our prediction Y is based on a set of predictors:

X1, marketing activity 1, 2, however much marketing activity we have.

4:23

And that's our best guess, but we don't always observe our best guess, and

that's where the error term comes in.

Well, the error term is the difference here, epsilon, between what I predicted,

Y hat, and what I actually observed.

And we make the assumption when we're running regression models,

specifically, when we're running linear regression models,

that that error term follows a normal distribution.

That error follows that bell-shaped normal distribution.

4:53

Another way that we could look at this would be to say my

observation Y itself follows a normal distribution

with a particular expectation mu, and that's my best guess, mu.

And that's what's giving me my regression equation.

That's where my X variables come into play.
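
Written out, the two views just described look roughly like the block below. The coefficient names (beta_0 through beta_k) and the error variance sigma^2 are notation assumed here for illustration; the transcript itself only names Y, Y hat, the X variables, epsilon, and mu.

```latex
% Regression view: the observation is the prediction plus a normal error
Y = \hat{Y} + \varepsilon, \qquad
\hat{Y} = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k, \qquad
\varepsilon \sim \mathcal{N}(0, \sigma^2)

% Equivalent distributional view: the observation itself is normal around mu
Y \sim \mathcal{N}(\mu, \sigma^2), \qquad
\mu = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k
```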

Well, that's assuming that everything follows a normal distribution.

When we are dealing with customer level data, the normal distribution isn't

necessarily going to be appropriate if I'm dealing with choice data or

count data or duration or timing data.

Using that normal distribution just doesn't make sense because that's not what

the data ultimately looks like.

5:37

But just like when we're conducting linear regression,

we're going to start by building up a model based on the data that we observe.

And we're going to refer to this as the likelihood function.

So suppose I have N different observations, y1 through yN,

and I make the assumption that they all follow a particular distribution.

Now when we're doing linear regression,

we assume that they follow a normal distribution.

6:04

But what we're going to ultimately look at is when we estimate parameters,

the question we're asking is, how likely, under a given set of values for

those parameters, am I to observe the set of data that I did?

And what we try to do is make that likelihood, make that probability, as big

as possible, so we're going to maximize the likelihood of observing our data.

That is, we're going to choose the values for our coefficients or

our parameters that make observing the data as likely as possible.

All right, so if our likelihood function is a product of the likelihoods

associated with each observation, we're going to choose the parameters, theta,

that make observing the actual data,

y1 through yN, as likely as possible.
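
In symbols, the idea can be sketched as follows. The density name f is notation assumed here, and the product form assumes the observations are independent of each other, which is what "a product of the likelihoods associated with each observation" implies.

```latex
% Likelihood of N independent observations under a density f with parameters theta
L(\theta) = \prod_{i=1}^{N} f(y_i \mid \theta)

% Maximum likelihood estimate: the parameter values that make the data most likely
\hat{\theta} = \arg\max_{\theta} \; L(\theta)
```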

All right, so let me just give you this example using the normal distribution to

give you a sense that we've been doing this all along, we just didn't know it.

So linear regression is actually maximum likelihood estimation.

So, if we had a set of data x1 through xN that we said

came from a normal distribution.

7:17

Well, that is the likelihood expression that goes along with N data

points from a normal distribution.
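
The slide being pointed to is not part of this transcript. For N independent draws x_1 through x_N from a normal distribution with mean mu and variance sigma^2 (the symbol sigma^2 is assumed here), the standard likelihood expression is:

```latex
L(\mu, \sigma^2) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}}
  \exp\!\left( -\frac{(x_i - \mu)^2}{2\sigma^2} \right)
```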

And if I want to find the value of mu to make that as likely as possible,

well, I'm going to take the derivative and set the derivative equal to 0.

And so now,

from a computational standpoint, rather than trying to maximize the likelihood,

which is going to, in most cases, end up being a very,

very tiny probability that computer programs can't distinguish from 0,

well, we're going to employ a mathematical trick here.

Rather than maximizing the likelihood,

we're ultimately going to maximize the logarithm of the likelihood.

It turns out it's not going to change our results;

it just changes the scale that we're working on.
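
A minimal sketch of why the log helps, written in Python rather than Excel purely for illustration. The data are simulated, and the helper name `normal_pdf` is made up for this example. Because the logarithm is an increasing function, the parameter values that maximize the log-likelihood are the same ones that maximize the likelihood.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


random.seed(0)  # fixed seed so the run is repeatable
data = [random.gauss(0.0, 1.0) for _ in range(2000)]  # simulated, not real customer data

# Raw likelihood: a product of 2000 densities, each of them below 0.4
likelihood = 1.0
for x in data:
    likelihood *= normal_pdf(x, 0.0, 1.0)

# Log-likelihood: the sum of the logs of the same densities
log_likelihood = sum(math.log(normal_pdf(x, 0.0, 1.0)) for x in data)

print(likelihood)      # underflows to 0.0 in double precision
print(log_likelihood)  # a finite negative number
```

The product is too small to be represented as a floating-point number, while the sum of logs stays on a scale the computer can work with.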

So if I take the log of the likelihood function,

this is the equation that I have.
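
The equation on the slide is not reproduced in this transcript. For N independent draws x_1 through x_N from a normal distribution with mean mu and variance sigma^2 (sigma^2 is notation assumed here), the standard log-likelihood is:

```latex
\log L(\mu, \sigma^2)
  = -\frac{N}{2} \log\!\left( 2\pi\sigma^2 \right)
    - \frac{1}{2\sigma^2} \sum_{i=1}^{N} (x_i - \mu)^2
```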

Now let's take our first derivative and

find the value of mu that's going to maximize that log likelihood.

Well, taking our first derivative, this is our first derivative

that we're setting equal to 0.

And now we're just solving for the value of mu.

And what we end up with, really your maximum likelihood estimate for

the mean, is the sample average.
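
The steps can be sketched as follows, assuming the usual normal log-likelihood, whose only term involving mu is the sum of squared deviations divided by 2 sigma^2 (sigma^2 and the bar notation for the sample average are assumed here):

```latex
\frac{\partial \log L}{\partial \mu}
  = \frac{1}{\sigma^2} \sum_{i=1}^{N} (x_i - \mu) = 0
\quad\Longrightarrow\quad
\sum_{i=1}^{N} x_i = N\mu
\quad\Longrightarrow\quad
\hat{\mu} = \frac{1}{N} \sum_{i=1}^{N} x_i = \bar{x}
```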

So, if you've got data that you believe follows the normal distribution,

your maximum likelihood estimate is no different

from just taking the sample average.

So, in a lot of cases what we think intuitively is going to line up

with that maximum likelihood estimate.
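
A quick numerical check of that claim, as a sketch in Python rather than Excel. The data are simulated, the function names are made up for this example, and sigma is held fixed at 1 as a simplifying assumption; the value of mu that maximizes the log-likelihood does not depend on sigma.

```python
import math
import random


def normal_log_likelihood(mu, data, sigma=1.0):
    """Log-likelihood of the data under a normal with mean mu and std dev sigma."""
    n = len(data)
    squared_error = sum((x - mu) ** 2 for x in data)
    return (-0.5 * n * math.log(2.0 * math.pi * sigma ** 2)
            - squared_error / (2.0 * sigma ** 2))


def maximize_over_mu(data, tol=1e-6):
    """Ternary search for the mu that maximizes the (concave) log-likelihood."""
    lo, hi = min(data), max(data)
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if normal_log_likelihood(m1, data) < normal_log_likelihood(m2, data):
            lo = m1  # the maximum lies to the right of m1
        else:
            hi = m2  # the maximum lies to the left of m2
    return (lo + hi) / 2.0


random.seed(1)  # fixed seed so the run is repeatable
data = [random.gauss(50.0, 10.0) for _ in range(500)]  # simulated values

mu_hat = maximize_over_mu(data)
sample_mean = sum(data) / len(data)
print(mu_hat, sample_mean)  # the two numbers agree closely
```

The search never uses the formula for the average; it only asks which mu makes the data most likely, and it lands on the sample average anyway.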

8:58

All right, so let's drill down.

And again, we're going to focus today on two types of data very common

within marketing: choice data and timing data.

Now, within Excel we're going to talk a lot about binary choices, yes-no outcomes.

The technique that we're going to be using

is going to generalize to those multinomial options,

when I have three, four, five different options and

I'm picking one among that set of options.

So we'll look at the techniques for choice decisions.

We'll also look at models that are specific to duration data.

9:34

So, choice data, very common throughout marketing.

I've put together here some examples of choices that customers might face.

Do you buy a particular category on a shopping trip?

Do you buy a particular brand on a shopping trip, yes or no?

Did you acquire service, yes or no?

Did you keep service, yes or no?

Did you decide to file a complaint with the company, yes or no?

Any time we're categorizing things into yes or no,

brand A versus brand B, we're talking about a binary choice.

So it's going to be a very common type of marketing data for us to deal with.
