0:05

Just want to give you a little bit of a thought exercise.

I've picked a couple of companies here that deal a lot in consumer data.

Apple, or really any online business, Netflix, and then Whole Foods.

One brick-and-mortar company in the bunch.

But think for a second about the types of decisions consumers make

0:31

with these businesses, framing them in terms of the choices that consumers make.

So, for example, at Whole Foods,

it might be: am I going to buy a particular brand on a given shopping trip?

Yes or no?

Well, from the company's standpoint,

it might be helpful to know which of those brands are going to be popular.

Which ones are people going to buy on different trips?

Is there seasonality associated

with their products when they're making their ordering decisions?

Am I going to come to Whole Foods when I need groceries?

Yes or no?

I could go to one of the other grocery stores that's available to me.

So what about the people who choose to shop at Whole Foods?

They're more likely to go there when they're making their shopping trips.

With Netflix: do I retain service this month?

Yes or no?

Do I choose to watch the recommended series?

Yes or no?

Do I choose a larger plan this month?

Do I add on the DVD service this month? Yes or no?

You can imagine consumers making similar types of decisions,

whether it's Apple or Amazon or any other business.

So there are a lot of customer choices that are driving these businesses,

again highlighting the importance of understanding what's the right way for

us to be analyzing this choice data.

Â 1:55

And the reason that I wanted to talk up front about distributional assumptions is

that we're used to using the normal distribution.

Well, what we're really going to be changing is

that distributional assumption.

When it comes to binary choices, we're not going to be using the normal distribution.

We're going to assume that a customer's choice between a yes and

a no outcome follows a Bernoulli distribution, and

there are only two values allowed under a Bernoulli distribution:

1 or 0, yes or no. And the only parameter

that's associated with the Bernoulli distribution is the probability p.

So with probability p you get a 1, and with probability 1 - p

you get a 0. Framed differently:

with probability p, there is a yes outcome;

with probability 1 - p, there is a no outcome.

Now we can calculate the mean and

the variance associated with the Bernoulli distribution, and we've done that here.

All right, so, for the expected value under a Bernoulli distribution,

we take the outcomes, 1 and 0,

and the probabilities associated with those outcomes.

3:11

Our expectation, that's the mean of the Bernoulli distribution:

it's the probability p.

We can also calculate the variance under the Bernoulli distribution.
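As a quick check on those calculations, here's a minimal sketch (the value p = 0.3 is an arbitrary choice, just for illustration) that computes the mean and variance directly from the two outcomes and confirms them by simulation:

```python
import random

p = 0.3  # arbitrary choice probability, just for illustration

# Mean: E[Y] = 1*p + 0*(1 - p) = p
mean = 1 * p + 0 * (1 - p)

# Variance: Var[Y] = E[Y^2] - (E[Y])^2 = p - p^2 = p*(1 - p)
variance = (1 ** 2) * p + (0 ** 2) * (1 - p) - mean ** 2

# Simulation check: draw many 0/1 outcomes with P(1) = p
random.seed(1)
draws = [1 if random.random() < p else 0 for _ in range(100_000)]
sim_mean = sum(draws) / len(draws)
sim_var = sum((y - sim_mean) ** 2 for y in draws) / len(draws)

print(mean, variance)     # p and p*(1 - p)
print(sim_mean, sim_var)  # close to the analytical values
```

The simulated mean and variance land within sampling error of p and p(1 - p).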

So when it comes to writing out the likelihood of a single observation

from the Bernoulli distribution, this is the form that it takes on.

Notice it's the probability p raised to the power y,

times 1 - p raised to the power 1 - y.

Now, it looks a little bit foreign, but

let's break it down based on the values that y can take on.

Suppose we observe a 1.

All right, y = 1.

Well, p raised to the power y means I have a value of p.

(1 - p) is raised to the power 1 - y, so

raised to the power 0, and that term is going to go away.

So for a single draw from a Bernoulli distribution,

if I observe a 1, y = 1, the likelihood is p.

Well, what if I observe y = 0?

If y = 0, it's p raised to the power y,

p raised to the 0, well, that term equals 1, so it essentially goes away.

And then I'm left with a likelihood of 1 - p, raised to the power 1 - 0.

So when I observe a 1, the likelihood is p.

When I observe a 0, the likelihood is 1 - p.

That's just mapping onto the two values that we talked about earlier.

And then the product sign says: let's multiply that likelihood over all the data

points that we observe.
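That single-observation likelihood, and the product over a dataset, can be sketched in a few lines of Python (the data vector here is made up for illustration):

```python
import math

def bernoulli_likelihood(y, p):
    """Likelihood of one 0/1 observation: p^y * (1 - p)^(1 - y)."""
    return p ** y * (1 - p) ** (1 - y)

p = 0.7
print(bernoulli_likelihood(1, p))  # y = 1: the (1 - p) term drops out, leaving p
print(bernoulli_likelihood(0, p))  # y = 0: the p term drops out, leaving 1 - p

# Multiplying over all observed data points gives the full likelihood
data = [1, 0, 1, 1, 0]  # hypothetical observed choices
likelihood = 1.0
for y in data:
    likelihood *= bernoulli_likelihood(y, p)

# In practice we sum log-likelihoods instead, to avoid underflow on large datasets
log_lik = sum(y * math.log(p) + (1 - y) * math.log(1 - p) for y in data)
```

Exponentiating the summed log-likelihood gives back the same product, which is why the two forms are interchangeable.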

Â 4:59

Remember, with linear regression we said

the outcomes y follow a normal distribution with a mean mu.

And we said mu was a function of marketing activity.

Well, what we're going to do here is say my outcome is a function of the parameter p,

and my probability p is going to be a function of marketing activity.

We're just going to change the form in which that marketing activity affects

the probability p.

All right, so we talked about this piece already: I said outcomes follow a

Bernoulli distribution, and we can write out the likelihood function.

When we bring in marketing activity, we're going to change that a little bit and

say that the probabilities p,

well, they're going to be a function of the marketing activity.

All right, so we're going to look at an example for customer acquisition,

where marketing actions are going to affect the acquisition probability.

So the acquisition probability may be affected by: did I send you an email?

Did I send you a coupon?

Â 6:09

We're using a technique,

it's GLM if you haven't seen the abbreviation: the generalized linear model.

And what we're saying is a function of the expectation

is actually going to look like a regression equation.

So we can think using the same logic from linear regression;

it's just going to look slightly different when we put it into math.

Â 6:32

All right, so two different models that are commonly used, one is the logit model,

Â and you could see here, this is the functional form that we're going to use.

Â So the probability, it's the exponential function where e raised to the power

Â of x transpose beta divided by 1 + e raised to the power of x transpose beta.

Â 6:55

One thing to keep in mind, we're talking about a probability.

Â p is always going to be a value between 0 and 1.

Â This x transpose beta term, well that's actually our regression equation.

Â Our progression equation previously looked like we had an intercept beta

Â 0 + coefficient beta 1 times x1 + coefficient

Â beta 2 times x2 and however many coefficients we have.

Â That's our regression term.

Â So every time you see that x transpose beta,

Â just plug in your regression equation because that's all we're doing.

Â So think of this as rescaling your regression equation.

Â That regression equation can take on values negative and positive.

Â We've got to somehow make that into a probability, bounded between 0 and 1.

Â So the exponential e raised to that power divided by 1

Â + e raised to that power guarantees that it's going to be between 0 and 1.

Â That's the Logit model.
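Here's a minimal sketch of that transformation, tied back to the acquisition example; the coefficients (intercept, email, coupon) are hypothetical numbers for illustration, not values from the course:

```python
import math

def logit_probability(xb):
    """Rescale a regression score x'beta (any real number) into (0, 1)."""
    return math.exp(xb) / (1 + math.exp(xb))

# Hypothetical acquisition example: these coefficients are made up
beta0, beta_email, beta_coupon = -2.0, 0.8, 1.5
email, coupon = 1, 1  # this customer got both an email and a coupon

xb = beta0 + beta_email * email + beta_coupon * coupon  # the regression equation
p_acquire = logit_probability(xb)

# However extreme the regression score, the output stays between 0 and 1
print(logit_probability(-10))  # very close to 0
print(logit_probability(10))   # very close to 1
print(p_acquire)
```

Whatever covariates you add, the recipe is the same: build the regression score, then push it through the logit transform.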

Another model that we could use

is referred to as the probit model, where we plug

the regression equation we have into the normal CDF.

And that's going to give us our probability between 0 and 1.

For the most part, you're going to get very similar predictions between

these two approaches,

with the exception of when we get far out into the tails of the distribution.
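To see that similarity concretely, here's a sketch comparing the two transforms. Python's standard library has no normal CDF, so the probit side is built from the error function; the 1.6 rescaling is a common rule of thumb that puts the two index scales on comparable footing:

```python
import math

def logit(xb):
    return math.exp(xb) / (1 + math.exp(xb))

def normal_cdf(z):
    """Standard normal CDF, written via math.erf."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Compare logit(xb) against the probit transform of a rescaled score
for xb in (-2.0, -1.0, 0.0, 1.0, 2.0, 6.0):
    print(xb, round(logit(xb), 4), round(normal_cdf(xb / 1.6), 4))
# Mid-range scores give nearly identical probabilities; far out in the
# tails (e.g. xb = 6) the logit leaves noticeably more probability behind.
```

This is why the two models are usually interchangeable in practice but can disagree when you care about very rare (or very sure) outcomes.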

All right, just to give you a sense,

this is going to be consistent with economic theory, random utility theory,

where you choose the option that provides you the highest utility.

So, utility is going to be comprised of two components:

x transpose beta, that's our deterministic component,

that's the place where the marketing activity comes in;

and then the random component.

Well, depending on what assumptions we make about the distribution that

that random component comes from,

we're either going to end up with the logit model or the probit model, all right?

So, we have the logit model on one side, we've got the probit model on the other.

Just different ways of translating that utility into a probability.
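A small simulation can make the random-utility story concrete: choose yes whenever the utility x'beta + epsilon is positive, and the error distribution alone decides which model emerges (the score 0.5 here is an arbitrary choice):

```python
import math
import random

random.seed(42)
xb = 0.5     # deterministic component x'beta, chosen arbitrarily
n = 200_000

logit_yes = 0
probit_yes = 0
for _ in range(n):
    u = random.random()
    eps_logistic = math.log(u / (1 - u))  # draw from the logistic distribution
    if xb + eps_logistic > 0:             # logistic errors -> logit model
        logit_yes += 1
    if xb + random.gauss(0, 1) > 0:       # normal errors -> probit model
        probit_yes += 1

logit_pred = math.exp(xb) / (1 + math.exp(xb))         # about 0.62
probit_pred = 0.5 * (1 + math.erf(xb / math.sqrt(2)))  # about 0.69
print(logit_yes / n, logit_pred)
print(probit_yes / n, probit_pred)
```

With logistic errors the simulated share of yes choices matches the logit formula, and with normal errors it matches the normal CDF, which is exactly the correspondence described above.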

Â 9:08

For our demonstration purposes, we're going to stick with using the logit model,

but very similar intuition carries through for implementing the probit model.

And in fact, that's something that can also be done within

Excel using, I believe, the =NORMSDIST function.
