0:05

All right, so we're going to look at the Kiwi Bubbles data set.

Â It was a panel of customers, and what was monitored was when those customers

Â purchased for the first time a new drink product codenamed Kiwi Bubbles.

Â That's if they acquired the product during our observation period.

Â So we're going to look at using timing models, a geometric model, as well as

Â a model with a linear time trend, to model the adoption of this particular product.

Â And then we're going to modify those models to add in this idea

Â of the hardcore never buyers, the HCNB segment, to recognize that just some

Â customers in the panel, no matter how long we wait, aren't going to be interested.

Â So four different modeling specifications that we're going to take a look at.

Â Now you'll find a general description

Â of the approach to the problem that we're going to take up on the course website.

Â 1:04

So the raw data that we have here you see is for the product Kiwi Bubbles.

Â There are 1,499 panelists in total.

Â And what we have are, the cumulative number of triers from that panel.

Â So in the first week that this product was launched, only 8 customers purchased it,

Â by the second week, we had 14 customers who had purchased it.

Â By the third week, 16 who had purchased this.

Â That's cumulative.

Â All right, so we're going to start by estimating the geometric model.

Â One that assumes a constant probability of trying the product each week.

Â So the time until it option is assumed to follow a geometric process.

Â So let's start there.

Â And we're going to use default value to get us started,

Â let's assume that the probability is .5.

Â All right, so the same data that we had previously,

Â we'll zoom in a little bit to be able to see this more clearly.

Â So let's start by constructing the number of incremental triers,

Â because what we need to do is come up with what's the probability that

Â 2:20

We can multiply, that's going to be the likelihood associated with

Â each of the incremental triers.

Â So just to walk through the approach that we're going to be taking,

Â we're going to first calculate out what those incremental triers,

Â how many there are, that will be straight forward for us.

Â In column D, we're going to calculate that survival probability.

Â So what's the probability that someone has not yet

Â tried the product by the beginning of that week?

Â We're going to then calculate the incremental probability

Â of trying it in a particular week in column E.

Â In column F, we then going to combine the number of incremental triers we have with

Â that probability of try on a particular week to produce our log likelihood.

Â We're also going to have to take into account the likelihood associated with

Â the number of people who have not yet tried the product.

Â And then what you see in columns G and H is we're going to predict how many people,

Â according to our model, should have tried the product by a particular week.

Â That is, what's the expectations?

Â Now see, we provided up the data for 52 weeks,

Â but we're not going to use necessarily that full dataset.

Â What we may want to look at, for example, is splitting that data so

Â that we have a portion to be used for forecasting.

Â The alternative, we could use that full dataset and

Â make our forecast based on that.

Â Now what you'll see later in this worksheet,

Â I've also put in completed versions of what we're going to be working through,

Â so you can go back and look at the videos the steps that we are taking.

Â You can also refer to the completed sheet of the corresponding module, all right.

Â So what we are going to do in building this up is,

Â we are going to try to build the model using the first 26 weeks of data.

Â And then we are going to see how well the model does at testing the rest of

Â this data.

Â All right, so to get started,

Â let's calculate out the number of incremental triers.

Â Well that's going to be the difference between how many people tried it,

Â in the first week, it's just going to be the number of triers that we have,

Â that's the first week it's offered.

Â In subsequent weeks,

Â it's going to be the difference between the cumulative triers in adjacent weeks.

Â So cell C7, that's going to be the difference between how many

Â people tried it in week 2, and how many people tried it in week 1.

Â 5:01

And so if we look at this, if we look at the sum, it's 139 incremental triers and

Â that matches the total number of triers that we have in our data.

Â So no problems there.

Â Next, in cell D6, what we want to calculate is what's the probability that

Â someone hasn't tried it by the beginning of this particular week?

Â Well, this is a survival probability.

Â 5:25

Prior to the first week, it's guaranteed that they haven't tried it, so

Â we're going to start at one.

Â Now, if we're assuming a geometric model, we can say that there is a constant

Â probability each period of people trying the product.

Â And so what's the probability that someone by week two,

Â has not yet tried the product?

Â Well, D7 we can write as it's a probability they hadn't tried it yet

Â multiplied by the probability that they do not try it in the current week.

Â 6:04

So, D6 tells us the probability they hadn't tried it previously,

Â multiplied by 1-F1, that's the probability they don't try it In the current period.

Â And we're just going to copy that formula all the way down.

Â Now one of the reasons I wanted to put in 0.5 as that starting probability is it

Â produces that sequence that we might expect of what's the probability of

Â someone not having tried it by the beginning of the second week is 0.5.

Â Probability of someone not having tried it by the start of the third week is 0.25.

Â By the fourth week, .125.

Â So making sure that everything looks right.

Â So if we have column D telling us the probability that they have not yet

Â tried the product by the start of that week,

Â what's the probability they'd try it in a particular week?

Â We can take the difference between the survival probabilities at

Â different points in time to calculate this particular probability.

Â So what's the probability that someone tries the product in the first week?

Â Well, it's going to be the difference between,

Â not having tried it before the start of the first week, and

Â not having tried it by the start of the second week.

Â 7:23

And so for our log likelihood what

Â we want to put on Column F, we want to take into account, how may people does

Â this particular probability correspond to, and what is the probability itself?

Â So the log likelihood for 8 individuals,

Â that's given in C6, well, we're going to take the natural logarithm

Â of the probability of trying it in the first week.

Â 8:10

So that's going to give us our log likelihood for

Â the observations where people try the product in week one.

Â And we're going to copy this down, for the first 26 weeks,

Â because that's what we're going to be using test to made our model.

Â All right, so we've got the log likely to associate with

Â the people who tried the product in the first 26 weeks.

Â What we then have to take into account,

Â what about the people who didn't try this product by this particular point and time?

Â So in cell F32, we're going to calculate and

Â manually type in this log likelihood.

Â So for the 1499 minus the 101 individuals who tried the product.

Â 10:14

All right, so the next step for us is going to be the u solver.

Â So to find the appropriate value for this probability.

Â Now that's under the data ribbon, and we're going to click on solver.

Â And you'll see the set of constraints that I added in place here.

Â The constraints that we have is we want to maximize the value of the cell F2.

Â So the objective function is in F2,

Â we want to maximize it by changing the value of F1, our adoption probability.

Â Now this has to be a probability, so you'll see I've added constraints

Â 10:50

saying that it must be greater than a very, very, very small number that is

Â just bigger than 0, and it's gotta be less than 1.

Â And we click on Solve button, that's it.

Â So the acquisition probability in a given for

Â an individual is 0.003 or 0.03%.

Â All right, now what does that mean to us?

Â How good of a job is this model actually doing in terms of

Â forecasting customers adoption of this particular product?

Â Well let's take a look at that.

Â So we can look at the predicted number of incremental adopters.

Â 11:33

So in cell G6, we can calculate this as the total sample size,

Â the 1499, multiplied by the weekly adoption probability, right?

Â So what fraction of people are going to try it in this particular week?

Â That's given to us in column E, and

Â we can copy this formula all the way down this spreadsheet.

Â 12:15

And then in subsequent periods, it's the number of previous

Â cumulative adopters plus the number of new incremental adopters.

Â We can copy that formula down our spreadsheet.

Â And now we want to take a look at, and

Â see how good a job are we doing in actually capturing the data.

Â Let's take a look at this, looking at the cumulative triers,

Â 14:11

It seems like we're doing a reasonable job during the calibration period.

Â Early on, we're okay, later on, we're okay, in the middle, it's not great.

Â But once we get into our forecasting period, our geometric model is assuming

Â much higher adoption that we're actually observing.

Â And if we wanted to change this instead of looking at the cumulative adaptors what if

Â we look at the incremental adaptors to see what's happening on a week to week basis.

Â So I can click on the lines in the graph and just drag the cell references.

Â And again, we do an okay job on in incremental basis for the first 26 weeks,

Â but looks like adoption afterward is considerable lower.

Â So the geometric module on its own doesn't seem to be doing the trick, all right.

Â So the next thing that we're going to do is we're going to move on to

Â building a model within linear time trend.

Â 15:11

Now if you want to go back and review on the geometric completed sheet,

Â you'll see the work that we just conducted.

Â So let's try to improve the model to allow for a linear time trend.

Â The more time that has passed the less adoption we observe over time,

Â the lower the adoption probability in a given week.

Â And so we're starting roughly with the same worksheet.

Â 15:43

But notice up top, we now have two coefficients.

Â There's an alpha and a beta, so alpha's going to be our intercept,

Â beta1 is going to be the impact of time.

Â We're going to follow the same structure,

Â what's the probability that someone has not yet tried the product?

Â And so we're going to calculate out that probability.

Â For each week, that's essentially our survival function.

Â We construct the incremental probabilities to give us the probability

Â of trial in a particular week and we use that to get the log likelihood.

Â So in this case again, we're going to start out as one,

Â because no one is trying the product before week one.

Â What about prior to week two?

Â Well, it's going to be a probability that you survived up until now,

Â multiplied by the probability that you don't try the product in week one.

Â All right, so

Â we're going to use transformation again to calculate that probability.

Â And what we want is for the loadshare probabilities,

Â so E raised to the linear combination divided by 1 plus E to that linear

Â combination to reflect the probability that you do try the product.

Â So 1 minus that is going to be the probability that you do not

Â try the product in the previous week.

Â So, it's going to be 1 divided by 1 plus

Â E raised to the power of the intercept plus the coefficient for

Â time multiplied by the week number.

Â 19:20

Then, in cell F32, we're going to calculate

Â out the probability associated with the people who did not yet try the product.

Â So, that's the difference between our sample size and the number of cumulative

Â triers through week 26 multiplied

Â by probability of not trying the product by this particular point in time.

Â So, natural log of cell D32.

Â So we have our initial value set for

Â alpha and beta, our sample log likelihood,

Â we're going to calculate the sum of F6 through F32.

Â And now we're going to get solver up and running.

Â 20:16

All right, so

Â things are going to be a little bit different in this case because previously,

Â when we ran the geometric model, we only had one cell that we we're changing.

Â Now, our objective function is in cell F3, and

Â we want to change cells F1 and F2.

Â I'm going to get rid of these prior constraints so I'm going to delete those.

Â 20:46

And just to make sure that the solver doesn't go to values that are too extreme,

Â you can take a look at the constraints that I've used to run this

Â model previously, they're provided in the hand out.

Â Let's try to run it without any constraints and

Â then we'll see what happen.

Â If we get an error message, we'll put in some constraints.

Â 21:12

And so this was what the concern was.

Â You can see we're getting that error message that

Â solver's just going to values that are too large, and it's producing errors for us.

Â And that's probably because of the week variable and

Â the number they are multiplied by a large coefficient giving us an error message.

Â You know, running into values that the computer just cant process.

Â So let's go back into solver and

Â adding the constraint that our time coefficient has

Â to be no larger than or has to be larger than.

Â Yeah we ran into that error message, we saw it's a negative 1.6.

Â Let's try negative .9999 as a minimum value and

Â let's take a look and see if we have any better luck in this case.

Â All right, there we go.

Â And when you could test out different solutions to see

Â how sensitive the results are to the starting values but

Â as we would expect in kind of what we saw from the geometric model where

Â we were over forecasting the number of triers was there's a negative time.

Â 22:23

All right, there's a negative effect of time.

Â As more time passes, people are less and less likely to try the product.

Â Now, we can build out the same charts that we did before,

Â how many incremental tryers are we going to have,

Â it's our overall sample multiplied by that incremental tryer probability

Â 23:18

Those last rows cause us some problems.

Â Those are, because the end of our probability equation.

Â So if we want to take a look at how good a job are we doing at capturing incremental

Â triers, we'll hold down Ctrl+Shift from C5, click on down,

Â that'll highlight, Column C for

Â us and it looks like we haven't dragged this formula down so let's do that first.

Â That is I'm going to copy down our incremental triers formula so

Â that we can compare for the 52 weeks, so

Â from C5 I'm going to hold Control Shift and press down to C 57

Â 23:59

And I'm going to then release all of the keys,

Â and going to hold control when I click on G57.

Â And then I'm going to hold shift and

Â the up arrow to highlight up to G5.

Â And let's take a look at the graph that we produced doing that.

Â And in this cause it looks pretty good.

Â Notice that we're not perfect, but we only use the data up through week 26.

Â And our negative time trend seems to be a much better fit for the data.

Â And then, the number of incremental triers later on continues to decline and

Â we're able to pick that up with the linear time trend.

Â All right, and just as we had when it came to

Â 26:02

Now, with the hardcore never buyers,

Â they're saying there are really two distinct segments in your population.

Â One of those segments is going to follow a distribution, in our case a geometric

Â distribution, they're actually going to try the product, we just don't know when.

Â The other segment is never ever going to try the product no matter

Â how long we wait.

Â 26:28

So let's start by going down to C33 and

Â just copying the incremental formula down the rest of this column.

Â Now, we're going to start in column D by constructing our survival probability,

Â just as we have before.

Â And then, in column E, when we're saying,

Â that survival probability is going to be conditional on tried at some point.

Â 26:58

hardcore never buyer segment.

Â And that needs to follow through all the way to that likelihood.

Â So let's start with our survival probability.

Â This survival probability is going to be identical

Â to how we did it on the geometric worksheet.

Â So we're going to start out with one for not having tried it before the first week,

Â then what's the probability that you try it after that period,

Â it's going to be probability they've not yet

Â tried it, multiplied by that weekly adoption probability.

Â 27:51

All right? The next step.

Â Now, here's where we differ, alright?

Â What we had done before,

Â we said how do we calculate the probability that you try it this week?

Â Well, we just take the difference between not having tried it befor week one and

Â not having tried it before week two.

Â But that only works if we know you're in the segment of people

Â who are eventually going to try the product.

Â 28:44

All right, so our log likelihood specification for

Â the first 26 weeks, same as we've done before, at this point.

Â It's the number of individuals that exhibit this behavior,

Â multiplied by the logarithm, the natural log of what's the probability that you try

Â it in this particular week And

Â we're going to drag this formula down for

Â the first 26 weeks in our file.

Â 29:36

And so, we've captured the log likelihood for

Â the 101 people who tried the product by the end of week 26.

Â What about the individuals who didn't try the product?

Â Well, one possibility for

Â the 1,499 individuals

Â minus the 101 individuals.

Â So B2 minus B33.

Â It's the natural log of the probability that they don't try the product,

Â that they're not observed to try the product by the end of week 26.

Â One possibility is that they just haven't gotten around to it yet.

Â That is, they are going to try the product,

Â so the chances of them trying the product at some point,

Â they have to be in the segment of people who will do that.

Â So one minus F2.

Â 31:05

The other possibility, and I'm going to add just some parentheses to make

Â sure that I don't have any problems with order of operations.

Â The other possibility is that they're never ever going to try it, and

Â that's given by a probability that they're in the segment of hardcore never

Â buyers with a probability of F2.

Â All right, so, it's the number of individuals who didn't try the product,

Â B2 minus B33, multiplied by the log of the probability.

Â Probability comes about in two ways, either you're never going to

Â ever try this product no matter how long we wait, that probability is given by F2.

Â Or you are going to try the product at some point,

Â with the probability of one minus F2.

Â And you just haven't gotten around to it yet, with the probability given by D34.

Â All right, so that gives us our log likelihood for each time period.

Â Let's calculate that sum, so in F3,

Â we're going to enter the overall, the sum from F8 through F34.

Â And we're going to use solver to figure out

Â what are the appropriate values for what we're calling P and Q.

Â 33:24

And same approach for the cumulative adopters.

Â So the first ones are given by the week one incremental adopters,

Â and then in subsequent weeks it's the previous cumulative adopters plus the new

Â incremental adopters.

Â And we can copy that one down, all right?

Â And this last row again,

Â just created by that formula going a little bit long for us.

Â So let's take a look at how good a job this model does.

Â 36:43

Calculate the probability of not yet trying the product the same way that we

Â had previously.

Â Then we're going to in column E when we calculate the incremental

Â trier probability we are going to take into account

Â the probability of never ever trying the product.

Â Calculate our log of likelihood, and when it comes to that final observation,

Â the block of people who didn't try the product,

Â remember we've got to take into account the two possibilities.

Â You could try it at some point, you just haven't done it yet.

Â Or you may never ever try that product.

Â And then we can look at the comparison between actual data and our predictions.

Â And just to see what this looks like,

Â I'm going to jump over to the completed worksheet.

Â