All right, so we're going to look at the Kiwi Bubbles data set. It was a panel of customers, and what was monitored was when those customers purchased for the first time a new drink product codenamed Kiwi Bubbles. That's if they acquired the product during our observation period. So we're going to look at using timing models, a geometric model, as well as a model with a linear time trend, to model the adoption of this particular product. And then we're going to modify those models to add in this idea of the hardcore never buyers, the HCNB segment, to recognize that just some customers in the panel, no matter how long we wait, aren't going to be interested. So four different modeling specifications that we're going to take a look at. Now you'll find a general description of the approach to the problem that we're going to take up on the course website. And let's move over to that Excel file. So the raw data that we have here you see is for the product Kiwi Bubbles. There are 1,499 panelists in total. And what we have are, the cumulative number of triers from that panel. So in the first week that this product was launched, only 8 customers purchased it, by the second week, we had 14 customers who had purchased it. By the third week, 16 who had purchased this. That's cumulative. All right, so we're going to start by estimating the geometric model. One that assumes a constant probability of trying the product each week. So the time until it option is assumed to follow a geometric process. So let's start there. And we're going to use default value to get us started, let's assume that the probability is .5. All right, so the same data that we had previously, we'll zoom in a little bit to be able to see this more clearly. So let's start by constructing the number of incremental triers, because what we need to do is come up with what's the probability that the product is tried in a particular week. So if we have the probability that the product is tried this week. We can multiply, that's going to be the likelihood associated with each of the incremental triers. So just to walk through the approach that we're going to be taking, we're going to first calculate out what those incremental triers, how many there are, that will be straight forward for us. In column D, we're going to calculate that survival probability. So what's the probability that someone has not yet tried the product by the beginning of that week? We're going to then calculate the incremental probability of trying it in a particular week in column E. In column F, we then going to combine the number of incremental triers we have with that probability of try on a particular week to produce our log likelihood. We're also going to have to take into account the likelihood associated with the number of people who have not yet tried the product. And then what you see in columns G and H is we're going to predict how many people, according to our model, should have tried the product by a particular week. That is, what's the expectations? Now see, we provided up the data for 52 weeks, but we're not going to use necessarily that full dataset. What we may want to look at, for example, is splitting that data so that we have a portion to be used for forecasting. The alternative, we could use that full dataset and make our forecast based on that. Now what you'll see later in this worksheet, I've also put in completed versions of what we're going to be working through, so you can go back and look at the videos the steps that we are taking. You can also refer to the completed sheet of the corresponding module, all right. So what we are going to do in building this up is, we are going to try to build the model using the first 26 weeks of data. And then we are going to see how well the model does at testing the rest of this data. All right, so to get started, let's calculate out the number of incremental triers. Well that's going to be the difference between how many people tried it, in the first week, it's just going to be the number of triers that we have, that's the first week it's offered. In subsequent weeks, it's going to be the difference between the cumulative triers in adjacent weeks. So cell C7, that's going to be the difference between how many people tried it in week 2, and how many people tried it in week 1. And we can copy that formula all the way down our spreadsheet. And so if we look at this, if we look at the sum, it's 139 incremental triers and that matches the total number of triers that we have in our data. So no problems there. Next, in cell D6, what we want to calculate is what's the probability that someone hasn't tried it by the beginning of this particular week? Well, this is a survival probability. Prior to the first week, it's guaranteed that they haven't tried it, so we're going to start at one. Now, if we're assuming a geometric model, we can say that there is a constant probability each period of people trying the product. And so what's the probability that someone by week two, has not yet tried the product? Well, D7 we can write as it's a probability they hadn't tried it yet multiplied by the probability that they do not try it in the current week. So, D6 tells us the probability they hadn't tried it previously, multiplied by 1-F1, that's the probability they don't try it In the current period. And we're just going to copy that formula all the way down. Now one of the reasons I wanted to put in 0.5 as that starting probability is it produces that sequence that we might expect of what's the probability of someone not having tried it by the beginning of the second week is 0.5. Probability of someone not having tried it by the start of the third week is 0.25. By the fourth week, .125. So making sure that everything looks right. So if we have column D telling us the probability that they have not yet tried the product by the start of that week, what's the probability they'd try it in a particular week? We can take the difference between the survival probabilities at different points in time to calculate this particular probability. So what's the probability that someone tries the product in the first week? Well, it's going to be the difference between, not having tried it before the start of the first week, and not having tried it by the start of the second week. And again, we can copy this formula all the way down. And so for our log likelihood what we want to put on Column F, we want to take into account, how may people does this particular probability correspond to, and what is the probability itself? So the log likelihood for 8 individuals, that's given in C6, well, we're going to take the natural logarithm of the probability of trying it in the first week. So, for a given individual, if we found out that they did not try the product in the first week, it would just be the natural log of cell E6. Because there are eight individuals who exhibited that behavior, that's why we multiply C6 by the natural log of E6. So that's going to give us our log likelihood for the observations where people try the product in week one. And we're going to copy this down, for the first 26 weeks, because that's what we're going to be using test to made our model. All right, so we've got the log likely to associate with the people who tried the product in the first 26 weeks. What we then have to take into account, what about the people who didn't try this product by this particular point and time? So in cell F32, we're going to calculate and manually type in this log likelihood. So for the 1499 minus the 101 individuals who tried the product. So that's going to give us 1398 individuals, for all 1398 of these individuals, what's the probability that they have not yet tried the product, By week 26? We're going to take the natural logarithm, all of the probability given in D32. This the probability of not yet having tried the product by the start of week 27. All right? And so we've now specified the log likelihood based on each week of our observation period. And the number of individuals who did not try the product. So what does our log likelihood for the entire sample look like in cell F2? We're going to calculate the sum from F6 to F32. And that's our log likelihood. All right, so the next step for us is going to be the u solver. So to find the appropriate value for this probability. Now that's under the data ribbon, and we're going to click on solver. And you'll see the set of constraints that I added in place here. The constraints that we have is we want to maximize the value of the cell F2. So the objective function is in F2, we want to maximize it by changing the value of F1, our adoption probability. Now this has to be a probability, so you'll see I've added constraints saying that it must be greater than a very, very, very small number that is just bigger than 0, and it's gotta be less than 1. And we click on Solve button, that's it. So the acquisition probability in a given for an individual is 0.003 or 0.03%. All right, now what does that mean to us? How good of a job is this model actually doing in terms of forecasting customers adoption of this particular product? Well let's take a look at that. So we can look at the predicted number of incremental adopters. So in cell G6, we can calculate this as the total sample size, the 1499, multiplied by the weekly adoption probability, right? So what fraction of people are going to try it in this particular week? That's given to us in column E, and we can copy this formula all the way down this spreadsheet. All right, we can also look at the percentage of accumulative adopters. So in column H, we can specify H6 as equal to the number of incremental adopters in the first period. And then in subsequent periods, it's the number of previous cumulative adopters plus the number of new incremental adopters. We can copy that formula down our spreadsheet. And now we want to take a look at, and see how good a job are we doing in actually capturing the data. Let's take a look at this, looking at the cumulative triers, and the predicted cumulative triers. So again, we can hold down Ctrl+Shift to highlight, Multiple regions. And we're just going to go down to row 56, that final row that you see. Now our data, looks like our formulas did not copy all the way through, so let's fix that very briefly. Like just copying a value into cell D58, And so if we want to look at the cumulative triers from column B versus the cumulative triers in column H. Let's take a look, so I'll insert the chart. And what we see, let's just make this chart a little bit larger for ourselves. It seems like we're doing a reasonable job during the calibration period. Early on, we're okay, later on, we're okay, in the middle, it's not great. But once we get into our forecasting period, our geometric model is assuming much higher adoption that we're actually observing. And if we wanted to change this instead of looking at the cumulative adaptors what if we look at the incremental adaptors to see what's happening on a week to week basis. So I can click on the lines in the graph and just drag the cell references. And again, we do an okay job on in incremental basis for the first 26 weeks, but looks like adoption afterward is considerable lower. So the geometric module on its own doesn't seem to be doing the trick, all right. So the next thing that we're going to do is we're going to move on to building a model within linear time trend. Now if you want to go back and review on the geometric completed sheet, you'll see the work that we just conducted. So let's try to improve the model to allow for a linear time trend. The more time that has passed the less adoption we observe over time, the lower the adoption probability in a given week. And so we're starting roughly with the same worksheet. But notice up top, we now have two coefficients. There's an alpha and a beta, so alpha's going to be our intercept, beta1 is going to be the impact of time. We're going to follow the same structure, what's the probability that someone has not yet tried the product? And so we're going to calculate out that probability. For each week, that's essentially our survival function. We construct the incremental probabilities to give us the probability of trial in a particular week and we use that to get the log likelihood. So in this case again, we're going to start out as one, because no one is trying the product before week one. What about prior to week two? Well, it's going to be a probability that you survived up until now, multiplied by the probability that you don't try the product in week one. All right, so we're going to use transformation again to calculate that probability. And what we want is for the loadshare probabilities, so E raised to the linear combination divided by 1 plus E to that linear combination to reflect the probability that you do try the product. So 1 minus that is going to be the probability that you do not try the product in the previous week. So, it's going to be 1 divided by 1 plus E raised to the power of the intercept plus the coefficient for time multiplied by the week number. And we're going to drag this formula all the way down and with coefficients of 0 in there, we should observe zeros across the board. I'm going to drag this all the way down so that we have this calculation done and can use it for our forecasting exercise. All right, so now we have a probability of no trials. And just to see what might happen, you could try changing some of these values. So what if we assume a negative coefficient on time. Now notice that that survival probability jumps up and that's what we would expect that if people become less likely to acquire the product as time passes then these probabilities of not yet trying the product should increase. Column E, probability of trial this week, just like we had done for the geometric worksheet, we're going to take the difference in the survival probability. So in E6, it's going to be D6 minus D7. And we're going to copy that formula all the way down. Okay, and then our log likelihood, the formula is going to look the same as what we had done previously. It's the number of incremental triers in the current week multiplied by the natural log of the probability of trying the product in the current week, so C6 times the natural log of E6. And we can copy that formula down. We're going to use the first 26 weeks for forecasting. Then, in cell F32, we're going to calculate out the probability associated with the people who did not yet try the product. So, that's the difference between our sample size and the number of cumulative triers through week 26 multiplied by probability of not trying the product by this particular point in time. So, natural log of cell D32. So we have our initial value set for alpha and beta, our sample log likelihood, we're going to calculate the sum of F6 through F32. And now we're going to get solver up and running. All right, so things are going to be a little bit different in this case because previously, when we ran the geometric model, we only had one cell that we we're changing. Now, our objective function is in cell F3, and we want to change cells F1 and F2. I'm going to get rid of these prior constraints so I'm going to delete those. And just to make sure that the solver doesn't go to values that are too extreme, you can take a look at the constraints that I've used to run this model previously, they're provided in the hand out. Let's try to run it without any constraints and then we'll see what happen. If we get an error message, we'll put in some constraints. And so this was what the concern was. You can see we're getting that error message that solver's just going to values that are too large, and it's producing errors for us. And that's probably because of the week variable and the number they are multiplied by a large coefficient giving us an error message. You know, running into values that the computer just cant process. So let's go back into solver and adding the constraint that our time coefficient has to be no larger than or has to be larger than. Yeah we ran into that error message, we saw it's a negative 1.6. Let's try negative .9999 as a minimum value and let's take a look and see if we have any better luck in this case. All right, there we go. And when you could test out different solutions to see how sensitive the results are to the starting values but as we would expect in kind of what we saw from the geometric model where we were over forecasting the number of triers was there's a negative time. All right, there's a negative effect of time. As more time passes, people are less and less likely to try the product. Now, we can build out the same charts that we did before, how many incremental tryers are we going to have, it's our overall sample multiplied by that incremental tryer probability Multiplied by the incremental trier probability so that's in E6. And then we can predict out our cumulative triers as starting with the incremental triers in the first week and then adding to that H6 plus G7, copying that formula down. And again, that last row looks like we've got a formula. Those last rows cause us some problems. Those are, because the end of our probability equation. So if we want to take a look at how good a job are we doing at capturing incremental triers, we'll hold down Ctrl+Shift from C5, click on down, that'll highlight, Column C for us and it looks like we haven't dragged this formula down so let's do that first. That is I'm going to copy down our incremental triers formula so that we can compare for the 52 weeks, so from C5 I'm going to hold Control Shift and press down to C 57 And I'm going to then release all of the keys, and going to hold control when I click on G57. And then I'm going to hold shift and the up arrow to highlight up to G5. And let's take a look at the graph that we produced doing that. And in this cause it looks pretty good. Notice that we're not perfect, but we only use the data up through week 26. And our negative time trend seems to be a much better fit for the data. And then, the number of incremental triers later on continues to decline and we're able to pick that up with the linear time trend. All right, and just as we had when it came to the geometric model, we have a completed worksheet already. You can go and take a look at that. We can look at, how good a job did we do at capturing cumulative behavior? Let's move those reference boxes for this chart over there for each line. Just click and drag. And in terms of predicting cumulative behavior, notice it's not perfect. But we're doing a pretty good job, especially compared to the model that we had previously. All right. So, we're going to skip passed. If you want to try to challenge yourself, the idea of adding a quadratic time trend, so instead of assuming just a linear effect of time, adding the quadratic term, but it's going to be that same approach. All right, so the next thing that we want to try doing is adding this idea of the never trier segment. Now, with the hardcore never buyers, they're saying there are really two distinct segments in your population. One of those segments is going to follow a distribution, in our case a geometric distribution, they're actually going to try the product, we just don't know when. The other segment is never ever going to try the product no matter how long we wait. And so, that's the model that we want to build next. So let's start by going down to C33 and just copying the incremental formula down the rest of this column. Now, we're going to start in column D by constructing our survival probability, just as we have before. And then, in column E, when we're saying, that survival probability is going to be conditional on tried at some point. Column E, we're going to make sure that we take into account our hardcore never buyer segment. And that needs to follow through all the way to that likelihood. So let's start with our survival probability. This survival probability is going to be identical to how we did it on the geometric worksheet. So we're going to start out with one for not having tried it before the first week, then what's the probability that you try it after that period, it's going to be probability they've not yet tried it, multiplied by that weekly adoption probability. And we're going to copy that formula all the way down our worksheet. And we're also going to make sure that we drag it down just one more time so that we can calculate out those incremental probabilities. All right? The next step. Now, here's where we differ, alright? What we had done before, we said how do we calculate the probability that you try it this week? Well, we just take the difference between not having tried it befor week one and not having tried it before week two. But that only works if we know you're in the segment of people who are eventually going to try the product. And so, we need to multiply this by the probability that you're actually going to try the product. And that's going to be given by one minus the probability that you'll never, ever try the product. So that's F2. Okay. And we're going to drag this formula down to week 52. All right, so our log likelihood specification for the first 26 weeks, same as we've done before, at this point. It's the number of individuals that exhibit this behavior, multiplied by the logarithm, the natural log of what's the probability that you try it in this particular week And we're going to drag this formula down for the first 26 weeks in our file. So that's going to take us down, I believe, to row 31, or 32. Let's just make sure that we've going through the right week. All right, so row 33 is week 26. And so, we've captured the log likelihood for the 101 people who tried the product by the end of week 26. What about the individuals who didn't try the product? Well, one possibility for the 1,499 individuals minus the 101 individuals. So B2 minus B33. It's the natural log of the probability that they don't try the product, that they're not observed to try the product by the end of week 26. One possibility is that they just haven't gotten around to it yet. That is, they are going to try the product, so the chances of them trying the product at some point, they have to be in the segment of people who will do that. So one minus F2. So, they are going to try the product, they just haven't gotten around to it yet. Probability of not having gotten around to it yet is going to be given to us in D34. The other possibility, and I'm going to add just some parentheses to make sure that I don't have any problems with order of operations. The other possibility is that they're never ever going to try it, and that's given by a probability that they're in the segment of hardcore never buyers with a probability of F2. All right, so, it's the number of individuals who didn't try the product, B2 minus B33, multiplied by the log of the probability. Probability comes about in two ways, either you're never going to ever try this product no matter how long we wait, that probability is given by F2. Or you are going to try the product at some point, with the probability of one minus F2. And you just haven't gotten around to it yet, with the probability given by D34. All right, so that gives us our log likelihood for each time period. Let's calculate that sum, so in F3, we're going to enter the overall, the sum from F8 through F34. And we're going to use solver to figure out what are the appropriate values for what we're calling P and Q. Now you notice in running solver we said we're going to maximum cell F3 by changing cells F1 and F2 and recognizing that both of these are probabilities. All right, and let's run solver and see what we get. Okay, so probability of weekly adoption given by 0.92, probability of never trying the product, given by 0.92, also. So 92% of the panel, according to this model, is never ever going to try our product. So let's see how good a job we do in terms of actual forecasting. Let's look at our incremental triers. And we're going to calculate that out the same way. So number of panelists we have, B2 multiplied by probability of trying the product in this particular week. And let's copy that all the way down. And same approach for the cumulative adopters. So the first ones are given by the week one incremental adopters, and then in subsequent weeks it's the previous cumulative adopters plus the new incremental adopters. And we can copy that one down, all right? And this last row again, just created by that formula going a little bit long for us. So let's take a look at how good a job this model does. I'm going to go under Insert and we're going to insert a line chart but we're going to have to tell this line chart where it needs to get the data from. Right, so we're going to add a series. So let's start with our incremental adopters and the values for those incremental adopters is going to be in G8 through G59, that's one series. And I want to add one more series. The name for that, that's going to be the actual data. It's going to be C7 and the value's there, it's going to be C8 or C59, okay? So I'll click OK and that should produce the chart that we're interested in, right? And it looks pretty good in terms of the model fit. Yeah, it's not perfect but we're capturing the general trend. Let's try shifting this so that, let me insert the legend so we know what exactly we're looking at. So instead of focusing on the predicted incremental try. The predicted incremental try, I'm just going to click on the blue line and drag these reference boxes over. And then I'm going to do the same thing for the actual data. This is going to give us our cumulative triers. Now let's see how that looks. So that looks similar to the linear time trend model that we had built. So two different explanations that we're using here. One is that people are changing in the likelihood of trying the product over time. The other is that there is just a ceiling as far as how many people are ever going to be interested in trying this particular product. Now we can take the same approach for incorporating the hardcore never buyers into the linear model and I'll leave that as one for you guys to work through. But let's take a look at that completed worksheet. So how would we go about doing this? We would start by taking the incremental tier column, let's carry that all the down just so that we've got it. Calculate the probability of not yet trying the product the same way that we had previously. Then we're going to in column E when we calculate the incremental trier probability we are going to take into account the probability of never ever trying the product. Calculate our log of likelihood, and when it comes to that final observation, the block of people who didn't try the product, remember we've got to take into account the two possibilities. You could try it at some point, you just haven't done it yet. Or you may never ever try that product. And then we can look at the comparison between actual data and our predictions. And just to see what this looks like, I'm going to jump over to the completed worksheet. And let's build out that chart so we can see how good of a job this particular version of the model does. All right, seems like we have a bigger gap from this model. So adding more bells and whistles, not necessarily going to be the approach that we're going to want to take. The final sheet of this workbook is a comparison tab between the linear, the geometric models and also the quadratic models that we did not go through in detail. But you can see that adding that hardcore number of buyers segment and/or the linear time trend. Both give us that kind of the same story, big improvement compared to the geometric model on its own. So we do want to make sure we take into account some of those other modeling elements. If we look at the cumulative. From the different models, the blue is actual data. Looks like the closest that we end up getting is our geometric with the hardcore never buyer segment. And yeah, we can't see it but virtually identical to the results produced by that linear time trend, all right? So lots of different modelling options. But hopefully we see how flexible Excel can be as a tool for being able to estimate these different alternatives.