0:06

All right, what we're going to be doing in this exercise is building a model for retention. You'll see a couple of different tabs in this file. We've got a working tab that we're going to be using to build our model. We have another tab for this geometric model that we're building that's complete, that I've annotated already. And then the other model that we're going to create is one that allows for a time trend. I'll explain the difference between these two models as we get a little bit into things. The final tab has a comparison between the performance of these different models.

So, one of the basic building blocks for customer analytics is the geometric distribution. The idea in the geometric distribution is, think of it like a coin flip:

Â 0:54

each time period, there is a probability that the event happens. In our case, the event is churn or retention. So in Month 1, there's a probability that you churn. In Month 2, if you haven't churned yet, there's a probability that you churn. In Month 3, if you haven't churned yet, there's a probability that you churn. The assumption with the geometric distribution is that it's always the same probability, so it makes for a very simple model, but it's also one that's a little bit restrictive.
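The coin-flip logic above can be sketched in a few lines of Python; the churn probability of 0.1 here is a hypothetical value for illustration, not one estimated from the data:

```python
# Geometric model: each period, a customer churns with the same probability p.
# P(churn in month t) = (1 - p) ** (t - 1) * p: survive t - 1 months, then churn.

def geometric_churn_pmf(p, t):
    """Probability that a customer churns in month t (t = 1, 2, 3, ...)."""
    return (1 - p) ** (t - 1) * p

p = 0.1  # hypothetical monthly churn probability, not from the data
probs = [geometric_churn_pmf(p, t) for t in range(1, 5)]
print(probs)  # each month's churn probability shrinks by a factor of (1 - p)
```

Each successive month's probability is the previous one multiplied by the retention probability (1 - p), which is exactly the pattern we'll see in the spreadsheet.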

So we're going to start by estimating this model using Excel, estimating what the retention or churn probability is.

Â 1:39

We'll compare actual to expected data. And then, if that's not enough and we have to do a little bit better, we'll see what we can do as far as allowing that churn or retention probability to vary from one time period to the next, making the model a little bit more realistic.

Â 2:09

continue to have service in each time period out of this cohort. Let me zoom in so that what we're doing is a little bit clearer. What we've done in Column C already is taken the difference between the number of individuals who had service in the previous period and the number of individuals who have service in the current period, and that tells us how many individuals cancelled service, or how many individuals churned. And so, for each of those individuals who have churned, we're going to model the probability of doing so after one month, after two months, after three months, after four months, and so forth. That's what we're going to do in Column D: calculate out that probability. We're then going to calculate the log likelihood.

Â 3:16

Well, we'll take into account the individuals who have not yet cancelled service and calculate our sample log likelihood. And we're going to use that to find the value for P. We can set this up either as a retention or a churn probability, it doesn't make a difference, but let's say I want to set this up as a churn probability. What is the value that maximizes the likelihood of the observed data? All right, so let's begin by specifying the probability that an individual cancels service after one month. Well, if we're going to specify it in terms of a churn probability,

Â 4:33

all right, well, let's write out these probabilities first, and maybe we can see a pattern that emerges. It's going to be one minus our churn probability, times the churn probability, to say I survived one month and then churned.

Â 5:23

So when it's "I survived one month," that's one minus the churn probability times the churn probability: I survived one month and then I churned. In the next cell it was, I survived two months.

Â 5:43

So that's why we have it squared and then I churn.

Â Well, in the next one is going to be I survived three months and then I churn.

Â So all that's going to be changing in the next cell is going to be instead of

Â raising the one minus churn probability to the power of 2,

Â it's actually going to be raised to the power of 3.

Â And that's the pattern that we're going to see time and time again.

Â So let's actually, go back to sell D3, and rewrite this a little bit.

Â And the way that we can rewrite this in a more general fashion is,

Â there's the churn probability multiplied by

Â 1 minus the churn probability, so that's going to be the retention probability.

Â 6:43

So what does that look like if we drag? I survived t - 1 periods. So in this row, I survive zero periods and then I churn. In row 4, if we drag this formula down,

Â 7:07

All right, so in row 4, it's: I survive A4 - 1. So I survive one period, and then I churn. In the next row, it's: I survive A5 - 1, so 3 - 1. I survive two periods and then I churn. So the 1 - K1 is my retention probability, raised to the power of however many periods I'm retained, and then it's multiplied by my churn probability. All right, so that's the geometric model.

Â 7:37

Now, in terms of specifying the log likelihood, well, we have 131 individuals who exhibited this behavior. So, to construct the contribution to the log likelihood, it's going to be the number of individuals who exhibited this behavior, multiplied by the natural log of the probability.

Â 8:01

And we're just going to copy that formula. Now, the reason we're using the natural log is that using the raw probabilities, we'd end up with a number that's incredibly small: if it were 0.5 for each of these 131 individuals, that'd actually be 0.5 raised to the 131st power. That's an incredibly small number. The more individuals we have, the closer this joint probability gets to zero, until the computer can't actually differentiate it from zero. So that's why we use a mathematical trick: instead of maximizing the likelihood, we maximize the log likelihood.
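The numerical point being made here can be demonstrated directly; the 0.5 probability and 131 individuals follow the illustrative numbers in the transcript:

```python
import math

p = 0.5   # illustrative per-individual probability from the example above
n = 131   # individuals exhibiting the behavior

likelihood = p ** n                # raw joint probability: about 3.7e-40
log_likelihood = n * math.log(p)   # same information, numerically safe

# The raw product is still representable at n = 131, but with a few thousand
# individuals it underflows to exactly 0.0, while the sum of logs stays finite.
print(likelihood)
print(log_likelihood)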

Now, what we've done in cells E3 through E9 is take into account all the people we observed to churn. What we also need to take into account is the number of people who have not yet churned. All right, and so if we look at all the individuals who have been observed to churn, that's 509, if we just look at that summation. We still have 491 individuals who were not observed to have churned.

Â 9:09

So these are individuals who survived seven time periods without churning, so let's take that likelihood into account now. It's going to be our 491 individuals in B9 multiplied by, now, the probability associated with not having churned by this particular time period. Well, there are a couple of ways that we could do this. We could say it's the retention probability, 1 minus the churn probability, raised to the seventh power. We could also write it as the probabilities that we haven't enumerated yet, so it's one minus all of the probabilities that you have churned.
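The equivalence of those two routes is a standard identity for the geometric distribution, and easy to check (again with a hypothetical churn probability of 0.1):

```python
p = 0.1  # hypothetical churn probability

# Way 1: the retention probability raised to the seventh power.
direct = (1 - p) ** 7

# Way 2: one minus the sum of the seven enumerated churn probabilities.
enumerated = 1 - sum((1 - p) ** (t - 1) * p for t in range(1, 8))

print(abs(direct - enumerated) < 1e-12)  # both routes agree
```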

Â 10:34

So it doesn't matter which way you specify it, we end up with the same result. All right, and so what's the log likelihood of our sample? Well, we're just going to take the sum of the log likelihood contributions from each individual in column E.

Â 10:50

All right, so it's a big number that doesn't make a whole lot of sense to us on its own. But that's the number that we want to make as large as possible, so we're going to maximize that value. And to do that, let's see, it looks like we have not added our Analysis ToolPak or Solver yet, so let's make sure that we have those tools enabled. So, Options > Add-ins, and we're going to manage our Excel add-ins and just make sure that we've put Solver on here, all right.
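What Solver does here can be mimicked outside Excel. A minimal sketch follows; note that only the 131 first-month churns, the 509 total churned, and the 491 survivors are stated in this walkthrough, so the in-between monthly counts below are made up for illustration:

```python
import math

# Hypothetical monthly churn counts: only 131, 509, and 491 come from the
# walkthrough; the other counts are invented so the columns sum correctly.
churned = [131, 126, 90, 60, 42, 34, 26]  # churned in months 1..7 (sums to 509)
censored = 491                            # still active after 7 months

def log_likelihood(p):
    # Churners: survive t - 1 periods, then churn, weighted by head counts.
    ll = sum(n * math.log((1 - p) ** (t - 1) * p)
             for t, n in enumerate(churned, start=1))
    # Survivors: each contributes (1 - p) ** 7.
    ll += censored * 7 * math.log(1 - p)
    return ll

# Crude stand-in for Solver: evaluate a fine grid and keep the best p.
grid = [i / 10000 for i in range(1, 10000)]
p_hat = max(grid, key=log_likelihood)
print(round(p_hat, 4))
```

With these invented counts the maximizer lands near 10%, in line with the churn probability reported in the video; with the real Column C counts the exact value would differ slightly.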

Â 12:20

And we're done, right? So our churn probability is about 10%, which means we've got a retention probability of about 90%. And so what Column D now gives us is that we can calculate out, out of the original 1,000 people, under this model, we would've expected to see about 10% churn in the first month, 9% the next month, 8% after that, 7% after that, and so forth.

Â 13:28

These are the observed probabilities that we had, and these are the probabilities that we calculated for churn in a particular time period. And let's take a look at how well [INAUDIBLE] we underpredict churn in the first couple of periods, and then it looks like we're overpredicting the amount of churn. So even in this calibration period, we're not doing too well, all right?

Â 14:46

This is the actual, and we'll call the other series Expected. So under this particular model, notice what we're doing: we underpredict churn early on and overpredict it later on. There's no way this model is going to work well from a forecasting standpoint, because it's not even doing that well in the calibration period; we're essentially splitting the difference. We were trying to find that middle ground, so we overpredict in some periods and underpredict in other periods. That's because this is a very restrictive model, right?

Â 15:26

Now, if you're looking for a little bit more commentary on each of the steps that we took, you'll find it on the next page in the spreadsheet, where I've annotated each of the steps along the way, all right.

So let's try to relax some of these assumptions. That is, rather than assuming there's a constant churn probability in each time period, let's assume that there's a time trend: maybe churn becomes less likely over time or more likely over time, but it varies over time. All right? So we're going to use the logistic regression that we've looked at previously as the foundation for that, and focus on this Time Trend worksheet. It's the same data that we had before, set up a little bit differently. So I'm going to walk you through how we're going to proceed here, right? We're going to calculate out the survival probability first, and then we're going to calculate the difference between the survival probabilities in each pair of adjacent time periods.

So, if I maintain service until time one, what's the probability of that happening? What's the probability of maintaining service until time two? If I take that difference, that's going to tell me how likely I am to drop service after the first time period. So that's what we're going to calculate here. We'll then calculate the log likelihood for each observed churn period, and then aggregate that up to the sample log likelihood. Notice in this case we have two parameters, which I'm calling alpha and beta. We're going to treat alpha as an intercept and beta as a slope. We're going to essentially use logistic regression: alpha is going to be the intercept, beta is going to be the slope for our time trend. Alright? And so let's start out by calculating the survival probability, all right? Well, for the survival probability, in order to make it to the next time period, you had to survive until the period before. So we're always going to refer back to the previous period, to say

Â 17:48

by the probability of surviving in this current time period. All right? I'm going to write this a little bit differently from how we traditionally do logistic regression: we're going to use logistic regression to model the tendency to churn. Typically we're going to have e to the x divided by 1 + e to the x. In this case, if that's our churn probability, the retention probability is going to be 1 over 1 + e to the x. So what we're going to do is multiply the probability of surviving into the previous period by one divided by one plus the exponential function of alpha, our intercept, plus our slope multiplied by the time trend.

Â 19:05

So that's the reason that this is going down to 0.5. Now, what's the next period? If I drag the formula down, it goes to 0.25. And let's take a look at the formula itself: it's D3, so I survive at least through the end of the previous period, and it's multiplied by the probability that I survive the current period. So that's the logic in this: I survived through the end of the previous period, multiplied by the probability of surviving again.

Â 19:39

All right, now let's use those probabilities to calculate the probability that an individual churns in each particular time period. It's the survival based on the previous period minus the survival based on the current period. And so we're going to use that logic of taking the difference in the survival function to calculate out the probability of surviving until the present and churning in the present time period.
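The survival recursion and the differencing step can be sketched as follows. The alpha of about -1.55 matches the fitted value reported later in this walkthrough, but beta is only described as negative, so the -0.25 here is a made-up illustrative value:

```python
import math

def retention(alpha, beta, t):
    """Per-period retention probability: 1 / (1 + exp(alpha + beta * t))."""
    return 1.0 / (1.0 + math.exp(alpha + beta * t))

def survival_curve(alpha, beta, periods):
    """S(t): probability of still being a customer after t periods."""
    surv, s = [], 1.0
    for t in range(1, periods + 1):
        s *= retention(alpha, beta, t)  # survive the previous periods AND this one
        surv.append(s)
    return surv

alpha, beta = -1.55, -0.25  # alpha from the walkthrough; beta is hypothetical
S = survival_curve(alpha, beta, 7)

# Churn-in-period-t probabilities are differences of adjacent survival values.
churn_probs = [1.0 - S[0]] + [S[t - 1] - S[t] for t in range(1, 7)]
print(churn_probs)  # decreasing over time, since beta < 0
```

Note that with alpha = beta = 0 every per-period retention is 1/(1 + e^0) = 0.5, which is why the starting values in the worksheet run 0.5, 0.25, and so on before Solver is run.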

All right? And so we have those probabilities; this is the cumulative probability of churning by the end of the seventh period. Notice that this is just shy of one, because there's some chance that people do not churn by the end of that seventh time period.

Â 20:58

So we can calculate the likelihood contribution for the individuals who we observed to churn. And also, don't forget the people who we observed to maintain service throughout the entire observation period,

Â 21:51

Okay, so we're going to maximize our objective, cell L3, the log likelihood. We want it to be as big as possible by changing cells L1 and L2. We have no constraints: alpha and beta could be positive, they could be negative, all right? And so we're going to make sure that the box that says Make Unconstrained Variables Non-Negative is not checked. I believe the default in some versions of Excel is for that box to be checked, and we don't want that to be the case.

Â 22:30

All right, so alpha is -1.55. Again, in a raw sense, these numbers don't really tell us all that much. Beta being negative, what does that tell us? Well, let's look back to our survival probability equation. So as time increases,

Â 23:33

the exponential term gets closer and closer to 0, so this fraction gets bigger and bigger, right? And so churn is decreasing over time, effectively, is what we're finding here. Recall back, we'd said earlier that

Â 23:57

the way we set up this logistic regression expression was: if we think of the churn event as happening with probability e to the x over one plus e to the x, the retention event happens with probability one over one plus e to the x. So churn is decreasing in time.
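Those two expressions are complementary by construction, which is worth a quick sanity check; the value of x below is a hypothetical alpha + beta * t, not a number from the worksheet:

```python
import math

# churn happens with probability e**x / (1 + e**x),
# retention with probability 1 / (1 + e**x); the two must sum to one.
x = -1.8  # hypothetical value of alpha + beta * t
churn = math.exp(x) / (1 + math.exp(x))
retention = 1 / (1 + math.exp(x))
print(churn + retention)  # 1, up to floating point
```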

So if we want to see how this performs overall, let's set up our calculation for

Â 24:27

how many customers remain. We can do this in two ways. The way that we've set this up is for the number of remaining customers, and for that we can use our survival function. We take our 1,000, we've always got 1,000 customers, and we just multiply that by the expected survival probabilities.
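That last step, converting survival probabilities into expected customer counts, can be sketched like this. As before, alpha of -1.55 follows the fitted value in the walkthrough, while beta = -0.25 is a made-up illustrative value:

```python
import math

alpha, beta = -1.55, -0.25  # alpha from the walkthrough; beta is hypothetical
cohort = 1000               # the original 1,000 customers

expected = []
s = 1.0
for t in range(1, 8):
    s *= 1.0 / (1.0 + math.exp(alpha + beta * t))  # per-period retention
    expected.append(cohort * s)                    # expected customers after t periods

print([round(e) for e in expected])  # a declining count, flattening over time
```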

Â 26:24

It looks pretty good. I mean, compare this to what we saw with the geometric distribution, and we're doing a heck of a lot better. So, what we've done is allow the churn probability to change over time. In particular, what we find is that churn is decreasing over time. All right, so it doesn't make sense for us to assume that there's a constant churn rate.

Â 26:46

You'll find on the next tab in the workbook, in the more heavily annotated version of this spreadsheet, that you get the same kind of results. And what I've done on the last tab is just give a comparison, and I included data that goes a little bit further out into the forecasting period. All right, so going into the forecasting period, we see that, again, we're doing a much better job than the geometric model was capable of.

Â 27:25

So we started with as simple a model as possible, saying let's just assume it's a coin flip: each time period, there's a constant probability of churn. That was our geometric model. And we saw, okay, it's not great; it's essentially splitting the difference. When we look at the overall forecast, we see that the geometric model just doesn't do that well for us. Yes, it captures the overall shape, but it systematically over-predicts churn in some time periods and under-predicts it in others, whereas with the model that allows for a time-varying churn rate, we end up with a very much better forecast.

Now, one of the things we'll talk about a little bit later in the course, something that's kind of pushing the limits of Excel, is what's driving this dynamic. There are two potential explanations. One is that individuals are literally changing over time, becoming more and more loyal, less and less likely to churn. The other explanation is that the individuals who remain later on are fundamentally different from the individuals who remain early on. So, one explanation is dynamics in the churn probability; the other explanation is heterogeneity across the user base. The ideal model is going to combine both of those factors. We'll talk a little about unobserved heterogeneity, the importance of heterogeneity, later in the course, but again, it's something that is really pushing the limit of what Excel can do, and it's better suited to a computing environment such as R or MATLAB.

But in terms of the accuracy of the model, just bringing in a simple time trend here, a linear time trend, does much better than ignoring that time trend. And so we are able to pick up some of the dynamics that occur in that churn rate.