0:05

All right, so we've talked a little bit about binary choice

modelling using Excel as our software package.

We're going to still use that choice modelling approach as our foundation.

But let's look at how we would apply that to timing data,

or to build duration or timing models.

So there are a couple of different places that duration data is going to show

up in marketing.

A lot of it shows up in services research.

So looking at how long until a prospect takes action, looking at the impact of

customer satisfaction, the impact of service encounters on customer retention.

Anything looking at customer retention really relies on these duration models.

In some behavioral research, we've looked at response latency.

How long, after you're exposed to a stimulus,

does it take you to respond accordingly?

When we talk about new product forecasting,

we're going to look at a dataset a little bit later on

that says, based on the behavior of individuals,

based on how long it took them to acquire the product,

can we forecast how much of the population is ultimately going to try this product?

And ultimately, in customer-based analyses,

we're trying to build models that allow us to get a customer evaluation, identify

which of your customers have lapsed, which of your customers are still active.

All of these are built on duration data, built on those interactivity times.

And we're going to use that choice model framework to build discrete timing

models today.

1:42

All right, so let me give you an example of where this comes into play.

So this is based on some published research that

looks at a cohort of customers who began service

at a telecommunications provider at the same time.

And we're trying to forecast out how many of those customers are left.

And so if we look seven months out, it looks like we're around 50%.

Well, what does this curve look like in months 8, 9, 10, 11, and 12?

Can we forecast that out?

Now, one simple approach would be to say, all right, well,

let's run linear regression.

Linear regression, as we're going to see in a second,

has a problem.

Because linear regression says, I'm going to fit a line to months zero

through seven, and the trajectory of that line is going to continue.

There are a number of different functional forms that we can try,

2:47

but what happens in month eight?

What happens in month nine?

Well, it turns out that in a lot of those cases,

those forecasting models don't do a very good job.

We have some models that, even though we're dealing with survival curve projections,

start to go up again because of the functional form that was chosen.

We have others, like the linear model, that keep on going down at that same rate.

And so none of these do a particularly good job of being able to

forecast out what customer retention looks like in the future.

Even though all of them have good R-squared values,

none of them did a good job of forecasting the future performance or

the future decisions of this cohort of customers.
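To make the extrapolation problem concrete, here is a minimal Python sketch. It does not use the telecom data from the published research. Instead it assumes a made-up cohort whose retention follows a constant monthly churn probability of 0.1, fits a straight line to months 0 through 7, and compares the line's forecast with the assumed curve. The function names and the 0.1 value are illustrative assumptions.

```python
# Hypothetical retention curve: constant 10% monthly churn (an assumption,
# not the telecom data discussed in the lecture).
ASSUMED_CHURN = 0.1

def true_retention(month):
    """Share of the assumed cohort still active after `month` months."""
    return (1 - ASSUMED_CHURN) ** month

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

months = list(range(8))                       # calibration window: months 0..7
retention = [true_retention(m) for m in months]
intercept, slope = fit_line(months, retention)

def linear_forecast(month):
    """Extend the fitted line beyond the calibration window."""
    return intercept + slope * month

# Compare the line with the assumed curve outside the calibration window.
for m in (8, 12, 20):
    print(m, round(linear_forecast(m), 3), round(true_retention(m), 3))
```

Under these assumptions the line keeps falling at the same rate and eventually forecasts negative retention, while the assumed curve flattens out.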

So can we build something based on a simple model that, when we start putting

all of the pieces together, allows us to get very good forecasts?

Well, that's going to be the goal.

All right,

a couple of things to keep in mind when we're dealing with timing models.

And this does not come up in any of the other forms of data that we

have talked about.

Well, for timing models,

we only observe actions taken during a specific period of time.

So for example,

let's say that we're looking at customer retention for a 12-month period.

Well, we observe all the customers who dropped service

during that 12-month period.

We also observe a set of customers at the end of 12 months who still have service.

Well, that issue is referred to as right-censoring, all right?

I observe data during this particular window, 0 to T.

What happens after T?

I have no idea.

Left-censoring is a different problem:

we've only observed beginning at a particular point in time.

We observe everything that happens after that.

Well, we don't get to observe what happened before that.

So suppose that we're looking at a queue.

People are lined up at a customer service window, and

we have some people who are in that line, and we know what time we got there.

We have no idea what time the people who were there before us got there.

So we have a minimum guess for how long they've been there, but

we don't know the exact time. That's the issue of left-censoring.

With interval-censoring, we know that something happened in a particular interval of time.

Let's say within a particular hour or within a particular 15-minute chunk,

but we don't have it down to the exact second.

That's going to be more common for us to have to deal with,

just in terms of the nature of the data that's coming in.

If you're dealing with clickstream data, that's something that we might have.

We might intentionally group observations together into

coarser units to simplify our analysis.

And in fact, in the examples that we're going to be looking at,

that's what we're going to do: we're going to assume

discrete intervals of time rather than continuous time.

But with general timing models, you can account for all of these forms of censoring.
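As a rough illustration of grouping continuous durations into discrete periods, here is a small Python sketch. The function name, the 30-day period width, the 360-day window, and the example durations are all assumptions made up for this sketch, not values from the lecture.

```python
import math

def to_discrete_period(duration_days, window_days, period_days=30):
    """Map a continuous duration to (period_index, event_observed).

    duration_days: days until the event, or None if it was never observed.
    window_days:   length of the observation window.
    period_days:   width of one discrete period (30 is an arbitrary choice).
    """
    if duration_days is None or duration_days > window_days:
        # Right-censored: all we know is the customer lasted the whole window.
        return window_days // period_days, False
    # Interval-censored by design: keep only which period the event fell in.
    return max(1, math.ceil(duration_days / period_days)), True

# Made-up durations in days; None means still active when observation ended.
example_durations = [12, 45, 200, None, 400]
discretized = [to_discrete_period(d, window_days=360) for d in example_durations]
print(discretized)
```

Each result pairs a period number with a flag saying whether the event was actually seen, which is the shape of data the discrete-time model needs.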

5:52

So let's begin by building as basic a model as we can.

And that is, let's assume that in a given month,

customers have a probability theta of cancelling service.

Well, the flip side of that

is that they're going to keep service with a probability of 1 - theta.

All right, so each month, the customer makes a decision.

But for the customer to make that decision in month t, he

had to survive, he had to decide to keep service for the first t - 1 months.

All right, so what's the probability that a customer drops service in month t?

All right, well for month t, there's a probability

theta, but what about months 1 through t - 1?

Well, that customer had to keep service in all of those months,

so we're going to multiply it by the probability (1 - theta) of

keeping service, raised to the power of t - 1.

So that gives us the probability that a customer drops service in month t.
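Written out as a formula, the verbal argument above gives the following. The notation is just one way to write it down.

```latex
P(\text{drop in month } t \mid \theta) = \theta \,(1 - \theta)^{t-1}, \qquad t = 1, 2, 3, \ldots
```

As a purely illustrative check, if theta were 0.1, the probability of dropping in month 3 would be 0.1 × 0.9² = 0.081.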

7:02

And if we were to look at the possibilities for the data that we're going

to observe, either we observe the month in which customers do drop service,

and that's our likelihood function for those data points.

The other possibility is that customers are right-censored.

They keep service until the end of our observation period.

So if the length of my observation period is T,

some customers are going to keep service for that entire length of time, all right?

So those are the ones who still have service at the end of our

observation period.

This is what's referred to as a shifted geometric distribution,

the shifted part because there is no t equals 0 in this distribution.
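For the right-censored customers, the same logic gives a survival probability: keep service in each of the T months of the observation period. The symbol S is notation introduced here for convenience. The second line is a sanity check that dropping in one of months 1 through T, or surviving all T months, together cover every possibility.

```latex
S(T \mid \theta) = P(\text{still has service after month } T \mid \theta) = (1 - \theta)^{T}

\sum_{t=1}^{T} \theta \,(1 - \theta)^{t-1} + (1 - \theta)^{T} = 1
```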

So we can estimate this model using any statistical software package.

We're going to do this using Excel, using the Solver tool that's built in.

But for every data point we have,

we can specify the likelihood associated with that data point.

So for the customers who drop service in month 1, in month 2,

in month 3, all the way through month T - 1 of our observation period,

and even month T of our observation period,

this is the formula that describes their likelihood.

For that set of customers who held onto service and

didn't drop it by the end of our observation period,

this is the likelihood that we're going to use for those customers.
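The two likelihood pieces can be combined into one log-likelihood and maximized over theta. The lecture does this in Excel with Solver; the Python sketch below is only an analogue, using a simple grid search in place of Solver. The function names and the toy cohort counts are made up for illustration and are not the lecture's data.

```python
import math

def log_likelihood(theta, drops, n_censored):
    """Log-likelihood of the shifted geometric model.

    drops[i]   = number of customers who dropped service in month i + 1.
    n_censored = customers still active after all len(drops) months.
    """
    last_month = len(drops)
    total = 0.0
    for month, count in enumerate(drops, start=1):
        # Dropped in this month: theta * (1 - theta)^(month - 1)
        total += count * (math.log(theta) + (month - 1) * math.log(1 - theta))
    # Right-censored: survived every month, (1 - theta)^last_month
    total += n_censored * last_month * math.log(1 - theta)
    return total

def fit_theta(drops, n_censored, grid=10_000):
    """Grid search over (0, 1); a stand-in for Excel's Solver."""
    candidates = [i / grid for i in range(1, grid)]
    return max(candidates, key=lambda th: log_likelihood(th, drops, n_censored))

# Hypothetical toy cohort of 1000 customers observed for 3 months.
toy_drops = [100, 90, 81]
toy_censored = 729
theta_hat = fit_theta(toy_drops, toy_censored)
print(theta_hat)
```

The toy counts were constructed to be consistent with a 10% monthly churn rate, so the search should land at or very near 0.1.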

8:48

So between months 0 and 1, how many customers did we lose?

Well, we're going to have a difference of 131 customers

who got rid of service after one month.

All right, what about between months 1 and 2?

We go from 869 to 743, so we've got 126

customers who dropped at the second possible time

that they could have.

Then we keep on going down, looking at these differences.

In the next group, we've got 90 of our remaining

subscribers who cancelled service at that third opportunity, all right?

And then at the end of 7 months,

here are our 491 customers who still have service after month 7.

All right, so for all of the ones who dropped service,

we know the likelihood associated with them dropping at a particular time.

For these 491 customers, there's a different likelihood function

associated with them because they survived all 7 months.
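The differencing step can be sketched in a few lines of Python, using only the counts mentioned above. The starting cohort size of 1000 is not stated directly; it is inferred here from 869 survivors plus 131 cancellations. The counts for months 4 through 6 are not given, so they are left out rather than guessed.

```python
# Survivor counts stated in the lecture. The starting size of 1000 is an
# inference from 869 survivors plus 131 cancellations, not a stated figure.
survivors = [1000, 869, 743]          # customers active after months 0, 1, 2

# Cancellations in each month are the drops between consecutive counts.
losses = [before - after for before, after in zip(survivors, survivors[1:])]

# The lecture states 90 cancellations in month 3, which implies the next count.
survivors_after_month_3 = survivors[-1] - 90

still_active_after_month_7 = 491      # the right-censored group
share_retained = still_active_after_month_7 / survivors[0]

print(losses, survivors_after_month_3, share_retained)
```

The retained share works out to 0.491, which lines up with the "around 50%" read off the curve at seven months.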

All right, so we're going to head over to Excel

and use that to estimate that probability.

Â