Welcome back. I hope that Regu's content on regression and related techniques makes sense to you. I hope it's pretty clear now how we can take a bunch of data from, let's say, period one, whether it's past behavior or marketing activities, competition, whatever, to predict something about period two, whether it's the number of purchases or whether someone stays with us or not. It's really, really important to be able to do that. And fortunately those techniques are common. They're very accessible. You don't necessarily have to have special software. You can do it in something as simple as Microsoft Excel. In fact, I'm going to talk about an example where people were doing that kind of thing long before they had any of the computational power that we have today, or even the rich data that we have today. I want to take you back to the late 1960s, the early 1970s. It was the dawn of what today we would know as direct marketing. It really was when a lot of these ideas of customer analytics were born. It was the first time that we really had any kind of granularity about what particular customers were doing, and a desire to know what each and every one of those customers would be doing next, and for how long, and for how much money. And so it became very important for companies to come up with what we like to call KPIs, key performance indicators. Can we look at some indicators of what people had been doing in the past in order to make some accurate statements about what they're likely to do in the future? And again, this is just a natural area to run something like a regression model. And indeed, regression models were used for this kind of purpose. But it wasn't a matter of, let's just throw in tons and tons and tons of data. Part of it is that the data was limited. Part of it, as I said, is that our computational power was limited, so we had to think very carefully.
It was very, very important for us to come up with just a few measures that would be fairly predictive of what customers would be worth in the future. So our forefathers in direct marketing basically did the kinds of things we've been talking about here. Let's take our data set, let's chop it into two pieces, and let's collect some data from period one to see which elements of that period one data would be most predictive of what people did in period two. And again, period two would be looking at how many purchases they made or what the dollar value of those customers was. And they ran lots of models to try to find out which bits of data were most predictive. And they'd do it over and over and over again on lots of different data sets, for lots of different products, lots of different geographies, lots of different customer segments, because they wanted to find a few of those explanatory variables that were pretty robust, that time and time again would prove to be predictive. And this is where our forefathers in direct marketing came up with the idea of RFM: recency, frequency, monetary value. What they found, time and time again, back in the 60s and early 70s, and what we still see to be true today here in the 21st century, is that you can give me these three summary metrics. You give me recency, frequency, monetary value. You tell me the last time that someone made a purchase with me or did some other kind of economically valuable activity. Maybe they took a sales call. Maybe they visited the website. So they did something that suggests that they're going to become a more valuable customer. Generally we're talking about a purchase, so that's R, that's recency. Now tell me about frequency: tell me how many purchases they made, or how many economically beneficial activities they did, over a set period of time, let's say the last year or two. And third would be monetary value. And I think that's pretty much self-explanatory.
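To make the three summaries concrete, here is a minimal sketch of how recency, frequency, and monetary value might be computed from a period-one purchase log. The customer names, dates, amounts, and the `rfm` helper are all hypothetical, and monetary value is taken here as average spend per purchase, which is only one of the ways it could be defined.

```python
from collections import defaultdict
from datetime import date

# Hypothetical period-one purchase log: (customer_id, purchase_date, amount).
transactions = [
    ("alice", date(2023, 1, 15), 40.0),
    ("alice", date(2023, 6, 2), 60.0),
    ("alice", date(2023, 11, 20), 50.0),
    ("bob", date(2023, 3, 9), 200.0),
]


def rfm(transactions, period_end):
    """Summarize each customer as (recency_days, frequency, monetary)."""
    by_customer = defaultdict(list)
    for customer, when, amount in transactions:
        by_customer[customer].append((when, amount))

    summary = {}
    for customer, purchases in by_customer.items():
        last_purchase = max(when for when, _ in purchases)
        recency_days = (period_end - last_purchase).days  # R: days since last purchase
        frequency = len(purchases)                        # F: number of purchases in the period
        monetary = sum(amount for _, amount in purchases) / frequency  # M: average spend (assumed definition)
        summary[customer] = (recency_days, frequency, monetary)
    return summary


period_one = rfm(transactions, period_end=date(2023, 12, 31))
for customer, (r, f, m) in period_one.items():
    print(customer, r, f, m)
```

Each customer ends up as one row of three numbers, which is the kind of compact input the early direct marketers could work with.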
So when they did those economically beneficial activities, what was the overall or the average monetary value of each and every one of them? So if you can give me RFM, recency, frequency, monetary value, I can make a very accurate statement about what that customer's going to be worth in period two. And again, this was one of the first areas where regression analysis was used in marketing. It was one of the first ways for folks in marketing to say, you know what, all that data that we've been collecting, not really sure what to do with it. Whoa, there's real value there, we can really predict stuff, and then we can start to change our business to take advantage of these insights about what's likely to happen in the future, not just what happened in the past. So I just want to put RFM out there as just one very nice example of an application of the kinds of things that Regu was talking about. And now I want to go one step further. So we can run these regression models and we can take whatever data we have. Again, we can start with something as simple as RFM, and we can bring in many, many more kinds of measures, much more complicated, much more interesting, and make statements about what's likely to happen in period two. And again, if all you're interested in is making statements about period two, how many purchases are going to happen in the next year, who's going to churn or not, then regression-type models are fine. In fact, you can't do better than regression-type models, or different types of data mining that might be out there. But what happens when you want to go beyond period two? What happens when you want to make statements about period three or period four? Or what happens if you want to talk about something like customer lifetime value? Well, we don't want to limit our statements just to what a particular customer is going to do over the next year.
But if we want to go out there and acquire customers, if we want to figure out what's the maximum amount that we should be willing to spend on a customer, we can't limit ourselves just to how much they're going to pay us, how much profit we'll get from them in the next period. We need to project that out way into the future. And the problem is, regression-type models are fairly limited in their ability to do that kind of thing. And let me try to explain why. Let's go back to the timeline that I described before. All right, we get all of this data in period one to make a statement about what we see in period two. And we run a regression model to predict sales as a function of visits to the website, usage of social media, marketing activities, everything under the sun. That's great, but what happens if we want to make statements about period three? Well, if all you want to do is make statements about period three, that's not so bad. You'll say, wait a minute, wait a minute, wait a minute! I have this data on period two. Instead of using period two as my dependent variable, the thing that I want to explain in my regression, why don't I look at period two and get my explanatory variables from it? Why don't I look at the visits to the website, the marketing touches, the RFM? I have period two, so let me take all the period two data now to try to make a prediction about what will happen in period three. And hey, I already ran my regression, so I have my regression coefficients. I have all the outputs. I have everything that Regu was talking about. So let me just jam my period two data into that regression and make statements about period three. You see? I can predict the future, and that's great. And if you want to go one period out, terrific. But what happens if you want to go to period four? We don't have any data beyond period two. We don't have any x variables from period three in order to predict period four. What are we going to do there?
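The roll-forward idea can be sketched in a few lines. This is only an illustration under stated assumptions: the numbers are made-up toy data, the explanatory variables are just recency and frequency, and `fit_ols`, `predict`, and `forecast_next` are hypothetical helper names, with a small hand-written least-squares solver standing in for whatever regression tool you would actually use.

```python
def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(coefs, x):
    """Apply fitted coefficients to one customer's explanatory variables."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], x))


def forecast_next(coefs, observed_features):
    """Forecast the next period, which requires observed inputs for this one."""
    if observed_features is None:
        raise ValueError("no observed inputs for this period; cannot forecast the next one")
    return [predict(coefs, x) for x in observed_features]


# Toy data (hypothetical): period-one (recency_days, frequency) per customer,
# and the number of purchases each customer made in period two.
period1_x = [(30, 2), (200, 1), (10, 5), (90, 3), (5, 4)]
period2_purchases = [3, 0, 5, 2, 4]

coefs = fit_ols(period1_x, period2_purchases)

# Period two has now been observed, so its features can go into the same
# fitted regression to make statements about period three.
period2_x = [(20, 3), (365, 0), (7, 5), (60, 2), (14, 4)]
period3_forecast = forecast_next(coefs, period2_x)
print(period3_forecast)

# Period four would need period-three features, which do not exist yet.
period3_x = None
try:
    forecast_next(coefs, period3_x)
except ValueError as err:
    print("period four:", err)
```

The point of the last few lines is the limitation itself: the coefficients are still there, but without x variables from period three there is nothing to feed into them.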
How far out into the future can we go? The problem with regression-type models is that they're limited: if you don't have any data to use as inputs into the model, then you can't get the outputs. So no matter how long your observation period might be, you're limited as far as how far into the future you can make statements. Now, in many cases, this isn't a problem. For many kinds of decisions that companies want to make, simply being able to make statements about one, maybe two periods out is perfectly fine. In fact, you might say that for most decisions it's perfectly adequate, and these limitations of regression aren't going to be a problem. And I agree. But there are times, especially when we want to ask when-type questions, or long-run-type questions, like customer lifetime value, which I mentioned already. It's one thing if we want to make a statement about whether this customer is going to churn in the next period or not; regression models are going to be great for that kind of thing. But what if I want to ask a question instead: when will this customer churn? If they survive through the next period, how many more periods will they survive? Regression won't really work well when we're projecting way outside of the range of data that we had in the first place to run the original model. So we want to be able to make these longer-run projections, and I'm going to keep coming back to customer lifetime value as one very, very nice, very, very practical example of something that we're going to want to do over a longer period of time. And today, firms are talking much more about customer centricity: we want to figure out who the right customers are, and we're willing to invest in them because they're going to be so worth it in the long run. We need to have some visibility into the long run. We need to be able to make these predictive statements about the long run in order to see if those investments are justified.
So there's much more interest than ever in being able to make statements beyond period two. And so I want to talk about a very different kind of modeling approach. It's not nearly as popular as regression models are, but it's not necessarily any more complicated. And as our view of the future goes further and further out, it becomes more and more important to add this other kind of modeling approach to your toolkit. And that's what we'll do next.