The Zodiac experience was great, not only commercially, but from a research standpoint as well, and again it's just wonderful when the research you're doing can drive commercial application. But the commercial applications themselves say, "Wait a minute. We've got to change the models." Now, we didn't change them drastically. Again, what you've learned, what you've experienced, and hopefully have had a chance to practice in the course so far, that is the heart and soul of what Zodiac was and some of the work that I'm going to be talking about as we move ahead. But there were certain bells and whistles missing from the basic models that we found out we needed. Let me give you a specific example. When I look at the sales pattern for a company, if we look at a cohort of customers who we acquired at a certain time, and we watch them making their repeat purchases, we see the same basic patterns that we saw before. For that cohort of customers, their sales, their donations, whatever activity we're talking about, just tend to go down over time. Now, it can go up and down. Either randomly, sales could go up for a while, or maybe it's seasonality, or maybe there's some change in competition, or maybe there's some promotion or marketing activity. It turns out that for a lot of organizations, it's not just a slow drop in this buy-till-you-die world; there are a lot of peaks and valleys. Now, when I run the models in most of my academic work, I'd say, don't worry about that. That's just noise, no big deal. All we want to do is capture that basic baseline of how sales for the cohort are going to decline over time, and then project that way out into the future to have a sense of what the lifetime value will be for each one of those customers, or for the cohort as a whole. It turns out that life is a little bit more complicated. For instance, like I said, I've ignored all those peaks and valleys, but then we came up with a way to accommodate them. 
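To make that baseline cohort decline concrete, here's a tiny simulation sketch in Python. The coin-flip story, the probabilities, and the cohort size are all made up for illustration; this is not the actual Zodiac model, just the simplest possible buy-till-you-die mechanism:

```python
import random

random.seed(42)

N_CUSTOMERS = 10_000   # one acquisition cohort (illustrative size)
N_PERIODS = 20

def simulate_cohort(n, periods, buy_prob=0.4, death_prob=0.1):
    """Toy buy-till-you-die story: while 'alive', each customer flips a
    buy coin each period; each period they may also 'die' (churn
    silently) and never purchase again. Probabilities are made up."""
    sales = [0] * periods
    for _ in range(n):
        alive = True
        for t in range(periods):
            if not alive:
                break
            if random.random() < buy_prob:
                sales[t] += 1
            if random.random() < death_prob:
                alive = False
    return sales

sales = simulate_cohort(N_CUSTOMERS, N_PERIODS)
```

Even with no seasonality, promotions, or competition in the simulation, aggregate cohort sales drift steadily downward, simply because a growing share of the cohort has silently died off. The peaks and valleys in real data sit on top of exactly this kind of baseline.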
Number 1, in some cases, you just couldn't ignore them. In some cases, the sales patterns were so messy that we needed a way to pull out some of those spikes and account for some of those valleys. For some companies, seasonality, the Christmas season, is so strong that it would be a big mistake to ignore it; you can't just assume that all of a sudden customers are waking up and loving the product, then going back to sleep again. It turns out that once you bring time-varying covariates, like seasonality, like marketing activities, like competition or macroeconomic factors, once you bring them into the model, there actually are some changes. You see, the basic models, the ones you've experienced so far, will continue to forecast very well in the aggregate. But if you want to go a step further and start to make statements about different customers (remember Bob and Sarah, Mary and Sharmila, remember all that stuff?), if we want to make statements about individual customers, and we ignore a lot of these issues that I'm talking about, it turns out that the model will stretch out the heterogeneity, the differences across the customers, too much. It'll make the good customers too good. It'll make the bad customers too bad, because the model isn't able to capture those differences over time. So it's actually quite important to bring in time-varying covariates, and we've done that. It turns out there's a bunch of academic work where other people have been attempting to do the same thing. On the one hand, maybe it doesn't matter exactly how you do it; if you have a big issue like seasonality, then as long as you're accommodating it, you're okay. Although we find that there are slightly better ways to do it. 
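One common way to let covariates shift a purchase propensity is through a logistic link, so the probability stays between 0 and 1. Here's a minimal sketch under that assumption; the covariate names, weights, and baseline are invented for illustration and are not the specific formulation used at Zodiac:

```python
import math

def buy_probability(base_logit, covariates, weights):
    """Logistic purchase propensity with time-varying covariates
    (seasonality, promotions, ...). Names and weights are illustrative."""
    z = base_logit + sum(w * x for w, x in zip(weights, covariates))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical covariates: [holiday_season, promotion_running]
weights = [1.2, 0.5]   # made-up effect sizes

p_base     = buy_probability(-1.0, [0, 0], weights)  # ordinary month
p_december = buy_probability(-1.0, [1, 0], weights)  # holiday season
```

The point is that the December spike is explained by the covariate, not absorbed into the customer's latent type, which is exactly why leaving such covariates out stretches the inferred heterogeneity too far.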
For instance, without getting into too many of the technical details, we found ways at Zodiac that we could bring in those time-varying covariates in a way that really wouldn't hurt the computational aspects, so we can continue to run the models really fast and on the same data inputs as we had before. So there are some subtle technicalities to care about, but the big thing is dealing with those time-varying covariates. But wait, there's more. For instance, another aspect of the model, if you remember the stories that we've been telling, is that people flip their coin, heads I buy, tails I don't, and then next period, I flip the same coin. But there might be linkages from one period to another. There might be things like regularity. For instance, let's say we're talking about grocery shopping, and I always do my grocery shopping on Tuesday. So I could bring in a spike saying, "Tuesday is a special day, your propensity is much higher on that day." But it could be different for different people, and it might not be tied to a particular day of the week. It might be that I do my grocery shopping roughly every week. So if I went last Thursday, that means I'm going to go this Thursday, and then I might change from one week to another. So there could be some regularity there, this fairly regular cycle that exists over time. Of course, the length of that cycle and just how committed people are to following it might also vary from person to person. Or on the other side of it, there's something that Eric Bradlow, who you know from another module in this course, has done a lot of research on: the idea of clumpiness. For instance, we worked on a project with a streaming television company, and of course, as many of you know, there's a lot of binging going on when you're talking about streaming TV. 
If you watch some content on that channel today, there's a very good chance that you're going to watch it again tomorrow, much higher than your own buy-or-not-buy coin might suggest. So you tend to do things in clumps, and then you let go for a while, and then a new clump starts up. Clumpiness and regularity are, dare I say, opposite sides of the same coin. Sometimes it's beneficial to capture those as well. Sometimes it's absolutely necessary. In other cases, like I said before with respect to covariates, if you leave them out, the model tries to pick them up in other places, and this might mess up your forecast, mess up your inferences, mess up the marketing decisions that you might make as a result. So clumpiness and regularity would be another bell or whistle that we might consider. Number 3, I've mentioned many times now how we build these models at the cohort level. We look at a group of customers who we acquired at one point in time and watch them over time. Well, one of the important things that we'll do is look for cross-cohort effects. We can fit the model separately for the customers we acquired in January versus February versus March versus April. Wouldn't it be ideal if all those cohorts were very similar to each other? In which case we could get one set of parameters to describe each and every one of them and then say, you know what? Before the May customers are acquired, we know what they're going to look like too. Life doesn't work like that. Very often there are cross-cohort dynamics, and very often they tend to go in one direction. Generally, cohorts tend to get worse. If you think about it, when you start a company or you launch a new product, those first customers you acquire are amazing, they're great. They're the ones who are lining up around the block to buy that thing. Then the next group is slightly worse, the group after that, slightly worse. 
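One way the clumpiness literature quantifies this is with an entropy-style score on the gaps between events: perfectly regular spacing scores near 0, tightly bunched events score near 1. There are several formulations in the literature; the version below is a rough sketch, not necessarily the exact measure used in any particular paper:

```python
import math

def clumpiness(event_times, horizon):
    """Entropy-style clumpiness score on inter-event gaps.
    ~0 = perfectly regular spacing; values toward 1 = highly clumped.
    A sketch of one common formulation, not a canonical definition."""
    times = sorted(event_times)
    # n events carve the observation window (0, horizon] into n+1 gaps
    points = [0.0] + times + [horizon]
    gaps = [(b - a) / horizon for a, b in zip(points, points[1:])]
    n = len(times)
    entropy = -sum(g * math.log(g) for g in gaps if g > 0)
    return 1.0 - entropy / math.log(n + 1)

regular = clumpiness([2, 4, 6, 8], 10)        # evenly spaced viewing days
binged  = clumpiness([1, 1.2, 1.4, 1.6], 10)  # one tight binge, then nothing
```

The regular grocery shopper scores near zero, while the binge-watcher's events, crammed into one short stretch, push the score well up, which is exactly the behavioral signal an independent coin flip each period would miss.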
So we'll often see some degradation, not just within the cohort over time as you've already seen, but when we move from one cohort to the next, to the next. In other words, those downward curves will become a little bit steeper and will go down a little bit lower from one cohort to the next. So it's very important to capture and project cross-cohort effects. When I'm working with companies, very often I'll have startups coming to me and saying, help me out, and I'm saying, I'm not sure I can work with you, because the models, wonderful and terrific as they are, won't tell us much if I have only one or two cohorts of data. If I want to make long-run statements about the health of the customer base years from now, it's important for me to see, say, 10-12 different cohorts. So I can play connect-the-dots, really understand those differences across the cohorts, and start to make statements about what I think that next cohort is going to look like even before it's born, even before we acquire them. To the extent that we do something differently in our acquisition activities to try to come up with better customers, we have this natural baseline of what we think they're going to look like. We can see how well those new, different acquisition activities are doing, not relative to the customers we acquired last time, but relative to our best guess of the kinds of customers we would have acquired this time if we hadn't done those kinds of interventions. So those are tweaks to the model, and there are a few others as well, but again, all of these things are layered on top of the basic buy-till-you-die story, and there's plenty of academic work out there that I'm very happy to share with you. Last thing is the data inputs themselves. If you remember from what I've covered with you so far, we had data on each and every one of the customers making their decisions at each point in time. But sometimes we can't get it that way. Sometimes the company isn't able to collect or store the data at that level. 
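Here's a rough sketch of that connect-the-dots idea: fit a within-cohort decline curve, let its parameters drift from cohort to cohort, and use the drift to project a cohort that hasn't been acquired yet. The functional form and every number below are invented for illustration; real cross-cohort modeling is considerably richer than this:

```python
def cohort_curve(initial_rate, decay, periods):
    """Within-cohort decline: the purchase rate falls each period."""
    return [initial_rate * decay ** t for t in range(periods)]

def project_cohorts(first_rate, within_decay, cross_decay,
                    n_cohorts, periods):
    """Cross-cohort degradation sketch: each new cohort starts a bit
    lower and declines a bit faster (illustrative assumptions)."""
    curves = []
    rate, decay = first_rate, within_decay
    for _ in range(n_cohorts):
        curves.append(cohort_curve(rate, decay, periods))
        rate *= cross_decay    # next cohort starts lower...
        decay *= 0.99          # ...and falls off a bit faster
    return curves

# Four observed cohorts plus a projected fifth, not yet acquired
curves = project_cohorts(0.5, 0.9, 0.95, 5, 12)
```

The last curve is the natural baseline: our best guess for the next cohort before it's born, against which any new acquisition intervention can be judged.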
Sometimes it's just too much data, and so they want to find trimmed-down ways of using it. Sometimes there are regulatory issues: over in Europe, there's the GDPR; here in the US, in California at least, there's the CCPA, which prevents companies from saving and leveraging data at that kind of granularity. So there's a host of different technical, legal, and practical issues that keep companies from using that full-blown data structure. Again, I've done lots of research, as have some other people working in this area, to say, how can we run our same buy-till-you-die model but using different data inputs? In many cases, we'll just get limited aggregate data. The beautiful thing about our models, and we've discussed this a little bit in some of the earlier sessions, is that we can not only say what's the probability that you're going to make a purchase at that particular time, but we can roll it up and make statements about what percent of our customers will be active. What percent of our customers will make three or more purchases? What percent of our customers will be inactive for a while and then active again? Basically, for any kind of statement you can make, we can come up with a formula for it. If we get data on any one of those things, like how many customers make three or more purchases, then we can match up that observed data to the formula from the model, and as long as we have enough of these different data points, we can match them all up together and pull out the parameters of the model. In some of the research that I've done, we found that if you have the right data points, then our ability to pull out the parameters of the model, the buying and dying propensities and how they vary across all the customers, is almost as good as if we had all the raw data in the first place. It gets to the point where you start to say, I don't even need all that raw data, I'm good. Just give me the right kinds of rolled-up metrics. 
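The matching idea can be sketched in a few lines: write down the model's formula for an aggregate quantity, then pick the parameters that make the formula line up with the observed rolled-up numbers. The sketch below deliberately uses the simplest possible geometric "death" story and made-up aggregates, a stand-in for the much richer moment-matching done with the real models:

```python
def model_active_share(theta, t):
    """Under a simple geometric 'death' story, the expected share of a
    cohort still alive after t periods is (1 - theta) ** t."""
    return (1.0 - theta) ** t

# Rolled-up data: share of the cohort observed active at a few horizons
# (made-up numbers, no individual-level records needed)
observed = {1: 0.80, 2: 0.64, 3: 0.51}

def fit_theta(observed, grid_steps=10_000):
    """Method-of-moments-style fit: grid-search for the churn rate whose
    model-implied shares best match the observed aggregates."""
    best, best_err = None, float("inf")
    for i in range(1, grid_steps):
        theta = i / grid_steps
        err = sum((model_active_share(theta, t) - s) ** 2
                  for t, s in observed.items())
        if err < best_err:
            best, best_err = theta, err
    return best

theta_hat = fit_theta(observed)
```

With just three rolled-up data points, the churn parameter is recovered almost exactly, which is the flavor of the result: given the right aggregate metrics, you can pull out the model's parameters without ever touching the raw customer-level data.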
It's going to be so much easier for me to run the model. It's going to be so much easier for me to make comparisons across different groups of customers. It's going to be so much easier for me to link these kinds of aggregations to different kinds of managerial decisions. So in some sense, running the model on, let's say, simplified data inputs can often be very advantageous. It scales much better, it has really good commercial implications, and it's a really interesting academic challenge as well. I just want to point out that the models you have seen are great, they're terrific, and a lot of the time we're going to use exactly those kinds of models and that's it. But there are other times when we need to bring in some of these bells and whistles, and of course, as a busy academic, all the time I'm thinking about new ones to bring in, new data structures, and of course, new questions that we can answer as well. We'll get back and talk about some of those questions in a moment.