In this session, we're going to dig a little deeper into the idea of forecasting. When I say dig deeper, I actually mean we're going to drill down and look at predicting behavior when it comes to individual customers, and that's what customer analytics is about. So, the promise of marketing is that I'm going to be able to deliver the right message to the right customer. Well, we want to be able to influence the behavior of individual customers: which customer gets a particular coupon, and how deep is that coupon? Which customer do I send which message to? In order to answer those questions, we really have to have an understanding of not just the average behavior of customers, but the behavior of individual customers, and that's what we're going to be able to look at with some simple models within Excel.
So, where is this idea of customer-centric analytics going to come into play? Well, the behavior of existing customers, the behavior of prospective customers. How are they different from each other? What's the right message to send them? When is the right time to send that out? If we need to know what product we should launch, well, that requires an understanding of the customer segments that are in the marketplace. Not just the average customer, but how many different types of customers are there? How many customers are there of each type? If I want to look at effectiveness when it comes to promotional activity, promotions aren't going to be equally effective for everyone. On some customers they're going to have absolutely no impact; on other customers they're going to be very effective. And ultimately, if we're looking to make allocation decisions, if I have a fixed marketing budget, how do I allocate resources across all of my customers and prospective customers? So if we look at the toolbox that's available to us, the statistical methods that we have, the right model is really going to depend on the type of data that we're looking at.
Now, we've truly got four different types of data here, and I have included continuous data in this set, but what kind of customer-level data might we have? Well, we're going to have a lot of choice data. I'm choosing between Coca-Cola and Pepsi. I'm choosing between AT&T, Verizon and Sprint. I'm making a choice in that context. We might have count data. How much am I going to purchase? What quantity am I going to purchase on a particular shopping occasion? We might look at timing data. When do I become a customer? How long do I stay a customer? How long is it between visits to my favorite website? How long is it between purchases? All of those are timing or duration observations. And I might combine multiple pieces of data, let's say choice and count data: I choose a brand, and then how much of it do I buy? Well, that would be multivariate data. Or I have interpurchase times and then how much I buy, combining count and timing data. Those are the different types of data that we might encounter, and each of them is going to have different methods associated with dealing with it.
In terms of marketing, we might be interested in ROI, understanding marketing effectiveness. We might be interested in understanding the clickstream behavior of customers online and targeting advertisements at them. If you're looking at loyalty programs or social media activity, all of this is producing individual-level data. It's not just being produced at the level of how many total purchases we have; it's who are the individuals conducting these purchases? And as we have access to those individual histories, that's what's going to allow us to make those individual forecasts and conduct customer evaluation at the level of the individual customer.
So, let me give you a brief refresher on what we talked about last class with our forecasting models and building out regression models. We came up with a prediction; our Y variable is our prediction.
So, our prediction of Y is based on a set of predictors: X1 is marketing activity 1, X2 is marketing activity 2, and so on for however much marketing activity we have. And we said that's our best guess, but we don't always observe our best guess, and that's where the error term comes in. The error term, epsilon, is the difference between what I predicted, Y hat, and what I actually observed. And we make the assumption when we're running regression models, specifically linear regression models, that the error term follows a normal distribution, that bell-shaped distribution. Another way to write this out would be to say that my observation Y itself follows a normal distribution with a particular expectation mu. My best guess is mu, and that's what's giving me my regression equation. That's where my X variables come into play.
Well, that's assuming that everything follows a normal distribution. When we're dealing with customer-level data, the normal distribution isn't necessarily going to be appropriate. If I'm dealing with choice data or count data or duration or timing data, using that normal distribution just doesn't make sense, because that's not what the data ultimately looks like. But just like when we're conducting linear regression, we're going to start by building up a model based on the data that we observe, and we're going to refer to this as the likelihood function. So, say I have N different observations, Y1 through YN, and I make the assumption that they all follow a particular distribution. When we're doing linear regression, we assume that they follow a normal distribution. What we're ultimately going to look at when we estimate parameters is the question: under a given set of values for those parameters, how likely am I to observe the set of data that I did? And what we try to do is make that likelihood, that probability, as big as possible.
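
To make the question "how likely am I to observe this data under a given set of parameter values?" concrete, here is a minimal sketch in Python rather than Excel. Everything in it is an assumption for illustration: the six observations are made up, the standard deviation is fixed at 1 so that only the mean varies, and the names `normal_pdf` and `likelihood` are hypothetical helpers, not anything from the session.

```python
import math

def normal_pdf(y, mu, sigma):
    """Density of a normal distribution with mean mu and std dev sigma, evaluated at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def likelihood(data, mu, sigma):
    """Likelihood of the whole sample: the product of the per-observation densities."""
    result = 1.0
    for y in data:
        result *= normal_pdf(y, mu, sigma)
    return result

# Made-up observations, for illustration only.
data = [4.1, 5.3, 4.8, 6.0, 5.2, 4.6]
sigma = 1.0  # assumed known, so the only parameter we vary is mu

# How likely is this data under a few different guesses for mu?
for mu in (3.0, 4.0, 5.0, 6.0):
    print("mu =", mu, "likelihood =", likelihood(data, mu, sigma))

# Search a grid of candidate values (3.00 to 7.00) for the one that
# makes the observed data as likely as possible.
candidates = [round(3.0 + 0.01 * k, 2) for k in range(401)]
best_mu = max(candidates, key=lambda mu: likelihood(data, mu, sigma))
print("most likely mu on the grid:", best_mu)
```

The grid search is only there to show the idea of "try parameter values, keep the one with the largest likelihood"; a solver or a closed-form solution would normally replace it.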
So, we're going to maximize the likelihood of observing our data. That is, we're going to choose the values for our coefficients or our parameters that make observing the data as likely as possible. So if our likelihood function is the product of the likelihoods associated with each observation, we're going to choose the parameters theta that make observing the actual data, Y1 through YN, as likely as possible.
So, let me just give you this example using the normal distribution, to give you a sense that we've been doing this all along. We just didn't know it. Linear regression, with that normal error assumption, is actually maximum likelihood estimation. So suppose we had a set of data, X1 through Xn, that we said came from a normal distribution. Well, there's a likelihood expression that goes along with n data points from a normal distribution. And if I want to find the value of mu that makes that as likely as possible, I'm going to take the derivative and set the derivative equal to zero. Now, from a computational standpoint, the likelihood is in most cases going to end up being a very, very tiny probability that computer programs can't distinguish from zero. So we're going to employ a mathematical trick here. Rather than maximizing the likelihood, we're ultimately going to maximize the logarithm of the likelihood. It turns out that's not going to change our results, because the logarithm is an increasing function; it just changes the scale that we're working on. So I take the log of the likelihood function, then take the first derivative with respect to mu and set it equal to 0. And now we're just solving for the value of mu. And what we end up with is that the maximum likelihood estimate for the mean is the sample average.
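
The expressions pointed to on the slides are not part of this transcript. The following is the standard textbook version of those steps, written out under the assumption that X1 through Xn are independent draws from a normal distribution with mean mu and variance sigma squared, and that sigma squared is held fixed while we differentiate with respect to mu.

```latex
\begin{align*}
% Likelihood of n independent normal observations
L(\mu, \sigma^2) &= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}
    \exp\!\left( -\frac{(x_i - \mu)^2}{2\sigma^2} \right) \\
% Log-likelihood: the product becomes a sum
\log L(\mu, \sigma^2) &= -\frac{n}{2} \log\!\left( 2\pi\sigma^2 \right)
    - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 \\
% First derivative with respect to mu, set equal to zero
\frac{\partial \log L}{\partial \mu} &= \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu) = 0 \\
% Solving for mu gives the sample average
\hat{\mu} &= \frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x}
\end{align*}
```

Note that sigma squared drops out when solving the third line, so the estimate of the mean does not depend on what the variance is.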
So if you've got data that you believe follows a normal distribution, your maximum likelihood estimate of the mean is no different from just taking the sample average. So in a lot of cases, what we think intuitively is going to line up with that maximum likelihood estimate. And we can then also look at the variation: what's the uncertainty around those estimates?
So, let's drill down. And again, we're going to focus today on two types of data that are very common within marketing: choice data and timing data. Within Excel, we're going to talk a lot about binary choices, yes/no outcomes. The technique that we're going to be using generalizes to multinomial choices, when I have three, four, five different options and I'm picking one among that set of options. So, we'll look at the techniques for choice decisions. We'll also look at models that are specific to duration data.
So, choice data is very common throughout marketing. I've put together here some examples of choices that customers might face. Do you buy in a particular category on a shopping trip? Do you buy a particular brand on a shopping trip, yes or no? Did you acquire service, yes or no? Did you keep service, yes or no? Did you decide to file a complaint with the company, yes or no? Any time we're categorizing things into yes or no, brand A versus brand B, we're talking about a binary choice. So, it's going to be a very common type of marketing data for us to deal with.
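
The session has not introduced its choice model at this point, so as a stand-in, here is a minimal Python sketch of maximum likelihood on yes/no data using the simplest possible assumption: every customer says yes with the same probability p. The 2,000 outcomes are made up, and `likelihood` and `log_likelihood` are hypothetical helper names. The sketch also shows why the log trick matters: with this many observations, the raw product is too small for the computer to tell apart from zero.

```python
import math

# Made-up yes/no outcomes (1 = yes, 0 = no), e.g. "did the customer keep service?"
# 2,000 hypothetical customers, 1,400 of whom said yes.
outcomes = [1] * 1400 + [0] * 600

def likelihood(p, data):
    """Raw likelihood: multiply p for each yes and (1 - p) for each no."""
    result = 1.0
    for y in data:
        result *= p if y == 1 else (1.0 - p)
    return result

def log_likelihood(p, data):
    """Log-likelihood: the same quantity on the log scale, computed as a sum."""
    return sum(math.log(p) if y == 1 else math.log(1.0 - p) for y in data)

# The raw likelihood underflows; the log-likelihood stays a usable number.
print("raw likelihood at p = 0.70:", likelihood(0.70, outcomes))
print("log-likelihood at p = 0.70:", log_likelihood(0.70, outcomes))

# Search candidate values 0.01 .. 0.99 for the p that maximizes the log-likelihood.
candidates = [k / 100 for k in range(1, 100)]
best_p = max(candidates, key=lambda p: log_likelihood(p, outcomes))
print("best p on the grid:", best_p)
print("share of yes answers:", sum(outcomes) / len(outcomes))
```

As with the normal example, the estimate lines up with intuition: under this one-parameter assumption, the most likely value of p is just the share of customers who said yes.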