All right, in this session we're going to dig a little bit deeper into the idea of forecasting. When I say dig deeper, I mean we're going to drill down and look at predicting behavior when it comes to individual customers. And that's what customer analytics is about. The promise of marketing is that I'm going to be able to deliver the right message to the right customer. We want to be able to influence the behavior of individual customers. Which customer gets a particular coupon, what's the depth of the coupon, which customer do I send which message to? In order to answer those questions we really have to have an understanding of not just the average behavior of customers, but the behavior of individual customers. And that's what we're going to be able to look at with some simple models that can be built within Excel. All right, so where's this idea of customer-centric analytics going to come into play? Well, the behavior of your existing customers and the behavior of prospective customers. How are they different from each other? What's the right message to send them? When is the right time to send it? If we need to know what products we should launch, well, that requires an understanding of the customer segments that are in the marketplace. Not just the average customer, but how many different types of customers are there, and how many customers are there of each type? If I want to look at effectiveness when it comes to promotional activity, promotions aren't going to be equally effective for everyone. On some customers they're going to have absolutely no impact; on other customers they're going to be very effective. And ultimately, if we're looking to make allocation decisions, if I have a fixed marketing budget, how do I allocate resources across all of my customers and prospective customers?
So if we look at the toolbox that's available to us, the statistical methods that we have, the right model is really going to depend on the type of data that we're looking at. We've got four different types here; I've included continuous data in this set. But what kind of customer-level data might we have? Well, we're going to have a lot of choice data. I'm choosing between Coca-Cola and Pepsi. I'm choosing between AT&T, Verizon, and Sprint. I'm making a choice in that context. We might have count data. How much am I going to purchase? How much quantity am I going to purchase on a particular shopping occasion? We might look at timing data. When do I become a customer? How long do I stay a customer? How long is it between visits to my favorite website? How long is it between purchases? All of those are timing or duration observations. And I might combine multiple pieces of data, let's say choice and count data: I choose a brand, and then how much of it do I buy? That would be multivariate data. Or I have interpurchase times, and then how much do I buy? That's combining timing and count data. Those are the different types of data sources that we might encounter, and all of them have different methods associated with them. In terms of marketing, we might be interested in ROI, understanding marketing effectiveness. We might be interested in understanding the clickstream behavior of customers online and targeting advertisements at them. If you're looking at loyalty programs or social media activity, all of this is producing individual-level data. It's not just being produced at the level of how many total purchases we have; it's who are the individuals conducting these purchases. And as we gain access to those individual histories, that's what's going to allow us to make individual forecasts, to conduct customer valuation at the level of the individual customer.
So, let me give you a brief refresher on what we talked about last class with our forecasting models and building out regression models. We came up with a prediction, and our Y variable is what we're predicting. Our prediction, Y hat, is based on a set of predictors: X1 is marketing activity 1, X2 is marketing activity 2, and so on for however many marketing activities we have. That's our best guess, but we don't always observe our best guess, and that's where the error term comes in. The error term, epsilon, is the difference between what I predicted, Y hat, and what I actually observed. And we make the assumption when we're running regression models, specifically linear regression models, that that error term follows a normal distribution, that bell-shaped normal distribution. Another way that we could look at this would be to say that my observation Y itself follows a normal distribution with a particular expectation mu, and that's my best guess, mu. That's what's giving me my regression equation; that's where my X variables come into play. Well, that's assuming that everything follows a normal distribution. When we are dealing with customer-level data, the normal distribution isn't necessarily going to be appropriate. If I'm dealing with choice data or count data or duration or timing data, using that normal distribution just doesn't make sense, because that's not what the data ultimately looks like. But just like when we're conducting linear regression, we're going to start by building up a model based on the data that we observe. And we're going to refer to this as the likelihood function. So I have N different observations, y1 through yN, and I make the assumption that they all follow a particular distribution. When we're doing linear regression, we assume that they follow a normal distribution.
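To make the likelihood idea concrete, here is a minimal sketch in Python (the lecture itself works in Excel) of the likelihood of N observations under a normal model. The data values and parameter choices are hypothetical, purely for illustration:

```python
import math

# Hypothetical data: five observed outcomes (say, weekly sales in units).
y = [9.8, 10.5, 9.2, 11.1, 10.0]

def normal_pdf(x, mu, sigma):
    """Density of the Normal(mu, sigma) distribution at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def likelihood(data, mu, sigma):
    """Likelihood of the whole sample: the product of each observation's density."""
    prod = 1.0
    for yi in data:
        prod *= normal_pdf(yi, mu, sigma)
    return prod

# The likelihood is a function of the parameters we plug in:
print(likelihood(y, mu=10.0, sigma=1.0))  # parameters near the data: larger likelihood
print(likelihood(y, mu=15.0, sigma=1.0))  # parameters far from the data: much smaller
```

The same product-of-densities calculation is what sits behind the estimation we'll do next: pick the parameter values that make that product as large as possible.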
But what we're ultimately going to look at, when we estimate parameters, is this question: under a given set of values for those parameters, how likely am I to observe the data that I did? And what we try to do is make that likelihood, that probability, as big as possible. So we're going to maximize the likelihood of observing our data. That is, we're going to choose the values for our coefficients, our parameters, that make observing the data as likely as possible. All right, so if our likelihood function is a product of the likelihoods associated with each observation, we're going to choose the parameters, theta, that make observing the actual data, y1 through yN, as likely as possible. All right, so let me just give you this example using the normal distribution, to give you a sense that we've been doing this all along; we just didn't know it. Linear regression is actually maximum likelihood estimation. So, suppose we had a set of data x1 through xN that we said came from a normal distribution. Well, that is the likelihood expression that goes along with N data points from a normal distribution. And if I want to find the value of mu that makes that as likely as possible, I'm going to take the derivative and set the derivative equal to 0. Now, from a computational standpoint, rather than trying to maximize the likelihood, which in most cases is going to end up being a very, very tiny probability that computer programs can't distinguish from 0, we're going to employ a mathematical trick. Rather than maximizing the likelihood, we're ultimately going to maximize the logarithm of the likelihood. It turns out that's not going to change our results; it just changes the scale that we're working on. So if I take the log of the likelihood function, this is the equation that I have. Now let's take our first derivative and find the value of mu that's going to maximize that log likelihood.
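Here is a small sketch of that log-likelihood trick, again in Python rather than the lecture's Excel. With the standard deviation treated as known, a simple grid search over candidate values of mu finds the maximizer of the log likelihood; the sample itself is made up for illustration:

```python
import math
import statistics

# Hypothetical sample, treated as draws from a normal distribution.
x = [4.1, 5.3, 4.8, 5.9, 4.6, 5.2, 4.9, 5.5]
sigma = 1.0  # assume the standard deviation is known, for simplicity

def log_likelihood(mu):
    """Log of the normal likelihood: the product of densities becomes a sum of logs."""
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (xi - mu) ** 2 / (2 * sigma ** 2)
               for xi in x)

# Grid-search candidate values of mu for the one that maximizes the log likelihood.
candidates = [m / 100 for m in range(300, 701)]  # 3.00, 3.01, ..., 7.00
mu_hat = max(candidates, key=log_likelihood)

print(mu_hat)              # maximizer found on the grid
print(statistics.mean(x))  # the sample average, for comparison
```

Working on the log scale keeps the numbers manageable: with many observations, the raw likelihood (a product of small densities) would underflow toward 0, while the log likelihood stays a well-behaved sum.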
Well, taking our first derivative, this is the first derivative that we're setting equal to 0. And now we're just solving for the value of mu. And what we end up with, our maximum likelihood estimate for the mean, is the sample average. So if you've got data that you believe follows the normal distribution, your maximum likelihood estimate is no different from just taking the sample average. In a lot of cases, what we think intuitively is going to line up with the maximum likelihood estimate. And we can also then look at the variation, the uncertainty, around those estimates. All right, so let's drill down. Again, we're going to focus today on two types of data that are very common within marketing: choice data and timing data. Now, within Excel we're going to talk a lot about binary choices, yes-no outcomes. The technique that we're going to be using will generalize to multinomial options, when I have three, four, five different options and I'm picking one among that set. So we'll look at techniques for choice decisions, and we'll also look at models that are specific to duration data. Choice data is very common throughout marketing. I've put together here some examples of choices that customers might face. Do you buy a particular category on a shopping trip? Do you buy a particular brand on a shopping trip, yes or no? Did you acquire service, yes or no? Did you keep service, yes or no? Did you decide to file a complaint with the company, yes or no? Any time we're categorizing things into yes or no, brand A versus brand B, we're talking about a binary choice. So it's going to be a very common type of marketing data for us to deal with.
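The same likelihood logic carries over to binary choice data: each "yes" contributes a factor of p and each "no" a factor of 1 - p, and maximizing the likelihood over p gives the observed share of yeses, the binary analogue of the sample-average result. A minimal Python sketch with made-up choice data (the lecture's models are built in Excel):

```python
import math

# Hypothetical binary choice data: 1 = bought the brand, 0 = did not.
choices = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]

def log_likelihood(p):
    """Bernoulli log-likelihood: each 'yes' contributes log(p), each 'no' log(1 - p)."""
    return sum(math.log(p) if c == 1 else math.log(1 - p) for c in choices)

# Grid-search p in (0, 1) for the maximum-likelihood estimate.
candidates = [k / 1000 for k in range(1, 1000)]
p_hat = max(candidates, key=log_likelihood)

print(p_hat)                        # maximum-likelihood estimate of the buy probability
print(sum(choices) / len(choices))  # observed share of yeses, which it matches
```

Once that probability is allowed to depend on X variables (a customer's characteristics, the marketing they received), this same Bernoulli likelihood becomes the basis of the binary choice models the session builds toward.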