All right, so here is the first example that we're going to be working with. We have a direct marketing company that is selling a unique product, the Whatchamacallit. And we observe which customers decide to acquire the product or which individuals decide to purchase this product based on the marketing efforts that were target at them. So we know how much we spent on the targeted advertising. We know the price that it was offered to them at. We know whether or not they sent a coupon and whether or not the company sent out an email. And so what we're going to try to do is look at this set of x variables. That set of x variables, these four x's, we're going to try to tie to the acquisition probability. And that acquisition probability is going to drive the acquisition decision. So the model that we're going to build is going to give us a baseline as well as coefficients for each of these four marketing actions so that we can understand which of these pieces is going to impact that acquisition decision. In this exercise we're going to look at is connecting marketing variables to an acquisition decision. So you see on our spread sheet columns A through D, these corresponds to different marketing variables and each row of this spreadsheet corresponds to the marketing activity that was aimed at a prospect and then column E tells us the result of that acquisition decision. Did the prospect end up becoming a customer? One for yes, zero for no. So as we looked at earlier, we can't use linear regression in this case because our outcome is binary, it's zero or one. So there's no best-fitting line that we can draw given the nature of that outcome. So what we have to do instead is make the assumption that that zero, one outcome comes from a Bernoulli distribution governed by a particular probability and we're going to use logistic regression to develop this particular model. So we're going to do the majority of our work on the worksheet that's labelled LL, that's for log likelihood and let me walk you through the process we're going to use. Same data columns A through D, that contains our x variables. Column E contains our outcome. Did the customer, did the prospect become a customer, yes or no? In column F we're going to calculate the linear combination. And that's going to be comprised of an intercept and the impact of each of the marketing activities. We're then going to translate that into the likelihood that's going to take into account whether or not the prospect became a customer, we're going to do that in column G. We're going to convert that to the log likelihood just to make estimation easier for ourselves. And once we’ve done that we are going to use the solver function within Excel to maximize the log likelihood function and find the values for the intercept, price, email, coupon and advertising coefficients that best correspond to this data. What we're trying to do is make that likelihood, make the joint probability associated with the observed data as high as possible so we want to maximize the likelihood function. So other software tools such as SAS, SPSS are have logistic regression build into them, Excel doesn't have that tool build into it so we are going to create that ourselves. So to start out, what we're going to do is create cells that contain the values for the coefficients that we're ultimately going to estimate. So in K1 through K5, we're just going to put in placeholder values and I'm just going to use zeroes, to get us started. And then we want to write out what's the linear combination corresponding to the marketing activity in that particular prospect. So in cell F2 we're going to specify a regression equation that's going to look very similar to linear regression. We're going to start with our intercept value. We're going to make sure that we use absolute references. Then we are going to incorporate the effect of price. So that's K2 multiplied by the price that this prospect received plus the effect of the email. Again an absolute reference, did this prospect receive an email? The affect of couponing. Multiplied by whether or not the prospect received coupons and the impact of advertising. Multiplied by the units of advertising aimed at this individual. So that's our linear combination. And you'll notice since we had assumed values of zero for all of our coefficients, just as placeholder values, we're getting a zero here. But we're just going to copy this formula all the way down for all of the prospects in our dataset. All right, so how do we take that linear combination and turn that into a likelihood function? All right well, the likelihood's comprised of what's the probability associated with a particular outcome and did we observe that outcome? And so the way that we can write this out is using an if statement. So if acquisition happened. Well the probability associated with the acquisition happening, it's going to be the probability associated with LOGIT model. It's going to be the exponential function to the power of the linear combination divided by one plus the exponential function raised to the power of that linear combination, right? And this is if we look at the numerator and the denominator, always going to be positive, always going to be bounded by zero and one. So it's guaranteed to be a probability. That's if we observe the prospects being acquired. What if the prospect is not acquired? What's the probability corresponding to one minus that logit probability? And we can write that as one divided by one plus the exponential of our linear combination. And we had enough parenthesis to close out that if statement, all right? And again, we can copy that formula all the way down our spreadsheet. And now, if that gives us the likelihood, what's the log likelihood going to be? This is for mathematical convenience. We're going to use the natural logarithm of the likelihood itself. Ideally what we'd like to do is maximize the likelihood that's given in column G directly. The problem being when we have thousands or hundred of thousands of individuals in our dataset, multiplying those individual probabilities together is going to create an incredibly small number that a computer is not going to be able to distinguish from zero. So rather than maximizing the likelihood, we're going to maximize the log likelihood. All right, so we take the logarithm, the natural logarithm of the likelihood for each individual and we can specify our sample log likelihood as the sum of the individual log livelihoods. All right, so we specified the sample log likelihood. All that's left for us to do is use solver to estimate the appropriate values for the intercept price, email coupon, and advertising. And so we're going to do this using the solver feature If you don't have it loaded on the computer that you're using you go under File > Options, click on the Add-ins on that menu. Manage Excel Add-ins, we click on Go and you just want to make sure that Solver Add-in is checked off, all right. So on the data ribbon, we're going to click on solver. And you'll see that solver is going to ask us for three things. So first, what cell contains our objective? That's going to be our log likelihood. Second, it's going to ask us what we want to do with it? In our case, we want to maximize the log likelihood and how do we want to do that? What cells are we allowed to change on the spreadsheet in order to do that? Well, we're going to change K1 through K5. We are not going to impose any constraints. So intercept and the coefficients can be positive or negative in this case. You do want to make sure that this box make unconstrained variables non-negative. You want to make sure that that's not checked off. Now, we're going to be dealing with relatively straightforward problems using Excel, so we're not going to worry too much about the solving method. You can see a couple of different options that Excel has added in. Again, in this particular example, not going to make that much of a difference for us. So we're telling Excel, we wanted to maximize the log likelihood that contained in K7 by changing the values in cells K1 through K5 and we're going to click on the solve button. Let's try this one more time, we'll just use the original estimation method. And there you see it has produced those results for us. Now one thing you do want to keep in mind when you're using solver, it is going to be sensitive to these starting values that you use so you may want to conduct some sensitivity analysis, use different starting values. In some problems as we see In some of our other exercises you may want to impose some constraints to prevent solver from going toward values that are going to produce an error message. So there's a little bit of fine tuning that's going to need to be done but for relatively straight forward problems such as this one, solver is able to get us those coefficients. And so we see the intercept value price has a negative coefficient and that's what we would expect. Higher price means lower likelihood of becoming a customer. Email and couponing have positive effects, as does advertising, all positive effects. So using these coefficients we could then plug in different values for our marketing activity and estimate what is the probability associated with a different set of marketing mix activities. Now this solver tool is one that we're going to come back to throughout the rest of the course. We're going to use that for a range of problems as far as being able to estimate these maximum likelihood models.