I just talked to you a moment ago about price elasticities and that we're going to discuss them this week. But before we discuss with price elasticities, I need you to understand one modeling technique called regression. It turns out that regression is very useful in pricing settings. It's also useful in other business settings as well. At its fundamental core, regression is about the statistical relationship between two variables. And the two variables that we're going to be interested in are price and quantity. Because that's going to allow us to find the relationship between those two very important variables. >> Hi, my name is Yael Grushka-Cockayne. I'm a colleague of Ron's and I'm an Associate Professor at the Darden School of Business. Today, we're going to talk of one of the most powerful statistical tools linear regression. Linear regression can be used for statistical inference. It allows us to understand and study relationships between variables or quantities of interest. For instance, you might be interested in studying the relationship between a movie's box office revenue and its production budget. Or the critic's review of it, or for instance the advertising spend. Linear regression also allows us to make statements about a future quantity or come up with a forecast of an unknown quantity that we're hoping to see in the future. For instance, if we're about to produce a new movie, it allows us to come up with a forecast of what that movie will do at the box office. Let's take a closer look at an example and this will allow us to understand better how linear regression works. Imagine that you're a marketing director at a large retail chain. And imagine that you're interested in studying the sales of your store brand toilet paper, let's say a pack of 36 rolls. To do this, to study the sales, you randomly choose 22 or so equally sized stores and you randomly change the prices in those stores. And for a week, you collect data on the units sold. Once you have the data, you can start on digging deeper and understanding what affects the units sold. To do this, let's take a closer look at the data. In front of us in Excel, we can see the information that we've collected. We can see here that we have 22 stores. We have the prices that were set to our pack of 36 rolls of toilet paper. And we can see the unit sales in each one of those stores during this week. We can also see here a calculation of the average, which comes out to be 505 units. And we can see a calculation of the standard deviation. I will just write these out that this is, for instance, our average and this is our standard deviation just so we keep track. So the question is if I'm trying to come up with a forecast for the next store, how many unit sales will it have of this brand of toilet paper, of our store brand of toilet paper? 505 of the average might be a pretty decent guess. But there is some variability, maybe plus or minus 57. But let's see if I can learn more about our potential sales of toilet paper based on the price. What is the relationship between price and unit sales? The first way to investigate our relationship is to look at a scatter plot. And again, I'm going to use Excel to do this. I'm going to highlight the two columns of Price and Unit Sales, and I'm going to go in and insert a scatter plot. Let me try and make this as large as I can and increase the font. And of course, this spreadsheet is available to you to download if you need to take a closer look. But here, we see in front of us a scatter plot. On the x-axis, we see the prices. And on the y-axis, we see the unit sales. And we see the cloud of points are 22 stores. It seems like the cloud of points is declining from the upper left down to the lower right, does this make sense? Well, it means that with an increase in the price, I have a lower number of unit sales. That makes sense. We can also imagine that this relationship might be quite linear. Meaning it can be described by a line, a straight line that cuts through the cloud of points. In Excel, I can quickly add a trendline here to my chart. I will add a linear trendline here to my chart and maybe hopefully you can see this line. It's a dotted blue line, and that trendline gives me a sense of whether or not there could be a line that represents the relationship between my price and my unit sales. To better understand what this trendline is or what are the characteristics of it that allow us to answer the question. If I change my price by $1, what is the slope of the line or what is the corresponding change in my unit sales? In order to do this, I need to run a regression. Luckily, Excel does a lot of the calculation for us. And so we're going to learn how to use Excel to run a linear regression. The first thing we need to make sure is that we have the Decision Analysis ToolPak available in Excel. This will appear under the Data Analysis ToolPak. Excuse me, this will appear under the data ribbon. If you don't have it, let me show you how you can add it. You can go into File > Options. You can go into the Add-Ins under Excel. Choosing these Excel add-ons, pressing the Go allows you to highlight Analysis ToolPak. Tick that box and you will now have the option Data Analysis on your Excel data ribbon. Once we've added this, we can now click Data Analysis, choose Regression. And in this dialog box, we're going to fill the following. First, we're going to choose our y input. In this case, our y-axis is our unit sales. So we're going to highlight this column of Unit Sales. We can also choose our x. And in this case, our x is our price. So we're going to highlight the column of Prices. There are labels in our top rows. So we're going to tick this box that says Label and we're going to choose where to put our Output. For now, let's place it here in column K or you can place it anywhere which is convenient to you. I press OK and I get some report out. Let me make the font much bigger. And get rid of some of the decimals, so we can focus on only the meaningful piece of information. And of this report, again, I'm going to focus our attention on just a few numbers. What we're interested in establishing is whether or not there is a relationship between our price and our unit sales. And so what we look at are these coefficients, the intercept coefficient and the price coefficient. What these tell us, this tell us an estimate from the data for our intercept of the line and for the slope of the line. So imagine we can write our relationship as follows, we can imagine that our relationship is unit sales and this is an estimate. I'm writing a little hat to state that this is an estimate. The estimate of unit sales, my forecast of unit sales is based on some intercept a plus some kind of slope b times my price, okay? And so based on our report and Excel's analysis, it came up with coefficient estimates for this a and for this b. This is going to be my a estimate. And this is going to be my b estimate. And so now, if I am asking myself, what is going to be the number of units that I sell for a given price? I can plug in these parameters. I can plug in and assume, ask myself, what am I going to get in terms of units sold? Given that I have, for instance, a price of $21. So given the price of $21, I have to add 1,204- 32 x 21. Why minus 32? Does minus 32 make sense? Well, of course it does. This corresponds to our negative trendline. For each dollar increase or decrease in our price, we will see a change of 32 units in our unit sales. If I increase my price, I'm going to see a decrease in the number of unit sales. That's a declining trend, that's why it's a negative coefficient. And so running through this analysis and during this calculation, we can see that this calculation gives us an estimate for what our unit sales will be at a price of 21. This comes out to be 532. However, there's still some error in our model. The points are not exactly on the line. Our line is only so good at explaining our unit sales. To understand how much variability there still is in our model, we look in our report at what's called our standard error. If you see up here, this is our standard error. So we still have a margin of say 40 that this number 532 could fluctuate. To understand what else might affect the unit sales, we have to collect more data. We may have a situation in which multiple variables affects how many units of toilet paper we sell in our store. And that's what we're going to do next.