So as I mentioned at the start, we're going to focus on using regression as the technique for building up our forecasting models. And we've seen a little bit of this already with the autoregressive models. So what we have to make a decision for is what's going to go in the model itself? Do I need to include a trend component? What is the nature of that trend component look like? Is it going to be a linear trend? Is it going to be a trend where we see increasing or decreasing returns as we go further out in time? You may want to include a cyclical component taking into account, macro state of the economy. We may want to take into account seasonal factors, for example, if we're dealing with a seasonal demand for products. Perhaps, that needs to come into our analysis. If we've got marketing activity, if we have information, as far as, the amount of marketing spending that's been used in each of our past time periods, that's something that we can bring into the analysis. When we get into looking at store level, or individual level predictions, the regression analysis, we can even include factors to take into account that heterogeneity, that exists across stores, across regions, across individuals. So that's the big advantage with the regression framework, is we get a lot more control over what's going in that right hand side of the equation. Now, in terms of assumptions that are being made. We're typically going to be putting these in as additive term. And that's something that you can play around with as far as changing the form of the regression model. I'm going to demonstrate it using linear regression today. But there's nothing stopping us from building a more duplicative model. To change the way in which these components enter into that prediction. So again, let's go back to the data that we had started with where we'd said that there seems to be a positive trend here in the data. And so, we might say, well, let's test out a model that includes a linear trend. And what I want to emphasize here is let's start with the simplest model possible, because when we're interested in forecasting, our goal is not to come up with the model that best fits the data that we're going to use during, which call the calibration period. The calibration period is the data that's used to construct the model. The hold out, or the forecasting period, that's where we want to make sure that the model is capturing what's going on so that it's going to be useful for the future. So we're going to take the data that's available to us, we're going to divide it into those two components. Calibration period, forecasting period. We're going to build the model on the calibration data and then apply it to the forecasting data. So the forecasting data has never been used in terms of estimating coefficients. Once we come up with a modeling framework that works, we can then apply it to the next period of time that hasn't happened yet and make our predictions. And so, again, in this case looks like we've got a linear trend. We've got actually three years of data in here, we're going 2008 through 2010. So what we're going to do is we're going to use the earlier part of this time period to calibrate the model, and then we're going to see how well the model does at predicting the final year in the data. So this data set is already up on the course website that you can work with it. But the model that we're going to run, simple linear regression, it's going to be our y variable and this case the weekly demand. There's going to be an intercept that we're going to estimate and then there's going to be a slope that we're going to estimate and the only predictor that we're going to include is the week number. So week one, number one, week two, number two, week three, number three, and so fourth. So we're just assuming that there's a linear tract. Another two different ways that we can go about doing this. One is to use the regression tool that's built in to the data analysis tool pack within Excel. The other way that you can do this is to use the commands that you type directly into the cells. The equals intercept command and equals slope command. So these two commands have one particular advantage over using the data analysis tool pack. Because these commands are going to stay active. Because these formulas, you're not going to be typing in fixed values. Anytime that you update the data set, if you've used the intercept and slope commands, if you change the y values the x values its automatically going to recalculate and re-estimate the intercept and slope coefficients for you. So if you're building a worksheet that's going to be used, let's say every quarter to make your forecast, you want to keep as much active in that spreadsheet as possible. If you're running the data analysis tool pack regression tool every single time, when new data comes in, you've gotta run that analysis tool again. Whereas, if we were to build it using the formulas, new data comes in, I update the data, I feeded it, I import the data into the appropriate cells. If I've used these formulas, I don't have to retype anything. I don't have run anything new. It'll just automatically update once the new data's there. So what does this look like if we were to use the first two years of data as calibration and that's the dark black line and then try to predict what happens in the future? And those forecast are represented by a dashed black line. And if we were to look at this it doesn't look that bad. Looks like we're doing reasonably well. Doesn't look like we're systematically missing anything in our data but there are some places where perhaps we can do better. So for example if we look at these couple of regions and they seem to recur around this summer, those two regions might tell us that we're missing something systematic. And then, if we go out into the future it also seems like we're missing, and it turns out that it's in that summer period again. So one way that we can try to dig in a little bit more to understand what needs to be done is to look at the residuals. Before we get there, is it actually a linear trend? Well, rather than drawing a straight line over the two year period, what if we were to draw a line that best captured the trend during the first year? And drew a separate line that does captures the trend during a second year. Now, if it's a linear trend, these two lines should have the exact same slope. If you see a difference in the slopes that's going to be the tip off that it's not a linear trend. So let's take a look. We haven't gotten around that issue in both of these years, we still have that under prediction happening in the summer observations. But notice the difference in the slope of these lines. If we were to continue these lines out, notice how much faster the trend is in the first year, and that's evidenced by how quickly the red line is growing compared to the green line, if these were the same slope, these lines would overlap each other. So this difference in slope is telling us that, we're observing much more growth early on in that first year in the 2008 data whereas in 2009, it looks like we're observing slower growth. Well, what does that mean for us as we try to forecast out 2010? Well, that's something that we're going to have to be careful of b ecause if I forecast using that red line, I'm going to over-predict a lot more. So we want to recognize that this is not a linear trend. It looks like it's slowing down over time. So we need to build our trend in such a way that it accounts for the fact that it's decreasing. Still growing, but at a decreasing rate over time. One of the ways that we can take that into account is instead of just using the week number as our predictor, what we're going to replace that linear trend with is the logarithm or in this case more specifically the natural logarithm of the week number. If the natural log curved, it takes on the form of growing and then there's going to be curvature to it so that it's going to keep on growing but at a slower and slower rate over time. And so, given what we observed in our data, fast growth during year one relative to year two, we might decide that this is a better function or a better functional form for our trend variable. Now, here's the trick. As far as Excel is concerned, we're still running linear regression. We still have an intercept. We'll call that alpha. We still have a slope. We're going to estimate that's beta. And what's our x variable? Well, instead of just being the week number, now our x variable is the logarithm of the week number. So that's what's going to go in as our sole predictor. So as far as Excel's concerned we can put in any functional form that we might want but it's linear regression because of the general structure of this equation. So let's take a look at how that looks when we estimate that model. And we seem to be doing a little bit better. We're not capturing it perfectly. And we still have those summer periods, we pointed out earlier, where we're not doing the best job, but in terms of the forecaster, we seem to be more in line with this general pattern, not picking up those high levels that we would like to pick up. So how do we fix that part? So we have talked about looking at your trend. Perhaps making it more flexible than just a linear function and there's nothing magical about using the natural logarithm. Anything that grows at a decreasing rate would suffice. So we might use something like the square root of the week number. In other cases if the trend were accelerating we would need a functional form that increases over time. So perhaps a square term or a cubic polynomial. There are many different ways that we can think about trying to capture the trend. It really is just going to be a matter of what works best based on your data.