In this session, what we're going to be looking at is how to put together a basic forecasting model. When we're looking at forecasting, it might be from a finance perspective, trying to forecast what a stock price is going to look like. From a marketing perspective, perhaps we're trying to forecast consumer demand, what our revenue is going to be for our products. And maybe we're trying to do that at a national level, maybe we're trying to do that at a store level. As we'll get into in the next session, maybe we're even interested in doing that down to the level of the individual consumer.

So what I've put together here is just a plot of weekly demand for a product over the course of a couple of years. We see that there are some ups and downs, some fluctuations on a week-to-week basis. But we also detect what might be an upward trend in our data set. So, for example, if we were to fit a line capturing the growth of this product over time, perhaps what we'd see is that incline. It seems like we're growing over time; from week to week, we're seeing more baseline sales. Now, even throwing in that trend line is not enough to capture all of the fluctuations, all the spikes and valleys, that we see in the data.

So what we want to look at today is: what are the different components that we need to put into a forecasting model? How do we assess whether those components are necessary? And how do we use our forecasting models to project out into the future? I'm going to cover briefly a couple of different methods, smoothing methods and auto-regressive methods, before we end up focusing on regression-based forecasting models. Those regression-based models are going to be the workhorse that we keep coming back to, because they provide us with much more flexibility than smoothing and auto-regressive methods alone.

If we look at smoothing models, the underlying belief here is that the past is the best predictor of the future. And more specifically, the recent past is going to be the best predictor of the future. So, for example, if I'm trying to predict what sales are going to be next week, or next month, or next quarter, let me look to the most recent week or month or quarter. That's going to serve as a baseline for me. But I'm not necessarily going to look back just one week, one month, one quarter at a time. I'm going to look back multiple periods, and that's going to average over some of the fluctuations that we see from one time period to the next.

So the idea with smoothing models is going to be: let's take a bunch of recent observations and essentially average them out. We know that if we're focusing on weekly demand, there are going to be those natural fluctuations from week to week. And we're not interested in picking up those random, short-term fluctuations. What we want to capture is the underlying level of demand. So, as I said before, we're going to take an average, and the hope is that that's going to smooth out some of those short-term fluctuations. For example, in one week demand is going to be above average; in another week demand is going to be below average. If both of those observations are included in our smoothing model, they're going to cancel each other out.

The simplest model that we can use here is what's referred to as a simple moving average. I'm going to take a window of time, of length L.
And let's say L is going to be four weeks. Well, if I want to predict what next week's demand is going to look like, I'm going to look back at the most recent four weeks, and I'm going to take an average. And that's all that this equation is formalizing for us. In the numerator, we're just adding up the most recent four observations that we have, the most recent four levels of demand. And in our denominator, we're dividing by the number of periods we've accumulated. So we're taking that simple average.

Well, there may be some problems associated with this. For example, when we're doing this, we're saying all four of those weeks are equally informative. What if you believe that the most recent observation is more informative than the observation that's furthest in the past? This model wouldn't be able to account for that. What if we have something like seasonality in the data, and I'm on the border between different seasons, or on the border between different quarters? The simple moving average model is not going to be able to account for that on its own.

So, to alleviate one of those issues, we're going to move toward what's referred to as a weighted moving average. Rather than putting an equal amount of weight on each of the observations — in our last example, where we said L equals 4, we're looking back at the four most recent weeks of observations and essentially putting one quarter, or 25%, of the weight on each of them — what we might want to do is put more weight on the most recent observation. So perhaps we decide to put 50% of the weight on the most recent observation. Well, now I need to allocate that remaining 50% of the weight among the remaining three observations. This is going to give us the additional flexibility to say that perhaps some weeks, some observations, are more informative than others.

Now, this equation is going to look a little bit more complicated, but it comes from the same underlying idea. In our numerator, instead of just taking the sum, we're going to take a weighted sum, where we're weighting each of our observations Y based on how far into the past it is, and that set of weights is given by the Ws. Then, in the denominator, we're just adding up what those weights are.

Now these weights, think of them as probabilities, think of them as being between 0 and 1. Imagine, for example, if we were to set each weight equal to 1. Well, if each of these weights is equal to 1, the top term is just adding up my Y values, just taking the sum over those Y values. The denominator just becomes the number of observations that we have. So if we plug in weights equal to 1, we're back to using our simple moving average. And if we think of these weights as probabilities, the sum of the weights in the denominator is going to add up to 1, so what we're left with is just a weighted sum where we get to determine those weights W. If you believe that the most recent observation is more valuable, perhaps we say the most recent observation gets a weight of 50%. The next observation gets a weight of 30%. The next observation gets a weight of 15%. And the final observation gets whatever weight is remaining, in this case 5%. Now, these weights don't actually have to be between 0 and 1; you can put in any non-negative numbers that you like, since the denominator rescales them.
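To make those two formulas concrete, here is a minimal sketch in Python. The function names and the demand numbers are just illustrative, not from the session: the simple moving average divides the sum of the last L observations by L, and the weighted moving average divides a weighted sum by the sum of the weights, so plugging in equal weights recovers the simple version exactly.

```python
# Sketch of the two smoothing forecasts described above.
# Assumes demand is an ordered list, oldest observation first.

def simple_moving_average(demand, L=4):
    """Forecast the next period as the plain average of the last L observations."""
    recent = demand[-L:]
    return sum(recent) / L

def weighted_moving_average(demand, weights):
    """Forecast the next period as a weighted average of recent observations.

    weights[0] applies to the most recent observation, weights[1] to the one
    before it, and so on. Dividing by the sum of the weights rescales them,
    so any non-negative weights will work.
    """
    recent = list(reversed(demand[-len(weights):]))  # most recent first
    numerator = sum(w * y for w, y in zip(weights, recent))
    return numerator / sum(weights)

weekly_demand = [102, 98, 110, 105, 112, 108, 119, 115]  # hypothetical data

# Equal weights reproduce the simple moving average exactly.
print(simple_moving_average(weekly_demand, L=4))             # 113.5
print(weighted_moving_average(weekly_demand, [1, 1, 1, 1]))  # 113.5

# Heavier weight on the most recent week, as in the 50/30/15/5 example.
print(weighted_moving_average(weekly_demand, [0.50, 0.30, 0.15, 0.05]))
```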
But if we think about these weights as proportions, that's going to make it a little bit easier for us to understand how this weighting is happening. A simple moving average, all that it's doing is saying all of our weights are equal: the weight is really just 1 over the number of observations. The weighted moving average gives us much more flexibility; we get to decide ultimately what those weights are, and hopefully that's going to be informed by what we've learned in the past, a set of weights that has seemed to be particularly predictive.

So, where do these models potentially break down, and why do we need anything different? Well, let's suppose that there's a trend in the data. We'd seen in that first graph that I showed that there seemed to be a positive trend, that demand was growing over time. Let's think for a second about what's going to happen if you're using a smoothing model that says, let's look back at the most recent set of observations to try to predict the future. If there's a positive trend, then as we go further out into the future, we're expecting more and more growth. But if I'm relying only on averages of past observations to predict the future, that growth isn't going to be taken into account; an average of recent values can never rise above the largest of those values, so our forecasts will systematically lag behind the trend. Any time there's a trend, the simple moving average, or the smoothing model generally, is not going to be able to capture that for us. Now, if we're focused primarily on the short term, that's probably where we have the best application of these models, particularly if we're staying within a season or within a quarter and we're trying to understand what's going on from week to week.

Now, we're going to see this idea come up again when we look at regression models. When we think about a regression model, what's going to help us predict the future? Well, one of our predictors might be past sales, past levels of demand. We can think of this as incorporating lagged variables into our regression analysis. It takes the idea from smoothing models that there is information contained in past Y values, but brings those values in as predictors for the future.

All right, so I said that smoothing models, maybe they're not enough; maybe we need something that's a little bit more flexible. That's where an auto-regressive model can come into play. What I've put up here is just a simple structure using the most recent two periods to predict what the next period's demand is going to look like. If we look at this, it looks pretty similar to a regression model. And, in a sense, it is, but we're putting a very specific structure on the predictor variables. Typically, what's on the right-hand side of the equation are X variables. Well, the only X variables that we're including in this analysis are going to be the most recent two observations that we have. Those are our predictors. And if those are the predictors, what do the coefficients mean? Just like in regression analysis, our intercept value alpha is going to give us our baseline level. And the other two coefficients that we're interested in estimating, in this case beta 1 and beta 2, those are going to be the weights that we put on our predictors. So it's very similar to the smoothing models, but we're going to be estimating beta 1 and beta 2 based on the data, rather than making assumptions about the values of the weights W. Now, I've included here the equation corresponding to using the two most recent observations.
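Written out in the form the session describes, that two-lag equation would be Y(t+1) = alpha + beta1 * Y(t) + beta2 * Y(t-1) + error. Below is a minimal sketch of estimating those coefficients by ordinary least squares with NumPy; the demand numbers are hypothetical, and a real analysis would typically use a dedicated time-series routine with proper diagnostics.

```python
import numpy as np

# Hypothetical weekly demand series, oldest first.
y = np.array([102, 98, 110, 105, 112, 108, 119, 115, 121, 118], dtype=float)

# Build the lagged design: each row predicts the next value
# from the two most recent observations.
Y_next = y[2:]                # targets: everything after the first two points
X = np.column_stack([
    np.ones(len(y) - 2),      # intercept column -> alpha
    y[1:-1],                  # most recent observation -> beta 1
    y[:-2],                   # observation before that -> beta 2
])

# Ordinary least squares: estimate alpha, beta 1, beta 2 from the data.
alpha, beta1, beta2 = np.linalg.lstsq(X, Y_next, rcond=None)[0]

# One-step-ahead forecast from the last two observed weeks.
forecast = alpha + beta1 * y[-1] + beta2 * y[-2]
print(alpha, beta1, beta2, forecast)
```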
We might look further back in time and say, perhaps we need to look back three periods, four periods. Generalizing this model is going to be as easy as adding a few more terms: adding a beta 3 that gets multiplied by the Y value at time t-2, and, if we wanted a beta 4, going back to the Y value at time t-3. So that's another decision that we have to make: how many lagged variables do we need to include in this particular model? And that's something that model fit criteria can reveal to us, just how far back into the past we need to go.

Now, one thing to keep in mind: the more lagged terms I put in, the more observations I need when I'm making my predictions of the future. So, for example, if I require four past data points to predict the future, I can't make a prediction until I have at least those four data points. So there's a trade-off: we're going to have to wait longer for those observations to come in if we want to include them in our analysis.
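As a rough illustration of how a model fit criterion can guide that lag choice, here is a sketch comparing auto-regressive orders with AIC computed from the least-squares residuals. The formula n * ln(RSS/n) + 2k is one standard least-squares form of AIC; the data and the helper name are hypothetical, and every order is fit on the same target rows so the scores are directly comparable.

```python
import numpy as np

# Hypothetical weekly demand series, oldest first.
y = np.array([102, 98, 110, 105, 112, 108, 119, 115, 121, 118,
              125, 122, 130, 127, 133], dtype=float)

def fit_ar_aic(y, p, max_p):
    """Fit an order-p auto-regressive model by least squares; return its AIC.

    All orders are fit on the rows after the first max_p observations, which
    also shows the trade-off noted above: the first few data points can only
    ever serve as predictors, never as targets.
    """
    n = len(y) - max_p
    Y_next = y[max_p:]
    # Lagged columns y[t-1], y[t-2], ..., y[t-p], plus an intercept.
    lags = [y[max_p - k : len(y) - k] for k in range(1, p + 1)]
    X = np.column_stack([np.ones(n)] + lags)
    coef, *_ = np.linalg.lstsq(X, Y_next, rcond=None)
    rss = np.sum((Y_next - X @ coef) ** 2)
    return n * np.log(rss / n) + 2 * (p + 1)  # lower is better

# Compare lag orders: the lowest AIC suggests how far back to look.
max_p = 4
for p in range(1, max_p + 1):
    print(p, round(fit_ar_aic(y, p, max_p), 2))
```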