Hi, I'm [INAUDIBLE]. We are starting session 2 of the first week of our operational analytics course. In session one, I introduced the news vendor problem and talked about its exciting applications. I also talked about uncertainty and the need for forecasting future events. We talked about objective and subjective forecasting tools. In this session, we will continue our focus on objective forecasting methods. I'm gonna introduce a simple but powerful tool called moving averages. I'm also going to mention a method called exponential smoothing. We're gonna learn how to build descriptive statistics and how to adapt these statistics for prediction. Welcome. It's week one, session two of operations analytics. We are focusing on descriptive analytics, and we're gonna look at forecasting. With past historical data, moving averages, and exponential smoothing in the advance slide. And that will be our session two. Recall our past demand data. We had 100 periods and we observed data throughout them. In those 100 periods, let DT denote the demand observed in period t. From the past data we have demands D1, D2, D3, and so on up to D100. In this example, D1 is 29 and D100 is 41. Now let's look into descriptive statistics, particularly the mean and the standard deviation. Let's start with the sample mean or the sample average. Sample mean or the sample average is just the arithmetic average of all the data points you have in your data set. Suppose you have N data points. You sum up all the N demands. D1, D2, D3, D4, up to DN. And divide by the total number of data points you have and that gives you a sample mean. We call it sample mean by meal. Sample mean roughly tells you what to expect the next observation to be. Itâ€™s straightforward to calculate, and it can also be calculated using the excel average function. Remember, whatever your average is, the demand in future will deviate from the average. Now, we look into sample standard deviation. Standard deviation, as we saw before, is a measure of how much noise, or variation from the average, there is in your data. And the way to calculate standard deviation is to look at the difference between your demand and the average. Square the differences and add all the differences for the entire data set. Add the squares of all the differences for the entire data set. Divide by n minus 1. Where your data has n deviant points. Take the square root that gives you the standard deviation. But however, standard deviation can be calculated using the excel formula stdev, which is simple and straightforward and I'm going to show you how. Now, we are ready to look at descriptive statistics for our data. Let's look at the entire data set, D1, D2 through D100. We calculated the sample average and we get 52.81. We calculated the sample standard deviation and we get 13.73. In the excel file, DemandData.xlsx, I show you how to calculate these two from our data set on an excel sheet. This might be a good time to pause the video and go look at the excel sheet and see how the calculation is done. I'm showing you the demand distribution or the demands that happened over the past 100 periods. You see the data was collected over the past 100 periods, and it's shown in column B. And on the right is a graph I showed you a couple of slides back. Let's calculate the mean and standard deviation of this entire data. To calculate mean, we use the average function. This is just an average of all the numbers here. One through 100 periods and you get the average data, that's 52.81. To calculate the standard deviation, we use the standard deviation function, which is SDDEV. Again, going from the first period demand, to the last period demand. Once you close the bracket and press enter, you have the standard deviation which is 13.73023. If you want to use the number of decimals we want to use, we just go click here and we close and we get 13.73. So the descriptor statistics, we have for our data is the sample average of the sample mean, which is 52.81, and the standard deviation is 13.73. Mean and standard deviation are two descriptor statistics for our data. If our data will normally distribute it, these two service checks would be sufficient to describe the demand, predictive statistics. However, if we need to use our data to forecast, we need to adjust our sample standard deviation for forecasting. We will talk more about this later. Let's look at the notation for our forecast. Remember that we denoted past demands as D1, D2, D3 up to Dt, which are the past values of the demand. We are interested in making a forecast in period t for a future period. We have data up to the period t. We did not forecast by letter F and in this case, we can see the letter F has two subscripts. The first subscript tells you which period you're making the forecast in. The second subscript tells you which period you're making the forecast for. For example, Ft,t + tau, means it's a forecast that's being made in period T, for a future period t + tau, where tau can be one, or two, or three. Tau denotes how many periods ahead you are looking, so Ft,t+3 as an example, is the forecast made in period t for three periods ahead. Let's put some numbers in there and look at an example. F100,100+3 is the forecast made in period 100, for demand in period 103. This is called and three step forecast, why? Because we are looking to forecast three periods ahead. We're gonna look at one-step forecast. This is because typically, we are interested in the next outcome, of simply, one-step ahead forecast. F t, t + 1 is the forecast made in period t for period t + 1. Just looking at another example, F t- 1, t is the forecast made in period t- 1 for the next period t. To simplify things, we'll use the shorthand. F t + 1 is simply going to say, what is the one-step forecast made for period t + 1? So if I want a one-step forecast for period t + 1, I simply call it F t + 1. In other words, the F t + 1 stands for forecast made in period t for the next period, t + 1. Forecasting for the stationary series. Stationary data shows no trend behavior. Roughly speaking, the future resembles the past. An example is our Newsvendor Demand data. If you look at the Newsvendor Demand data file, it show no perceptible trend. So a stationary time series has the following form. Demand in any time period t, demand D t, can be written as sum of two things. Mu, which is a constant, sometimes the sample average, plus epsilon t, which is the random variable. Epsilon t is a random variable with mean 0 and standard deviation sigma. See that the mean and the standard deviation are not dependent on time, and therefore, there is no trend in this data, and it's a stationary time series. We say random variables in section one of this course, and this is a good point to revisit that video and brush up your knowledge on random variables. Let's move forward by thinking about how to use past data for forecasting. Two common methods of forecasting stationary series are moving averages and exponential smoothing. And I will cover moving averages in this session. And exponential smoothing will be covered in the advanced materials slides. Moving averages method. A moving average is forecast is the arithmetic average of n most recent observations. We'll denote the moving averages method that uses n data points simply as MA (n). To come up with one-step-ahead forecast for period t, using the moving averages method, all we have to do is the following. To come up with a forecast for period t, F t, you take the end data points from the past periods. Going from t -1, t- 2, up to t- n, and then / n. The average of the past data points gives you the forecast for the next median. If you wanted forecast for several datas ahead, that is you wanted a multi-step forecast, for moving averages, the multi-step forecast is the same as the single-step forecast. So you can just use the single-step forecast for your multi-step forecast under the moving averages mater. We saw how to come up with one-step-ahead forecast using moving averages method for period t. You might wonder why it is called the moving averages method. It's called the moving averages method because the chosen data points move and are always the most recent n data points. For example, if you wanted to forecast for period D, you took n data points that went from D- 1, D- 2, up to D- n. Now if you wanna forecast for the next period, D+ 1, what happens is you have one fresh demand data, D t, and you add that here, and you drop the last observation that you used in the previous period. So F t + 1 is going to be sum of n data points, D t, D t- 1, up to D t- n- 1 / n. We have essentially moved the data we are using by 1 period. We added D t, and we took away D t- n. Let's use an example to understand this better. Let's calculate the moving average for the past ten periods. Using our notation, I simply call this MA(10). First, we will look at the descriptive statistics, then I will show you predictive statistics which you can use for forecasting. You will see more of how to do the predictive analysis in week four of this course. For the moving averages method, I'm gonna show you how to use the last ten data points to come up with your forecast. The last ten data points from our data set are marked in red. Let's look at an example of descriptive statistics. We take the average of the data of the last 10 points in our observation, and the average is 49.60. That's our sample mean for the limited data. And our sample standard deviation is also calculated similarly, and we get s = 10.28. Again, I'll show you how to calculate the descriptive statistics using the template. We can look at the template Week1MATemplate.xlsx. And our data, if it was normally distributed, let's say like a bell curve, the above statistics that we calculated, namely, the mean and the standard deviation, would be sufficient to describe the demand distribution. In this Excel video, I'm gonna use week one MA template, or moving averages template file, to show you moving averages calculations. We have hundred periods in column b and we have hundred demand points in column c. And you see there are few data out here because I've hidden some rows and we can unhide those rows. And look at all of them for purposes of display, I'm going to hide those rows again. Starting from 10 all the way up to 79 or 80. Right click and hide, and this doesn't actually affect our calculations in any way. To calculate descriptive statistics, first I'm going to show you how to calculate mean and standard deviation using the moving averages method of the last ten data points. That is MA(10). And that's straight forward. Again, the average, or the mean, is given by the average function. Take the average function. Start selecting the data. And you can see, as you go up, you can see the count happening. And that's your ten data points. And typically, on your bottom of your Excel sheet, there is also a count expression that you can check. So, once you have this. The average answer will be 49.60, which is what I showed you on the Excel sheet. Again, these ten data points here give you the average of 49.6. And we're gonna calculate the standard deviation now. The standard deviation, again, is given by STDEV function, and select the ten points. And that gives you the descriptive standard deviation for the ten data points you have. And it's just 10.27, and we can reduce the number of decimals and we get 10.28 here. Similarly, we can calculate moving averages for 20 data points and standard deviation for 20 data points. So let's calculate the average first. And the average of 20 past demands is going to be these 20 numbers that I'm picking up, and you get 51.95. And the standard deviation is going to be, for those 20 points, 9.61, or 9.616, which is 9.62. Here we have the average of the sample mean, and the standard deviation for ten data points. And for 20 data points. Similarly, you can do it for all data points. For all your data points, we should get the numbers we got before. That is periods 1 through 100, the average is 52.81 and the standard deviation for all this data is 13.73. And as you can see, the more data you have, the more ability you have to calculate these numbers, and more confidence you have in the forecasting process. We can use this 36 that we calculated for prediction, or for forecasting. We call this predictive statistics. The mean for predictive statistics is the same as the descriptive sample mean. In other words, the descriptive sample mean that we calculated is an unbiased estimator of the mean of the true demand distribution. And hence, the sample mean that you calculated can be used for prediction. However, because of insufficient data, the standard deviation of the prediction needs to be adjusted and that's what we're going to do in the next step. If you know that the data is normally distributed, like in our case, we can adjust standard deviation for predictive purposes, and this is how we do it. The standard deviation for prediction is going to be the standard deviation we calculated plus a corrective factor. And the corrective factor is s, standard deviation calculated, divided by the square root of n, where n is the total number of data points you used. Let's look at some predictive statistics when we use 10 data points. When the data is normally distributed, the mean for prediction is the same as the descriptive sample mean, so we use mu = 49.60. However, as we noticed, the standard deviation for prediction needs to be adjusted. And the adjustment gets you a sigma. So here we calculate s is equal to 10.28. The correction factor is s over square root of n. So 10.28 + 10.28 / square root of 10. Because we use ten data points, that gives us 13.53. As you can see, that's a significant correction. Now we can use mu and sigma for predictive purposes. Let's continue with an example of descriptive statistics using 20 data points, or moving average of 20. Again, we calculate the sample mean. The calculated sample mean from the last 20 points comes to be 51.95. That's our descriptive sample mean. The sample standard deviation comes out to be 9.62. Let's do an example for predictive statistics using 20 data points. Again, when the data is normally distributed, the mean for prediction that we got before, the descriptive sample mean, is sufficient. The mean for prediction is 51.95. We expect that the demand in the next period is going to be 51.95 on average. However, the standard deviation for prediction needs to be adjusted. From what we calculated on descriptive statistics. So the standard deviation for prediction is going to be sigma, which is equal to s, which is the descriptive sample standard deviation. Plus the correction factor of s over square root of n. In this case we get 9.62 + 9.62 / square root of 20, because we used 20 data points, and we get sigma as 11.77. For forecasting purposes, we can assume that the actual demand distribution will deviate from the average. With the above standard deviation, which is 11.77. Now let's think about using more data. As we have more data for forecasting. What happens is descriptive statistics approach predictive statistics. As we have more data, we gain more confidence for prediction. Also note as the number of data points increase, the descriptive standard deviation approaches the standard deviation for prediction. That is, as n becomes larger and larger. The second term, the correction term, disappears and sigma becomes almost equal to s. Suppose we use our entire data set for calculating our statistics. That is, we're using all the data available of n equal to 100. The descriptive statistics, as calculated from Excel, are the following. Mu is equal to 52.81 and standard deviation s is 13.73. And if our data was normally distributed, for normal distribution, the predictive statistics are as follows. The mean, mu, is still going to be the same, it's 52.81. However, the standard deviation for prediction, as we saw before, needs to be corrected and adjusted. And that is gonna be 13.73 plus a correction factor of 13.73 divided by square root of 100, because we used all 100 data points. And that comes to 15.10. So what does this really mean? That means for our demand, for prediction purposes, we can use a normal distribution with mean 52.81 and standard deviation 15.10 in the coming weeks. I show you how to calculate all this numbers, in an Excel file, and the solutions are all documented in the Excel file, Week1 MASolution.xlsx. Based on our predictive analysis, we can now generate a normal distribution graph to visualize how our demand is distributed. We use a mean of 52.81, a standard deviation of 15.10, and the figure shows you how your demand distribution looks like. You can see there are a lot of numbers close to the average, 52.1, so demand is likely to be closer to the average, and there are small probabilities where demand is likely to be very low, or demand is likely to be very high, but it's possible. Based on our analysis, so far we can now discuss moving averages before we wrap up and look at other data sets. The advantages of moving averages method is pretty easy to understand. It's straightforward to compute and it does provide stable forecasts. There are some disadvantages, though. First, if there is trend in your data set, the moving averages method will lag behind the trend, as we will see in section three. Finally, it's not a causal model. That is, the moving averages method won't explain why your demand realizations in the future behave in a certain way. It might be very good for prediction, but it doesn't explain why your demand behaves in a certain way. Finally, know that the moving averages method drops all old data. So if you have a moving averages method of ten data points, it will drop all the data that is older than ten time periods. Now this leaves us with the question about how to choose N. How do you decide how many data points that the Moving Average Method should be using? Moving Averages, what data should we use for moving averages? If you choose to use moving averages method of last ten data points, remember two things. One, all older data is ignored. That is, data from 12 periods back is not used in your calculations for moving averages. The second point is all the recent ten data points are all equally weighed the same. That is, yesterday's data has the same weight as the data from a week back. You might want to give some more weight to more recent data and less weight to older data. Now how do we do this? Exponential smoothing is a method that allows you to do this. It's based on this precise idea. We're gonna see more of this in the advanced slides. As I mentioned before, any forecast you come up with will have some noise in it. It's going to be wrong because the point forecast is inaccurate. So how do you measure if our forecasts are good or not good? Forecast error is a good way to measure it. Forecast error in period t, let's say, is denoted by epsilon of et. The difference within the forecast for demand in period t and the actual value of the demand realized in t is your error. For one step ahead forecast, your error in the time period T is the difference between forecast made in that period, FD, minus the demand that occurred in that period, DT. The difference tells you what your error in the forecasting was. Now there are three ways to measure errors. I'm just going to define them now and we will walk through an example. The first one is mean absolute deviation, or MAD. Mean absolute deviation calculates the following. It calculates how much absolutely your forecast deviates from the actual demand and takes the average. Mean Squared Error, or the MSE, calculates the following. It calculates the error in your forecast squares the error, and calculates the average squared error. Now, why is this useful? We'll see that it's actually useful in fitting trend data set later on. The last way to measure error that we will cover is called MAPE or Mean Absolute Percentage Error. MAPE is the following. It takes error that we calculated, divides it by the actual demand as a percentage error, then calculates acts of all the percentage errors and calculates their average. And this will give us a mean absolute percentage error, or MAPE. One thing we can say is that lower the errors, better the forecasting process is. A good forecasting process will have a low MAD, low MSC, and low MAPE. The last thing I want to cover is biases in forecasts. A bias occurs in forecast when the average value of the forecast error tends to be on one side of the forecast. It tends to be either too much or too little. Positive or negative. So that will help us initial biases. Using our dataset again, we'll go through different methods of calculating errors in our forecast. Assume we have data up to 80 periods. We will using moving averages of ten periods to calculate our forecasts for period 81 onwards. Once we have a forecast for period 81, the demand for period 81 occurs. Once we have a forecast for period 82, the demand for period 82 occurs, and so on. So, we are able to calculate the errors. And using MA(10), and then we can calculate the errors by comparing the demands and forecasts over periods 81 through 100. In Week1ErrorsTemplate.xls file, I'll walk you through an example exercise. Let's calculate the errors using the template file that I'm showing you. This is Week1ErrorsTemplate file. After I calculate the errors, I have a solution file, and I have posted the solution file also on the website. In the template file, I start with the same 100 points of demand that you saw before. 100 periods and 100 demand observations. In this, I hide rows five to 62. It makes it easier for me to show you the methods that I'm gonna show you. But you might ask well, unhide it whenever you want to and go back and hide it once you are ready to start your calculations. So in this case, I'm gonna look at moving averages of ten. So the forecast for moving averages of ten periods is just the average of the last ten observations. So I just do that here. I start in period 81, and I'm going to take the average of all demands from period 71 to 80. And that gives me 55.8. The forecast is for 55.8. The actual demand turned out to be 52, so there is an error. And the error is measured typically by forecast, minus the demand, and you get Positive 3.8. That's your error. In column G I'm gonna calculate the absolute error. Absolute error is calculated by Abs function and taking the number. And absolute error is of course 3.8. If your error was negative 3.8, you would be getting an absolute value of 3.8 in column Z. In column H, I'm going to calculate the squared error, which is 3.8 times 3.8. And then, in the last column, I, I'm gonna calculate the absolute value of the first integer. Sometimes you're interested in the first integer. You add the error of 3.8 but how does the error compared to the actual demand? So in this case, the error was 3.8 and the actual demand was 52, it's about 7% and we will see that. So this is calculated by taking the absolute value of the error divided by the demand. And as you see, it's 0.07. We have the error, absolute error or the absolute deviation. We have the square error and we have the percentage error. Let's repeat this calculation for all the pivots from 81 onwards. And I do this just by dragging the values up to period 100. And as you can see the errors can be distributed quite significantly. You have an error as high as 21.8 and as low as -2.7. We can see errors can be negative, -10.4, in period 90. Within a couple of periods they have as a positive and it's positive 14.1. So the absolute deviation takes care of both negative and positive errors as errors, and calculates the average absolute deviation. We can do that, and that's our MAD. MAD is mean absolute deviation, of the average of the absolute errors, and in this case we get the average of absolute errors is 8.9. Similarly, we calculate the mean square error, or the average square error. And that's going to be the average of all these square errors. And that's 113.15. Finally, we calculate MAP, or mean absolute percentage error. Which tells us roughly what your percentage deviation of your forecast is from the true demand. So let's take the average of all these numbers, and we get 0.197. To express this as a percentage, we have 20%, or actually, 19.72%. So on average, your forecast deviates from the demand by 19.7%. The square error is about 113%. The absolute value of the error you have is about 8.9 units of demand. So you might wonder whether we can do better. Let's check whether we can do better by looking at moving averages of 20 data points. So I'm gonna do that here. Starting from column K. So where I generate the moving averages of 20 periods. So let's start with the forecast for period 81 using an average of 20 data points. So that goes from period 61 up to period 80. That's 20 data points. And you get 55.5. Earlier forecast was 55.8. The current forecast is 55.5. The error is 55.5 minus the demand which is 3.5. The absolute value, I'm repeating calculations here just to make it very clear. The absolute value of the error is the absolute error. It's measured as a positive value regardless of errors being negative or positive. And that's 3.5. The square error is 3.5 times 3.5 is 12.25. And then finally, the percentage error is absolute value of your error divided by the demand. And that's 0.067. So we take these five numbers, again we drag them over the last 20 periods, and we have forecasts, we have errors, we have absolute errors, we have square errors, and we have percentage errors. Again let's calculate the average, absolute error. And that's 7.655. That's come down, compared to what it was before, 8.9. The average square error, or the mean square error, MSC. That's come down. And that's 92.611. And the percentage error, let's calculate the percentage error in this case. And that's come down too, and in fact it's 17.29, 17.29%. So MAD has come down absolute deviation. Square error MSC has come down, absolute percentage error, MAP has also come down. So it looks like using moving averages of 20, that's better than using moving averages of ten for our dataset. As they say often, there is a model choice, and this is one of those situations where there is a model choice. Even though you can forecast, you have to think about what data to use for your forecasting process. The solution file is also available on the web site as Week1ErrorsSolution.xlsx. Let's return back to our slides. At the end of the analysis, I'm going to show you that if you use a moving averages method of MA(10) we will get, MAD is 8.9, MSE is 113.15, and MAPE is 19.72%. Using MA of 20 points, moving average of 20 data points. The MAD, or mean absolute deviation, is 7.66. The mean squared error is 92, and the absolute percentage error is 17.29%. So this at least answers that MA(20) seems to be performing better than MA(10). The solution for the Excel analysis is available in Week1ErrorsSolution.xls. It may be preferable to use MA(20) over MA(10) in this case, by comparing the corresponding errors. Thus, measuring errors allows us to understand better the choice of which method to use in our forecasting. Finally in our data there is no evidence of any bias, but we need to collect more data to be sure about that. We are about to wrap up now. In this session we saw a variety of things. We saw how to forecast using the moving averages method. We how to measure the headers, how to look for biases. We learned about descriptive statistics and how to adapt the descriptive statistics for prediction. See you in the next session. What are we gonna do in the next session? We're gonna think about trends and seasonality. And let's look at trend and seasonality in session 3 of week 1. See you there. In this session, I introduced you to some fundamental forecasting concepts. We also learned several technical details on forecasting methods as well as single step and multi-step forecast. And we learned how to forecast in stationary data settings. We covered the moving averages method and also exponential smoothing in the advanced material. We covered some example problems to illustrate the concepts and we talked about errors, biases and forecasting outcomes. We learned some descriptive statistics, mean and standard deviation. They'll be used to describe our data, and we learned how to adapt descriptive statistics for prediction, for forecasting future events. We have more to learn in the next session about trend, and seasonality, and data.