Remember that regression analysis consists of four steps. In this video, I will explain how to perform the first two steps. You will learn how to make a fitted line plot, how to perform the main regression analysis, and how to interpret the output.
In the previous video, we were wondering if the number of production stops was related to the number of broken tea bags. The number of broken tea bags is therefore our Y variable, and the number of production stops is our X variable. We already established that both are numerical. Hence, we need to perform a regression analysis to test if there's a relationship between these two variables. The data we gathered looks like this. Now, pause the video and load the data into Minitab before you continue.
These are the four steps of regression analysis. First, we have to make a fitted line plot and study the data. This is what your data should look like in Minitab, with the day in one column, the number of defective bags in another column, and the number of production stops in the third column. To make a fitted line plot, we go to the Stat menu, Regression, and here we find the Fitted Line Plot. What is your response, or your Y, or CTQ? That's of course bags, the number of defects. And what's your predictor? That's stops. That's it, click on OK and this is the output you get: a graph, the fitted line plot, and output in your Session window.
Here we see the fitted line plot. We also see the formula for the fitted line, which has the form Y = a + bX. The a represents the constant of the line, which is the value of Y when X equals 0. So, with 0 stops, we would still expect 6.3 defects. The b represents the slope of the line, this is the steepness of the line. In this case, if the number of stops increases by 1, the expected number of broken tea bags increases by 1.944. However, we still have to determine if these findings are due to chance or if they are a structural effect.
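To make the idea of the fitted line a bit more concrete, here is a minimal Python sketch of how the constant a and the slope b of a least-squares line Y = a + bX can be computed. This is only an illustration of the standard least-squares formulas, not a description of what Minitab does internally. The function name fit_line and the small data set are made up for this example; they are not the tea bag data from the video.

```python
def fit_line(x, y):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, with at least two points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx             # slope: change in y per unit increase in x
    a = mean_y - b * mean_x   # constant: value of y when x equals 0
    return a, b


# Made-up illustration data (NOT the tea bag data): points lying exactly on y = 2 + 3x
example_x = [0, 1, 2, 3, 4]
example_y = [2, 5, 8, 11, 14]
a, b = fit_line(example_x, example_y)
print(f"Y = {a:.3f} + {b:.3f} X")
```

With real data the points will not lie exactly on the line, which is why the p-value and the R squared are still needed to judge the relationship.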
To do this, we have to perform the main part of the regression analysis. This is step two. Let's look at the output in the Session window of Minitab. We see that the p-value is displayed as 0, which means it is very small, and certainly smaller than 0.05. Hence, we conclude that there is a significant relationship between stops and bags.
P-values are part of hypothesis tests. The null hypothesis states that there is no relationship between the two variables. The alternative hypothesis states that there is a relationship. If the p-value is smaller than 0.05, the null hypothesis can be rejected and the alternative can be supported. Since we found a very small p-value, the alternative hypothesis can be supported. If the p-value were larger than 0.05, it would mean that the results are either due to chance or that we did not gather enough data to prove them.
Our R squared is 89%, which is quite large. It means that about 89% of the variation in the number of broken tea bags is explained by the number of production stops, so the number of production stops is a strong predictor of the number of broken tea bags. The influence factor stops is a big fish. If your R squared had been small, it would mean that there's a lot of additional variation in your Y variable that remains unexplained. The influence factor would be a small fish.
Let's take a look at an example. Can you guess what the p-value and R squared would look like for these four graphs? In the top left graph, we have a p-value that's low and a very high R squared. The influence factor has a significant effect and is a big fish. In the top right graph, the p-value is still very low. However, the R squared is smaller since the data is more widely spread around the line. Therefore, more variation in the Y variable remains unexplained and X is a smaller fish. In the bottom left example, we see a very high p-value. There's no proof of a relationship between the variables, hence R squared has no interpretation. The final example shows a p-value which is high. So there is no proof of a relationship between the variables.
This is caused by a lack of data, as there are only three observations. Again, the R squared has no meaning as there is no proof of a relationship between the variables.
Let me ask you a question. How many defects do you expect if we reduce the number of production stops to 3? Minitab computed the formula for us, the transfer function. You can use this formula to answer that question. Fill in 3 as stops, so 6.3 + 1.944 × 3 ≈ 12.1, and we get a prediction of about 12 defects. These 12 defects are the expected average. You can also use the fitted line plot to get this answer. The second question then for you: what is your prediction for 25 production stops? We cannot answer this second question, because we have data on our X that only goes up to 9. So, 25 is far outside of this range and therefore we cannot make a prediction.
We discussed steps one and two of the regression analysis. Steps three and four are explained in the next videos. In summary, the fitted line plot indicates the constant a and the slope b of your line. The p-value indicates whether the findings are due to chance or whether they are a structural effect, and the R squared shows you how strong the relationship is.
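Coming back to the two prediction questions, the following Python sketch shows one way to use the transfer function while refusing to predict outside the observed range. The constant 6.3, the slope 1.944, and the upper limit of 9 stops come from the video. The function name predict_bags and the constant names are made up for this example. The lowest observed number of stops is not stated in the video, so the sketch only guards the upper end of the range and rejects negative values.

```python
INTERCEPT = 6.3          # constant a of the fitted line, as given in the video
SLOPE = 1.944            # slope b of the fitted line, as given in the video
MAX_OBSERVED_STOPS = 9   # the video states the X data only goes up to 9


def predict_bags(stops):
    """Return the expected average number of broken tea bags for a number of stops."""
    if stops < 0:
        raise ValueError("the number of stops cannot be negative")
    if stops > MAX_OBSERVED_STOPS:
        # Outside the observed range we have no evidence the line still holds
        raise ValueError(
            f"{stops} stops is outside the observed range (up to {MAX_OBSERVED_STOPS})"
        )
    return INTERCEPT + SLOPE * stops


print(round(predict_bags(3), 1))  # first question: 3 production stops

try:
    predict_bags(25)              # second question: 25 production stops
except ValueError as err:
    print("No prediction:", err)
```

Raising an error for 25 stops mirrors the reasoning in the video: the formula would happily return a number, but that number would not be supported by the data.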