You've previously learnt to run linear regression and check your assumptions in R. In this lecture, we're going to examine what you found in a little more detail and interpret the results.

When you fit the model in R, you should have obtained an output that looks something like this. You can see the estimate for the regression coefficient is 74.111, and you can interpret that as: for a one-unit increase in FEV1, walking distance is predicted to increase on average by 74 metres. The 95% confidence interval for this estimate is 46 to 102. But what does this interval mean, and how would you interpret the width of it if you were presenting it to a colleague?

The 95% confidence interval is quite wide; it tells us we can be 95% confident that the true population parameter lies somewhere in the range from 46 metres up to 102 metres. The confidence interval does not include the value zero, so we know that the result is significant at the 5% level. In fact, the lower confidence limit of 46 is a long way from zero, so we know the p-value will be much less than 0.05.

In the regression output, you would also have seen that the adjusted R-squared was 0.21. So, what does this value mean? It tells us that the regression model explains 21% of the variability in the observations.

To check the assumptions, you examined the residuals, and one of the plots you would have obtained was the Q-Q plot, which you can see here. From it, you can see the observations fit closely to the straight line, so I hope you'll agree with me that the assumption of normality looks like it has been met.

Finally, you would also have produced a plot that looked a bit like this, where the residuals have been plotted against the fitted values. This allows you to assess the assumption of constant variance. So what did you think of this plot? Do you think the assumption of constant variance is reasonable?
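The steps described above can be sketched in R. This is only an illustration: the data below are simulated rather than the lecture's actual dataset, and the data frame and variable names (`copd`, `fev1`, `walk_dist`) are assumptions made for the example.

```r
# Sketch only: simulated data standing in for the lecture's dataset.
# The names `copd`, `fev1`, and `walk_dist` are illustrative assumptions.
set.seed(42)
copd <- data.frame(fev1 = runif(100, min = 0.5, max = 3.5))   # FEV1 in litres
copd$walk_dist <- 300 + 74 * copd$fev1 + rnorm(100, sd = 90)  # distance in metres

# Fit the simple linear regression of walking distance on FEV1
model <- lm(walk_dist ~ fev1, data = copd)

summary(model)   # coefficient estimates, p-values, adjusted R-squared
confint(model)   # 95% confidence intervals for the coefficients
```

With real data, `summary(model)` is where you would read off an estimate like 74.111 and the adjusted R-squared, and `confint(model)` is where an interval like (46, 102) would come from.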
Well, I would say there is some evidence of non-constant variance: from the plot, you can see that for higher fitted values the observations are quite close together compared to the scatter we see at the lower values. We call this heteroscedasticity. That's a difficult word to say and spell, but it simply means unequal variance of one variable across the range of another. The opposite of heteroscedasticity is homoscedasticity, and that's what we're looking for: equal variance across the fitted values. I'm introducing these two terms now as you may come across them and you might want to know what they mean. What is really important is that you are checking for constant variance across predictors.

You've had the opportunity to draw scatter plots of FEV1 and walking distance, with the least squares regression line fitted. In this plot, you can again see that there is less variability in walking distance in patients with higher FEV1 values. So you may wonder why we're bothering to examine the residual plots. Well, we can easily check this when we have only one predictor in our model, but we can't easily do so when there are multiple predictors. This is because the models are no longer two-dimensional, so you'll find residual plots really do come into their own with multiple regression modelling.

There you are: fitting and checking our regression model in R is as simple as that. You're going to get more practice now looking at age and walking distance, and then join me later when we look at a multiple linear regression model for the first time.
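The plots discussed above can be produced with base R graphics. Again, this is a sketch on simulated data: the names `d`, `fev1`, and `walk_dist` are illustrative assumptions, not the lecture's dataset.

```r
# Sketch only: simulated data with illustrative variable names.
set.seed(1)
d <- data.frame(fev1 = runif(80, min = 0.5, max = 3.5))
d$walk_dist <- 300 + 74 * d$fev1 + rnorm(80, sd = 90)
m <- lm(walk_dist ~ fev1, data = d)

# Scatter plot of the raw data with the least-squares line fitted
plot(d$fev1, d$walk_dist,
     xlab = "FEV1 (litres)", ylab = "Walking distance (metres)")
abline(m)

# Residuals vs fitted values: look for an even spread (homoscedasticity)
plot(fitted(m), resid(m),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Q-Q plot of residuals: points close to the line support normality
qqnorm(resid(m))
qqline(resid(m))
```

With one predictor, the raw scatter plot and the residual plot tell a similar story; once you move to multiple predictors, only the residual plots remain easy to draw, which is why they matter so much in multiple regression.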