0:08

In a linear regression analysis, the accuracy of the model is assessed by the mean square error, or MSE, which is based on the difference between the model-estimated value of the response variable, denoted Y hat, and the observed value of the Y response variable. This difference is computed for each observation, where each observation is denoted by the subscript i, and then the difference is squared. Finally, the squared errors for all the observations are summed, then divided by n, the number of observations, to get the mean square error.
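The computation just described can be sketched in a few lines of Python. The function name and the small example values here are hypothetical, used only to illustrate the formula.

```python
# A minimal sketch of the MSE computation: for each observation i,
# take the difference between the observed value y_i and the
# model-estimated value y_hat_i, square it, sum over all
# observations, and divide by n.
def mean_squared_error(y, y_hat):
    n = len(y)
    return sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat)) / n

y = [3.0, 5.0, 7.0]      # observed response values (made up)
y_hat = [2.5, 5.5, 6.0]  # model-estimated values (made up)
print(mean_squared_error(y, y_hat))  # -> 0.5
```

A model that fits the data perfectly would have an MSE of zero; larger values indicate larger average prediction error.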

There are two characteristics that we need to consider when selecting an accurate statistical model: its variance and its bias. Variance refers to the amount by which the model's parameter estimates would change if we estimated them using a different training data set. If a method has high variance, then small changes in the training data can result in large changes in the parameter estimates. Ideally, the parameter estimates are stable across data sets, meaning that the method has low variance.

Bias refers to the error that is introduced by using a statistical model to approximate a real-life phenomenon. It is a measure of how far off the model-estimated values are from the true values. For example, we might use a statistical model to try to predict whether or not a person will become nicotine dependent. In reality, there are many factors that lead to nicotine dependence: whether or not a person becomes nicotine dependent depends on a complex interplay of genetics, behavior, attitudes, and so on. By comparison, the statistical models we develop to estimate nicotine dependence are very simple, and as a result they don't fully capture this complexity. This in turn leads to biased parameter estimates.

What we would like is a statistical model that has both low variance and low bias. The problem is that these two properties are negatively associated: decreases in one tend to come with increases in the other, hence the bias-variance trade-off. Generally, as model complexity increases, variance tends to increase and bias tends to decrease. Simpler models are more stable across samples, meaning that they have low variance, but they are also likely to be more biased.

In this figure, you can see an example of a simple model and a complex model fit on a training data set. In the complex model on the right, the observed values are all very close to the estimated regression line, so the error rate is low, meaning that bias is low. However, more complex models attempt to capture every pattern in the training data set, even those that occur by chance. These chance patterns are specific to the sample on which the model is fit and are not likely to exist in the test data set. As a result, the model will not fit the test data set as well, which means that the test error rate will be high.

When a model has a small training mean square error but a large test mean square error, the model is said to be overfitted. When we overfit the training data, the test mean square error will be very large because the patterns that the method found in the training data set simply won't exist in the test data.
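A minimal sketch of this behavior, assuming NumPy and an invented data-generating process: a degree-9 polynomial fit to twelve noisy points will achieve a lower training MSE than a straight line, because the flexible model chases chance patterns in the training sample, while those patterns won't help on fresh test data.

```python
# Hypothetical illustration of overfitting: compare a simple and a
# complex polynomial model on separate training and test samples.
import numpy as np

rng = np.random.default_rng(0)

def make_sample(n=12):
    x = np.linspace(0.0, 1.0, n)
    y = 2.0 * x + rng.normal(scale=0.3, size=n)  # true relation is linear
    return x, y

x_train, y_train = make_sample()
x_test, y_test = make_sample()  # fresh noise plays the role of test data

def mse(y, y_hat):
    return float(np.mean((y - y_hat) ** 2))

def fit_and_score(degree):
    coefs = np.polyfit(x_train, y_train, degree)  # least-squares fit
    return (mse(y_train, np.polyval(coefs, x_train)),
            mse(y_test, np.polyval(coefs, x_test)))

simple_train, simple_test = fit_and_score(1)    # underfit-leaning model
complex_train, complex_test = fit_and_score(9)  # overfit-leaning model
print(f"degree 1: train MSE={simple_train:.3f}, test MSE={simple_test:.3f}")
print(f"degree 9: train MSE={complex_train:.3f}, test MSE={complex_test:.3f}")
```

The degree-9 model's training MSE is guaranteed to be at most the straight line's, since the larger model nests the smaller one; its test MSE, by contrast, tends to be worse.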

On the other hand, the simpler model on the left doesn't predict the observed values as well, and as a result it has a high training mean square error. This simpler model may be considered underfitted. Basically, the simpler model ignores many of the patterns in the training data set, which leads to increased bias. The high mean square error means that it is not taking into account many patterns that are likely to be real, so it is also likely to result in a high test error rate. On the other hand, the simpler model is also likely to overlook random, sample-specific patterns, which means that variance will be low.

This figure shows how model complexity impacts the training and test error rates. In a really simple model, there is a lot of prediction error: bias is high, but variance is low. In this case the model is underfitted. As model complexity increases, you can see that the prediction error, or bias, decreases in the training sample. In the test sample, as in the training sample, prediction error, or bias, decreases and variance increases as the model becomes more complex. However, you can see that there is a point at which an overfitted, or increasingly complex, model will actually increase the test error rate. In this situation, a model that is overfitted on the training sample leads to a low error rate in the training sample at the cost of fitting poorly in the test data set. The ideal model complexity is where the test error rate bottoms out. The model at this point will have low bias and low variance, both of which provide the lowest possible test error rate.
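The "bottoming out" idea can be sketched by sweeping over model complexity and keeping whichever fit minimizes the test MSE. The data-generating curve and settings here are invented for illustration: a cubic trend with a small amount of noise, fit with polynomials of increasing degree.

```python
# Hypothetical sketch: pick the polynomial degree whose test MSE is lowest.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 30)
true_curve = x ** 3 - x  # the invented "real" relationship (cubic)
y_train = true_curve + rng.normal(scale=0.1, size=30)
y_test = true_curve + rng.normal(scale=0.1, size=30)

test_mse = {}
for degree in range(1, 11):  # increasing model complexity
    coefs = np.polyfit(x, y_train, degree)
    resid = y_test - np.polyval(coefs, x)
    test_mse[degree] = float(np.mean(resid ** 2))

best_degree = min(test_mse, key=test_mse.get)  # where test error bottoms out
print(best_degree, round(test_mse[best_degree], 4))
```

With a cubic truth, a degree-1 fit is too simple (high bias) and very high degrees chase noise (high variance), so the test MSE tends to bottom out at a moderate degree.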

Assessing model accuracy and the bias-variance trade-off also apply to situations in which we develop statistical models to classify observations into different levels of a categorical outcome variable. Logistic regression is an example of a classification model. In logistic regression, model accuracy is determined by how well a logistic regression model developed on a training data set correctly classifies observations on a categorical outcome variable in a test data set. The same bias-variance principle applies in this case, such that a logistic regression model with low bias and low variance will have a low prediction, or classification, error rate in the test data set.

For example, we might develop a logistic regression model to predict whether or not a person is nicotine dependent. The statistical model developed on the training sample can be applied to observations in the test sample to see how accurately the model classifies observations, by comparing the model-predicted nicotine dependence diagnosis to the actual diagnosis for observations in the test data set. A confusion matrix, like the one shown here, can be used to estimate prediction accuracy. A model with low prediction error will have a high percentage of correctly classified observations and a low percentage of misclassified observations. In this example, the statistical model developed on the training data incorrectly classified a total of 123 of the 992 observations in the test sample, meaning that it misclassified about 12% of the observations in the test data set.
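The error-rate arithmetic from this example can be checked directly. The individual cell counts below are hypothetical, since only the totals are given here, but they are chosen to sum to 992 test observations with 123 misclassified.

```python
# A hedged sketch of computing the misclassification rate from a 2x2
# confusion matrix. The off-diagonal cells (fp, fn) are the errors.
# Cell counts are hypothetical; only the totals match the example.
tn, fp = 600, 60   # actual not dependent: predicted not dependent / dependent
fn, tp = 63, 269   # actual dependent:     predicted not dependent / dependent

total = tn + fp + fn + tp     # 992 observations in the test sample
misclassified = fp + fn       # 123 off-diagonal (incorrect) classifications
error_rate = misclassified / total
print(total, misclassified, round(error_rate * 100, 1))  # -> 992 123 12.4
```

The 12.4% misclassification rate is what the transcript rounds to "12% of the observations in the test data set"; accuracy is the complement, about 87.6%.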
