
In this video, our final topic will be on prediction and decision making.

How can we determine if our model is correct? The first thing you should do is make sure your model results make sense. You should always use visualization and numerical measures to evaluate and compare different models.

Let's look at an example of prediction. If you recall, we trained the model using the fit method. Now we want to find out what the price would be for a car that has a highway miles per gallon of 30. Plugging this value into the predict method gives us a resulting price of $13,771.30.

This seems to make sense; for example, the value is not negative, extremely high, or extremely low.
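A minimal sketch of this fit-then-predict workflow, using made-up stand-in numbers rather than the actual automobile dataset from the labs (so the predicted dollar amount will differ from the one quoted above):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in data; the video uses the automobile dataset's
# highway-mpg and price columns, which are not reproduced here.
X = np.array([[20.0], [25.0], [30.0], [35.0], [40.0]])        # highway-mpg
y = np.array([25000.0, 20500.0, 16500.0, 12500.0, 8500.0])    # price in dollars

lm = LinearRegression()
lm.fit(X, y)                           # train the model with the fit method

yhat = lm.predict(np.array([[30.0]]))  # predicted price for highway-mpg = 30
print(yhat[0])                         # a sensible, positive dollar amount
```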

We can look at the coefficients by examining the coef_ attribute.

If you recall the expression for the simple linear model that predicts price from highway miles per gallon, this value corresponds to the coefficient of the highway miles per gallon feature. As such, for an increase of one unit in highway miles per gallon, the value of the car decreases by approximately $821.

This value also seems reasonable.
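Continuing the same toy setup (again, invented numbers, not the course dataset), the slope lives in coef_ and is interpreted exactly as described: the change in price per one-unit change in highway-mpg.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same hypothetical stand-in data as before.
X = np.array([[20.0], [25.0], [30.0], [35.0], [40.0]])
y = np.array([25000.0, 20500.0, 16500.0, 12500.0, 8500.0])

lm = LinearRegression().fit(X, y)

# For price = b0 + b1 * highway-mpg, coef_ holds b1: each extra unit
# of highway-mpg changes the predicted price by b1 dollars.
slope = lm.coef_[0]
print(slope)          # negative: price drops as highway-mpg rises
```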

Sometimes your model will produce values that don't make sense. For example, if we plot the model out for highway miles per gallon in the range of 0 to 100, we get negative values for the price. This could be because the values in that range are not realistic, the linear assumption is incorrect, or we don't have data for cars in that range. In this case, it is unlikely that a car will have fuel mileage in that range, so our model seems valid.

To generate a sequence of values in a specified range, import NumPy, then use the NumPy arange function to generate the sequence. The sequence starts at one and increments by one until we reach 100. The first parameter is the starting point of the sequence. The second parameter is the endpoint of the sequence plus one, since the endpoint itself is excluded. The final parameter is the step size between elements in the sequence. In this case it's one, so we increment the sequence one step at a time, from one to two and so on.

We can use the output to predict new values. The output is a NumPy array; many of the values are negative.
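The arange-then-predict step looks like this, again on the toy model rather than the real car data. Extrapolating far past the training range is exactly what produces the negative prices:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in training data, as in the earlier sketches.
X = np.array([[20.0], [25.0], [30.0], [35.0], [40.0]])
y = np.array([25000.0, 20500.0, 16500.0, 12500.0, 8500.0])
lm = LinearRegression().fit(X, y)

# arange(start, stop, step): starts at 1, stops before 101, step 1.
new_input = np.arange(1, 101, 1).reshape(-1, 1)

yhat = lm.predict(new_input)   # a NumPy array of 100 predicted prices
print(yhat[:3])

# High highway-mpg values lie far outside the training range, so the
# extrapolated prices there come out negative.
```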

Using a regression plot to visualize your data is the first method you should try. See the labs for examples of how to plot polynomial regression. In this example, the effect of the independent variable is evident: the data trends downward as the independent variable increases. The plot also shows some nonlinear behavior.
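The labs use seaborn's regplot for this; as a bare-bones sketch of the same idea with plain matplotlib and simulated data (the noise model and column names here are assumptions, not the course dataset), you scatter the observations and overlay the fitted line:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")          # headless backend; no display needed
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Simulated price-vs-mpg data with a linear trend plus noise.
rng = np.random.default_rng(3)
x = rng.uniform(15, 45, 80)
y = 41000 - 820 * x + rng.normal(0, 3000, 80)

lm = LinearRegression().fit(x.reshape(-1, 1), y)

fig, ax = plt.subplots()
ax.scatter(x, y, label="data")                 # the raw observations
order = np.argsort(x)                          # sort so the line draws cleanly
ax.plot(x[order], lm.predict(x.reshape(-1, 1))[order], label="fitted line")
ax.set_xlabel("highway-mpg")
ax.set_ylabel("price")
ax.legend()
```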

Examining the residual plot, we see that in this case the residuals have a curvature, suggesting nonlinear behavior.
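You can see why curved residuals flag a bad linear fit with a small synthetic check: fit a line to data that is actually quadratic, and the residuals come out positive at both ends and negative in the middle.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Deliberately nonlinear ground truth: y = x^2.
x = np.linspace(0, 10, 50)
y = x ** 2

lm = LinearRegression().fit(x.reshape(-1, 1), y)
residuals = y - lm.predict(x.reshape(-1, 1))

# Positive at the ends, negative in the middle: the curvature pattern
# a residual plot reveals when the linear assumption is wrong.
print(residuals[0], residuals[25], residuals[-1])
```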

A distribution plot is a good method for multiple linear regression. For example, we see the predicted values for prices in the range from $30,000 to $50,000 are inaccurate. This suggests a nonlinear model may be more suitable, or that we need more data in this range.
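A distribution plot overlays the distribution of actual and predicted values; numerically, that amounts to comparing where the two sets of prices fall. A rough sketch with simulated data (the data itself is an assumption, not the course dataset):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Simulated prices with a linear trend plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(15, 45, size=(200, 1))
y = 41000 - 820 * X[:, 0] + rng.normal(0, 2000, 200)

lm = LinearRegression().fit(X, y)
yhat = lm.predict(X)

# A distribution plot overlays the two densities; here we just compare
# counts per price bin to see where actual and predicted diverge.
bins = np.linspace(y.min(), y.max(), 6)
actual_counts, _ = np.histogram(y, bins)
pred_counts, _ = np.histogram(yhat, bins)
print(actual_counts)
print(pred_counts)
```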

The mean squared error is perhaps the most intuitive numerical measure for determining whether a model is good or not. Let's see how different values of the mean squared error correspond to model fit. The figure shows an example with a mean squared error of 3,495. This example has a mean squared error of 3,652. The final plot has a mean squared error of 12,870. As the mean squared error increases, the targets get further from the predicted points.
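The relationship between prediction error and MSE can be seen directly with scikit-learn's mean_squared_error on two made-up prediction sets (the numbers are illustrative, not the figures' data):

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([16500.0, 12500.0, 8500.0])

good = np.array([16400.0, 12600.0, 8450.0])   # predictions close to the targets
bad  = np.array([14000.0, 15000.0, 6000.0])   # predictions far from the targets

mse_good = mean_squared_error(y_true, good)
mse_bad = mean_squared_error(y_true, bad)
print(mse_good, mse_bad)   # the farther the targets from the predictions,
                           # the larger the MSE
```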

As we discussed, R-squared is another popular method to evaluate your model. In this plot, we see the target points in red and the predicted line in blue, with an R-squared of 0.9986; the model appears to be a good fit. This model has an R-squared of 0.9226; there is still a strong linear relationship. With an R-squared of 0.806, the data is a lot messier, but the linear relation is still evident. With an R-squared of 0.61, the linear function is harder to see, but on closer inspection we see the data is increasing with the independent variable. An acceptable value for R-squared depends on what field you're studying; some authors suggest a value should be equal to or greater than 0.10.
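In scikit-learn, a fitted model's score method returns R-squared. A quick synthetic illustration of how noise drives R-squared down (the data here is simulated, not from the plots above):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, (100, 1))

# Same underlying line y = 3x, with light vs heavy noise.
y_clean = 3 * x[:, 0] + rng.normal(0, 0.3, 100)
y_noisy = 3 * x[:, 0] + rng.normal(0, 30, 100)

r2_clean = LinearRegression().fit(x, y_clean).score(x, y_clean)  # near 1
r2_noisy = LinearRegression().fit(x, y_noisy).score(x, y_noisy)  # much lower
print(r2_clean, r2_noisy)
```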

Comparing MLR and SLR, does a lower MSE always imply a better fit? Not necessarily. The MSE for an MLR model will be smaller than the MSE for an SLR model, since the errors of the data will decrease when more variables are included in the model. Polynomial regression will also have a smaller MSE than regular linear regression. A similar inverse relationship holds for R-squared.
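This shrinking-MSE effect can be demonstrated on simulated data (an assumption for illustration): fitting the same targets with one predictor versus two, the training MSE of the larger model is never higher.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Simulated data where price depends on two features plus noise.
rng = np.random.default_rng(2)
n = 100
x1 = rng.uniform(15, 45, n)     # e.g. highway-mpg
x2 = rng.uniform(50, 300, n)    # e.g. horsepower
y = 40000 - 800 * x1 + 30 * x2 + rng.normal(0, 2000, n)

slr = LinearRegression().fit(x1.reshape(-1, 1), y)           # one variable
mlr = LinearRegression().fit(np.column_stack([x1, x2]), y)   # two variables

mse_slr = mean_squared_error(y, slr.predict(x1.reshape(-1, 1)))
mse_mlr = mean_squared_error(y, mlr.predict(np.column_stack([x1, x2])))

# Training MSE never increases when more variables are added.
print(mse_slr, mse_mlr)
```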

In the next section, we'll look at better ways to evaluate the model.
