In this demo video, we're going to walk through the process of tuning hyperparameters in order to prune decision trees using scikit-learn. Remember that one of our project objectives is to predict a customer's daily average number of steps based on their other recorded metrics. We're therefore interested in another user-level aggregation, and we're going to do the same thing with the user metrics lifestyle table that we created in the previous demo. Let's take a quick look at that table; it looks pretty familiar from before. Now we're going to convert it to a pandas DataFrame so that we can work with it more easily in scikit-learn, then set up our X and y and do our train/test split.

Now we're going to start by fitting a base decision tree. This is just a baseline model where we don't do any hyperparameter tuning: we call our decision tree regressor, fit it on our training data, and then look at the results using the R-squared score. We see that while we get 100% on the training set, we only get 84.5% on the test set. That's a pretty big difference. Since the training score is so high, we can see clearly that there is overfitting and fairly high variance, since the test-set score isn't very good. The reason this is happening is that the decision tree is unpruned; we didn't adjust any hyperparameters or prune it at all, so it fits the training set perfectly but then doesn't generalize very well when it comes to the test set.

Remember, we talked about some of the hyperparameters we can tune in a decision tree to prevent overfitting on the training set. One of these is the maximum tree depth, which limits how deep the tree grows, or how many levels of splitting it goes through. So we're going to instantiate a new decision tree regressor, and this time we're setting the max depth hyperparameter to four. There are different ways to go about choosing hyperparameters, and that is a whole other lesson, so here we'll just pick the max depth somewhat arbitrarily and not worry too much about it. Then we'll fit this model on our training set and look at the scores. Okay, we see now that there's not so much of a difference between the scores. The model is no longer overfitting to our training data, and it got slightly better on the test set. But now it has high bias, which means it's not actually learning the training set very well, and that makes sense because we're restricting the depth: this is a pretty shallow tree, so it doesn't have a chance to really learn the training set. This isn't ideal, because neither of these scores is very good, so we'll try tuning another hyperparameter.

This time we'll look at the minimum node size. Remember that this sets a requirement that each node has to contain a minimum number of data points in order to be split further; if a node doesn't have the minimum number of samples it needs in order to split, the tree just stops there and that node becomes a leaf. We'll call our new model dt_node, and you'll notice that we're continuing to use max depth, but this time we changed it from four to six, just to give the tree a little more room to learn the training data set. We're going to somewhat arbitrarily choose three for the minimum samples per split, and then go ahead and train the model and see how it does.
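For reference, here is a minimal sketch of the steps just described, assuming the data lives in a Spark table as in the previous demo; the table name, target column, and variable names (user_metrics_lifestyle, avg_daily_steps, dt_base, and so on) are placeholders for illustration, not necessarily the exact names used in the notebook.

    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeRegressor

    # Convert the user-level aggregation table to a pandas DataFrame
    # (assuming a Spark table, as in the previous demo).
    df = spark.table("user_metrics_lifestyle").toPandas()

    # Set up X and y, then do the train/test split.
    X = df.drop(columns=["avg_daily_steps"])
    y = df["avg_daily_steps"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # Baseline: an unpruned tree fits the training set perfectly but overfits.
    dt_base = DecisionTreeRegressor(random_state=42)
    dt_base.fit(X_train, y_train)
    print(dt_base.score(X_train, y_train), dt_base.score(X_test, y_test))  # R-squared

    # Limit the maximum tree depth.
    dt_depth = DecisionTreeRegressor(max_depth=4, random_state=42)
    dt_depth.fit(X_train, y_train)
    print(dt_depth.score(X_train, y_train), dt_depth.score(X_test, y_test))

    # Require a minimum node size before a split is allowed, and loosen the depth limit.
    dt_node = DecisionTreeRegressor(max_depth=6, min_samples_split=3, random_state=42)
    dt_node.fit(X_train, y_train)
    print(dt_node.score(X_train, y_train), dt_node.score(X_test, y_test))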
Okay, it's getting better: we have less bias, with a 91.5% score on the training set, so it's learning the training set a little better, probably because we increased the max depth. The test set, while a little better, only went from 87.6% to 89.2%. The bias and variance are still a little high, because these scores aren't great, and the performance on the test set is still pretty far below the training performance; they're a little too far apart from each other. So we want to try something else.

This time we'll look at setting the minimum leaf size. This is the requirement that at least a certain number of data points end up in each leaf in order to make a split. We're going to keep max depth, but this time we'll increase it a little more to 8; we'll keep min samples per split but decrease it to two, and then somewhat arbitrarily try 3 for the minimum samples per leaf. Okay, a very slight improvement, but it didn't really help much, so we'll try one more hyperparameter we can adjust for decision trees.

This time we'll look at max features. Again we're keeping the previous three hyperparameters, but now we'll add max features and set it to 3 for the maximum number of features the tree can consider at each split. This introduces randomness, because at each split it makes, the tree randomly chooses 3 features from which it can decide how to make the split, instead of considering all of the features. All right, we train our model, and there's not really a big change: the training score of 92.9% is still not great, and the test score of 89.9% isn't terrible, but there are ways we can improve and do better with our predictions. We've seen that even with hyperparameter tuning, it can be difficult to get a decision tree that doesn't have high variance, even though we've restricted how well it fits, and how much it overfits, the training set.
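Continuing the same sketch, the last two hyperparameters from this demo would look roughly like this, using the same assumed variable names as above:

    from sklearn.tree import DecisionTreeRegressor

    # Require a minimum number of samples in each leaf.
    dt_leaf = DecisionTreeRegressor(max_depth=8, min_samples_split=2,
                                    min_samples_leaf=3, random_state=42)
    dt_leaf.fit(X_train, y_train)
    print(dt_leaf.score(X_train, y_train), dt_leaf.score(X_test, y_test))

    # Also limit the number of features considered at each split,
    # which introduces randomness into how splits are chosen.
    dt_feat = DecisionTreeRegressor(max_depth=8, min_samples_split=2,
                                    min_samples_leaf=3, max_features=3,
                                    random_state=42)
    dt_feat.fit(X_train, y_train)
    print(dt_feat.score(X_train, y_train), dt_feat.score(X_test, y_test))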