Now, thinking back to the over- and undersampling solutions we discussed earlier for unbalanced classes, note that we can also mix upsampling and downsampling. For example, you can use SMOTE to upsample the minority class, and then, to remove noise and avoid having to upsample too heavily, downsample some of the majority class as well by removing points with either Tomek links or Edited Nearest Neighbors. (A short code sketch of this combination follows below.)

Next, I'd like to briefly touch on an intuitive ensemble technique for unbalanced datasets called blagging, or balanced bagging. Hopefully you can tell at this point that data scientists love to mash words together. The idea here is to downsample each of our bootstrap samples: we draw our bootstrap samples, downsample the majority class within each one, and then use these now-balanced samples to learn each of our individual decision trees. This again gives more weight to the minority class, ensuring that a more balanced decision is being made. (A sketch of balanced bagging also follows below.)

Now I want to briefly go over some steps to keep in mind whenever we're working with unbalanced datasets. The first step is to make sure you do your train-test split before doing any of this over- or undersampling. Recall that if we oversample first, the same values can end up represented in both the train and test sets, which means we can very easily overfit and have data leakage; even if we synthetically create new samples, those values will be very close to the training-set samples, and we can still overfit. So always do your train-test split first.

We also want to make sure we use sensible metrics. We start here with AUC, because it gives us the trade-off between the true positive rate and the false positive rate for our minority class across different thresholds, so we get a clear picture of the classifier as a whole and can even use it to pick the threshold that is optimal for your business objective. The same can be said for the precision-recall curve. Thinking of precision and recall, we can also use the F1 score, because it balances precision and recall and punishes you more strongly for letting either of the two drop too low for the given class. So, unlike accuracy, it will not be skewed by unbalanced classes, assuming again that you compute it for the minority class.

Then we have Cohen's kappa. This is a new one that we haven't discussed, but we felt it was worth at least letting you know it's available. It is best if you're working with a team, and it is actually a measure of agreement between two different raters or two different models, where each rater classifies n items into mutually exclusive categories, so just performing classification. The goal is to compare the observed agreement between these two models against the probability of agreement occurring just by chance. With unbalanced classes, we want strong agreement, and agreement between the two models that is better than chance alone. So the higher the value of Cohen's kappa, the more you can trust the agreed-upon predictions of those two models. (A sketch of splitting first and then scoring with these metrics follows below.)
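Here is a minimal sketch of the mixed over- and undersampling idea, using the imbalanced-learn package (assuming it is installed as `imblearn`). SMOTETomek and SMOTEENN bundle SMOTE with Tomek-link or Edited Nearest Neighbours cleaning; the toy dataset is purely illustrative.

```python
# A minimal sketch of mixing oversampling and undersampling with
# imbalanced-learn. SMOTETomek = SMOTE followed by Tomek-link removal;
# SMOTEENN = SMOTE followed by Edited Nearest Neighbours cleaning.
from collections import Counter

from sklearn.datasets import make_classification
from imblearn.combine import SMOTETomek, SMOTEENN

# Toy unbalanced dataset: roughly 10% minority class (illustrative only).
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.9, 0.1], random_state=42)
print("original:", Counter(y))

# SMOTE upsamples the minority class, then Tomek links prune
# borderline majority points that sit right next to minority points.
X_st, y_st = SMOTETomek(random_state=42).fit_resample(X, y)
print("SMOTE + Tomek:", Counter(y_st))

# Alternatively, clean more aggressively with Edited Nearest Neighbours.
X_se, y_se = SMOTEENN(random_state=42).fit_resample(X, y)
print("SMOTE + ENN:", Counter(y_se))
```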
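Here is a minimal sketch of blagging using imbalanced-learn's BalancedBaggingClassifier, whose default base learner is a decision tree. The dataset, number of estimators, and other parameters are illustrative assumptions, not values from the lecture.

```python
# A minimal sketch of "blagging" (balanced bagging): each bootstrap
# sample is downsampled to balance the classes before a decision tree
# is fit on it.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from imblearn.ensemble import BalancedBaggingClassifier

X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

# Each of the 50 bootstrap samples is rebalanced by undersampling the
# majority class before its tree is trained.
blag = BalancedBaggingClassifier(n_estimators=50, random_state=42)
blag.fit(X_train, y_train)
print(classification_report(y_test, blag.predict(X_test)))
```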
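And here is a minimal sketch of the recommended order of operations, splitting first and then scoring with AUC, F1 on the minority class, and Cohen's kappa. The logistic regression model and the choice of SMOTE as the sampler are illustrative assumptions.

```python
# A minimal sketch: split first, resample only the training set, then
# evaluate with metrics that are not fooled by class imbalance.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, f1_score, cohen_kappa_score
from imblearn.over_sampling import SMOTE

X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.9, 0.1], random_state=42)

# 1) Train/test split BEFORE any resampling, so no synthetic point
#    can leak information from the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

# 2) Oversample the training set only.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

# 3) Fit, then score on the untouched test set.
clf = LogisticRegression(max_iter=1000).fit(X_res, y_res)
y_pred = clf.predict(X_test)
y_prob = clf.predict_proba(X_test)[:, 1]

print("ROC AUC      :", roc_auc_score(y_test, y_prob))
print("F1 (minority):", f1_score(y_test, y_pred, pos_label=1))
print("Cohen's kappa:", cohen_kappa_score(y_test, y_pred))
```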
Finally, make sure you do not use accuracy, as it is easily fooled when working with unbalanced data.

To recap, in this section we discussed additional approaches to dealing with unbalanced data, starting with the class-weights hyperparameter available for many of our models, which avoids messing too much with re-sampling. We then got into random and synthetic oversampling techniques such as SMOTE and ADASYN, both of which rely on generating synthetic examples based on the k-nearest neighbors algorithm. We then discussed undersampling techniques such as NearMiss, Tomek links, and Edited Nearest Neighbors, and we also briefly discussed using balanced bagging, or blagging, to address unbalanced class data.

Now, in regard to oversampling and undersampling, note that each of the specific techniques has its own advantages and best practices. But it will often be difficult to tell which oversampling or undersampling technique to use without cross-validation, since we can almost never visualize the data as clearly as we did in these examples. (A sketch of comparing samplers with cross-validation follows at the end of this passage.) When deciding between under- and oversampling in general, remember what we discussed earlier: undersampling will probably lead to somewhat higher recall for the minority class at the cost of precision, whereas oversampling keeps all the values from our majority class and thus will have somewhat lower recall than undersampling, but better precision on those predictions.

That closes out our video on unbalanced classes, as well as our lecture introducing many of the machine learning models available and their best practices. I would note here that, as a practicing data scientist, having an intuitive feel for how each of these models works will go a very long way toward ensuring you can choose and tune your models in a timely and effective fashion. Thank you.
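As an addendum to the cross-validation point above, here is a minimal sketch of comparing a few samplers with cross-validation. The imblearn Pipeline applies each sampler only to the training folds, so the validation folds stay untouched; the samplers, model, and F1 scoring chosen here are illustrative assumptions.

```python
# A minimal sketch of choosing a resampling strategy via cross-validation.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from imblearn.pipeline import make_pipeline
from imblearn.over_sampling import SMOTE, ADASYN
from imblearn.under_sampling import NearMiss, TomekLinks

X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.9, 0.1], random_state=42)

samplers = {
    "SMOTE": SMOTE(random_state=42),
    "ADASYN": ADASYN(random_state=42),
    "NearMiss": NearMiss(),
    "Tomek links": TomekLinks(),
}

# Resampling happens inside the pipeline, so it is applied only to the
# training portion of each fold, avoiding leakage into validation data.
for name, sampler in samplers.items():
    pipe = make_pipeline(sampler, LogisticRegression(max_iter=1000))
    scores = cross_val_score(pipe, X, y, cv=5, scoring="f1")
    print(f"{name:12s} mean F1 = {scores.mean():.3f}")
```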