Now, your machine learning system will make mistakes. It's important to understand what these errors look like and how they might affect the user experience driven by the output of your machine learning model. In this module, we'll discuss some of the ways you can evaluate inclusion as you develop and test your machine learning model. One of the key tools for understanding inclusion, and for improving it across different subgroups within your data, is the confusion matrix.

While you may be familiar with evaluating your model over the entire dataset, it's also important to evaluate your model over subgroups. So, instead of just looking at how your model performs over the whole dataset, we'll break the performance down by the subgroups you want to improve on. For example, suppose you're doing face detection. Essentially, you're building a machine learning model to say whether or not there is a human face in a photograph. This is not necessarily an easy problem. Your subgroups might be men, women, adults, children, people with hair, people who are bald. You want to look at the performance of your model across all of these subgroups to identify areas of improvement.

A common way we evaluate performance in machine learning is with a confusion matrix. Now, there are other methods for other types of problems, but for the purposes of this module, we'll use the confusion matrix to explain these points. The idea is to use the confusion matrix to look at inclusion, and you do this by creating a confusion matrix for every subgroup whose performance you're interested in measuring. In the confusion matrix, you compare your labels, which may or may not reflect your ground truth, because sometimes we don't have access to the ground truth, against your model's predictions. From here, we look at the positives and negatives. In our labels, some things are marked as present, and we call those positive labels; some things are marked as absent, and we call those negative labels. On the machine learning side, we have positive predictions about what is there and negative predictions about what is not there. We compare these in the confusion matrix to understand the decisions the machine learning system is making.

We start with the true positives, which is when the label says something is there and the model predicts it. So, in the case of face detection, a true positive is when the model correctly predicts that there is a face in the image. Now, when the label says something exists and the model doesn't predict it, that's a false negative. So, using the same face detection example, the model does not predict a face in the image when, in fact, the label says there is one. When the label says it doesn't exist and your model also doesn't predict it, that's what's called a true negative. Basically, what that means, using this face detection example, is that the model not predicting a face in the image is correct because there is also no face in the label.
And lastly, there is the false positive case, where the label says there is no face but the machine learning model predicts that there should be one. So, in this instance, perhaps there is a statue in the image and the model falsely identifies that statue as having a face. But really, what I want you to focus on here are the false negatives and false positives. Remember, false negatives are the things you incorrectly do not predict, things you exclude when they should have been included, and false positives are the things you incorrectly predict, things you include that aren't actually there in the label and should have been excluded. In other domains, false positives are often referred to as type I errors and false negatives as type II errors. But the cool thing about this basic breakdown into four different kinds of matches to the label is that you can start to calculate a ton of different metrics that can be used to gauge how inclusive your model is.
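To make this concrete, here is a minimal sketch of how you might tally the four confusion-matrix quadrants separately for each subgroup. The labels, predictions, and subgroup names below are hypothetical, purely for illustration; in practice they would come from your evaluation data.

```python
from collections import defaultdict

# Hypothetical evaluation data: 1 = face present, 0 = no face.
labels      = [1, 0, 1, 1, 0, 1, 0, 1]
predictions = [1, 0, 0, 1, 1, 1, 0, 0]
subgroups   = ["adult", "adult", "child", "child",
               "adult", "child", "adult", "child"]

# One set of confusion-matrix counts per subgroup.
counts = defaultdict(lambda: {"TP": 0, "FP": 0, "TN": 0, "FN": 0})

for y, y_hat, group in zip(labels, predictions, subgroups):
    if y == 1 and y_hat == 1:
        counts[group]["TP"] += 1   # face in label, face predicted
    elif y == 0 and y_hat == 1:
        counts[group]["FP"] += 1   # no face in label, but face predicted
    elif y == 0 and y_hat == 0:
        counts[group]["TN"] += 1   # no face in label, none predicted
    else:
        counts[group]["FN"] += 1   # face in label, but none predicted

for group, c in counts.items():
    print(group, c)
```

Building a separate matrix for each subgroup is what lets you see, for example, whether faces of children are missed more often than faces of adults.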
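And once you have those four counts for each subgroup, you can derive metrics from them and compare them across subgroups. This is a sketch under the same assumptions, with made-up counts standing in for real tallies; the metric formulas themselves (false positive rate, false negative rate, precision, recall) are the standard ones.

```python
def rates(tp, fp, tn, fn):
    """A few common metrics computed from one subgroup's confusion-matrix counts."""
    return {
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
        "precision":           tp / (tp + fp) if (tp + fp) else 0.0,
        "recall":              tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Hypothetical per-subgroup counts for the face-detection example.
subgroup_counts = {
    "adult": {"TP": 90, "FP": 5,  "TN": 80, "FN": 10},
    "child": {"TP": 60, "FP": 20, "TN": 70, "FN": 35},
}

for group, c in subgroup_counts.items():
    metrics = rates(c["TP"], c["FP"], c["TN"], c["FN"])
    print(group, {name: round(value, 2) for name, value in metrics.items()})
```

A large gap between subgroups, such as the much higher false negative rate for the child group in this made-up data, is the kind of signal that points to whom the model is excluding.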