So, we've covered some of the ways in which you can make your machine learning model more inclusive through evaluation metrics. But getting the best results out of a model requires that you truly understand your data. The challenge here, though, is that datasets can sometimes contain hundreds of millions of data points, each consisting of hundreds or even thousands of features, making it nearly impossible to understand an entire dataset intuitively. The key is to use visualizations that help unlock nuances and insights in large datasets. In this section, I'll talk about an open-source data visualization tool called Facets. Facets was developed at Google and is one of the ways in which you can make machine learning models more inclusive. There are two parts to Facets: Overview and Dive. In this slide, you're seeing a screenshot of Facets Overview, which automatically gives you a quick understanding of the distribution of values across the features of a dataset. The example you're seeing in this slide comes from the UCI Census data. The data was extracted from the 1994 Census Bureau database, which contains anonymized information about the United States' population. The information in this dataset includes demographic and employment-related variables such as age and salary. This dataset is hosted in the UCI Machine Learning Repository and is often used as a prediction task: determining whether a person is likely to earn $50,000 or more annually. Multiple datasets, such as a training set and a test set, can be compared in the same visualization. With Facets, common data issues that can hamper machine learning are pushed to the forefront, such as unexpected feature values, features with high percentages of missing values, features with unbalanced distributions, or distribution skew between datasets.
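To make the kinds of checks Facets Overview automates more concrete, here is a minimal from-scratch sketch of two of the per-feature statistics it surfaces: the percentage of missing values and the percentage of zeros. This is not the Facets implementation, and the tiny dataset and feature names are illustrative stand-ins, not the real UCI schema.

```python
# Minimal sketch (NOT the Facets implementation) of two per-feature
# statistics that Facets Overview surfaces: % missing and % zero.

def overview_stats(rows, features):
    """Compute the percentage of missing and zero values per feature."""
    report = {}
    for f in features:
        values = [r.get(f) for r in rows]
        n = len(values)
        missing = sum(v is None for v in values)
        zeros = sum(v == 0 for v in values)
        report[f] = {
            "pct_missing": 100.0 * missing / n,
            "pct_zero": 100.0 * zeros / n,
        }
    return report

# Illustrative rows: capital gain is mostly zero, echoing the census
# data pattern that Facets flags in red.
train = [
    {"capital_gain": 0, "age": 25},
    {"capital_gain": 0, "age": 38},
    {"capital_gain": 0, "age": None},
    {"capital_gain": 5000, "age": 52},
]

for feature, stats in overview_stats(train, ["capital_gain", "age"]).items():
    print(feature, stats)
```

A real tool would also compute distribution distances between splits, but even this simple report would flag `capital_gain` as zero-heavy and `age` as having missing values.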
Using the same screenshot from the previous slide, what you're seeing here are two numeric features of the UCI Census dataset: Capital Gain and Capital Loss. The features are sorted by nonuniformity, with the most nonuniformly distributed feature at the top. Numbers in red indicate possible trouble spots; in this case, numeric features with a high percentage of values set to zero. The histogram at the right lets you compare the distributions between the training data, which is in blue, and the test data, which is in orange. Facets Overview can also visualize categorical features. In this example, what you're seeing is a breakdown of the target feature, which is the label representing whether or not the person earned an annual salary of more than $50,000. In particular, we're looking at all the instances where the annual salary is less than or equal to $50,000. But do you notice something suspicious about this target feature? Notice that the label values differ between the training and the test datasets due to a trailing period in the test set. Facets Overview even went so far as to sort these discrepancies by distribution distance, with the feature showing the biggest skew between the training data, in blue, and the test data, in orange, at the top. A label mismatch like this would cause a model trained and tested on the data to be evaluated incorrectly. Now, shifting over to Facets Dive: as you can see in this slide, it provides an easy-to-customize, intuitive interface for exploring the relationships between data points across the different features of a dataset. With Facets Dive, you control the position, color, and visual representation of each data point based on its feature values. More specifically, in this example, Facets Dive is displaying all the data points in the UCI Census test dataset.
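The trailing-period mismatch described above can also be caught programmatically by comparing the sets of label values across splits. Here is a hedged sketch of that check; the literal label strings match the well-known UCI Census Income files, but the `normalize` helper is an illustrative fix of my own, not part of Facets.

```python
# Sketch: detecting the trailing-period label mismatch between the
# train and test splits of the UCI Census Income data.
train_labels = {"<=50K", ">50K"}
test_labels = {"<=50K.", ">50K."}  # note the trailing periods

# Symmetric difference: label values that appear in one split but not
# the other. A non-empty result means the splits disagree.
mismatch = train_labels ^ test_labels
if mismatch:
    print("Label values disagree across splits:", sorted(mismatch))

def normalize(label: str) -> str:
    """Illustrative cleanup: strip trailing periods and whitespace."""
    return label.strip().rstrip(".")

# After normalization the two splits agree.
print({normalize(l) for l in test_labels} == train_labels)
```

Without a cleanup step like this, a model trained on `<=50K` and evaluated against `<=50K.` would never match a single test label.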
The animation shows a user coloring the data points by one feature, relationship, faceting in one dimension by a continuous feature, age, and then faceting in another dimension by a discrete feature, marital status. In Facets Dive, if the data points have images associated with them, the images can be used as the visual representation; in other words, it isn't limited only to categorical or numerical features. The example you see in this image comes from a research image dataset containing many of the world's objects and animals, used to train an image classifier. The ground-truth labels are arranged by row, and the predicted labels are arranged by column. This configuration produces a confusion matrix view, allowing us to drill into particular kinds of misclassifications. In this particular example, the machine learning model incorrectly labels a small percentage of true cats as frogs. Can you spot the frog cat in this image? The interesting thing we find by putting the real images in the confusion matrix using Facets Dive is that one of these "true cats" that the model predicted to be a frog is, on visual inspection, actually a frog. With Facets Dive, we can determine that this one misclassification wasn't a true misclassification by the model. Instead, it was an incorrectly labeled data point in the dataset. So, the hope here is that tools such as Facets can help you discover new and interesting things about your data, leading you to create more accurate and inclusive machine learning models.
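The confusion-matrix arrangement just described, rows for ground-truth labels and columns for predicted labels, can be sketched in a few lines. This is a minimal illustration with made-up label lists, not the Facets Dive rendering; the point is that off-diagonal cells are exactly where you want to inspect individual examples, since they hold either model errors or, as in the frog/cat case, labeling errors in the dataset itself.

```python
# Minimal sketch of a confusion-matrix view: count (truth, prediction)
# pairs and flag the off-diagonal cells for inspection. Labels are
# made up for illustration.
from collections import Counter

truth = ["cat", "cat", "cat", "frog", "frog", "cat"]
pred = ["cat", "cat", "frog", "frog", "frog", "cat"]

confusion = Counter(zip(truth, pred))

for (t, p), n in sorted(confusion.items()):
    # Off-diagonal cells are candidate misclassifications -- or
    # candidate labeling errors in the dataset itself.
    marker = "  <-- inspect" if t != p else ""
    print(f"true={t:4s} pred={p:4s} count={n}{marker}")
```

Facets Dive goes one step further by placing the actual images in each cell, which is what made it possible to see that the "cat" predicted as a frog really was a frog.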