In the final part of this week's material, you'll learn about an evaluation concept you can use for GANs: precision and recall in the context of GAN evaluation. This applies to all generative models, and you'll gain some intuition for how these quantities relate to fidelity and diversity. Several evaluation metrics have emerged recently, and a few of them use a notion of precision and recall that I find particularly noteworthy and interesting, especially for breaking down fidelity and diversity into more familiar machine learning terms. If you happen to be familiar with precision and recall for classifiers, this is a cool extension to generative models such as GANs.

Imagine a space of all reals: the true distribution Pr, where P stands for probability distribution and r stands for reals. Over here in red is Pg, everything the generator can generate: the fake distribution. The best thing that can happen for your GAN is for Pg to completely overlap Pr, with no Venn-diagram business or concentric circles. You don't even want Pg to be a subset of Pr; you want your red generated distribution Pg to be exactly your real distribution Pr.

There is an important intersection area here, and both precision and recall are defined in terms of it. First, precision: focus on the points within the dotted line. All of these points are generated samples. The filled-in ones are generated samples that overlap with the real distribution (the blue region), and the unfilled ones are generated samples that don't. Precision measures this intersection area: the fake examples that intersect with the reals, divided by the whole dotted-line area of everything the generator can generate.
Intuitively, the overlap is where the generated images look real, because they fall in the real space. You divide that by the entire generated space, which includes not only those realistic samples but also some that look very much fake, like this tennis-ball dog or, I'm not even sure what that is, some horse thing; just pure gunk. You don't want this extra gunk: the more stuff you have out here, the worse your precision, because precision is these overlap points divided by all fakes. Your precision is essentially the fakes that look real over all of your fakes, so you want your denominator to be as close as possible to that overlap. Perfect precision means that everything you generate looks real, but it doesn't mean you cover the entire real distribution; you can still be a subset inside it. Precision relates to fidelity because it captures the extra fake stuff that gets modeled but is unnecessary: the better your precision, the higher the quality of your generated samples. The truncation trick can help with precision by reducing that weird stuff, but it can also hurt recall, because it can shrink the whole generated region into something much smaller.

On that topic, recall, on the other hand, is how much overlap there is divided by all of the real samples: essentially the mirror of precision. That accounts for the stuff the generator is missing and isn't able to model, like these points out here, which are real samples the generator perhaps was not able to model. They look like pretty weird real samples too, with this donkey looking in and this car full of watermelons. Recall is basically a measure of how well the generator can model all the reals, and it ignores all that extra gunk we saw before; that gunk isn't used in the measure of recall. Recall relates to diversity, because it shows whether the generator models all the variation in the reals or not.
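To make these two ratios concrete, here's a minimal sketch in the spirit of nearest-neighbor-based precision/recall metrics for generative models: each distribution's support is approximated by balls around its sample points, with each ball's radius set by the distance to the point's k-th nearest neighbor. The function names, the use of raw 2-D points instead of deep feature embeddings, and the brute-force distance computation are my own illustration, not an official metric implementation.

```python
import numpy as np

def pairwise(a, b):
    # Euclidean distance between every row of a and every row of b
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

def manifold_radii(points, k=3):
    # each point's "ball" radius: distance to its k-th nearest neighbor
    d = np.sort(pairwise(points, points), axis=1)
    return d[:, k]  # column 0 is the zero self-distance

def precision(reals, fakes, k=3):
    # fraction of fakes landing inside some real point's ball:
    # "fakes that look real, over all fakes"
    in_real = pairwise(fakes, reals) <= manifold_radii(reals, k)[None, :]
    return np.mean(in_real.any(axis=1))

def recall(reals, fakes, k=3):
    # the mirror: fraction of reals landing inside some fake point's ball:
    # "reals the generator covers, over all reals"
    in_fake = pairwise(reals, fakes) <= manifold_radii(fakes, k)[None, :]
    return np.mean(in_fake.any(axis=1))
```

If the fake samples sit exactly on the reals, both scores are 1; fakes far away from every real drive precision to 0, and reals far from every fake drive recall to 0.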
Just to compare with precision: recall is the overlap of reals over all reals, and it tries to get a sense of how well the generator is able to model all possible real images. Models I've seen tend to be pretty good at recall, meaning that at some value of the noise vector z, they can generate every single image in the real dataset and perhaps more: the whole real distribution of all possible dogs or faces or whatever they're trying to model, at the very least everything in the real dataset. But typically these models, especially large ones, also produce a lot of extra gunk, coming from parameters that didn't get much feedback from the discriminator. There are just too many parameters in the model, so some parts of it will inevitably produce gunk once it has already modeled all the reals, and nothing cuts that back. What that looks like is Pg being a superset of the reals. State-of-the-art models can often be bad at precision as opposed to recall, and this is why the truncation trick comes in handy for downstream applications: it weeds out all that extra gunk, which is everything outside of the dotted line.

In summary, you now know about precision and recall in the context of generative models, and you can relate precision to fidelity and recall to diversity, to some extent, in your images. Models still tend to be better at recall, at modeling the distribution of real images, because of the sheer number of parameters in our models now, but you can apply the truncation trick in your downstream applications to improve their precision as well and weed out that gunk. You're now at the end of this week's lecture material and ready to go on to the coding assignment. Good work, and go have fun figuring out which one of your GANs or GAN checkpoints is best. Hope to see you build the next state of the art!
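For reference, the truncation trick mentioned above can be sketched as resampling the noise vector z until every coordinate falls within a threshold, trading some diversity (recall) for fidelity (precision). This is a minimal illustration; the function name and the threshold value of 0.7 are my own choices, and real implementations often instead use a truncated-normal sampler or interpolate toward the mean latent.

```python
import numpy as np

def truncated_noise(n, dim, threshold=0.7, seed=0):
    """Sample z ~ N(0, I), redrawing any coordinate outside [-threshold, threshold]."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n, dim))
    while True:
        out_of_bounds = np.abs(z) > threshold
        if not out_of_bounds.any():
            return z
        # redraw only the offending coordinates and re-check
        z[out_of_bounds] = rng.normal(size=out_of_bounds.sum())
```

Feeding these clipped-tail z vectors to the generator avoids the low-density regions of the latent space where the weird, gunky samples tend to live, at the cost of never reaching some of the rarer real modes.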