In this lecture you'll learn about evaluating GANs, starting with understanding a couple of criteria or properties you want your GAN to have. So first you'll see why it's challenging to evaluate a GAN, and then you'll learn about two of the most important properties, fidelity and diversity. So evaluating GANs is similar to evaluating other models in that you typically take a model checkpoint, weights frozen at a certain point, and compare its output against some metrics. And these metrics could be used across models, not just your own, for an independent evaluation. But evaluating GANs is a particularly challenging task and an active area of research that's seen a lot of progress recently. So to illustrate, first consider supervised learning with a classifier. There are test examples you can pass in which have labels for how your images should be classified. And based on these labels, you can evaluate whether your model is right or wrong, so you have a sense of correctness here. And usually there's a held-out test set on which you can evaluate how right or wrong your model is. And this test set can be used to evaluate your model or other people's models. However, with a GAN you pass in some random noise and you get this fake image, but there's no concrete way of telling how realistic these generated images are. You don't know the exact pixels it's supposed to generate, so you can't say this pixel right here is slightly off, or that one right there is supposed to be green. And so there's no clear goal for what pixels you're supposed to generate given this noise vector going in. So this model is more like an advanced artist learning how to paint masterpieces, as opposed to learning exact brush strokes in a known painting, which is the kind of correctness you get with the classifier, where there's a clear right and wrong. Additionally, the discriminator in this GAN, which classifies real versus fake, doesn't ever reach perfection.
And it often overfits to discriminating real versus fake images for its particular generator. So you might think, wow, because it's classifying between real and fake images, maybe it could be useful for this. But no, it's overfitting to your generator, so it will likely think a lot of images from your particular generator, even though they look realistic, are in fact fake, because it can pick up on certain qualities. And those qualities could be small, sometimes even imperceptible things that the generator is producing. And because of this, there are no perfect or universal discriminators that can look at two generators and say for sure that this one is better than the other. So then, how do you evaluate GANs? Well, you can start by first defining the properties you want your GAN to have. And one primary property, one primary objective, is certainly fidelity, or the quality of your generated images and how realistic they look. And you can think of quality overall and fidelity overall as the realism factor, but also the crispness of the image. For example, a blurry face could still look realistic, but it wouldn't be high fidelity per se. So the fidelity of this picture of a person is pretty good here. But just generating a single image is not what you want from your generator. A good generator also produces a good variety of images, and it wouldn't be very useful to just produce this one image here. So the second property you typically care about is diversity. What's the range, what's the variety of images this generator is able to produce? Is it able to produce the diversity that's inherent in the training data set or the desired class you're modeling? That is, of all dogs in this case, can it model all types of dogs, all breeds of dogs, dogs in different places, in different positions? Or is it just going to generate a single, very realistic image, like with this face over here?
And so that's why when you're evaluating a GAN, it's important to consider not only fidelity, that quality of images, which obviously is very important, but also this diversity aspect of what's the range of images it can generate, to get an idea of whether your GAN covers a good variety of what's expected and real. And so, all of this can be pretty tough, because how do you really evaluate or quantify whether something has sufficient diversity when you don't necessarily want it to memorize the training data set? In summary, we have two properties, fidelity and diversity, and they're on two axes. And sometimes you can think of them as trading one off for the other. So now diving a bit deeper into each of them. So for fidelity, you can consider how realistic the images from your GAN are looking. So think of it this way: for each fake sample, how different is it from its nearest real sample? You can also think about it more generally as, how far are 100 fakes from 100 reals, to get a better representation. And you don't want a one-hit-wonder GAN, and you don't want to miss the good and bad, the high- and low-fidelity samples, because those could differentiate this GAN from another GAN. You want a GAN that can consistently give you good results. And so there are a few ways you can do this comparison, and you'll learn about some of them later in this course. On diversity, you want the generated images to cover the whole diversity, the variety of the real distribution. Meaning if a GAN is only generating the same single image, even if it's very realistic, that's not a well-performing model. You might remember that this is similar to what happens when there's mode collapse. So you want a GAN that can generate a variety of different images, such as these 8s written in different styles. And you can also get a sense of this by comparing the spread of, say, 100 fake examples to the spread of 100 real ones. So when evaluating GANs, fidelity and diversity are both criteria you care about.
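To make those two comparisons a bit more concrete, here's a minimal sketch in plain Python. It's only an illustration and not one of the metrics covered later in the course. It assumes each image has already been turned into a feature vector, and it uses random toy vectors instead of real images. The function names (`mean_nearest_real_distance`, `spread`) and the two toy generators ("collapsed" and "varied") are made up for this example.

```python
import math
import random
import statistics


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_nearest_real_distance(fakes, reals):
    """Fidelity proxy: average distance from each fake to its nearest real."""
    return statistics.fmean([min(distance(f, r) for r in reals) for f in fakes])


def spread(samples):
    """Diversity proxy: average pairwise distance within one set of samples."""
    pairs = [
        distance(samples[i], samples[j])
        for i in range(len(samples))
        for j in range(i + 1, len(samples))
    ]
    return statistics.fmean(pairs)


rng = random.Random(0)

# Assumption: toy 8-dimensional vectors stand in for features of 100 real images.
reals = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(100)]

# Hypothetical "collapsed" generator: 100 near-copies of a single real sample.
collapsed = [[v + rng.gauss(0, 0.01) for v in reals[0]] for _ in range(100)]

# Hypothetical "varied" generator: samples drawn from the same distribution as the reals.
varied = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(100)]

for name, fakes in [("collapsed", collapsed), ("varied", varied)]:
    nearest = mean_nearest_real_distance(fakes, reals)
    ratio = spread(fakes) / spread(reals)
    print(f"{name}: nearest-real distance {nearest:.3f}, spread ratio {ratio:.3f}")
```

In this toy setup the collapsed generator sits very close to a real sample, so it looks good on the fidelity proxy, but its spread is a tiny fraction of the real spread. That's the one-realistic-image situation described above, and it's why you'd want to look at both numbers together.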
By capturing fidelity and diversity, you can get a pretty good notion of how well your generator is generating fake images. And this could be just looking at how close your fake images are to your real images. So in summary, it's challenging to evaluate a GAN because there is no global discriminator giving a ground truth that would enable fair comparisons across GANs. And to evaluate a GAN, you want to consider fidelity, or the quality of images, as well as the diversity, or variety, of those images coming out of your GAN. And with these properties, these criteria, in mind, you'll learn ways you can evaluate your GAN in the following videos.