In this video, you'll learn about another way to calculate the distance between reals and fakes using the Inception-v3 model. This method was developed before FID and used to be widely used, but it has now been largely replaced by FID and is by and large more relevant to conditional generation. That said, the Inception Score, which is what it's called, is reported in many papers, so it's good to know what it's measuring and how it differs from FID. Remember the Inception-v3 classifier pre-trained on ImageNet that was used for feature extraction? The Inception Score also uses this model. However, unlike FID, you use the classifier as is, not as a feature extractor. You keep the classifier intact and don't use any of its intermediate outputs. First, you input a generated image and see what the Inception-v3 classifier says it's classified as. For example, maybe this image looks like it's 0.6 dog, 0.3 cat, and 0.1 bird. Intuitively, a high value on one class, for example, dog here, with low values on the others suggests that this image clearly resembles one class over the others. That is, it is arguably high fidelity. You can observe this value for many samples to get a sense of how well, overall, this GAN is able to produce clear objects, based of course on the classifier's judgment. If this dog were 0.9, then this would do even better; it would have an even clearer dog, arguably. Now, because you want your GAN to also produce diverse outputs, then looking across many samples, you would expect that many different classes would be generated. You would expect across all your samples, trained on all of these classes, that perhaps dog appears and cat appears and bird also appears, and it's not just dog each time. Again, this can be observed by using the classifier to see what this distribution is. You'd want this diversity probability distribution over all the different classes to be nicely spread out and not spiky on a single class. 
More formally, the first measure, which looks at the distribution of classes given an image, is measuring the distribution p of y given x: given your image x, what is the distribution over those classes according to the classifier? With images that have clear, high-fidelity generated objects, that distribution should have high peaks, with high probability on a few select classes, or even just a single class, and overall very low probability on the remaining classes. What this is trying to approximate or signal is that there are clear objects being picked out in the generated images by your classifier. This is known as having low entropy, because the probability distribution is clustered at those select points and not scattered all about. Entropy in a sense is randomness, and this fidelity measurement corresponds to low entropy. Now, on the diversity front, you want lots of classes to be generated across your many samples. The distribution of classes across your samples should be high entropy, meaning it's not concentrated on one or a couple of particular classes, which would be, if you put your thinking cap on, reminiscent of mode collapse, because that means you're only generating one type of thing and nothing else. This distribution is also known as the marginal label distribution, or p of y. This is the distribution over all of your labels, across your entire dataset or across a large sample. This might be confusing because you have high down here and low up here, but just keep in mind that these two are supposed to be very different; that is what's key here. With this notion of fidelity being low entropy and diversity being high entropy, you can combine their measurements into a single score, which is convenient because then you can score your model and say you're getting better or worse, by getting a distance between that diversity, a high value, and fidelity, a low value. 
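To make the low-entropy versus high-entropy intuition concrete, here is a minimal NumPy sketch. The three-class distributions are made up for illustration: a spiky "clear dog" conditional distribution has low Shannon entropy, while a uniform distribution, the kind you want for the marginal label distribution, has the maximum possible entropy.

```python
import numpy as np

def entropy(p):
    # Shannon entropy in nats; zero-probability entries contribute nothing
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-np.sum(p[nz] * np.log(p[nz])))

spiky = [0.9, 0.05, 0.05]       # clear object: low entropy (fidelity)
uniform = [1/3, 1/3, 1/3]       # spread across classes: high entropy (diversity)

print(entropy(spiky))    # ≈ 0.394
print(entropy(uniform))  # ≈ 1.099 (= ln 3, the maximum for 3 classes)
```

The uniform distribution always maximizes entropy for a given number of classes, which is why a well-spread marginal label distribution reads as "diverse" under this metric.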
The Inception Score specifically uses something called the KL divergence from p of y to p of y given x. KL divergence essentially tries to measure how much information you gain about p of y given x when starting from just p of y. If you only have information on p of y, and let's say p of y is really uniform over your different classes, just a uniform distribution, then what would you guess for your p of y given x? That would be very difficult. Meanwhile, if your p of y is really spiky, then what would you guess for your p of y given x? You'd probably guess it was spiky around that same area. If p of y essentially isn't very informative, like in this high-entropy case, because it has a lot of randomness and is even across the classes, then you can't get much information on p of y given x; you've no idea what class it's from. However, if p of y is full of peaks, or has just one peak right here, then you probably have a clue of what p of y given x might be: a good guess, at least, is probably that peak there. At the same time, if p of y given x is full of peaks, it's harder to guess at that exact distribution than if you're just guessing it's even across all of the classes; it's easier to guess for this one that it's even across all the classes. Now, that's some very basic intuition, and no worries if you don't completely understand KL divergence. For the purposes of understanding the Inception Score, you can think of KL divergence here as approximately measuring how different the conditional label distribution, for fidelity, is from the marginal label distribution, for diversity, which is their relative entropy from one another. There's a reason it's not called KL distance: KL divergence isn't equal in the opposite direction, so with these two terms flipped, it's not actually an equal value, but it's a very close notion to that sense. As a result, you'd expect to see a high KL divergence when your distributions are far apart, which is what you want. 
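As a rough sketch of these points, again assuming a made-up three-class setup rather than the real 1,000 ImageNet classes, you can compute the KL divergence between a spiky conditional distribution and a uniform marginal, in both directions, and see that the value is large when the distributions are far apart and that it is not symmetric:

```python
import numpy as np

def kl_divergence(p, q):
    # KL(p || q) = sum_i p_i * log(p_i / q_i), in nats;
    # zero-probability entries of p contribute nothing
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / q[nz])))

p_y_given_x = [0.90, 0.05, 0.05]   # spiky conditional: a clear "dog"
p_y         = [1/3, 1/3, 1/3]      # uniform marginal: diverse labels

print(kl_divergence(p_y_given_x, p_y))  # large: distributions far apart
print(kl_divergence(p_y, p_y_given_x))  # a different value: KL is not symmetric
```

That asymmetry is exactly why it's a "divergence" rather than a "distance": flipping the two arguments changes the result.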
You want one to be high and the other to be low. In other words, basically, when fake images each have a distinct label and there's also a diverse range of labels among those fakes. Notationally, it expands to this out here, and feel free to dive more into KL divergence outside of this course; it's a generally very useful concept for machine learning and is a core information-theoretic component behind evaluating models against ground truth. But essentially, it does evaluate to looking at the conditional distribution here and also the marginal distribution here. The Inception Score then sums over all the classes and averages over all the images. You see this p sub g, which means samples from your generator, and then at the end there is an exponent. This is the same as doing e to this huge term up here, and this exponent isn't really for calculating this value, or important for KL divergence in any way. It's actually just to give a nice human-readable score, like 100 for the Inception Score and not 0.000001, which is not as useful and a very possible output of this inner term. Mathematically, the lowest possible value of that inner term is 0 and the highest is infinity, but after the exponent, the lowest possible Inception Score will be 1 and the highest or best possible score will be the number of classes present, or 1,000 in this case, because the Inception-v3 classifier is trained on ImageNet with 1,000 classes. Now, the higher the score the better here for the Inception Score, and that means the entropy of the conditional probability distribution is low, finding clear objects and features, while the entropy of the marginal probability distribution is high, finding a diverse set of features. So typically when this score is low, or bad, it's because both distributions either have low entropy, having high peaks, so no diversity, or both distributions have high entropy, with no clear objects found. 
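Putting the pieces together, here is a minimal sketch of the Inception Score computed from a matrix of classifier softmax outputs. The tiny 4-class matrices are made up to stand in for real Inception-v3 predictions, and `eps` is just a small constant to avoid taking the log of zero:

```python
import numpy as np

def inception_score(preds, eps=1e-12):
    """preds: (N, C) array of softmax outputs, one row p(y|x) per generated image."""
    preds = np.asarray(preds, dtype=float)
    p_y = preds.mean(axis=0)  # marginal label distribution p(y)
    # KL(p(y|x) || p(y)) per image: sum over classes
    kl = np.sum(preds * (np.log(preds + eps) - np.log(p_y + eps)), axis=1)
    # average over images, then exponentiate for a human-readable score in [1, C]
    return float(np.exp(kl.mean()))

# Best case: each image lands squarely on one class, and classes are evenly covered
best = np.eye(4)                 # 4 images, 4 classes, one-hot predictions
# Worst case: every image is uniform over all classes (no clear object, no spread)
worst = np.full((4, 4), 0.25)

print(inception_score(best))   # ≈ 4.0, the number of classes
print(inception_score(worst))  # ≈ 1.0, the minimum
```

Note how the structure mirrors the formula: the sum inside is over classes, the mean is over generated images, and the final `exp` only rescales the result into a readable range.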
However, as you might have guessed, there are a lot of shortcomings with using the Inception Score. First, the metric is easy to exploit or game on the diversity front, because if your GAN generates one single memorized real image for each classifier class, so 1,000 images, one for each of the ImageNet classes, then it actually gets a perfect score. But ideally, your GAN can do better than generating one image per class. I'd like two golden retrievers, please. This is definitely a form of mode collapse that can go undetected by the Inception Score. Another big issue is that the Inception Score only looks at the generated samples. You might have noticed that I never mentioned the real samples here; it doesn't compare the generated samples to the real images. These proxy statistics might be a bit far off and idealistic, and the score is also dependent on the classifier's task and abilities. Finally, since the classifier is trained on ImageNet, the values and scores might be very imprecise in a way that's similar to FID's shortcoming. For example, what if the generated images have a lot of objects in them? You'd expect high entropy across the different classes in the classifier outputs, such as for a bedroom or office full of different items, because the classifier picks up on many different classes in that one image. Or what if there are a lot of features that aren't detected by the classifier because they're not relevant, such as a generator that only generates profiles of faces when the classifier is trained mostly on dog breeds and other objects, or even a generator that generates human faces with features in the wrong place, but that looks real to the classifier because components of the face are clearly found? So spatial relationships in particular can pose a problem for the Inception Score. It only really makes sense when the training dataset is close to ImageNet, and this is FID's shortcoming too. 
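You can see the diversity exploit numerically. In this toy sketch, a collapsed "generator" that has memorized exactly one image per class produces the same one-hot classifier predictions as a genuinely diverse generator, so it still receives the maximum possible score of 1,000 (the one-hot rows here are idealized stand-ins for real classifier outputs):

```python
import numpy as np

def inception_score(preds, eps=1e-12):
    # Same sketch as before: exp of the mean per-image KL(p(y|x) || p(y))
    preds = np.asarray(preds, dtype=float)
    p_y = preds.mean(axis=0)
    kl = np.sum(preds * (np.log(preds + eps) - np.log(p_y + eps)), axis=1)
    return float(np.exp(kl.mean()))

# A collapsed "GAN" that memorized one image per ImageNet class and repeats them:
# its classifier predictions are just 1,000 one-hot rows, one per class.
one_per_class = np.eye(1000)
print(inception_score(one_per_class))  # ≈ 1000: a "perfect" score despite mode collapse
```

Because the score only sees the label distributions, this kind of memorization is indistinguishable from true diversity, which is part of why FID, which compares against real samples, is preferred.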
Despite all this, the Inception Score still remains one of the most widely used metrics for GANs after FID, and it is relevant mainly to conditional GANs that are expected to produce clear classes. Now you know how fidelity and diversity translate into low or high entropy, and how to find the divergence between both to arrive at the Inception Score. This score had been pretty popular, but it has now largely been replaced by Fréchet Inception Distance. Still, you'll see it in many, many different papers, so it's important that you know how it works.