In this video, you'll learn how to extract features from your images, and these features are what you'll use when computing the feature distance between those images. Specifically, you'll learn how to extract features using classifiers that have been trained on so many images that they are generally regarded as being able to model natural images, or photographs. These many images typically come from the famous large data set called ImageNet, which I'll also touch on in this video.

In order to compute the feature distance between real and fake images, you first need a way to extract the features from those images. You can get a feature extractor by taking the weights of a pre-trained classifier, typically one trained on many images, and ideally on classes that are somewhat related to the features you want to extract. For example, if your classifier was classifying dogs and cats, then it would be a pretty good feature extractor for dogs and cats, but maybe not for office objects.

The weights of these pre-trained models have essentially encoded a lot of features, as they have to in order to, for example, tell dogs from plants. Such a model would have to have figured out what a wet nose, a golden coat, and eyes are, as well as maybe what a plant's shape looks like, and this might not happen at the very last layer. It might be an intermediate layer with more nodes downstream, so the network would have figured out some of these features along the way. Note that in reality the extracted features are a bit more abstract and don't necessarily correspond to our notions of a wet nose, a golden coat, or a plant shape.

You might be wondering if you need to build and train a new classifier each time you evaluate a GAN. Thankfully, the answer is no: there are neural networks that have been pre-trained on millions of images across hundreds or thousands of classes, which you can plug in and use to extract features from an image input. Those training images are typically considered broad enough to be applicable to most natural images, that is, photos of the real world. These pre-trained classifiers are readily available for public use and offer a method that works across a variety of GANs, so all the different GANs that you train. Because these classifiers can classify many different classes, they encode lots of relevant features in their networks.

All right, so to use this pre-trained classifier, you actually don't want the end task of classification at the output. You don't care about the actual classifier part, but the rest of the network is valuable: its weights have learned important features that help with that classification task at the end. As a result, you can lop off this final classification layer and just grab the outputs from an earlier layer that contains useful information about the image you're putting through.

Here you see the final layers of a generic convolutional neural network, or CNN: a convolution, a pooling layer, and a fully connected (linear) layer with an activation right after it that goes into your output predictions. You can usually lop off this last fully connected layer, because the most common place to get output features is the pooling layer right before that last fully connected layer that's used for classification. At this pooling layer, you have the most fine-grained feature information.
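To make that concrete, here's a minimal sketch of the truncation in PyTorch. The specific network here, a torchvision ResNet-50 pre-trained on ImageNet, is just an assumption for illustration, since no particular classifier has been named yet; the idea is to swap the final fully connected layer for an identity so the forward pass returns the pooled features instead of class predictions.

    import torch
    from torchvision import models

    # A sketch, not the course's exact recipe: load a classifier
    # pre-trained on ImageNet and lop off its final fully connected
    # layer, keeping everything up to the last pooling layer.
    model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    model.eval()

    # Replacing the classification head with the identity means the
    # forward pass now returns the pooled features, not class logits.
    model.fc = torch.nn.Identity()

    with torch.no_grad():
        image_batch = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed image
        features = model(image_batch)              # shape: (1, 2048)

Each image now maps to a 2,048-value feature vector taken from right after that last pooling layer.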
That is, this layer must have encoded quite a bit of information, since the network can pass that information to just one more fully connected layer to classify the image. So you can truncate the network, as you saw, by lopping this last part off and using this last pooling layer. This just means you take the values that come out of this layer, essentially intermediate values. Let's say 100 of them come out for your, say, 500-by-500 input image; those hundred values represent the features this model extracts for that particular input image. You can think of this layer as the feature layer, because it is extracting those feature values. You can also tell quite immediately that this feature layer outputs far fewer values than your input, so it condenses the pixel values of your input into these 100 features.

It's also okay to use an earlier layer for your feature layer; using the last pooling layer is just convention because it has the most information, but it could also be overfit to the data set and task, like the classes in your classification output. That means earlier layers are better for getting more primitive information, as you go back along the network to an earlier convolution or, typically, pooling layer. On one extreme, going to the earliest layer would probably just mean vertical edge detection, extracting features around the vertical edges in your input, versus the last pooling layer, which will have features that apply to specific image classes, like whether there's a cat in the image or not, if the output predicts a cat.

Choosing which layer to use as your feature layer is something you can experiment with, but I typically suggest starting with this last pooling layer, because it will have been trained on a huge data set with a broad task. That huge data set is typically ImageNet, so a good way to start is with a classifier that was pre-trained on ImageNet. ImageNet is a data set with more than 14 million images and 20,000 categories, and you can see some of the categories in these images here. ImageNet includes various dog breeds as well as cat breeds and species, almost any other type of animal you can imagine, along with several types of objects. This data set essentially provides a plethora of information that can be meaningfully encoded in a classifier trained on it.

The features you extract from such a classifier are sometimes known as ImageNet embeddings, because they embed and compress the information from an image into a smaller vector, using the weights from a network trained on the ImageNet data set to guide that embedding process. These features, these embeddings, are just vectors that live in what's more broadly known as feature space or embedding space. So you can imagine an image like this going in, and the output features are really just a vector coming out, say negative three, two, five: various values that form a vector in a certain space, which represents your features and, again, can also be called an embedding.
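As a toy illustration of the feature distance mentioned at the start, here's a sketch that compares two such embedding vectors with a plain Euclidean distance. The three-value vectors echo the negative three, two, five example above and are made up; real embeddings would be much longer, and this is not the specific evaluation metric you'll see in the following lectures.

    import torch

    # Hypothetical 3-value embeddings for one real and one fake image.
    real_features = torch.tensor([[-3.0, 2.0, 5.0]])
    fake_features = torch.tensor([[-2.5, 1.0, 4.0]])

    # One simple notion of feature distance: the Euclidean distance
    # between the two vectors in feature space.
    distance = torch.norm(real_features - fake_features, dim=1)
    print(distance)  # tensor([1.5000])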
In summary, you saw how to get a feature extractor from a pre-trained classifier by cutting the network and using the weights from the layers before that output layer. It's most common to use that last pooling layer, but if you use an earlier layer, you can extract more primitive features, such as vertical edges or natural-looking patterns (one way to grab an earlier layer's output is sketched at the end of this section). Using a classifier that was trained on a large data set such as ImageNet, which has millions of images, you can encode really meaningful information from your images as features. In the following lectures, you will learn about a popular classifier that uses ImageNet data and how it can be used to evaluate your GAN.
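And here is that sketch of grabbing an earlier feature layer. It again assumes a torchvision ResNet-50; the block name layer2 is specific to that model, and the spatial averaging at the end is one common way, not the only way, to turn an intermediate feature map into a single vector per image.

    import torch
    from torchvision import models

    model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    model.eval()

    captured = {}

    def save_output(module, inputs, output):
        # Stash the intermediate activation as it flows through the network.
        captured["features"] = output

    # Hook an earlier block: earlier layers carry more primitive features
    # (edges, textures), later ones more class-specific information.
    model.layer2.register_forward_hook(save_output)

    with torch.no_grad():
        _ = model(torch.randn(1, 3, 224, 224))

    # Average the feature map over its spatial dimensions to get one
    # vector per image.
    embedding = captured["features"].mean(dim=(2, 3))  # shape: (1, 512)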