In this video, you'll get familiar with image-to-image translation and see how it works on actual images, as well as on other types of input. First, you'll learn what image-to-image translation is and see examples of different kinds of image-to-image translation, as well as other types of translation tasks.

First off, what's image-to-image translation? Well, imagine taking an image and applying a transformation to it to get a different image, like translating an image across different styles. Here you see a black and white image getting transformed into a colored image. In a way, this is actually a type of conditional generation, but it's conditioning on the content of one image, say this black and white image, to create another. That is saying: conditioning on this image, give me that image in a colored style.

Another example of an image-to-image translation task is going from a segmentation map, with segments of the road, of a car, of pedestrians, of the sidewalk, of trees, to a realistic photo that puts trees where it's labeled trees, road where it's labeled road, and a car where it's labeled car. It's going from one domain to another. You can imagine that from the same semantic segmentation map you could get realistic views that look very different; they don't have to correspond to just this one realistic photo here.

Another cool translation task, here video-to-video, is where the frames of one video map onto the frames of another video. It's essentially image-to-image translation, but for many images, many frames. You can take this black and white train film from back in the day and make this old film 4K, with some realistic color on it as well. This is actually not just adding realistic color, but also going from low resolution to high resolution, also known as super-resolution. Image-to-image translation is really where GANs have been able to shine, because they're able to create these very realistic images, and here videos, where every single frame still looks very realistic.

Now, diving deeper into the type of image-to-image translation that you'll be exploring first: paired image-to-image translation. What that means is that you pair an input with an output. For your training dataset, every single input example you have comes with a corresponding output image, or target image, that contains the contents of that input image in a different style. So it maps one-to-one. Basically, what you do is condition on the input to get the output image.

As you can probably tell, the output pairs, the second image in each of these, the facade of this building and the colored image of this butterfly, are not the only images that could be generated from that segmentation map or that black and white photo you see. You can imagine a lot of different buildings that could be generated from those labels, and a lot of different butterfly types that could be generated from that black and white image. So the paired output image is not necessarily the ground truth, but a ground truth, one possibility.

Here are other examples: going from a daytime photo to a nighttime photo, and going from edges to a photo. You can actually create the paired training data for edges just by running an edge detection algorithm, a standard computer vision algorithm, over the photos. That's pretty cool, because you can easily create this paired dataset.
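To make that concrete, here is a minimal sketch of how you might build such a paired edges-to-photo dataset with a classic edge detector. The folder names and Canny thresholds are illustrative assumptions, not values from the lecture.

```python
# A minimal sketch of building a paired edges-to-photo dataset with a
# classic edge detector (OpenCV's Canny). Folder names and thresholds
# here are illustrative assumptions.
import os
import cv2
import numpy as np

photo_dir = "photos"    # hypothetical folder of real photos
paired_dir = "paired"   # output folder of side-by-side pairs
os.makedirs(paired_dir, exist_ok=True)

for name in os.listdir(photo_dir):
    photo = cv2.imread(os.path.join(photo_dir, name))
    if photo is None:
        continue  # skip non-image files

    # Edge detection gives the "input" half of each pair for free.
    gray = cv2.cvtColor(photo, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)                    # single-channel edge map
    edges_bgr = cv2.cvtColor(edges, cv2.COLOR_GRAY2BGR)  # match the photo's 3 channels

    # Store input (edges) and target (photo) side by side, a common
    # layout for paired image-to-image training data.
    pair = np.concatenate([edges_bgr, photo], axis=1)
    cv2.imwrite(os.path.join(paired_dir, name), pair)
```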
Then, once the model has been trained, you can go from any kind of edges you draw to a realistic photo.

Image-to-image translation can actually get even more complex. Instead of conditioning on a single input image, you can take as input a model wearing different clothes, along with a point-wise map of where they should be standing, their pose; those dots represent different poses. From these two pieces of information, you want your GAN to generate that person in a different pose, which is what you see on the right here.

There are certainly other types of translation tasks that go beyond just images. I think of image-to-image translation as really a framework on which you can build and understand how this mapping works, but it can work across other modalities as well. For example, you can go from text to image. Here you see the text: "This bird is red with white and has a very short beak." You can imagine, again, looking at that text and generating exactly what that photo should look like.

One other cool application is neural talking heads: taking an image, say, of Monroe, and then taking face vectors from a different person, say me or you, and then you can actually speak through Monroe. It's not actually Monroe speaking at all; these are other people making facial movements, and conditioned on those facial movements, those face vectors or face landmarks, plus this image of Monroe, you can animate this image of Monroe. You can do this for Einstein or the Mona Lisa as well.

In summary, image-to-image translation is a framework for conditional generation that transforms images into different styles: taking in an image and transforming it to get a different image in a different style, while maintaining the content. Because GANs are really good at realistic generation, they are really well-suited for this image-to-image translation task. You'll learn more about some of these GAN models in the next few videos. Of course, there are other types of translation tasks, such as text to image, image to video, or image plus face landmarks to video. These are definitely things you can explore, and understanding the image-to-image translation framework will set you up to understand all of these.
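To close, here is a minimal PyTorch sketch of the conditioning idea behind paired image-to-image translation: the generator sees the content image, plus any extra maps such as a pose keypoint map, stacked along the channel dimension. The tiny architecture and the 18-channel pose map are illustrative assumptions, not a real Pix2Pix or pose-transfer network.

```python
# A minimal sketch of conditioning a generator on an input image plus an
# extra map (e.g., a pose keypoint map) via channel concatenation.
# The tiny architecture here is illustrative, not a real Pix2Pix model.
import torch
import torch.nn as nn

class TinyTranslator(nn.Module):
    def __init__(self, in_channels, out_channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, out_channels, kernel_size=3, padding=1),
            nn.Tanh(),  # output image values in [-1, 1]
        )

    def forward(self, content_image, extra_condition=None):
        x = content_image
        if extra_condition is not None:
            # e.g., a 3-channel person image plus a pose keypoint map
            x = torch.cat([content_image, extra_condition], dim=1)
        return self.net(x)

# Example: a 3-channel person image conditioned on a hypothetical
# 18-channel pose map, producing a new 3-channel image.
gen = TinyTranslator(in_channels=3 + 18)
person = torch.randn(1, 3, 128, 128)
pose_map = torch.randn(1, 18, 128, 128)
new_view = gen(person, pose_map)   # shape: (1, 3, 128, 128)
```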