In this video, you'll learn about controlling coarse and fine styles with StyleGAN, using two different methods. The first is style mixing for increased diversity during training and inference, and this is mixing two different noise vectors that get inputted into the model. The second is adding stochastic noise for additional variation in your images, adding small finer details, such as where a wisp of hair grows. First, some intuition on style mixing. You see a tabby cat in this first row, and these are generated tabby cats, as well as a tuxedo cat generated in these three images in the bottom row. In the middle row here, what you see is actually a mix of the two, of a tabby cat and a tuxedo cat. This is what style mixing is trying to get at. If you can generate images from the first row and you can generate images from the third row, then maybe you can generate images from that second row by mixing them in some way. You got a little bit of a sneak peek at how this might work in a previous video. Although W is injected in multiple places in the network, it doesn't actually have to be the same W each time. You could say, "I don't want to inject W into this last block here, or I don't want to put it in any of these here, I only want to put it there, or I only want to put it in the first half of the network, or only in the second half," for example. What you can actually do is have multiple W's. You can sample a Z, let's say Z_1, that goes through the mapping network to get its associated W_1, and you inject that into, let's say, the first half of the network. Remember, that goes in through AdaIN.
Then you sample another Z, Z_2 now, and that gets you W_2, and you put that into the second half of the network. You inject that into all the different blocks in the second half of the network, and again, that goes through the AdaIN layers that are part of those blocks. The switch-off between W_1 and W_2 can actually happen at any point, it doesn't have to be exactly the middle for half and half of the network. This can help you control what variation you'd like: the later the switch, the finer the features that you'll get from W_2. W_2 here is largely informing these finer control features, and your W_1 was originally controlling these more coarse features at the top. As you might expect, this improves your diversity as well, since your model is trained like this, constantly mixing different styles, so it can get more diverse outputs. Let's take a look at what that might look like. Here's an example using generated human faces from StyleGAN. Here in this first column are images generated by W_1, and you can imagine this being W_1A, this is W_1B, and this is W_1C. These are definitely not the same W's generating these three images, but let's call them all W_1 for now. W_2 will be on this first row here, where these are all W_2: A, B, C, D, E. What's interesting is that this first row here, if you only look at this, is actually getting coarse styles from W_2, so from the top row here, it's getting those coarse styles, and you can see rough face shape and maybe some type of gender-related thing, and then it's getting finer styles and maybe middle styles from this W_1 image right here, just this one image, that's being applied. Each of these is a mix of the two different intermediate noise vectors that generated the associated image from the top here and this one image here.
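The mechanics described above can be sketched in code. This is a minimal, hypothetical illustration, not StyleGAN's actual implementation: the tiny two-layer mapping network and the function names are assumptions, and a real generator would consume each per-block style through its AdaIN layers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny mapping network: an MLP mapping Z to W
# (real StyleGAN uses a much deeper mapping network).
M1, M2 = rng.normal(size=(64, 64)), rng.normal(size=(64, 64))

def mapping(z):
    h = np.maximum(z @ M1, 0)  # ReLU hidden layer
    return h @ M2

def mix_styles(w1, w2, n_blocks, crossover):
    """Per-block style list: w1 for blocks before the crossover (coarse
    styles), w2 for blocks at and after it (finer styles)."""
    return [w1 if i < crossover else w2 for i in range(n_blocks)]

# Sample two Z's and map each to its associated W.
z1, z2 = rng.normal(size=64), rng.normal(size=64)
w1, w2 = mapping(z1), mapping(z2)

# Switch from W_1 to W_2 at block 4 of an 8-block generator. The later
# the crossover, the finer the features taken from W_2.
styles = mix_styles(w1, w2, n_blocks=8, crossover=4)
```

Moving `crossover` toward the end of the list is the code-level analogue of the bottom row in the face grid: W_2 then only touches the finest styles, so the output stays close to the W_1 image.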
Then if you look at this last row, we're skipping the second row for now, if you look at this last row with this woman here, you're actually getting only fine styles from W_2, so you're only getting fine styles from the top here, and what you can see is that those finer styles are informing much smaller and slighter variations to this person, and this person takes on all the coarse and even middle styles of the original vector that generated this face. These all look pretty similar. In the middle here, it's a mix of both. You see the vector that generated this image is actually being informed with middle styles from the ones up top here. You definitely see much more variation than the bottom over here, but you see variation as informed by these top images compared to the first row. You see it being much more similar to this original image, but of course not as similar as this bottom row here. This is what style mixing is all about. You can input different W_1, W_2 vectors, and you can get these mixes, and you can control the degree to which you want one image versus another, and what type of styles, coarse, middle, or fine, that you want from each of these intermediate vectors. It's definitely much more than just coarse, middle, and fine; these are just three broad ways of thinking about it, because you're injecting your W intermediate noise vector many different times, much more than three times, into your StyleGAN generator. That was really cool. But what about slighter variations that don't require you to mix two different images, that don't require you to say, "I want the style from that person versus this person"? What if you want to just perturb or see differences with this one thing you generated? StyleGAN has this too, which is adding additional noise to the model, which will add stochastic variation to your image. That's shown here.
What's really cool is that this first half is injecting random noise into the finer layers, the later layers of your model, and that gets the person to have these much smaller curls in the hair and these eyebrows that are more wispy, versus injecting that noise in earlier layers of the network, where more coarse variation is expected from the noise, and you get these larger curls on the image and a smoother eyebrow, it seems, here. This is actually a separate process of injecting noise, so it has nothing to do with Z or W here. The way this noise works is that it's added in before your adaptive instance normalization. First you sample noise from a normal distribution, so you sample completely random values, and then those noise values are added to your X, which is your convolutional feature map output, before it goes into adaptive instance normalization, AdaIN. That's just adding some stochasticity, some randomness, into your image, into those values. The degree to which this noise actually affects these convolutional outputs is controlled by a factor, let's call it Lambda_1 here and Lambda_2. These are learned values. Lambda_1 could be, let's say, 0.00001, so it doesn't affect it that much, and let's say Lambda_2 is 0.5, so it does matter a lot, it is added a lot onto that convolutional output. These are not real, actual values of how much it necessarily scales it; it is a learned value in terms of how much this does help. This variation can even change super subtle things, which I think is so cool, and that's changing this wisp of hair. This is a zoom in on this person's hair over here that's generated, and it's just so slight in terms of the arrangement of this person's hair, as well as this person's hair, which represents StyleGAN's ability to model all of that. Though I will say, and I do pick up on this a lot, that this baby at the bottom here doesn't look very real.
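The noise-injection step above can be sketched as a few lines of NumPy. This is a simplified illustration under assumptions, not the actual StyleGAN code: the scale factors (the Lambdas) would be learned during training, whereas here they are just fixed example values, and the function name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def inject_noise(x, scale):
    """Add random Gaussian noise to a conv feature map x of shape
    (channels, H, W) before it goes into AdaIN. A single noise image is
    shared across channels and scaled by a per-channel factor."""
    noise = rng.normal(size=x.shape[1:])       # sampled from N(0, 1)
    return x + scale[:, None, None] * noise    # scaled, then added to X

# Hypothetical convolutional feature map output: 8 channels, 16x16.
x = rng.normal(size=(8, 16, 16))

# A tiny scale (like the Lambda_1 = 0.00001 example) barely perturbs the
# features; a larger scale (like Lambda_2 = 0.5) matters a lot.
barely_changed = inject_noise(x, np.full(8, 1e-5))
changed_a_lot = inject_noise(x, np.full(8, 0.5))
```

Because the scale is applied per channel and the noise is fresh random values each time, re-running the last two lines perturbs the same generated image differently, which is exactly the stochastic wisp-of-hair variation described above.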
Not all outputs look super real. To recap, style mixing with your intermediate noise vectors can increase the diversity that the model sees during training and allow you to control coarser or finer styles. Stochastic noise is another way to inject variation into your output. The coarseness or fineness also depends on where in the network your style mixing or noise is added in: earlier for coarser variation and later for finer variation, which is pretty consistent across neural networks, including classifiers.