Hi. In this coding tutorial, we will get our hands dirty with normalizing flows. More precisely, we will be dealing with masked autoregressive flows. We will start by building a model based on a single masked autoregressive flow bijector, and we will then enrich our implementation to make a more expressive model with more layers, consisting of chained masked autoregressive flow bijectors that permute the event dimensions in between. Let's get started.

In this tutorial, we will rely on the two moons dataset. This dataset represents two interleaving half circles. Although you see two labels here, we will not use them in this tutorial.

We first start by defining the base distribution, which in our case will just be a standard normal distribution. Then we have to define the trainable distribution. Let's start by defining a function that makes a masked autoregressive flow bijector. This function will take in two arguments: the hidden units and the activation function. First, we make the autoregressive network (MADE), setting params to 2, since the network has to output two parameters per event dimension, a shift and a log scale. We also specify the event shape, while the hidden units and the activation come from the arguments of the function. Then we can make the masked autoregressive flow bijector by passing in the MADE network for the shift_and_log_scale_fn keyword argument. Finally, we can construct a transformed distribution by passing in the base distribution, which in our case is the standard normal distribution, as the distribution, and by passing in the masked autoregressive flow bijector as the bijector. We also specify the event shape; in our case, this is two.

Then we make a few imports for the plots. Here we have a plot routine to plot the contours. Now we can plot. Let's use the make_masked_autoregressive_flow function to define a new transformed distribution with the given activation function. In our case, let's start with a ReLU activation; then we can use the newly defined plot routine to plot the contours. Here we are. Now, let's try a different activation function and take a sigmoid, for example. You can see that the plot looks different.

Now let's make samples. We can sample from the base distribution, and then we can construct two lists: one containing the name of the base distribution and the name of the bijector of the trainable distribution, which in our case will just be the masked autoregressive flow, and a second list containing the sample from the base distribution and the result of the forward method of the bijector of the trainable distribution. We then have another plot routine that plots the elements of the samples list and uses the names list to title the different plots. Now we can plot. Here we are.

Now let's move forward to training. Here we have a training routine. You can see that, in order to define the model, we compute the log probability of the input with respect to the trainable distribution, so the loss is the negative log likelihood. We then compile the model using the Adam optimizer. Now let's use this training routine to train our trainable distribution. Let's set 600 epochs and display the losses every 50 epochs. It shouldn't take too long, as we only have one bijector in this trainable distribution. Now that we have trained our trainable distribution, we can retrieve the losses, the training loss and the validation loss, and plot the learning curves. Finally, let's make the samples again, but this time with our trained trainable distribution, and use the previously defined plot routine to plot them.
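To make the construction above concrete, here is a minimal sketch of what it might look like with TensorFlow Probability's tfd and tfb modules. It is not the tutorial's exact notebook code: the hidden layer sizes are placeholders, and using tfd.Sample to give the base normal an event shape of two is one of several equivalent options.

```python
import tensorflow_probability as tfp

tfd = tfp.distributions
tfb = tfp.bijectors

# Base distribution: a standard normal with a two-dimensional event shape.
# tfd.Sample is used here to fix the event shape to [2]; older TFP versions
# instead pass event_shape=[2] to TransformedDistribution.
base_distribution = tfd.Sample(tfd.Normal(loc=0., scale=1.), sample_shape=[2])

def make_masked_autoregressive_flow(hidden_units=[16, 16], activation='relu'):
    # MADE network; params=2 because it outputs two parameters per
    # event dimension: a shift and a log scale.
    made = tfb.AutoregressiveNetwork(
        params=2,
        event_shape=[2],
        hidden_units=hidden_units,
        activation=activation)
    return tfb.MaskedAutoregressiveFlow(shift_and_log_scale_fn=made)

# Trainable distribution: the base distribution pushed through the flow.
trainable_distribution = tfd.TransformedDistribution(
    distribution=base_distribution,
    bijector=make_masked_autoregressive_flow(activation='relu'))
```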
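For the training step, here is a sketch of one way to implement the routine just described. Instead of the compiled Keras model the tutorial uses, this version runs a plain tf.GradientTape loop; the make_moons sample count, the noise level, and the full-batch updates are all assumptions.

```python
import numpy as np
import tensorflow as tf
from sklearn.datasets import make_moons

# Two moons data; the sample count and noise level are assumptions.
# The labels are returned but, as in the tutorial, we do not use them.
x_train, _ = make_moons(n_samples=3000, noise=0.05)
x_train = x_train.astype(np.float32)

# Build the network weights once by evaluating log_prob outside the loop.
_ = trainable_distribution.log_prob(x_train[:1])

optimizer = tf.keras.optimizers.Adam()

@tf.function
def train_step(batch):
    # Maximum likelihood: minimize the negative log probability of the data.
    with tf.GradientTape() as tape:
        loss = -tf.reduce_mean(trainable_distribution.log_prob(batch))
    grads = tape.gradient(loss, trainable_distribution.trainable_variables)
    optimizer.apply_gradients(
        zip(grads, trainable_distribution.trainable_variables))
    return loss

for epoch in range(600):
    loss = train_step(x_train)
    if epoch % 50 == 0:
        print(f'Epoch {epoch}: loss = {loss.numpy():.4f}')
```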
When we look at the second plot, we notice that we get something that starts to look like the two moons shape. Here we have another plot routine that lets us visualize the results of our training, comparing the two moons dataset with what we get after training. As we said, it starts looking similar to the two moons. We can also plot the contours.

Now, we will exploit the idea that we can compose bijectors to make a more expressive model with more layers, consisting of chained masked autoregressive flow bijectors, and, in between, use the Permute bijector to permute the event dimensions. The way to do that is pretty simple: we just have to stack the bijectors in a list. Here we use the make_masked_autoregressive_flow function that we previously defined; we can take a ReLU activation function. We append this bijector to the bijector list, and then we add on top of that a Permute bijector, which takes a keyword argument, permutation, specifying the order by which we permute the event dimensions. Be careful, when we chain all these bijectors at the end, to discard the last Permute layer. Now let's define the transformed distribution: the distribution will be the standard normal, our base distribution, the bijector will be the flow bijector, and we specify the event shape, two in our case.

Here we have a function that automates the process of making samples. It effectively goes through the bijector list and stacks the name of each bijector as well as the result of its forward transformation. We can then retrieve the names and samples lists and plot to see what happens as we go through the different bijectors. So far not a lot has changed, and that's normal: we haven't trained our distribution yet. We can then use the training-data visualization routine to compare the data with the results. Nothing special, and again that's normal, since we haven't done anything so far.

Okay, so finally we can train our distribution. As before, we set the number of epochs to 600 and display the losses every 50 epochs. This should take longer than before, and that's normal, because we have more bijectors in the trainable distribution, so we have more parameters to learn. Finally, we retrieve the losses, the training and validation losses, and plot the learning curves. We can see here that we have a large spike in the loss at the beginning of training, so you can test different optimizers and learning rates and see if you get better curves.

Now let's use the make_samples routine to retrieve the names and samples, and we can then plot; the second argument here specifies the number of rows. Okay, here we are. That's a pretty nice plot. You can see that we initially start with a normal distribution; you can see it on the top-left figure with this centered cloud of points. Then this cloud of points gets gradually mapped and transformed as we go through the bijectors until the end, where you can clearly identify the two moons shape. Now we can use the visualization routine that we previously defined and compare the results of the trainable distribution with the data. Here we can observe that with this more expressive, richer model, we finally get something that is much more similar to the two moons.

In this coding tutorial, we have trained a pretty simple transformed distribution made of a single masked autoregressive flow bijector. We have then seen how to enhance this trainable distribution by composing masked autoregressive flow bijectors.
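Before wrapping up, here is a minimal sketch of the chaining step described above, assuming the make_masked_autoregressive_flow function and base_distribution from the earlier sketch. The number of layers and the hidden sizes are placeholders, not the tutorial's exact values.

```python
# Stack the bijectors: a MAF layer followed by a Permute, several times.
num_layers = 6
bijectors = []
for _ in range(num_layers):
    bijectors.append(make_masked_autoregressive_flow(
        hidden_units=[256, 256], activation='relu'))
    # Swap the two event dimensions between successive MAF layers.
    bijectors.append(tfb.Permute(permutation=[1, 0]))

# Discard the trailing Permute, then chain. tfb.Chain applies its list
# right to left, so we reverse it to preserve the intended order.
flow_bijector = tfb.Chain(list(reversed(bijectors[:-1])))

trainable_distribution = tfd.TransformedDistribution(
    distribution=base_distribution,
    bijector=flow_bijector)
```

The permutations matter because, in a single masked autoregressive flow, the first event dimension is transformed independently of the others; alternating the order lets every dimension condition on the other across layers.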
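And here is a rough sketch of what the make_samples routine could look like; the signature is assumed, but it follows the description above: it walks through the bijector list and stacks each bijector's name together with the result of its forward transformation.

```python
def make_samples(bijector_list, n_samples=1000):
    # Start from base-distribution samples and push them through each
    # bijector in turn, recording a name and the intermediate result.
    samples = [base_distribution.sample(n_samples)]
    names = ['Base distribution']
    for bijector in bijector_list:
        samples.append(bijector.forward(samples[-1]))
        names.append(bijector.name)
    return names, samples

# bijectors[:-1] is the stacked list from above, in forward (data-flow) order.
names, samples = make_samples(bijectors[:-1])
```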
I hope that you are now more familiar with the notion of normalizing flows. It is now your turn to put what you have seen into practice and keep exploring. For example, you could use all the bijectors you have seen so far to build your own bijector chain, make a trainable distribution out of it, and see if you get better results. This is our last coding tutorial together. You are now ready to put this into action in the programming assignment. I would like to congratulate you on having gone through this week's content. We're done for now. I will see you later.