In this lecture, we'll talk about deep generative models. Here's the outline of this lecture. We'll talk about two different methods. The first is called the generative adversarial network. We'll talk about the method first, then introduce one healthcare application in synthetic patient record generation. Then we'll talk about another deep generative model called the variational autoencoder. We'll introduce the method, then show you an application of variational autoencoders for molecule generation.

So let's start with the generative adversarial network, GAN. The goal of GAN is to create new samples that resemble, or are very similar to, the training data. Like most generative models, that's the goal: to generate new samples. The idea of GAN is very interesting. Instead of building a single model that directly tries to create new samples, GAN tries to achieve the goal by training two different neural networks. One neural network is called the generator; that's the network that creates realistic but synthetic samples. Then we have another neural network called the discriminator, and the goal of the discriminator is to differentiate synthetic samples from real samples. We do this generation and discrimination iteratively, so that the generator becomes better and better. Eventually the synthetic samples cannot be differentiated by the discriminator; at that point, those samples will be very realistic. So that's the high-level idea of GAN, and the paper was published in NIPS 2014.

For healthcare applications of GAN, there are actually quite a few, including several papers in the category of generating synthetic electronic health record data. In this lecture we'll talk about the first paper in that line, and there is some follow-up work on it as well. There are also other applications, for example in drug discovery, where a task called de novo design is about generating novel molecules, and GAN has been used there as well.
There are a few papers in that direction. So next, let's talk about the GAN algorithm and pipeline in more detail. GAN has two neural networks: the generator, called G, and the discriminator, called D. The generator tries to learn to produce realistic data samples, so the output of the generator will be fake examples. The input to the generator is just some random noise, because we want to generate things from scratch. The discriminator, on the other hand, tries to differentiate two classes: on one side we have real examples coming from the real dataset, and on the other side we have the fake examples generated by the generator. The discriminator's job is to do this binary classification: trying to differentiate which samples are real and which are fake. And finally we have a loss function at the end.

So let's first look at the loss function of GAN, which consists of two terms. The first term is the log likelihood of the real samples being classified as real; here you see the expectation over samples x coming from the real data, E_x[log D(x)]. The second term starts from random noise z: the noise is input to the generator, the output of the generator is input to the discriminator, and the discriminator tries to decide whether this is a fake sample or a real sample. This term, E_z[log(1 - D(G(z)))], is the log likelihood of the fake samples being classified as fake. So the loss function has these two terms. The objective function is actually a min-max function: the discriminator wants to maximize the sum of these two terms, so that it correctly classifies the real examples against the fake examples, while the generator tries to minimize this total sum, so that the discriminator is not able to differentiate the real samples from the fake samples. If we look at the discriminator more closely, it is solving a binary classification problem.
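To make the two terms of the objective concrete, here is a minimal numpy sketch, not from the lecture itself: the function name `gan_value` and the toy score arrays are illustrative. It computes a Monte Carlo estimate of the value function V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))] from discriminator scores:

```python
import numpy as np

def gan_value(d_real, d_fake, eps=1e-8):
    """Monte Carlo estimate of V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))].

    d_real: discriminator scores D(x) on real samples, each in (0, 1)
    d_fake: discriminator scores D(G(z)) on fake samples, each in (0, 1)
    The discriminator tries to maximize this value; the generator minimizes it.
    """
    return np.mean(np.log(d_real + eps)) + np.mean(np.log(1.0 - d_fake + eps))

# A confident, correct discriminator (real -> ~1, fake -> ~0) pushes V toward 0.
v_good = gan_value(np.array([0.99, 0.98]), np.array([0.01, 0.02]))

# A fully confused discriminator (all scores ~0.5) gives V near 2*log(0.5) ~ -1.39,
# which is the value at the theoretical equilibrium where fakes match the data.
v_confused = gan_value(np.array([0.5, 0.5]), np.array([0.5, 0.5]))
```

The `eps` guard is just a numerical safeguard against log(0); frameworks handle this inside their cross-entropy losses.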
In one class we have the real examples from the training data, and in the other class we have the fake samples from the generator. The discriminator's loss function penalizes misclassification of these two groups. We can update the parameters of the discriminator using backpropagation, starting from the loss and propagating through all the parameters of the discriminator.

Then we have the generator, which is a bit different from a standard neural network. Here the input is just some random noise, for example from a Gaussian distribution with zero mean and unit variance, or from a uniform distribution. The generator tries to produce synthetic but realistic samples, and then the discriminator scores those fake samples so we can compute a loss. The generator loss penalizes the cases where the fake examples fail to fool the discriminator; in those cases the fake examples are considered not realistic enough, so the generator is penalized for them. Then we can do end-to-end backpropagation from the final loss, the generator loss in this case, propagating the gradient through the discriminator but updating only the generator's parameters.

So let's look at the generator loss a little closer. The generator loss is just the second term of the total loss: we sample random noise z from some prior distribution and compute log(1 - D(G(z))). G(z) is the fake example, and D applied to the fake example gives us the likelihood of that example being real. We treat this as the loss, so the generator wants to minimize this whole term.
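The two losses described above can be sketched in a few lines of numpy. This is a hedged illustration, not the lecture's code: the function names are made up, and the scores stand in for real discriminator outputs.

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-8):
    # Penalize real samples scored as fake and fake samples scored as real:
    # L_D = -E_x[log D(x)] - E_z[log(1 - D(G(z)))]
    return -np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps))

def generator_loss_original(d_fake, eps=1e-8):
    # The original generator loss: E_z[log(1 - D(G(z)))].
    # It decreases as the discriminator scores the fakes as real (D(G(z)) -> 1).
    return np.mean(np.log(1.0 - d_fake + eps))

# Fakes that fool the discriminator give the generator a lower (better) loss
# than fakes the discriminator easily catches.
fooling = generator_loss_original(np.array([0.90, 0.95]))
caught = generator_loss_original(np.array([0.05, 0.10]))
```

Note that the discriminator loss here is just the negated value function from before, which is exactly what "the discriminator maximizes, the generator minimizes" means in practice.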
This loss turns out to be numerically harder to use for updating the generator, but luckily the original loss is equivalent, in terms of the optimum, to an alternative form: the generator loss in this equivalent form is just the negative expectation over z of log D(G(z)). So instead of minimizing log(1 - D(G(z))), we minimize -E_z[log D(G(z))], which is much easier to train empirically. That's the actual generator loss we use in practice.

In terms of training GAN, we do this iteratively. We train the discriminator first for a few epochs over the dataset, then we train the generator for a few epochs while fixing the discriminator. We don't update both at every step, because if the discriminator changes too frequently, the generator can get confused. So we train the discriminator for a few epochs, fix it, then train the generator, and repeat.

In terms of convergence of GAN models, it turns out to be still an open problem, because there is no reliable metric indicating convergence. Unlike a reconstruction error, where you can say training has converged once the error is small enough, as in a traditional neural network, here we do this min-max iteration, which is not strictly a minimization. So there is no reliable metric to tell us we have converged. And it has actually been observed empirically that if you train a GAN model for too long, you can overtrain it, meaning the performance can degrade: the fake samples you generate become less realistic if you train for too long. So it's a challenging problem to determine when a GAN model has converged. Most of the time, what people do is look at the generated examples: if you are generating fake images, look at the images and see if they are realistic enough. If they look good, you can stop training.
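Why is the alternative form easier to train? The two losses agree on the direction of improvement but not on gradient magnitude: early in training, when the discriminator easily rejects the fakes (D(G(z)) near 0), log(1 - D(G(z))) is nearly flat, while -log D(G(z)) still produces a strong gradient. A small numpy check of the derivatives with respect to the discriminator score (purely illustrative, assuming scalar scores):

```python
import numpy as np

def grad_original(d_fake):
    # d/dD [ log(1 - D) ] = -1 / (1 - D): stays close to -1 when D is near 0,
    # so the original loss "saturates" exactly when the generator is weakest.
    return -1.0 / (1.0 - d_fake)

def grad_nonsaturating(d_fake):
    # d/dD [ -log D ] = -1 / D: large in magnitude when D is near 0,
    # giving the generator a strong learning signal early in training.
    return -1.0 / d_fake

d = 0.01  # early training: the discriminator easily spots the fakes
```

With d = 0.01, the original loss's gradient has magnitude about 1.01, while the non-saturating loss's gradient has magnitude 100, which is why the alternative form is preferred in practice.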
But that doesn't really work for all types of data. If your data isn't as easy to verify visually as images, then you may not be able to tell when to stop. So that is still an open problem here.