Hi everyone. My name is Nicha Dvornek. I'm an Assistant Professor in the Department of Radiology and Biomedical Imaging and the Department of Biomedical Engineering at Yale University. In this part of the lecture we're going to talk about deep learning. As mentioned in the last section, deep learning is a subfield of machine learning. The types of models used in deep learning are called neural networks, which are inspired by their biological namesake.

The neuron shown here is the basic processing unit in the nervous system. The dendrites are the small branches off of the cell which receive inputs to the neuron. If these inputs are high enough, the neuron will fire, meaning an action potential will be sent down the axon, away from the cell body. Then the signal will be output to other cells at the axon terminal. In the brain, we can think of the larger organization of these neurons as a chain that processes lower-level information into higher-level information. For example, low-level neurons may recognize things like lines and edges, while higher-level neurons recognize faces.

Now, in deep learning, the basic computational unit models this behavior of a biological neuron: the unit receives many inputs, the information is processed, and then it is passed on as a single output for that unit. Here's a simple example of how we can model a digital neuron. We have an input layer with two nodes, x1 and x2, and also a constant node with value 1. We have an output layer with one node, y. Here we have the connections between the input and output nodes with their associated weights, which are the parameters of the model. Going back to our picture of the neuron, the output node represents the signal that would be passed down the axon, the input layer corresponds to the dendrites, and the constant node represents the baseline activity of the neuron.

We can model the neuron activity using the function shown here. First, we compute a linear combination of the inputs: that's x1 times w1 plus x2 times w2 plus b. Let's call this result a. We then pass a through a non-linear function f, which does the following: if the input to f is greater than 0, f outputs a 1; otherwise it outputs a 0. We say that the neuron fires when this function outputs a 1. Note that there are many other forms of the function f used in deep learning; this is just one example. Again, the parameters of our model are w1, w2, and b. We often refer to these as the weights of the model, since they weight the incoming nodes. The goal in deep learning is to learn these weights based on our given training data.
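To make this concrete, here is a minimal Python sketch of the digital neuron just described, using the step function for f. This code is not from the lecture; the example weights in the demo calls are made up for illustration.

```python
def f(a):
    """Step activation: the neuron 'fires' (outputs 1) when its input is > 0."""
    return 1 if a > 0 else 0

def neuron(x1, x2, w1, w2, b):
    """Single digital neuron: a linear combination of the inputs, passed through f."""
    a = x1 * w1 + x2 * w2 + b  # weighted sum plus the baseline (constant node) term
    return f(a)

# Example with made-up weights: this neuron fires only when both inputs are active.
print(neuron(1, 1, w1=1.0, w2=1.0, b=-1.5))  # prints 1
print(neuron(1, 0, w1=1.0, w2=1.0, b=-1.5))  # prints 0
```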
What we just saw was a simple model with just one input layer and one output layer. To solve more complex, non-linear problems, we're going to need more layers with more nodes. This is called a neural network. One motivating example for using multiple layers is the exclusive OR problem. As a reminder, the exclusive OR operation is true when the input arguments differ. Here we have the table with two inputs, x1 and x2, which can each take on the value 0 or 1. When x1 and x2 are the same, the exclusive OR outputs a 0, as shown in the black squares, and when the inputs are different, the exclusive OR outputs a 1, as shown in the blue squares. It turns out that the single neuron model cannot solve this exclusive OR problem. However, a multi-layer model with multiple neurons can.

In this model we again have our input layer with x1, x2, and the constant node. Following this, we have what's called a hidden layer with another two nodes, h1 and h2, and a constant. Finally, we have the output layer with our node y. We can see why this middle layer is called a hidden layer: it's sandwiched, hidden, inside the network. How are all these nodes connected? Let's first focus on h1. Here we have the weights: 2 for x1, 2 for x2, and negative 1 for the constant node. We can write out the function for h1 as shown on the right. Similarly, here are the weights for h2, and we can write out the expression for h2 as shown here. Finally, node y connects to the hidden layer with the weights shown here, and we can write the expression for y as shown on the right. Again, here's our exclusive OR table, and if we work through the four scenarios where x1 and x2 can each be 0 or 1, we see that this network does indeed compute the exclusive OR operation. I encourage you to try working this out for yourself, or you can watch someone else do it for you at the video linked below.
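As a sketch of that exercise, here is a small Python check of the network. The lecture gives h1's weights (2, 2, and negative 1); the weights for h2 and y appear only on the slide, so the values below for h2 and y are one consistent choice, an assumption, that does compute the exclusive OR.

```python
def f(a):
    # Same step activation as before: 1 if the input is > 0, else 0.
    return 1 if a > 0 else 0

def xor_net(x1, x2):
    h1 = f(2 * x1 + 2 * x2 - 1)   # weights from the lecture: 2, 2, -1 (behaves like OR)
    h2 = f(2 * x1 + 2 * x2 - 3)   # assumed weights: 2, 2, -3 (behaves like AND)
    y = f(2 * h1 - 4 * h2 - 1)    # assumed weights: fires when h1 = 1 and h2 = 0
    return y

# Work through all four scenarios of the truth table.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_net(x1, x2))
# Prints: 0 0 -> 0, 0 1 -> 1, 1 0 -> 1, 1 1 -> 0
```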
Now, the network we just saw had one hidden layer between the input and output layers. This is called a shallow neural network. If we add more layers, we get what's called a deep neural network. Deep networks allow for modeling more complex non-linearities. To learn the weights of a deep neural network for a particular problem, we first need to gather lots of training examples. Then we guess the initial weights and adjust them so that the network learns to predict the correct answers. These adjustments are computed automatically using what's called the backpropagation algorithm.

In the neural networks we've seen so far, each node in a layer connects to all nodes in the previous layer; this is called a dense network. However, many other network configurations are possible and used. Arguably the most important one in image analysis is based on the convolution operation. We're going to spend a little bit of time to better understand convolutions and why they're important. Crudely, the output of a convolution is a weighted sum of the inputs. The goal in performing convolutions is to extract features from the images. For example, we might want to extract edges; we'll see an example of this shortly. The weights used to perform the convolution define what's called the filter, or convolution operator. Different filters can be used to extract different types of features.

Let's take a look at a few examples. We're going to use this MRI image of the brain as our sample image. But first, what really is a digital image? An image is made up of pixels, where each pixel contains a single intensity value. For a grayscale image like this one, the pixel value defines the brightness at that point. If we take a closer look at this portion of the image, we start to see some of the granularity due to the discrete way we need to represent images. If we blow up this tip of the ventricle even further, we can clearly see that each individual pixel takes on a single intensity in this five-by-five image patch. When we apply the convolution operation, we're going to be computing weighted sums of intensities in small local neighborhoods like this one.

Let's take a look at a concrete example of applying a convolution filter. Say we want to compute the horizontal gradient of an image, meaning we're looking for changes in intensity along the horizontal axis. This would correspond to finding vertical edges in the image. To do this, we could apply the convolution filter shown here, where we have negative weights in the column left of center and positive weights in the column right of center. This is going to accentuate the intensity differences along these vertical edges. Here's the result of applying the horizontal gradient filter to the brain. We can see that the vertical edges are now highlighted as white or black lines, depending on whether the change in intensity was positive or negative. Zooming in on this section here, we can see that the vertical edges of the sulci and gyri are now highlighted. We start to really get a sense of how these convolutions might pick up some interesting features that would be useful for different analysis problems.

Here's another example, where we're looking to compute the vertical gradient. Now we're looking for changes in intensity along the vertical direction, meaning we're going to find horizontal edges. Here's the corresponding filter, which essentially looks like the horizontal gradient filter rotated by 90 degrees. Applying this to the brain image gives us the result shown here. Zooming in again, we can now see that the horizontal edges of the sulci and gyri and the edge of the brain are nicely highlighted.

Here's a final convolution example: we're going to compute the average local intensity. For this filter, we have the same value for every element, and note that these elements sum to one. When we perform the convolution and multiply each element with the corresponding pixels in the image, we can see we're just calculating the average intensity of an image patch. Here's the result of applying the average filter. We can see that this ends up blurring our image. This kind of filter could be helpful, for example, to reduce acquisition noise. Again, if we concentrate on this region here, we can see that a lot of the detail in the original image has been lost.
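Here is a minimal NumPy sketch of this filtering process, not taken from the lecture: the 3-by-3 gradient filter values are a common textbook choice (an assumption, since the slide values aren't given in the transcript), and a random array stands in for the brain MRI slice.

```python
import numpy as np

def apply_filter(image, kernel):
    """Slide the kernel over the image; each output pixel is a weighted
    sum of the neighborhood around the corresponding input pixel."""
    d = kernel.shape[0] // 2
    out = np.zeros_like(image, dtype=float)
    for i in range(d, image.shape[0] - d):
        for j in range(d, image.shape[1] - d):
            patch = image[i - d:i + d + 1, j - d:j + d + 1]
            out[i, j] = np.sum(patch * kernel)  # weighted sum of the local patch
    return out

# Horizontal gradient filter: negative weights left of center, positive right.
horizontal_gradient = np.array([[-1, 0, 1],
                                [-1, 0, 1],
                                [-1, 0, 1]], dtype=float)
# Average filter: identical weights that sum to one, which blurs the image.
average = np.ones((3, 3)) / 9.0

image = np.random.rand(64, 64)                    # stand-in for the MRI slice
edges = apply_filter(image, horizontal_gradient)  # highlights vertical edges
smooth = apply_filter(image, average)             # local average intensity
```

On a real image, `edges` would show the white and black vertical-edge lines from the slides, and `smooth` the blurred result.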
Now that we have a better sense of what convolutions are, let's go back to our discussion of how this relates to machine learning. Back in the old days, we would run a bunch of handcrafted filters on an image. This means we would hand-pick the weights of the convolution filters based on some prior knowledge of what kinds of features might be important. The result would then be fed into a traditional machine learning classifier, like an SVM. Now, with deep learning, we design networks which perform both the convolutions and the classification together. We make the first part of the network the convolutional layers, where we're actually going to learn the weights of the filters from the data. The second part of the network is the classifier, which takes the output from the convolutional layers. The weights for the convolutional layers and the classifier are learned together based on the training data. In this way, the optimal features can be selected for the specific classification problem. This is really different from the classic machine learning setup, where we essentially had to guess what might be the best features for the problem, train a model, and then iterate that process. This automated feature learning is really one of the main advantages of the deep learning framework.

Convolutional layers are not only great for extracting different image features; they also give us many more features with far fewer parameters compared to a dense layer. Remember, the dense network has a structure like what we saw for the exclusive OR, where each node is connected to every node in the incoming layer. Suppose we have an input of size N by N: if we had an image with 64 by 64 pixels, our input layer would have 4,096 nodes. If the dense layer has M neurons, the output of that layer will have M values, and the number of parameters we need to learn is N times N times M.

Convolutional layers, on the other hand, perform local processing, meaning the convolution output at a pixel is based on just a small part of the input image. Again, let's say we have an input of size N by N, and this time we have M convolutional filters with filter size D by D. The size of these filters is going to be much smaller than the image size; for example, when we were looking to extract edges, the filter was of size five by five. Now, the output of each convolutional filter is called a feature map. Since we have M filters and N by N pixels, the number of features we get is M times N times N. But the number of parameters to learn is much smaller than the number of outputs: since each filter has size D by D and we have M different filters, we need to learn D times D times M parameters. Remember, D is much smaller than our image size N. Now you can see that convolutional layers give us many more features, but with far fewer parameters, compared to dense layers. This expressiveness and computational savings of convolutional layers is another key factor making them the primary workhorse in image analysis problems.
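To see these counts concretely, here is a minimal sketch using PyTorch (one of the toolkits mentioned at the end of this lecture). N = 64 and D = 5 come from the example above, while M = 32 and the bias=False settings are assumptions made so the counts match the N·N·M and D·D·M formulas exactly.

```python
import torch.nn as nn

N, D, M = 64, 5, 32  # 64x64 image, 5x5 filters; M = 32 is an assumed value

def n_params(layer):
    """Count the learnable parameters in a layer."""
    return sum(p.numel() for p in layer.parameters())

# Dense layer: every one of the M neurons connects to all N*N input pixels.
dense = nn.Linear(N * N, M, bias=False)
# Convolutional layer: M filters, each only D x D in size.
conv = nn.Conv2d(1, M, kernel_size=D, bias=False)

print("dense parameters:", n_params(dense))  # 131072 = N*N*M, for only M outputs
print("conv parameters: ", n_params(conv))   # 800 = D*D*M, yet yields M feature maps
```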
Now, there are many other types of neural network layers that perform other operations. For example, the long short-term memory unit, or LSTM, was designed for analyzing temporal signals. These units include feedback connections, meaning that the output of the layer gets passed back in as an input to those same units. Maybe an easier way to understand this is to unroll the diagram in time: we can see that the inputs x for different time points are passed into the LSTM, but the LSTM unit also receives information directly from the previous time step, as shown by these orange arrows. This internal memory makes LSTMs really powerful for temporal processing. Just as an example, in our work we developed LSTM-based models to automatically identify autism from functional MRI time series data.

Another common layer performs the dropout operation, which is important for model regularization. Essentially, during training the connections between some of the nodes are eliminated, or dropped, and this helps prevent the model from overfitting to noise. Another common layer is the pooling operation, which reduces the resolution of the previous layer by pooling information across different nodes; for example, we often take the average or the maximum within the pool. Finally, transposed convolutions are used to up-sample the incoming layer. Essentially, the convolution operation is applied such that there are more nodes in the output than in the input layer, so the information is up-sampled.

Combining these different types of neural network layers gives rise to what we call different neural network architectures. Probably the most common architecture used in medical image analysis is based on the U-Net. Essentially, the U-Net combines lots of convolutional layers, max pooling layers, and transposed convolution layers in such a way that we get this U-shaped model, where the lower levels of the U represent processing at lower image resolutions. This architecture is extremely popular in medical image segmentation, but it has also been used in many other areas, such as image registration and denoising.

Now, there are many other applications of deep learning in the healthcare domain. The one I've been referencing the most here is medical image analysis. Just for a few examples, deep learning approaches have been used for extracting structures from images, such as the prostate from MRI. It has great potential to quickly triage cases, for example to identify stroke from brain MRI. Deep learning can also be used to classify tumors and identify imaging biomarkers for different diseases and disorders. Another example is in language processing: for instance, it can be used to automatically extract information from and analyze medical records. Finally, deep learning is also being used for the analysis of big genetic data. If you'd like to read more about the applications of deep learning in health care, I refer you to this nice review paper in Nature Medicine.

Here I'm just going to mention a few of the many different toolkits available to implement and train neural networks. One of the most popular frameworks is TensorFlow, which was developed by Google; this is probably the most popular library for production-level code. PyTorch is a relatively newer library, developed by Facebook, which is extremely popular with the research community. Finally, MONAI is a brand new framework for developing models specifically in the medical imaging domain, and it is built on top of PyTorch.

If you'd like to go deeper into deep learning, I encourage you to check out the deep learning textbook by Ian Goodfellow, Yoshua Bengio, and Aaron Courville, who are some of the biggest names in the field. The textbook is freely available online at the address shown below.

To summarize, deep neural network models have allowed for the automated learning of the best features to extract for a given task. This, in part, has really made deep learning the method of choice for solving many machine learning problems today. In the next section, Professor Papadimitriou will be back to discuss regulatory considerations for implementing medical devices with machine learning and AI. Thanks.