Today, let's talk about convolutional neural networks. A convolutional neural network, or CNN, is a specific type of neural network for processing grid-like data such as images and time series. In healthcare applications, CNN models are widely used for automatic feature learning and disease classification from medical images, for example, automatic classification of skin lesions and detection of diabetic retinopathy.

In the previous lecture, we introduced feedforward neural networks, where all the neurons in the earlier layer are connected to all the neurons in the next layer. When we have high-dimensional input, for example, an image with a thousand by a thousand pixels, or EHR data with thousands of medical codes, a fully connected neural network will have too many connections and therefore too many parameters, which are expensive and very difficult to learn from data. To make this computationally more efficient, we can force each neuron to have a small number of connections, and we also want to limit those connections to a local region of the input. For example, in this example the left-hand side shows a two-layer network with fully connected neurons: every input x is connected to every hidden variable h. If we instead use a convolution, we connect a set of x's that are next to each other to a single hidden neuron, so the connections are local to the input. For example, x_1, x_2, x_3 connect to h_1, and x_3, x_4, x_5 connect to h_2, and there are no common neurons between far-away inputs; for example, x_1 and x_5 do not share any hidden neurons. Keep in mind, each edge is associated with some parameter. By limiting the connections to local regions of the input, we essentially reduce the number of parameters. That's the first idea behind convolutional neural networks: we want the connections to be local.

The second, and most important, idea is weight sharing. While this locally connected structure can significantly reduce the number of parameters, there are still a lot of connections. We can further reduce the number of parameters by weight sharing. The idea is, instead of having a different parameter for each connection, for example, in this case each edge would have its own parameter w_1, w_2, up to w_6 for the six connections, we can actually share those parameters. For example, w_1, w_2, w_3 are all the parameters we need. If we want to compute the hidden neuron at a different location, we reuse those parameters w_1, w_2, w_3 again at that region. This weight-sharing mechanism resembles the convolution technique in signal processing, which can be considered as applying a filter, or kernel, which is really just a set of weights, to many positions in the input data. This filtering operation is called convolution in signal processing. To summarize, weight sharing is really equivalent to applying the same filter to many positions in the input data.
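To make the local-connection and weight-sharing ideas concrete, here is a minimal NumPy sketch; it is not from the lecture, the names x, w, and conv1d are made up for illustration, and a stride of 2 is assumed so that h_1 comes from x_1, x_2, x_3 and h_2 from x_3, x_4, x_5, as in the figure.

```python
import numpy as np

x = np.array([0.0, 1.0, 0.0, 0.0, 0.0])   # input x_1 ... x_5
w = np.array([0.3, 0.5, 0.2])             # one shared filter w_1, w_2, w_3 (illustrative values)

def conv1d(x, w, stride=2):
    """Slide the same filter w over local windows of x; every window reuses the same weights."""
    k = len(w)
    return np.array([np.dot(x[i:i + k], w)
                     for i in range(0, len(x) - k + 1, stride)])

h = conv1d(x, w)
print(h)   # h_1 = w_2, h_2 = 0 for this particular input
```

With local connections alone, the two hidden units would need six separate weights, w_1 through w_6; with sharing, the same three weights are reused at every window.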
The third idea, which is not as important as the previous two, is using pooling to handle distortion. In addition to the convolution operation, many CNN architectures also have pooling layers alongside the convolution layers, which further dramatically reduce the number of parameters, and the convolution and pooling layers together have the benefit of a translation invariance property. The idea is that if two images are very similar, one just shifted from the other by a little bit, then the pooling operation can handle that. Let me give you a concrete toy example. We have this very simple input vector x_1 to x_5, and we have two input vectors. The first one is 0, 1, 0, 0, 0, and the second one is just like the first but shifted, or translated, by two positions: the value 1 has moved over two positions, so we have 0, 0, 0, 1, 0. The pooling operation can help us arrive at the same output even when we have this type of translation. If you apply the convolution operation over this 1D input, which is really a weighted sum at different locations using the same filter, you will get two different outputs for these two inputs. The first one has h_1 equal to w_2 and h_2 equal to 0; the second one has h_1 equal to 0 and h_2 equal to w_2. Clearly, these are two different outputs without the pooling operation. But if we apply a pooling operation at the end, for example max pooling, which simply takes the maximum value over the pooling filter, in this case taking the max over these two values, you will get the same output, which is the max of w_2 and 0. So the pooling operation handles small distortions in the input data.

To summarize, the main benefit of a convolutional neural network is that it can handle grid-like structures such as images and time series. It utilizes two specialized operations: the convolution operation and the pooling operation. The advantages of a CNN over a fully connected neural network are the following: it has sparse connections, its parameter sharing reduces the number of parameters, and it can handle translation invariance.

Next, let's learn the basic CNN structure. A CNN structure is really just a stack of convolution, pooling, and finally fully connected layers put together to form a specific architecture. There are many different CNN architectures; we'll talk about a few in this lecture. But all of those CNN architectures have these building blocks: a convolution layer, a max pooling layer, another convolution layer, another max pooling layer, followed by three more convolution layers and another max pooling layer, and finally several fully connected layers. In this part of the lecture, we'll learn what each individual component means, how the input and output are related to each other, how much computation is going on for a specific architecture, and how many parameters we need to learn for a given architecture.
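As a preview of how output sizes and parameter counts can be tallied for such a stack, here is a minimal sketch; it is not from the lecture, and the helper names and the toy layer sizes (a 64 x 64 single-channel input, 3 x 3 filters, 8 output channels, 10 output units) are assumptions made up for illustration, not a specific published architecture.

```python
def conv2d_stats(h, w, c_in, c_out, k, stride=1, pad=0):
    """Output spatial size and parameter count for one convolution layer."""
    h_out = (h + 2 * pad - k) // stride + 1
    w_out = (w + 2 * pad - k) // stride + 1
    params = c_out * (c_in * k * k + 1)   # shared kernel weights plus one bias per output channel
    return (h_out, w_out, c_out), params

def pool2d_stats(h, w, c, k=2, stride=2):
    """Max pooling has no learnable parameters; it only shrinks the spatial size."""
    return ((h - k) // stride + 1, (w - k) // stride + 1, c), 0

# Toy stack: conv -> pool -> conv -> pool -> fully connected
shape, total = (64, 64, 1), 0
for layer in ["conv", "pool", "conv", "pool"]:
    if layer == "conv":
        shape, p = conv2d_stats(*shape, c_out=8, k=3, pad=1)
    else:
        shape, p = pool2d_stats(*shape)
    total += p
    print(layer, shape, p)

# Flatten, then one fully connected layer with 10 outputs
fc_params = shape[0] * shape[1] * shape[2] * 10 + 10
print("total parameters:", total + fc_params)
```

In this toy setup the convolution layers contribute only a few hundred parameters thanks to weight sharing, while the single fully connected layer at the end dominates the total.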