In this video, we're going to discuss convolution: what convolution is, how to determine the size of the activation map (the output of the convolution layer), the stride parameter, and zero padding.

So, let's just ask the question, "What is convolution?" Let's say we have two images that, if you look closely, are identical except that one is slightly shifted. Let's see what happens when we convert the first image to a vector. Here's the image as a 12-dimensional vector; the grid locations represent the pixel areas, and the black cells show where the intensity values are high. Now let's see what happens when we convert the second image to a vector. As you can see, the intensity values end up in totally different locations. Again, we have the original image and the slightly shifted second image; when we convert them to vectors, the intensity values sit in two totally different locations. Convolutional networks help with this problem by looking at the relative positions of pixels rather than their absolute positions.

Using the image and something called a kernel, we're going to perform convolution. The result is called an activation map. As you recall, we have a linear equation with an output z. Convolution is analogous to this linear equation, and the output will be another matrix or tensor: z = W * x + b. The parameter W is known as the kernel, the parameter b is known as the bias, and the star operation is the convolution operation.

Let's go over an example of a convolution. In PyTorch we can create a convolution object as follows. For now we'll stick to the case where we have one input channel and one output channel, with a kernel size of 3. We'll create an image that is 5 by 5 pixels in size. The first dimension of the tensor represents the number of images, and the second represents the number of channels; as it's a gray-scale image, the number of channels is set to one. We then apply the convolution kernel. Here's our image, 5 by 5 in size, and here's our kernel, 3 by 3 in size. PyTorch randomly initializes the kernel parameters, just like the parameters of a linear layer; in the lab we initialize it with special values.

Let's go over the operation of convolution. It's similar to a dot product, but the output is the activation map, a rectangular tensor. We start off in the top left corner of the image and overlay the kernel on that region. With the kernel values shown in red, we multiply every element of the image by the corresponding element of the kernel. For the first row, we multiply the intensity values and sum the results. This process is repeated for the second row, and finally for the last row we multiply the intensity values and sum the results. We then sum those row totals together, and the result is the first element of z. Next, we shift the kernel one column to the right, represented by the shifted red region, and multiply all the elements of the kernel with the corresponding elements of the image; this gives us the second element of the activation map. We shift the kernel one more column and perform the exact same operation, multiplying every element of the kernel by the corresponding intensity value of the image and adding them all together. This gives us the next value of the activation map. We then shift the kernel down one row and repeat the process; this corresponds to the second row of the activation map. We shift the kernel and repeat, getting the next element of the map.
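Here's a minimal sketch of those PyTorch calls, assuming a one-channel 5 by 5 input; the intensity pattern is hypothetical, and in the lab the kernel is set to special values rather than left random:

```python
import torch
import torch.nn as nn

# One input channel, one output channel, 3x3 kernel
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3)

# Tensor shape: (number of images, number of channels, height, width);
# gray-scale, so channels = 1
image = torch.zeros(1, 1, 5, 5)
image[0, 0, :, 2] = 1  # hypothetical pattern: a vertical line of high intensity

# Apply the convolution; conv.weight (the kernel W) and conv.bias (b)
# are randomly initialized by PyTorch
z = conv(image)
print(z.shape)  # torch.Size([1, 1, 3, 3]): a 3x3 activation map
```

Each element of z is the sum of the element-wise products of the kernel and the 3 by 3 region of the image it currently overlays, plus the bias.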
We continue until we reach the final column of the image, then shift down one row, getting the following value of the activation map. Proceeding to the next column of the image, we repeat the process, and we continue until we get to the last element of the image; we now have the full activation map. When we add the bias term to the convolution, we're simply broadcasting it to every element in the tensor or matrix.

Let's see how to determine the size of our activation map. Let's start off with a 4 by 4 image and use the variable M to denote the size of the image, so M = 4. We have a 2 by 2 kernel and use K to denote the size of the kernel, so K = 2. The first kernel position gives us one element in our activation map. We then have to see how many units we have left to shift to the right. There are M columns in the image, but the kernel has size K and can't go outside the image. As a result, we have M - K pixels or steps left, which here means two more steps. If we include the first step, we have M - K + 1 steps, in this case 3, and the same argument can be made for the vertical jumps.

Let's verify this pictorially. On each iteration we shift the kernel one step to the right and populate the next cell of the current row of the activation map. When the kernel reaches its rightmost position, we shift it one row down and back to its leftmost position, and populate the leftmost cell of the next row of the activation map. Repeating this until the kernel has covered the image populates the whole activation map, and we see it has a size of 3 by 3.

Another important parameter in convolution is the stride. The stride is basically the amount the kernel moves per iteration; with a stride of one, the kernel just moves one step each time. When we create our convolution object, we add the stride argument to the constructor. Let's determine the size of the output when we set the stride equal to 2. In general, the first kernel position contributes 1, and we have M - K pixels left; because a larger stride means fewer jumps, we divide the remaining number of pixels by the stride size. Adding the 1 for the first position, the formula is (M - K) / stride + 1. In this case the size of our activation map will be (4 - 2) / 2 + 1 = 2. On each iteration we shift the kernel 2 steps to the right and populate the next cell of the activation map; when the kernel reaches its rightmost position, we shift it 2 rows down and back to its leftmost position and populate the leftmost cell of the next row.
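Here's a small sketch of the size formula alongside a PyTorch check; the helper name activation_map_size is ours, not part of PyTorch, and PyTorch takes the floor of the division when the stride doesn't divide evenly:

```python
import torch
import torch.nn as nn

def activation_map_size(M, K, stride=1, padding=0):
    """Size formula from the video: (M' - K) // stride + 1,
    where M' = M + 2 * padding is the padded image size."""
    return (M + 2 * padding - K) // stride + 1

print(activation_map_size(M=4, K=2, stride=1))  # 3: a 3x3 activation map
print(activation_map_size(M=4, K=2, stride=2))  # 2: a 2x2 activation map

# Verify against PyTorch: 4x4 image, 2x2 kernel, stride 2
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=2, stride=2)
image = torch.zeros(1, 1, 4, 4)
print(conv(image).shape)  # torch.Size([1, 1, 2, 2])
```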
On the next iteration we shift the kernel 2 steps to the right and populate the cell to the right in the activation map. Now let's see what happens when we use a stride of 3. If we plug a stride of 3 into the equation, we get (4 - 2) / 3 + 1, which is not an integer; the kernel can't step evenly across the image. To solve this, we will use zero padding.

Let's review zero padding. When we create the convolution object, we add the padding parameter and set it equal to 1. We take our original image; this adds two additional rows of zeros and two additional columns of zeros, one on each side. The new image size M' is simply the old size plus two times the number of padding elements added: M' = M + 2 × padding. In this case the value is 6. Let's see if we can now perform a convolution. We overlay the kernel on the top left corner of the padded image and obtain a value of 1. On the next iteration we shift the kernel 3 steps to the right and populate the cell to the right in the activation map. Since shifting the kernel 3 more steps to the right would take it out of bounds, we shift the kernel 3 rows down and back to its leftmost position, and populate the leftmost cell of the next row of the activation map. On the next iteration we shift the kernel 3 steps to the right and populate the cell to the right in the activation map. We see that we can now perform the convolution.

So, to summarize: in convolution we have an image and a kernel, and we use them to obtain the activation map. See the labs for how to deal with color images using Conv2d.
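As a final sketch, here's the padded stride-3 example in PyTorch; the all-ones image is hypothetical, and the output size follows the padded formula (M + 2 × padding - K) // stride + 1:

```python
import torch
import torch.nn as nn

# 2x2 kernel, stride 3, padding 1, as in the example above
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=2,
                 stride=3, padding=1)

image = torch.ones(1, 1, 4, 4)  # hypothetical 4x4 gray-scale image

# Padding gives M' = 4 + 2 * 1 = 6, so the output size is
# (6 - 2) // 3 + 1 = 2
z = conv(image)
print(z.shape)  # torch.Size([1, 1, 2, 2])
```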