You've seen how convolutions over 2D images work.

Now, let's see how you can implement convolutions over,

not just 2D images,

but over three dimensional volumes.

Let's start with an example,

let's say you want to detect features,

not just in a grayscale image,

but in an RGB image.

So, an RGB image might be instead of a six by six image,

it could be six by six by three,

where the three here corresponds to the three color channels.

So, you think of this as a stack of three six by six images.

In order to detect edges or some other feature in this image,

you convolve this,

not with a three by three filter,

as we had previously,

but now with a 3D filter,

that's going to be three by three by three.

So the filter itself will also have three layers corresponding to the red,

green, and blue channels.

So to give these things some names,

this first six here,

that's the height of the image,

that's the width, and this three is the number of channels.

And your filter also similarly has a height,

a width, and the number of channels.

And the number of channels in

your image must match the number of channels in your filter,

so these two numbers have to be equal.

We'll see on the next slide how this convolution operation actually works,

but the output of this will be a four by four image.

And notice this is four by four by one,

there's no longer a three at the end.

Let's go through in detail how this works but let's use a more nicely drawn image.

So here's the six by six by three image,

and here's a three by three by three filter,

and this last number,

the number of channels, matches between the image and the filter.

So to simplify the drawing of this three by three by three filter,

instead of drawing it as a stack of matrices, I'm also going to,

sometimes, just draw it as this three dimensional cube, like that.

So to compute the output of this convolutional operation,

what you would do is take the three by three by three filter and first,

place it in that upper-leftmost position.

So, notice that this three by three by three filter has 27 numbers,

or 27 parameters, that's three cubed.

And so, what you do is take each of

these 27 numbers and multiply them with the corresponding numbers from the red,

green, and blue channels of the image,

so take the first nine numbers from the red channel,

then the nine beneath them from the green channel,

then the nine beneath those from the blue channel,

and multiply them with the corresponding 27 numbers that get

covered by this yellow cube shown on the left.

Then add up all those numbers and this gives you this first number in the output,

and then to compute the next output you take this cube and slide it over by one,

and again, do the 27 multiplications,

add up the 27 numbers,

that gives you this next output,

do it for the next number over,

for the next position over,

that gives the third output and so on.

That gives you the fourth, and then one row down, and then the next one,

to the next one, to the next one,

and so on, you get the idea,

until at the very end,

that's the position you'll have for that final output.
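
To make the sliding-cube procedure concrete, here's a minimal NumPy sketch of it. It assumes a stride of one and no padding, stores the image as height by width by channels, and uses random numbers as stand-ins for real pixel and filter values. The function name conv_volume is just a label chosen for this sketch, not something from the lecture.

```python
import numpy as np

def conv_volume(image, kernel):
    """Slide an (f, f, C) filter over an (H, W, C) image: stride 1, no padding.

    As is usual in deep learning, the filter is not flipped
    (strictly speaking this is cross-correlation).
    """
    H, W, C = image.shape
    f_h, f_w, f_c = kernel.shape
    if C != f_c:
        raise ValueError("image and filter must have the same number of channels")
    out = np.zeros((H - f_h + 1, W - f_w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # The f x f x C cube of the image currently covered by the filter.
            patch = image[i:i + f_h, j:j + f_w, :]
            # 27 multiplications for a 3x3x3 filter, then add them all up.
            out[i, j] = np.sum(patch * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.integers(0, 10, size=(6, 6, 3)).astype(float)   # placeholder RGB image
kernel = rng.integers(-1, 2, size=(3, 3, 3)).astype(float)  # placeholder 3x3x3 filter

output = conv_volume(image, kernel)
print(output.shape)  # (4, 4)
```

Note that the output is 2D: each position collapses all three channels into a single number.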

So, what does this allow you to do?

Well, here's an example,

this filter is three by three by three.

So, if you want to detect edges in the red channel of the image,

then you could have the first filter be one, one, one, minus one,

minus one, minus one as usual,

and have the green channel be all zeros,

and have the blue filter be all zeros.

And if you have these three stacked together to form your three by three by three filter,

then this would be a filter that detects edges,

vertical edges but only in the red channel.

Alternatively, if you don't care what color the vertical edge is in,

then you might have a filter that's like this,

where it's one, one, one, minus one,

minus one, minus one,

in all three channels.

So, with this second alternative set of parameters,

you then have an edge detector,

a three by three by three edge detector,

that detects edges in any color.

And with different choices of these parameters you can get

different feature detectors out of this three by three by three filter.
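
Here's a rough sketch of those two parameter choices. It assumes the vertical edge filter from the earlier videos, with columns of ones, zeros, and minus ones, and channel order red, green, blue. The test images and the helper conv_volume are made up for illustration.

```python
import numpy as np

def conv_volume(image, kernel):
    """Stride-1, no-padding convolution of an (H, W, C) image with an (f, f, C) filter."""
    f = kernel.shape[0]
    out = np.zeros((image.shape[0] - f + 1, image.shape[1] - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + f, j:j + f, :] * kernel)
    return out

# Assumed 2D vertical edge filter: ones on the left, minus ones on the right.
vertical_edge = np.array([[1, 0, -1],
                          [1, 0, -1],
                          [1, 0, -1]], dtype=float)

# Choice 1: edge filter in the red channel, zeros in green and blue.
red_only = np.zeros((3, 3, 3))
red_only[:, :, 0] = vertical_edge

# Choice 2: the same edge filter in all three channels.
any_color = np.stack([vertical_edge] * 3, axis=-1)

# Hypothetical test images: a bright left half in exactly one channel.
red_edge = np.zeros((6, 6, 3))
red_edge[:, :3, 0] = 10
green_edge = np.zeros((6, 6, 3))
green_edge[:, :3, 1] = 10

print(conv_volume(red_edge, red_only)[0])     # responds: 0, 30, 30, 0
print(conv_volume(green_edge, red_only)[0])   # all zeros: ignores the green edge
print(conv_volume(green_edge, any_color)[0])  # responds: 0, 30, 30, 0
```

The red-only filter is blind to an edge that lives in the green channel, while the all-channel filter picks it up.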

And by convention, in computer vision,

when you have an input with a certain height, a certain width,

and a certain number of channels, then

your filter will have a potentially different height,

different width, but the same number of channels.

And in theory it's possible to have a filter that maybe only looks at the red channel

or maybe a filter that looks at only the green channel, or only the blue channel.

And once again, you notice that convolving a volume,

a six by six by three convolved with a three by three by three,

that gives a four by four, a 2D output.

Now that you know how to convolve on volumes,

there is one last idea that will be crucial for building convolutional neural networks,

which is, what if we don't just want to detect vertical edges?

What if we wanted to detect vertical edges and horizontal edges

and maybe 45 degree edges and maybe 70 degree edges as well?

In other words, what if you want to use multiple filters at the same time?

So, here's the picture we had from the previous slide,

we had six by six by three convolved with the three by three by three,

which gives four by four,

and maybe this is a vertical edge detector,

or maybe it's learned to detect some other feature.

Now, maybe a second filter, denoted by this orange-ish color,

which could be a horizontal edge detector.

So, maybe convolving it with the first filter gives you this first four by four output

and convolving with the second filter gives you a different four by four output.

And what we can do is then take these two four by four outputs,

take this first one and put it in the front, and you

can take this second filter output and, well, let me draw it here,

put it at the back as follows,

so that by stacking these two together,

you end up with a four by four by two output volume, right?

And you can think of the volume, if we draw this as a box,

I guess it would look like this.

So this would be a four by four by two output volume,

which is the result of taking your six by six by three image and

convolving it with, or applying, two different three by three by three filters to it,

resulting in two four by four outputs that then get stacked up

to form a four by four by two volume.

And the two here comes from the fact that we used two different filters.
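
As a minimal sketch of that stacking step: the code below assumes stride one and no padding, uses a vertical and a horizontal edge filter (the horizontal one taken as the transpose of the vertical one) copied across all three channels, and uses a random image as a placeholder. The names conv_volume and conv_multi are chosen for this sketch only.

```python
import numpy as np

def conv_volume(image, kernel):
    """Stride-1, no-padding convolution of an (H, W, C) image with an (f, f, C) filter."""
    f = kernel.shape[0]
    out = np.zeros((image.shape[0] - f + 1, image.shape[1] - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + f, j:j + f, :] * kernel)
    return out

def conv_multi(image, filters):
    """Apply each filter separately, then stack the 2D outputs along a new last axis."""
    return np.stack([conv_volume(image, k) for k in filters], axis=-1)

vertical = np.array([[1, 0, -1]] * 3, dtype=float)  # assumed vertical edge filter
horizontal = vertical.T                             # assumed horizontal edge filter

filters = [
    np.stack([vertical] * 3, axis=-1),    # first 3x3x3 filter
    np.stack([horizontal] * 3, axis=-1),  # second 3x3x3 filter
]

image = np.random.default_rng(0).random((6, 6, 3))  # placeholder RGB image
out = conv_multi(image, filters)
print(out.shape)  # (4, 4, 2)
```

Each slice out[:, :, k] is the four by four output of filter k, so the size of the last axis equals the number of filters.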

So, let's just summarize the dimensions,

if you have an n by n by number of channels input image,

so as an example, that's six by six by three,

where n subscript C is the number of channels,

and you convolve that with an f by f by, and again,

this should be the same nC, so this was,

three by three by three,

and by convention this and this have to be the same number.

Then, what you get is n minus f plus one by

n minus f plus one by, and I'm going to use this, nC prime,

or it's really nC of the next layer,

but this is the number of filters that you use.

So this in our example would be four by four by two.
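
Written out, the dimension summary looks like this, using the lecture's symbols: n for the input height and width, f for the filter size, n_C for the input channels, and n_C' for the number of filters. The asterisk stands for the convolution operation, with stride one and no padding.

```latex
\[
\underbrace{n \times n \times n_C}_{\text{input}}
\;*\;
\underbrace{f \times f \times n_C}_{\text{each filter}}
\;\longrightarrow\;
(n - f + 1) \times (n - f + 1) \times n_C'
\]
\[
6 \times 6 \times 3 \;*\; 3 \times 3 \times 3
\;\longrightarrow\;
4 \times 4 \times 2
\qquad (n_C' = 2 \text{ filters})
\]
```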

And I wrote this assuming that you use a stride of one and no padding.

But if you used a different stride or padding,

then this n minus f plus one would be affected in the usual way,

as you saw in the previous videos.
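
As a small sketch of that "usual way", assuming the standard formula of floor of (n plus 2p minus f) over s, plus one, for padding p and stride s: the function name conv_output_shape and its parameter names are made up for this example.

```python
def conv_output_shape(n, f, num_filters, padding=0, stride=1):
    """Output shape for an n x n x n_C input and num_filters filters of size f x f x n_C.

    Assumes the standard formula: floor((n + 2*padding - f) / stride) + 1 per side.
    """
    side = (n + 2 * padding - f) // stride + 1
    return (side, side, num_filters)

print(conv_output_shape(6, 3, 2))             # (4, 4, 2): the example in this video
print(conv_output_shape(6, 3, 2, padding=1))  # (6, 6, 2): padding keeps the size at 6
print(conv_output_shape(7, 3, 2, stride=2))   # (3, 3, 2): stride 2 shrinks the output
```

The number of input channels doesn't appear in the output shape, because each filter collapses all input channels into one number per position.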

So this idea of convolution on volumes,

turns out to be really powerful.

Only a small part of it is that you can now operate

directly on RGB images with three channels.

But even more important is that

you can now detect two features, like vertical and horizontal edges,

or 10, or maybe 128,

or maybe several hundred different features.

And the output will then have a number

of channels equal to the number of features you are detecting.

And as a note on notation,

I've been using number of channels to denote this last dimension. In the literature,

people will also often call this the depth of this 3D volume, and both notations,

channels or depth, are commonly used in the literature.

But I find depth more confusing

because you usually talk about the depth of the neural network as well,

so I'm going to use the term channels in these videos to refer to

the size of this third dimension of these filters.

So now that you know how to implement convolutions over volumes,

you are now ready to implement one layer of a convolutional neural network.

Let's see how to do that in the next video.