0:00

Hello, and welcome back.

In this week, we're going to go over the basics of neural network programming.

It turns out that when you implement a neural network, there are some techniques that are going to be really important.

For example, if you have a training set of m training examples, you might be used to processing the training set by having a for loop step through your m training examples.

But it turns out that when you're implementing a neural network, you usually want to process your entire training set without using an explicit for loop to loop over your entire training set.

So, you'll see how to do that in this week's materials.
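As a rough sketch of what "no explicit for loop" means in practice, here is a loop version and a vectorized version of computing w transpose x plus b for every training example. The sizes and variable values here are made up for illustration; only the convention of stacking examples in the columns of X comes from the lecture.

```python
import numpy as np

# Hypothetical small sizes: nx features, m training examples.
nx, m = 4, 5
rng = np.random.default_rng(0)
X = rng.standard_normal((nx, m))  # each column is one training example
w = rng.standard_normal((nx, 1))  # hypothetical parameter vector
b = 0.5                           # hypothetical bias

# Explicit for loop: process one training example at a time.
z_loop = np.zeros((1, m))
for i in range(m):
    z_loop[0, i] = w[:, 0] @ X[:, i] + b

# Vectorized: process the entire training set in one matrix product.
z_vec = w.T @ X + b

assert np.allclose(z_loop, z_vec)
```

Both versions compute the same values; the vectorized one replaces the loop over m examples with a single matrix product.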

Another idea: when you organize the computation of a neural network, you usually have what's called a forward pass or forward propagation step, followed by a backward pass or what's called a backward propagation step.

And so in this week's materials, you'll also get an introduction to why the computations in learning a neural network can be organized into this forward propagation and a separate backward propagation.

1:09

For this week's materials I want to convey these ideas using

logistic regression in order to make the ideas easier to understand.

But even if you've seen logistic regression before, I think there'll be some new and interesting ideas for you to pick up in this week's materials.

So with that, let's get started.

Logistic regression is an algorithm for binary classification.

So let's start by setting up the problem.

Here's an example of a binary classification problem.

You might have an input of an image, like that, and want to output a label to recognize this image as either being a cat, in which case you output 1, or not-cat, in which case you output 0, and we're going to use y to denote the output label.

Let's look at how an image is represented in a computer.

To store an image, your computer stores three separate matrices corresponding to the red, green, and blue color channels of this image.

Â 2:10

So if your input image is 64 pixels by 64 pixels, then you would have three 64 by 64 matrices corresponding to the red, green, and blue pixel intensity values for your image.

Although to make this little slide I drew these as much smaller matrices, so these are actually 5 by 4 matrices rather than 64 by 64.

So to turn these pixel intensity values into a feature vector, what we're going to do is unroll all of these pixel values into an input feature vector x.

That is, we're going to define a feature vector x corresponding to this image as follows: we're just going to take all the pixel values, 255, 231, and so on, until we've listed all the red pixels, and then eventually 255, 134, and so on, until we get a long feature vector listing out all the red, green, and blue pixel intensity values of this image.

If this image is a 64 by 64 image, the total dimension of this vector x will be 64 by 64 by 3, because that's the total number of values we have in all of these matrices.

Which in this case turns out to be 12,288; that's what you get if you multiply all those numbers.

And so we're going to use nx = 12288 to represent the dimension of the input features x.

And sometimes for brevity, I will also just use lowercase n to represent the dimension of this input feature vector.
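The unrolling just described can be sketched in NumPy. The image here is random stand-in data, not a real cat picture, but the shapes and the resulting dimension match the lecture.

```python
import numpy as np

# Stand-in for a 64 x 64 RGB image: three 64 x 64 channel matrices,
# with pixel intensity values in the range 0 to 255.
image = np.random.default_rng(1).integers(0, 256, size=(64, 64, 3))

# Unroll all the pixel intensity values into a single column feature vector x.
x = image.reshape(-1, 1)

nx = x.shape[0]
print(nx)  # 64 * 64 * 3 = 12288
```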

So in binary classification, our goal is to learn a classifier that can input an image represented by this feature vector x, and predict whether the corresponding label y is 1 or 0, that is, whether this is a cat image or a non-cat image.

Let's now lay out some of the notation that we'll use throughout the rest of this course.

A single training example is represented by a pair (x, y), where x is an nx-dimensional feature vector and y, the label, is either 0 or 1.

Your training set will comprise lowercase m training examples.

And so your training set will be written (x(1), y(1)), which is the input and output for your first training example, (x(2), y(2)) for the second training example, up to (x(m), y(m)), which is your last training example.

And then that altogether is your entire training set.

So I'm going to use lowercase m to denote the number of training examples.

And sometimes to emphasize that this is the number of training examples, I might write this as m = m_train.

And when we talk about a test set, we might sometimes use m_test to denote the number of test examples.

So that's the number of test examples.

Finally, to put all of the training examples into a more compact notation, we're going to define a matrix, capital X, as defined by taking your training set inputs x(1), x(2), and so on and stacking them in columns.

So we take x(1) and put that as the first column of this matrix, x(2), put that as the second column, and so on down to x(m); then this is the matrix capital X.

So this matrix X will have m columns, where m is the number of training examples, and the number of rows, or the height of this matrix, is nx.

Notice that in other courses, you might see the matrix capital X defined by stacking up the training examples in rows like so, x(1) transpose down to x(m) transpose.

It turns out that when you're implementing neural networks, using this convention I have on the left will make the implementation much easier.

So just to recap, X is an nx by m dimensional matrix, and when you implement this in Python, you'll see that X.shape, that's the Python command for finding the shape of a matrix, is (nx, m).

That just means it is an nx by m dimensional matrix.

So that's how you group the training examples, the inputs x, into a matrix.
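A minimal sketch of this column-stacking convention, using small made-up dimensions so the shapes are easy to check:

```python
import numpy as np

nx, m = 3, 4  # hypothetical small dimensions
rng = np.random.default_rng(2)

# Training inputs x(1), ..., x(m), each an nx-dimensional column vector.
examples = [rng.standard_normal((nx, 1)) for _ in range(m)]

# Stack the training examples as the columns of the matrix capital X.
X = np.hstack(examples)

print(X.shape)  # (3, 4), i.e. (nx, m)
```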

How about the output labels y?

It turns out that to make your implementation of a neural network easier, it will be convenient to also stack the y's in columns.

So we're going to define capital Y to be equal to y(1), y(2), up to y(m), like so.

So Y here will be a 1 by m dimensional matrix.

And again, to use the Python notation, the shape of Y will be (1, m), which just means this is a 1 by m matrix.
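The same stacking applies to the labels; here is a sketch with made-up label values:

```python
import numpy as np

# Hypothetical labels y(1), ..., y(m) for m = 6 examples (1 = cat, 0 = not-cat).
labels = [1, 0, 0, 1, 1, 0]

# Stack the labels in columns so that capital Y is a 1 by m matrix.
Y = np.array(labels).reshape(1, -1)

print(Y.shape)  # (1, 6)
```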

And as you implement your neural network later in this course, you'll find that a useful convention is to take the data associated with different training examples, and by data I mean either x or y, or other quantities you see later, and to stack them in different columns, like we've done here for both x and y.
