0:00

In this video, I want to start telling you about how we represent neural networks. In other words, how we represent our hypothesis, or how we represent our model, when using neural networks.

Neural networks were developed as a way of simulating neurons, or networks of neurons, in the brain. So, to explain the hypothesis representation, let's start by looking at what a single neuron in the brain looks like.

Your brain and mine are jam-packed full of neurons like these, and neurons are cells in the brain.

And there are two things to draw attention to. First, the neuron has a cell body, like so. And moreover, the neuron has a number of input wires, and these are called the dendrites. You can think of them as input wires, and they receive inputs from other locations. And a neuron also has an output wire called an axon, and this output wire is what it uses to send signals to other neurons, so to send messages to other neurons.

So, at a simplistic level, what a neuron is, is a computational unit that gets a number of inputs through its input wires, does some computation, and then sends the output via its axon to other nodes, or to other neurons, in the brain.

Here's an illustration of a group of neurons. The way that neurons communicate with each other is with little pulses of electricity; these are also called spikes, but that just means pulses of electricity. So here is one neuron, and if it wants to send a message, what it does is send a little pulse of electricity via its axon to some different neuron. And here, this axon, that is, this output wire, connects to the dendrites of this second neuron over here, which then accepts the incoming message and does some computation. And it may, in turn, decide to send out its own message on its axon to other neurons, and this is the process by which all human thought happens.

It's these neurons doing computations and passing messages to other neurons as a result of whatever inputs they've got. And, by the way, this is how our senses and our muscles work as well. If you want to move one of your muscles, the way that works is that a neuron may send a pulse of electricity to your muscle, and that causes your muscle to contract. And some senses, like your eye, must send a message to your brain, and the way they do it is by sending pulses of electricity to a neuron in your brain, like so.

In a neural network, or rather, in an artificial neural network that we implement on a computer, we're going to use a very simple model of what a neuron does: we're going to model a neuron as just a logistic unit. So, when I draw a yellow circle like that, you should think of it as playing a role analogous to, maybe, the body of a neuron, and we then feed the neuron a few inputs through its various dendrites, or input wires.

Â 3:14

And the neuron does some computation and outputs some value on this output wire, or, in the biological neuron, this is the axon. And whenever I draw a diagram like this, what it means is that it represents a computation of h(x) = 1 over 1 + e to the negative theta transpose x, where, as usual, x is our feature vector and theta is our parameter vector, like so.
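The single logistic unit just described, h(x) = 1 / (1 + e^(-theta transpose x)), can be sketched in a few lines of code. This is a minimal illustration, not part of the lecture; the particular values of theta and x below are made up.

```python
import math

def sigmoid_unit(theta, x):
    """Logistic unit: h(x) = 1 / (1 + e^(-theta . x)).
    theta and x are plain lists; x[0] is the bias input, fixed at 1."""
    z = sum(t * xi for t, xi in zip(theta, x))  # theta transpose x
    return 1.0 / (1.0 + math.exp(-z))

# Example with made-up parameters: bias input x0 = 1 plus three features.
theta = [-1.0, 0.5, 0.5, 0.5]
x = [1.0, 2.0, 0.0, 0.0]
print(sigmoid_unit(theta, x))  # here z = -1 + 0.5*2 = 0, so this prints 0.5
```

The unit's output always lies strictly between 0 and 1, which is what makes it a "logistic" unit.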

Â 4:31

Finally, one last bit of terminology. When we talk about neural networks, sometimes we'll say that this is a neuron, or an artificial neuron, with a sigmoid or logistic activation function. So "activation function", in the neural network terminology, is just another term for that nonlinearity g(z) = 1 over 1 + e to the -z.

And whereas so far I've been calling theta the parameters of the model, I'll mostly continue to use that terminology. Here they're called the parameters, but in the neural network literature, sometimes you might hear people talk about the weights of a model, and weights mean exactly the same thing as the parameters of a model. I'll mostly continue to use the parameters terminology in these videos, but sometimes you might hear others use the weights terminology.

Â 5:34

What a neural network is, is just a group of these different neurons strung together. Concretely, here we have input units x1, x2, x3, and once again, sometimes you can draw this extra node x0 and sometimes not; I'll just draw that in here. And here we have three neurons, which I've written a1, a2, a3. I'll talk about those indices later. And once again, we can, if we want, add in this a0 and add this extra bias unit there; it always takes on the value of 1. And then finally we have this third node in the final layer, and it's this third node that outputs the value that the hypothesis h(x) computes.

To introduce a bit more terminology: in a neural network, the first layer is also called the input layer, because this is where we input our features x1, x2, x3. The final layer is also called the output layer, because that layer has the neuron, this one over here, that outputs the final value computed by the hypothesis. And then layer 2 in between is called the hidden layer. The term hidden layer isn't a great terminology, but the intuition is that, you know, in supervised learning, you get to see the inputs and get to see the correct outputs, whereas the hidden layer has values you don't get to observe in the training set. It's not x, and it's not y, and so we call those hidden. And later we'll see neural nets with more than one hidden layer, but in this example, we have one input layer, layer 1, one hidden layer, layer 2, and one output layer, layer 3. But basically, anything that isn't an input layer and isn't an output layer is called a hidden layer.

Â 7:29

So I want to be really clear about what this neural network is doing. Let's step through the computational steps that are embodied by this diagram. To explain the specific computations represented by a neural network, here's a little bit more notation. I'm going to use a superscript (j) subscript i to denote the activation of neuron i, or of unit i, in layer j. So concretely, a superscript (2) subscript 1 is the activation of the first unit in layer 2, in our hidden layer. And by activation I just mean the value that's computed by, and output by, a specific unit.

In addition, a neural network is parameterized by these matrices Theta superscript (j), where Theta(j) is going to be a matrix of weights controlling the function mapping from one layer to the next, maybe from the first layer to the second layer, or from the second layer to the third layer.

Â 8:34

This first hidden unit here has its value computed as follows: a(2)1 is equal to the sigmoid function, or the sigmoid activation function, also called the logistic activation function, applied to this sort of linear combination of the inputs. And then this second hidden unit has its activation value computed as the sigmoid of this. And similarly, this third hidden unit's value is computed by that formula.

So here we have Theta(1), which is the matrix of parameters governing the mapping from our three input units to our three hidden units.

Â 9:35

Theta 1 is going to be a 3x4-dimensional matrix.
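The three hidden-unit computations just described, each a(2)_i = g(row i of Theta(1) dotted with [1; x]), can be sketched as follows. This is an illustration only; the numeric values in Theta1 are made-up stand-ins for the parameters on the slide.

```python
import math

def g(z):
    """Sigmoid / logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_activations(Theta, x):
    """Compute one layer's activations: prepend the bias unit (always 1)
    to the input, then apply g to each row's linear combination."""
    x_with_bias = [1.0] + list(x)
    return [g(sum(w * xi for w, xi in zip(row, x_with_bias))) for row in Theta]

# Theta(1) is 3x4: one row per hidden unit, each with a bias weight
# plus three input weights. These values are invented for the example.
Theta1 = [[0.0, 1.0, -1.0, 0.5],
          [0.5, 0.0, 1.0, -1.0],
          [-0.5, 1.0, 1.0, 1.0]]
x = [1.0, 2.0, 3.0]
a2 = layer_activations(Theta1, x)
print(a2)  # three activation values, each strictly between 0 and 1
```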

And more generally, if a network has s_j units in layer j and s_(j+1) units in layer j+1, then the matrix Theta(j), which governs the function mapping from layer j to layer j+1, will have dimension s_(j+1) by (s_j + 1). Let me just be clear about this notation: this is s subscript j+1, and that's s subscript j, and then this whole thing is plus 1; this whole thing is (s_j + 1), okay?

10:21

So that's s subscript j+1 by (s_j + 1), where this plus one is not part of the subscript.
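The dimension rule just stated, s_(j+1) rows by (s_j + 1) columns, can be checked with a tiny sketch; the layer sizes used below are just the ones from this example network.

```python
def theta_shape(s_j, s_j_plus_1):
    """Dimension of Theta(j) mapping a layer with s_j units to a layer
    with s_(j+1) units: s_(j+1) x (s_j + 1).
    The +1 column accounts for the bias unit in layer j."""
    return (s_j_plus_1, s_j + 1)

# Network from the video: 3 input units, 3 hidden units, 1 output unit.
print(theta_shape(3, 3))  # Theta(1): (3, 4)
print(theta_shape(3, 1))  # Theta(2): (1, 4)
```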

Okay, so we've talked about what the three hidden units do to compute their values. Finally, in this last layer, we have one more unit, which computes h(x), and that can also be written as a(3)1, and that's equal to this. And you notice that I've written this with a superscript (2) here, because Theta superscript (2) is the matrix of parameters, or the matrix of weights, that controls the function mapping from the hidden units, that is, the layer 2 units, to the one layer 3 unit, that is, the output unit.

To summarize, what we've done is shown how a picture like this over here defines an artificial neural network, which defines a function h that maps from the input values x to, hopefully, good predictions for y. And this hypothesis is parameterized by parameters that I'm denoting with a capital Theta, so that as we vary Theta, we get different hypotheses; we get different functions mapping, say, from x to y.
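Putting the whole picture together: the network computes hidden activations a(2) = g(Theta(1) [1; x]) and then the output h(x) = a(3)1 = g(Theta(2) [1; a(2)]). A minimal end-to-end sketch for this 3-input, 3-hidden-unit, 1-output network might look like the following; all the weight values are made up for illustration.

```python
import math

def g(z):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(Theta, a_in):
    """One layer of forward propagation: prepend the bias unit,
    then apply g to each row's linear combination."""
    a = [1.0] + list(a_in)
    return [g(sum(w * ai for w, ai in zip(row, a))) for row in Theta]

def hypothesis(Theta1, Theta2, x):
    a2 = forward(Theta1, x)   # layer 2 (hidden) activations
    a3 = forward(Theta2, a2)  # layer 3 (output) activations
    return a3[0]              # h(x) = a(3)_1

# Made-up weights for a 3-input, 3-hidden-unit, 1-output network.
Theta1 = [[0.1, 0.2, -0.3, 0.4],
          [-0.1, 0.5, 0.2, -0.2],
          [0.3, -0.4, 0.1, 0.2]]
Theta2 = [[-0.2, 0.6, 0.3, -0.5]]
h = hypothesis(Theta1, Theta2, [1.0, 0.0, 2.0])
print(h)  # a single prediction, strictly between 0 and 1
```

Varying the entries of Theta1 and Theta2 varies the hypothesis, which is exactly the sense in which capital Theta parameterizes the function from x to y.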
