0:00

In this video, we're going to get a geometrical understanding of what happens when a perceptron learns. To do this, we have to think in terms of a weight space. It's a high-dimensional space in which each point corresponds to a particular setting of all the weights. In this space, we can represent the training cases as planes, and learning consists of trying to get the weight vector onto the right side of all the training planes. For non-mathematicians, this may be tougher than previous material. You may have to spend quite a long time studying the next two parts. In particular, if you're not used to thinking about hyperplanes in high-dimensional spaces, you're going to have to learn that. To deal with hyperplanes in a 14-dimensional space, for example, what you do is visualize a 3-dimensional space and say "fourteen" to yourself very loudly. Everybody does it. But remember that when you go from a 13-dimensional space to a 14-dimensional space, you're creating as much extra complexity as when you go from a 2D space to a 3D space. 14-dimensional space is very big and very complicated.

Â 1:35

Assuming we've eliminated the threshold, we can represent every training case as a hyperplane through the origin in weight space. So, points in the space correspond to weight vectors, and training cases correspond to planes. And, for a particular training case, the weights must lie on one side of that hyperplane in order to get the answer correct for that training case. So, let's look at a picture of it so we can understand what's going on. Here's a picture of weight space.

2:35

We're going to consider a training case in which the correct answer is one. And for that kind of training case, the weight vector needs to be on the correct side of the hyperplane in order to get the answer right. It needs to be on the same side of the hyperplane as the direction in which the input vector points. For any weight vector, like the green one, that's on that side of the hyperplane, the angle with the input vector will be less than 90 degrees, so the scalar product of the input vector with the weight vector will be positive. And since we've already gotten rid of the threshold, that means the perceptron will give an output of one. It'll say yes, and so we'll get the right answer. Conversely, if we have a weight vector like the red one, that's on the wrong side of the plane, the angle with the input vector will be more than 90 degrees, so the scalar product of the weight vector and the input vector will be negative. We'll get a scalar product that is less than zero, so the perceptron will say no, or zero, and in this case we'll get the wrong answer.

3:49

So, to summarize: on one side of the plane, all the weight vectors will get the right answer, and on the other side of the plane, all the possible weight vectors will get the wrong answer.
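As a small sketch of that decision rule (the vectors below are made up for illustration), the perceptron's output with the threshold eliminated depends only on the sign of the scalar product:

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def perceptron_output(w, x):
    # With the threshold eliminated, the unit outputs 1 exactly when w.x > 0.
    return 1 if dot(w, x) > 0 else 0

x = [0.5, 1.0]        # input vector; its plane in weight space is w.x = 0
w_good = [1.0, 0.5]   # angle with x under 90 degrees -> positive scalar product
w_bad = [-1.0, -0.2]  # angle with x over 90 degrees  -> negative scalar product

print(perceptron_output(w_good, x))  # 1: the right answer for this case
print(perceptron_output(w_bad, x))   # 0: the wrong answer
```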

Now, let's look at a different training case, in which the correct answer is zero. So here, we have the weight space again, but we've chosen a different input vector, and for this input vector the right answer is zero. So again, the training case corresponds to a plane, shown by the black line. And in this case, any weight vector that makes an angle of less than 90 degrees with the input vector will give us a positive scalar product, causing the perceptron to say yes, or one, and it will get the answer wrong. Conversely, weight vectors on the other side of the plane will make an angle of greater than 90 degrees with the input vector, and they will correctly give the answer of zero. So, as before, the plane goes through the origin, it's perpendicular to the input vector, and on one side of the plane all the weight vectors are bad, while on the other side they're all good.

Now, let's put those two training cases together in one picture of weight space. Our picture of weight space is getting a little bit crowded, so I've moved the input vector over so we don't have all the vectors in quite the same place. And now, you can see that there's a cone of possible weight vectors. Any weight vector inside that cone will get the right answer for both training cases. Of course, there doesn't have to be any cone like that. It could be that there are no weight vectors that get the right answers for all of the training cases. But if there are any, they'll lie in a cone.
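A hedged sketch of that feasibility test (the two training cases here are invented for illustration): a weight vector lies in the cone exactly when it is on the right side of every training plane, which we can check case by case:

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

# Hypothetical training cases: (input vector, correct answer), threshold eliminated.
cases = [([1.0, 0.2], 1), ([-0.3, 1.0], 0)]

def in_cone(w):
    # True when w gives the correct output for every training case,
    # i.e. w is on the right side of every training plane.
    return all((1 if dot(w, x) > 0 else 0) == target for x, target in cases)

print(in_cone([1.0, -0.5]))  # True: right side of both planes
print(in_cone([0.0, 1.0]))   # False: wrong side of the second plane
```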

So, what the learning algorithm needs to do is consider the training cases one at a time and move the weight vector around in such a way that it eventually lies in this cone. One thing to notice is that if you find a good weight vector, that is, one that works for all the training cases, it'll lie in the cone. And if you have another one, it'll also lie in the cone. And so, if you take the average of those two weight vectors, that will also lie in the cone. That means the problem is convex: the average of two solutions is itself a solution. And in general in machine learning, if you can get a convex learning problem, that makes life easy.
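The convexity claim is easy to check numerically. A minimal sketch, using two invented training cases and two weight vectors that each solve both of them, shows that their average is also a solution:

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

# Hypothetical (input vector, correct answer) pairs, threshold eliminated.
cases = [([1.0, 0.2], 1), ([-0.3, 1.0], 0)]

def solves_all(w):
    # True when w gets every training case right, i.e. w lies in the cone.
    return all((1 if dot(w, x) > 0 else 0) == target for x, target in cases)

w1 = [1.0, -0.5]   # one solution: lies in the cone
w2 = [0.5, -1.0]   # a second solution
w_avg = [(a + b) / 2 for a, b in zip(w1, w2)]  # their average

print(solves_all(w1), solves_all(w2), solves_all(w_avg))  # True True True
```

Because the cone is an intersection of half-spaces (one per training plane), it is a convex region, which is why the average of any two points in it stays inside it.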
