We're going to look at two basic ways of applying rules. The k-Nearest Neighbor is the simplest, and to me the most fascinating. One way of trying to say this person will buy or not buy, or this product is a best seller or not a best seller, is to simply compare it with previous customers, or previous products we have seen, that were very similar to the person or object we are trying to classify. This is considered a non-parametric classifier because there is no model to fit, and so there are no parameters to estimate; all you do is measure the distance between a given object and the objects you already have in your database. But it works well, it's not very profound, it's easy to understand, and to me it's one of those baseline classifiers: if I have to do better, let me at least beat the k-Nearest Neighbor, and I'll tell you, on a lot of data it's very difficult to beat this classifier.

Let's take this particular example. Say you have captured a lot of handwritten digits, samples from various people. As a human being, you can immediately see these are the ones, the twos, the threes, the fours; for a machine, it is actually very difficult to understand that all of those are sixes, all of those are eights, and those are the funny-looking zeros. But a human being, given an image, can within a few seconds not just say this is a four, but actually point out the four it is most similar to. If you look at it, it is very similar to that four. So your mind is using some classification methodology of comparing what you are seeing with what you have previously seen, and saying this is very similar to that one four. But again, if I force you to pick three things that it's similar to, you will pick from the entire group of fours rather than just that one single four. Picking only one is called the 1-Nearest Neighbor classifier, because you are picking the one image to which the given image is most similar.

So what is the closest neighbor? We can build on this idea by thinking of objects in two dimensions; actually, the number of dimensions will depend on how many features we have in the data. But let's say we gave you this data, and then gave you a new point, and let's say this point was here. You look at the nearest neighbor to it and you say, "Okay, the nearest neighbor is pink, and therefore we think this object should be classified as pink." Let's extend the logic. You might say, "Why only one?" I could make a mistake; it is like reaching for whatever is closest to your hand. So let's look at two neighbors. You get in trouble now, because one neighbor is pink and one is blue, and we don't know what to do, and that's why there's a question mark over there. With an even number of nearest neighbors, such ties will always be a possible problem. Okay. Now let's look at three nearest neighbors. We got two blues and one pink, and we think this is most probably blue (I don't know why that point is drawn green in the figure, it should have been blue). So we say, "Okay, out of the three neighbors, the majority are blue, so I think this is a blue point." Now take eight nearest neighbors. This figure always confuses me, but you can count that there are eight: out of the eight, five are pink and three are blue, and again, using a majority vote, you say this point can be labeled as pink rather than blue. So you get the idea. And immediately, I hope, we ask ourselves, "What is the right value of k? How many nearest neighbors should we use?"
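In code, this majority-vote idea can be sketched very briefly. This is a minimal sketch, assuming two-dimensional numeric points and Euclidean distance; the function name knn_predict and the toy pink/blue data are made up for illustration, not the digits example from the lecture.

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, new_point, k=3):
    """Classify new_point by a majority vote among its k nearest training points."""
    # Euclidean distance from the new point to every stored point
    distances = [
        (math.dist(p, new_point), label)
        for p, label in zip(train_points, train_labels)
    ]
    # Sort by distance and keep the k closest neighbors
    nearest = sorted(distances, key=lambda d: d[0])[:k]
    # Majority vote over the neighbors' labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy two-dimensional pink/blue points (made up for illustration)
points = [(1.0, 1.0), (1.2, 0.8), (3.0, 3.1), (2.9, 3.3), (3.2, 2.8)]
labels = ["pink", "pink", "blue", "blue", "blue"]

print(knn_predict(points, labels, (2.8, 3.0), k=1))  # the single nearest neighbor decides
print(knn_predict(points, labels, (2.8, 3.0), k=3))  # vote among the three nearest neighbors
```

Notice there is nothing to train: the "model" is just the stored points themselves, which is what makes this a non-parametric classifier.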
So you can try one neighbor, two neighbors, three neighbors, and so on. Here is the struggle. Obviously, if you use too many neighbors, if you use all the points available, then the vote will simply go to whichever class is most frequent in the dataset. So if most of the points in the dataset are pink and you use a very large k, it's quite likely you will simply classify everything as belonging to the most frequent class, which is not terrible; that is another baseline you would like to beat. If you use a very small k, you might be fooled by randomness; a single noisy neighbor can lead you into an error.

So this is the trade-off we have: as we increase the model complexity, our accuracy on the data on which we are training our algorithm generally goes up. What that means is the more complex the model, the better it fits the data, so your training set accuracy goes up. But we may be overfitting the training set. It is possible that when you take this model and try to apply it, let's say to a validation set, you will find that initially the accuracy goes up and then it goes down. So ideally, we would like to stop at the point where, even though the training accuracy is still going up, the validation accuracy starts coming down. That's the trade-off. In this specific instance, the trade-off is a little bit more complicated, because for k-Nearest Neighbors a smaller k actually means a more complex, more flexible boundary, and a larger k means a simpler one.

Okay. So what does this look like? Here are different pictures as we increase the value of k. With k equal to 3, if a point falls in the blue shaded area it will be classified as blue, and if it falls in the pinkish shaded area it will be classified as pink, because the majority of its three nearest neighbors will be blue or pink accordingly. Let's say I increase k to five. You start noticing that slowly the blue area becomes more regular, and anything that falls in blue will be classified as blue, anything in pink will be pink. When k equals 7, you see a much more regular pattern; maybe you like that, because it's capturing most of the blues correctly. So you say, "What happens if I increase k even further?" As I told you, let's go to 11, 13, 15, and 17. The boundary becomes almost a line, and now you may be misclassifying a lot of points, but the boundaries look much more regular. So one thing we do observe is that as k increases, the noise in our predictions goes down, because of the law of averages: if you look at more and more points around you, then a stray pink point that has wandered into a blue neighborhood gets canceled out. But beyond a point, your chance of making an error keeps going up. So that's what we will see: the accuracy will go up initially, and then it will start tapering off.
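You can see this trade-off numerically with a small sketch using scikit-learn's KNeighborsClassifier. The synthetic dataset, the train/validation split, and the range of k values are assumptions chosen for illustration, not the data behind the figures above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic two-class, two-feature data standing in for the pink/blue points
X, y = make_classification(n_samples=400, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Sweep odd values of k and compare training vs. validation accuracy
for k in [1, 3, 5, 7, 11, 13, 15, 17]:
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    train_acc = model.score(X_train, y_train)
    val_acc = model.score(X_val, y_val)
    print(f"k={k:2d}  train accuracy={train_acc:.3f}  validation accuracy={val_acc:.3f}")
```

Typically you would see training accuracy highest at k equal to 1 and falling as k grows, while validation accuracy improves for a while and then tapers off, which is exactly the pattern described above.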