These are nice names, and having done a course on data tools you should know some of them. There are four names which may be worth remembering: true positives, true negatives, false positives, and false negatives. True positives are those where the model predicts Yes and the actual value is Yes. A true negative is where the model predicts No and the actual value is No; it's not spam. A false positive is where the model predicts Yes but actually it is No, so it predicts Yes in disagreement with the outcome, right? It's a fake positive: it is actually No but predicted as positive. In the last case, the actual outcome was Yes and the model is saying No, and we call that a false negative. The negative verdict was false, right? So, we will measure these, and there are some nice cool names for doing that, which I will show you on the next slide.

So, take the confusion matrix on the validation data. There are three things we measure: Precision, Recall, and Specificity. Precision means: how precise am I in the predicted values? Keep your focus on the predicted column. What it is saying is, I classified 245 emails as spam, but only 202 of those actually are spam. So my precision is 82%; I am precise 82% of the time, and the other 18% I am spraying all over the place. Precision refers to the predicted values; you may like to remember it that way. Recall, also called Sensitivity or the True Positive Rate, is best thought of as exactly that: the true positive rate. I had a total of 268 actual positives, out of which the model identified only 202, so my true positive rate is 75%. And similarly, Specificity is the True Negative Rate. It is 90%, which means I am doing well on the negative part, which is what we wanted, right? Out of a total of 379 + 43 = 422 actual negatives, we got 379 right. Together, these give an idea of what I am getting right and what I am getting wrong.
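If you want to check these three numbers yourself, here is a small Python sketch. This is not something Rattle produces for you; the counts are simply the ones quoted above for the spam example on the validation data, and the variable names are my own:

    # Counts quoted above for the spam example on the validation data:
    # 202 spam caught, 245 emails flagged as spam in total,
    # 268 actual spam, and 379 + 43 = 422 actual non-spam.
    tp = 202            # predicted spam, actually spam
    fp = 245 - 202      # predicted spam, actually not spam (43)
    fn = 268 - 202      # predicted not spam, actually spam (66)
    tn = 379            # predicted not spam, actually not spam

    precision   = tp / (tp + fp)   # 202 / 245 ~= 0.82: of what I call spam, 82% really is
    recall      = tp / (tp + fn)   # 202 / 268 ~= 0.75: the true positive rate
    specificity = tn / (tn + fp)   # 379 / 422 ~= 0.90: the true negative rate

    print(f"precision={precision:.2f} recall={recall:.2f} specificity={specificity:.2f}")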
Now, one other measure we use is called Risk. So, what is Risk? In Rattle, when you select Risk in the Evaluate tab, choose the Validation data, and hit Execute, it produces a graph which looks like this. How do you read this graph? It's a little complicated, so let me try to explain. Let's say I arrange all the emails randomly and I pick 50% of them. If I pick 50% of them at random, I should be able to find 50% of the spam, because a random selection of 50% of the emails will contain 50% of the spam. But if I examine them, I will be wasting my time asking, "Which is spam? Which is not spam?" Instead, what I do is sort the emails in order of decreasing probability of being spam. The first email has the highest probability, the next one a little less, and so on. That's why the x-axis is called the caseload. So, instead of randomly looking at 50% of the data, I look at the sorted ones, and if you look at the curve right here, I will probably catch 82% of the spam emails. For example, suppose you want to do an audit of tax returns or something like that. You run them through a program which tells you the probability that there is something wrong with each one, sort them, and then pick up only those with a high probability. The question is, how much better are you doing? Look at the shape of these curves. If you just randomly pick up emails, you can do no better than the diagonal: if you sample 80% you will get 80% of the spam, and if you sample 100% you will get 100% of the spam. Instead, if you sample based on the probability of being spam, you might like to stop somewhere before the curve tapers off, saying: at that point I have captured most of the spam with the least effort possible, right? So, it gives you an idea of the trade-off between how many cases I should examine and what the benefit is. The better your method, the sharper your curve, as you can see. What this means is, if you just examine the first several caseloads, you catch most of the spam in the emails, and the rest can be left alone. The sharper this graph is, bending toward the top-left corner rather than hugging the diagonal, the better a classifier it is.

Finally, the last measure we use is called Lift. Lift is the ratio of what we capture to what we would capture if we sampled randomly. So, for example, here it is: let's say I looked at the first 40% of the caseload. If I had randomly sampled 40% of the emails, I would have caught 40% of the spam. But I am actually capturing 80%, because I have sorted based on the probabilities. Therefore, my lift is 2, which is 80 divided by 40, if I sample the first 40% of emails, okay? That's all there is to it; the more you stare at these charts, the more you understand them (there is a small sketch of this calculation at the end of this section).

I should add one other thing which Rattle allows, and which you can also see: it allows you to score the data, which we have done repeatedly. You can look at the predicted class, stare at the cases, see some of them which are outrageously wrong, and think about how we can improve the models. At some point, human intelligence may be used to look at the scores and see how to improve your classification. These are the worst cases, the ones we keep getting wrong, right? So, what are they, and why is the model doing that? Is there a way I can correct it? Or maybe we should add some information which allows the spam filter to figure out that this is not spam.
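To tie the risk chart, the lift calculation, and the scoring idea together, here is a minimal, illustrative Python sketch. It is not Rattle's own code: the probabilities are simulated, and names like probs and actual simply stand for whatever your model produces on the validation data.

    import numpy as np

    rng = np.random.default_rng(seed=1)
    n = 1000
    actual = (rng.random(n) < 0.4).astype(int)            # 1 = spam, 0 = not spam (simulated)
    # Simulated model scores: spam tends to receive higher probabilities than non-spam.
    probs = np.clip(0.35 * actual + 0.6 * rng.random(n), 0, 1)

    # Sort by decreasing probability of spam: this ordering is the "caseload" axis.
    order = np.argsort(-probs)
    caught = np.cumsum(actual[order]) / actual.sum()      # fraction of spam captured so far
    examined = np.arange(1, n + 1) / n                    # fraction of cases examined so far

    # Lift at a 40% caseload: fraction captured divided by fraction examined.
    k = int(0.4 * n)
    print(f"at 40% caseload we capture {caught[k-1]:.0%} of the spam; "
          f"lift = {caught[k-1] / examined[k-1]:.2f}")

    # Scoring and inspection: the "outrageously wrong" cases are the ones whose
    # score sits furthest from their actual class.
    worst = np.argsort(-np.abs(probs - actual))[:5]
    for i in worst:
        print(f"case {i}: actual={actual[i]}, predicted probability of spam={probs[i]:.2f}")

Plotting caught against examined would reproduce the shape of the risk chart: a sharp early rise means you catch most of the spam in the first few caseloads, exactly as described above.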