So here, I have prepared a very simple notebook that reads some data, and it's essentially the same dataset. You can see the class variable, males and females (that's the sex attribute), and then the height and the weight. What I will do next is select the features and the labels from this dataset and plot them. You can see it's exactly the same dataset that I showed you in the previous slides: we have the height and the weight for females and males.

What I will do now is use my knowledge of Bayesian inference to program a classifier. There are many different implementations of naive Bayes. There is one in scikit-learn, there is one in SystemML as well; almost every machine learning package provides an implementation of naive Bayes. It's really common and very useful. But because this is an advanced machine learning training course, I decided to show you the internals of how these algorithms work, and that it's not that difficult to write one from scratch. So we'll implement a naive Bayes classifier from scratch here. If you have a more complex dataset, or you need something more flexible, then you should probably go with something like SystemML or scikit-learn, depending on the volume of your data.

The next thing I do here is split my data into training and test sets so that I can measure generalization and see what my actual accuracy is. Then I have written this method called getPriors, and what it does is compute the prior for each class in my labels. It goes over the dataset; essentially I'm splitting the data into two subsets, males and females, and I count how many observations are of each class and then divide by the number of samples in the dataset. This gives me the priors, like we did in the example.
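The prior-computation step described above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the function name follows the transcript's getPriors, but the signature and the sample labels are assumptions.

```python
import numpy as np

def get_priors(y):
    """Prior probability of each class: its relative frequency in the labels.

    Sketch of the getPriors step from the lecture; the real notebook's
    implementation may differ in name and signature.
    """
    classes, counts = np.unique(y, return_counts=True)
    # Count occurrences per class, then divide by the total number of samples.
    return {c: n / len(y) for c, n in zip(classes, counts)}

# Hypothetical labels: 0 = male, 1 = female (not the lecture's actual data).
y_train = np.array([0, 0, 1, 0, 1, 1, 0, 1, 0, 1])
priors = get_priors(y_train)  # here both classes get prior 0.5
```

With a balanced dataset like the lecture's, the two priors come out close to 0.5 each, as shown in the next step.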
If you look at the output of this method, you can see the priors: we have 0.5 for the males and 0.49 for the females, so pretty close. Now, the next thing we'll do is run this method called fit. For each unique class in the dataset, I compute the statistics: the mean and the standard deviation, from which I can get the variance. And I do this on the training data. So let's run it and see what we end up with. All right. For the first class we have the mean and the standard deviation of the height, and the mean and the standard deviation of the weight; and for the other class we have the same statistics.

The next thing I do is define the likelihood. I have this getLikelihood function here, and it accepts an X, which is my new data point, and a feature index. Because I didn't drop the weight, I have an array with the statistics for each attribute, so this method is asking for which feature I would like to compute the likelihood: is it the height or the weight? Zero will be height, one will be weight. It also expects the model, which is this dictionary with the statistics, and it wants to know a class name, for which class I am computing the likelihood. And I also have a function here called getPosterior, which does what? Well, essentially it computes the posterior. Sorry, I will go back to the likelihood for a second: you can see that once I have the new data point and the mean and the standard deviation, I use the Gaussian formula to compute the likelihood. For the posterior, I essentially use the likelihood and the priors to compute the posterior for each class, and that's how it all works.

So, because I now have the statistics and the priors, let's say that I have a new observation, which is a height of 69.
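The fit, getLikelihood, and getPosterior steps described above can be sketched like this. This is an illustrative reconstruction under stated assumptions, not the notebook's code: the names mirror the transcript, but the exact signatures, the tiny four-row dataset, and the uniform priors are all made up for the example.

```python
import numpy as np

def fit(X, y):
    """Per-class mean and standard deviation for each feature (the 'model')."""
    model = {}
    for c in np.unique(y):
        Xc = X[y == c]
        model[c] = {"mean": Xc.mean(axis=0), "std": Xc.std(axis=0)}
    return model

def get_likelihood(x, feature_index, model, class_name):
    """Gaussian likelihood of the value x for one feature under one class."""
    mu = model[class_name]["mean"][feature_index]
    sigma = model[class_name]["std"][feature_index]
    return np.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

def get_posterior(x, feature_index, model, priors, class_name):
    """Unnormalized posterior: likelihood times the class prior."""
    return get_likelihood(x, feature_index, model, class_name) * priors[class_name]

# Hypothetical training data: columns are [height, weight]; 0 = male, 1 = female.
X = np.array([[69.0, 150.0], [71.0, 160.0], [64.0, 120.0], [62.0, 115.0]])
y = np.array([0, 0, 1, 1])
model = fit(X, y)
priors = {0: 0.5, 1: 0.5}

# Likelihood of a height of 69 under each class (feature index 0 = height).
lik_male = get_likelihood(69.0, 0, model, 0)
lik_female = get_likelihood(69.0, 0, model, 1)
```

With these toy statistics the likelihood of height 69 is far larger under class 0, which matches the reasoning in the next step of the lecture.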
I would like to get the likelihood for this new evidence. Assuming that the class is zero, I define my X first, compute the likelihood, and get something like 0.117; that's the likelihood of this data coming from the population of class zero. Then I do the same for the second class, class one, and I see that the likelihood is much smaller. So if I'm to make a prediction based on the height, I would say that this person is a male.

If I reduce the height to something like 55, what is the likelihood now that this observation came from class zero? Very, very small; you see it's ten to the power of minus six. What's the likelihood of this coming from class one? Much higher, right? Maybe I selected a really short individual. Now you can see it clearly: the likelihood for class zero is much smaller than the likelihood for class one, because this individual is shorter. It's more likely that the data came from the female population.

Now that I have the likelihoods, I can compute the posteriors. I can use my maximum a posteriori approach, and that's what I do here: this classify function essentially computes the posterior for the new data and selects the class that maximizes it. I'll run this, get predictions for my test set, my unseen data, and now I can look at the accuracy, which is 77 percent. That's not bad at all, given that these classes overlap and we also have some invalid data.

So this is how we can implement naive Bayes from scratch and use it for classification. Actually, there is one other point that I would like to make, because so far we've been talking about Bayes' theorem, the rules of probability, likelihood, and Bayesian inference, but I didn't actually tell you why it is called naive Bayes.
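The maximum a posteriori classification step described above can be sketched as follows. This is a hedged reconstruction, not the lecture's actual classify function: the per-class statistics and priors below are hypothetical, and multiplying the per-feature likelihoods is the naive independence assumption the lecturer is about to explain.

```python
import numpy as np

def gaussian_likelihood(x, mu, sigma):
    """Gaussian density, evaluated element-wise for each feature."""
    return np.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

def classify(x_new, model, priors):
    """Maximum a posteriori: for each class, multiply the per-feature
    likelihoods by the class prior, then return the class with the largest
    unnormalized posterior. A sketch of the classify step from the lecture."""
    best_class, best_post = None, -np.inf
    for c, stats in model.items():
        likelihoods = gaussian_likelihood(np.asarray(x_new, dtype=float),
                                          stats["mean"], stats["std"])
        posterior = np.prod(likelihoods) * priors[c]
        if posterior > best_post:
            best_class, best_post = c, posterior
    return best_class

# Hypothetical per-class statistics for [height, weight]; 0 = male, 1 = female.
model = {
    0: {"mean": np.array([70.0, 180.0]), "std": np.array([3.0, 20.0])},
    1: {"mean": np.array([63.0, 135.0]), "std": np.array([3.0, 18.0])},
}
priors = {0: 0.51, 1: 0.49}

tall = classify([69.0, 170.0], model, priors)   # taller, heavier: class 0
short = classify([55.0, 110.0], model, priors)  # shorter, lighter: class 1
```

Accuracy on a held-out test set is then just the fraction of predictions that match the true labels, e.g. `np.mean([classify(x, model, priors) for x in X_test] == y_test)`.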