I want to walk you through a little R script we wrote for you. Rattle doesn't have this capability, so I'm going to run it directly in R. First, we install the packages. Then we read the data of jokes: there are 100 jokes, and I know you don't have time to read all of them, but they have been rated by 5,000 users. Next we set a seed to control the randomness so that you get the same results we do. The maximum rating anybody has given a joke is 9.9, and some people dislike some jokes, which get ratings down around minus nine, so the scale runs between -10 and 10. Then we draw some histograms: the unscaled histogram of the ratings, the scaled histogram, and then, for each joke, the ratings different people have given. That is just the data description. Now, here it is. We will use only the first 2,000 users in our analysis; otherwise it would take a long time to run. Then we create the recommendation model using the package already available to you, which provides a Recommender function. Using that function, we do the prediction. The first time, we do user-based collaborative filtering. Here is an example of how you can use R in future: you can simply read what the model does, and this model is user-based collaborative filtering. It says: give me the data, tell me what filtering you want, and I'll do it for you. Then you can see the predicted ratings for the first 10 jokes. By changing UBCF to IBCF, one little change, it becomes item-based instead of user-based. We predict again and see the ratings. We can compute the error in the prediction because we have set aside some data; you will see we have held back 10% of the data. We check how good our prediction is on the ratings we set aside, and you will find the error. Let me just do it for you and we'll come back. So, let me pull up the R script directly. Here you go.
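The setup described above can be sketched roughly as follows. This is a minimal, illustrative version assuming the recommenderlab package and its bundled Jester5k dataset; the seed value here is an arbitrary choice, not the one used in the lecture.

```r
# install.packages("recommenderlab")   # run once if the package is missing
library(recommenderlab)

data(Jester5k)   # 5,000 users x 100 jokes, ratings on roughly a -10..10 scale
set.seed(1)      # fix the random seed so the split is reproducible (value is illustrative)

r <- as(Jester5k, "matrix")
max(r, na.rm = TRUE)   # highest rating anyone has given (close to 10)
min(r, na.rm = TRUE)   # lowest rating anyone has given (close to -10)
```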
Here's the Jester data script, and we have left the cool thing at the end of it for you to try. Here is the data; let me just walk you through it. First, clear whatever you have in your workspace; if there is nothing, that's fine. I'm going to skip the install step because I already have the package, and load recommenderlab, which is the R package we want to use. Here is the data, which confirms it loaded: a 5,000 by 100 rating matrix, 5,000 users and 100 jokes. Then we set a random seed, and then we ask for the maximum rating and the minimum rating, and then you can pull up the ratings. User 2841 has given 81 ratings, that's what it's saying. Then it shows the jokes rated, which are the values, and it reports the mean rating provided by this user. This user has a decent sense of humor: the average rating is 3.85. This is the histogram of ratings, this is the normalized and scaled histogram, and this is the histogram of the count of jokes, that is, how many jokes each person has rated. Very few have rated all the jokes, as you can see, though many have rated quite a few of them. And this is the distribution of average ratings: some jokes are really bad, down at minus four, and some jokes many people have rated highly. Now we train the model. We have selected only 2,000 users, and we have split the data 90% into training and 10% into validation, and here is the user-based recommender. That's the model. First you create a model using the function called Recommender, specifying user-based filtering. You apply this model to the data you held back to compute the predictions, and here are the predicted values for the first 10 jokes in the data set. You do the same thing for item-based predictions and again predict the first 10. Now, remember that for this validation set I know the true ratings for each user.
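The exploration and training steps walked through above might look like this. It is a sketch under assumptions: the lecture uses the first 2,000 users and a 90/10 split, while the seed and the `given = 10` parameter (how many ratings per test user are revealed to the model) are illustrative choices of mine.

```r
library(recommenderlab)
data(Jester5k)
set.seed(1)

# Data description: raw vs normalized ratings, and counts per user / per joke
hist(getRatings(Jester5k), breaks = 100)              # unscaled ratings
hist(getRatings(normalize(Jester5k)), breaks = 100)   # normalized/scaled ratings
hist(rowCounts(Jester5k), breaks = 50)                # how many jokes each person rated
hist(colMeans(Jester5k), breaks = 20)                 # average rating per joke

small  <- Jester5k[1:2000, ]                          # first 2,000 users only
scheme <- evaluationScheme(small, method = "split",
                           train = 0.9, given = 10)   # 90% train, 10% validation

ubcf <- Recommender(getData(scheme, "train"), method = "UBCF")  # user-based CF
pred <- predict(ubcf, getData(scheme, "known"), type = "ratings")
as(pred, "matrix")[1, 1:10]   # predicted ratings for the first 10 jokes
```

Switching `method = "UBCF"` to `method = "IBCF"` is the one-word change that makes it item-based.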
So I can get the accuracy by comparing the user-based and item-based predictions. Here is the error. Basically, it says the root mean squared error for UBCF is slightly lower than for item-based collaborative filtering. Just to show you that this works, here's something cool you can do yourself. Remember, the takeaway for this part of the segment is: (a) it's fun, and (b) there are lots of unexplored models and meta-methods you can try. It's fairly easy at this stage, and as you practice more and more, you'll be able to understand these models. I want to end with this idea. We created a script, and as you go through it, it will ask you to read the jokes. First it asks you to type your name; I'll type Sridhar. Then it asks you to rate 15 jokes. What is your rating for joke number one? Five. What is your rating for joke number three? I hope you'll actually read the jokes. What is your rating for joke number four? So it asks you to rate the first 15 jokes and record your ratings for them. What it does next is interesting. I'm going to enter random ratings here, so I have not recorded my ratings accurately, and at joke number 15 I've finished. It says one of my values is not valid; that's okay. Now it takes my ratings and adds them to the original data set, and then it sets up the evaluation, defining the evaluation scheme as before: 90% train and 10% validate. Then it builds a user-based recommender and makes a prediction. There, it is predicting; hopefully it has predicted. These are the predictions. What it is doing is creating predictions for all the jokes in the data set based on your ratings for the 15 you gave. You can also see that the first 15 values are what you entered.
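The error comparison, and the "rate your own jokes" idea, could be sketched as below. This is not the course's actual interactive script: the evaluation parameters are the same illustrative assumptions as before, and the 15 new-user ratings are made-up placeholder values standing in for what you would type.

```r
library(recommenderlab)
data(Jester5k)
set.seed(1)
scheme <- evaluationScheme(Jester5k[1:2000, ], method = "split",
                           train = 0.9, given = 10)

ubcf <- Recommender(getData(scheme, "train"), method = "UBCF")
ibcf <- Recommender(getData(scheme, "train"), method = "IBCF")
p1 <- predict(ubcf, getData(scheme, "known"), type = "ratings")
p2 <- predict(ibcf, getData(scheme, "known"), type = "ratings")

# RMSE/MSE/MAE against the held-back ratings; in the lecture UBCF came out
# slightly lower than IBCF
rbind(UBCF = calcPredictionAccuracy(p1, getData(scheme, "unknown")),
      IBCF = calcPredictionAccuracy(p2, getData(scheme, "unknown")))

# Add one new user who rated only the first 15 jokes (placeholder values),
# then predict that user's ratings for the remaining jokes
my <- rep(NA_real_, 100)
my[1:15] <- c(5, -2, 7, 0, 3, 9, -5, 1, 4, 6, -1, 2, 8, -3, 0)
me <- as(matrix(my, nrow = 1,
                dimnames = list("me", colnames(Jester5k))),
         "realRatingMatrix")
mypred <- predict(ubcf, me, type = "ratings")
as(mypred, "matrix")[1, 1:10]   # predicted ratings for jokes this user hasn't rated
```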
Even out of those 15, it took six as given and used the remaining nine to compute the error. I hope you enjoy reading the jokes and running these models; I hope it is fun, and I hope it has been as much fun for you doing this course as it has been for me to teach it. In this module, and I know it has been a long one, we ended up looking at more nuanced ways of assessing model performance. Overall error is one thing, but sometimes you are really interested in the error for a particular category, so we need better ways of defining and measuring error. Then I took you through two fun examples. One was how to recommend something to buy based on what a customer has already put in the basket, where the trick was really how to do it efficiently and nothing else. The recommendation systems we covered, with collaborative filtering, are a whole area in themselves; you could spend days exploring them, but I thought it would be fun because the models are readily available. All you need is to add your information, or the information of a new user, and you can get predictions from it. So, as you collect more and more data about a person, your recommendations keep improving. At this stage, I think you have learned a lot about data tools. I hope you know about R and are able to at least interpret the commands as a byproduct of this course. The main thing to take away is that tools are part of the entire process of classification or regression, so you should never be afraid of adding tools to your data kit. The number of tools you have will grow; as new tools get invented and some old ones go away, you can keep adding more. At this point, you should feel comfortable reading about models and methods and trying them out yourself. So, best of luck, and I hope you have fun.