I want to end the data sets module with another fairly cool innovation, one you will probably see in practice, which is called collaborative filtering. Why is it called collaborative filtering? Because it filters based on what others have bought or what you have bought, and it combines the two in a very interesting way. That's the collaboration part.
So, here's the thing. You want to predict ratings and create personalized recommendations, which we all need, for books, for songs, for movies, and the data may be very sparse. How do you overcome that? When you go to some popular libraries, you may see other books you may enjoy, other movies you may like. Customers who bought bananas also bought, I can't believe it, muffins and French bread. You want to create recommendation systems like that. Again, this is a completely different paradigm from regression or classification: you are trying to recommend something based on some kind of association rules.
There are two ways of doing it, and there is a way of combining both of them. I'll explain the basics of these two. The first is based on users. This is the person to whom we want to recommend, and this person has already bought salad and pizza. We find a person similar to this person and see what that person has bought. We find that this similar person has bought salad, pizza, and some kind of funny drink, so we recommend this wonderful drink to the first person. The idea is to find people who are similar to you and recommend to you the additional items they bought.
The other paradigm is called content-based. If you bought a soda, maybe a ginger ale, you find other drinks which are very similar to it. Based on the content, you recommend them to the user.
So, if you watched romance and thrillers, you find other romance and thrillers and recommend them back to the person. If you listened to classical music which is slow rather than fast, and composed by a particular composer, you use those attributes to recommend other music to this person. That's called content-based filtering. There are ways of combining the two, and you can read a lot about it.
Because this is the fun part, the last part of this course, how is this done? I'll describe it very broadly first, and then I'll give you one example.
So, how do you do user-based collaborative filtering? We create a neighborhood: think of this as being the person, and you find other persons who are close to this person, let's say a KNN neighborhood with K equal to three. Then you look at the objects which you have not bought but which the neighbors have rated, take the average rating across the users in the neighborhood, and recommend the highest ones to you. So, basically, find the people nearest to you, find how they've rated objects which you have not seen, take an average, and recommend those objects in order of average rating, maybe the top two or three.
How do you do item-based filtering? We want to predict the ratings of items like i2, i3, i4, i6, and i7 for a particular user. So, we create a similarity matrix of all items, based on their features. The methods for creating a similarity matrix, you may like to remember, are things like correlation and cosine similarity; if you read a bit you will see. So, you have some method of saying how similar one item is to another. Let's say I keep only the three largest entries for each item, rather than storing all of them. You can see there are bolded entries; those are the biggest values stored in this matrix. Let's take item three: the items most similar to item three are two, five, and eight.
Item two has not yet been rated by this user, as you can see, so I can't use it. But the user has already rated five and eight. So, I don't know how this user will like item three, but I know five and eight are very similar to it, and the user has provided ratings for those. To estimate the rating for item three, I take the rating of item five and the rating of item eight, and take an average of them. I can actually take a weighted average. Item three and item five have a similarity measure of 0.4; item three and item eight have a similarity measure of 0.5. The user rated item five a four and item eight a five. So I take the weighted average, 0.4 times four plus 0.5 times five, all divided by 0.4 plus 0.5, which is 4.1 divided by 0.9, or about 4.56, and I say that is the predicted rating for item three.
So, the idea is: you find the items which are closest to the item you want to rate and for which the user has already provided ratings, and you use them in a similarity-weighted average to compute the rating of the item the user has not rated. Then you look at the highest predicted ratings, go back to the user, and say, why don't you buy this?
I think with a little bit of imagination, you will see how to combine a and b. Item-based filtering can be used to fill in values for every user, and then you can use these values to go back and do user-based collaborative filtering.
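To make the two procedures concrete, here is a minimal Python sketch of both. It is only an illustration under stated assumptions, not the method behind the slides: the function names, the user names, the similarity of 0.3 for item two, and the extra entry for item one are all hypothetical, and cosine similarity over co-rated items is just one possible choice for finding the neighborhood. The only numbers taken from the example above are the similarities 0.4 and 0.5 and the ratings four and five.

```python
import math

def predict_item_based(target, user_ratings, similarity, k=3):
    """Similarity-weighted average over the k items most similar to `target`,
    using only those neighbors the user has already rated."""
    sims = similarity[target]
    neighbors = sorted(sims, key=sims.get, reverse=True)[:k]
    rated = [j for j in neighbors if j in user_ratings]
    weight_sum = sum(sims[j] for j in rated)
    if weight_sum == 0:
        return None  # no usable neighbor, so no prediction
    return sum(sims[j] * user_ratings[j] for j in rated) / weight_sum

def cosine(u, v):
    """Cosine similarity of two users over the items both have rated (an assumed choice)."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend_user_based(target, ratings, k=3, top_n=2):
    """Average the k nearest users' ratings of items `target` has not rated,
    then return the top_n items by average rating."""
    others = [u for u in ratings if u != target]
    neighbors = sorted(others, key=lambda u: cosine(ratings[target], ratings[u]),
                       reverse=True)[:k]
    seen = {}
    for u in neighbors:
        for item, r in ratings[u].items():
            if item not in ratings[target]:
                seen.setdefault(item, []).append(r)
    averages = {item: sum(rs) / len(rs) for item, rs in seen.items()}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

# Item-based example: similarities of other items to item 3.
# 0.4 and 0.5 come from the lecture; 0.3 and 0.1 are made up for illustration.
similarity = {3: {2: 0.3, 5: 0.4, 8: 0.5, 1: 0.1}}
user_ratings = {5: 4, 8: 5, 1: 2}   # item 2 is unrated, item 1 is not a top-3 neighbor
predicted = predict_item_based(3, user_ratings, similarity)
print(round(predicted, 2))          # (0.4*4 + 0.5*5) / (0.4 + 0.5), about 4.56

# User-based example with hypothetical users and ratings.
ratings = {
    "ann": {"salad": 5, "pizza": 4},
    "bob": {"salad": 5, "pizza": 4, "drink": 5},
    "cat": {"salad": 4, "pizza": 5, "drink": 3, "soup": 2},
    "dan": {"salad": 1, "pizza": 5, "soup": 4},
    "eve": {"salad": 5, "pizza": 1, "cake": 1},
}
recommendations = recommend_user_based("ann", ratings, k=3, top_n=2)
print(recommendations)
```

In the item-based call, item two drops out because the user has not rated it, exactly as in the example, and the prediction comes from items five and eight alone. In the user-based call, the K equal to three neighborhood for "ann" leaves out the least similar user, so that user's ratings do not affect the averages. To combine a and b, one could first fill each user's missing ratings with `predict_item_based` and then run `recommend_user_based` on the filled-in ratings.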