Hi. In this part of the capstone project, you'll add a new kind of recommendation that can work better for you than simple averages. These recommendations use information about your own ratings, or about the ratings of any given rater.
Average ratings may be flawed. Let's look at these ratings for four movies: Mission Impossible, The Martian, Pitch Perfect 2, and Star Wars: The Force Awakens. Using average ratings for the movies, as shown here, treats each rater equally. But for recommendations for me, Morgan's evaluations might be better than Jessie's, because I might be closer to Morgan in terms of the movies I like. For you, Sam's ratings might be better for creating recommendations. This is the idea behind a different kind of average, or recommendation, called collaborative filtering. The idea is to create recommendations specific to a user or rater, rather than the same recommendations for all users. To do this, you weight raters differently, valuing those raters who are more like you when calculating averages.
To create collaborative recommendations, you need to make a few modifications to the averaging method you've already written. You'll need to find raters more like you and use their ratings. The "you" here will be a parameter to the method that creates recommendations using weighted averages.
For example, Chris and I may both dislike the movie Lucy and both like the movie Rain Man. So Chris is close to me in some way, because we have similar taste in movies. On the other hand, Sam might have liked Lucy when I didn't, and Sam didn't like Rain Man but I did. So Sam's ratings should carry less weight than Chris's ratings, because Sam and I don't see things in quite the same way that Chris and I do. So I'll value Chris's ratings more than Sam's in creating a new weighted average for getting recommendations. If Chris likes the movie A Beautiful Mind and I haven't seen it, then it might be time to think about watching it.
This is the general idea behind the new program you'll be writing. Let's look at how to calculate these weighted averages. There are two conceptual changes to the code you've already written, which makes recommendations based on averaging all user ratings. The table shown here has ratings from three raters: Chris, Sam, and Morgan. The first change is to use only ratings from raters close to me, or to the person for whom recommendations are being made. The number of close raters is a parameter, so you might use N equal to ten to use the ten closest raters. The second change is to weight each rating by a measure of how close a rater is to me, or to the person getting recommendations.
Let's look at this idea in more detail. Which of these movies has the highest average, making it the most recommended movie for me? The Fly has an average rating of seven from the two raters, Chris and Morgan, who rated it. Spider-Man has an average rating of six. The Butterfly Effect has an average rating of seven, and Beetlejuice has an average rating of 7.5. Given these averages, I should watch Beetlejuice, since it has the highest average rating. But Chris might be more like me than Morgan is, so I should value Chris's ratings more. This will change how we calculate averages to get recommendations.
Let's look more closely at calculating the weighted averages. We'll use the closeness weight for each rater in creating averages for the movie ratings. As you can see in the table, Chris's weight is 20, Sam's is ten, and Morgan's is five. We'll show how to calculate these weights next. For now, we'll use the weights in creating average recommendations: we'll multiply each rating by the rater's closeness weight when calculating averages. Not every movie will get a rating from each rater. We'll use the weighted averages to get a recommendation specific to me, or to any rater whose closest raters are used in calculating averages.
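As a rough sketch of the weighting idea just described, here is one way the arithmetic could look in Java for The Fly, which Chris rated eight and Morgan rated six. The class and method names are made up for this illustration and are not part of the capstone code. The sketch divides the weighted sum by the number of raters who rated the movie, which is the arithmetic used in this lecture, rather than by the sum of the weights.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, only for illustrating the arithmetic.
final class WeightedAverageSketch {

    // Sums (closeness weight * rating) over the raters who rated the movie,
    // then divides by how many of those raters there were.
    static double weightedAverage(Map<String, Double> ratings, Map<String, Double> weights) {
        double sum = 0.0;
        int count = 0;
        for (Map.Entry<String, Double> e : ratings.entrySet()) {
            Double weight = weights.get(e.getKey());
            if (weight == null) {
                continue; // no closeness weight known for this rater
            }
            sum += weight * e.getValue();
            count++;
        }
        return count == 0 ? 0.0 : sum / count;
    }

    public static void main(String[] args) {
        Map<String, Double> weights = new LinkedHashMap<>();
        weights.put("Chris", 20.0);
        weights.put("Sam", 10.0);
        weights.put("Morgan", 5.0);

        // Sam did not rate The Fly, so Sam has no entry here.
        Map<String, Double> theFly = new LinkedHashMap<>();
        theFly.put("Chris", 8.0);
        theFly.put("Morgan", 6.0);

        // (8 * 20 + 6 * 5) / 2 = 95.0
        System.out.println(weightedAverage(theFly, weights));
    }
}
```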
In calculating a weighted average for The Fly, we multiply eight by 20, since Chris's rating is eight and Chris's weight is 20, to get 160. Sam didn't rate the movie, so there's no value from Sam. For Morgan, we take Morgan's weight of five times Morgan's rating of six to get 30. The sum is 190, and dividing by the two raters gives a weighted average of 95. That's different from the unweighted average of seven. Spider-Man's weighted average is 66.67 after multiplying each rating by the rater's corresponding weight. The Butterfly Effect has a weighted average of 83.3, and Beetlejuice has a weighted average of 60. Given these weighted averages, it looks like we should watch The Fly. Note that the best movie using an unweighted average is Beetlejuice, and this is the lowest-rated movie using weighted averages.
To calculate this weighted average, we need to calculate a weight: how close a rater is to me, or to some particular rater. To discuss how to calculate closeness, we'll represent each rater by a vector of ratings. The vector is conceptually just a list of ratings, one for each movie. For example, here are seven ratings by a rater named Sam. To help with this explanation, we're including movies that are not rated by Sam. These are represented by zeros. In the programs you write, ratings are stored only for those movies actually rated by Sam. Chris is shown with seven ratings, although Chris has rated only four of the movies; the non-rated movies are represented by zeros. My seven-element vector shows I rated six movies.
Let's look at how these vectors are used to calculate a similarity weight. We'll walk through the calculation for seeing how close I am to Sam. We multiply the ratings for each movie Sam and I both rated. Sam rates this movie a five and I rate it a six, so the product is 30. The next movie we both rated has a seven by Sam and a four by me, so the product is 28. For this movie Sam gives an eight and I give a four, so the product is 32. The last movie we both rated wasn't liked by Sam, who gave it a one. I gave it a six, so the product is six.
So the similarity weight between Sam and me is the sum of these values, 30 + 28 + 32 + 6, which is 96. The weighted similarity for Chris and me is calculated the same way. We have three movies we both rated, and we calculate the sum 12 + 42 + 54, which is 108. So by this measure Chris and Sam both look close to me, with Chris at 108 slightly ahead of Sam at 96, since the weight is a measure of closeness. This calculation is actually a dot product, a measure of mathematical closeness in a vector space. It's good to know there's a mathematical foundation for how we're calculating weighted averages. In this case, we simply calculate the sum of the products of the ratings for each movie that two raters have rated in common.
In our actual calculations, we'll need to adjust the ratings to account for the scale of one to ten, where a rating of one means you really, really don't like a movie and a ten means you really, really like a movie. We want ratings on a scale of one to ten to work when we determine closeness by calculating dot products. We want raters who are close to rate movies similarly, both liking or both disliking a movie for example, since this closeness is a measure of similarity. If we simply multiply ratings, how do two raters who rate a movie with a one and a two compare to two raters who give ratings of eight and nine? If we multiply, we'd compare two to 72, a huge difference, but in each case the raters have very similar taste. Giving a one and a two is the same as giving an eight and a nine in terms of similarity. Both raters really dislike a movie with a one and a two, and both raters really like a movie with an eight and a nine. These pairs of ratings should contribute equally to a measure of similarity, but they don't.
We'll center the ratings by subtracting the middle rating of five from each one. So rather than using one and two, we'll use 1 - 5 and 2 - 5, or -4 and -3. For the ratings of eight and nine, we'll use 8 - 5 and 9 - 5, or 3 and 4. We get a product of 12 for both pairs of centered ratings, and thus the ratings are equally similar.
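To make the effect of centering concrete, here is a small Java sketch comparing the raw and centered products for the two pairs of ratings just discussed. The class, constant, and method names are invented for this illustration.

```java
// Hypothetical helper class, only for illustrating why ratings are centered.
final class CenteringSketch {

    // The middle rating subtracted from each rating, as described in the lecture.
    static final int MIDPOINT = 5;

    // Product of two ratings on the original one-to-ten scale.
    static int rawProduct(int a, int b) {
        return a * b;
    }

    // Product of the same two ratings after subtracting the midpoint from each.
    static int centeredProduct(int a, int b) {
        return (a - MIDPOINT) * (b - MIDPOINT);
    }

    public static void main(String[] args) {
        System.out.println(rawProduct(1, 2));      // 2
        System.out.println(rawProduct(8, 9));      // 72
        System.out.println(centeredProduct(1, 2)); // (-4) * (-3) = 12
        System.out.println(centeredProduct(8, 9)); // 3 * 4 = 12
    }
}
```

With raw ratings the two pairs contribute 2 and 72. After centering, both contribute 12.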
In this example, we'll show centered ratings, obtained by subtracting five from each one. Ratings that were originally zero are shown with an asterisk. We won't use those in calculating a similarity score. For example, here are seven ratings from Sam, shown centered alongside the original non-centered ratings. You can see that zeros in the original are represented by asterisks in the centered ratings. Remember that in the programs you write, ratings are stored only for those movies actually rated by Sam. Chris is shown with seven ratings, all positive because Chris likes movies. My seven-element vector shows I rated six movies.
Let's look at how these vectors are used to calculate a similarity weight. We'll walk through the calculation for seeing how close I am to Sam. We'll multiply the ratings for each movie Sam and I both rated. Sam rates this movie a zero and I rate it a one, so the product is zero. The next movie we both rated has a two by Sam and a minus one by me, so the product is minus two. For this movie Sam gives a three and I give a minus one, so the product is minus three. The last movie we both rated wasn't liked by Sam, who gave it a minus four. I gave it a one, so the product is minus four. So the similarity weight between Sam and me is the sum of these values, zero plus minus two plus minus three plus minus four, which is minus nine.
The weighted similarity for Chris and me is calculated the same way. We have three movies we both rated. We calculate the sum of minus three plus two plus four, which is three. So I'm closer to Chris than I am to Sam, since the weight is a measure of closeness. Remember that with the original non-centered ratings, Sam's score of 96 made Sam look close to me. So centering makes a difference. You can see in the ratings that Sam and I don't really agree. When I like a movie, Sam doesn't, and vice versa, since none of the products is positive. This technique is standard in calculating similarities, but it's easy to forget.
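The walkthrough for Sam and me can be sketched in Java roughly as follows. Since ratings are stored only for movies a rater actually rated, the sketch uses a map from movie ID to rating instead of a vector padded with zeros. The class name, the movie IDs m1 through m5, and the extra rating on m5 are assumptions made for this illustration. Only the four shared ratings come from the walkthrough.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class, only for illustrating the centered dot product.
final class SimilaritySketch {

    static final double MIDPOINT = 5.0; // middle of the one-to-ten scale

    // Centered dot product over the movies that both raters have rated.
    // Movies rated by only one of the two raters are skipped.
    static double dotProduct(Map<String, Double> me, Map<String, Double> other) {
        double sum = 0.0;
        for (Map.Entry<String, Double> e : me.entrySet()) {
            Double theirs = other.get(e.getKey());
            if (theirs == null) {
                continue; // the other rater has not rated this movie
            }
            sum += (e.getValue() - MIDPOINT) * (theirs - MIDPOINT);
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Double> me = new HashMap<>();
        me.put("m1", 6.0); me.put("m2", 4.0); me.put("m3", 4.0); me.put("m4", 6.0);
        me.put("m5", 9.0); // made-up rating, only here to show unshared movies are skipped

        Map<String, Double> sam = new HashMap<>();
        sam.put("m1", 5.0); sam.put("m2", 7.0); sam.put("m3", 8.0); sam.put("m4", 1.0);

        // 1*0 + (-1)*2 + (-1)*3 + 1*(-4) = -9.0
        System.out.println(dotProduct(me, sam));
    }
}
```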
Similarity ratings will change if ratings on an all-positive scale like one to ten aren't centered in this way.
Let's look at the Java code for calculating these weighted similarities. To find the raters near to me, or near to any rater, we'll call the method getSimilarities, which you'll complete for this capstone. The parameter id is the rater for whom similarity ratings will be calculated. A class, RaterDatabase, will supply access to each rater, given a rater's ID. This class is similar to the MovieDatabase class you've already used. The RaterDatabase class also supplies access to all raters, just as the MovieDatabase class supplied access to all movies, though the movies could be filtered. In this loop, you'll call another method to calculate the dot product between me and the rater r. This dot product value will be paired with rater r's ID in a Rating object and added to the ArrayList being returned. Before returning the ArrayList, the code will sort the list so that the first Rating is that of the rater with the highest weight, the one closest to me. We can do that by calling Collections.sort and passing the reverse-order comparator that's part of the java.util.Collections class. This comparator reverses the order of the Rating compareTo method so that the list will store the highest values first.
Once you've got the weights for each rater, you will be able to calculate a weighted average to get recommendations. This method is similar to getAverages, but this one calculates averages for one particular rater, whose ID is a parameter. First, the weights for all raters are calculated by calling the method getSimilarities we've just discussed. As with the getAverages method, this code loops over raters. In the getAverages method, the loop was over all raters, and we checked to see if each rater had rated the movie whose average was being calculated. Here we loop over just those raters who are close to me, those for whom weights are stored in the ArrayList named list.
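Stepping back to the similarity step described above, here is a minimal, self-contained Java sketch of what a getSimilarities-style method might look like. It is not the capstone's starter code: a nested Rating class stands in for the course's Rating class, a plain map stands in for RaterDatabase, and skipping the rater's own entry is an assumption of the sketch. Chris's individual ratings are made-up values chosen so that the centered score comes to three, as in the walkthrough.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the real capstone uses its own Rater, Rating and RaterDatabase classes.
final class GetSimilaritiesSketch {

    // Stand-in for the course's Rating class: an ID paired with a value.
    static final class Rating implements Comparable<Rating> {
        final String item;
        final double value;
        Rating(String item, double value) { this.item = item; this.value = value; }
        @Override public int compareTo(Rating other) { return Double.compare(value, other.value); }
    }

    // Centered dot product over movies both raters have rated.
    static double dotProduct(Map<String, Double> me, Map<String, Double> other) {
        double sum = 0.0;
        for (Map.Entry<String, Double> e : me.entrySet()) {
            Double theirs = other.get(e.getKey());
            if (theirs != null) {
                sum += (e.getValue() - 5.0) * (theirs - 5.0);
            }
        }
        return sum;
    }

    // raters maps a rater ID to that rater's (movie ID -> rating) map.
    static List<Rating> getSimilarities(String id, Map<String, Map<String, Double>> raters) {
        List<Rating> list = new ArrayList<>();
        Map<String, Double> me = raters.get(id);
        if (me == null) {
            return list; // unknown rater: nothing to compare against
        }
        for (Map.Entry<String, Map<String, Double>> r : raters.entrySet()) {
            if (r.getKey().equals(id)) {
                continue; // assumption: do not compare a rater with themselves
            }
            list.add(new Rating(r.getKey(), dotProduct(me, r.getValue())));
        }
        // Reverse the natural ordering so the closest rater comes first.
        Collections.sort(list, Collections.reverseOrder());
        return list;
    }

    public static void main(String[] args) {
        Map<String, Double> me = new HashMap<>();
        me.put("m1", 6.0); me.put("m2", 4.0); me.put("m3", 4.0); me.put("m4", 6.0); me.put("m5", 2.0);
        Map<String, Double> sam = new HashMap<>();
        sam.put("m1", 5.0); sam.put("m2", 7.0); sam.put("m3", 8.0); sam.put("m4", 1.0);
        Map<String, Double> chris = new HashMap<>(); // made-up ratings, see the note above
        chris.put("m1", 7.0); chris.put("m4", 9.0); chris.put("m5", 6.0);

        Map<String, Map<String, Double>> raters = new HashMap<>();
        raters.put("me", me); raters.put("Sam", sam); raters.put("Chris", chris);

        for (Rating r : getSimilarities("me", raters)) {
            System.out.println(r.item + " " + r.value); // Chris 3.0, then Sam -9.0
        }
    }
}
```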
We use only the first numRaters entries in list, where numRaters is a parameter. The idea is to use just the top ten or 20 or 100 raters who are closest to me. You'll need to be careful to ensure there is no bad indexing in getting ratings from list. After accumulating a weighted sum, the weighted average will be added to the ArrayList being returned, just like the code in the unweighted getAverages method. You will return the list of movie ratings. You may want to sort it first.
After getting collaborative filtering recommendations, you will need to write code to present these to the user. You'll need to decide whether you should provide recommendations for movies already seen. This is a list of recommendations for me, based on ten ratings I created for movies I've seen this year. Movies I rated are shown with asterisks in the output. These ratings might help me calibrate the results. Since I liked these movies, seeing them in the list of the top 15 makes sense to me, even though I don't need a recommendation to see them, since I've already seen them. Should we print more than the top 15? Should we print all the recommendations? Should we print the weighted average? You'll also decide whether you should print more information than just the movie's title. You could print the year, the genre, or more. You could generate HTML output to display the recommendations in a webpage. Have fun finding recommendations.
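For reference, the weighted-average step described above could be sketched in Java along these lines. This is an illustration under stated assumptions, not the capstone's actual method: the class name, the nested Rating stand-in, the map-based inputs, and the parameter name numRaters are all choices made for this sketch. It clamps the loop bound to the size of the list to avoid bad indexing, and it divides the weighted sum by the number of close raters who rated the movie, following the lecture's arithmetic for The Fly.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the real capstone uses its own Rating and database classes.
final class SimilarRatingsSketch {

    // Stand-in for the course's Rating class: an ID paired with a value.
    static final class Rating implements Comparable<Rating> {
        final String item;
        final double value;
        Rating(String item, double value) { this.item = item; this.value = value; }
        @Override public int compareTo(Rating other) { return Double.compare(value, other.value); }
    }

    // similar: rater IDs paired with closeness weights, sorted closest first.
    // ratingsByRater: rater ID -> (movie ID -> rating).
    static List<Rating> getSimilarRatings(List<Rating> similar,
                                          Map<String, Map<String, Double>> ratingsByRater,
                                          List<String> movies,
                                          int numRaters) {
        List<Rating> result = new ArrayList<>();
        // Never index past the end of the list, even if numRaters is too large.
        int limit = Math.min(Math.max(numRaters, 0), similar.size());
        for (String movie : movies) {
            double sum = 0.0;
            int count = 0;
            for (int k = 0; k < limit; k++) {
                Rating close = similar.get(k);
                Map<String, Double> theirs = ratingsByRater.get(close.item);
                if (theirs == null || !theirs.containsKey(movie)) {
                    continue; // this close rater did not rate the movie
                }
                sum += close.value * theirs.get(movie);
                count++;
            }
            if (count > 0) {
                result.add(new Rating(movie, sum / count));
            }
        }
        Collections.sort(result, Collections.reverseOrder()); // best recommendation first
        return result;
    }

    public static void main(String[] args) {
        List<Rating> similar = new ArrayList<>();
        similar.add(new Rating("Chris", 20.0));
        similar.add(new Rating("Sam", 10.0));
        similar.add(new Rating("Morgan", 5.0));

        Map<String, Map<String, Double>> ratingsByRater = new HashMap<>();
        ratingsByRater.put("Chris", new HashMap<>(Collections.singletonMap("The Fly", 8.0)));
        ratingsByRater.put("Sam", new HashMap<>());
        ratingsByRater.put("Morgan", new HashMap<>(Collections.singletonMap("The Fly", 6.0)));

        List<String> movies = Collections.singletonList("The Fly");
        // Asking for ten raters when only three exist is safe: (8*20 + 6*5) / 2 = 95.0
        System.out.println(getSimilarRatings(similar, ratingsByRater, movies, 10).get(0).value);
        // With only the single closest rater, Chris: (8*20) / 1 = 160.0
        System.out.println(getSimilarRatings(similar, ratingsByRater, movies, 1).get(0).value);
    }
}
```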