At this point, you're probably wondering what Amazon specifically does to rank its product lists. Amazon follows some secret formula that combines at least four elements we can identify here. First is the average rating, which we've seen used many times in this lecture. Second is a Bayesian adjustment by review population: for some comparable family of products, however Amazon defines that, a Bayesian adjustment is applied on top of the average rating. Third is the recency of the review, that is, how recently the review was written. And fourth is the reputation score of the reviewer, a quality measure of how good, truthful, and meaningful that review is likely to be. The exact formula is not known outside of Amazon, but we may be able to reverse engineer how these rankings are generated through an example. So let's consider the top 20 Thermoses from January 2013. To get that list, we write a search query of "Thermos" and go to Home & Kitchen > Kitchen & Dining; it's just a sequence of filters that you enter on Amazon. At the time, when you combine all the results, the top 20 Thermoses in January 2013 looked like this. The table lists the Thermos number, one through 20; the number of reviews for each Thermos (the first one had 210, for instance); the number of five-star, four-star, three-star, two-star, and one-star reviews; the average review, which is shown on Amazon itself; and the average rating, which we can compute by taking the total number of stars and dividing by the total number of reviews. From this list we can quickly extract two rank-ordered lists. First is the ranking by Amazon, which is just the order the list is sorted in: one, two, three, four, and so on through 20. Second is the ranking by average rating, which in Amazon's case is rounded to an average customer review.
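The average-rating computation above can be sketched in a few lines. The star counts here are hypothetical (the real January 2013 breakdown is in the table, not reproduced in this code), but the arithmetic is exactly the one described: total stars divided by total reviews.

```python
def average_rating(stars):
    """Average rating from a star histogram {5: n5, 4: n4, ..., 1: n1}."""
    total_reviews = sum(stars.values())
    total_stars = sum(s * n for s, n in stars.items())
    return total_stars / total_reviews

# Hypothetical breakdown for a product with 210 reviews (illustrative only)
thermos = {5: 120, 4: 50, 3: 20, 2: 10, 1: 10}
avg = average_rating(thermos)  # 890 stars / 210 reviews ≈ 4.24
```

Ranking by average rating is then just sorting the products by this number in descending order.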
So, this is the ordering. We're not going to do the computation here, but feel free to take the table and do the Bayesian adjustment computation yourself. The order comes out as 17, then 15, then 1, 2, 4, 3, which means that in this ordering, item 17 would go first, then 15, then 1, then 2, then 4, and so forth. You can see that there's quite a bit of difference between the way Amazon ranks these products from one to 20 and how they would have been ranked by average rating. For instance, Amazon places items 17, 15, 18, and 19 much lower than their averages would indicate. Item 17 actually has the highest average rating in the whole list, yet it's placed all the way down at position 17 out of 20; similarly with 15, 18, and 19, which have higher ratings than at least the products around them, so we would expect them to be ranked higher if we were looking only at the average rating. Second, Amazon places items 5, 8, 11, and 14 much higher than their averages would indicate. Item 5, for instance, is placed much higher than we would expect, because it actually has one of the lowest ratings on the list. So let's try to reverse engineer what factors might have contributed to the actual ranking. From our discussion already, the following techniques may help. First is Bayesian adjustment: the population size of the ratings matters, so the raw rating scores must somehow be weighted by the population size. Second is the recency of the reviews. You can imagine that some reviewers rated their Thermoses as soon as they purchased them and gave them high ratings because the products appeared durable, though sometimes items deteriorate more quickly than anticipated.
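One common way to carry out the Bayesian adjustment mentioned above is to shrink each product's raw average toward the mean rating of the comparable product family, with a weight that plays the role of a pseudo-count of "prior" reviews. The prior mean and prior weight below are assumed values for illustration; Amazon's actual choices are not public.

```python
def bayesian_adjusted(avg, n, prior_mean, prior_weight):
    """Shrink a product's raw average rating toward the family-wide prior mean.
    Products with few reviews are pulled strongly toward the prior;
    products with many reviews keep a score close to their raw average."""
    return (prior_weight * prior_mean + n * avg) / (prior_weight + n)

# Hypothetical numbers: prior mean 4.0 over the product family, prior weight 25.
few  = bayesian_adjusted(5.0, 3, 4.0, 25)    # 3 reviews: pulled down near 4.0
many = bayesian_adjusted(4.4, 200, 4.0, 25)  # 200 reviews: stays near 4.4
```

This is why a perfect 5.0 average from a handful of reviews can legitimately rank below a 4.4 average from hundreds of reviews.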
So, some people who bought the products early may have rated them high, while people who waited longer before reviewing saw them deteriorate quickly and gave them worse ratings. Third is the quality of the reviewers or reviews, and there are four categories of this. One is the reputation score of the reviewer: reviewers with high reputations, we might think, should be given more say than the average customer reviewing a product. Second is the quality of the review itself, which we could measure by its length, or by whether it contains certain keywords within its text. Third is the review of the review: higher helpfulness scores indicate that customers found a review helpful and accurate. So if 150 out of 160 people found a review helpful, that's a very high fraction of people who upvoted rather than downvoted it. And fourth might be the timing of the reviews, specifically for detecting review spam. If we look at intricate details in the timing of different reviews, how often they come in, and whether a slew of negative reviews arrives at some point, we may be able to detect spam; such a burst could come from competing products, for instance.
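The "review of the review" idea above can be sketched as a helpfulness-weighted average: each review's rating is weighted by the fraction of voters who marked it helpful, so the 150-out-of-160 review counts far more than a widely downvoted one. This is a minimal sketch under assumed data, not Amazon's actual formula; the neutral weight for reviews with no votes is also an assumption.

```python
def helpfulness_weighted_average(reviews):
    """Weighted average rating, where each review's weight is the fraction
    of 'helpful' votes it received.
    reviews: list of (rating, helpful_votes, total_votes) tuples."""
    num = den = 0.0
    for rating, helpful, total in reviews:
        weight = helpful / total if total else 0.5  # no votes: neutral weight
        num += weight * rating
        den += weight
    return num / den

# Hypothetical data: the 5-star review that 150 of 160 voters found helpful
# dominates a 1-star review that only 2 of 40 voters found helpful.
reviews = [(5, 150, 160), (1, 2, 40)]
score = helpfulness_weighted_average(reviews)  # close to 5, not the raw mean of 3
```

A plain average of these two ratings would be 3.0; weighting by helpfulness pushes the score toward the review customers trusted.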