Welcome to Lecture four of Networks, Friends, Money and Bytes. Today's topic is Netflix. Let me first warn you, though, that if you plot the length of the video lectures against the twenty lectures of this course, they go up and down; the average might be 75 minutes across all the modules. This Lecture four is actually going to be by far the longest lecture. You will see that we have quite a few modules, and each module is fairly long, because we have to introduce both the concepts of collaborative filtering, that is, individualized recommendation by scaling out the system, the network, as well as the mathematical language of optimization theory. Again, if you plot the level of mathematical difficulty, or the amount of symbolic manipulation involved, against these twenty lectures, it goes up and down quite a bit, and I'll say that this lecture is perhaps by far the most mathematical one. If you can pass this lecture, you probably can face any of the remaining mathematics throughout the rest of the course. So, to some degree, we are climbing up a hill today, and this hill is both conceptually involved and mathematically demanding, as climbing up a real hill would be. But if you can climb over this hill, then the rest of the course should be much easier, especially in some of the lectures coming up. Having said that, let's first review what we had in the past three lectures. We saw three beautiful equations, and they are very useful equations too. The first one was in Lecture one: DPC, Distributed Power Control, used in 3G CDMA data networks.
It says that for each transmitter-receiver logical pair i, the transmit power at the next discrete time slot, P_i(t + 1), should be whatever it is right now, P_i(t), times a gain parameter, and that parameter is the ratio between the target SIR, gamma_i, and the currently achieved SIR on this link, SIR_i(t). Now, gamma_i does not change over time as far as this algorithm is concerned; it is a constant. The achieved SIR, however, varies over time as different people change their transmit power levels. And we saw that this ratio is essentially an implicit feedback signal telling you whether you should increase or decrease your power. The second equation that is beautiful and useful, from Lecture two, can be summarized as follows: the payoff function for the i-th bidder in an auction, U_i, is a function of all the bidders' bids, the vector b, just like the SIR is a function of all the transmit powers. It is the difference between your private, independent valuation, v_i, and the price you have to pay, P(b), which depends on all the bids. The shape of this price function, P here being the price, not the power, is determined by the designer of the auction. For different kinds of function P(b), you will induce different kinds of behavior by the bidders: they will bid differently, each choosing the bid b_i* that is optimal according to some metric, for example whatever maximizes the payoff. Now, of course, this is assuming you actually win the allocation; if you do not get the item, then the payoff is zero. This definition of the payoff function, which also highlights the fact that you can design the auction and thereby induce different bidding behavior, is the second equation. The third one, which we touched upon last lecture, can be written as follows: simply, the vector pi-star transpose equals pi-star transpose times G.
Here, G is the Google matrix that we defined last time, and it is defined in such a way that the eigenvector corresponding to the largest eigenvalue, which is one, can be uniquely defined and efficiently computed. So this defines the Google PageRank score, which further leads to the rank ordering of the web pages. Now, if you count how many times DPC is used in the mobile world, how many times auctions are used in many contexts including the online space, and how many times Google PageRank is used every time you search on Google, you can tell that these three are powerful equations. They are simple, but they are very, very widely used. So now we're going to continue this track of four lectures. We've talked about power control, a distributed protocol; we've talked about the second price auction; we've talked about PageRank. And now, collaborative filtering for recommendation, as on Netflix. These four lectures also introduce the languages of optimization, game, graph, and now learning theories. The backdrop is Netflix. Those of you who are in the US know Netflix very well; it started its DVD rental business a while back, in 1997. The story goes that the founder of Netflix was tired of paying late fees to the DVD rental stores. If you remember those days, there were a lot of them on street corners, and if you didn't return the DVD, or the VCD, or the tape in time, you paid extra fees. He said, well, that's not very appealing to me. So instead, Netflix said: you can keep the DVDs for as long as you want, as long as you keep paying the monthly fee, and that's actually a great thing for the company. You don't return, I don't send you new ones; you can keep paying the monthly fee without getting any new DVDs. This idea is going to show up again in a completely different context, sliding window control in TCP congestion control. That would be in Lecture fourteen.
But it is effectively the same concept: you don't return, I don't give you more. By 2008, in the US, there were about nine million users of Netflix DVD rental. Around that year, Netflix started trying out something quite new: using the internet as the medium, instead of the postal service, to deliver video. The content is stored on video servers and is sent through the internet, and maybe wireless networks, to your internet-connected devices, which could be TVs or set-top boxes, but also game consoles, smartphones, tablets, PCs, and so on. By the spring of 2011, it had 23 million customers, and in the summer of 2011 it was estimated that about one in every four bits, a quarter of the bits going through the internet that month, was Netflix traffic. That was huge. Now, of course, videos are very big files, and therefore, if you count by volume, you're going to get big percentages. But still, one in four, that's a lot. Then, in September 2011, you may remember that Netflix tried to split into two different businesses, separating DVD rental from online streaming; they later put them back together, but the billing remained separated. Later, we will look at the technologies and networks involved in multimedia streaming over the internet, in Lecture seventeen, I believe, where we look at Netflix, Amazon Prime, Google, HBO Go, IPTV, and so on, and at the differences across the models. Today, in this Lecture four, the focus is, however, on the social network dimension of recommendation systems. You and I, the customers of Netflix, also form a network. These 23 million people, a bigger number now, form a social network. They don't directly interact, but the way they watch and rate movies will be used and leveraged by Netflix to make better individualized recommendations. Think about it this way: you're on an open online platform, whatever that platform might be.
And there are, maybe, 100,000 of you. By scaling out, scaling up, the social network, we can then scale down to individualized recommendation, because we get to see how other like-minded people behave, what they like and don't like, and so we have much more data. Even if you don't provide a huge amount of activity on the platform, we get to see others. This is the basic idea of Collaborative Filtering, CF. Now, there are many kinds of recommendation systems. For example, on Amazon, you may notice that each time you refresh the homepage after you log in, they recommend different kinds of products to you. That's largely based on what's called content-based filtering. Basically, if you purchased a certain kind of product before, like a certain kind of electronics, then they will recommend similar electronics to you in the future. If you buy a book written by Stephen Hawking, then they will try to recommend new books by Stephen Hawking. That is content-based filtering: look at what you bought, what you liked, and then recommend accordingly. YouTube, as we will see in a few lectures' time, makes recommendations largely driven by so-called co-visitation counts. Briefly speaking, it looks at how many users watch two videos, say A and B, back to back, and assigns a score, the co-visitation count, between those two videos. This gives you a way to start deciding, quantitatively, which video to recommend. We'll come back to this later. Pandora, if you have used it, is a mostly expert-driven recommendation of music mixes, but as a consumer you get to thumb a song up or down, just like in some discussion forums you get to vote up or down. Now, Netflix does not want to use expert-driven movie reviews, even though there's no shortage of professional movie reviewers out there. It instead uses collaborative filtering.
The idea is that we will look not only at what you liked, but also at what people similar to you liked and watched. The question is, how do we define similarity between people, or, flipped the other way around, between movies? Before going further, let's first define this recommendation system more rigorously. A recommendation system is a very helpful feature, for stickiness of the consumers, for inventory control, and so on. In the case of Netflix, you can think of it as a black box, with inputs and outputs. What are the inputs? First of all, the user ID; that's your login. We're going to index users by u. Second, the movie ID; we're going to index each movie by i. Then there is, of course, a rating, which is a number drawn from one, two, three, four, or five stars, and we call it r_ui, r for the rating of user u for movie i. And there is the time this review was entered, t_ui. Of course, there's also the text of the review, but we're not going to talk much about text reviews in today's lecture. So, we've got u and i, and r_ui, and t_ui. How many r_ui's and corresponding t_ui's are we talking about here? Actually, a lot. Think of millions of users and tens of thousands of movie titles. Of course, only a few percent of the users watch any substantial number of the movies out there, and among all the movies that you actually watched, you will likely rate only a few. Some people, like me, rate none of them; I just don't like rating movies. But a lot of people rate, though they don't necessarily rate every single movie they watched. Despite that, we're talking about on the order of billions of r_ui's for Netflix. Now, it's a much more challenging problem when you have very few ratings in your database, the cold-start problem, but Netflix doesn't have that problem now. Alright, those are the inputs.
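To picture these inputs concretely, here is a minimal sketch in Python; the user IDs, movie titles, star values, and timestamps are all invented for illustration:

```python
# Sparse ratings table: (r_ui, t_ui) keyed by the (user u, movie i) pair.
# Only the entries that exist are stored; the full users-by-movies grid
# would be millions by tens of thousands, almost entirely empty.
ratings = {
    (1, 'Inception'): (5, '2011-03-02'),   # user 1 gave 5 stars at time t_ui
    (1, 'Up'):        (3, '2011-03-05'),
    (2, 'Inception'): (4, '2011-04-01'),
}

r_ui, t_ui = ratings[(1, 'Up')]   # the rating and time for user 1, movie 'Up'
```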
What does the recommendation system do? What is the output? The output, primarily, is the predicted rating; let's write it r-hat_ui. Now, in the case of the Netflix Prize, they actually know the true r_ui; they just don't tell you, the competitor in the prize competition. In that setting, the ground truth is known, so your prediction r-hat_ui can be compared against the actual r_ui. But, of course, in the real operation of Netflix, they don't know the ground truth; that's why they want to recommend movies for you to watch. So, you just have to believe that if a recommender works well on the test set, where the ground truth is known, it is going to work well on the rest of the system too. Now, this r-hat could be a fractional number, a real number, for example 4.2 stars. You cannot rate a movie 4.2 stars, but the prediction might say 4.2. You can interpret that as saying there is maybe an 80 percent chance this user u will rate this movie i with four stars, and a 20 percent chance with five stars. This r-hat could also go above five or below one, but then we have to clip it: if it is above five, we call it five; if it is below one, we call it one. All right. How do we compare two recommendation systems? What metric are we going to use for the comparison, to say, look, this recommendation system is better than that one? What do you mean by better? Well, there are three levels of understanding here, going from the most relevant but least tractable to the most tractable but perhaps least relevant. The first level is, of course, that what Netflix eventually cares about is customer satisfaction: if I recommend that you watch some movie, do you like that recommendation in the end? That data is very hard to gather.
The second level is: all right, at least I want to know the prediction effectiveness. If I recommend some movie, do you even watch it, let alone like it? Again, that's hard to collect. So, we're going to look at a proxy, just like in Google PageRank. There, what really matters in the end is whether the person searching actually finds the resulting listing useful, but that ultimate test is very hard to quantify and collect data about, so PageRank is used instead to say, this is the importance of the nodes. Here, we are going to quantify the notion of prediction error by looking at r-hat_ui and r_ui. Again, how do you know r_ui? In real life, you don't. But when you compare different recommendation systems, you can hold out some known data as ground truth and use that as a test data set. Well, of course, I would look at the difference between the two. One prediction may overshoot, another may undershoot, so it can't just be the raw difference: it could be the absolute value, or it could be based on the square of the difference. Whether I overshoot by one star or undershoot by one star, after squaring I get the same positive error term, so I'm going to square the differences. And how many do I have? Say there are C of these ui pairs on which I would like to base my comparison of recommenders; then I sum this squared difference across the C pairs and average. But since you squared the terms, you changed the scale, and to bring the scale back down, people take a square root. That is the so-called Root Mean Squared Error, RMSE. It is effectively the L2 norm of a vector, the vector being the error vector, the difference between r-hat and r across all these ui pairs: you take each entry of this vector, square it, average the squares, and then take the square root.
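The clipping step and the RMSE formula just described translate directly into code. This is a minimal sketch; the three (u, i) pairs and their star values are invented for illustration:

```python
import math

def clip(x):
    """Clip a raw prediction into the valid one-to-five star range."""
    return min(5.0, max(1.0, x))

def rmse(r_hat, r):
    """Root mean squared error over the C rated (user, movie) pairs in r."""
    C = len(r)
    return math.sqrt(sum((clip(r_hat[ui]) - r[ui]) ** 2 for ui in r) / C)

r     = {(1, 'A'): 5,   (1, 'B'): 3,   (2, 'A'): 4}    # held-out ground truth
r_hat = {(1, 'A'): 5.3, (1, 'B'): 3.0, (2, 'A'): 4.5}  # raw predictions
error = rmse(r_hat, r)   # 5.3 clips to 5, so the errors are 0, 0, and 0.5
```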
Of course, when you minimize an error, whether you minimize it with the square root or without, the solution won't change. So, mean squared error or root mean squared error, either one can be used equivalently as the objective function you want to minimize. All right. So now we're looking at this RMSE, or MSE, as the quantitative metric describing how wrong your recommendations are across this set of test points, and we'll use it as a proxy for those eventual goals. So, our job now is: give me the inputs, u and i, and r_ui and t_ui, and I will find a way to make recommendations and give you an output r-hat. I will train this black box by looking at known r's and the resulting r-hats. How do I train it? By minimizing the RMSE. Then I say, alright, good, I'm done with training this black box. Now, throw at me some questions where I don't know the ground truth r. Maybe you, as the examiner, know the ground truth r. Well, good, then keep it to yourself, and I'm going to make a guess, r-hat. Then you check against your ground truth and see how good I am. That is the spirit of the Netflix Prize. It was announced in October 2006, and the challenge is: can you make a really good recommendation system? I'll give you data to train your recommender; now, can you make it work really well, better than what I currently have, with "I" meaning Netflix.
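To make this train-then-test spirit concrete, here is a toy sketch, my own illustration rather than anything from the Prize itself: the simplest possible recommender, a single constant, is "trained" on known ratings and then scored by RMSE on held-out ground truth. Among all constant predictors, the training-set mean is the one that minimizes RMSE.

```python
import math

train = {(1, 'A'): 5, (1, 'B'): 3, (2, 'B'): 4, (2, 'C'): 2}  # known ratings
test  = {(1, 'C'): 4, (2, 'A'): 3}   # ground truth the examiner keeps hidden

# "Training": for a constant predictor, minimizing RMSE over the training
# set yields the mean rating.
mean = sum(train.values()) / len(train)           # (5 + 3 + 4 + 2) / 4 = 3.5

# "Testing": predict every held-out (u, i) pair with that constant,
# then measure how wrong we were.
r_hat = {ui: mean for ui in test}
error = math.sqrt(sum((r_hat[ui] - t) ** 2 for ui, t in test.items())
                  / len(test))
# errors are (3.5 - 4) and (3.5 - 3), so error = sqrt(0.25) = 0.5
```

A real collaborative filtering recommender replaces the constant with per-user, per-movie predictions, but the evaluation loop stays exactly this shape.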