In this lecture, we're going to introduce the concept of supervised learning and we're also going to contrast supervised versus unsupervised learning with a few different examples. So supervised learning is the process of trying to infer from labeled data what is the underlying function that produced the labels associated with the data. So that sounds like a bit of a mouthful. So let's try and break down what that actually means. Precisely speaking, it means something like the following. We're given a labeled training data which takes the following form. It's going to be a set containing several pairs of instances. Each of those instances contains some data and a label that we're trying to predict. So we're really trying to do when we do supervised learning is to infer the function that explains the relationship between the data and the labels. How did we get to those labels from that data? So maybe a real example will make it a bit clearer. Imagine we wanted to build a movie recommender to do something like this. We're given a bunch of movies and we'd like to estimate which of them I would rate the highest. So how does that look in the framework I've described so far? For instance, what are the labels? The labels are just going to be ratings. That's what we're trying to predict, okay? So there'll be ratings that others have given to each movie and ratings that I have given to other movies. There is all the different values we'd like to predict in order to build a movie recommender. The quantities we'd like to estimate as our outputs in other words. On the other hand, what is the data for this problem? What are the values we're going to use to try and make these predictions? They're going to be basically features about the movie and the users who evaluated it. So features could include things like movie features such as the genre, action, rating or length of a movie, they could also include user features such as the user's age, gender, and location. Okay, so when I say we want to build a movie recommender, which is going to be a function from data to labels, what I really mean in this instance, is that we'd like to build a function and it goes from user features like demographics and movie features like genre, length, project classification and tries to predict the star rating. So those are our data and those are our labels in this instance. So let's think about how we might try and solve this. Let's look at a few different solutions and contrast which of them are, which ones of them are not supervised learning. For the most basic way we might try and build a movie recommender is to base it on something like prior knowledge. Let's use some domain expertise that tells us what types of movies are likely to receive high ratings or what types of users are likely to give high ratings. So here's a really, really simple example of code we might use to solve this. This code basically says, let's look at the MPAA rating for a movie and if the MPAA rating matches the user age, they'll give it a high rating, otherwise they'll give it a low rating. It's an overly simplified example but it should give you the idea. Basically, we've hand coded a solution based on our prior knowledge of how we think people might rate movies. Okay, so that's a function that goes from movie features like MPAA rating and user features like demographics and it predicts a star rating for the movie. Is that an instance of supervised learning? Well, not really. We did write down a function that goes from user and movie features to ratings, but it's not supervised learning in a sense of that function wasn't based on the data. We never actually looked at the data to see if that is the correct function for estimating ratings based on user and movie features, since it would not be an instance for supervised learning. Okay, let's try a more complicated solution. In this case, maybe we will look at movie features in the form of a plot synopsis and user features in the form of my social media posts and we'll say, if the words that appear in a plot synopsis are similar to the words that appear in my social media posts, maybe that would mean that I'm likely to give this movie a high rating. So I should recommend the movie to me which is most similar in terms of its synopsis to the texts that I've posted on social media. Okay, that's a somewhat more sophisticated solution. Is that an instance of supervised learning? It's a little bit closer to what we want in the sense that this solution is actually relying on the data. We did look at the data in the form of users features and movies features or synopsis, in order to build this function. So we did look at the data, but note that it never actually looked at the labels. We never actually looked at anyone's ratings from their rating history to determine whether this function mapping similarity between synopsis and social media posts is really a good one or not. So it's not supervised learning yet. Okay. So a better solution maybe would be to identify attributes like actors and genres that are associated or correlated with positive ratings and then, we'll recommend movies that exhibit those attributes. Okay. So this would be supervised learning because we're directly trying to study correlations or associations between features in the data and the labels. We're looking at the relationship between the data and the labels we're trying to predict. So that exactly follows definition of what we called previously supervised learning. Okay. So those are three different solutions to this problem. One based on prior knowledge, one based on looking at relationships between user movie features, and one looking for correlations between features and labels. The first two were not supervised learning, the third one was. So if we can do supervised learning is there any reason we would not? Let us look at a few of the advantages and disadvantages of each of these three approaches that we have proposed. We're starting with solution one. Well, I think the disadvantages were fairly clear. Basically, we made these very strong assumptions about the relationship between users and items. We didn't base those assumptions on our data, we just assumed people give ratings that are aligned with whether the movies MPAA rating matches the user's age from a demographic information. That may or may not be true. We didn't look at the data to come up with that solution and similarly the solution couldn't adapt to new information that arrived that might contradict that. It does have an advantage though, which is that it requires no data which sounds like a very odd advantage. But in practice, if you are trying to build a movie recommendation system from scratch, you might not have any users yet. So you wouldn't be able to collect any data. You've never seen anybody rate a movie before but you still need to make recommendations for people. If in that case you wanted to build a movie recommender system, you wouldn't be able to take this data-driven supervised learning approach. All you can really do is try and hard code a solution based on your prior knowledge and maybe you could improve upon that later. Okay, what about solution two where we were trying to identify similarity between users wall posts and movies plot synopsis? This has many of the same disadvantages as the previous solution. Again, it depends on possibly false assumptions. We never actually looked at the labels. We looked at the data in the form of user features and movie features, but we never actually confirmed that having wall posts similar to a movie synopsis results in a good rating rather than a bad one and it also may not be able to adapt easily to new settings. It might work okay for movie synopsis, how do we use this for a different type of recommendation like book or song recommendation? It does have again a similar advantage to what the first solution has. This one is a little bit tougher, it does require data. We need to know things like movies synopsis and users wall posts, but we don't require labeled data. We don't need to know anything about ratings. So perhaps if you're coming up with a movie recommender and you asked users to sign in using Facebook, you'd be able to access the wall posts and come up with some slightly informed recommendations even though you've never seen them rate a movie before. Okay, so the third solution, where we tried to identify attributes that are associated or correlated with positive ratings and use that to make our predictions. It has the disadvantage that we really require a specific type of dataset in order to make useful recommendations. We need a possibly large collection of movies that have historical user ratings associated with them. If we have that though, it's very advantageous to apply this type of approach. We can then directly optimize the measure we care about which is rating prediction and we can easily get adapted to new settings and data. Okay, so I've described supervised learning approaches. More broadly speaking, learning approaches are attempting to model data in order to solve a problem. So supervised learning which I described is trying to directly model this relationship between input and output variables so that the output variable can be predicted as accurately as possible. Unsupervised learning is just looking for patterns and relationships in the data, but it's not necessarily trying to solve a particular predictive task. So somehow the second solution we saw today is a little bit closer to unsupervised learning and that we were trying to look for relationships and structures in the data users who are similar to particular movies but we weren't trying to solve a predictive task directly like trying to predict the rating and we also saw a third instance. The first approach of hard coding things is just not learning at all. Okay, so to summarize this lecture, we basically introduced the concept of supervised learning. We tried to contrast supervised versus unsupervised learning and also non learning-based approaches and we've described the relative merits of each. So on your own, I would suggest thinking about a related problem like predicting a movie's box office revenue and trying to come up with ways you might solve it based on supervised versus unsupervised learning as well as what features you might use to do so.