In this lecture, we will start with an introduction to personalization on the web, and I'll specifically dive into recommender systems. Recommender systems, also sometimes referred to as recommendation systems, are systems that attempt to predict what items, products, or content are of interest to a customer, based on some information about that customer, such as their user profile, their past purchases, or their ratings. The most common example of a recommender system is what is known as a collaborative filter. It is a system that makes recommendations using taglines or phrases such as "customers who bought this also bought this," "customers who viewed this product also viewed that product," or "people like you bought these other products." These recommendation systems are unique because they add value to both customers and firms. For consumers, they help them learn about new products and sort through large choice sets, meaning that when you have lots of choices, they help you identify the most relevant ones. For firms, they help in converting browsers to buyers, they help in cross-selling products, and they help increase loyalty by providing a customized or personalized browsing experience. There are many examples of this. If you've used Amazon.com or other similar retailers, you're used to a message such as "people who bought this product also bought these other products." On Google News, you might be used to seeing personalized news recommendations, and on YouTube or Netflix, you might be used to seeing personalized video and movie recommendations as well. If you look at the designs of these systems, at a high level, there are two main types of designs that are used in the industry.
The first is what are known as content-based recommenders. These are systems that attempt to find other products of interest to consumers based on information about product attributes. For example, if you like a product, the system looks at the attributes of that product and finds other products of a similar nature. An example of that is Pandora's music recommendation system. The other design is what is known as a collaborative filter. A collaborative filter doesn't actually go deep into the product attributes. Instead, collaborative filters recommend items based on what others are consuming, so they try to find other people who have similar tastes and recommend what else those people like, for example, "people who bought this also bought this." Let's dive into both of these designs and look at them in greater detail. First, content-based recommenders. As I mentioned, they tend to focus on content attributes or product attributes, and they find other products with similar attributes. An example of that is Pandora, which is an online music service. Pandora essentially recommends and plays songs that are likely to be of interest to consumers. Pandora emerged from a project that was known as the Music Genome Project. Essentially, in this project, hundreds or perhaps even thousands of artists listened to millions of songs and rated the songs along multiple dimensions or attributes. For example, a particular song might get a high score in terms of electronica influence, but a low score in terms of how rhythmic the song is, also known as rhythmic syncopation, and a low score in terms of major key tonality. These are different musical qualities of a song. A different song might get a high score in terms of rhythmic syncopation, a low score in terms of electronica influence, and a high score in terms of major key tonality, and so on.
In this example, I just mentioned three different musical attributes or musical qualities of a song, namely electronica influence, rhythmic syncopation, and major key tonality. But in practice, Pandora has over 150 different attributes for any given song. Given these attributes and a database of a large set of songs, with very deep information on the musical qualities of each song, the goal of Pandora is to start recommending songs. The way that works on Pandora is that a user comes in and starts by indicating that there's some song they like. For example, I might log into Pandora and indicate that I like the song "Thunder" by Imagine Dragons. Pandora looks into its database and finds other songs with similar musical qualities, and then it recommends those songs. For example, when I logged into Pandora and indicated that I like "Thunder" by Imagine Dragons, Pandora next played "Ride" by Twenty One Pilots, and it specifically said that it recommended this song because it has a dub production, it has a reggae feel, it has an acoustic rhythm piano, it uses a string ensemble, and it has major key tonality. These are the attributes of "Thunder" by Imagine Dragons that it also found in "Ride" by Twenty One Pilots. As you can see, this is based on deep knowledge about the product or the content being recommended, and this design can only be used when you have a lot of metadata about music, or about products in general. These systems can also adapt. For example, if I listen to a song and give it a thumbs down, in other words, indicate to Pandora that I do not like that song, then Pandora can incorporate that feedback, adjust on the fly, and show us different songs that are closer to our preferences and different from the songs we dislike. This is where learning comes in, and it is a key part of what is built into these algorithms.
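The matching step described above can be sketched roughly as follows. This is only an illustration, not Pandora's actual algorithm: the song names, the three attribute scores, the use of cosine similarity, and the `penalty` weight for thumbs-down feedback are all assumptions made for the example.

```python
import math

# Hypothetical attribute scores on a 0-1 scale; names and values are made up.
CATALOG = {
    "song_a": {"electronica": 0.9, "syncopation": 0.2, "major_key": 0.1},
    "song_b": {"electronica": 0.8, "syncopation": 0.3, "major_key": 0.2},
    "song_c": {"electronica": 0.1, "syncopation": 0.9, "major_key": 0.9},
    "song_d": {"electronica": 0.2, "syncopation": 0.8, "major_key": 0.7},
}


def cosine(u, v):
    """Cosine similarity between two attribute dictionaries."""
    keys = set(u) | set(v)
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in keys)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(seed, disliked=(), top_n=2, penalty=1.0):
    """Rank songs by similarity to the seed, minus similarity to disliked songs."""
    scores = {}
    for name, attrs in CATALOG.items():
        if name == seed or name in disliked:
            continue  # never recommend the seed or a thumbs-down song
        score = cosine(CATALOG[seed], attrs)
        for bad in disliked:
            # Assumed feedback rule: push down songs that resemble a disliked one.
            score -= penalty * cosine(CATALOG[bad], attrs)
        scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Calling `recommend("song_a")` ranks `song_b` first, since its attribute scores are closest to the seed. Passing `disliked=["song_b"]` drops that song and pushes down songs that resemble it.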
The other design, which is different from the content-based design, is the collaborative filter. Now, collaborative filtering does not require very deep knowledge of the products being recommended. It doesn't require attributes of songs or other products. Instead, it is based on information about what other people are consuming. For example, Amazon's "people who bought this also bought that" is based on collaborative filtering. In fact, when Netflix originally launched its streaming service, the design was initially based on collaborative filtering. It essentially grouped users into different personas based on their ratings and viewing patterns, and then made recommendations to people based on what other people like, essentially what other people with a similar persona like. Now, last.fm is an online music service that uses collaborative filtering to make recommendations. The way this would work is that if I go to last.fm and indicate that I like "Thunder" by Imagine Dragons, last.fm does not necessarily have deep knowledge of the musical qualities of "Thunder." Instead, it would look at which other users have liked the song "Thunder." Once it identifies those users, it looks at what other songs they have liked, and it recommends those songs. These are the two main designs. Within collaborative filtering, there are many variations of that design. For example, there is a design known as an item-to-item collaborative filter. Essentially, this design recommends songs to users based on what others are consuming, but the input to this design is a specific item that you've indicated that you like. For example, I started by saying I like "Thunder" by Imagine Dragons, and that is the input that is used by the system to start recommending songs. Therefore, that design is an item-to-item collaborative filter. The other design essentially uses information about the user as an input and not a specific item.
This design is known as a user-similarity based collaborative filter. Essentially, what this design does is look at all our past history, at all the products that we have liked in the past, and find other people with similar preferences. I mentioned that Netflix previously used a design where it essentially created personas of people, and then found other people with a similar persona and recommended what else they like. That is a user-similarity based collaborative filter. Either way, both these designs don't require deep knowledge about the product being recommended. As a result, they're very easy and also very cheap to build. That's why they are also very popular. The other reason for their popularity is that they're quite effective in practice. All of us are used to seeing recommendations on retail sites like Amazon or on sites like YouTube, and we know they influence the choices we make. Indeed, collaborative filter designs, although they're much simpler than content-based designs, are equally effective, and therefore are more popular because of the simplicity of the design. At the same time, there are some challenges with building these systems. Whether you're building a content-based recommender or a collaborative filter, you have to contend with how much data you have, and you need enough data to start making recommendations. Collaborative filtering in particular needs a different data set than content-based recommenders. Content-based recommenders need lots of information about the attributes of the products being recommended. In contrast, a collaborative filter requires a lot of information on what other people are consuming.
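Both collaborative filtering variants can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how Amazon, Netflix, or last.fm actually implement it: the users, the like history, the co-occurrence count for the item-to-item version, and Jaccard similarity for the user-similarity version are all choices made for the example.

```python
from collections import Counter

# Hypothetical like history: user -> set of liked songs (made-up data).
LIKES = {
    "ann": {"thunder", "ride", "believer"},
    "bob": {"thunder", "ride"},
    "cat": {"thunder", "believer", "radioactive"},
    "dan": {"hello", "skyfall"},
}


def item_to_item(seed_item, top_n=3):
    """Songs most often liked by users who also liked seed_item."""
    counts = Counter()
    for liked in LIKES.values():
        if seed_item in liked:
            counts.update(liked - {seed_item})
    # Sort by count (descending), then by name so ties are deterministic.
    return sorted(counts, key=lambda s: (-counts[s], s))[:top_n]


def jaccard(a, b):
    """Overlap between two sets of liked songs, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def user_based(user, top_n=3):
    """Songs liked by users whose like history overlaps with this user's."""
    mine = LIKES[user]
    scores = Counter()
    for other, liked in LIKES.items():
        if other == user:
            continue
        sim = jaccard(mine, liked)
        if sim == 0:
            continue  # no shared likes, so this user tells us nothing
        for item in liked - mine:
            scores[item] += sim  # weight each candidate by user similarity
    return sorted(scores, key=lambda s: (-scores[s], s))[:top_n]
```

Notice that neither function looks at any attribute of the songs themselves; the like history is the only input.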
One challenge in practice is that data might be sparse. A given user might have only rated, say, five or six items in a catalog of millions of different videos or songs, and therefore the company has to figure out how to recommend songs or videos to people based on these limited ratings. These systems have to figure out, for example, how similar two users are to each other, even though they have each rated just five songs, and these five songs might actually be very different songs with very little overlap. Sparse data is one problem. The other problem is what is known as a cold start. A cold start is essentially the problem of how you start making recommendations to new users when you have no information about their past choices, interests, or preferences. It is also the problem of how you recommend new items that have not yet been purchased or rated by other people, but have just been added to the catalog, and you'd like to start recommending them. That is another challenge. There are many design challenges, and data scientists will spend a lot of time thinking through some of them. But this is a field that is quite mature, and there are some very good answers to these questions. It is not very complicated today to build these systems. Companies also have the option of using a third-party system. If you don't want to build it yourself, there are third-party companies that provide product recommendations as a service that companies can incorporate. In summary, there are many designs of recommendation systems. The two most popular ones are content-based designs and collaborative filtering designs. These are very different designs in practice. Both are quite effective, and there are different trade-offs with these designs. In later lectures, we'll explore some of these trade-offs a little bit more.