In the previous course, we looked at embeddings from the standpoint of a feature cross. But embeddings are useful for any categorical column. To see why, let's look at embeddings from a different standpoint.

Let's say that we want to recommend movies to customers, and that our business has a million users and 500,000 movies. That's quite small, by the way. YouTube and eight other Google properties have a billion users. For every user, our task is to recommend five to 10 movies. We want to pick movies that they will watch and will rate highly. We need to do this for a million users and, for each user, select five to 10 movies from 500,000 of them.

So what is our input dataset? Our input dataset, if we represented it as a matrix, is one million rows by 500,000 columns. The numbers in the diagram denote movies that customers have watched and rated. What we need to do is to figure out the rest of the matrix. To solve this problem, we need some method to determine which movies are similar to each other.

One approach is to organize movies by similarity using some attribute of the movies. For example, we might look at the average age of the audience and put the movies in a line. So the cartoons and animated movies show up on the left-hand side, and the darker, adult-oriented movies show up on the right. Then we can say that if you liked The Incredibles, perhaps you're a child or you have a young child, and so we can recommend Shrek to you.

But this single dimension misses something. Blue and Memento are arthouse movies, whereas Star Wars and The Dark Knight Rises are both blockbusters. If someone watched and liked Blue, they are more likely to like Memento than a movie about Batman. Similarly, someone who watched and liked Star Wars is more likely to like The Dark Knight Rises than some arthouse movie. How do we solve this problem? What if we add a second dimension? Perhaps the second dimension is the total number of tickets sold for that movie when it was released in theaters.
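To make the one-dimension versus two-dimension comparison concrete, here is a minimal sketch. The coordinates are invented purely for illustration (they are not taken from the diagram or from any real data), and the helper `nearest` is a hypothetical name. The first column stands in for the audience-age axis and the second for the tickets-sold axis.

```python
import numpy as np

# Hypothetical coordinates, made up for illustration only.
# Column 0: audience age (0 = children, 1 = adults).
# Column 1: tickets sold (0 = arthouse, 1 = blockbuster).
movies = [
    "Shrek",
    "The Incredibles",
    "Harry Potter",
    "Star Wars",
    "The Dark Knight Rises",
    "Blue",
    "Memento",
]
coords = np.array([
    [0.10, 0.80],  # Shrek
    [0.15, 0.85],  # The Incredibles
    [0.35, 0.95],  # Harry Potter
    [0.65, 1.00],  # Star Wars
    [0.85, 0.90],  # The Dark Knight Rises
    [0.70, 0.10],  # Blue
    [0.90, 0.15],  # Memento
])

def nearest(title, dims):
    """Return the closest other movie using only the first `dims` coordinates."""
    i = movies.index(title)
    points = coords[:, :dims]
    dist = np.linalg.norm(points - points[i], axis=1)
    dist[i] = np.inf  # never recommend the movie itself
    return movies[int(np.argmin(dist))]

for title in ("Blue", "Star Wars"):
    print(title, "| 1-D neighbor:", nearest(title, 1),
          "| 2-D neighbor:", nearest(title, 2))
```

With these made-up numbers, the age axis alone pairs Blue with Star Wars, while using both axes pairs Blue with Memento and Star Wars with The Dark Knight Rises.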
Now, we see that Star Wars and The Dark Knight Rises are close to each other. Blue and Memento are close to each other. Shrek and The Incredibles are close to each other as well. Harry Potter is in between the cartoons and Star Wars, in that kids watch it, some adults watch it, and it's a blockbuster. Notice how adding the second dimension has helped bring movies that are good recommendations closer together. It conforms much better to our intuition.

Do we have to stop at two dimensions? Of course not. By adding even more dimensions, we can create finer and finer distinctions. And sometimes these finer distinctions can translate into better recommendations, but not always. The danger of overfitting exists here also.

So, the idea is that we have an input that has n dimensions. So what is n in the case of the movies that we looked at? 500,000, right? Remember that the movie ID is a categorical feature, and we would normally be one-hot encoding it. So, n = 500,000. In our case, we represented all the movies in a two-dimensional space, so d = 2. The key point is that d is much, much less than n, and the assumption is that user interest in movies can be represented by d aspects. We don't need a much larger number of aspects to represent user interest in movies.
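The relationship between n and d can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the table `embedding` is filled with random numbers as a stand-in (in a real model these values would be learned), and `movie_id` is an arbitrary hypothetical ID. The sketch shows that multiplying a one-hot vector of length n by an n-by-d table gives the same result as simply looking up one row of that table.

```python
import numpy as np

n = 500_000  # number of movies, so the length of the one-hot vector
d = 2        # number of embedding dimensions

# Stand-in embedding table with random values (an assumption for this sketch;
# a trained model would learn these numbers).
rng = np.random.default_rng(0)
embedding = rng.normal(size=(n, d))

movie_id = 123_456  # hypothetical movie ID

# One-hot encoding: a vector of n zeros with a single 1 at the movie's index.
one_hot = np.zeros(n)
one_hot[movie_id] = 1.0

# Multiplying the one-hot vector by the table selects exactly one row.
dense_via_matmul = one_hot @ embedding

# The same result, obtained by a direct row lookup.
dense_via_lookup = embedding[movie_id]

print(one_hot.shape, "->", dense_via_matmul.shape)
print(np.allclose(dense_via_matmul, dense_via_lookup))
```

The lookup form is why the one-hot vector never has to be materialized in practice: each movie is represented by d numbers instead of n.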