The real question that arose was: what can we do to improve matrix factorization techniques to bring back what I'm loosely going to call content attributes, or what you could even call metadata? We know that content preferences are real and might help improve performance. We also know that one of the weaknesses of matrix factorization is that you end up with really hard-to-understand profiles; the dimensions of the factorized rating matrix are rarely interpretable. You were telling me about some ideas for how we can think about this problem a little more generally. Why don't you lead us through those ideas?

>> Sure, yeah. So when you think of the basic problem, typically people think about users versus movies: there's a ratings matrix, and that's what your data looks like. But if you look at the data more carefully, you'll realize that you probably have some demographic or other information for the individual users, and you'll have information related to the movies, like the cast, the reviews, and other things, right? And if you just focus on the user-versus-movie ratings matrix, that one data matrix, you are missing out on these other attributes, the side information and the content that is there. So a more general way to think about it is to simultaneously take all these multiple dimensions into account as you are doing the factorization. And of course, the data looks much more complicated, as you can see in this schematic diagram. The yellow part is your usual user-versus-movie ratings matrix. The users are at the bottom, with their demographic information jutting out in blue, and the movies are along the columns, with the reviews associated with the movies, the movie cast, and so on. So this is what your data really looks like, and the goal, instead of just focusing on that yellow matrix, is: can we take all of these into account as we are trying to do the matrix factorization? That's what we mean by multiple dimensions, and by doing a collective matrix factorization.

>> And so then we're going to start by modeling the way these various dimensions come together. Why don't you walk us through this in a graph representation?

>> Right. So one way to think about this kind of data is that on the left-hand side we have entities: viewers, actors, movies, and review words, the words that will be used in the reviews. And the relationship between a viewer and a movie is the ratings matrix, right? Now notice that if a viewer is actually rating the individual actors in a movie, that gives viewer-actor-movie ratings: multiple matrices, where every ratings matrix corresponds to one actor. If you look at the casting matrix, that's a relationship between the actors and the movies, a matrix connecting movies with the actors participating in them. Then movies and review words are connected by their reviews, and viewers are connected to their demographics. So this is essentially a relation graph, with entities on the left-hand side and relationships on the right-hand side, and you can draw it based on whatever kind of data you have, so that it's fully laid out and you know what kinds of things you have. The right-hand side things are usually matrices; the left-hand side things are the indices of those matrices, the sides of the matrices.
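
One lightweight way to make this relation graph concrete in code is to keep a dictionary of entity types (the left-hand side) and a dictionary of relation matrices keyed by the pair of entity types that index their rows and columns (the right-hand side). This is purely an illustrative sketch with invented sizes and random toy data, not an API from any particular library:

```python
import numpy as np

rng = np.random.default_rng(0)

# Entity types (left-hand side of the graph) and toy counts for each.
entities = {
    "viewer": 100,
    "movie": 50,
    "actor": 40,
    "review_word": 500,
    "demo_attr": 10,
}

# Relations (right-hand side): each is a matrix indexed on both sides
# by the entity types it connects.
relations = {
    ("viewer", "movie"): rng.random((100, 50)),                      # ratings
    ("actor", "movie"): rng.integers(0, 2, (40, 50)).astype(float),  # casting
    ("movie", "review_word"): rng.random((50, 500)),                 # reviews
    ("viewer", "demo_attr"): rng.random((100, 10)),                  # demographics
}

# Sanity check: every relation's shape matches the sizes of its two entities.
for (rows, cols), mat in relations.items():
    assert mat.shape == (entities[rows], entities[cols])
```

From here, a collective factorization would assign one latent factor matrix per entity type and approximate each relation matrix as a product of the factor matrices on its two sides.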

>> So, just as an example, if we were going to add in genres, we could have those as entities, and we might have a relationship that movies have genres, but we also might have that viewers have expressed a preference for genres.

>> Exactly, that's a perfect example. We would have one entity on the left-hand side, genres, and the movie-genre relationship would be the matrix connecting to it. And similarly for viewers, we can have that as well.

>> Right, and as you pointed out, this can allow some fairly sophisticated relationships. One of the things that our typical systems don't allow is saying, "I liked Sandra Bullock in a particular movie, The Blind Side of course, but I didn't like her in a different movie."

>> Sure.

>> And that type of deep rating relationship of course requires some challenging interfaces to get people to express it. But if you're mining reviews, you may very well get that data as a byproduct.

>> Yes, and that will be very rich data. In this case, the viewers are actually rating the individual actors in movies. That's difficult to collect directly, but they often express those thoughts in their reviews, and we can extract that.

>> Wonderful. So before we actually go through this graph, one of the things we're going to talk about is that you've actually done this. It would probably be useful to take a few minutes to talk through: what does it mean to take multiple matrices like these and factor them together, so as to come back with a better model?

>> Right. So what we usually do is, let's say you have an entity like a movie. When you're looking at just the movie-versus-user ratings matrix, you get a latent factor representation for each movie. Now, when you have multiple such entities and relationships, the movies will still have this latent representation, a vector whose dimensions, as you said, are hard to interpret. But every movie will have, say, a five-dimensional vector representation.

Â 7:51

That's nice because it's optimal but

Â it only counts two dimensions worth of information.

Â If we now come out and say well what I have is users to movies and

Â then cast to movies and we co-factor these.

Â We're going to come up with different set of dimensions that allows

Â both matrices to share the latent dimensions so

Â that a movie in one has the same vector approximately.

Â >> Yeah.

Â >> Has a movie in the other.

Â >> Yeah.

>> It sounds like part of what we're going to end up with is a dimensionalization that's not as perfect for either matrix alone, but that jointly is better and gives us the ability to bring those two matrices together.

>> Exactly, that's perfectly correct. We are simultaneously approximating both of these matrices, and the representation for every movie will be different than if we had only looked at one of them.
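
That joint approximation can be sketched in a few lines. The following is a minimal toy illustration of the idea, not the GPMF model mentioned later in the discussion: a ratings matrix R and a movie-by-actor casting matrix C are factored together by plain gradient descent, with the movie factors shared between the two approximations. All sizes, learning rates, and data here are invented for the demo:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_movies, n_actors, k = 30, 20, 15, 5

# Toy data: a ratings matrix R and a movie-by-actor casting matrix C.
R = rng.random((n_users, n_movies))
C = rng.integers(0, 2, (n_movies, n_actors)).astype(float)

# Latent factors. The movie factors M are SHARED between the two
# approximations R ~ U @ M.T and C ~ M @ A.T; that sharing is the
# "collective" part of collective matrix factorization.
U = 0.1 * rng.standard_normal((n_users, k))
M = 0.1 * rng.standard_normal((n_movies, k))
A = 0.1 * rng.standard_normal((n_actors, k))

def loss():
    return np.sum((R - U @ M.T) ** 2) + np.sum((C - M @ A.T) ** 2)

loss_start = loss()
lr = 0.005
for _ in range(1000):
    E_r = R - U @ M.T   # residual of the ratings approximation
    E_c = C - M @ A.T   # residual of the casting approximation
    U += lr * E_r @ M
    A += lr * E_c.T @ M
    # M receives gradient signal from BOTH matrices, so its dimensions
    # are a compromise between the two, as described above.
    M += lr * (E_r.T @ U + E_c @ A)
loss_end = loss()
```

Because M must serve both approximations, neither matrix is fit as well as it would be alone, but the shared movie vectors let information from the cast flow into the rating predictions.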

>> So one of the questions, obviously, is: does it work? So why don't you tell us how it works?

>> Yeah. So I think in the literature now there are many different models trying to do this. What we have shown in the plot over here is a comparison between PMF, which stands for probabilistic matrix factorization (the PMF paper became popular during the Netflix competition), and GPMF, a variant that we cooked up, a generalized version of PMF that can take into account these additional dimensions and factors. In this particular comparison we are showing how the performance changes as we increase the rank, where, just as Joe said, you have the ratings matrix and you also have the movie-versus-cast matrix, and you're trying to capture that. As you increase the rank, you can see that the performance of the fancier model, GPMF, keeps improving, because it can take advantage of the additional information that's available in the cast; it can align that and understand what types of movies the user likes, what kinds of people work in similar types of movies, and so on. There have been parallel developments in machine learning, data mining, and other areas, so if you look for things like collective matrix factorization, there's a whole family of models that go by that name and try to accomplish similar things, and you should be able to find plenty of ideas on how this can be done. This is just one such idea that we worked on, and it definitely showed improvement.

>> Well, it's neat, because it shows both things on the same graph. It shows, one, that this is always better: even though we're not doing the single perfect matrix factorization, having more data gives us a better result.

>> Right.

>> But what we're also showing is that the total amount of information in the system is increasing, because we're able to take advantage of more dimensions.

>> Right.

>> And obviously, what PMF is showing when it gets worse is that it's overfitting.

>> Yes, exactly.

>> Because it's used all the information it could possibly use.

>> Exactly, it sees the [INAUDIBLE].

>> So the one other thing that comes out nicely from this is a little bit better hope of having some forms of content understanding, bridging both of these. From your work, you had talked about how you could find cast clusters by identifying clusters of cast members that had nearby vector representations...

>> Right.

>> ...in this representation.

>> Right.

>> Can you tell us a little bit about that?

>> Sure, yeah. So in the example that we are talking about, you have the user-by-movie matrix, which is the usual ratings matrix, and the movie-by-cast matrix, which is the new thing we threw in, and we are looking at the joint factorization. So you are going to get a latent factor representation for every actor or actress, and then you can cluster them based on these five- or ten-dimensional representations. You can just run something like [INAUDIBLE], or do something fancier, to find clusters among the actors. So we looked at some of the results, and some of it made sense. We found one cluster where mostly the science fiction actors got together: a lot of Star Trek, you know, Apollo 13; Ed Harris and Nimoy and so on all grouped together. We found another cluster which is largely actors from the 40s through the 60s: Cary Grant, Humphrey Bogart, and people like them. So if you carefully start poring over the results, some of these make a lot of sense, and there's better interpretability as to what may be going on here.
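
The clustering step described here is straightforward to reproduce on any learned factor matrix. Below is a toy sketch using a plain k-means loop (Lloyd's algorithm) over stand-in "actor factor" vectors; the planted groups, sizes, and deterministic initialization are all invented so the demo is self-contained and reproducible:

```python
import numpy as np

rng = np.random.default_rng(1)
k_dim, n_actors, n_clusters = 5, 60, 3

# Stand-in for learned actor factors: three planted groups of 20 actors,
# centered at well-separated points so the clustering has structure to find.
centers_true = 4.0 * np.eye(n_clusters, k_dim)
actor_factors = np.vstack([
    c + 0.3 * rng.standard_normal((n_actors // n_clusters, k_dim))
    for c in centers_true
])

# Plain k-means on the latent vectors. Deterministic initialization for
# the demo: one seed point taken from each planted group.
centroids = actor_factors[[0, 20, 40]].copy()
for _ in range(20):
    # Assign each actor to its nearest centroid in latent space.
    dists = np.linalg.norm(actor_factors[:, None] - centroids[None], axis=2)
    labels = dists.argmin(axis=1)
    # Move each centroid to the mean of its assigned actors.
    centroids = np.array([actor_factors[labels == j].mean(axis=0)
                          for j in range(n_clusters)])
```

On real learned factors the groupings are not planted, of course; the interesting part is inspecting which actors land in the same cluster, as with the science fiction and 40s-to-60s clusters described above.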

>> Well, part of what makes this kind of analysis neat is that it can sometimes help you either back up or refute intuitions.

>> Yeah.

>> You look at the bottom there, and Paul Newman actually did most of his acting after many of those people had stopped.

>> Right.

>> But there's this sense that he was a throwback to the type of actor of an earlier era. And what we can see is that if you look at the data of the movies that people liked and the actors that are in them, and bring all of this together, sure enough, he sits there alongside Cary Grant and Humphrey Bogart, as opposed to some of the actors who came later and were part of a different generation stylistically.

>> Yeah, that's true.

>> And so none of this necessarily means that you'll have an easy time describing what the dimensions are.

>> That is true.

>> That's always going to be a challenge with matrix factorization techniques. But you might have an easier time explaining some attributes of a movie that somebody would like, because you can express it in terms of some of these content spaces.

>> That is true, and in some ways these clusters are post-processed versions of those latent factors. We did a clustering on the factors, and that is somewhat interpretable, but you're right that the dimensions of the latent factors themselves are still difficult to interpret.

>> So, one last question: where is this type of technology on the continuum from an idea in the research lab to something everybody uses in all of their systems today?

>> So my sense, and this is coming more from academia, is that this has been explored quite extensively and there are lots of ideas out there. I think one challenge in adopting it fully is that, first of all, as you throw in more types of entities, the data sparsity problem is exacerbated: you just have many more of these large sparse matrices, and the scale of the problem increases, so you have to come up with tractable algorithms that scale to the right sizes. I think that's where things are; people are navigating their way through these issues, is my understanding.

>> So it sounds like it's a technology to keep an eye on.

>> Yeah.

>> And it wouldn't be surprising if we start seeing some of the industry leaders jumping into these ideas, to try to ratchet their recommenders up a notch.

>> Yeah, I think so.

>> Well, wonderful. Thank you so much for joining us.

>> Thanks for having me.

>> And we'll see you again soon.
