In this video, we'll take a look at how you can use TensorFlow to implement the collaborative filtering algorithm. You might be used to thinking of TensorFlow as a tool for building neural networks, and it is; it's a great tool for building neural networks. But it turns out that TensorFlow can also be very helpful for building other types of learning algorithms as well, like the collaborative filtering algorithm. One of the reasons I like using TensorFlow for tasks like these is that for many applications, in order to implement gradient descent, you need to find the derivatives of the cost function. TensorFlow can automatically figure out for you what the derivatives of the cost function are. All you have to do is implement the cost function, and without needing to know any calculus, without needing to take derivatives yourself, you can get TensorFlow, with just a few lines of code, to compute that derivative term, which can then be used to optimize the cost function. Let's take a look at how all this works. You might remember this diagram here on the right from course one. This is exactly the diagram we looked at when we talked about optimizing w as we were working through our first linear regression example. At that time we had set b=0, so the model was just predicting f(x) = wx, and we wanted to find the value of w that minimizes the cost function J. The way we were doing that was via a gradient descent update, which looked like this, where w gets repeatedly updated as w minus the learning rate alpha times the derivative term. If you are updating b as well, this is the expression you will use. But if you set b=0, you just forgo the second update, and you keep performing this gradient descent update until convergence. Sometimes computing this derivative or partial derivative term can be difficult, and it turns out that TensorFlow can help with that. Let's see how. I'm going to use a very simple cost function J = (wx - 1) squared.
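To make the update concrete before we bring in TensorFlow, here's a minimal plain-Python sketch of gradient descent on J(w) = (wx - y)^2 with the derivative dJ/dw = 2(wx - y)x worked out by hand; the values for x, y, and the learning rate are just illustrative:

```python
# Gradient descent for J(w) = (w*x - y)^2 with the derivative taken by hand:
#   dJ/dw = 2 * (w*x - y) * x
w = 3.0          # initial guess for the parameter
x, y = 1.0, 1.0  # the single training example
alpha = 0.1      # learning rate (illustrative value)

for _ in range(100):
    dJdw = 2 * (w * x - y) * x  # hand-computed derivative
    w = w - alpha * dJdw        # gradient descent update

# w has moved from 3.0 toward the minimizer w = 1
```

The point of this warm-up is that we had to derive dJ/dw ourselves; the gradient tape below removes exactly that step.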
So wx is our simplified f_w(x), and y is equal to 1. This would be the cost function if we had f(x) = wx, y = 1 for the one training example that we have, and if we were not optimizing with respect to b. The gradient descent algorithm will repeat until convergence this update over here. It turns out that if you implement the cost function J over here, TensorFlow can automatically compute for you this derivative term and thereby get gradient descent to work. I'll give you a high-level overview of what this code does. w = tf.Variable(3.0) takes the parameter w and initializes it to the value of 3.0. Telling TensorFlow that w is a variable is how we tell it that w is a parameter we want to optimize. I'm going to set x = 1.0, y = 1.0, and the learning rate alpha to be equal to 0.01, and let's run gradient descent for 30 iterations. So in this code we'll still do for iter in range(iterations), so for 30 iterations. And this is the syntax to get TensorFlow to automatically compute the derivatives for you. TensorFlow has a feature called a gradient tape. If you write with tf.GradientTape() as tape, and inside it compute f(x) as w*x and compute J as (f(x) - y) squared, then by telling TensorFlow how to compute the cost J with the gradient tape syntax, TensorFlow will automatically record the sequence of steps, the sequence of operations, needed to compute the cost J. And this is needed to enable automatic differentiation. Next, TensorFlow will have saved that sequence of operations in tape, in the gradient tape. And with this syntax, TensorFlow will automatically compute this derivative term, which I'm going to call dJdw. TensorFlow knows you want to take the derivative with respect to w, that w is the parameter you want to optimize, because you had told it so up here, and because we're also specifying it down here.
Having computed the derivative, finally you can carry out the update by taking w and subtracting from it the learning rate alpha times that derivative term we just got from up above. TensorFlow variables require special handling, which is why instead of setting w to be w minus alpha times the derivative in the usual way, we use this assign_add function. But when you get to the practice lab, don't worry about it; we'll give you all the syntax you need in order to implement the collaborative filtering algorithm correctly. So notice that with the gradient tape feature of TensorFlow, the main work you need to do is to tell it how to compute the cost function J, and the rest of the syntax causes TensorFlow to automatically figure out for you what that derivative is. With this, TensorFlow will start by finding the slope of the cost at w = 3, shown by this dashed line, take a gradient step and update w, then compute the derivative again and update w, over and over, until eventually it gets to the optimal value of w, which is at w = 1. So this procedure allows you to implement gradient descent without ever having to figure out yourself how to compute this derivative term. This is a very powerful feature of TensorFlow called Auto Diff, and some other machine learning packages like PyTorch also support Auto Diff. Sometimes you hear people call this Auto Grad. The technically correct term is Auto Diff; Autograd is actually the name of a specific software package for doing automatic differentiation, for taking derivatives automatically. But sometimes if you hear someone refer to Auto Grad, they're just referring to this same concept of automatically taking derivatives. So let's take this and look at how you can implement the collaborative filtering algorithm using Auto Diff. And in fact, once you can compute derivatives automatically, you're not limited to just gradient descent.
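Putting the steps above together, here is a runnable sketch of the gradient tape example from this part of the video, using the values stated in the lecture (w initialized to 3.0, x = 1.0, y = 1.0, alpha = 0.01, 30 iterations):

```python
import tensorflow as tf

w = tf.Variable(3.0)  # tf.Variable tells TensorFlow w is a parameter to optimize
x = 1.0
y = 1.0
alpha = 0.01
iterations = 30

for iter in range(iterations):
    # Record the operations used to compute the cost J so that
    # TensorFlow can differentiate it automatically.
    with tf.GradientTape() as tape:
        fwb = w * x             # f(x) = w*x  (b fixed at 0)
        costJ = (fwb - y) ** 2  # J = (w*x - y)^2

    # The derivative of J with respect to w, computed by auto diff
    dJdw = tape.gradient(costJ, w)

    # tf.Variables need special handling: use assign_add
    # rather than the usual w = w - alpha * dJdw
    w.assign_add(-alpha * dJdw)
```

With this small learning rate, 30 iterations move w only partway from 3.0 toward the optimum at 1.0; more iterations (or a larger alpha) would bring it closer to convergence.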
You can also use a more powerful optimization algorithm, like the Adam optimization algorithm. In order to implement the collaborative filtering algorithm in TensorFlow, this is the syntax you can use. Let's start by specifying that the optimizer is tf.keras.optimizers.Adam, with the learning rate specified here. And then for, say, 200 iterations, here's the syntax: as before, with tf.GradientTape() as tape, you need to provide code to compute the value of the cost function J. Recall that in collaborative filtering, the cost function J takes as input the parameters x, w, and b, as well as the mean-normalized ratings, which is why I'm writing Ynorm; r(i,j), specifying which values have a rating; the number of users, or nu in our notation; the number of movies, or nm in our notation; as well as the regularization parameter lambda. If you can implement this cost function J, then this syntax will cause TensorFlow to record the sequence of operations used to compute the cost and figure out the derivatives for you. Then by asking it for grads = tape.gradient, you get the derivatives of the cost function with respect to x, w, and b. And finally, with the optimizer that we had specified up on top as the Adam optimizer, you can use the optimizer with the gradients that we just computed. The zip function in Python just pairs up the gradients with the corresponding parameters in the ordering that the apply_gradients function expects. If you were using gradient descent for collaborative filtering, recall that the cost function J would be a function of w, b, as well as x. Applying gradient descent, you would take the partial derivative with respect to w and then update w as follows; you would also take the partial derivative with respect to b and update b as follows; and similarly update the features x as follows, repeating until convergence.
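As a sketch under assumptions (the cost-function name cofi_cost_func_v, its vectorized form, and the toy matrix sizes are illustrative, not the practice lab's exact code), the Adam training loop described above might look like this:

```python
import tensorflow as tf

def cofi_cost_func_v(X, W, b, Ynorm, R, lambda_):
    """Vectorized collaborative-filtering cost (illustrative sketch):
    J = 1/2 * sum over (i,j) with r(i,j)=1 of (w(j).x(i) + b(j) - ynorm(i,j))^2
        + lambda/2 * (sum of W^2 + sum of X^2)
    """
    err = (tf.linalg.matmul(X, tf.transpose(W)) + b - Ynorm) * R
    return (0.5 * tf.reduce_sum(err ** 2)
            + (lambda_ / 2) * (tf.reduce_sum(W ** 2) + tf.reduce_sum(X ** 2)))

# Toy sizes, chosen only for illustration
num_movies, num_users, num_features = 5, 4, 3
tf.random.set_seed(1)
X = tf.Variable(tf.random.normal((num_movies, num_features)), name="X")
W = tf.Variable(tf.random.normal((num_users, num_features)), name="W")
b = tf.Variable(tf.random.normal((1, num_users)), name="b")
Ynorm = tf.random.normal((num_movies, num_users))  # mean-normalized ratings
R = tf.cast(tf.random.uniform((num_movies, num_users)) > 0.3, tf.float32)  # r(i,j)
lambda_ = 1.0

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-1)
initial_cost = float(cofi_cost_func_v(X, W, b, Ynorm, R, lambda_))

for iter in range(200):
    # Record the operations used to compute the cost, for auto diff
    with tf.GradientTape() as tape:
        cost_value = cofi_cost_func_v(X, W, b, Ynorm, R, lambda_)
    # Derivatives of the cost with respect to X, W, and b
    grads = tape.gradient(cost_value, [X, W, b])
    # zip pairs each gradient with its variable, the ordering apply_gradients expects
    optimizer.apply_gradients(zip(grads, [X, W, b]))

final_cost = float(cofi_cost_func_v(X, W, b, Ynorm, R, lambda_))
```

Note that Adam replaces the hand-written update rule entirely: apply_gradients consumes the gradients from the tape and handles the per-parameter step sizes itself.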
But as I mentioned earlier, with TensorFlow and Auto Diff you're not limited to just gradient descent; you can also use a more powerful optimization algorithm like the Adam optimizer. The data set you use in the practice lab is a real data set comprising actual movies rated by actual people. This is the MovieLens dataset, and it's due to Harper and Konstan. I hope you enjoy running this algorithm on a real data set of movies and ratings, and seeing for yourself the results that this algorithm can get. So that's it; that's how you can implement the collaborative filtering algorithm in TensorFlow. If you're wondering, why do we have to do it this way? Why couldn't we use a Dense layer and then model compile and model fit? The reason we couldn't use that old recipe is that the collaborative filtering algorithm and cost function don't neatly fit into the Dense layer or the other standard neural network layer types of TensorFlow. That's why we had to implement it this other way, where we implement the cost function ourselves but then use TensorFlow's tools for automatic differentiation, also called Auto Diff, and TensorFlow's implementation of the Adam optimization algorithm, to let it do a lot of the work for us of optimizing the cost function. If the model you have is a sequence of dense neural network layers or other types of layers supported by TensorFlow, then the old implementation recipe of model compile and model fit works. But even when it doesn't, these TensorFlow tools give you a very effective way to implement other learning algorithms as well. So I hope you enjoy playing more with the collaborative filtering exercise in this week's practice lab. If it looks like there's a lot of code and lots of syntax, don't worry about it; we'll make sure you have what you need to complete that exercise successfully.
And in the next video, I'd like to move on to discuss more of the nuances of collaborative filtering, and specifically the question of how you find related items: given one movie, what are other movies similar to this one? Let's go on to the next video.