Welcome to the Python code for Module 5, Lecture 5. As you recall from the lectures, we are now going to go through the exact same exercise we did there, but using optimization, both gradient-free and gradient-based, and then compare the results with regression. It is a simple exercise showing that regression can also be written as an optimization problem. As usual, we load the packages and process the data; it is exactly the same data as before, so I don't need to go through that again. First, I go through gradient descent, and for visualization purposes, as in the lecture, I look at the two-dimensional case. I believe, if I'm not mistaken, I am regressing 5 against 2. I then show you the surface of the objective function, and after that the contours of that surface viewed from the top. A contour simply means what you are seeing here: every point on a given contour sits at the same level, so all those points have equal objective values. For the optimization I am using gradient descent, as I said, and specifically batch gradient descent, meaning I use the entire data set at every step. You could make it mini-batch, as I mentioned earlier, or do stochastic gradient descent; here I am doing full batch. I start with a very small learning rate and a maximum of 300 iterations, exactly consistent with what we had in the lecture slides. The starting point is (-100, -100), which is also exactly what we had. As I run this, that is what you see: the path starts from that point and moves inward. But, as you remember, this is actually a premature stop.
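The notebook itself isn't reproduced in this transcript, so here is a minimal sketch of the batch gradient-descent loop being described, under stated assumptions: the rate data isn't available here, so synthetic data stands in, and the function name, objective (mean squared error), and tolerance are my own choices rather than the notebook's.

```python
import numpy as np

def batch_gradient_descent(x, y, lr, max_iter=300, start=(-100.0, -100.0), tol=1e-6):
    """Batch gradient descent for y ~ b0 + b1*x: the full data set is used at every step."""
    b0, b1 = start
    for it in range(1, max_iter + 1):
        resid = (b0 + b1 * x) - y              # prediction error on the whole batch
        step0 = lr * 2.0 * resid.mean()        # lr times d(MSE)/d(b0)
        step1 = lr * 2.0 * (resid * x).mean()  # lr times d(MSE)/d(b1)
        b0, b1 = b0 - step0, b1 - step1
        if max(abs(step0), abs(step1)) < tol:  # stop once the updates become tiny
            break
    return b0, b1, it

# Synthetic stand-in data (assumed, not the lecture's rate data): true line is y = 1 + 2x
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=200)

# A very small learning rate hits the 300-iteration cap: a premature stop,
# with the fitted slope still far from the true value of 2
b0, b1, iters = batch_gradient_descent(x, y, lr=0.001)
```

With this tiny learning rate the run exhausts all 300 iterations and the coefficients are still nowhere near the least-squares solution, which is exactly the premature-stop behavior the lecture demonstrates.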
Because the learning rate is too small, it did not converge, and you see that the red line, which is supposed to go through this cloud of data, did not actually reach it because, as I said, the run stopped prematurely. These are exactly the parameters you remember from the slides: it stopped, but nowhere near the optimum, hitting 300 iterations without reaching the optimal point. In the next trial, I use a somewhat higher learning rate with the exact same starting point. This time, as you see, we stop at iteration 264, so we converge even before 300, and the red line now shows the best possible line you can pass through the data, 5 versus 2. I repeat this run, and these numbers are exactly what you remember from the slides as well. Now I increase the learning rate to 0.2, to see whether I can push it to converge sooner. The answer is yes: it converges in just 140 iterations, with the exact same line and the exact same coefficients, as you see here. I push it a bit higher, to a learning rate of 0.3. You will see that, even though it converges in just 95 iterations, at the start it could actually have diverged. In the next trial, as you remember from the slides, I push the learning rate even higher, but intentionally, in order to visualize it and make sure it does not go too far out of bounds, I stop after 50 iterations. When you do this, you see exactly that case: the path goes outside the contours and starts diverging completely. The red line is nowhere near the data; it diverges so much that it is almost impossible even to see the data on the plot. This learning rate is nowhere near a rate we should use. One needs to be very careful about which learning rate to work with.
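To make the learning-rate story concrete, the sweep described above can be sketched as follows. This is illustrative only: the data is synthetic, and the particular rates and iteration counts below are my assumptions, not the notebook's exact numbers (the lecture's 264/140/95 counts depend on its data).

```python
import numpy as np

def run_gd(x, y, lr, max_iter, tol=1e-8):
    """Batch gradient descent on mean squared error for y ~ b0 + b1*x."""
    b0, b1 = -100.0, -100.0                    # same starting point as the slides
    for it in range(1, max_iter + 1):
        resid = (b0 + b1 * x) - y
        step0 = lr * 2.0 * resid.mean()
        step1 = lr * 2.0 * (resid * x).mean()
        b0, b1 = b0 - step0, b1 - step1
        if max(abs(step0), abs(step1)) < tol:
            break
    return b0, b1, it

# Synthetic stand-in data: true line is y = 1 + 2x
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=200)

# A moderate rate converges; a larger (still stable) rate converges in fewer steps
_, b1_slow, it_slow = run_gd(x, y, lr=0.05, max_iter=5000)
_, b1_fast, it_fast = run_gd(x, y, lr=0.2, max_iter=5000)

# Too large a rate overshoots the bowl on every step and diverges; capping at
# 50 iterations, as in the lecture, keeps the numbers finite enough to plot
_, b1_bad, _ = run_gd(x, y, lr=2.0, max_iter=50)
```

The qualitative pattern matches the lecture: raising the rate shrinks the iteration count until, past the stability threshold, the iterates blow up instead of converging.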
Of course, there are optimization toolboxes that come with adaptive learning rates, like Adam and many others out there. But it was important for me that you see that even for a very simple problem like the one we had, a linear regression, if you want to use gradient descent, you need to be very careful about which learning rate you use. Know also that if the learning rate is not optimal, or is too small, you may need a very large number of iterations, or you end up with a premature stop that is nowhere near optimal. That is exactly consistent with what we had in the slides in the lecture. You can do the exact same thing for 30 versus 15. And remember, the only reason I used two rates was for visualization purposes. You could extend this to one rate against maybe two or three other rates, but if you do it that way, you cannot visualize it anymore. Thank you.
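Since the whole point of the exercise is that regression is itself an optimization problem, a quick sanity check is to solve the same least-squares objective both ways and confirm they agree. Again the data here is synthetic, standing in for the lecture's rate data; the learning rate and iteration budget are assumptions chosen generously so the descent fully converges.

```python
import numpy as np

# Synthetic stand-in data: true line is y = 1 + 2x
rng = np.random.default_rng(1)
x = rng.normal(size=300)
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=300)

# Route 1: closed-form regression, solving the least-squares problem directly
X = np.column_stack([np.ones_like(x), x])
beta_closed, *_ = np.linalg.lstsq(X, y, rcond=None)

# Route 2: gradient descent on the identical mean-squared-error objective
beta_gd = np.array([-100.0, -100.0])
for _ in range(5000):
    grad = 2.0 * X.T @ (X @ beta_gd - y) / len(y)
    beta_gd = beta_gd - 0.1 * grad

# Both routes minimize the same objective, so with a well-chosen learning rate
# and enough iterations they agree to numerical precision
```

This is the comparison the lecture draws: the regression coefficients and the fully converged gradient-descent coefficients are the same numbers, arrived at by two different routes.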