Let's use one example to illustrate that general principle. Let's say we want to minimize the function f(x) = 4 x_1^2 - 4 x_1 x_2 + 2 x_2^2. Now x belongs to R^2, so we have x_1 and x_2. If you take a look at this function, you can again do some analysis and see that the optimal solution is at (0, 0). If you cannot see it, that's not a big deal: all you need is some rearrangement, and then it can be proved. For example, f can be written as (2 x_1 - x_2)^2 + x_2^2, which is never negative and is zero only at (0, 0). If you cannot do that, that's fine, because that's not the important part of this example; just take it as given. We want to do some iterations and try to move towards (0, 0), so let's see whether we can do that.
Before that, because later we need to evaluate a lot of gradients, let's first get the functional form of the gradient. Differentiating f with respect to x_1, we get 8 x_1 - 4 x_2. With respect to x_2, we get -4 x_1 + 4 x_2. So that's the gradient. First, let's say we start at x^0 = (2, 3), which is this point on the figure, and we evaluate the function value there: plugging (2, 3) into f, we get 10. Step 1 is to get the gradient: plugging (2, 3) into the gradient, you get (4, 4). So we move in the opposite direction, which is towards the bottom left. Now we need to solve for the step size a_0. For a_0, we solve the minimization of f along this direction, which means we need to express the function values along this direction as a function of a. How to do that? Well, x^0 is something you know, it's (2, 3). Then you subtract a times the gradient at x^0, which is also something you know, it's (4, 4). After you do this, you get 2 - 4a and 3 - 4a. These two elements are the coordinates x_1 and x_2 along your improving direction, as functions of a.
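If you want to check these first numbers yourself, here is a minimal sketch in Python. It assumes the function is f(x) = 4 x_1^2 - 4 x_1 x_2 + 2 x_2^2, which is the form that matches the expansion used in this example. The names f, grad_f, and point_along_direction are just illustrative choices, not anything from the lecture.

```python
def f(x1, x2):
    # Objective assumed from the expansion used in the example.
    return 4 * x1**2 - 4 * x1 * x2 + 2 * x2**2


def grad_f(x1, x2):
    # Partial derivatives with respect to x1 and x2.
    return (8 * x1 - 4 * x2, -4 * x1 + 4 * x2)


def point_along_direction(x, g, a):
    # Point reached from x after a step of size a against the gradient g.
    return (x[0] - a * g[0], x[1] - a * g[1])


x0 = (2.0, 3.0)
g0 = grad_f(*x0)

print(f(*x0))                                # 10.0
print(g0)                                    # (4.0, 4.0)
print(point_along_direction(x0, g0, 0.25))   # (1.0, 2.0), i.e. (2 - 4a, 3 - 4a) at a = 0.25
```

Any value of a can be tried in the last line; the two coordinates always come out as 2 - 4a and 3 - 4a.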
You can evaluate the objective value at these points by plugging 2 - 4a and 3 - 4a back into the original function f. Then you do some arithmetic: this is going to be 4 (2 - 4a)^2 - 4 (2 - 4a)(3 - 4a) + 2 (3 - 4a)^2. All of this is a function of a, so you do some rearrangement and simplification, and eventually you get 32 a^2 - 32 a + 10. Once you get this, you see that it can be solved, and the optimal solution for a is one-half. Maybe you want to ask, how do we solve this problem? Well, this is a one-dimensional problem, and the a^2 term has a positive coefficient, so from your high school math you know this is a quadratic function that curves upward. If that's the case, all you need is to set the first-order derivative to zero, 64 a - 32 = 0, and you are done. Or you can use another high school technique, completing the square: the function is 32 (a - 1/2)^2 + 2, so again it is minimized at a = 1/2. That's still something you know how to do. As long as it's a one-dimensional function, it is not too difficult. In the worst case, you do a numerical search along this direction, and that's still fine. So we get a_0 = 1/2, and then we know how to move to our next point, x^1. Our x^1 is x^0, which is (2, 3), minus a_0, which is one-half, multiplied by the gradient (4, 4). The calculation leads us to (0, 1). That's this point here, according to our figure. Don't forget to take a look at your objective value: f(x^1) is now 2, which is much better than the original 10.
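The step-size calculation can also be checked with a short sketch. This is only one possible way to do it, not the method from the slides: because f along the direction is exactly a quadratic A a^2 + B a + C, its three coefficients can be recovered from three evaluations, and the minimizer is -B / (2A). The helper name phi_coefficients is hypothetical, and exact fractions are used so the results print as 1/2 rather than 0.5.

```python
from fractions import Fraction


def f(x1, x2):
    # Objective assumed from the expansion used in the example.
    return 4 * x1**2 - 4 * x1 * x2 + 2 * x2**2


def grad_f(x1, x2):
    return (8 * x1 - 4 * x2, -4 * x1 + 4 * x2)


def phi_coefficients(x, g):
    """Return (A, B, C) such that f(x - a*g) = A*a**2 + B*a + C.

    Valid only because f is quadratic, so the restriction to a line
    is a quadratic in a and three evaluations determine it exactly.
    """
    def phi(a):
        return f(x[0] - a * g[0], x[1] - a * g[1])

    C = phi(0)
    A = (phi(1) + phi(-1)) / 2 - C
    B = (phi(1) - phi(-1)) / 2
    return A, B, C


x0 = (Fraction(2), Fraction(3))
g0 = grad_f(*x0)

A, B, C = phi_coefficients(x0, g0)
a0 = -B / (2 * A)  # minimizer of an upward-curving quadratic (A > 0)
x1 = (x0[0] - a0 * g0[0], x0[1] - a0 * g0[1])

print(A, B, C)     # 32 -32 10
print(a0)          # 1/2
print(f(*x1))      # 2
```

For a function that is not quadratic, this three-point trick no longer gives the exact minimizer, and a numerical search along the direction would be the fallback.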
Along this direction, you move to the best point you can, and that gives you some improvement. We evaluate the gradient at x^1 = (0, 1). The gradient is (-4, 4), and its norm is 4 times the square root of 2. This is quite large, so we should go further. Now we do Step 2. Our second gradient is (-4, 4), so we move in the opposite direction, and we need to solve for the next step size. Again, we take x^1, which is (0, 1), minus a times the gradient there, which is (-4, 4). This gives the coordinates (4a, 1 - 4a), and plugging that into f you get 160 a^2 - 32 a + 2. Again, you can solve it and get a_1 = 1/10. This is the best you can do in this iteration. You take 1/10 as your value for a_1 and move for one iteration. That's how you move to your next point, x^2 = (2/5, 3/5). You again get some improvement: you go from height 2 to height 2/5. You evaluate the norm again. The gradient at x^2 is (4/5, 4/5), so its norm is 4 times the square root of 2, divided by 5. This is still not a very small number, so we should go further. If you do some more iterations, for example starting again from x^2, which you may want to do as an exercise, you will move to your next point, x^3. I'm not going to tell you the values for x^3, but I can give you a hint: for x^3, your first coordinate will again be 0, so the point again lies on the x_2 axis, just like x^1. Maybe you want to try that by yourself, and eventually you will see that you keep doing this until you essentially reach (0, 0).
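To see the whole procedure run, the iterations can be put into a loop. The sketch below is an illustration under a few assumptions: the function is f(x) = 4 x_1^2 - 4 x_1 x_2 + 2 x_2^2, the exact step size is found by fitting the one-dimensional quadratic from three evaluations (one possible choice, valid only for quadratic f), and the stopping rule is a threshold on the squared gradient norm. The names exact_step, steepest_descent, and tol_sq are hypothetical.

```python
from fractions import Fraction


def f(x1, x2):
    # Objective assumed from the expansion used in the example.
    return 4 * x1**2 - 4 * x1 * x2 + 2 * x2**2


def grad_f(x1, x2):
    return (8 * x1 - 4 * x2, -4 * x1 + 4 * x2)


def exact_step(x, g):
    # Minimize phi(a) = f(x - a*g), which is A*a**2 + B*a + C for quadratic f.
    def phi(a):
        return f(x[0] - a * g[0], x[1] - a * g[1])

    A = (phi(1) + phi(-1)) / 2 - phi(0)
    B = (phi(1) - phi(-1)) / 2
    return -B / (2 * A)


def steepest_descent(x, tol_sq, max_iters=50):
    """Return a list of (point, objective value, squared gradient norm)."""
    history = []
    for _ in range(max_iters):
        g = grad_f(*x)
        norm_sq = g[0] ** 2 + g[1] ** 2
        history.append((x, f(*x), norm_sq))
        if norm_sq <= tol_sq:  # gradient small enough: stop
            break
        a = exact_step(x, g)
        x = (x[0] - a * g[0], x[1] - a * g[1])
    return history


# Stop once the gradient norm is at most 1e-3 (squared norm at most 1e-6).
history = steepest_descent((Fraction(2), Fraction(3)), Fraction(1, 10**6))

# Print only the iterates worked out in the example.
for k, (x, fx, norm_sq) in enumerate(history[:3]):
    print(f"k={k}: x=({x[0]}, {x[1]}), f={fx}, |grad|^2={norm_sq}")
# k=0: x=(2, 3), f=10, |grad|^2=32
# k=1: x=(0, 1), f=2, |grad|^2=32
# k=2: x=(2/5, 3/5), f=2/5, |grad|^2=32/25
```

The squared norms 32 and 32/25 correspond to the norms 4 times root 2 and 4 times root 2 over 5. After working the exercise by hand, looking at history[3] is one way to check the answer for the next point.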