Now let's take a look at our second example. Again we want to minimize a function on a two-dimensional domain, so we have two decision variables, x_1 and x_2. The function looks a little more complicated now, but it is still a second-order function. As before, we first compute the gradient, which is shown here. I'm not going to go through the details; you may work them out yourself. Then we search for a point at which the gradient is zero. Let's first do some analysis to see which point satisfies that condition. Setting the gradient to zero gives a two-by-two linear system; solving it, by Gaussian elimination or whatever you like, yields (-2, -1) as the unique point that makes the gradient zero. Later we will show how the search process eventually converges to this first-order point. In fact, one can show that (-2, -1) is a global optimum for this particular function F, but we are going to skip that part because we don't yet have the tools to discuss it. We just want to use this example to show how the numerical calculation for gradient descent is done and how it converges to a first-order point. Say we start at (0, 0). Plugging (0, 0) into the gradient function gives (2, 0). Then we need to solve for the step size a, and we play the same trick as before: x_0 minus a times the gradient is (0, 0) - a(2, 0) = (-2a, 0). Plugging (-2a, 0) back into F, you very quickly see that you obtain a function of a alone: 4a^2 - 4a. We want the a that minimizes this function, and you may quickly see that the optimal solution is a = 1/2.
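The first iteration can be sketched in code. The transcript never writes F out explicitly; the quadratic below is an assumption that reproduces every number in the example (gradient (2, 0) at the origin, the line function 4a^2 - 4a, step size 1/2, and x_1 = (-1, 0)):

```python
def F(x1, x2):
    # assumed quadratic consistent with the transcript's numbers
    return x1**2 - 2*x1*x2 + 2*x2**2 + 2*x1

def grad_F(x1, x2):
    # gradient of the assumed F, computed by hand
    return (2*x1 - 2*x2 + 2, -2*x1 + 4*x2)

x0 = (0.0, 0.0)
g0 = grad_F(*x0)           # (2.0, 0.0)

# Moving to x0 - a*g0 = (-2a, 0) turns F into 4a^2 - 4a, a quadratic in a
# with positive leading coefficient, minimized where its derivative
# 8a - 4 vanishes: a = 1/2.
a0 = 0.5
x1 = (x0[0] - a0 * g0[0], x0[1] - a0 * g0[1])
print(x1)                  # (-1.0, 0.0)
```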
In that case, our x_1 comes from (0, 0) - (1/2)(2, 0), and the result is (-1, 0). So that's one typical iteration. We may do the same thing again: you compute another gradient vector, move along the opposite direction, and solve for a again. I'm not going to repeat all the steps for you; this time we minimize 8a^2 - 4a - 1, and this gives us the optimal step size a_1 = 1/4. If that's the case, then again you may use it to find the next solution. Just to remind you: when we apply gradient descent in this particular manner, we always formulate a sub-problem, which is itself a minimization problem. We want to find the step size that minimizes the height, minimizes the objective value, along the search direction. This sub-problem is always one-dimensional, even if originally you have an n-dimensional problem with n decision variables: your current point is some vector of values, you subtract a times the gradient, another vector of values, and somehow the outcome is a function of a only, so you may just optimize over a. Doing the optimization for a one-dimensional problem is in some cases still hard, but at least for these quadratic examples it is not so hard; we know how to do it. Whenever the function is quadratic in a, take a look at the leading coefficient: if it is positive, the function has an upward curvature, so the first-order solution is optimal. For example, here you take the first-order derivative and set it to zero; that's how you get a = 1/4. Or you play your high-school trick with the vertex of a parabola; that's also something you know how to do. Anyway, I'm going to delegate the derivation to you, and let's move to the next page.
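The second iteration follows the same pattern; here is a self-contained sketch, again using the assumed F = x1^2 - 2*x1*x2 + 2*x2^2 + 2*x1 that matches the transcript's numbers (the line function 8a^2 - 4a - 1, step size 1/4, and x_2 = (-1, -1/2)):

```python
def F(x1, x2):
    # assumed quadratic consistent with the transcript's numbers
    return x1**2 - 2*x1*x2 + 2*x2**2 + 2*x1

def grad_F(x1, x2):
    return (2*x1 - 2*x2 + 2, -2*x1 + 4*x2)

xk = (-1.0, 0.0)           # x_1 from the first iteration
g = grad_F(*xk)            # (0.0, 2.0)

# Along xk - a*g = (-1, -2a), F becomes 8a^2 - 4a - 1; its derivative
# 16a - 4 vanishes at a = 1/4, the transcript's step size.
a = 0.25
x2 = (xk[0] - a*g[0], xk[1] - a*g[1])
print(x2)                  # (-1.0, -0.5)
```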
The interesting thing is that if you depict the search route from x_0 to x_1 to x_2, you see that x_0 is here, x_1 is here, x_2 is here. If you also depict the gradient of F, which consists of two component functions, and draw the so-called first-order condition that the gradient must be zero, you get two straight lines: one is here, one is there. The two straight lines are nothing but the two gradient components set to zero, and their intersection, here, is (-2, -1). The thing is that we can see our algorithm searches along only one direction at a time. Why do we know that? Because for each of the gradients here, here, or here, one of the elements is zero, so each move goes either in this axis direction or in that one. Also, when you move along one direction, for example here, you obviously stop at (-1, 0). Why is that? Because along this direction, if you consider the move as a one-dimensional problem, then on this particular curve (-1, 0) has the smallest function value: it is the point where the first-order derivative is zero. Along the second direction it is the same thing: you must stop at (-1, -1/2), because along that direction the blue line is exactly where the gradient component is zero, and that is the minimum location. Knowing all this, we don't really need to carry out the next iteration using the original formula. We can predict that x_3 must be on this particular line at (-1.5, -0.5). Likewise, x_4 must be here, x_5 here, x_6 here, then x_7, x_8, and so on.
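Running the iteration for many steps confirms this geometric picture numerically. The sketch below uses the same assumed F = x1^2 - 2*x1*x2 + 2*x2^2 + 2*x1 and the closed-form exact step a = (g'g)/(g'Qg) that holds for any quadratic with Hessian Q; it recovers the predicted x_3 = (-1.5, -0.5) and converges toward the stationary point (-2, -1):

```python
def grad_F(x1, x2):
    # gradient of the assumed quadratic F
    return (2*x1 - 2*x2 + 2, -2*x1 + 4*x2)

Q = [[2.0, -2.0], [-2.0, 4.0]]   # Hessian of the assumed F

x = (0.0, 0.0)
history = [x]
for _ in range(40):
    g = grad_F(*x)
    Qg = (Q[0][0]*g[0] + Q[0][1]*g[1], Q[1][0]*g[0] + Q[1][1]*g[1])
    # exact line-search step for a quadratic: a = (g.g) / (g.Qg)
    a = (g[0]**2 + g[1]**2) / (g[0]*Qg[0] + g[1]*Qg[1])
    x = (x[0] - a*g[0], x[1] - a*g[1])
    history.append(x)

print(history[3])   # (-1.5, -0.5), as predicted from the picture
print(history[-1])  # approaches (-2, -1)
```

Printing the gradients along the way would also show that each one has a zero component, so every move is axis-aligned, exactly as described above.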
Eventually, you will really get to your optimal solution, the desired intersection. Here I tried to show you some properties of gradient descent search: you not only know how to do the numerical derivations, you can also see that each move really stops at the minimum-height point along its direction. Maybe you have also noticed from these two examples that gradient descent pretty much always turns at right angles; in the previous example the turns were also right angles. Somehow this is true whenever you move to the best point you can reach along the search direction in each iteration, but of course that's also something I'm not going to prove here. So at least you know one thing: for gradient descent, you know the general strategy, you know how to implement it, and you know how to carry out all the iterations. Then you have one tool that may help you solve, or at least generate a solution for, nonlinear programs.
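The right-angle observation can also be checked numerically: with exact line search, each new gradient is orthogonal to the direction just used. A quick check under the same assumed F = x1^2 - 2*x1*x2 + 2*x2^2 + 2*x1:

```python
def grad_F(x1, x2):
    # gradient of the assumed quadratic F
    return (2*x1 - 2*x2 + 2, -2*x1 + 4*x2)

Q = [[2.0, -2.0], [-2.0, 4.0]]   # Hessian of the assumed F

x = (0.0, 0.0)
grads = []
for _ in range(6):
    g = grad_F(*x)
    grads.append(g)
    Qg = (Q[0][0]*g[0] + Q[0][1]*g[1], Q[1][0]*g[0] + Q[1][1]*g[1])
    a = (g[0]**2 + g[1]**2) / (g[0]*Qg[0] + g[1]*Qg[1])  # exact step
    x = (x[0] - a*g[0], x[1] - a*g[1])

# consecutive search directions are perpendicular: dot products are zero
dots = [grads[k][0]*grads[k+1][0] + grads[k][1]*grads[k+1][1]
        for k in range(5)]
print(dots)   # [0.0, 0.0, 0.0, 0.0, 0.0]
```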