Hi everyone. Welcome back to operations research. This is our second course, where we talk about algorithms, and today we want to talk about something new: gradient descent and Newton's method. They are, pretty much, algorithms for solving nonlinear programs. First I will give an introduction, then we will talk about the gradient descent method, and then about Newton's method.
We want to solve nonlinear programs now. Previously we talked about the simplex method, which solves linear programs, and about branch and bound, which solves integer programs. Now it is time for nonlinear programs. Here we again rely on numerical algorithms for obtaining a numerical solution, which means we run several iterations so that eventually we get to an optimal solution. To do that, we first collect the values of all the parameters and construct a concrete problem. Once we have that, we design some kind of algorithm that eventually gives us values for the decision variables.
Just like the algorithms you have learned before, such an algorithm typically has the following properties. First, it is typically an iterative process. In many cases we have no way to reach an optimal solution directly; we move to a new point in one direction and then start the next iteration's search from that point. This is very similar to the simplex method, which goes from one corner point, or vertex, to another corner point, and then another. For nonlinear programming, most algorithms also adopt that strategy. Second, it is typically a repetitive process: in each iteration we are pretty much doing the same thing. If you think about the simplex method, it is also repetitive. Third, it is typically a greedy strategy: in each iteration we look for the best thing achievable in that iteration. Again, the simplex method is somewhat similar: I do not know where the global optimal solution is, but among my neighbors I will move to one that is better than where I am. Later we will see the same property.
Finally, and this may be new for nonlinear programming, we apply some kind of approximation strategy. In general, a nonlinear objective may look like a complicated curve, and for a multi-dimensional problem it may look even weirder. Such weird functions cannot be tackled, cannot be solved directly in some sense. What people do is rely on first-order or second-order approximations. That means at a point we consider the function's first-order approximation, or at a point we consider its second-order approximation. We then focus on the approximating linear or quadratic function, because its behavior is much nicer, and we will be able to find ways to deal with it. You will see how to do that.
Certainly we have some limitations. Typically we want to solve an NLP, which means we want to find an optimal solution. But because nonlinear problems are in general much harder than linear problems, our typical strategies may fail for several reasons. First, an algorithm may fail to converge. We mentioned that the simplex method guarantees to find an optimal solution, which in particular means it gives you a feasible solution; it stops in all cases and at least reports something.
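Before getting into these difficulties, here is a minimal sketch of the iterative, greedy, approximation-based strategy just described. The objective function, step size, and tolerance are made up purely for illustration; they are not from the lecture. At each iteration we use the first-order information (the gradient) to pick a direction, take a step, and stop once further iterations barely change the point.

```python
import numpy as np

def f(x):
    # A made-up smooth objective for illustration: f(x, y) = (x - 2)^2 + 2(y + 1)^2.
    return (x[0] - 2) ** 2 + 2 * (x[1] + 1) ** 2

def grad_f(x):
    # Gradient of f: the first-order information used to choose a direction.
    return np.array([2 * (x[0] - 2), 4 * (x[1] + 1)])

def gradient_descent(x0, step=0.1, tol=1e-6, max_iter=1000):
    """Iterative and greedy: from the current point, move along the negative
    gradient, i.e., the best local direction suggested by the first-order
    approximation, and repeat the same rule in every iteration."""
    x = np.array(x0, dtype=float)
    for k in range(max_iter):
        x_new = x - step * grad_f(x)
        # If further iterations barely modify the point, declare convergence.
        if np.linalg.norm(x_new - x) < tol:
            return x_new, k + 1
        x = x_new
    return x, max_iter

x_star, iters = gradient_descent([0.0, 0.0])
print(x_star, iters)  # approaches (2, -1), the minimizer of this made-up f
```

Newton's method would follow the same kind of loop but pick its step from the second-order (quadratic) approximation instead of the first-order one; we will get to that later.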
But for an NLP, maybe the algorithm does not converge. You go here, there, and there; you run many iterations but you never converge to a single point. In that case you do not even get a feasible solution. We say an algorithm converges to a solution if further iterations do not modify the current solution much. That means we reach a point where further iterations do not really give a useful improvement; in that case we may stop and say, okay, this is optimal. But sometimes an algorithm may fail to converge at all. Later you will see some examples.
Also, an NLP algorithm may in many cases be trapped in a local optimum. That is a serious problem for general nonlinear programs, and it means the starting point matters. A very simple example: say you want to minimize a function that has two valleys; depending on which valley you start in, you may end up at a local minimum instead of the global one. We will sketch a small experiment of this kind at the end of this section.
Another limitation is that most algorithms require the domain to be continuous and connected. This may not always be true, but it is true in many cases. That somehow means we are unable to deal with nonlinear integer programs. For special types of nonlinear integer programs we have some ways to deal with them, but a general nonlinear integer program is just too hard to solve. Once you learn gradient descent and Newton's method, I guess you will also agree that once the domain is restricted to integers, that is, once your feasible region consists of integer points, gradient descent and Newton's method fail.
We will point out these difficulties, but the remedies are beyond the scope of this course. Some of these issues require advanced theory, such as convex analysis. For today's lecture we will not talk about them; we will just show you the general strategy of gradient descent, the general strategy of Newton's method, and how they may be implemented, and that somehow helps you get some idea of how general nonlinear programs may be solved. Here we typically cannot guarantee an optimal solution; we cannot guarantee the solution is globally optimal. Maybe it is locally optimal, but we cannot guarantee it is globally optimal. We will show you the difficulty and a general strategy, but we will also show you that if you really want to be an expert in this field, there are many other things you may learn and try. That is about nonlinear programming.
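As the small experiment promised above: here is a made-up illustration of how the starting point matters. The function, step size, and starting points below are chosen only for this sketch, not taken from the lecture. The function has two valleys, and plain gradient descent ends up in whichever valley it starts in.

```python
def f(x):
    # A made-up nonconvex function with two valleys: local minima near x = -1 and
    # x = +1, where the left valley is lower because of the +0.3*x tilt.
    return (x ** 2 - 1) ** 2 + 0.3 * x

def df(x):
    # Derivative of f, used as the descent direction.
    return 4 * x * (x ** 2 - 1) + 0.3

def gradient_descent(x0, step=0.05, tol=1e-8, max_iter=10000):
    x = x0
    for _ in range(max_iter):
        x_new = x - step * df(x)
        if abs(x_new - x) < tol:  # stop once iterations barely change x
            break
        x = x_new
    return x

# The starting point decides which valley we end up in.
for x0 in (-2.0, 2.0):
    x_hat = gradient_descent(x0)
    print(f"start {x0:+.1f} -> x = {x_hat:+.4f}, f(x) = {f(x_hat):+.4f}")
# Starting at -2 reaches the lower (global) valley near x = -1;
# starting at +2 gets trapped in the local minimum near x = +1.
```

Both runs stop because further iterations barely change x, so both "converge", but only one of them reaches the global minimum.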