Okay, welcome back to Operations Research. Today we are still going to focus on algorithms, and we will shift our focus from linear programming and integer programming to nonlinear programming. We all know that nonlinear programs are quite different from linear or integer ones, so let's compare linear programs with nonlinear programs and discuss a little. If you are solving a linear program, everything looks like straight lines: the feasible region is a polygon, and because the objective function is also linear, you simply push toward one direction or another, so the optimal solution must lie at some corner. But if you talk about nonlinear problems, it is definitely different. The feasible region can take any shape. Moreover, the optimal solution may not lie on the boundary of the feasible region, because you may be minimizing the square root of something or maximizing the square of something, so the optimal solution may lie anywhere: here, there, any possibility. That creates a lot of difficulties in solving nonlinear problems. It is difficult, but we still need some ways to deal with it, and people do solve nonlinear problems by starting from some very basic observations.

So let's take a look at a small example. Say I have a one-variable nonlinear problem: this axis is x, our decision variable, and this axis is y, our objective value. If you try to minimize this function, then somewhere inside the feasible region is your optimal point. One idea is to start from one side, sweep through the whole feasible region to the other side, and once that one-dimensional search is complete, you know where the optimal solution is. That is fine if you have only one variable, but if you have thousands or millions of variables, it does not work.

So people came up with a better idea. If I want to solve a problem where the objective function goes up and down, what may we do? We take a look at where we are. Suppose I am at this point, x1. I ask myself: if I go to the left, what will happen? If I go to the right, what will happen? In this particular case, if I go to the left, the objective value will decrease; if I go to the right, it will increase. How do I know whether I will end up lower or higher? It depends on the derivative. If we take the derivative at this particular point, we get a hint about what will happen if I move to the right or to the left. Here, because the derivative is positive, we know that if I move to the right, the objective is going to increase, and so on. So if I am minimizing, I know I need to go to the left. That is the idea: if I want to go down because I am doing minimization, I ask myself, among all my feasible directions, where should I go to reach my target as quickly as possible? And that really depends on the derivative. Because we want to solve multi-dimensional problems with many variables, a single derivative is not enough; we need the derivatives along all directions.
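To make the one-variable picture concrete, here is a minimal sketch in Python. It is not code from the lecture: the objective function f, the starting point, the step size, and the iteration count are all made-up choices, just to show how the sign of the derivative at the current point tells us which way to move.

```python
# A minimal sketch (not from the lecture): use the sign of the derivative
# at the current point to decide whether moving left or right decreases f.

def f(x):
    return (x - 2.0) ** 2 + 1.0   # hypothetical objective, minimized at x = 2


def derivative(x, h=1e-6):
    # finite-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)


x = 5.0        # current point, like the x1 in the lecture's picture
step = 0.1     # how far we move each time (an arbitrary choice here)

for _ in range(200):
    d = derivative(x)
    if d > 0:
        x -= step   # derivative positive: f increases to the right, so go left
    else:
        x += step   # derivative negative: f decreases to the right, so go right

print(x)  # ends up near the minimizer x = 2
```

With a fixed step size the point just hops back and forth once it is near the minimum; choosing the step size more cleverly is exactly the kind of detail the later videos on gradient descent deal with.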
That is pretty much the gradient. The gradient is a term from vector calculus; if you are not familiar with it, that is fine, because in the lectures we will give you the basic definition and basic illustration. We want to use the gradient to help us go downhill, and the algorithm we are going to teach you is called gradient descent. Gradient descent is a very famous, very fundamental algorithm for solving nonlinear problems. It is used in all kinds of operations research problems, machine learning problems, and so on. So this will be one of our focuses for this week.

But sometimes gradient descent is not powerful enough, because your problem may be too complicated or the formulation may look too weird. So we are going to teach you another method, which is called Newton's method. Obviously Newton's method has something to do with Newton, whom some people call one of the greatest geniuses in history. Newton developed a way to solve nonlinear equations, and for nonlinear optimization there are places where we may borrow his idea. The basic idea is the following: when you are solving the problem we drew on the graph, the gradient, or the slope, gives you a first-order approximation. Some people say, well, maybe we can use a second-order approximation instead, and that is Newton's method. I will leave the details to the later videos, so I will stop here. Basically, gradient descent uses a first-order approximation and Newton's method uses a second-order approximation; they share the same idea, but you will see the difference. Sometimes gradient descent works better, and sometimes Newton's method works better; we will have some brief discussions about this. So next, let's start solving nonlinear programs.
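As a rough preview, the sketch below (my own illustration, not course code) runs both ideas on a made-up two-variable quadratic: gradient descent uses only the gradient, while Newton's method also uses the second derivatives. The function, starting point, step size, and iteration counts are all assumptions chosen just to show the mechanics.

```python
import numpy as np

# Hypothetical objective: f(x) = (x1 - 1)^2 + 4*(x2 + 2)^2, minimized at (1, -2).
def f(x):
    return (x[0] - 1.0) ** 2 + 4.0 * (x[1] + 2.0) ** 2


def gradient(x):
    # vector of partial derivatives: the "derivatives along all directions"
    return np.array([2.0 * (x[0] - 1.0), 8.0 * (x[1] + 2.0)])


def hessian(x):
    # matrix of second derivatives, the extra information Newton's method uses
    return np.array([[2.0, 0.0],
                     [0.0, 8.0]])


# Gradient descent: first-order idea, move against the gradient by a step size.
x = np.array([5.0, 5.0])
step = 0.1
for _ in range(100):
    x = x - step * gradient(x)
print("gradient descent:", x)   # approaches (1, -2) after many small steps

# Newton's method: second-order idea, let the Hessian decide the step.
x = np.array([5.0, 5.0])
for _ in range(5):
    x = x - np.linalg.solve(hessian(x), gradient(x))
print("Newton's method:", x)    # hits (1, -2) in one step for a quadratic
```

For this quadratic, Newton's method jumps to the optimum immediately while gradient descent needs many small steps; for messier functions the comparison can go either way, which is exactly the trade-off we will discuss in the later videos.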