>> In this module, we are going to talk about nonlinear optimization. We will review unconstrained optimization, and then briefly talk about constrained optimization, in particular Lagrangian relaxation and two of its applications: one to a utility maximization problem and another to a mean-variance portfolio selection problem. Let's start with unconstrained nonlinear optimization. What does unconstrained mean? It means that I'm allowed to choose any vector I like to try to minimize a function. To keep everything general, we will assume that I've got a function f of x. And what is x? X is a vector in Rn, a vector with n different components. It's an unconstrained problem, so I can minimize my function f of x over the entire space. It turns out that for nonlinear optimization, we have to distinguish between two kinds of minima. Ordinarily, we just want to find a vector x at which the function takes the minimum possible value. So, here's a picture. I've got a one-dimensional function. Here's my x, here's my function f of x. Now, when I'm trying to minimize this function f of x, all I need is this point. This point x star is the minimum. It's the global minimum: the best point over the entire real line. But as we go through this lecture, we will see that in nonlinear programming we try to find these points by looking at derivatives: the first derivative, which gives me the gradient, or the second derivative, which gives me the Hessian matrix. Derivatives, unfortunately, can only see the neighborhood of a point. They can't look far away, because derivatives are defined by taking limits of points coming arbitrarily close. So, in order to be able to work with derivatives, we have to have another notion, which is that of a local minimum. So, this point here is a local minimum. Why is it a local minimum?
Because if I just look in a neighborhood of this point, at points that live in this little interval around it, then within this little local neighborhood, this point is actually optimal. It is true that if I leave the neighborhood, I can find other points that are better: x star over here is the global optimum point, and therefore it's better than, or at least as good as, this local optimum point. But within this interval, I can't find any better point. So, for nonlinear optimization, we have two different notions. The global optimum point: that's where we want to get to. And local optimum points, which are only locally optimal: that's what we can actually find, because we use derivatives. So now, we are going to describe the conditions that we need in order to get a local minimum. Remember, local means only in a neighborhood, and these criteria are given by taking derivatives. At a local minimum, the gradient at that point must be 0. What's a gradient? It's a vector. Remember, this function takes a vector x in Rn, which has n components. The gradient is simply the vector of partial derivatives: I take the function, take the partial derivative with respect to the first component, the partial derivative with respect to the second component, and so on up to the nth component, and stack them up as a vector. When I say the gradient is equal to 0, I mean that every component is equal to 0. Here's a simple example. Let's take f of x equal to 2 x1 squared plus 3 x2. The partial of f with respect to the first component is 4 x1. The partial of f with respect to the second component is equal to 3, because differentiating 3 x2 leaves just the 3. So, the gradient is 4 x1 and 3. When is the first component equal to 0? When x1 is equal to 0. So, at any local minimum, it must be the case that x1 is equal to 0. (Notice, by the way, that the second component is always 3, so the gradient of this particular function is never entirely zero; the example is just to illustrate how the gradient is computed.) But the gradient condition alone is not sufficient.
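As a quick sanity check, the gradient in this example can be computed symbolically. Here is a minimal sketch using Python's SymPy (this is my own illustration, not part of the lecture):

```python
import sympy as sp

x1, x2 = sp.symbols("x1 x2")
f = 2 * x1**2 + 3 * x2

# The gradient stacks the partial derivatives into a vector.
grad = [sp.diff(f, var) for var in (x1, x2)]
print(grad)  # [4*x1, 3]
```

The first component vanishes exactly when x1 = 0, matching the computation above, while the second component stays at 3.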
We have to take a matrix of second derivatives, and that matrix must be positive semi-definite at any local minimum; if it is positive definite, you can be sure that the point is a local minimum. How do I construct this matrix? I take the second partial derivatives and stack them up as a matrix. For the (1, 1) entry of the matrix, I take the partial derivative with respect to x1 and then another partial derivative with respect to x1. For the (1, 2) entry, I take the partial derivative with respect to x1 and then a partial derivative with respect to x2. Just to recall: if a function is twice continuously differentiable, meaning you can take two derivatives, then it doesn't matter in which order you take them, whether you differentiate first with respect to x1 and then with respect to x2, or vice versa. We'll see an example in a moment. You stack these up in a matrix, and that matrix must have all non-negative eigenvalues. We won't get into the details of eigenvalues because we don't really need them explicitly; they can easily be computed in MATLAB. You just give the command eig, and it tells you the eigenvalues of the matrix. If all the eigenvalues are non-negative, the matrix is positive semi-definite, which is the necessary condition for a local minimum. If all the eigenvalues are strictly positive, then the point is definitely a local minimum. Now, a class of functions that will be particularly useful are the convex functions, and we will remind you about them when we use them in the course. The picture I want you to keep in mind is this: a function is convex if, whenever I take any two points on its graph and draw the straight line joining them, the straight line lies above the function. So, this is a convex function. Here's a function that is not convex, because if I take these two points and join them, the line does not lie above the function.
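The eigenvalue test described here is easy to run numerically. The lecture mentions MATLAB's eig; an equivalent sketch in Python with NumPy (the function name is my own) would be:

```python
import numpy as np

def classify_hessian(H, tol=1e-12):
    """Classify a symmetric Hessian matrix by the signs of its eigenvalues."""
    eigs = np.linalg.eigvalsh(H)          # eigenvalues of a symmetric matrix
    if np.all(eigs > tol):
        return "positive definite"        # the point is a local minimum
    if np.all(eigs >= -tol):
        return "positive semi-definite"   # necessary condition holds
    return "indefinite"                   # not a local minimum

print(classify_hessian(np.array([[2.0, 0.0], [0.0, 5.0]])))  # positive definite
print(classify_hessian(np.array([[1.0, 2.0], [2.0, 1.0]])))  # indefinite
```

The second matrix has eigenvalues 3 and -1, so a critical point with that Hessian cannot be a local minimum.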
So, convex functions are particularly nice: if I'm trying to minimize a convex function, I don't even have to check the Hessian. I just take the gradient, set it equal to 0, and I'm done. We'll walk through two examples of what is going on here. The first example is unconstrained nonlinear optimization. Here's the problem that I want to solve: minimize, over x in R2 (two components), the function x1 squared plus 3 x1 x2 plus x2 cubed. I first take the derivatives and set them equal to 0. I get 2 x1 plus 3 x2 as the partial derivative with respect to x1, and 3 x1 plus 3 x2 squared as the partial derivative with respect to x2. I set these equal to zero and solve. The first equation tells me how x1 and x2 are related. I plug that into the second equation, which gives me a quadratic with two roots. It turns out that the two solutions are x equal to (0, 0), or x with first component minus 9 over 4 and second component 3 over 2. Now for the second partial derivatives. I take the first expression and take its partial derivative with respect to x1: I get 2. I take the same expression and take its partial derivative with respect to x2: I get 3. For the second expression, if I take the partial derivative with respect to x1, I get 3, but I should have known that already, because it doesn't matter in which order I take derivatives, so the off-diagonal terms are always the same. If I take the second expression and differentiate it with respect to x2, I get 3 times 2 x2, which is 6 x2. If I plug in x equal to (0, 0), which means x2 is equal to 0, I get this matrix. That matrix is not positive semi-definite, so this point is not a local minimum: it actually has one positive eigenvalue and one negative eigenvalue.
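This whole worked example can be double-checked in a few lines of Python; the sketch below (using SymPy and NumPy, again my own illustration) finds both critical points and classifies the Hessian at each:

```python
import numpy as np
import sympy as sp

x1, x2 = sp.symbols("x1 x2")
f = x1**2 + 3 * x1 * x2 + x2**3

# First-order condition: solve grad f = 0.
grad = [sp.diff(f, var) for var in (x1, x2)]
critical_points = sp.solve(grad, [x1, x2], dict=True)

# Second-order condition: eigenvalues of the Hessian at each critical point.
H = sp.hessian(f, (x1, x2))
for pt in critical_points:
    eigs = np.linalg.eigvalsh(np.array(H.subs(pt).tolist(), dtype=float))
    print(pt, eigs)
```

At x = (0, 0) the eigenvalues have mixed signs (indefinite, so not a local minimum); at x = (-9/4, 3/2) both eigenvalues are positive.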
If I plug in x equal to (minus 9 over 4, 3 over 2), it turns out that this matrix is positive semi-definite, meaning it has non-negative eigenvalues, and, in fact, it's positive definite, meaning it has all positive eigenvalues; therefore, I know that this point is a local minimum. That's all we need to know, as far as this course goes, about unconstrained nonlinear optimization: set the gradient equal to 0, then figure out what happens with the Hessian, the matrix of second derivatives. Next, we want to take this idea and apply it to constrained problems. So, here's a constrained problem; a typical problem like this shows up in utility maximization. I have two different things that I could do: I could invest in one particular business or in another particular business, and I have a total wealth of 12, so x1 plus x2 equals 12. If I put money into the first business, I get 2 times log of 1 plus x1 as my return. If I put it into the other business, I get 4 times log of 1 plus x2 as the second return. Now, logs have diminishing returns, so even though the second business gives me the better incremental return, as I put more money into it I end up gaining less and less, and at some point the first project becomes competitive. The constraint tells me that the total amount of money I have is just 12. If there were no constraint, I would solve this problem easily, but this constraint makes my problem harder. It's a convex problem, because log is a concave function and I'm trying to maximize a sum of concave functions subject to a linear constraint; so in theory, this is easy. In order to get rid of the constraint, what we do is multiply it by a variable and add it to the objective. So, I've got 2 log of 1 plus x1, plus 4 log of 1 plus x2; this is just the objective.
Here v is my multiplier; it is called the Lagrange multiplier, because Lagrange was the first mathematician who came up with this idea. I take the Lagrange multiplier v, multiply it by the constraint x1 plus x2 minus 12 (what happened to the 12? I moved it to the other side, so the constraint reads x1 plus x2 minus 12 equals 0), and subtract it from the objective. Now, I throw away the constraint. Remember, we did something very similar in the linear optimization module: we dualized the constraints, multiplying them by multipliers which had to have a particular sign, then threw away the constraints, relaxed the problem, and just optimized the objective in order to get the dual linear program. I'm doing the same thing here, except that, because the constraint is an equality, there is no sign restriction: v can be anything. I've got a nonlinear objective, I'm subtracting a multiplier times the constraint, and then I'm going to throw away the constraint and pretend this is an unconstrained problem. I know that this problem is convex, so in order to get the optimum point, all I have to do is set the gradient to zero and solve. So, I take the gradient of this Lagrangian function. From the first term, I get 2 over 1 plus x1; from the multiplier term, I get minus v. That's the partial derivative with respect to x1, and it equals 0. For the partial derivative with respect to x2, I get 4 divided by 1 plus x2, minus v, and that is equal to 0. If I solve these, I get x1 equal to 2 over v, minus 1, and x2 equal to 4 over v, minus 1. How do I get this v? I know that at the optimal solution, x1 plus x2 must equal 12, so I plug these expressions in, and I end up with an equation that says 6 over v must equal 14, or v must equal 3 over 7. Once I know the value of v, I plug it back into the expressions for x, and I get x equal to (11 over 3, 25 over 3).
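The arithmetic of this example is easy to verify. Here is a small Python sketch of the Lagrangian solution just derived, using exact fractions so there is no rounding (my own illustration, not from the lecture):

```python
from fractions import Fraction

# Stationarity of the Lagrangian: 2/(1+x1) = v and 4/(1+x2) = v,
# so x1 = 2/v - 1 and x2 = 4/v - 1. The budget x1 + x2 = 12
# then gives 6/v - 2 = 12, i.e. v = 6/14 = 3/7.
v = Fraction(6, 14)
x1 = Fraction(2) / v - 1   # 11/3
x2 = Fraction(4) / v - 1   # 25/3

print(v, x1, x2)           # 3/7 11/3 25/3
```

Both first-order conditions and the budget constraint hold exactly at this point.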
So intuitively, this makes sense: you're going to invest more in the second opportunity, because it has the higher return. But as you scale up your investment, you have diminishing returns, so at some point it becomes profitable to go to the first one, and the correct balance is in the ratio 11 to 25. And look how easily we were able to solve this problem: by including a Lagrange multiplier, taking the gradient, and then solving to find what x1 and x2 are. The last piece of this module will show how to apply this to a particular problem which comes up over and over again in financial engineering: portfolio selection. What's going on here? I've got a bunch of assets, so my x belongs to Rn: there are n different assets, and x is a vector with n components. What does this constraint say? If I take the vector of all ones and multiply it with x, I get 1. This multiplication just gives me the sum, for j going from 1 through n, of xj. So, if I add up all the components of my portfolio, I have 1 dollar to invest. And what do I want to do? I want to choose a portfolio that maximizes my return minus a risk-aversion parameter lambda times the variance. Mu transpose x is the mean return on my portfolio, and x transpose V x, where V is the covariance matrix, gives me the variance of my portfolio. I want to find the portfolio x which sums to 1 and which maximizes this risk-adjusted return. Again, the constraint makes this problem harder, so I'm going to take the constraint, multiply it by a Lagrange multiplier, and put it into the objective. Here's my objective: mu transpose x, minus lambda times x transpose V x, minus v times the quantity 1 transpose x minus 1. Again, this is a convex problem with no constraints, so I'm going to take the derivative and set it equal to 0. If I take the gradient with respect to x, I get mu from the first term. Just to make this a little bit clearer: this mu is coming from the gradient of that first quantity.
Minus 2 lambda V x is coming from the gradient of the second quantity, and minus v times 1 is coming from the gradient of the constraint term. I set all of this equal to 0 and solve for x. What is x? X is 1 over 2 lambda, times V inverse, times mu minus v 1. What did I do? I took the minus 2 lambda V x term onto the other side, so the minus becomes a plus; then I divided by 2 lambda and multiplied both sides by the inverse of V, and I got x. So now, just as in the simple example we saw before, I have x in terms of v. How do I get v? Plug x into the constraint: 1 transpose x must equal 1. Putting in the expression for x, I get that 1 transpose times V inverse times mu minus v 1 must equal 2 lambda. Since v is just a scalar, I can rearrange terms to get v equal to 1 transpose V inverse mu, minus 2 lambda, all divided by 1 transpose V inverse 1. Now that I know the value of v, I can plug it back in and get an explicit expression for x. And again, this complicated portfolio selection problem has been solved by just using Lagrange multipliers. We will use this technique in the course to show how to generate efficient frontiers, to characterize the shape of these frontiers, and to show that any point on the efficient frontier can be generated by a small set of mutual funds; ultimately, that will lead to the capital asset pricing model, which is a very important methodology for pricing assets in the market.
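The closed-form solution just derived is straightforward to implement. Here is a sketch in Python with NumPy; the function name and the three-asset numbers for mu, V, and lambda are hypothetical, made up purely for illustration:

```python
import numpy as np

def mean_variance_portfolio(mu, V, lam):
    """Solve max mu'x - lam * x'Vx subject to 1'x = 1, using the
    Lagrange-multiplier formulas derived above."""
    ones = np.ones(len(mu))
    Vinv_mu = np.linalg.solve(V, mu)      # V^{-1} mu
    Vinv_1 = np.linalg.solve(V, ones)     # V^{-1} 1
    v = (ones @ Vinv_mu - 2 * lam) / (ones @ Vinv_1)   # scalar multiplier
    x = (Vinv_mu - v * Vinv_1) / (2 * lam)             # optimal portfolio
    return x, v

# Hypothetical data: 3 assets, risk aversion lambda = 2.
mu = np.array([0.08, 0.10, 0.12])
V = np.array([[0.04, 0.01, 0.00],
              [0.01, 0.09, 0.02],
              [0.00, 0.02, 0.16]])
x, v = mean_variance_portfolio(mu, V, lam=2.0)
print(x, x.sum())   # the weights sum to 1
```

By construction, the weights sum to one and the gradient of the Lagrangian, mu minus 2 lambda V x minus v 1, is zero at the returned x.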