In today's lecture, we will introduce two algorithms that may be used to solve unconstrained nonlinear programs. They are called gradient descent and Newton's method, and we will get to the details later. We talk about unconstrained problems here because this is the starting point for solving nonlinear programs. If there are constraints, the problem becomes, in some sense, much harder. There certainly are ways to handle constraints, but we are not able to cover them in this particular lecture. We will focus on unconstrained problems, and hopefully that gives you a starting point; in the future you may learn constrained programming.

The problem we want to solve is to minimize f of x, where x is a vector. We will assume that f is twice differentiable; that makes the lecture easier. Certainly, you may also need to deal with non-differentiable functions. That sometimes happens, for example, when a function contains an absolute value, which is not differentiable. Today we are talking about differentiable functions. We are not going to formally define what we mean by differentiable; basically, it means the function is smooth, so at any point you may find a first-order derivative and a second-order derivative. That is what we mean by it.

Our next step is to learn about gradients and Hessians. These are two things you should have seen in your calculus course. If you have not learned them, or you have forgotten them, let's take a look at what they are. For a function with n input variables, or n x-variables, we may differentiate the function with respect to each input. That gives us n first-order partial derivatives. Once we collect those first-order partial derivatives, we get a gradient vector. If we have a multivariate twice-differentiable function f, its gradient is denoted by a reversed triangle placed in front of f, that is, ∇f. The gradient is the first partial derivative, the second partial derivative, and so on up to the last partial derivative, collected into a column vector. A gradient is always a vector: if your function has n inputs, the gradient is an n-by-1 vector.

Those are first-order derivatives. Sometimes we also take second-order derivatives. In that case, we may differentiate f with respect to x_1 twice, or first with respect to x_2 and then x_1, or x_1 and then x_2, and so on. If you take all the combinations, you create a Hessian matrix of second-order partial derivatives. The Hessian is represented by the reversed triangle with a square, ∇²f. A Hessian is always an n-by-n matrix if the function has n inputs. In this course, every Hessian matrix we encounter will be symmetric: it does not matter whether you differentiate with respect to x_1 first or x_2 first. We will not prove that this is true, but it holds for every function in our lecture.

Let's take a look at a numeric example. Suppose f has three input variables and the functional form is x_1 squared plus x_2 x_3 plus x_3 cubed, that is, f(x) = x_1^2 + x_2 x_3 + x_3^3. To compute the gradient, we take first-order partial derivatives.
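Before we differentiate, it helps to see the general shapes we are aiming for written out rather than spoken. In standard notation (nothing here is specific to our course; these are just the usual calculus definitions, restated for reference), the gradient and Hessian of a twice-differentiable f from R^n to R are:

```latex
\nabla f(x) =
\begin{bmatrix}
  \dfrac{\partial f}{\partial x_1} \\
  \vdots \\
  \dfrac{\partial f}{\partial x_n}
\end{bmatrix}
\in \mathbb{R}^{n},
\qquad
\nabla^2 f(x) =
\begin{bmatrix}
  \dfrac{\partial^2 f}{\partial x_1^2} & \cdots & \dfrac{\partial^2 f}{\partial x_1\,\partial x_n} \\
  \vdots & \ddots & \vdots \\
  \dfrac{\partial^2 f}{\partial x_n\,\partial x_1} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}
\end{bmatrix}
\in \mathbb{R}^{n \times n}.
```

The symmetry mentioned above is exactly the statement that the (i, j) and (j, i) entries of this matrix agree.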
Differentiating with respect to x_1, the x_1 squared term gives 2x_1, and none of the other terms contain x_1, so the first partial derivative is 2x_1. Differentiating with respect to x_2, only the x_2 x_3 term contains x_2, so we get x_3. Differentiating with respect to x_3, the x_2 x_3 term leaves x_2 and the x_3 cubed term gives 3x_3 squared, so we get x_2 + 3x_3^2. Taking these three partial derivatives and collecting them is how you get your gradient.

For the Hessian, you pretty much start from the gradient: you take each element of the gradient and differentiate it with respect to x_1, x_2, and x_3. For the first element, 2x_1, differentiating with respect to x_1 gives 2; with respect to x_2 you get 0, because there is no x_2; and with respect to x_3 you get 0 again. For the second element, x_3, doing the same thing gives 0, 0, and 1. For the last element, x_2 + 3x_3^2, differentiating with respect to x_1 gives 0, with respect to x_2 gives 1, and with respect to x_3 gives 6x_3. That is how you get your Hessian matrix.

These expressions are general. If you want the values at a particular point, all you need to do is plug the point in. Numerically, if we evaluate the gradient of f at the point (3, 2, 1), the first element is 2 times 3, which is 6; the second element is 1; and the last element is 2 plus 3, which is 5. You simply plug in the values at that point, and that creates a numeric vector representing the gradient of f at that point. For the Hessian it is pretty much the same thing: for this particular Hessian, all we need to do is plug in x_3 = 1, so the 6x_3 entry becomes the numeric value 6. The values of the gradient and the Hessian will of course be useful later, and we will see how they may be used. Before that, hopefully you will do some exercises to make sure you know what a gradient is, what a Hessian is, and how you may obtain those quantities for a given function.
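If you want to double-check hand calculations like these, a symbolic package can reproduce them. Below is a minimal sketch using SymPy for our example function; it is not something we rely on in the course, just one way to verify the gradient and Hessian we derived above.

```python
import sympy as sp

# Symbols and the example function f(x) = x1^2 + x2*x3 + x3^3
x1, x2, x3 = sp.symbols('x1 x2 x3')
f = x1**2 + x2*x3 + x3**3

# Gradient: column vector of first-order partial derivatives
grad = sp.Matrix([sp.diff(f, v) for v in (x1, x2, x3)])

# Hessian: matrix of all second-order partial derivatives
hess = sp.hessian(f, (x1, x2, x3))

print(grad)   # Matrix([[2*x1], [x3], [x2 + 3*x3**2]])
print(hess)   # Matrix([[2, 0, 0], [0, 0, 1], [0, 1, 6*x3]])

# Evaluate both at the point (3, 2, 1)
point = {x1: 3, x2: 2, x3: 1}
print(grad.subs(point))   # Matrix([[6], [1], [5]])
print(hess.subs(point))   # Matrix([[2, 0, 0], [0, 0, 1], [0, 1, 6]])
```

The printed results match the gradient (6, 1, 5) and the Hessian with the 6x_3 entry becoming 6 that we computed by hand.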