Welcome. In this lecture you will learn how to differentiate functions that involve vectors and matrices. I assume that you already know how to differentiate functions of one variable. The results in this lecture apply to functions that are smooth enough to be differentiated two or more times. An important building block of econometrics is the linear model, y = Xb + e. In this model, y is a p by 1 vector with observations of the dependent variable, X is a p by q matrix with p observations of each of the q explanatory variables, b is a q by 1 vector with coefficients, and e is a p by 1 vector with residuals. For a given vector b, we calculate the residuals as y - Xb. The sum of squared residuals e'e measures the distance between the dependent variable and the linear model. The smaller this distance, the better the model, so we want to determine b such that it minimizes this distance. This approach means that we see the sum of squared residuals as a function of the vector b. We can show explicitly how this function depends on b by substituting y - Xb for e and multiplying out the parentheses. In building block M2 we derived the expression shown on the slide. To minimize it with respect to b, you need to know how differentiation works for vectors. Our starting point is the partial derivative. Let f be a function of two variables, x and y. We find the partial derivative of f with respect to x by treating y as a constant, so we differentiate f as a function of x only, with an unknown constant y. We do the same for the partial derivative of f with respect to y. We use a curly d, written ∂, to denote the partial derivative. Let me show you two examples. When f equals x + 2y, the partial derivative of f with respect to x equals 1, and the partial derivative with respect to y equals 2. When f equals x²y, the partial derivative with respect to x equals 2xy, and the partial derivative with respect to y equals x². The usual differentiation rules, like the product and chain rules, also apply to partial differentiation. Now I ask you to use these rules to answer the following question. To find the answers shown on the slide, it is important to apply the chain rule correctly. You can find the differentiation rules in any book on calculus. When f is a function of a q by 1 vector b, it is a function of q variables, labeled b_1 up to b_q. We find the partial derivative with respect to b_i by treating f as a function f_i of the single variable b_i, with the remaining elements of b held constant. We collect the q partial derivatives in a column vector called the gradient. Consider the linear function f(b) with given weights a_i, as shown on the slide. The partial derivative of f with respect to b_i is equal to a_i. Consequently, the gradient of the function f is equal to the vector a. Now, let's consider a more complicated function, f(b) = b'Ab, where A is given. This function returns a scalar, so we can determine its gradient. To find the partial derivatives, we first write f as a double summation. When we differentiate f with respect to b_i, we use the product rule. We obtain the first term from the cases where the index j equals i, and the second term from the cases where the index k equals i. Switching back to matrix notation, we find the partial derivative as row i of the matrix A times b, plus b' times column i of the matrix A.
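As an aside, here is a small numerical sketch of this last result, assuming Python with NumPy; the matrix A, the vector b, and the index i are made up for illustration. It compares the partial derivative just derived with a finite-difference approximation.

```python
import numpy as np

# Made-up example for f(b) = b'Ab with a 3 x 3 matrix A.
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
b = rng.normal(size=3)
i, h = 1, 1e-6  # index of the partial derivative and step size

def f(b):
    return b @ A @ b

# The partial derivative just derived:
# row i of A times b, plus b' times column i of A.
analytic = A[i, :] @ b + b @ A[:, i]

# Central finite-difference approximation of the partial derivative w.r.t. b_i.
e_i = np.zeros(3)
e_i[i] = 1.0
numeric = (f(b + h * e_i) - f(b - h * e_i)) / (2 * h)

print(analytic, numeric)  # the two numbers should agree closely
```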
Both terms in this partial derivative are scalars, and when we transpose the second term, we can simplify the expression to row i of (A + A') times b. Finally, we get the gradient by collecting the partial derivatives in a vector. The first element uses the first row of A + A', the second element uses the second row of A + A', and so on. Consequently, we can express the complete gradient as (A + A')b. Perhaps you should take a moment to check all the steps in this derivation. Now, a question for you regarding the gradients of the functions f and g. The gradient of f is equal to a, and it is defined as a column vector. Using the results of the previous slides, the gradient of g is equal to (C + C')b. Because C is symmetric, C' = C, so we can simplify the expression to 2Cb. We can go one step further and determine the second order partial derivatives. We can view the two first order partial derivatives of f as new functions g and h, for which we can again determine the partial derivatives. Since both functions again have two partial derivatives, we find in total four second order partial derivatives. We have already seen that the function f = x²y has partial derivatives 2xy and x². The first function produces derivatives 2y and 2x, while the second produces 2x and 0. Notice that the mixed partial derivatives ∂²f/∂y∂x and ∂²f/∂x∂y are both equal to 2x. This is generally the case for the smooth functions we consider in this course. When f is a function of a vector b of size q, the first order partial derivatives give the gradient. The function f has q² second order partial derivatives, and we collect them in the so-called Hessian matrix with dimensions q by q. Under our smoothness conditions on the function f, the Hessian matrix is symmetric. The gradient and the Hessian matrix play a key role in finding extreme values of a function. To optimize the function f(b), we use the common approach to optimization: we differentiate the function f with respect to b, solve the first order condition, and check the second order condition. The first order condition for an optimum at the vector b* states that all first derivatives, so the gradient, should be equal to 0. This condition gives a system of q equations. We use b* to distinguish this specific vector from the general vector b. The second order condition states that the second derivative evaluated at b*, so the Hessian matrix, should be positive definite for a minimum and negative definite for a maximum. What do positive and negative definite mean? Well, a matrix A is positive definite when x'Ax is positive for every nonzero vector x. The expression x'Ax is called a quadratic form. When the quadratic form is negative for every nonzero vector x, the matrix is called negative definite. When we replace positive by nonnegative, we call the matrix A positive semi-definite, abbreviated to PSD. Replacing negative by nonpositive yields negative semi-definite. A matrix for which the quadratic form takes both positive and negative values is called indefinite. Again, a question for you regarding definiteness. We show that B = A'A is positive definite in two steps. First, let us define the p by 1 vector c = Ax. Then we can rewrite the quadratic form as c'c. Now, c'c is a sum of squares, and since squares are always nonnegative, their sum is nonnegative too. So the matrix B is at least positive semi-definite. Second, we have seen in building block M2 that because the rank of A is equal to q, c can only be 0 when x = 0. For any nonzero x, c will be nonzero too, and c'c will be strictly larger than zero, which makes B positive definite.
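Here is another small numerical sketch of this argument, again assuming Python with NumPy; the matrix A is made up, with p = 5, q = 3 and full column rank. It checks that the quadratic form x'Bx equals the sum of squares c'c, and it also uses the fact that a symmetric matrix is positive definite exactly when all of its eigenvalues are strictly positive.

```python
import numpy as np

# Made-up p x q matrix A (p = 5, q = 3) with full column rank.
rng = np.random.default_rng(1)
A = rng.normal(size=(5, 3))
B = A.T @ A  # the matrix B = A'A from the question

# The quadratic form x'Bx equals c'c with c = Ax, a sum of squares,
# so it is nonnegative for every x and zero only when Ax = 0.
for _ in range(5):
    x = rng.normal(size=3)
    c = A @ x
    print(np.isclose(x @ B @ x, c @ c), x @ B @ x >= 0)

# Equivalent check: a symmetric matrix is positive definite exactly
# when all of its eigenvalues are strictly positive.
print(np.linalg.eigvalsh(B))
```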
Next, let's consider minimizing the function f = a'b + b'C'Cb, where C is a p by q matrix with rank q. The gradient of f equals a + 2C'Cb. We know from building block M2 that C'C has full rank and is invertible. We also saw there that the system of first order conditions has a unique solution, b* = -1/2 (C'C)⁻¹a. To check whether this is a minimum, we need to check the second order condition. The Hessian matrix is equal to 2C'C, which does not depend on b. We have just seen that C'C is positive definite, so b* is indeed a minimum. Now, I invite you to do the training exercise to practice the topics of this lecture. You can find this exercise on the website. This concludes our building blocks on matrix algebra.
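And one last aside: if you want to check this minimization numerically before trying the exercise, here is a small sketch, assuming Python with NumPy; the matrix C and the vector a are made up. It computes b*, verifies that the gradient vanishes there, and compares f(b*) with the value of f at a nearby point.

```python
import numpy as np

# Made-up p x q matrix C with rank q, and a q x 1 vector a.
rng = np.random.default_rng(2)
C = rng.normal(size=(6, 3))
a = rng.normal(size=3)

def f(b):
    # f(b) = a'b + b'C'Cb
    return a @ b + b @ C.T @ C @ b

# Solution of the first order condition: b* = -1/2 (C'C)^(-1) a.
b_star = -0.5 * np.linalg.solve(C.T @ C, a)

# The gradient a + 2C'Cb should be (numerically) zero at b*.
print(np.round(a + 2 * C.T @ C @ b_star, 10))

# The Hessian 2C'C is positive definite, so f is larger at any other point.
b_near = b_star + 0.01 * rng.normal(size=3)
print(f(b_star) < f(b_near))
```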