Hi, I'm Jaekwang Kim from Sungkyunkwan University. In this video, we'll look at linear regression. This is the first lecture on linear regression, so let's begin with a brief preview. Given these green points in two-dimensional space, linear regression aims to draw a straight line through them. Today, I want to talk about the problem definition of linear regression and then give a solution.

As we've seen, this is a two-dimensional space, and here are the data samples, each composed of x_i and y_i, where x_i is our input and y_i is the output. Linear regression aims to draw a straight line that fits this dataset; the goal is to find the line which best fits the data. We set up a hypothesis f(x), which is our line: f(x) = w_0 + w_1 x_1 + w_2 x_2 + ... + w_d x_d, and we can summarize it with this sum. The linear regression problem can then be defined like this: given a hypothesis function of this form and a dataset D = {(x_1, y_1), ..., (x_n, y_n)}, where x_i is the input and y_i is the output, the goal of linear regression is to find the line f(x) which best fits the data D.

Here, "best fit" needs to be defined: we want f(x_i) to be close to y_i. On the plot, this is the true value y_i and this is the value f(x_i) predicted by the hypothesis function. With this, we can write the problem as an arg min over w: the error function depends on w, where w is the vector from w_0 to w_d, and the error as a function of w can be rewritten like this. The goal of linear regression is to find the w that minimizes this error function. The hypothesis function f(x) is written like this and can be summarized like this. In this two-dimensional space, that means we compute the error between each data point and the hypothesis function and then sum everything up; this sum is the total error. We can design two kinds of total error: one uses the absolute value of the difference, and the other uses the squared difference. The first is the L1 norm, and the second is the L2 norm. Our goal is to minimize this error function with respect to w.

We can now restate the problem definition of linear regression. The goal is to find the vector w that minimizes the error function E(w), where E(w) is defined like this, f(x_i) is the hypothesis function, and D = {(x_1, y_1), ..., (x_n, y_n)} is the dataset. This is the linear regression problem definition.

How do we solve this problem? At the point that minimizes the error function, the partial derivatives with respect to the w values are zero. So from w_0 to w_d, we differentiate E(w) with respect to each w_j and set the result to zero. For j = 0, we get the equation shown here, and for j >= 1, we get this equation. With these equations, we can calculate the minimizing w. From the error function, we obtain d + 1 equations in the d + 1 unknowns w_0, ..., w_d, so we can solve the system. We can expand this equation like this, and in the same manner expand all of the equations and collect them into this form. This form can be written as a matrix equation: we define the matrix A from these coefficients, w is the vector of unknowns, and b is defined like this. The whole system then simplifies to the matrix equation Aw = b, and our goal is to find w.
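To make the derivation concrete, here is a minimal Python sketch (not from the lecture slides) that builds the system Aw = b element by element from the derivative conditions and solves it; the small dataset in it consists of made-up example values.

```python
import numpy as np

# Build and solve the (d+1) x (d+1) system A w = b from the derivative
# conditions dE/dw_j = 0. Example data only (not from the slides).
X_raw = np.array([[1.0], [2.0], [3.0], [4.0]])   # n x d input matrix (here d = 1)
y = np.array([1.7, 2.5, 3.4, 4.1])               # n output values

n, d = X_raw.shape
X = np.hstack([np.ones((n, 1)), X_raw])          # prepend the constant input x_0 = 1

A = np.zeros((d + 1, d + 1))
b = np.zeros(d + 1)
for j in range(d + 1):
    for k in range(d + 1):
        A[j, k] = np.sum(X[:, j] * X[:, k])      # coefficient of w_k in equation j
    b[j] = np.sum(X[:, j] * y)                   # right-hand side of equation j

w = np.linalg.solve(A, b)                        # solve A w = b
print(w)                                         # [w_0, w_1]
```

Note that these sums are exactly the entries of X^T X and X^T y, which is the matrix notation introduced next.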
To isolate w, we multiply both sides by the inverse of A, which gives w = A^{-1}b.

Here is another notation for our linear regression formulation. The goal is still to find the w that minimizes the error function E(w), so it is very similar; the difference is that we add a constant input x_0 = 1, and with this input everything can be summarized very simply, so the notation becomes much cleaner. The matrix A can be represented with these coefficients, the vector w is the same as before, and the vector b can be written in this simplified form. This is the part that changes in the new notation. Here we collect the inputs into the matrix X and likewise arrange the outputs into the vector y. With this matrix and this vector, the matrix A and the vector b defined before can be rewritten: A becomes X^T X, and b becomes X^T y. The main goal is to obtain the value of w, and we can now compute it directly as w = (X^T X)^{-1} X^T y. To see this concretely, the matrix X looks like this and the vector y looks like this.

Here is a very simple example of linear regression. We have input x and output y in a two-dimensional space, with data samples like this, and we want to find the straight line that best fits the data. The solution is f(x) = 0.840708 + 0.834071x. Here we put a column of ones in the first column of X and the input values in the next column, and the output vector y is written like this. We can easily calculate the matrix A as X^T X and the vector b as X^T y, and then w is easily obtained from this calculation (see the short code sketch after the summary). Compared with the original data, the fitted line looks like this.

This is the summary of this video. I presented the problem definition of linear regression and gave a solution: with this procedure, we can easily obtain the matrix A and the vector b as X^T X and X^T y, and from them the weights w. Thank you.
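For reference, here is a minimal Python sketch of the closed-form computation shown in this video; the data points below are placeholder values chosen for illustration, not the ones on the slide, so the resulting coefficients will differ.

```python
import numpy as np

# Closed-form linear regression: w = (X^T X)^{-1} X^T y.
# Placeholder data for illustration (not the values from the slide).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])      # inputs
y = np.array([1.8, 2.4, 3.4, 4.0, 5.1])      # outputs

X = np.column_stack([np.ones_like(x), x])    # first column is the constant input x_0 = 1

A = X.T @ X                                  # matrix A = X^T X
b = X.T @ y                                  # vector b = X^T y
w = np.linalg.inv(A) @ b                     # w = A^{-1} b

print(w)  # w[0] is the intercept w_0, w[1] is the slope w_1
```

In practice, np.linalg.solve(A, b) or np.linalg.lstsq(X, y, rcond=None) is numerically preferable to forming the explicit inverse, but the explicit inverse matches the derivation in the video.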