[MUSIC] So imagine we have these points scattered this way. You can see this is a pretty cleanly produced data set, where there is a regression line that probably describes the data pretty well. And, in fact, that's no surprise, because I generated it to have that effect for illustrative purposes.

So the process here is that we first split the data into a training set and a test set. That's always the first thing you want to do if it hasn't been done for you already. Here we just split them half and half; we talked a couple of segments ago about various ways to do this, but here we're just going to train on one half and test on the other. Further, I've scaled the data, so the original domain, roughly 0 to 60 on one axis and 0 to 600 on the other, is scaled down to variation around 0. What I did here is actually a little suspect, because I scaled the data all at once and then split it into training and test data, so the test data influenced how the training data was scaled, which is typically a no-no. You may want to scale them separately, fitting the scaler on the training data only.

So how does gradient descent work? In this plot, the two axes are the two parameters of the regression line, theta zero and theta one: the y-intercept and the slope, essentially. These are the two parameters of a single line, and that's the equation down here. The i refers to the iteration number, so this is really iteration zero that we're looking at, or really the difference between iteration zero and iteration one. We started this process off at this point, and we picked that point randomly. There are various deterministic ways to choose a starting point, but in general, the fact that you have to decide on a starting point is one of the weaknesses of gradient descent. So we start off at a single point, find the direction of steepest descent, and take a step in that direction.

The cost function we're trying to minimize says: take the response variable y minus the function applied to the input variable, where the function here is just a linear model, intercept plus slope times x; square that difference; and add those up over all the data points. In other words, J(theta0, theta1) = sum over i of (y_i - (theta0 + theta1 * x_i))^2. That's the total error you incur by trying to explain all the data with the particular regression line we started with, this first starting point. So we compute that gradient, and I haven't shown on this slide how to do that, but you compute that gradient and jump down quite a bit: the error goes down a lot in this first step, as we walk from here to here in parameter space. Now we've gone from one regression line to another regression line. Here's the first regression line we started with, and we just got lucky that it was already pointed in roughly the right direction; we may not have been, it could have gone the other way.

So on the next slide, we take another step, and the error goes down a little bit more. We take another step toward the minimum over here, toward the center, and we get another regression line that has rotated slightly, so it's closer to what we intuitively think will describe the data. And we can keep going with this, stepping down as the regression line gets better and better and better and the error goes down; we're finding a local minimum in that error, until we decide that it has converged.
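To make that concrete, here is a minimal sketch of the whole procedure just described: split first, scale using training statistics only (avoiding the no-no mentioned above), then run gradient descent on the squared-error cost. The synthetic data, the learning rate, the threshold, and all variable names are my own illustrative choices, not taken from the lecture.

```python
import numpy as np

# Synthetic data resembling the slides: x roughly 0-60, y roughly 0-600.
rng = np.random.default_rng(0)
x = rng.uniform(0, 60, 100)
y = 10 * x + rng.normal(0, 40, 100)

# Split FIRST, then scale with statistics from the training half only,
# so the test set cannot influence how the training data is scaled.
x_train, x_test = x[:50], x[50:]
y_train, y_test = y[:50], y[50:]
x_mu, x_sd = x_train.mean(), x_train.std()
y_mu, y_sd = y_train.mean(), y_train.std()
xs = (x_train - x_mu) / x_sd
ys = (y_train - y_mu) / y_sd

# Gradient descent on J(theta0, theta1) = sum_i (y_i - (theta0 + theta1*x_i))^2
theta0, theta1 = rng.normal(size=2)  # randomly chosen starting point
alpha = 0.01                         # step size (my choice)
tol = 1e-9                           # convergence threshold (my choice)
prev_cost = np.inf
for i in range(10_000):
    residual = ys - (theta0 + theta1 * xs)
    cost = np.sum(residual ** 2)     # total squared error for this line
    if prev_cost - cost < tol:       # error stopped improving: converged
        break
    prev_cost = cost
    # Partial derivatives of J with respect to theta0 and theta1.
    grad0 = -2 * np.sum(residual)
    grad1 = -2 * np.sum(residual * xs)
    # Step in the direction of steepest descent.
    theta0 -= alpha * grad0 / len(xs)
    theta1 -= alpha * grad1 / len(xs)

print(f"converged after {i} steps: theta0={theta0:.4f}, theta1={theta1:.4f}")
```

Each pass through the loop corresponds to one of the lines on the slide: a new (theta0, theta1) pair, a new regression line, and a slightly smaller error.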
We decide it has converged when the difference between the error at iteration i and the error at iteration i+1 is less than some threshold. In fact, in this case, it looks like it did pretty well: what we see is the minimum here, and this line looks like it describes the data pretty well, which we can see on the next slide. So again, each one of these lines corresponds to a slope and an intercept, the two parameters theta zero and theta one; this line is the theta zero and theta one from iteration one, while this next line is the theta zero and theta one from iteration two. So at each step, we form a new line, test how well it describes the data, and proceed. But what I haven't said is how we figure out what direction to actually walk in, nor how far to walk. So let's think about that. [MUSIC]
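As a quick illustration of testing how well the converged line describes data it has never seen, this continues the sketch above (all names come from that earlier snippet, which is my own construction, not the lecture's):

```python
# Scale the held-out half with the TRAINING statistics, then measure
# the same squared-error cost on data the fit never saw.
xt = (x_test - x_mu) / x_sd
yt = (y_test - y_mu) / y_sd
test_residual = yt - (theta0 + theta1 * xt)
print("held-out squared error:", np.sum(test_residual ** 2))
```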