[MUSIC] Well, let's see this sparsity in action. And for this, we're gonna go back to our polynomial regression example. But instead of just doing least squares for increasing polynomial orders or looking at ridge regression, we're now gonna look at our lasso solution, again for different values of our tuning parameter lambda. Here we are back at our polynomial regression demo, and for this lasso regression example, all we have to do is take our polynomial regression function and modify it, because we're using GraphLab Create and its graphlab.linear_regression function. Remember, there was this l2 penalty and l1 penalty. Well, now we fully understand what these two penalties are, and we know that for ridge, we were looking at the l2 penalty. But for lasso, the l2 penalty is gonna be set to zero and instead, we're gonna focus on this l1 penalty. And in this demo, we're gonna explore different values of that l1 penalty. And I also wanna mention that here, we're specifying a solver. So FISTA is one solver. We're gonna look at other ways of optimizing the lasso objective in this module, but we can think of this as just a fancy version of gradient descent. So this is our polynomial lasso regression function, and now we're just gonna explore a set of l1 penalty values. So again, these are lambda values going from 1e-4 all the way up to a value of 10. And like in our ridge regression demo, we're starting with our 16th order polynomial, which was that really crazy fit in the least squares, unregularized case. But now, let's think about what is going to happen in this lasso scenario. And so what we see is that when the penalty strength is really small, we don't get any sparsity at all. So we have a 16th order polynomial, which means 17 coefficients in total, cuz there's 16 powers of x plus this intercept term. And what we're seeing is that all 17 of them are non-zero, and that makes sense, because as lambda becomes really, really small.
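The demo itself uses GraphLab Create, but the same experiment can be sketched with scikit-learn as a stand-in, where `Lasso`'s `alpha` plays the role of the l1 penalty lambda. The noisy sine data below is an assumption standing in for the demo's dataset, so the exact nonzero counts will differ from the ones quoted in the lecture:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Toy data standing in for the demo's dataset (an assumption).
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 30))
y = np.sin(4 * x) + rng.normal(scale=0.3, size=30)

def polynomial_lasso(x, y, degree=16, l1_penalty=1e-4):
    """Degree-16 polynomial fit with an l1 (lasso) penalty."""
    model = make_pipeline(
        PolynomialFeatures(degree, include_bias=False),
        StandardScaler(),
        Lasso(alpha=l1_penalty, max_iter=100_000),
    )
    model.fit(x[:, None], y)
    return model

counts = []
for l1_penalty in [1e-4, 1e-2, 1e-1, 10.0]:
    model = polynomial_lasso(x, y, l1_penalty=l1_penalty)
    coef = model.named_steps['lasso'].coef_
    # 16 powers of x plus the unpenalized intercept = 17 coefficients total
    counts.append(np.count_nonzero(coef) + 1)
    print(f'l1_penalty={l1_penalty:g}: {counts[-1]} nonzero coefficients')
```

Running this shows the same qualitative behavior as the demo: a tiny penalty keeps nearly all 17 coefficients nonzero, and increasing the penalty drives more and more of them exactly to zero.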
Remember, we default back to our least squares solution, where we don't have any coefficients set exactly to zero. But as we increase this lambda value, we get more and more sparsity in our solution. So the number of nonzeros here is 14, then we get five coefficients being nonzero. And by the time we have a penalty strength of ten, we only have two of our coefficients being nonzero. So you can see very explicitly from this how lasso is leading to sparse solutions, especially as you're increasing the strength of that l1 penalty term. But now, let's just look at the fits associated with these different estimated models. So this is for our very small penalty value. This function doesn't look as crazy as the least squares solution, but it's still fairly wiggly. But remember that in lasso, just like in ridge, the coefficients are shrunk relative to the least squares solution. So even in this case, where we don't have any sparsity and none of the features have been knocked out of our model, the coefficients are still a little bit shrunk relative to those of the least squares solution, and that's providing enough regularization to lead to the smoother fit in this case. But as we increase this lambda value, we see that we actually do get smoother and smoother fits. This starts to look like the fit that we had for the optimal setting of our ridge regression objective, the one that minimized our leave-one-out cross-validation error. But again, when we get to really large lambdas, just like in ridge regression, we start to have fits that are oversmoothing things. So this was a case where we only had two nonzero coefficients, and we see that that's really just insufficient for describing what's going on in this case. So again, to choose our lambda value here, we could do the same leave-one-out cross-validation that we did in our ridge regression demo.
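The leave-one-out selection of lambda mentioned above can be sketched as follows, again with scikit-learn standing in for GraphLab Create (the penalty grid and toy data are assumptions, not the lecture's actual values):

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Toy data standing in for the demo's dataset (an assumption).
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 30))[:, None]
y = np.sin(4 * x[:, 0]) + rng.normal(scale=0.3, size=30)

# Candidate lambda values from 1e-4 up to 10, as in the demo.
penalties = np.logspace(-4, 1, 6)
errors = []
for l1_penalty in penalties:
    model = make_pipeline(
        PolynomialFeatures(16, include_bias=False),
        StandardScaler(),
        Lasso(alpha=l1_penalty, max_iter=100_000),
    )
    # With LeaveOneOut, each fold's MSE is the squared error on one
    # held-out point; average over all n folds.
    scores = cross_val_score(model, x, y, cv=LeaveOneOut(),
                             scoring='neg_mean_squared_error')
    errors.append(-scores.mean())

best = penalties[int(np.argmin(errors))]
print('best l1 penalty:', best)
```

We then refit on all the data using the penalty with the lowest leave-one-out error, exactly as in the ridge demo.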
But the point that I wanted to show here is how we get these sparse solutions, where we're knocking out, in this case, different powers of x in our polynomial regression fit. And this is in contrast to ridge regression, which simply shrinks the coefficient of each one of these powers of x in our degree-16 polynomial fit. [MUSIC]
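That contrast can be seen directly by fitting both models on the same degree-16 features: ridge shrinks every coefficient but leaves them all nonzero, while lasso sets some exactly to zero. This is a minimal scikit-learn sketch; the penalty values and toy data are assumptions chosen for illustration:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Toy data standing in for the demo's dataset (an assumption).
rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 1, 30))[:, None]
y = np.sin(4 * x[:, 0]) + rng.normal(scale=0.3, size=30)

# Same degree-16 feature matrix for both models.
features = make_pipeline(PolynomialFeatures(16, include_bias=False),
                         StandardScaler())
X = features.fit_transform(x)

ridge = Ridge(alpha=0.1).fit(X, y)
lasso = Lasso(alpha=0.1, max_iter=100_000).fit(X, y)

ridge_nnz = int(np.count_nonzero(ridge.coef_))  # shrunk, but all nonzero
lasso_nnz = int(np.count_nonzero(lasso.coef_))  # some knocked out exactly
print('ridge nonzero coefficients:', ridge_nnz)
print('lasso nonzero coefficients:', lasso_nnz)
```

Ridge keeps all 16 powers of x in the model with smaller weights, whereas lasso's l1 penalty removes some powers of x entirely.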