Let's now see for ourselves how L1 regularization can help us arrive at sparse, concise models. In this L1 regularization lab, I've added in lots of spurious features, which creates a fairly complex model. We're first going to train the model without L1 regularization, and then we'll see if L1 regularization helps pull the model down into a much more sparse, concise, and hopefully more generalizable form.

Hi, welcome back to TensorFlow Playground. In this lab, we're going to see if L1 regularization can help make our models more sparse and concise. As you can see here, this is a classification problem where we're trying to classify these two shapes. What we have are two concentric circles: the blue circle in the middle and the orange circle on the outside. The great news is that there is no noise set, so it should be reasonably easy to train. What you also might notice is the features: we have all of our features turned on. That means it will be a pretty complicated model. We know intuitively that this is an X1 squared plus X2 squared kind of equation, because these are circles. However, we have a whole bunch of other extra features added on top of that. We also have extra hidden layers here with six neurons each. So, this is highly complex.

Let's see how this thing trains without L1 regularization, which is set to "None" in this case. Now, that was pretty fast. As you can see, the model pretty much found the distribution of our data. However, you can notice some inconsistencies: there are some little dips here and some bulges there, so it isn't quite a circle. The reason for this is probably overfitting. We have way too many features and too many hidden layers, so the model found an overly complex function that overfits this data. Is there a way we can find a much simpler model without feature engineering it ourselves? Let's look to L1 regularization and see if that works. I'm going to set my regularization here to L1, start off with a new initialization, and see how this does.

Look at that, this is much better. Let's investigate it a little bit more. As you can see here, it learned a much smoother circle, which is great because that matches what we intuitively see in the data. In real life, though, we usually don't have nice, clean distributions like this, so this technique can come in handy for many other problems. Now look at the features: X1 squared and X2 squared are pretty much the only ones whose weights have any magnitude left. All the other weights, as you can see, are grayed out with a value of zero. This carries through to the intermediate hidden layers, where you can see that X1 squared and X2 squared are pretty much the only ones that propagate through. They all feed into this one neuron in the last layer, and finally to the output. So, it's as if we're only using X1 squared and X2 squared, because they are much more predictive in our model than the other features, and due to the nature of L1 regularization and the probability distribution behind it, it's able to shrink the rest down to zero.
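
The lab itself runs in TensorFlow Playground's UI, but the same idea carries over to code. Below is a minimal sketch, assuming a Keras model with the same rough shape as the Playground setup (seven engineered input features and two hidden layers of six neurons); the regularization strength of 0.01 and the feature count are illustrative assumptions, not values from the lab.

```python
import tensorflow as tf

# Hypothetical L1 penalty strength; in Playground this is the
# "Regularization rate" knob next to the L1 setting.
l1_penalty = tf.keras.regularizers.l1(0.01)

model = tf.keras.Sequential([
    # Assumed 7 inputs: x1, x2, x1^2, x2^2, x1*x2, sin(x1), sin(x2)
    tf.keras.layers.Input(shape=(7,)),
    tf.keras.layers.Dense(6, activation="tanh", kernel_regularizer=l1_penalty),
    tf.keras.layers.Dense(6, activation="tanh", kernel_regularizer=l1_penalty),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])

# After training, the L1 term tends to drive the weights on the
# uninformative features toward zero, which is the sparsity effect
# we observe in the Playground visualization.
```

With the penalty attached to each layer's kernel, the optimizer trades a small amount of training loss for zeroing out weights that don't pull their own weight, which is why only the x1 squared and x2 squared paths survive in the lab.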