Now, let's get our hands dirty by playing with Neural Networks in TensorFlow Playground. In this Neural Networks playground lab, we will use TensorFlow Playground to try and build Neural Networks that learn our data. I want you to solve these problems in two ways. First, we will try training our models using manual feature engineering, where we use our own knowledge to guess the right combinations and transformations of features to learn the data. Next, we're going to hand over the reins to the power of Neural Networks, and add more layers and neurons on top of a simple set of input features, to see if the network can perform the feature engineering itself.

Welcome back to TensorFlow Playground. In this lab, we're going to see if feature engineering can outperform our Neural Networks. I have a feeling this is not going to be the case. Let's investigate. All right. In this diagram here, we're trying to classify these blue and orange dots; this is a classification problem. What you'll notice is that they look like two concentric circles. However, in this case there's a lot of noise, so there is a lot of intermixing going on here. Okay. First, let's see how just X1 and X2 do in training. As you can see, it's really not learning very much at all. It's all kind of blurred together; the background is pretty much white, so it isn't leaning one way or the other on the scale down here, from negative one to one. So, it hasn't learned much.

Let's see if we can do better with feature engineering. I know this is a circle, so I add X1 squared and X2 squared and train again. Wow! Look at that. It's drawing an ellipse, which means it's almost figuring out what this function is. We know it's a circle, but there's a lot of noise, so it's a little off. Okay. Perhaps, though, I can get my loss lower than 0.275 if I get rid of X1 and X2, the linear features. Let's try it now. Uh huh, 0.285. Okay. It looks a little more circular, and our test loss is slightly better.

Let's see now if we can do the same with Neural Networks. So, let's go back to just X1 and X2, which, as we saw earlier, did a really poor job, and let's add a hidden layer with two neurons. As you can see here, it's having quite a hard time figuring out what this function is. The problem is that there is not enough capacity in these two neurons, not enough of a high-dimensional representation, to learn this distribution. So, let's pause this here and add another neuron; maybe that is enough capacity to learn this function. All right. It is still not quite getting it. Wait, look at that. It took a long time, but it's slowly figuring out the shape of the function, some kind of rectangular shape here. What this means is that we're sitting right on the cusp of the number of neurons needed to represent this distribution. So, let's see if we have an easier time if we add just one extra neuron. All right. Look at that. That was much, much faster, and we only have four neurons here. But let's see what happens if we add many, many extra neurons. All right. Let's add more hidden layers with four neurons each and see what happens. Let's train it. It's quite a lot slower now; there is a lot more math to perform going through all of these layers. I think it's eventually going to get it, but I'm a little nervous it might overfit. As you can see, it is no longer a simple circular shape; it's some crazy polygon. So, it's overfitting the data and not doing so well on the test loss, which is much higher than it used to be.
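If you want to reproduce this circles experiment outside of the Playground, here is a minimal sketch in Python using scikit-learn. The dataset generator, model sizes, and hyperparameters are my own illustrative choices, not the Playground's internals, so the exact loss numbers will differ.

```python
# A rough sketch (not the Playground itself) of the circles experiment.
import numpy as np
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

# Noisy concentric circles, like the first Playground dataset.
X, y = make_circles(n_samples=500, noise=0.15, factor=0.5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 1) Raw X1, X2 with a linear model: cannot separate concentric circles.
linear = LogisticRegression().fit(X_train, y_train)
print("raw features:        ", linear.score(X_test, y_test))

# 2) Manual feature engineering: add X1^2 and X2^2, which makes the
#    circular boundary linearly separable.
def add_squares(X):
    return np.hstack([X, X ** 2])

engineered = LogisticRegression().fit(add_squares(X_train), y_train)
print("engineered features: ", engineered.score(add_squares(X_test), y_test))

# 3) Let a small neural network learn the transformation itself
#    (one hidden layer of four units, as in the walkthrough).
mlp = MLPClassifier(hidden_layer_sizes=(4,), max_iter=2000, random_state=0)
mlp.fit(X_train, y_train)
print("small neural net:    ", mlp.score(X_test, y_test))
```

Just as in the demo, the engineered squared features usually let even a linear model separate the circles, while the raw features need the hidden layer to do that work.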
All right. Let's look at some other distributions. Here we have our classic XOR distribution: when X1 and X2 are both positive or both negative, we have the blue class, and when their signs differ we have the orange class. Let's see if we can learn this with just X1 and X2. As you can see, just like before, X1 and X2 alone are not powerful enough to describe this function; the output is basically zero all across the board.

Let's see if we can figure this out using feature engineering. I'm going to introduce the cross feature X1X2, because I know that's what this function looks like. So let's train this. Look at that. Very nice, a test loss of 0.17. That is great. It found it really easily, and here's my weight, 0.19. Yes, there is noise, so we got some points wrong, but for the most part it got it pretty much right.

Let's see now if machine learning, using Neural Networks, can do a better job. We're going to go back to just X1 and X2, and let's add a hidden layer. Once again, I want to use as few neurons as I can, so I'm going to try to get this down to just two neurons. However, as you can see, it's not able to figure this out; there's not enough complexity, not enough capacity, in this model. So let's pause this here, add a third neuron, and train again. As you can see, it's having quite a hard time learning this function. Maybe it's just on the edge, and I have to wait a little longer to see if it will learn it, but it's kind of stuck. Perhaps another initialization will fix it. Let's see. There we go. So, after rerunning the initialization, it does somewhat learn the function. It looks more like a diagonal hourglass, actually; that's not quite the function, and you can see the loss is much higher. So, let's go to four neurons; that might do the job. Let's see. Here we are, still with the hourglass, but it's becoming more and more like a series of squares, which is what our function actually is. It's getting better. Now, let's add a whole bunch more and see if we overfit. As you can see, training is a lot slower, but those regions are much more square shaped. This is looking great.
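Here is a similar sketch for the XOR case, again just an illustration with assumed data generation and model sizes rather than the Playground's own code. It shows how the single cross feature X1*X2 makes the problem linear, and how a small network on raw X1 and X2 needs a few hidden neurons before it can match that.

```python
# A rough sketch of the XOR experiment with a feature cross vs. a small network.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X_clean = rng.uniform(-1, 1, size=(500, 2))
y = (X_clean[:, 0] * X_clean[:, 1] > 0).astype(int)        # blue when the signs match
X = X_clean + rng.normal(scale=0.1, size=X_clean.shape)    # noisy positions, like the demo
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Feature engineering: the single cross feature X1*X2 makes this linearly separable.
def cross(X):
    return (X[:, 0] * X[:, 1]).reshape(-1, 1)

linear = LogisticRegression().fit(cross(X_train), y_train)
print("X1*X2 cross feature:", linear.score(cross(X_test), y_test))

# Neural network on raw X1, X2: it needs roughly three or four hidden units
# before the decision regions start to look like the four squares.
for n in (2, 3, 4, 8):
    mlp = MLPClassifier(hidden_layer_sizes=(n,), max_iter=3000, random_state=0)
    mlp.fit(X_train, y_train)
    print(f"{n} hidden neurons:  ", mlp.score(X_test, y_test))
```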
Let's try another distribution type. Right here we have a spiral, two spirals actually, spiraling around each other, much like a picture of a galaxy. So, let's see if we can train with just X1 and X2. I highly doubt we'll be able to. As you can see, it really didn't learn the distribution at all; the output is basically pretty close to zero, and it can't decide what's what. So, what we can try now is some feature engineering. What do you think? Let's try the circular features, perhaps? Nope. Let's try adding the sines, sin(X1) and sin(X2). It's trying. I have six raw features going on here, and it's sort of getting it. As you can see up top, it's slowly gaining, but there's a big gap here, and it's extrapolating very strongly over there. So, it's not doing a great job; it's kind of stalled out, as you can see.

Let's see if we can do better with Neural Networks. Let's turn all of these features off, and let's add a hidden layer. Once again, we'll start off with two neurons and see if we can do it. As you can see, it's really not much better than just having pure X1 and X2; there's not enough capacity to learn this model. Let's go to three and see if it can learn. It's doing slightly better than last time with the extrapolation right here. However, it's still not doing as well as just leaving all six or seven features activated. All right. Let's add one more neuron, and another layer perhaps, and see if that can do it. Already you can see we have a really low train loss, but the test loss isn't doing so well. It's kind of stuck. Let's try some more. Let's add some more hidden layers and put them all at four neurons. Hopefully that's enough. Let's see what we get. All right. Both losses have gone down quite a bit. However, it hasn't quite made a decision, since the whole screen is white. There it is, an inflection point. My train loss is going down a lot; however, you can see my test loss is also going up, and now it's going flat. So, this still doesn't have enough capacity. Let's go as far as we can and put eight neurons in each layer, and hopefully that's enough to learn this very complex and noisy function.

All right. Let's try training this. As you can see, it's going very slowly as it trains. We're hoping it will finally figure out a way to make this function work. So, my train loss is going down. How about my test loss? It's going up, and then kind of leveling out. When you're doing this yourself, your results might vary quite a bit due to the random initializations of the network. Let's try a different one. This one might be a little more promising. Right, this is looking a little more promising. You can see what it is doing: it's learning these waves over here and filling them in. Uh oh, it looks like we overfit, because now our test loss is diverging, and that's not good. And there you go. So, as you can see, even with this massive network, we are not able to learn that distribution very well. We have all these extrapolations and broad guesses, and that's not going to do very well on our test loss. Wait, look at this. Our test loss is going down all of a sudden. This is great. All right. It's getting more and more of the function learned. However, it's going very slowly due to how large this network is. Remember, between each pair of these hidden layers there are 64 weights, eight times eight. With six layers, that's on the order of six times 64 right there, not including the weights between my feature layer and my first hidden layer, and from my last hidden layer to the output, where I get another eight each. Well, there you go. Look at this. This is great. It's learning this function pretty well. However, there are these extrapolations and interpolations going on, where, for example, this orange peak goes right through the spiral. It's still getting slightly better over time; as you can see, the test loss is getting lower and lower. However, this shape is very much overfit. All right. As you can see, we were finally able to find the shapes of all of these distributions using Neural Networks, which sometimes did a better job than manual feature engineering, and in the case of the spiral, was the only approach that could figure out the shape at all.
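To close the loop, here is a rough sketch of the spiral experiment. The two-spirals generator and the six-layers-of-eight network shape are assumptions meant to mirror the Playground setup, so treat it as an illustration of the capacity and overfitting trade-off rather than an exact reproduction.

```python
# A rough sketch of the two-spirals experiment with a large fully connected network.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

def two_spirals(n_points=400, noise=0.3, seed=0):
    """Generate two interleaved noisy spirals, one per class."""
    rng = np.random.default_rng(seed)
    t = np.sqrt(rng.uniform(0, 1, n_points)) * 3 * np.pi
    spiral = np.column_stack([t * np.cos(t), t * np.sin(t)])
    X = np.vstack([spiral, -spiral]) + rng.normal(scale=noise, size=(2 * n_points, 2))
    y = np.hstack([np.zeros(n_points), np.ones(n_points)])
    return X, y

X, y = two_spirals()
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Six hidden layers of eight neurons each, roughly the largest network
# the Playground allows.
mlp = MLPClassifier(hidden_layer_sizes=(8,) * 6, max_iter=5000, random_state=0)
mlp.fit(X_train, y_train)

# A large train/test gap here is the overfitting we see in the demo.
print("train accuracy:", mlp.score(X_train, y_train))
print("test accuracy: ", mlp.score(X_test, y_test))
```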