Okay, what we're going to look at now is using deep learning on the airlines data set, the same data set that we used GLM on last week. I've started h2o, and I'm also importing matplotlib so I can do some plots in a moment. So: h2o started, the data file imported, and I've split it 80% for training, 10% for validation, 10% for test. Everything exactly like we've seen before. And also like we've seen before, I'm setting y to be the field I want to predict, so this is going to be a binomial classification: was the flight delayed or not? And I'm setting xAll to be the list of fields I want to use in my model.

Okay, then we bring in the H2ODeepLearningEstimator, and as always, we're just going to make a baseline model. This will take a while, so let me just jump into RStudio. These are the same data preparation steps, defining the list of columns we're going to learn from, and this is the command to make a default model. You will have seen exactly this code back in week one when we were learning on the Iris data set. I'm going to use h2o.performance to measure how well the model did. Let's jump back and see how it's doing. Almost finished. So we'll run this to get the performance of the model on the test data, and this will show us the model and other information about it, including the performance on the training data.

Okay, that took 70 seconds to build this time. MSE is 0.122, LogLoss 0.372. Or, if you prefer it in terms of classification error, it's getting about 18% of them wrong. If we scroll down, 0.11 is the MSE on the data it's seen, the data it trained from, and it's getting about 16% wrong on the data it trained on. So it's just slightly overfitted, but not too bad. And keep going down.

Okay, I'm just going to plot the model. When you do this in Python, it will show you the scoring history chart. The default number of epochs is 10. The orange line is the validation error, the blue line is the training error, and by default it's using log loss on the y axis. You can do the same thing in R by calling plot and giving it the name of your model.

Okay, so our first tuning idea is: let's give it more epochs, more time, more effort. And as we looked at earlier this week, I'm also going to use early stopping. Now, I say I'm going to use early stopping; early stopping is on by default for deep learning, and these are actually the default settings, so all I'm doing is specifying them explicitly. But instead of 10 epochs, I'm giving it 200. So let's run this, and we'll be fast forwarding past it in a moment. Let's just quickly look at the R version: just as in the Python, the only change is we're specifying epochs = 200.

Okay, that took five and a half long minutes. But let's see how it's done. Has it improved the quality? When we run model_performance, we get told the MSE is 0.127, LogLoss 0.387. How does that compare? The baseline was 0.122 and 0.349. So we haven't really got a better model. Let's poke in a bit further and look at the scoring history, and we see why: from roughly 12 epochs, the model just started overfitting. The amount of overfitting is the gap between the validation line and the training line. This sudden drop at the end, or the sudden jump up at the end, is because when it got to the end, it looked back, found that the model here at 12 epochs was the best one, and that's the model it returned. If we print the model, scroll down quite a way, and look at the scoring history, we can see what's going on. Each entry in the scoring history is a point where it paused the model and evaluated it on the validation data.
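As a reference, here is a minimal Python sketch of the steps narrated above: the setup, the baseline model, and the 200-epoch run with early stopping made explicit. The file path, the response column name, the ignored columns, and the seed are assumptions standing in for whatever the course's airlines data actually uses; substitute your own values.

```python
# Sketch only: paths, column names, and seed below are placeholders, not the
# exact values used in the video.
import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

h2o.init()

data = h2o.import_file("airlines.csv")                    # assumed path
train, valid, test = data.split_frame(ratios=[0.8, 0.1],  # 80/10/10 split
                                      seed=123)           # assumed seed

y = "IsArrDelayed"                                         # assumed response column
ignore = [y, "ArrDelay", "DepDelay"]                       # assumed columns to exclude
xAll = [c for c in data.col_names if c not in ignore]

# Baseline: all defaults (two hidden layers of 200 neurons, 10 epochs)
m = H2ODeepLearningEstimator()
m.train(x=xAll, y=y, training_frame=train, validation_frame=valid)
m.model_performance(test)   # MSE, LogLoss, classification error on test data
m.plot()                    # scoring history: training vs. validation log loss

# Tuning idea 1: 200 epochs; the stopping settings shown are the defaults,
# just written out explicitly as described above.
m200 = H2ODeepLearningEstimator(
    epochs=200,
    stopping_rounds=5,
    stopping_tolerance=0,
    stopping_metric="logloss",
)
m200.train(x=xAll, y=y, training_frame=train, validation_frame=valid)
m200.model_performance(test)
```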
So after half an epoch it did that, then after 7, and after 12. And this is why early stopping didn't work very well for us here. We'd have hoped that early stopping would have stopped here, where it's obvious to us that the model has started to overfit. But if we come back up: we're using five stopping rounds. So this is the first, second, third, fourth, fifth; this is the area, about 25 epochs, over which it's calculating its moving average. And then we need a moving average of five in a row that does better than those first five. It sounds a bit complicated, I know, but that's why, if we wanted early stopping to stop earlier, we would have to change a few other parameters to say score more frequently. Having looked at that data, I probably would, but we'll carry on with the default early stopping settings.

So, another tuning idea. Let's go from the default of 200 neurons in two layers to 200 neurons in three layers, and we do that by specifying hidden as 200, 200, 200. I'm going to stick with 200 epochs, and I'm going to stick with early stopping. This will take a while to run, so let's get it going and just pop into RStudio. To specify three layers with 200 neurons each in R, it's a simple vector: c(200, 200, 200) given to hidden. Okay, that's chugging away. Let's take a break.

That's finally completed; it took just over six minutes. Let's find out how it's done. Again, nothing special on the MSE; any improvement is going to be quite subtle. In fact, this error rate looks worse, the worst we've seen so far. And again, we're seeing bad overfitting from, in this case, about six epochs. So adding another layer didn't really help.

Let's try sticking with two layers, but doubling the number of neurons in each layer. Just as an aside, in these examples I'm keeping the same number of neurons in each layer. You don't have to do that; you should experiment, perhaps putting a bigger layer first, or a bigger layer later on. But it's really hard to guess in advance which is going to be the best way. So we're going to try 400 by 400 and see what happens. I don't even need to show you the R code here; it's going to be c(400, 400) for hidden, everything else as we've already seen.
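For completeness, here is a sketch of the two architecture experiments just described, in the same hypothetical Python setup as the earlier sketch (it assumes train, valid, xAll, and y are already defined as above).

```python
# Sketch only: reuses the assumed frames and column lists from the earlier block.
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

# Tuning idea 2: three hidden layers of 200 neurons, still 200 epochs
# with the same (default) early stopping settings.
m_3x200 = H2ODeepLearningEstimator(
    hidden=[200, 200, 200],
    epochs=200,
    stopping_rounds=5,
    stopping_tolerance=0,
    stopping_metric="logloss",
)
m_3x200.train(x=xAll, y=y, training_frame=train, validation_frame=valid)

# Tuning idea 3: back to two hidden layers, but 400 neurons in each.
m_2x400 = H2ODeepLearningEstimator(
    hidden=[400, 400],
    epochs=200,
    stopping_rounds=5,
    stopping_tolerance=0,
    stopping_metric="logloss",
)
m_2x400.train(x=xAll, y=y, training_frame=train, validation_frame=valid)
```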