So let's talk about grid search. All machine learning algorithms have lots of parameters that modify the way the algorithm works. If you remember from last week, when we were looking at overfitting, we could increase the number of trees or increase the maximum depth of each tree. Those are the kind of parameters I'm talking about, and as we saw at the time, the quality of the model varies a lot depending on the parameters you set. Setting very high values for max depth and ntrees gave us a badly overfitted model. The thing is, there is some combination of these parameters that gives us the optimal, not-overfitted model. The bad news is, working it out is more art than science. You use your intuition to give you a first guess at the parameters, and then you use trial and error to narrow in on the best set. All that trial and error gets very tedious if you have to do it by hand, and that's where grid search comes in. Grid search automates the trial and error for us, within the boundaries that we specify.

So when you set up a grid search, there are three or four parts to it. The first part is the model and the core parameters for that model. By core parameters, I mean the parameters we don't want to tune, either because we know they're correct or because they're not supported for grid search. And if we look at the documentation, straight from the manual, H2O gives a list of the parameters that you can use in a grid.

The second part of setting up a grid search is specifying the hyperparameters. Taking our example from last week, I might set a hyperparameter max_depth and give it two values, 5 and 10. And I might set a hyperparameter ntrees with, I don't know, 10, 50, 100, 200.

The third part of setting up a grid search is the search criteria. By default, H2O will use Cartesian, which I like to think of as comprehensive.
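As a rough sketch, those parts might look like this in H2O's Python API. It's wrapped in a function so the snippet stands alone without a running H2O cluster, and the data frames and column names (`train`, `valid`, `predictors`, `response`) are placeholders, not something from this course's code:

```python
def build_cartesian_grid(train, valid, predictors, response):
    """Sketch: a Cartesian grid search over a random forest.

    `train`, `valid`, `predictors` and `response` are placeholders
    for your own H2OFrames and column names.
    """
    # Imports are inside the function so the sketch parses even
    # without h2o installed or a cluster running.
    from h2o.estimators import H2ORandomForestEstimator
    from h2o.grid.grid_search import H2OGridSearch

    # Part 1: the model, with core parameters we are NOT tuning.
    model = H2ORandomForestEstimator(seed=123)

    # Part 2: the hyperparameters to search over.
    hyper_params = {
        "max_depth": [5, 10],
        "ntrees": [10, 50, 100, 200],
    }

    # Part 3: the search criteria. Cartesian tries every
    # combination, so this grid builds 2 x 4 = 8 models.
    grid = H2OGridSearch(
        model=model,
        hyper_params=hyper_params,
        search_criteria={"strategy": "Cartesian"},
    )
    grid.train(x=predictors, y=response,
               training_frame=train, validation_frame=valid)
    return grid
```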
It will try all combinations. So taking my max_depth and ntrees example, I had two choices for max_depth and four choices for ntrees, giving me eight combinations altogether. So it will make eight models. If I added a third hyperparameter, sample_rate maybe, with 0.7, 0.8 and 0.9, that's another three, so now we're up to eight times three, 24 combinations. And if I set the search criteria to Cartesian, it will make 24 models, and it will take 24 times as long.

The alternative is called random discrete. Here, I might set some kind of stopping criteria, so it will stop before it makes all 24 possible combinations. The one I'm most likely to use is max_models. If I set max_models to four, it will randomly choose four of those 24 combinations and just give me four models in the grid it returns. Another option is max_runtime_secs, which is self-explanatory. I might set a limit of 30 seconds, or I might set a limit of two hours; it depends how much time I want to invest in investigating the optimal set of parameters.

There are also some control parameters called stopping_metric, stopping_tolerance and stopping_rounds. We're going to look at early stopping next week, and these are a very similar set of parameters, but they are actually independent. They allow you to look at the quality of the models, and if the grid search hasn't produced any better models recently, it stops. I tend not to use these; I tend to use max_models and max_runtime_secs together. The reason is that I tend to make grid search an iterative process. So I might start off with eight combinations, probably using random discrete, and then look at which is the best model and which is the worst model, and see if there are any patterns. Are the worst three models all using a very high value for alpha, or a very low value for alpha, something like that?
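The counting above is just a Cartesian product. Here is a quick pure-Python illustration of Cartesian versus random discrete, using the example values; the random sampling is only to convey the idea of picking max_models combinations at random, not H2O's actual implementation:

```python
import itertools
import random

hyper_params = {
    "max_depth": [5, 10],
    "ntrees": [10, 50, 100, 200],
    "sample_rate": [0.7, 0.8, 0.9],
}

# Cartesian: every combination gets a model built for it.
names = sorted(hyper_params)
all_combos = list(itertools.product(*(hyper_params[n] for n in names)))
print(len(all_combos))  # 2 * 4 * 3 = 24 models

# Random discrete with max_models=4: stop after a random four of them.
random.seed(42)
chosen = random.sample(all_combos, 4)
print(len(chosen))  # 4 models
```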
If so, I will then remove those values from the list of hyperparameters, maybe add some more values, narrowing in on what I'm trying to find, and run it again, maybe for another 16 models. And repeat, maybe another ten models at the end. A good tip here: you can set a grid ID when you create your grid. If you use the same grid ID for each iteration, the results get merged. So if you get lucky and actually find the best combination of hyperparameters on your first iteration, it will still be there in the grid. You will still have that model, and it's very easy to compare whether you're making any progress.

Okay, one thing to remember. When you're using grids, or doing any kind of tuning, you should only be using your training data set and your validation data set, or your training data set and cross-validation. Your test data set has been put to one side and should not be used at all at this stage of machine learning. Only when you've finished your grid search, finished your tuning, and you think you've found the optimal set of hyperparameters, or at least the best set you're willing to invest this amount of time in finding, only then do you get your test data out and evaluate on it. And you hope that your test data gives you the same results as the grid search was giving you. If not, you might have to start the whole process again, because you've overfitted.

Now, a couple of related technologies in H2O. The first, AutoML, we looked at back in week one. Again, this is the same idea: it tries out different sets of parameters to try and automatically find the best combination. The difference with AutoML is that it's a bit more of a black box, if you like. It will try different models, and it will try some of the hyperparameters that you can't use in a grid, but you have less control over the direction it goes.
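The grid-ID trick might look something like this in the Python API. Again it's wrapped in a function so the sketch stands alone, the names are placeholders, and the particular limits are just examples:

```python
def iterate_grid(train, valid, predictors, response, hyper_params):
    """Sketch: one iteration of a random-discrete search.

    Call this repeatedly with a narrowed-down `hyper_params` dict;
    because grid_id stays the same, H2O merges the new models into
    the existing grid instead of starting fresh.
    """
    # Import inside the function so the sketch parses without h2o.
    from h2o.estimators import H2ORandomForestEstimator
    from h2o.grid.grid_search import H2OGridSearch

    grid = H2OGridSearch(
        model=H2ORandomForestEstimator(seed=123),
        hyper_params=hyper_params,
        grid_id="rf_tuning",  # same ID each iteration -> results merge
        search_criteria={
            "strategy": "RandomDiscrete",
            "max_models": 8,
            "max_runtime_secs": 120,  # whichever limit is hit first
        },
    )
    # Tune on train + validation only; the test frame stays
    # untouched until all tuning is finished.
    grid.train(x=predictors, y=response,
               training_frame=train, validation_frame=valid)
    return grid
```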
Another related technology, which we're going to look at in week six, is stacked ensembles. You can take the output from a grid and use that as the set of models you give to a stacked ensemble. Sometimes this is a good time saver: it gives you a good set of models to experiment with quickly. Okay, so the next video is going to show how to use grids in H2O, and we are going to use the airlines data set and the GLM algorithm.
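As a preview of that week-six idea, handing a grid's models to a stacked ensemble might look roughly like this. A sketch only, with placeholder names; note it assumes the grid's models were trained with the cross-validation settings stacking requires (nfolds > 1 and keep_cross_validation_predictions=True):

```python
def stack_grid(grid, train, predictors, response):
    """Sketch: use a grid's models as base models for stacking.

    Assumes the grid's models were built with nfolds > 1 and
    keep_cross_validation_predictions=True, which stacking needs.
    """
    # Import inside the function so the sketch parses without h2o.
    from h2o.estimators import H2OStackedEnsembleEstimator

    ensemble = H2OStackedEnsembleEstimator(base_models=grid.model_ids)
    ensemble.train(x=predictors, y=response, training_frame=train)
    return ensemble
```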