Okay, let's take a look at how to create autoencoders in H2O. I'm going to show this in Python and flip back and forth to the R code. So I've imported H2O, and I'm also going to be doing some plots. I've initialized H2O, loaded the iris dataset, and split off 80% as training: 121 rows for training, and whatever's left over, 29 rows, as test data. We're going to use just the four numeric fields. And I bring in the H2OAutoEncoderEstimator. In R, it's done a bit differently. Let me show you: you use the h2o.deeplearning function, but then you just add autoencoder = TRUE.

So, create an object. I'm going to have 4 inputs. You saw them up here, those 4 inputs. I'm going to have 4 hidden nodes, and therefore 4 output nodes. It's a trivial example to get us started. I hope you can see that if we used weights of 1 and 0, we could get a perfect score here with 4 hidden nodes. When using autoencoders, it's recommended to set activation to Tanh. What else is interesting here? I'm giving it 300 epochs. Generally, I've found they need 100 to 200 before they start to overfit or early stopping kicks in. And these three lines are just for the sake of this demo, because I wanted to score after every iteration, so we get a nice detailed scoring-history graph. Generally, you shouldn't set these parameters; it's better to leave them at their defaults. But when you're specifically investigating or studying something, or when you have small data, you may want to play around with them.

So what we do is create the H2OAutoEncoderEstimator object, and then we call train on it. The y argument is always None for an autoencoder. You remember, with supervised learning, y is the column that we want to learn. Other than that, it looks just like all the other models. Fairly quick to learn. Let's look at the scoring history, just the first 4 rows. So we're getting an MSE and a root mean squared error. This is the reconstruction error: how similar the 4 output nodes are to the 4 input nodes.
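As a rough sketch, the Python steps just described might look like the following. The CSV path and seed are placeholders, and `demo_params` bundles the settings from the walkthrough; the last three entries are the demo-only scoring parameters mentioned above, which you'd normally leave at their defaults.

```python
def demo_params(hidden):
    # Settings used in the walkthrough. The last three force a scoring
    # event after every training iteration -- demo only.
    return dict(
        activation="Tanh",              # recommended for autoencoders
        hidden=hidden,                  # e.g. [4] for 4 hidden nodes
        epochs=300,
        train_samples_per_iteration=0,  # one epoch per iteration
        score_interval=0,               # score as often as allowed
        score_duty_cycle=1.0,           # spend all overhead budget on scoring
    )

def run_demo():
    # Requires a running H2O cluster; call this when h2o is installed.
    import h2o
    from h2o.estimators.deeplearning import H2OAutoEncoderEstimator

    h2o.init()
    iris = h2o.import_file("iris.csv")               # placeholder path
    train, test = iris.split_frame(ratios=[0.8], seed=42)

    m = H2OAutoEncoderEstimator(**demo_params([4]))
    # No y for an autoencoder: just the four numeric columns as x.
    m.train(x=iris.col_names[0:4], training_frame=train)
    print(m.scoring_history()[["epochs", "training_mse", "training_rmse"]].head(4))
```

The scoring history comes back as a Pandas data frame, so slicing out the first four rows of the MSE and RMSE columns is a one-liner.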
And let's visualize that. So it falls rapidly. I want to see what's going on at the end there, in that tail, so I'm just selecting the last 50 values of the MSE and plotting them. You can see it looked flat before, but it's still improving, quite noisily, when it gets to 210 epochs, giving us a reconstruction error of 0.0001.

Let me just jump back to the R code. So it's h2o.deeplearning, setting autoencoder to TRUE. You give the columns and the training_frame as normal, 300 epochs. These are the same values we just saw, and I'm using 4 hidden nodes and Tanh. And to plot the scoring history, I'm using that command.

This looks to me like early stopping has stopped too early. And this is a symptom, if you like, of having set those scoring parameters, because we're getting a scoring event every single epoch. Another way of saying that is that epochs and iterations are always the same value. The default early-stopping window is only five scoring events, so our history doesn't have to be very long before it thinks it's flattened out. I'm sure it's that area there. So I'm going to increase stopping_rounds to 15 for the remaining examples. Let's just run that same model again with 15. This time it actually went all the way to 300, giving us 0.00008. As I mentioned a few moments ago, when you have the same number of hidden nodes as input nodes, theoretically it should eventually learn to represent the input data perfectly.

So let's move on to more interesting things. I've reduced the number of hidden nodes to 3 this time; otherwise, it's exactly the same. Autoencoders tend to be quick to learn, because we only have one hidden layer and it tends to have only a few neurons in it. Okay, so it's gone to 190, and you can see this is happily noisy. It does look like early stopping has stopped at the correct time. Our reconstruction error's down to 0.0005. Let's take another hidden neuron out and just use 2 hidden neurons.
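A sketch of those two ideas together, widening the early-stopping window with `stopping_rounds` and zooming in on the tail of the MSE curve (again, the path and seed are placeholders):

```python
def last_n(values, n=50):
    # Zoom in on the tail of the scoring history; returns the whole
    # list unchanged if it has fewer than n entries.
    return values[-n:]

def run_demo():
    # Requires a running H2O cluster and matplotlib.
    import h2o
    import matplotlib.pyplot as plt
    from h2o.estimators.deeplearning import H2OAutoEncoderEstimator

    h2o.init()
    iris = h2o.import_file("iris.csv")          # placeholder path
    train, _ = iris.split_frame(ratios=[0.8], seed=42)

    m = H2OAutoEncoderEstimator(
        activation="Tanh", hidden=[3], epochs=300,
        stopping_rounds=15,                     # widen the stopping window
        train_samples_per_iteration=0, score_interval=0, score_duty_cycle=1.0,
    )
    m.train(x=iris.col_names[0:4], training_frame=train)

    mse = m.scoring_history()["training_mse"].tolist()
    plt.plot(last_n(mse, 50))                   # just the tail of the curve
    plt.show()
```

With one scoring event per epoch, `stopping_rounds=15` means the model has to look flat for 15 epochs before early stopping gives up, rather than 5.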
So what this means is we're taking the 4 input fields, the sepal length and width and the petal length and width, and asking the autoencoder to try and compress them into just two numbers. It gave up after 160 epochs this time. Yeah, it gave up earlier, and you can see it's very noisy, and we only got down to 0.002. So that's an order of magnitude more reconstruction error. This is telling me that 3 hidden neurons looked quite reasonable; 2 is probably asking too much. But we'll go on to 1 hidden neuron. So this is asking it to summarize all 4 input fields into a single number. It went all the way to 300 epochs. That's quite unusual, but you can see we only got down to 0.01 error, so another order of magnitude worse. Okay, and I'll just quickly jump over to the R examples, but it's as trivial as changing the number in hidden each time.

Just like with any other deep learning network in H2O, we can have multiple layers with an autoencoder. So here, for our 4 inputs, I've set hidden up as 5, 3, 5: three hidden layers. And it's strongly recommended that you have this symmetry. What we have is our 4 input neurons feeding to 5 neurons in the first hidden layer, feeding to 3 in the second, feeding to 5 in the third, and then out to 4 neurons in the output layer. Everything else is staying the same. Let's give it a go. So it did 170 epochs before early stopping kicked in, though to my eye it looks quite flat before then. And we got down to three 0s on the reconstruction error, so about the same as just having a single hidden layer with 3 neurons. But you should experiment with your own data to see what works best; the iris dataset is small and not very interesting.

You can also stack autoencoders together. This is different from having multiple hidden layers, because stacked autoencoders are learned one autoencoder at a time. So I'm going to build on the model we made earlier, the one with 3 hidden neurons, and this is how you get the information out of it.
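The multi-layer version is just a longer list in `hidden`. Here's a sketch, with a small helper to build the recommended mirror-image layout (path and seed are placeholders, as before):

```python
def symmetric_hidden(inner, outer):
    # Mirror-image layer list recommended for autoencoders,
    # e.g. symmetric_hidden(3, 5) -> [5, 3, 5]
    return [outer, inner, outer]

def run_demo():
    # Requires a running H2O cluster.
    import h2o
    from h2o.estimators.deeplearning import H2OAutoEncoderEstimator

    h2o.init()
    iris = h2o.import_file("iris.csv")           # placeholder path
    train, _ = iris.split_frame(ratios=[0.8], seed=42)

    m = H2OAutoEncoderEstimator(
        activation="Tanh",
        hidden=symmetric_hidden(3, 5),           # 4 -> 5 -> 3 -> 5 -> 4
        epochs=300,
        stopping_rounds=15,
    )
    m.train(x=iris.col_names[0:4], training_frame=train)
```

Swapping `symmetric_hidden(3, 5)` for `[3]`, `[2]`, or `[1]` reproduces the narrowing experiments above.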
You call deepfeatures with a dataset, and the 0 here just means extract from layer 1 (there was only 1 layer). So that gives us an H2O frame with 121 rows, one for every training row, and 3 columns. If I just look at the start of it, you can see it's given us three numbers as a summary of the four columns in our training data. Okay, so we're going to take that, and we're going to use this data now as our training data. You can see I've plugged it in there, and we're learning from three columns. I could have used the names DF.L1.C1, and so on, but 0, 1, 2 is simpler. Otherwise, everything's staying the same, and I'm going to have 3 hidden neurons again. There we go: we're getting 0.0001, so good reconstruction error. Remember, this error is the reconstruction of this data, not the reconstruction of the original iris data.

Okay, let's move on to anomalies. I'm using the same H2OAutoEncoderEstimator, but for this case I'm going to do a few things differently. I'm not trying to reduce dimensions; I'm just trying to learn the data. So, a couple of changes: I'm using all five input fields, so I've added class in there now. I'm using 16 hidden neurons. And I'm using the full iris dataset, not just the training data. So let's run that.

And this is how I'm going to extract the data. So this is our call. The highlighted bit will return a single-column H2O frame, telling me the reconstruction error. I'm running anomaly on the same dataset that I trained on. This is optional, but I don't have any other iris data, so we're going with that. Then I'm going to bind in the original data columns and return it as a Pandas data frame. And then I'm going to sort them, with the highest reconstruction error, the most error, coming first. So here are the hardest five to learn, the most anomalous five. And conversely, here are the least anomalous, so these are the most typical, if you like. It's telling me that setosa are the most typical iris, out of those 150.
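The anomaly workflow just described might be sketched like this. The path is a placeholder, and `split_anomalies` is a small hypothetical helper for picking out the most and least anomalous row indices:

```python
def split_anomalies(errors, n=5):
    # Given per-row reconstruction errors, return the row indices of the
    # n most anomalous and the n least anomalous rows.
    order = sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)
    return order[:n], order[-n:]

def run_demo():
    # Requires a running H2O cluster.
    import h2o
    from h2o.estimators.deeplearning import H2OAutoEncoderEstimator

    h2o.init()
    iris = h2o.import_file("iris.csv")            # placeholder path

    # Learn all five columns (class included) on the full dataset;
    # no dimension reduction, just learning what "typical" looks like.
    m = H2OAutoEncoderEstimator(activation="Tanh", hidden=[16],
                                epochs=300, stopping_rounds=15)
    m.train(x=iris.col_names, training_frame=iris)

    # anomaly() returns a single-column frame of per-row reconstruction MSE
    errors = m.anomaly(iris).as_data_frame()["Reconstruction.MSE"].tolist()
    most, least = split_anomalies(errors, n=5)
    print(iris[most, :])    # hardest to reconstruct: candidate anomalies
    print(iris[least, :])   # easiest to reconstruct: most typical rows
```

Binding the errors back onto the original columns and sorting, as in the video, gives the same ranking; the helper just makes the top/bottom split explicit.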
And you notice they all have quite similar values. But we do have one unusual setosa, and it looks to me that the length and width of just about everything is bigger. So this particular record, this particular data sample, this particular iris flower, is a borderline iris setosa in the opinion of our algorithm: it's larger than most of the other setosas it has seen. In a financial application, these records would be your most typical, safe customers, and these might be your high-risk customers that you maybe want to keep more of an eye on, or transactions that you want to go and study in more detail.

Let's just hop across to RStudio. So, to make a multi-layer autoencoder, simply specify more layers in hidden. To make a stacked autoencoder, build one model at a time, call deepfeatures, and take the output of that and use it as the input of your next autoencoder. And for anomaly detection, you make a standard autoencoder, probably using all your available data, and then you call the h2o.anomaly function, giving your autoencoder model and the data you want to examine.
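The stacked-autoencoder recipe from the recap can be sketched in Python (it mirrors the R calls shown in RStudio; the path and seed are placeholders):

```python
def deepfeature_names(layer, n_cols):
    # deepfeatures columns are named DF.L<layer>.C<col>, e.g. DF.L1.C1
    return ["DF.L%d.C%d" % (layer, c) for c in range(1, n_cols + 1)]

def run_demo():
    # Requires a running H2O cluster.
    import h2o
    from h2o.estimators.deeplearning import H2OAutoEncoderEstimator

    h2o.init()
    iris = h2o.import_file("iris.csv")            # placeholder path
    train, _ = iris.split_frame(ratios=[0.8], seed=42)
    x = iris.col_names[0:4]

    # First autoencoder: compress 4 columns down to 3
    m1 = H2OAutoEncoderEstimator(activation="Tanh", hidden=[3],
                                 epochs=300, stopping_rounds=15)
    m1.train(x=x, training_frame=train)

    # Layer index 0 = first (and only) hidden layer: 121 rows x 3 columns
    compressed = m1.deepfeatures(train, 0)

    # Second autoencoder trains on the projection, not the original data.
    # Column indices 0, 1, 2 are simpler than the DF.L1.C* names.
    m2 = H2OAutoEncoderEstimator(activation="Tanh", hidden=[3],
                                 epochs=300, stopping_rounds=15)
    m2.train(x=[0, 1, 2], training_frame=compressed)
```

Each stage is trained to completion before the next begins, which is what distinguishes stacking from simply listing several hidden layers in one model.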