Finally, you have all the building blocks to define the Random Forest algorithm. The Random Forest algorithm is a bagging of de-correlated decision trees. Here is a formal definition of this algorithm. The input is a training set Z, and capital B is the number of iterations. At each iteration, you draw a bootstrap sample Z* of the same size as the original training set, and then you grow a de-correlated tree T_b on this bootstrapped data set. Finally, you return an ensemble of all the de-correlated decision trees. Prediction is done exactly the same way as we described before for bagging: for regression, the algorithm returns the average of the predictions of all decision trees, and for classification, it returns the majority vote of all single-tree predictions. I want to emphasize here that the algorithm grows not a regular decision tree, but a de-correlated decision tree. What is a de-correlated decision tree, and how does it differ from a regular one? Here is the algorithm from the previous lesson, where you learned decision trees, and the only difference is in step number two. In a decision tree, you iteratively do splits, and at each iteration you select the best split among all possible ones. In the Random Forest decision tree, you do the split slightly differently: you select not the best split overall, but the best split among m variables. First, you pick m variables at random, and then select the best split only among these m variables. The criterion for selecting the best split is exactly the same: the maximization of information gain, which is the same as reducing the error after the split. The only difference is that you find the best split not among all the variables, but among a random subset of them. Here are some recommendations from the inventors of the Random Forest. They recommend using m = sqrt(p) for classification, where p is the total number of variables.
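The algorithm described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation: all function names are my own, and depth-1 trees (stumps) stand in for full decision trees so that the modified step two — searching for the best split only among m randomly chosen variables — is easy to see.

```python
import numpy as np

def fit_stump(X, y, feat_idx):
    """Grow a depth-1 tree, searching for the best split only among the
    feature subset feat_idx -- this is the de-correlation step."""
    best, best_err = None, np.inf
    for j in feat_idx:
        for t in np.unique(X[:, j])[:-1]:        # candidate thresholds
            mask = X[:, j] <= t
            left_lab = np.bincount(y[mask]).argmax()
            right_lab = np.bincount(y[~mask]).argmax()
            err = (y[mask] != left_lab).sum() + (y[~mask] != right_lab).sum()
            if err < best_err:                   # error reduction = info gain proxy
                best_err, best = err, (j, t, left_lab, right_lab)
    return best

def fit_forest(X, y, B=51, m=None, seed=0):
    """Bagging of de-correlated trees: B bootstrap rounds,
    each tree sees only m randomly chosen features per split."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    m = m if m is not None else max(1, int(np.sqrt(p)))  # sqrt(p) rule
    forest = []
    for _ in range(B):
        boot = rng.integers(0, n, size=n)            # bootstrap sample Z*
        feats = rng.choice(p, size=m, replace=False)  # random m of p variables
        forest.append(fit_stump(X[boot], y[boot], feats))
    return forest

def predict(forest, x):
    """Classification: majority vote over the single-tree predictions."""
    votes = [l if x[j] <= t else r for j, t, l, r in forest]
    return np.bincount(votes).argmax()

# Toy demo: the label depends only on feature 0; feature 1 is pure noise.
rng = np.random.default_rng(1)
X = rng.random((200, 2))
y = (X[:, 0] > 0.5).astype(int)
forest = fit_forest(X, y)
acc = np.mean([predict(forest, xi) == yi for xi, yi in zip(X, y)])
```

Even though roughly half of the stumps are forced to split on the noise feature, the majority vote recovers the true decision rule — which is exactly the point of combining many de-correlated weak trees.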
And for regression, they recommend using m = p / 3. Also, the minimum number of instances per node in the decision tree should be 1 for classification and 5 for regression. I want to show you this diagram. Here are the results of training two random forests. The first variant, marked in green, is the one where at each step m = p. It means that at each step, you grow a regular decision tree and find the best split among all the variables. The blue line is the de-correlated decision tree: in this situation, you randomly pick m = sqrt(p) variables, so the trees are built using different subsets of variables. As you can see in this diagram, at the initial stage, before about 20 iterations, the variant with m = sqrt(p) is worse. But eventually, this variant of the Random Forest algorithm converges to the better solution. This idea of introducing extra randomness into each decision tree really works. Okay, the summary of this lesson. Random Forest is a good method for solving general-purpose classification and regression problems, but typically it is slightly worse than the gradient boosted decision trees algorithm. Of course, this algorithm can automatically handle interactions of features, because this can already be done by a single decision tree. Another important feature is computational scalability. Each decision tree can be grown efficiently on a single computer or on a cluster, and in the Random Forest algorithm, each tree can be built independently of the other trees. This is an important feature, and I would like to emphasize it: that is why the Random Forest algorithm is embarrassingly parallel. The Random Forest can be trained in a distributed environment with a high degree of parallelization. Okay, as for the predictive power of the Random Forest: on the one hand, it is better than a single decision tree, but on the other hand, it is slightly worse than gradient boosted decision trees.
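The recommended defaults and the parallel structure mentioned above can be sketched as follows. The helper names here are my own, and `grow_tree` is a toy stand-in for growing one de-correlated tree — the point is only that each tree depends on nothing but its own seed (its own bootstrap sample and feature subsets), so all trees can be grown concurrently.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

def breiman_defaults(p, task):
    """Recommended settings: m = sqrt(p) and a minimum of 1 instance per node
    for classification; m = p / 3 and a minimum of 5 instances for regression."""
    if task == "classification":
        return {"m": max(1, round(math.sqrt(p))), "min_instances": 1}
    return {"m": max(1, p // 3), "min_instances": 5}

def grow_tree(seed):
    # Stand-in for growing one de-correlated tree: the work depends only on
    # its own seed, never on the other trees in the ensemble.
    return random.Random(seed).random()

# Because trees are mutually independent, they can be grown in parallel
# (across threads here; across machines in a real distributed setting).
with ThreadPoolExecutor(max_workers=4) as pool:
    forest = list(pool.map(grow_tree, range(8)))
```

Contrast this with gradient boosting, where tree b + 1 is fit to the residuals of the first b trees, so the trees must be grown sequentially.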
And here you lose interpretability, because a composition of hundreds or thousands of Random Forest decision trees cannot be analyzed by a human expert.