
Random Forest

Random Forest is a widely used supervised machine learning algorithm that can be applied to both classification and regression tasks.

As the name suggests, a Random Forest can be understood as a collection of multiple decision trees. In Python, the classification variant is commonly referred to as the random forest classifier (scikit-learn's RandomForestClassifier).

At the heart of a random forest lie decision trees. Random Forest combines the results of multiple decision trees and, based on the prediction made by the majority of them, produces the final decision.

Random Forest Example

Let us understand Random Forest with the help of an example. Suppose we have three decision trees and we train them on a fruit dataset containing Peaches and Guavas. Each decision tree is trained on the same data, learns some patterns, and is then ready to make predictions on unseen (test) data.
We pass in a test sample and each decision tree predicts whether it is a Peach or a Guava.

Random Forest Algorithm

The predictions made by the individual decision trees are then collected by the random forest algorithm and the votes are combined. In this case we have two votes for Peach and one vote for Guava, so the final prediction is ‘Peach’.
Combining results this way is known as ensembling, i.e. taking the aggregate of the individual results.
This is one of the reasons why the Random Forest algorithm is also described as an ensemble technique.
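As a rough sketch of this majority vote, the snippet below tallies the predictions of three trees with Python's collections.Counter. The tree outputs are made up purely for illustration:

```python
from collections import Counter

# Hypothetical predictions from three individual decision trees
tree_predictions = ["Peach", "Peach", "Guava"]

# The ensemble's final answer is the class with the most votes
votes = Counter(tree_predictions)
final_prediction, vote_count = votes.most_common(1)[0]

print("Votes:", dict(votes))                 # {'Peach': 2, 'Guava': 1}
print("Final prediction:", final_prediction)  # Peach
```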

To sum it up: in a Random Forest, a large number of individual decision trees operate as an ensemble. Each tree spits out a class prediction, and the class with the most votes becomes the model’s prediction.
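In practice you rarely wire up the voting yourself; scikit-learn's RandomForestClassifier handles the whole ensemble. Here is a minimal sketch on the built-in Iris dataset (the dataset and parameter values are illustrative choices, not part of the example above):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load a small example dataset (illustrative choice)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# 100 trees, each trained on a random sample of the training data
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Each tree votes; the majority class becomes the final prediction
print("Test accuracy:", model.score(X_test, y_test))
```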

Let us take this a bit further and imagine the hundreds or thousands of individual decision trees that can make up a random forest. You may be wondering why it is called random. The reason is that each decision tree is trained on data samples selected at random; this random selection is called random sampling.
Because the same sample can be drawn again (and re-used by other decision trees), the selection is said to be done with replacement. This brings us to another important term associated with Random Forest: random sampling with replacement.
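A quick sketch of random sampling with replacement using NumPy; the ten row indices here stand in for an imaginary training set:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n_samples = 10  # pretend the training set has 10 rows

# Draw n_samples row indices *with replacement*:
# the same row can appear more than once, and some rows are left out
bootstrap_indices = rng.choice(n_samples, size=n_samples, replace=True)
print("Random sample (with replacement) for one tree:", bootstrap_indices)
```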

The idea is that many uncorrelated models (trees) will perform better than any individual tree. This is based on the concept of the wisdom of crowds.

Advantages of Random Forest

  • Suitable for categorical data
  • Much less prone to overfitting than a single decision tree
  • Handles missing values better than an individual decision tree
  • Flexible, with high accuracy, because it combines the results of multiple trees
  • Low bias, since every data point is evaluated by many trees
  • Low variance, and works well with large amounts of data
  • Trees can be trained independently, so training parallelizes well
  • Stable: small changes in the data have little impact on the overall output

Disadvantages of Random Forest

  • The biggest disadvantage is complexity: the model becomes difficult to interpret as the number of decision trees grows.
  • Building many trees and interpreting their results is time-consuming.
  • Requires more computation than a single decision tree, making it computationally expensive.
  • Requires a lot of memory to store the numerous decision trees.

Note: Random Forest is widely used in the stock market, banking, medicine, and e-commerce industries.

For a more detailed explanation of classification algorithms in Machine Learning, refer to the video tutorial.
