Ensemble Techniques

Ensemble learning is a Machine Learning concept in which the idea is to train multiple models using the same learning algorithm. Ensembles belong to a larger group of methods, called multiclassifiers, where a set of hundreds or thousands of learners with a common objective are fused together to solve the problem.

Two Types of Ensemble Techniques:

  • Bagging
  • Boosting

Bagging and Boosting are similar in that they are both ensemble techniques, where a set of weak learners is combined to create a strong learner that obtains better performance than any single one.

Bagging is also called Bootstrap Aggregation. Random Forest uses Bagging.
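
To make the "bootstrap" part concrete, each learner in a bagged ensemble is trained on a sample drawn with replacement from the original training set. Below is a minimal NumPy sketch of that resampling step; the data and sample size are made up for illustration (the original text does not specify any).

    import numpy as np

    rng = np.random.default_rng(42)

    # Toy training set: 10 examples with one feature each (illustrative values only).
    X = np.arange(10).reshape(-1, 1)
    y = np.array([0, 0, 0, 1, 1, 0, 1, 1, 1, 1])

    # One bootstrap sample: draw n row indices *with replacement* from the n examples.
    indices = rng.integers(0, len(X), size=len(X))
    X_boot, y_boot = X[indices], y[indices]

    print("bootstrap indices:", indices)  # some rows repeat, others are left out

A Random Forest repeats this resampling for every tree and additionally randomizes the features considered at each split.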

Bagging

Bagging is designed to improve the stability and accuracy of machine learning algorithms used in statistical classification and regression. It also reduces variance and helps to avoid overfitting. Although it is usually applied to decision tree methods, it can be used with any type of method. Bagging is a special case of the model averaging approach.
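
As a concrete illustration (assuming scikit-learn; the original text names no library), BaggingClassifier wraps a base estimator, a decision tree by default, and trains many copies of it on bootstrap samples of the training data. The dataset and hyperparameters below are illustrative choices.

    from sklearn.datasets import load_iris
    from sklearn.ensemble import BaggingClassifier
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # 100 base learners (decision trees by default), each fit on a bootstrap sample.
    bagging = BaggingClassifier(n_estimators=100, random_state=0)
    bagging.fit(X_train, y_train)
    print("test accuracy:", bagging.score(X_test, y_test))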

Boosting

Boosting is a machine learning ensemble meta-algorithm primarily for reducing bias, and also variance, in supervised learning, and a family of machine learning algorithms that convert weak learners to strong ones.

A weak learner is defined to be a classifier that is only slightly correlated with the true classification (it can label examples better than random guessing). In contrast, a strong learner is a classifier that is arbitrarily well-correlated with the true classification.
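
For instance, AdaBoost, one well-known boosting algorithm used here purely as an illustration, fits a sequence of shallow decision trees (by default, depth-1 stumps), each of which is only slightly better than random guessing on its own, and combines them into a strong classifier. A minimal scikit-learn sketch with an illustrative dataset and settings:

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Each weak learner is, by default, a decision stump (a depth-1 tree).
    boosting = AdaBoostClassifier(n_estimators=200, random_state=0)
    boosting.fit(X_train, y_train)
    print("test accuracy:", boosting.score(X_test, y_test))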

The main causes of error in learning are noise, bias, and variance. Ensemble methods help to minimize these factors; they are designed to improve the stability and the accuracy of Machine Learning algorithms.

Combinations of multiple classifiers decrease variance, especially in the case of unstable classifiers, and may produce a more reliable classification than a single classifier.
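
One way to see this effect (a sketch, assuming scikit-learn and its built-in breast-cancer dataset) is to compare the spread of cross-validation scores for a single unstable classifier, a fully grown decision tree, against a bagged ensemble of such trees:

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import BaggingClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)

    models = {
        "single tree": DecisionTreeClassifier(random_state=0),
        "bagged trees": BaggingClassifier(n_estimators=100, random_state=0),
    }

    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=10)
        # The ensemble typically shows both a higher mean and a smaller spread.
        print(f"{name}: mean={scores.mean():.3f}, std={scores.std():.3f}")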

Differences Between Boosting and Bagging

  • Boosting primarily tries to reduce bias. Bagging, on the other hand, may solve the over-fitting problem, while Boosting can increase it.
  • Only Boosting determines weights for the data, tipping the scales in favor of the most difficult cases (a minimal sketch of this reweighting idea follows this list).
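
To make the weighting idea concrete, the snippet below mimics a single AdaBoost-style reweighting round in plain NumPy: examples the current weak learner misclassifies are up-weighted so the next learner focuses on them. It is a simplified sketch with made-up labels, not a full boosting implementation.

    import numpy as np

    y_true = np.array([1, 1, -1, -1, 1])    # true labels (illustrative)
    y_pred = np.array([1, -1, -1, -1, -1])  # current weak learner's predictions

    weights = np.full(len(y_true), 1 / len(y_true))  # start with uniform weights
    missed = y_pred != y_true

    err = weights[missed].sum()            # weighted error of the weak learner
    alpha = 0.5 * np.log((1 - err) / err)  # learner's "say" in the final vote

    # Misclassified examples are up-weighted, correct ones down-weighted, then renormalized.
    weights *= np.exp(alpha * np.where(missed, 1.0, -1.0))
    weights /= weights.sum()

    print("updated weights:", np.round(weights, 3))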

Similarities Between Boosting and Bagging

  • Both are ensemble methods to get N learners from 1 learner
  • Both generate several training data sets by random sampling
  • Both make the final decision by averaging the N learners' predictions (or taking the majority vote, as sketched after this list)
  • Both are good at reducing variance and provide higher stability
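
As a toy illustration of that combination step, the snippet below takes a majority vote over the predictions of N = 3 learners for four test examples; the prediction values are made up for illustration.

    import numpy as np

    # Predictions from N = 3 learners for 4 test examples (rows = learners; values are made up).
    predictions = np.array([
        [0, 1, 1, 0],
        [0, 1, 0, 0],
        [1, 1, 1, 0],
    ])

    # Majority vote per example: count how many learners predicted class 1 in each column.
    votes_for_one = predictions.sum(axis=0)
    majority = (votes_for_one > predictions.shape[0] / 2).astype(int)
    print("ensemble prediction:", majority)  # -> [0 1 1 0]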
