Ensemble Methods: Part 1

The term ensemble refers to a group of items. Ensemble methods are learning algorithms that combine a group of base models and output a prediction for a new data point by weighting or voting over the individual predictions.

In this blog, I am going to briefly discuss the methods of creating ensembles and compare the performance of different ensembles.

Consider any supervised learning algorithm. A set of training examples is given to the learning algorithm, where the examples are generated by some unknown function y = f(x). For classification, the y values are drawn from a discrete set of labels. Here I will stick to classification problems to keep the intuition simple.

For a given training sample, the algorithm forms a hypothesis. By using multiple samples or multiple classifiers we obtain different hypotheses on the data, and the ensemble combines these hypotheses to produce the output for a new data point.

Methods for Constructing Ensembles:

Some of the popular methods for constructing ensembles are Stacking, Bagging, and Boosting.

Stacking:

As the name suggests, stacking is a pile of classifiers, usually of different types with different scoring functions. These classifiers are trained on the same training data, and each one forms its own hypothesis. They are the first-level learners, and their outputs are fed as input features to a meta (final) classifier.

By stacking the first-level and final learners, overfitting and the variance of the predictions can be reduced.
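Here is a minimal stacking sketch, assuming scikit-learn's StackingClassifier and the iris toy dataset; the base learners and the meta learner are illustrative choices, not the only possible ones.

```python
# A minimal stacking sketch (illustrative choice of models and dataset).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import StackingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# First-level learners: trained on the same training data.
base_learners = [
    ("tree", DecisionTreeClassifier(max_depth=3)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
]

# Meta (final) classifier: trained on the outputs of the first-level learners.
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression())
stack.fit(X_train, y_train)
print("Stacking accuracy:", stack.score(X_test, y_test))
```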

Bagging: 

A learning classifier is trained on random samples of size n drawn with replacement from the original data of size n. Drawing such samples with replacement is known as bootstrap sampling, and combining the models trained on them gives bootstrap aggregation, or bagging. For each random sample, an instance of the learning classifier forms a different hypothesis, and the final regression or classification output is determined by aggregating these hypotheses (averaging or voting).
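A minimal bagging sketch, assuming scikit-learn's BaggingClassifier with decision trees as the base learner (an illustrative choice; the base-learner argument is named `estimator` in recent scikit-learn versions):

```python
# A minimal bagging sketch: 50 trees, each trained on a bootstrap sample.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# bootstrap=True draws a sample of size n with replacement for each tree.
bag = BaggingClassifier(estimator=DecisionTreeClassifier(),
                        n_estimators=50, bootstrap=True, random_state=42)
bag.fit(X_train, y_train)
print("Bagging accuracy:", bag.score(X_test, y_test))
```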

Boosting:

Boosting falls under sequential learning algorithms. The objective of boosting is to turn a weak classifier into a strong classifier by adjusting the weights of the training examples.

For example, consider a classifier C1 that performs weakly on some distribution D of the given training data. By adjusting the weights, increasing the weights of the misclassified data points, the boosting algorithm trains a new classifier C1*. This process is repeated until the misclassified points are handled or a stopping criterion (such as a fixed number of rounds) is met.
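A minimal boosting sketch, assuming scikit-learn's AdaBoostClassifier with decision stumps as the weak learner (an illustrative choice; as with bagging, the argument is named `estimator` in recent scikit-learn versions):

```python
# A minimal boosting sketch: AdaBoost reweights misclassified points each round.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Weak learner: a decision stump (tree of depth 1).
boost = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                           n_estimators=50, random_state=42)
boost.fit(X_train, y_train)
print("Boosting accuracy:", boost.score(X_test, y_test))
```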

Some basic ensembling techniques to try on a small, beginner-friendly dataset are:

Max voting:

It is a poll built by the VotingClassifier (meta classifier) among the base classifiers. For a given new data point, each base classifier predicts a class label for the point. Based on the majority vote (hard voting), the VotingClassifier assigns the label to the new data point.

In the case of a tie, the VotingClassifier picks the label that comes first in ascending order of the class labels. Voting is mostly used for classification problems.
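A minimal hard-voting sketch with scikit-learn's VotingClassifier (the base models here are illustrative choices):

```python
# A minimal hard-voting sketch: each base model gets one vote per data point.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import VotingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

vote = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("tree", DecisionTreeClassifier(max_depth=3)),
                ("knn", KNeighborsClassifier())],
    voting="hard")  # majority vote over the predicted labels
vote.fit(X_train, y_train)
print("Hard-voting accuracy:", vote.score(X_test, y_test))
```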

Averaging:

It is similar to voting: multiple models are trained on the training dataset. For a new point, the average of the predictions made by the models is taken as the predictive output; for classifiers, this usually means averaging the predicted class probabilities (soft voting).
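A minimal averaging sketch, assuming soft voting (averaging predicted probabilities) via scikit-learn's VotingClassifier; for regression, averaging the raw predictions with NumPy would look similar.

```python
# A minimal averaging sketch: average the class probabilities of the models.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import VotingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

avg = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("tree", DecisionTreeClassifier(max_depth=3))],
    voting="soft")  # average predict_proba outputs, then take the argmax
avg.fit(X_train, y_train)
print("Averaging (soft voting) accuracy:", avg.score(X_test, y_test))
```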

Weighted average:

In this method, each model is assigned a weight during training, with larger weights given to the more significant (better-performing) models. The remaining process is the same as the averaging method: the final output is the weighted average of the models' predictions.
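A minimal weighted-average sketch, again using soft voting but with per-model weights; the weights here are illustrative and would normally be chosen from validation performance.

```python
# A minimal weighted-average sketch: better models get larger weights.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import VotingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

wavg = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("tree", DecisionTreeClassifier(max_depth=3))],
    voting="soft",
    weights=[2, 1])  # the logistic model counts twice as much as the tree
wavg.fit(X_train, y_train)
print("Weighted-average accuracy:", wavg.score(X_test, y_test))
```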

Key points to remember in ensemble methods:
  • The necessary condition for an ensemble to be more accurate than its individual classifiers is that the individual classifiers are accurate (better than random guessing) and diverse.
  • Bootstrap is not an ensemble method by itself; it is the sampling technique used in Bagging.
  • To decrease the variance of a model, the preferred method is Bagging.
  • To decrease the bias of a model, the preferred method is Boosting.
  • Stacking is used to increase the predictive power of the model.

Summary:

Ensembles are recognised for producing classifiers with good accuracy by combining less accurate ones. This has increased their popularity in machine learning, and they can help in getting a good rank in machine learning competitions.

Further Reading:

In the next article, the algorithms based on the Boosting and Bagging methods will be discussed.

Written by: Yeswanth Chowdary

Reviewed by: Savya Sachi

