In Ensemble of Learning Algorithms Part-1, I discussed the introduction and some of the basic ensemble techniques. In this blog, I am going to discuss some of the advanced ensemble techniques in Bagging and Boosting.
Bagging | Ensemble Methods:
Bagging samples random subsets from the given training data and trains instances (hypotheses) of the learning algorithm on these sampled subsets. The final prediction is determined by combining the hypotheses made by all of these learners.
Some of the widely used bagging methods are the Bagging meta-estimator and random forests.
Bagging meta-estimator:
This is a bagging method that builds its hypotheses by taking a user-given base estimator along with the training data. It draws the samples based on the parameters assigned by the user during model training.
Some of the significant parameters for managing the size of the data samples are max_features and max_samples; in addition, bootstrap controls whether samples are drawn with or without replacement.
This bagging meta-estimator can be used for both classification and regression.
Scikit-learn provides BaggingClassifier for classification and BaggingRegressor for regression.
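Below is a minimal sketch of how a BaggingClassifier might be set up; the synthetic dataset and parameter values are purely illustrative, not recommendations.

```python
# Minimal bagging sketch on a synthetic dataset (illustrative values only).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# max_samples / max_features set the size of each random subset,
# bootstrap decides whether sampling is done with replacement.
bag = BaggingClassifier(
    DecisionTreeClassifier(),   # user-given base estimator
    n_estimators=50,
    max_samples=0.8,
    max_features=0.8,
    bootstrap=True,
    random_state=42,
)
bag.fit(X_train, y_train)
print("Bagging accuracy:", bag.score(X_test, y_test))
```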
Random Forest:
Random forest is a meta-estimator that contains a collection of decision trees. In this collection, each tree is trained on a random sample of the training data; each tree fits on its sample and makes its hypothesis according to the hyperparameters. Setting the parameters for this meta-estimator is similar to constructing individual decision trees.
In general, decision trees tend to overfit the data; by introducing randomness among the trees, a random forest decreases the variance of the model.
- The n_estimators (number of estimators) parameter is the number of trees in the forest.
- The max_samples parameter can be helpful to control the size of the bootstrap sample.
- The values of max_depth (maximum depth of a tree) and min_samples_leaf (minimum samples in a leaf) are important to ensure that our model doesn’t overfit the training data.
For regression, there is a RandomForestRegressor meta-estimator, which gives the average of the values predicted by all the decision trees. For classification, the RandomForestClassifier assigns the class label by majority voting of the individual decision trees.
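A minimal sketch of a RandomForestClassifier on a synthetic dataset follows; the hyperparameter values are illustrative only, not tuned recommendations.

```python
# Random forest sketch showing the parameters discussed above.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_estimators: number of trees; max_depth / min_samples_leaf limit
# overfitting; max_samples controls the bootstrap sample size.
rf = RandomForestClassifier(
    n_estimators=200,
    max_depth=8,
    min_samples_leaf=5,
    max_samples=0.8,
    random_state=0,
)
rf.fit(X_train, y_train)
print("Random forest accuracy:", rf.score(X_test, y_test))
```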
Boosting | Ensemble Methods:
Boosting transforms a weak model into a better one by observing its prediction or misclassification error and correcting it sequentially. The five most widely used boosting methods are AdaBoost, Gradient boosting, XGBoost, LightGBM and CatBoost.
AdaBoost:
Adaptive boosting (AdaBoost) improves a weak model sequentially by modifying the initially assigned weights of the data points that contribute to the error. This refinement of the model continues until there is no further improvement in the error.
By combining the decision boundaries of the weak learners, the final misclassification error can be improved.
Scikit-learn provides two handy estimators: AdaBoostClassifier for (multi-class) classification and AdaBoostRegressor for regression.
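Here is a minimal sketch of AdaBoostClassifier on synthetic data; by default it boosts shallow decision trees (stumps), and the settings below are illustrative.

```python
# AdaBoost sketch: weak learners are reweighted toward the hard examples.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

ada = AdaBoostClassifier(n_estimators=100, learning_rate=0.5, random_state=1)
ada.fit(X_train, y_train)
print("AdaBoost accuracy:", ada.score(X_test, y_test))
```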
Gradient boosting:
Gradient boosting generates new models by minimizing the loss function of the current weak model, typically by fitting each new model to the residual errors. This minimization carries on until there is no further change in the loss function.
For example, consider a model that has to predict a target value of 5 for a given data point, but instead it predicts 3. The residual error is +2. These residuals are then set as the target for the next model to train on. Suppose the second model predicts 1. If you combine the first prediction (3) and the second prediction (1), we obtain 4. Now the residual error is reduced to +1.
This can be used for both regression (GradientBoostingRegressor) and classification (GradientBoostingClassifier).
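A minimal sketch with GradientBoostingRegressor follows, where each new tree is fitted to the residuals of the current ensemble as in the example above; the data and parameter values are illustrative.

```python
# Gradient boosting sketch: successive trees correct the residual errors.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2)

gbr = GradientBoostingRegressor(
    n_estimators=200,    # number of boosting stages (trees)
    learning_rate=0.1,   # shrinks each tree's contribution
    max_depth=3,
    random_state=2,
)
gbr.fit(X_train, y_train)
print("Gradient boosting R^2:", gbr.score(X_test, y_test))
```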
XGBoost:
XGBoost (extreme gradient boosting) performs boosting comparatively faster than the other methods. In addition to gradient boosting it regularizes the model, so it is often called “regularized boosting”.
This regularization is the key difference between gradient boosting and XGBoost. XGBoost provides a wrapper around its gradient boosting classifiers and regressors for Python, R, etc.
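A minimal sketch using the separate xgboost package (assumed to be installed, e.g. with pip install xgboost); reg_lambda and reg_alpha are its L2/L1 regularization terms, and the values shown are illustrative.

```python
# XGBoost sketch via its scikit-learn-style wrapper.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)

xgb = XGBClassifier(
    n_estimators=200,
    learning_rate=0.1,
    max_depth=4,
    reg_lambda=1.0,   # L2 regularization on leaf weights
    reg_alpha=0.0,    # L1 regularization on leaf weights
)
xgb.fit(X_train, y_train)
print("XGBoost accuracy:", xgb.score(X_test, y_test))
```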
Light GBM:
So far we have discussed different boosting techniques; when the dataset is huge, all the above methods take more time. At this point, LightGBM is very handy to use.
LightGBM is a gradient boosting technique that uses tree-based algorithms. An important feature of LightGBM is that it grows trees leaf-wise, while the others grow them level-wise. However, this method can overfit on small datasets.
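A minimal sketch using the lightgbm package (assumed installed, e.g. with pip install lightgbm); num_leaves is the main knob for its leaf-wise tree growth, and the values are illustrative.

```python
# LightGBM sketch via its scikit-learn-style wrapper.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from lightgbm import LGBMClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=4)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=4)

lgbm = LGBMClassifier(
    n_estimators=200,
    learning_rate=0.1,
    num_leaves=31,          # key knob for leaf-wise growth
    min_child_samples=20,   # larger values help against overfitting
)
lgbm.fit(X_train, y_train)
print("LightGBM accuracy:", lgbm.score(X_test, y_test))
```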
CatBoost:
CatBoost is similar to gradient boosting, but it automatically handles categorical features. Encoding categorical features normally requires preprocessing and computation time; with CatBoost there is no need for one-hot encoding as in other algorithms, since it can deal with categorical data directly.
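A minimal sketch using the catboost package (assumed installed, e.g. with pip install catboost); the toy DataFrame is invented, and cat_features tells CatBoost which columns are categorical so no one-hot encoding is needed.

```python
# CatBoost sketch: the categorical "city" column is passed as-is.
import pandas as pd
from catboost import CatBoostClassifier

X = pd.DataFrame({
    "city": ["delhi", "mumbai", "delhi", "chennai", "mumbai", "chennai"] * 50,
    "age": [25, 32, 41, 29, 35, 48] * 50,
})
y = [0, 1, 0, 1, 1, 0] * 50

model = CatBoostClassifier(iterations=100, learning_rate=0.1, verbose=False)
model.fit(X, y, cat_features=["city"])   # categorical column handled natively
print("CatBoost train accuracy:", model.score(X, y))
```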
Summary:
To enhance the accuracy of a model, ensemble methods play a major role in Machine Learning, so it is recommended to try them in your projects.
Written by: Yeswanth chowdary
Reviewed By: Savya Sachi