Ensemble learning is an approach in machine learning in which we train multiple models/predictors, often using the same learning algorithm, and combine them for our prediction/forecasting.
Ensemble techniques are among the most powerful approaches for classification and regression in the machine learning world. Most of the time we use ensemble techniques to get a better and more stable model.
Why should we prefer ENSEMBLE TECHNIQUES, and what are the different types of ENSEMBLE TECHNIQUES?
In ENSEMBLE TECHNIQUES we aggregate the predictions from a group of predictors, which may be classifiers or regressors, and most of the time the aggregated prediction is better than the one obtained from a single predictor/model. Generally, ensemble techniques use decision tree algorithms to train their group of models/predictors.
In this process we are able to reduce variance.
Then a simple question comes to mind: how is this possible?
From this diagram we can easily understand.
Let’s assume the standard deviation of each model shown in the diagram is σ.
Then the variance of a single model = σ²
Values predicted by model 1, model 2, model 3, …, model n = z1, z2, z3, …, zn
The expected value will be the average of all models/predictors:
μ = (z1 + z2 + z3 + … + zn) / n, where n = number of predictors
Variance of the final ensemble model = σ² / n, where n = number of predictors, assuming the predictors' errors are independent of one another
So we get a much lower variance compared with the individual models. That is why we usually try to go with an ensemble approach.
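To check the σ² / n claim numerically, here is a minimal Python sketch. It assumes each predictor's output is an independent Gaussian draw with standard deviation σ = 1, which is a simplification chosen for illustration and not something the diagram specifies. The function name ensemble_variance and its parameters are hypothetical.

```python
import random
import statistics

def ensemble_variance(n_predictors, sigma=1.0, trials=20000, seed=0):
    """Estimate the variance of the average of n independent predictors.

    Assumption: each predictor's output is an independent Gaussian
    draw with mean 0 and standard deviation sigma.
    """
    rng = random.Random(seed)
    averages = []
    for _ in range(trials):
        outputs = [rng.gauss(0.0, sigma) for _ in range(n_predictors)]
        averages.append(sum(outputs) / n_predictors)
    return statistics.pvariance(averages)

single = ensemble_variance(1)     # should be close to sigma**2 = 1.0
ensemble = ensemble_variance(10)  # should be close to sigma**2 / 10 = 0.1
print(single, ensemble)
```

If the predictors were correlated instead of independent, the reduction would be smaller than σ² / n, so this sketch shows the best case.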
Types of ENSEMBLE TECHNIQUES:
Here we are going to discuss bagging and boosting.
We can use the bagging approach in two ways: BAGGING AS CLASSIFIER and BAGGING AS REGRESSION. In BAGGING we build a number of models. If the task is regression, the ensemble takes the average value of all predictors; if it is classification, it goes for voting.
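The two aggregation rules just described can be sketched in a few lines of Python. The function names aggregate_regression and aggregate_classification are hypothetical and used only for illustration.

```python
from collections import Counter
from statistics import mean

def aggregate_regression(predictions):
    """Bagging as regression: average the predictors' outputs."""
    return mean(predictions)

def aggregate_classification(predictions):
    """Bagging as classifier: return the label with the most votes."""
    return Counter(predictions).most_common(1)[0][0]

print(aggregate_regression([2.0, 3.0, 4.0]))            # 3.0
print(aggregate_classification(["cat", "dog", "cat"]))  # cat
```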
Let’s understand how the bagging ensemble technique works.
From this diagram we can easily understand how Bagging works.
Bagging is a type of Ensemble Technique in which a single training algorithm is used on different subsets of the training data, where the subset sampling is done with replacement (bootstrap). Once the algorithm is trained on all subsets, bagging makes its prediction by aggregating all the predictions made by the models trained on the different subsets.
In the above diagram we can see two processes: the first is Bootstrapping and the second is Aggregation.
Bootstrapping is a technique for sampling different sets of data from a given training set/original dataset with replacement. Replacement means that the same observation can be drawn more than once in a sample, so each model/predictor sees a slightly different version of the data.
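As a small illustration of sampling with replacement, the Python sketch below draws one bootstrap sample from a toy dataset. The helper name bootstrap_sample is hypothetical, and the choice to make the sample the same size as the original dataset is an assumption (a common convention, not something stated above).

```python
import random

def bootstrap_sample(rows, rng):
    """Draw len(rows) items from rows with replacement."""
    return rng.choices(rows, k=len(rows))

rng = random.Random(42)
data = list(range(10))          # toy dataset: ten observations
sample = bootstrap_sample(data, rng)

print(sample)                   # some observations usually appear more than once
print(len(set(sample)))         # number of distinct observations in the sample
```

Because the draw is with replacement, some observations are typically repeated and others are left out of a given sample.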
After bootstrapping the training dataset, the algorithm is trained on each of the different sets and the results are aggregated. This technique is known as bootstrap aggregation, or bagging.
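Putting bootstrapping and aggregation together, here is a minimal from-scratch sketch of bagging as a classifier in Python. The base model is a one-feature decision stump (a decision tree with a single split), chosen here only to keep the example short. The toy data and all function names (fit_stump, fit_bagging, predict_bagging) are hypothetical.

```python
import random
from collections import Counter

def fit_stump(sample):
    """Fit a decision stump on (x, label) pairs with labels 0 or 1.

    Tries every observed x as a threshold and keeps the split
    with the fewest training errors.
    """
    best = None
    for threshold in sorted({x for x, _ in sample}):
        for left_label in (0, 1):
            errors = sum(
                (left_label if x <= threshold else 1 - left_label) != y
                for x, y in sample
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left_label)
    _, threshold, left_label = best
    return lambda x: left_label if x <= threshold else 1 - left_label

def fit_bagging(data, n_models, rng):
    """Train one stump per bootstrap sample (bootstrapping step)."""
    models = []
    for _ in range(n_models):
        sample = rng.choices(data, k=len(data))  # sample with replacement
        models.append(fit_stump(sample))
    return models

def predict_bagging(models, x):
    """Combine the stumps by majority vote (aggregation step)."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

rng = random.Random(0)
data = [(x, int(x > 5)) for x in range(11)]  # toy data: label is 1 when x > 5
models = fit_bagging(data, n_models=25, rng=rng)
print(predict_bagging(models, 2), predict_bagging(models, 9))
```

For regression, the same structure applies with the vote replaced by an average of the models' outputs.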
ADVANTAGES OF BAGGING
1: Bagging significantly decreases the variance without substantially increasing bias.
2: In the case of regression, the average of the results does not change much.
3: The variance decreases by a large amount.
Boosting is an ensemble approach (meaning it involves several trees) that starts from a weak learner and keeps building models/predictors such that the final prediction is the weighted sum of the weak learners' predictions. The weights are assigned based on the performance of each individual tree.
LET’S UNDERSTAND HOW IT WORKS:
In boosting, each classifier is trained on the sample set and learns to predict. The ensemble parameters are calculated in a stage-wise way, which means that when the next weight is calculated, the learning from the previous tree is taken into account as well.
The misclassification error of each classifier then feeds into the next classifier in the chain, which tries to correct those mistakes, and this continues until the final model predicts accurately.
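The stage-wise idea can be sketched in Python with an AdaBoost-style procedure. AdaBoost is one well-known boosting algorithm and is used here as an assumption, since the text above does not name a specific one. The weak learners are one-feature decision stumps, labels are -1 or +1, and all function names and the toy data are hypothetical.

```python
import math

def fit_weighted_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump.

    The stump predicts polarity when x <= threshold, else -polarity.
    """
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x <= threshold else -polarity for x in xs]
            error = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best

def fit_boosting(xs, ys, n_rounds):
    n = len(xs)
    weights = [1.0 / n] * n          # start with equal sample weights
    ensemble = []                    # list of (alpha, threshold, polarity)
    for _ in range(n_rounds):
        error, threshold, polarity = fit_weighted_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)   # guard against log(0)
        alpha = 0.5 * math.log((1 - error) / error)  # better stump, larger weight
        ensemble.append((alpha, threshold, polarity))
        # Raise the weight of misclassified points, lower the rest
        preds = [polarity if x <= threshold else -polarity for x in xs]
        weights = [w * math.exp(-alpha * y * p)
                   for w, y, p in zip(weights, ys, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict_boosting(ensemble, x):
    """Final prediction: sign of the weighted sum of the stumps."""
    score = sum(alpha * (polarity if x <= threshold else -polarity)
                for alpha, threshold, polarity in ensemble)
    return 1 if score >= 0 else -1

# Toy data that no single stump can classify: +1 in the middle, -1 at the ends
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [-1, -1, -1, 1, 1, 1, 1, -1, -1, -1]
ensemble = fit_boosting(xs, ys, n_rounds=3)
correct = sum(predict_boosting(ensemble, x) == y for x, y in zip(xs, ys))
print(correct, "of", len(xs), "training points classified correctly")
```

Each round focuses on the points the previous stumps got wrong, which is the "error feeds into the next classifier" behaviour described above.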
TYPES OF BOOSTING
SIMILARITIES AND DIFFERENCES BETWEEN BAGGING AND BOOSTING
1: Both are ensemble techniques.
2: Both train on data sets obtained by random sampling.
3: Both are able to reduce variance and make the final model/predictor more stable.
From the above diagram you can understand the basic difference between bagging and boosting.
APPLICATION OF BAGGING AND BOOSTING:
We should know where we can apply these two powerful ML ensemble techniques in real life. You can use both techniques in regression and classification problems.
1: Majority voting for agricultural land
Ensemble techniques in ML are a strong approach for making better decisions on many kinds of problems. Rather than using a single predictor, it is often better to use multiple predictors to reach the final solution to your problem. We can use R or Python to build our Ensemble (Bagging/Boosting) model.
Article by: Ayosharya