Feature Selection

Want to become a master in ML? Are you trying to get good accuracy with your model? Then it is all about the “best features” of your data. Your data may contain many features or subsets of features, but not every feature is significant. In fact, insignificant features or variables in the data can have an adverse effect on the model’s performance.

To make our life easier, scikit-learn comes with the “feature_selection” module to select the variables, or subsets of our data, that really contribute to the output.

In this blog, I’m going to give a brief idea about the types of feature selection and their statistical background.

“Feature selection is the pruning of unwanted data, or the generation of useful data, from the given sample.”

Methods of Feature Selection:

  1. Wrapper method
  2. Filter method
  3. Embedded method

The code for the above-mentioned methods is available here:

Feature selection with scikit-learn.

1. Wrapper method:

As the name suggests, this method is a “wrapper” around an external estimator and the sample data. The estimator is first trained on the whole sample; it then assigns weights to the features based on how well they correspond to the output of the model, and recursively eliminates the features with low weights or scores.

Some of the available wrapper methods are “Recursive Forward Selection, Recursive Backward Elimination, Recursive Feature Selection”.

I. Recursive Forward Selection:

In forward selection, the method searches for the best features based on the model’s performance on the data. In each iteration it selects the highest-scoring feature (the most significant one, i.e. the lowest p-value) and adds it to your feature subset.
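Below is a minimal sketch of forward selection using scikit-learn’s SequentialFeatureSelector (available from version 0.24); the breast-cancer dataset, the logistic-regression estimator and the choice of 5 features are assumptions made purely for illustration.

```python
# Sketch of forward selection with scikit-learn's SequentialFeatureSelector.
# Dataset, estimator and n_features_to_select are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# Start from an empty set and greedily add the feature that helps the model most.
sfs = SequentialFeatureSelector(
    LogisticRegression(max_iter=5000),
    n_features_to_select=5,
    direction="forward",
)
sfs.fit(X, y)

print("Selected feature indices:", sfs.get_support(indices=True))
X_selected = sfs.transform(X)  # reduced feature matrix
```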

II. Recursive Backward Selection:

It is the same as forward selection but works in reverse: it searches for the insignificant features. In each iteration it removes the least significant feature (the one with the highest p-value, or lowest score) from the current feature set.
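A corresponding sketch for backward elimination only flips the direction flag; again the dataset, estimator and number of features are illustrative assumptions.

```python
# Sketch of backward elimination: start from all features and drop the
# least useful one in each iteration. Same illustrative assumptions as above.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

sbs = SequentialFeatureSelector(
    LogisticRegression(max_iter=5000),
    n_features_to_select=5,
    direction="backward",
)
sbs.fit(X, y)

print("Kept feature indices:", sbs.get_support(indices=True))
```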

III. Recursive Feature Selection:  

In this method, the estimator is trained and the features that contribute least to the output are eliminated. This is repeated until only the best features remain, so the estimator ends up trained on that high-scoring subset while the remaining features are ignored.
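scikit-learn implements this idea as RFE (Recursive Feature Elimination); in the sketch below, the linear SVM used for ranking and the target of 5 features are assumptions for illustration.

```python
# Sketch of recursive feature elimination (RFE): the estimator is refit
# repeatedly and the lowest-weighted features are pruned each round.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# A linear kernel is needed so the SVM exposes coef_ for ranking features.
rfe = RFE(estimator=SVC(kernel="linear"), n_features_to_select=5, step=1)
rfe.fit(X, y)

print("Feature ranking (1 = selected):", rfe.ranking_)
print("Selected mask:", rfe.support_)
```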

2. Filter method:

The techniques in this method are based on univariate statistical measures: a filter technique measures a proxy relationship between one input variable at a time and the target variable. This kind of scoring is computationally much cheaper than cross-validating a model, but it is less effective at capturing how well a feature subset generalizes.

The well-known measures in the filter method are “Mutual Information, Chi-squared, and Pearson correlation”.

I. Mutual Information:

Mutual information measures how much one random variable tells us about another. With it, we can filter features based on how much information they convey about the target variable.

In statistical terms, it is “a measure of the strength of association between two variables X and Y”.
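A minimal sketch using mutual_info_classif together with SelectKBest; the dataset and the choice of k = 10 are assumptions for illustration.

```python
# Sketch: score each feature by mutual information with the target and
# keep the k highest-scoring ones.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_breast_cancer(return_X_y=True)

selector = SelectKBest(score_func=mutual_info_classif, k=10)
X_new = selector.fit_transform(X, y)

print("Mutual information scores:", selector.scores_)
print("Chosen feature indices:", selector.get_support(indices=True))
```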

II. Chi-squared:

The Chi-square test (χ2) measures the independence of two categorical variables. Chi-square (χ2) scores rank the attributes of the training dataset in order of importance; those with the highest scores are taken into consideration for training the model.

In statistical terms, it measures whether the observed frequencies of the outcomes differ from the frequencies expected under independence.
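A minimal sketch with SelectKBest and chi2; note that chi2 expects non-negative feature values, which is why the iris measurements are used here as an illustrative assumption.

```python
# Sketch: chi-squared scoring with SelectKBest. chi2 expects non-negative
# feature values, which holds for the iris measurements used here.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)

selector = SelectKBest(score_func=chi2, k=2)
X_new = selector.fit_transform(X, y)

print("Chi-squared scores:", selector.scores_)
print("Selected feature indices:", selector.get_support(indices=True))
```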

III. Pearson correlation:

Pearson’s correlation, or Pearson’s r, measures the linear relationship between two continuous variables. Its value always lies between -1 and 1.

A correlation value of 0 means there is no linear relationship, either positive or negative, between those variables.

In statistical terms, it “measures the linear correlation between two variables X and Y”.
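For a regression target, Pearson’s r for each feature can be computed with SciPy; in the sketch below the diabetes dataset and the 0.2 threshold are purely illustrative assumptions.

```python
# Sketch: rank features for a regression problem by the absolute value of
# Pearson's r with the target, then keep the strongly correlated ones.
import numpy as np
from scipy.stats import pearsonr
from sklearn.datasets import load_diabetes

X, y = load_diabetes(return_X_y=True)

r_values = np.array([pearsonr(X[:, i], y)[0] for i in range(X.shape[1])])
keep = np.abs(r_values) > 0.2  # illustrative threshold, not a rule

print("Pearson r per feature:", np.round(r_values, 3))
print("Features kept:", np.where(keep)[0])
```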

3. Embedded Method:

Embedded methods are quite different from the wrapper and filter methods: feature selection and model learning happen at the same time. This lets embedded methods avoid much of the computational cost of wrapper methods.

This type of feature selection is performed by algorithms that have their own built-in penalization methods: “the variables are penalized by applying a penalty to reduce overfitting and complexity”. L1 regularization is an important example of an embedded method; the coefficients of most insignificant features are effectively driven to zero by the algorithm.
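A minimal sketch of L1-based embedded selection using SelectFromModel with an L1-penalized logistic regression; the dataset, the C value and the feature scaling are assumptions for illustration.

```python
# Sketch: L1-penalized logistic regression drives many coefficients to zero;
# SelectFromModel keeps only the features with non-zero coefficients.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)  # scaling helps the L1 penalty behave

l1_model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
selector = SelectFromModel(l1_model).fit(X, y)

print("Selected feature indices:", selector.get_support(indices=True))
X_selected = selector.transform(X)
```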

Summary:

If you are wondering, is feature selection necessary?

Yes. It improves the accuracy of your model and drops features that are not useful for training. I believe you will see an uplift in model accuracy after implementing feature selection.

Written by: Yeswanth chowdary

Reviewed by: Savya Sachi

