Dimensionality Reduction In Machine Learning

In day-to-day life we generate huge amounts of high-dimensional data in many formats, such as text, audio, video, and images. For example, on mobile phones we take photos and we share, like, and comment on social media platforms, and sometimes we also share personal information. In this way we generate data directly or indirectly, and such information can be used to study a person's behaviour.

The more data we generate, the more information we share. In real life, data is essential for prediction; without data we cannot draw conclusions, so we say data is important for analysis. But what if the data is huge? It is often claimed that around 90 percent of the world's data was generated in the last two years, and performing analysis on such huge amounts of data is challenging. Some important features and patterns in the data play a vital role, and recognizing those features is important.

What is Dimensionality reduction?

The term “dimensionality reduction” refers to techniques that reduce the number of input variables in a dataset without losing much information. The higher the number of dimensions, the more complex the data becomes to understand and visualize. Clustering algorithms in machine learning work on row-wise reduction, whereas dimensionality reduction techniques work on column-wise reduction. When we perform EDA on a dataset, we often impute missing values.

But in some cases, around 80% of the values in a column are missing. Such columns carry little information, so we delete them. Dropping these columns is itself a simple way of reducing the dimensions.
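As a rough illustration, here is a minimal pandas sketch of this idea (the column names and the 80% threshold are made up for the example):

```python
import pandas as pd

# Illustrative dataset: the "bonus" column is 80% missing
df = pd.DataFrame({
    "age": [25, 32, 47, 51, 38],
    "salary": [50000, 64000, None, 72000, 58000],
    "bonus": [None, None, None, None, 1000],
})

# Fraction of missing values in each column
missing_ratio = df.isna().mean()

# Keep only the columns that are less than 80% missing
df_reduced = df.loc[:, missing_ratio < 0.8]

print(df_reduced.columns.tolist())  # ['age', 'salary']
```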

Need for Dimensionality Reduction

  1. It helps to reduce the complexity of the dataset and shrink the number of dimensions without losing much information.
  2. Some algorithms may not perform well when the data is huge; dimensionality reduction helps to increase the accuracy of the model.
  3. Lower-dimensional data is easier to visualize.
  4. It also removes highly correlated variables.
  5. It requires less processing time and less memory.
  6. It helps to reduce overfitting. An overfitted model works well on the training dataset but fails to perform well on the test dataset.

There are two methods of dimensionality reduction:

  1. Feature Selection
  2. Feature Extraction
1. Feature Selection

It is a method of keeping the important features and removing the unwanted features from the dataset. Only the features that have a relationship with the dependent variable are used for fitting the model.

We can perform feature selection via three methods:

  • Wrapper Methods:

Wrapper methods evaluate different combinations of features. In forward selection, we start with a single independent feature and check the accuracy, then add the next most relevant feature to the model. We keep doing this until the model's accuracy starts to drop.

E.g. if we take a dataset of employees and have to predict promotion, we have many variables such as Name, Age, Years of Experience, KPI, etc. In forward selection, we might start with Years of Experience and then add other variables.

In backward selection, we start with all the features and remove the least significant features one by one until performance starts to drop.
  • Filter Methods: Remove any features that do not vary widely. In some cases, columns are highly correlated with each other; in machine learning this is called multicollinearity. A common assumption when fitting a model is that there should be no multicollinearity, because it reduces performance. Removing multicollinearity removes redundant features from the dataset.
  • Embedded Methods: Here feature selection is built into the model itself; regularization is an example of an embedded method. It considers the contribution of each feature while the model is being trained, and it reduces the risk of overfitting by adding a penalty term to the error function.

Examples are LASSO, Elastic Net, and Ridge Regression. A sketch of all three approaches is shown below.
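As a minimal sketch of these three approaches with scikit-learn (the diabetes dataset, the 0.9 correlation threshold, and the LASSO alpha are illustrative choices, not fixed rules):

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectFromModel, SequentialFeatureSelector
from sklearn.linear_model import Lasso, LinearRegression

X, y = load_diabetes(return_X_y=True, as_frame=True)

# Wrapper method: forward selection adds features one at a time,
# keeping the combination that performs best in cross-validation.
forward = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=4, direction="forward"
)
forward.fit(X, y)
print("Wrapper (forward selection):", list(X.columns[forward.get_support()]))

# Filter method: drop one feature from every highly correlated pair.
corr = X.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
print("Filter (|correlation| > 0.9), dropped:", to_drop)

# Embedded method: LASSO shrinks unimportant coefficients to zero,
# so feature selection happens while the model is trained.
embedded = SelectFromModel(Lasso(alpha=0.1)).fit(X, y)
print("Embedded (LASSO):", list(X.columns[embedded.get_support()]))
```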

2. Feature Extraction: 

In feature extraction, new features are created that reduce the high-dimensional feature space to fewer dimensions. Features with similar properties are combined to form one group. This reduces the number of features and makes it easier for models to perform better.

There are many feature extraction techniques:

  1. PCA - Principal Component Analysis
  2. FA - Factor Analysis
  3. SVD - Singular Value Decomposition
  4. LDA - Linear Discriminant Analysis
  5. MDS - Multi-Dimensional Scaling
  6. t-SNE - t-Distributed Stochastic Neighbor Embedding
  7. ICA - Independent Component Analysis

PCA in brief

  1. PCA - Principal Component Analysis
  • Principal Component Analysis is a statistical dimensionality reduction technique that captures the variance in the data and reduces a large set of variables into a smaller set of grouped variables.
  • It falls under unsupervised learning.
  • Example: if we have a dataset of kitchen items, we can group them into categories such as fruits, vegetables, utensils, cooking tools, electronic appliances, etc. Each group contains many variables with similar features. In PCA, such groups are called principal components (PCs): Fruits - PC1, Vegetables - PC2, Cooking Tools - PC3, Utensils - PC4, and so on.
  • The principal components produced by PCA are independent of each other; there is no correlation between these groups.
  • PCA requires the data to be in numeric format.
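As a quick reference, here is a minimal sketch of PCA with scikit-learn (the Iris dataset and the choice of two components are purely illustrative); the manual steps behind it are walked through below.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)           # 4 numeric features
X_std = StandardScaler().fit_transform(X)   # bring all features onto the same scale

pca = PCA(n_components=2)                   # keep the top 2 principal components
X_pca = pca.fit_transform(X_std)

print(X_pca.shape)                          # (150, 2)
print(pca.explained_variance_ratio_)        # share of variance captured by PC1 and PC2
```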

Steps to perform PCA

1. Standardize the data: 

The data should be standardized before performing PCA. Standardization is done by subtracting the mean and dividing by the standard deviation, which brings all variables onto the same scale.

Z = (value - mean) / standard deviation
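A minimal NumPy sketch of this step (the small two-column array is made up for illustration):

```python
import numpy as np

# Two columns on very different scales, e.g. height (cm) and weight (kg)
X = np.array([[170.0, 65.0],
              [160.0, 72.0],
              [180.0, 80.0]])

# Z = (value - mean) / standard deviation, applied column by column
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_std.mean(axis=0))  # approximately 0 for each column
print(X_std.std(axis=0))   # 1 for each column
```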

2. Calculate the covariance matrix:

For 3-dimensional data with variables X, Y, and Z, the covariance matrix is the 3×3 matrix:

Cov(X,X)  Cov(X,Y)  Cov(X,Z)
Cov(Y,X)  Cov(Y,Y)  Cov(Y,Z)
Cov(Z,X)  Cov(Z,Y)  Cov(Z,Z)

Covariance measures how two variables X and Y vary together:

• Cov(X,Y) = 0: X and Y are uncorrelated (no linear relationship between them)

• Cov(X,Y) > 0: X and Y move in the same direction

• Cov(X,Y) < 0: X and Y move in opposite directions
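Continuing the sketch in NumPy (with randomly generated 3-dimensional data for illustration), the covariance matrix of the standardized data is:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                  # illustrative 3-dimensional data
X_std = (X - X.mean(axis=0)) / X.std(axis=0)   # step 1: standardize

# Step 2: covariance matrix (rowvar=False treats columns as variables)
cov_matrix = np.cov(X_std, rowvar=False)

print(cov_matrix.shape)          # (3, 3)
print(np.round(cov_matrix, 2))   # symmetric; diagonal is close to 1 after standardization
```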

3. Find the eigenvectors of the covariance matrix:

Eigenvectors give the directions of the new axes. Each eigenvector corresponds to an eigenvalue, whose magnitude indicates how much of the data's variability is explained along that eigenvector's direction. In other words, the eigenvalues tell us how much variance each component explains. The eigenvector with the highest eigenvalue is therefore the first principal component. Each pair satisfies:

Covariance Matrix × eigenvector = eigenvalue × eigenvector
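A minimal NumPy sketch of this step (again with illustrative random data); np.linalg.eigh handles the symmetric covariance matrix, and the results are sorted by descending eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
cov_matrix = np.cov(X_std, rowvar=False)

# Step 3: eigen-decomposition of the symmetric covariance matrix
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)

# Sort by descending eigenvalue; the first column is then PC1
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Check: covariance matrix * eigenvector = eigenvalue * eigenvector
v, lam = eigenvectors[:, 0], eigenvalues[0]
print(np.allclose(cov_matrix @ v, lam * v))   # True
```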

4. Translate the data to be in terms of components:

The second principal component is the direction with the maximum variation left in the data that is orthogonal to the first PC, and so on for further components. In general, only a few directions manage to capture most of the variability in the data. Finally, the data is projected onto these components, and we can check how much variance each PC explains for the dataset.
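Putting the steps together, a minimal NumPy sketch of projecting the data onto the top components and checking how much variance each PC explains (the random data and the choice of two components are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

cov_matrix = np.cov(X_std, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Step 4: project the standardized data onto the top k principal components
k = 2
X_projected = X_std @ eigenvectors[:, :k]
print(X_projected.shape)                     # (100, 2)

# Proportion of variance explained by each principal component
explained_ratio = eigenvalues / eigenvalues.sum()
print(np.round(explained_ratio, 3))
```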

Conclusion:

In this article, we learned the importance of dimensionality reduction, how to do feature selection and feature extraction, and various methods of performing dimensionality reduction.

Written by: Mamta Wagh

Reviewed by: Shivani Yadav

