Regression Analysis

Regression analysis is the statistical process of predicting the numeric value of one variable from the given values of other variables.

The word “regression” has its origin in the work of Sir Francis Galton (1822-1911), who studied heredity first in sweet peas and later in human stature. He showed that the adult offspring of either short or tall parents tend to revert to the average height of the general population. At first he used the word “reversion” for this phenomenon, and later “regression”.

In the statistical sense, the variable whose value is to be predicted in a regression problem is treated as random, and it is predicted from the stated values of the other variables, which are assumed to be non-random. The random variable is termed the dependent or response variable, and the other variables are termed independent or explanatory variables.

Types of Regression Analysis Models

  1. Linear Regression.
  2. Logistic Regression.
  3. Ridge Regression.
  4. Lasso Regression.
  5. Polynomial Regression.

1. Linear Regression

The objective of a linear regression model is to find a relationship between one or more features (independent variables) and a continuous target variable (dependent variable) that can be represented by a line. The relationship obtained can then be used to predict future values.

When there is only one feature, it is referred to as Univariate Linear Regression or Simple Linear Regression.

If there are multiple features, it is referred to as Multiple Linear Regression.

2. Logistic Regression 

Regression analysis always requires numeric data. When attributes are categorical, they have to be converted to numeric values before regression analysis can be applied. Logistic regression in its basic form can be conducted only when the dependent variable is dichotomous (binary).

Logistic regression is a predictive analysis method used to model the relationship between two or more variables by predicting the probability of each outcome of the dependent variable. It can also be used to describe data.

Logistic regression can also explain the relationship between one binary dependent variable and one or more other variables in the data set.
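The idea of predicting a binary outcome can be sketched in a few lines of plain Python. This is only a minimal illustration, assuming a single feature, a made-up data set (hours studied versus pass/fail), and plain gradient descent; the names `sigmoid` and `fit_logistic`, the learning rate, and the number of epochs are choices made for this sketch, not part of any library.

```python
import math


def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, lr=0.1, epochs=5000):
    """Fit P(y = 1 | x) = sigmoid(w * x + b) by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        # Gradient of the average log loss with respect to w and b.
        grad_w = sum((sigmoid(w * xi + b) - yi) * xi for xi, yi in zip(x, y)) / n
        grad_b = sum(sigmoid(w * xi + b) - yi for xi, yi in zip(x, y)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Illustrative data (made up for this sketch): hours studied, and pass (1) or fail (0).
hours = [1, 2, 3, 4, 5, 6, 7, 8]
passed = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(hours, passed)
p_low = sigmoid(w * 2 + b)   # predicted probability of passing after 2 hours
p_high = sigmoid(w * 7 + b)  # predicted probability of passing after 7 hours
print(f"P(pass | 2 h) = {p_low:.2f}, P(pass | 7 h) = {p_high:.2f}")
```

The model outputs a probability rather than a class; a common convention is to predict the positive class when that probability exceeds 0.5.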

3. Ridge Regression 

Ridge Regression is a type of regression in ML that is usually used when there is a high correlation between the independent variables. When the data are multicollinear, the least squares estimates remain unbiased, but their variances become large, so the estimates can be far from the true values.

To resolve this problem, Ridge Regression deliberately accepts a small amount of bias: a penalty term (a bias matrix, usually a multiple of the identity matrix) is introduced in the equation, which shrinks the coefficients and reduces their variance. This makes Ridge Regression a powerful method whose model is less susceptible to overfitting.
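The effect of the penalty can be seen on a small example. The sketch below assumes two nearly identical features and made-up data, centers everything so the intercept drops out, and solves the penalized normal equations (X^T X + lam * I) w = X^T y directly with Cramer's rule. The function names and the value `lam=1.0` are assumptions made for this illustration.

```python
def center(values):
    """Subtract the mean so the intercept can be left out of the fit."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def ridge_two_features(x1, x2, y, lam):
    """Solve (X^T X + lam * I) w = X^T y for two centered features."""
    x1, x2, y = center(x1), center(x2), center(y)
    a = sum(v * v for v in x1) + lam        # (X^T X)[0][0] plus the penalty
    b = sum(u * v for u, v in zip(x1, x2))  # (X^T X)[0][1], same as [1][0]
    d = sum(v * v for v in x2) + lam        # (X^T X)[1][1] plus the penalty
    r1 = sum(u * v for u, v in zip(x1, y))  # (X^T y)[0]
    r2 = sum(u * v for u, v in zip(x2, y))  # (X^T y)[1]
    det = a * d - b * b
    # Cramer's rule for the 2x2 system.
    w1 = (r1 * d - b * r2) / det
    w2 = (a * r2 - b * r1) / det
    return w1, w2


# Illustrative data (made up): x2 is almost a copy of x1, and y is roughly 2 * x1.
x1 = [1, 2, 3, 4, 5]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1]
y = [2.0, 4.1, 5.9, 8.0, 10.0]

ols = ridge_two_features(x1, x2, y, lam=0.0)    # lam = 0 is ordinary least squares
ridge = ridge_two_features(x1, x2, y, lam=1.0)  # lam > 0 adds the ridge penalty
print("least squares:", [round(w, 3) for w in ols])
print("ridge:        ", [round(w, 3) for w in ridge])
```

On this data the least squares fit gives the two correlated features very different coefficients, one of them negative, while the ridge fit spreads the weight almost evenly between them and has a smaller overall coefficient size.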

4. Lasso Regression

Lasso Regression is a type of regression technique that performs regularization along with feature selection. It penalizes the absolute size of the regression coefficients. As a result, in Lasso Regression some coefficient values become exactly zero, which does not occur in Ridge Regression, where coefficients shrink toward zero without reaching it.

Because of this, Lasso Regression acts as a feature selection method: it selects a subset of features from the dataset to build the model. Only the required features are kept, and the coefficients of the others are set to zero. This helps avoid overfitting in the model. When the independent variables are highly collinear, Lasso Regression tends to pick only one of them and shrinks the coefficients of the others to zero.
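The way Lasso sets a coefficient to exactly zero can be sketched with coordinate descent and soft thresholding. This is a minimal illustration, assuming centered data, the objective 0.5 * sum(residual^2) + lam * sum(|w_j|), and made-up data with two nearly identical features; the function names and `lam=1.0` are choices for this sketch only.

```python
def center(values):
    """Subtract the mean so the intercept can be left out of the fit."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam, returning exactly 0 when |rho| <= lam."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(columns, y, lam, sweeps=100):
    """Minimise 0.5 * sum(residual^2) + lam * sum(|w_j|) one coefficient at a time."""
    cols = [center(c) for c in columns]
    y = center(y)
    w = [0.0] * len(cols)
    for _ in range(sweeps):
        for j, col in enumerate(cols):
            # Correlation of feature j with what the other features leave unexplained.
            rho = 0.0
            for i in range(len(y)):
                others = sum(w[k] * cols[k][i] for k in range(len(cols)) if k != j)
                rho += col[i] * (y[i] - others)
            w[j] = soft_threshold(rho, lam) / sum(v * v for v in col)
    return w


# Illustrative data (made up): x2 is almost a copy of x1, and y is roughly 2 * x1.
x1 = [1, 2, 3, 4, 5]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1]
y = [2.0, 4.1, 5.9, 8.0, 10.0]

w = lasso_coordinate_descent([x1, x2], y, lam=1.0)
print([round(c, 3) for c in w])
```

On this data the fit keeps the first feature and sets the coefficient of the nearly duplicate second feature to exactly zero, which is the selection behaviour described above.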

5. Polynomial Regression

In the polynomial regression technique, the relation between the dependent and the independent variable is represented by an nth-degree polynomial. The least squares method is used to find the best-fit line, which is actually not a straight line but a curve. The shape of the curve depends on the degree of the polynomial.
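A least squares polynomial fit can be sketched without any external library by building the normal equations and solving them with Gaussian elimination. The sketch below assumes a small, made-up data set generated from a known quadratic, so the recovered coefficients can be checked; `solve` and `fit_polynomial` are names chosen for this illustration. For high degrees or large data a numerical library would be a more robust choice.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations apply to both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution, from the last unknown to the first.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_polynomial(x, y, degree):
    """Return least squares coefficients [c0, c1, ...] of c0 + c1*x + c2*x^2 + ..."""
    k = degree + 1
    # Normal equations (V^T V) c = V^T y, where V[i][j] = x_i ** j.
    vtv = [[sum(xi ** (r + c) for xi in x) for c in range(k)] for r in range(k)]
    vty = [sum((xi ** r) * yi for xi, yi in zip(x, y)) for r in range(k)]
    return solve(vtv, vty)


# Illustrative data (made up): points lying exactly on y = 1 + 2x + 3x^2.
x = [-2, -1, 0, 1, 2]
y = [1 + 2 * xi + 3 * xi ** 2 for xi in x]

coeffs = fit_polynomial(x, y, degree=2)
print([round(c, 6) for c in coeffs])  # -> [1.0, 2.0, 3.0]
```

Setting `degree=1` in the same function gives an ordinary straight-line fit, which shows that polynomial regression is still linear in its coefficients.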

Implementation

Simple Linear Regression can be implemented by finding the relation between the data present in the x and y arrays and then plotting the line obtained.
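A minimal sketch of such an implementation is given here. It is not the original listing: the x and y values are made up for illustration, the function names are chosen for this sketch, and the plotting helper assumes matplotlib is installed (it is imported only when the helper is called, so the fit itself needs nothing beyond standard Python).

```python
def fit_line(x, y):
    """Return (slope, intercept) of the least squares line through the points."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def plot_fit(x, y, slope, intercept):
    """Scatter the data and draw the fitted line (requires matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily so fitting works without it

    plt.scatter(x, y, label="data")
    plt.plot(x, [slope * xi + intercept for xi in x], color="red", label="fitted line")
    plt.xlabel("x")
    plt.ylabel("y")
    plt.legend()
    plt.show()


# Illustrative data (made up for this sketch): y is roughly 2x + 1.
x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(x, y)
print(f"y = {slope:.2f} * x + {intercept:.2f}")
# Call plot_fit(x, y, slope, intercept) to draw the line over the points.
```

Once the slope and intercept are known, a new value is predicted as `slope * new_x + intercept`.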

Application of Regression Analysis

  1. Financial markets – Regression techniques are used to determine the total home loan amount that can be offered to an applicant based on the applicant’s income, expenditure, assets and financial commitments.
  2. Medical Science – Estimation of the concentration of foreign bodies in the human body can be made from other attributes, such as WBC count in the blood. Any attribute which is not measured in the body but related to other measurable attributes can be estimated.
  3. Retail Industry – The number of products to be stored or ordered for a given period of time can be estimated from demand, consumption rate and/or sales.
  4. Environment – Algal bloom in water bodies can be estimated from temperature, rainfall and other data on inflows into the water bodies.
  5. Social Science – The crime rate of a locality can be estimated based on the unemployment rate, average family size, income and other factors.

Summary

Regression analysis is used to build a model suited to the kind of data available. It is currently used in many fields, such as house price prediction, and its use is likely to increase further in the future.

Written By: Prateek Sharma

Reviewed By: Savya Sachi
