Linear Regression | The First Step Towards Prediction! - Pianalytix

Do you want to know what Regression is?

Let’s understand it together. Regression is the construction of a model whose goal is to predict a dependent attribute from its relationship with one or more independent attributes. The value of the dependent attribute is a real, continuous number.

Now let’s come to the linear regression.

Linear regression is one of the most basic machine learning models: we train the model to find the relationship between the input variable and the target variable.

As the name suggests, linear regression assumes a linear relationship between the two variables: if we plot the input variable on the x-axis and the target variable on the y-axis, the points fall roughly along a straight line.

A simple linear regression is expressed mathematically as

y = b0 + b * x

You might recognize this formula from school: it is the equation of a straight line. Now let’s go through these variables and coefficients one by one.

The first variable is y, the dependent variable. It is what we are trying to figure out: for instance, how a person’s salary changes with their years of experience, or what grade a student gets depending on how much time they put into studying.

The next variable is the independent variable x. In simple linear regression there is only one independent variable, and we assume that it causes the dependent variable to change. Sometimes the independent variable is not a direct cause, but there may still be an implied association between the two, and it is that association we need to figure out.

Next up is b, the coefficient of x. It is the slope of the line: it tells us how much the output variable y changes when x increases by one unit.

And now the last one, b0, the constant. It is the intercept: the value of y when x is 0, which is the height at which the line crosses the y-axis.
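To make the roles of b0 and b concrete, here is a minimal sketch that evaluates the line for a few inputs. The coefficient values and the function name `predict_salary` are made up for illustration; they are not fitted from any data in this article.

```python
# Hypothetical coefficients, chosen only for illustration (not fitted values).
B0 = 25000.0  # intercept: predicted y when x is 0
B = 9000.0    # slope: change in y for each one-unit increase in x


def predict_salary(years_of_experience, b0=B0, b=B):
    """Evaluate the straight line y = b0 + b * x."""
    return b0 + b * years_of_experience


# At x = 0 the prediction equals b0; each extra year adds b.
for x in (0, 1, 2):
    print(x, predict_salary(x))
```

Changing `B` tilts the line, while changing `B0` shifts the whole line up or down without changing its slope.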

So now, let’s understand linear regression by taking an example.

Let’s suppose you are a data scientist and a company’s HR manager calls you. He gives you some data and asks you to build a model that will help him during salary negotiations with new recruits. The dataset contains the years of experience and the salary of the employees working in the company. Based on this data you need to build a model that predicts the salary of a new employee.

YEARS OF EXPERIENCE	SALARY
1.1	39343
1.3	46205
1.5	37731
2	43525
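The four rows shown above can be placed into arrays in the shape scikit-learn expects. This is a sketch using only those four rows; the names `X` and `y` follow the code used later in the article, and how the full dataset is actually loaded is not shown in the source.

```python
import numpy as np

# The four rows from the table above: (years of experience, salary).
rows = np.array([
    [1.1, 39343],
    [1.3, 46205],
    [1.5, 37731],
    [2.0, 43525],
])

# scikit-learn expects the features as a 2-D array (n_samples, n_features)
# and the target as a 1-D array (n_samples,).
X = rows[:, :1]   # slicing with :1 keeps the column dimension: shape (4, 1)
y = rows[:, 1]    # shape (4,)

print(X.shape, y.shape)
```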

The raw data alone does not let us predict anything. We first split it into a training set and a test set, build the model on the training set, and then verify it by predicting the values in the test set.

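from sklearn.model_selection import train_test_split
# X: 2-D array of years of experience, y: 1-D array of salaries (assumed already loaded)
# Hold out one third of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)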
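To see what the split produces, the sketch below runs the same call on placeholder data. The 30 rows are synthetic stand-ins (the article's full dataset is not reproduced here); only the resulting sizes matter.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 30 synthetic rows, used only to show the split sizes.
X = np.arange(30, dtype=float).reshape(-1, 1)
y = np.arange(30, dtype=float)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=0
)

# One third of the rows are held out for testing; the rest are used for training.
print(X_train.shape, X_test.shape)
```

The rows are shuffled before splitting, and each input stays paired with its own target value in both sets.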

This verification checks whether the model has learned well enough to predict the target, salary in this case, for inputs it has not seen before.

The linear regression model is trained using the LinearRegression class, which we import from the linear_model module of the sklearn library.

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
# Fit the line y = b0 + b * x to the training data by ordinary least squares
regressor.fit(X_train, y_train)
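After fitting, the learned b and b0 are available as `coef_` and `intercept_`. The sketch below fits the model on just the four rows from the table, which is far too little data for a real model but enough to show that the result matches the textbook least-squares formulas. It then predicts a salary for a hypothetical recruit with 1.8 years of experience.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Only the four rows shown in the table (illustration, not the full dataset).
X = np.array([[1.1], [1.3], [1.5], [2.0]])
y = np.array([39343.0, 46205.0, 37731.0, 43525.0])

regressor = LinearRegression()
regressor.fit(X, y)

b = regressor.coef_[0]       # slope
b0 = regressor.intercept_    # intercept

# Closed-form least-squares estimates for a single input variable.
x = X[:, 0]
b_manual = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0_manual = y.mean() - b_manual * x.mean()

# Hypothetical new recruit with 1.8 years of experience.
new_salary = regressor.predict(np.array([[1.8]]))[0]

print(b, b0, new_salary)
```

Note that `predict` expects a 2-D array even for a single value, which is why the input is written as `[[1.8]]`.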

If we visualize the data from the training set it will look something like this.

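import matplotlib.pyplot as plt

# Red dots: actual salaries in the training set
plt.scatter(X_train, y_train, color='red')
# Blue line: salaries predicted by the fitted model
plt.plot(X_train, regressor.predict(X_train), color='blue')
plt.title('Salary vs Experience (Training set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()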

In the graph above, the red dots are the real salaries that the employees are getting, and the blue straight line shows the salaries predicted by the model we have created.

Next we need to verify whether the model works correctly. For this we visualize the test set: the real salaries of the test set employees, together with the salaries the model predicts.

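# Red dots: actual salaries in the test set
plt.scatter(X_test, y_test, color='red')
# The regression line is the one fitted on the training set, so it is drawn from X_train
plt.plot(X_train, regressor.predict(X_train), color='blue')
plt.title('Salary vs Experience (Test set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()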

The red dots in the figure above are the real salaries in the test set, and the blue line passing through them shows the salaries predicted by the model. The dots lie close to the line, which suggests that the model predicts well for employees it did not see during training. We can now present the model to HR, who can use it as a reference point during salary negotiations.
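A visual check can be backed up with numbers. The sketch below computes R² and the mean absolute error on the test set using scikit-learn's `r2_score` and `mean_absolute_error`. The data is synthetic, generated from an assumed line plus noise, because the article's full dataset is not reproduced here; the scores it prints say nothing about the real salary data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: an assumed linear trend plus random noise.
rng = np.random.default_rng(0)
X = rng.uniform(1, 10, size=(30, 1))
y = 25000 + 9000 * X[:, 0] + rng.normal(0, 3000, size=30)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=0
)

regressor = LinearRegression()
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)

# R^2 close to 1 means the line explains most of the variation in the test salaries;
# the mean absolute error is the average size of a miss, in salary units.
r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
print(f"R^2: {r2:.3f}, MAE: {mae:.0f}")
```

Scoring on the test set rather than the training set matters here, since the goal is to predict salaries for people the model has not seen.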

With this we have come to the end of this article. I hope that after going through it you have a feel for linear regression and for how you can implement it in your own machine learning projects.

Article by: Adil Hussain
