Finding The Optimal Curve Fit For Regression Analysis

What is curve fitting?

Curve fitting is an optimization technique used to find the optimal set of parameters for mapping a function of our choosing onto the plot defined by the data points. This mapping function is also known as the basis function.

Why curve fitting?

Before starting any machine learning algorithm, we want to get an intuitive idea of how our data points are related to each other. It is often easier to express these dependencies graphically. We often take pairwise features, plot them on a graph, and get a visual feel from the scatter plots. These scatter plots are essential in determining which type of algorithm we should use for our machine learning problem.

The thing we need to understand here is that curve fitting is one of the techniques for mapping a function onto the scattered data available, and it becomes fairly easy when the plots are 2-dimensional. The fitted mapping function gives us the optimal parameters required for plotting the curve. The curve with optimal parameters minimizes the errors, and hence the loss function, which helps in building a better model with higher accuracy.

Understanding curve fitting

It is easiest to think of curve fitting in the 2-dimensional plane, such as a graph. Consider a situation where we have collected input and output data. We can plot the input and output data points, but we are unaware of the function that maps them. Curve fitting involves first defining the functional form of the mapping function and then searching for the parameters that give the minimum error. The error is calculated by passing the observations from our domain through the mapping function and comparing the results with the observed outputs.

Once the curve is fit, we can interpolate or extrapolate new points in the respective domain, running a sequence of inputs through it to calculate a sequence of outputs. In a nutshell, we have to find the curve that correctly approximates the distribution of the data points.

We can start with the basic straight-line equation:

 y=ax+b

where a represents the slope, b the intercept, y the output data, and x the input data. The notion of curve fitting is not limited to two variables; we can have an equation like y=ax1+bx2+c.

The equation need not be a straight line either. An equation such as y=ax^2+bx+c is a quadratic equation, which falls under the category of polynomial regression.

We can also map other mathematical functions such as sine, cosine, and more.

y=a*sin(bx)+c is one such example.

Implementation in Python

The SciPy library provides the curve_fit function for fitting curves via non-linear least squares. The function takes the input and output data as arguments, along with our objective function, or what we call the mapping function.

To plot the fitted curve, the mapping function is then called with the input data and the optimized parameters returned by curve_fit.

The steps include:

  1. Load the input and output values into x and y respectively.
  2. Define the objective function.
  3. Apply the curve fit function and get the optimal parameters.
  4. Unpack the arguments, take new input values, and try to plot the curve based on the optimal parameters.
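
As a minimal sketch of these four steps, using synthetic, roughly linear data purely for illustration (the real dataset comes next):

import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

# 1. Input and output values (synthetic data for illustration)
x = np.linspace(0, 10, 50)
y = 2.5 * x + 1.0 + np.random.normal(scale=2.0, size=x.size)

# 2. Objective (mapping) function: a straight line y = a*x + b
def objective(x, a, b):
    return a * x + b

# 3. Fit the curve; popt holds the optimal parameters
popt, pcov = curve_fit(objective, x, y)

# 4. Unpack the parameters and plot the fitted line over the data
a, b = popt
x_line = np.linspace(x.min(), x.max(), 100)
plt.scatter(x, y)
plt.plot(x_line, objective(x_line, a, b), color='red')
plt.show()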

Let us try to apply this knowledge to the Longley Economic Dataset.

We will be using population and employment as the input and the output variables.

The scatter plot shows how employment varies with the population. We can clearly see that as the population increases, employment increases as well.
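
A sketch of loading the data and producing the scatter plot is shown below. It assumes the dataset is available through the statsmodels package, where population and total employment appear as the POP and TOTEMP columns; if you have the data as a CSV instead, read the corresponding columns with pandas.

import matplotlib.pyplot as plt
from statsmodels.datasets import longley

# Load the Longley Economic Dataset as a pandas DataFrame
data = longley.load_pandas().data

# Population as the input variable, total employment as the output variable
x = data['POP'].values
y = data['TOTEMP'].values

# Scatter plot to inspect how employment varies with population
plt.scatter(x, y)
plt.xlabel('Population')
plt.ylabel('Employment')
plt.show()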


Let us define our objective function; in this case, it will be the straight-line equation.

Based on this objective function, we use the curve_fit function from the SciPy library to map our function onto the scatter points.
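
A sketch of this step, continuing with the x and y arrays loaded above (the variable names are illustrative):

import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

# Objective (mapping) function: straight line y = a*x + b
def objective(x, a, b):
    return a * x + b

# Fit the line to population (x) and employment (y)
popt, _ = curve_fit(objective, x, y)
a, b = popt
print('y = %.5f * x + %.5f' % (a, b))

# Plot the scatter points and the fitted line
x_line = np.linspace(min(x), max(x), 100)
plt.scatter(x, y)
plt.plot(x_line, objective(x_line, a, b), color='red')
plt.show()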

Let us change our objective function and try to map a quadratic equation.

Make appropriate changes while unpacking the parameters.

The mapping function, along with its equation, can be visualized as follows.
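
A sketch of the quadratic variant, reusing the same x and y arrays and imports as above; note the extra parameter when unpacking:

# Objective function: quadratic y = a*x^2 + b*x + c
def objective(x, a, b, c):
    return a * x**2 + b * x + c

popt, _ = curve_fit(objective, x, y)
a, b, c = popt  # one more parameter to unpack than in the linear case

x_line = np.linspace(min(x), max(x), 100)
plt.scatter(x, y)
plt.plot(x_line, objective(x_line, a, b, c), color='red')
plt.show()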

Let us change the objective function to map a 5-degree polynomial.

Based on this objective function, we get a 5-degree polynomial fit, which is depicted below.
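
A sketch of the 5-degree polynomial objective, again reusing the earlier setup. With unscaled inputs, high-degree polynomial fits can be numerically sensitive, so the optimizer may warn about the covariance estimate:

# Objective function: 5th-degree polynomial
def objective(x, a, b, c, d, e, f):
    return a * x**5 + b * x**4 + c * x**3 + d * x**2 + e * x + f

popt, _ = curve_fit(objective, x, y)
a, b, c, d, e, f = popt

x_line = np.linspace(min(x), max(x), 200)
plt.scatter(x, y)
plt.plot(x_line, objective(x_line, a, b, c, d, e, f), color='red')
plt.show()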

Finally, let us try to map an arbitrary function of sine and cosine by changing the objective function.

The mapped sine function is visualized as follows.
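
A sketch of a sine-based objective, reusing the earlier setup. Non-linear trigonometric fits are sensitive to the starting guess, so the p0 and maxfev arguments used here are illustrative and may need tuning for your data:

# Objective function combining a sine term with a linear trend
def objective(x, a, b, c, d):
    return a * np.sin(b * x) + c * x + d

popt, _ = curve_fit(objective, x, y, p0=[1.0, 0.1, 1.0, 1.0], maxfev=10000)
a, b, c, d = popt

x_line = np.linspace(min(x), max(x), 200)
plt.scatter(x, y)
plt.plot(x_line, objective(x_line, a, b, c, d), color='red')
plt.show()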

Conclusion

Among the various mapping functions we looked at, to get the best fit for the data we need to define each function explicitly and check for which case the loss is minimum. If we did not follow the curve-fitting approach, we would have to fit our model with several algorithms and see which one of them gives the optimal output with the highest accuracy.

Hope you had a good time learning! 

Written By: Swagat Sourav

Reviewed By: Vikas Bhardwaj

