Python Function for Curve Fitting
Introduction
For a given set of data, curve fitting uses optimisation techniques to determine the best possible values for the function’s parameters.
Whereas supervised learning can learn the mapping from examples alone, curve fitting requires us to specify the form of the function that maps the inputs to the outputs.
The mapping function, also called the basis function, may have any form we choose: a straight line (as in linear regression), a curved line (as in polynomial regression), or many others. Because optimisation determines the specific optimal arguments of the function, this mapping function gives us the flexibility and control needed to define the shape of the curve.
In this lesson, we’ll learn what curve fitting is and how to do it in Python.
By the conclusion of this lesson, you should understand the following:
- Curve fitting involves finding the optimal arguments to a function that maps examples of inputs to outputs.
- The SciPy Python library offers an Application Programming Interface (API) for fitting a curve to a dataset.
- The curve fitting API in SciPy can be used to fit a variety of different curves to a set of observations.
Understanding Curve Fitting
Curve fitting, as we’ve already established, is an optimisation problem that lets us locate a line that’s suitable for a given collection of observations.
The process is easier to grasp if we view curve fitting in two dimensions, as a graph.
Let’s suppose we’ve collected some samples of inputs and outputs from the problem domain.
The x-axis of the graph represents the independent variable, the input to the function. The y-axis represents the function’s output, the dependent variable. Even if we don’t know the exact form of the function that maps the sample inputs to the sample outputs, we can still get a good approximation by using a sufficiently general form of function.
The curve-fitting process involves the following stages:
- First, define the functional form of the mapping function (also known as the objective function or the basis function).
- Second, search for the arguments to the function that result in the minimum error.
The error is estimated by feeding the observations from the domain into a candidate objective function with a candidate set of arguments, and then comparing the calculated outputs against the observed outputs.
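To make this error estimate concrete, here is a minimal sketch; the observations and the candidate() helper are made up purely for illustration. It computes the sum of squared errors for one candidate set of arguments. Syntax:
from numpy import array

# candidate objective function: a straight line
def candidate(x, a, b):
    return a * x + b

# hypothetical observations, made up for illustration
x = array([0.0, 1.0, 2.0, 3.0])
y = array([1.0, 2.9, 5.1, 7.0])

# predictions for one candidate set of arguments (a = 2, b = 1)
predictions = candidate(x, 2.0, 1.0)

# sum of squared errors between the calculated and observed outputs
error = ((y - predictions) ** 2).sum()
print(error)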
When the fitting process is complete, the basis function may be used to interpolate or extrapolate to new points in the domain. It is common to run a sequence of input values through the basis function to estimate a sequence of outputs, and then to create a line plot that shows the input/output scatter and how the line fits the data points.
The shape of the basis function is the key consideration in curve fitting. A straight line between inputs and outputs is defined by the following formula: y = a × x + b
Where y represents the predicted result, x represents the input, and a and b represent the parameters of the basis function determined by an optimisation procedure.
Due to its nature as a weighted sum of inputs, this equation is classified as a Linear Equation.
They are denoted as coefficients in a linear regression model but are referred to as weights in a neural network.
This equation generalises to any number of dimensions, so the concept of curve fitting is not limited to two dimensions (one input and one output); the mapping function may instead take a wide variety of inputs.
The following is an example of a line objective function with two input variables: y = a1 × x1 + a2 × x2 + b
The equation need not have a straight-line appearance.
Adding exponents allows us to create curves in the objective function. For instance, we can add a term that is a squared version of the input, weighted by another parameter: y = a × x + b × x² + c
This equation, in which the squared component represents a polynomial of degree 2, is known as polynomial regression.
Since such equations are linear in their arguments, we can, with the aid of linear algebra, fit them using linear least squares and estimate the optimal argument values analytically.
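As an aside, here is a minimal sketch of such an analytic fit using NumPy’s polyfit() function, which solves the linear least squares problem directly; the sample data is made up for illustration. Syntax:
from numpy import array, polyfit

# made-up sample data for illustration
x = array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = array([1.1, 2.3, 5.2, 10.1, 17.3, 26.2])

# polyfit solves the linear least squares problem analytically and
# returns the coefficients of y = a * x**2 + b * x + c (highest power first)
a, b, c = polyfit(x, y, 2)
print(a, b, c)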
Some of us may want to add sine, cosine, tangent, or other functions to the mix. Adding the terms together, each given a weight in the form of an argument, yields the following formula: y = a × sin(x) + b × cos(x) + c
We lose the ability to estimate the arguments analytically when we include arbitrary mathematical functions in the objective function, which necessitates an iterative optimisation algorithm.
Because the mapping function is now non-linear and the problem is no longer convex, this is classified as a non-linear least squares problem.
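To preview what’s coming, here is a minimal sketch of fitting a sine-based objective that is non-linear in its parameters with SciPy’s curve_fit(); the noisy observations are synthetic, generated only for illustration. Syntax:
from numpy import linspace, sin
from numpy.random import normal
from scipy.optimize import curve_fit

# non-linear objective function with a sine term
def objective(x, a, b, c):
    return a * sin(b * x) + c

# synthetic noisy observations for illustration
x = linspace(0, 10, 50)
y = 2.0 * sin(1.5 * x) + 0.5 + normal(0, 0.2, x.size)

# iterative non-linear least squares; p0 provides initial guesses
params, _ = curve_fit(objective, x, y, p0=[1.0, 1.0, 0.0])
print(params)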
Since we now know what curve fitting is, let’s move on to learning how it’s done in practice in Python.
Performing Curve Fitting in Python
Python may be used to perform curve fitting on a dataset. SciPy is an open-source library for the Python programming language, and its curve_fit() function performs non-linear least squares curve fitting.
The objective function and the input and output data are the only required arguments of the curve_fit() function.
The objective function must take the input data samples as its first argument, followed by some number of parameters; non-linear least squares optimisation is applied to these remaining parameters, which act as coefficient or weight constants.
Let’s look at a sample illustration to see how this works in practice.
Let’s suppose we have loaded a few observations from the domain, with x input variables and y output variables. Syntax:
...
# loading the input variables from a file
values_x = ...
values_y = ...
Now we need to create a function in Python that takes the inputs and the arguments and computes the outputs; this will serve as the objective function for fitting a line to the data.
Let’s suppose the function is a straight line, which would have the form given below. Note that a straight line needs only the two parameters a and b. Syntax:
# defining a mapping function for a straight line
def mapping(x, a, b):
    return a * x + b
The created mapping function may then be used in conjunction with the curve_fit() function to fit a straight line to the dataset.
Calling curve_fit() yields the optimal values for the parameters of your objective function, such as the coefficient values. In addition to the estimated parameters, the function returns a covariance matrix, which may be disregarded for the time being. Syntax:
...
# calling the curve_fit() function
popt, _ = curve_fit(mapping, values_x, values_y)
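It is worth noting that curve_fit() also accepts an optional p0 argument with initial guesses for the parameters, which can help the iterative optimisation converge. Here is a sketch assuming the straight-line mapping() function defined above. Syntax:
...
# providing initial guesses for a and b via the optional p0 argument
popt, _ = curve_fit(mapping, values_x, values_y, p0=[1.0, 0.0])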
After a successful fit, we may evaluate the result for any arbitrary input by using the optimal arguments with the objective function mapping().
These inputs may include the domain examples we have previously collected, new values that interpolate between the observed values, or values outside the observed range, which are extrapolated. Syntax:
...
# defining new input values
new_x = ...
# unpacking the optimal arguments for the mapping function
a, b = popt
# using the optimal arguments to estimate new values
new_y = mapping(new_x, a, b)
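For instance, with hypothetical new inputs chosen beyond the observed range (the values below are made up for illustration), the fitted line extrapolates as follows. Syntax:
from numpy import array

# hypothetical inputs beyond the observed range (extrapolation)
new_x = array([20.0, 21.0, 22.0])
a, b = popt
new_y = mapping(new_x, a, b)
print(new_y)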
Now that we know how to use the API for curve fitting, let’s have a look at a real-world example.
Working Example for Curve Fitting in Python
Let’s get started by bringing in the required packages and libraries. Syntax:
# importing the required packages and libraries
from scipy.optimize import curve_fit
from numpy import array, exp
import matplotlib.pyplot as plt
Once we have successfully imported all of the necessary packages, we need some test data on which to actually perform the curve fitting. We will use the following definitions of the input data (x) and output data (y). Syntax:
# defining the variables
values_y = array([11, 10, 12, 14, 15, 15, 14, 13, 14, 11, 10, 11, 7, 9, 8, 6, 5])
values_x = array(range(len(values_y)))
We’ll next construct several mapping functions to use with the curve_fit() function, and later check for discrepancies in the fitting. The following set of equations will serve as our mapping functions:
- y = a × x² + b × x + c
- y = a × x³ + b × x + c
- y = a × x³ + b × x² + c
- y = a × exp(b × x) + c
The following syntax outlines the steps for doing so: Syntax:
# defining the objective functions
def mapping1(values_x, a, b, c):
    return a * values_x**2 + b * values_x + c

def mapping2(values_x, a, b, c):
    return a * values_x**3 + b * values_x + c

def mapping3(values_x, a, b, c):
    return a * values_x**3 + b * values_x**2 + c

def mapping4(values_x, a, b, c):
    return a * exp(b * values_x) + c
The curve_fit() function makes it easy to find a good fit for the given x, y, and mapping function. Calling curve_fit() returns the best possible arguments along with the covariance values it found. Syntax:
# using the curve_fit() function
args, covar = curve_fit(mapping1, values_x, values_y)
print("Arguments: ", args)
print("Co-Variance: ", covar)
Output:
Arguments:  [-0.08139835  0.8636481  11.1362229 ]
Co-Variance:  [[ 2.38376125e-04 -3.81401800e-03  9.53504499e-03]
 [-3.81401800e-03  6.55534344e-02 -1.88793892e-01]
 [ 9.53504499e-03 -1.88793892e-01  7.79966692e-01]]
We can see that the curve_fit() function searched for the best arguments and calculated the covariance matrix, and we then printed both of these figures for the user.
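Although we disregard the covariance matrix in this lesson, it is worth a brief sketch: per SciPy’s documented convention, the square roots of its diagonal entries give one-standard-deviation error estimates for the fitted parameters. Syntax:
from numpy import sqrt, diag

# one-standard-deviation errors on the fitted parameters
errors = sqrt(diag(covar))
print("Standard errors:", errors)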
We’ll now fit the data by passing each objective function, together with x and y, to the curve_fit() function and retrieving the parameter values for a, b, and c. We may ignore the covariance here since we are not using its values in any way. From there, we’ll use the estimated a, b, and c values for each function to compute the fitted y. Syntax:
# fitting each objective function and computing its fitted y values
args, _ = curve_fit(mapping1, values_x, values_y)
a, b, c = args
y_fit1 = a * values_x**2 + b * values_x + c

args, _ = curve_fit(mapping2, values_x, values_y)
a, b, c = args
y_fit2 = a * values_x**3 + b * values_x + c

args, _ = curve_fit(mapping3, values_x, values_y)
a, b, c = args
y_fit3 = a * values_x**3 + b * values_x**2 + c

args, _ = curve_fit(mapping4, values_x, values_y)
a, b, c = args
y_fit4 = a * exp(b * values_x) + c
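To check the discrepancies in the fitting numerically, here is a minimal sketch that computes the root-mean-square error of each fitted curve against the observations, reusing the y_fit1 to y_fit4 values from above. Syntax:
from numpy import sqrt, mean

# root-mean-square error of each fitted curve against the observations
for name, y_fit in [("x^2", y_fit1), ("x^3 + x", y_fit2),
                    ("x^3 + x^2", y_fit3), ("exp", y_fit4)]:
    rmse = sqrt(mean((values_y - y_fit) ** 2))
    print(name, "RMSE:", rmse)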
Next, we’ll make a graph to visually compare the fits. The appropriate syntax is as follows: Syntax:
# plotting the graph
plt.plot(values_x, values_y, 'bo', label="y - original")
plt.plot(values_x, y_fit1, label="y = a * x^2 + b * x + c")
plt.plot(values_x, y_fit2, label="y = a * x^3 + b * x + c")
plt.plot(values_x, y_fit3, label="y = a * x^3 + b * x^2 + c")
plt.plot(values_x, y_fit4, label="y = a * exp(b * x) + c")
plt.xlabel('x')
plt.ylabel('y')
plt.legend(loc = 'best', fancybox = True, shadow = True)
plt.grid(True)
plt.show()
This is the program’s resulting graph: a plot of the original data points (shown as blue dots) with the four fitted curves overlaid.