Maurício Pinheiro
This paper is the fourth in our series of AI-Talks.org tutorial texts. It is focused on non-linear curve fitting concepts that are important for machine learning. By utilizing this approach, we can gain insights into the training of neural networks, which can help us better understand and analyze data. Our series seeks to provide a comprehensive and accessible resource for anyone interested in the field of AI.
1. Introduction
Pattern recognition is an essential aspect of Machine Learning and Artificial Intelligence, enabling the interpretation of massive amounts of data to predict outcomes and make decisions. It involves the identification of patterns in Big Data that would be challenging or impossible for humans to detect. The process of pattern recognition in AI is vital for various applications such as image, speech, and natural language processing, self-driving cars, and many other domains.

To recognize patterns effectively, AI algorithms require the ability to learn from examples. Neural Networks (NN) have become a popular choice for pattern recognition due to their ability to learn and adapt from large amounts of data. In NNs, each layer of the network consists of a series of interconnected Artificial Neurons.
Each neuron processes its input by applying a set of weighted connections followed by a non-linear activation function, which transforms the input signals into an output signal. You can think of it as a composition of two functions applied in sequence: a linear one (the Weighted Sum) and a non-linear one (the Activation Function). The result is a number that is passed on to the neurons of the next layer.
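As a rough illustration of the two-step picture described above, the following sketch computes the output of a single artificial neuron. The sigmoid activation, the function name neuron, and all numeric values are assumptions chosen for the example; real networks may use other activation functions.

```python
import numpy as np

def neuron(inputs, weights, bias):
    """Single artificial neuron: weighted sum followed by a sigmoid activation."""
    z = np.dot(weights, inputs) + bias   # linear step: weighted sum plus bias
    return 1.0 / (1.0 + np.exp(-z))      # non-linear step: sigmoid activation

x = np.array([0.5, -1.0, 2.0])   # illustrative input signals
w = np.array([0.4, 0.3, -0.2])   # illustrative weights
b = 0.1                          # illustrative bias

output = neuron(x, w, b)
print(output)
```

The output is always a number between 0 and 1 for a sigmoid; it would become one of the inputs to the neurons of the next layer.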

NNs have proven to be particularly effective for complex pattern recognition tasks, such as Object Recognition in images, Speech Recognition, and Natural Language Processing. Besides learning and improving from experience, NNs can handle noisy, incomplete, or ambiguous data, and they can generalize well to new, unseen data, making them suitable for real-world applications.
Non-Linear Curve Fitting, on the other hand, is a robust method for identifying patterns in data by fitting a curve to a set of data points. The process begins by initializing the algorithm with a mathematical function and some parameters, and then adjusting these parameters to achieve the best fit to the input data. Once an optimized function with the resultant fit parameters is achieved within a pre-defined tolerance, it becomes possible to make predictions and identify trends within the data.
Here, we introduce one of the fundamental curve-fitting techniques: Least Squares fitting. It involves minimizing the sum of squared residuals between the observed data points and the values predicted by a mathematical function. The technique is commonly employed in diverse fields such as engineering, physics, economics, biology, and, of course, machine learning.
In the subsequent papers of this series, we will delve into more advanced topics related to machine learning, such as Stochastic Gradient Descent and Backpropagation. These techniques are essential for training Neural Networks as they involve finding the appropriate weights and biases for all artificial Neurons. This is the foundation of Deep Learning models that have transformed the field of Artificial Intelligence in recent years.
By understanding these techniques, you will gain a deeper insight into the underlying principles of machine learning and be able to develop more effective models for solving complex problems by yourself.
2. Curve Fitting
Curve Fitting can be seen as an example of pattern recognition, as it involves identifying patterns in the form of a mathematical function (Bishop, 2006), given a collection of data points. Curve fitting is widely used in many fields, including physics, biology, engineering, and finance, and is an essential tool for data analysis and prediction. It not only enables the identification of patterns in data but can also serve as a pedagogical tool for learning one of the fundamental aspects of machine learning algorithms, namely, how to train a NN to perform a specific task. This training process involves minimizing a Cost Function, which can be written as the average of the squared differences between the outputs the network produces for a minibatch of training examples (typically tens to hundreds of them) and their expected (labeled) outputs.
Minimizing the cost function for neural networks is conceptually similar to minimizing Least Square Errors in traditional curve fitting. However, the key difference lies in the number of parameters involved. While conventional curve fitting typically involves searching for a few optimal parameters, large neural networks can have many billions of parameters: GPT-3, for example, has 175 billion, and the parameter count of GPT-4 has not been publicly disclosed.
To achieve this, the Stochastic Gradient Descent method is commonly employed. By minimizing the average cost function, we adjust the weights and biases of the neural network, which are analogous to the fit parameters in curve fitting. The gradients needed for this adjustment are computed by an algorithm known as Backpropagation, which was developed independently by several researchers and popularized by, among others, Geoffrey Hinton, a widely recognized pioneer in the field of AI who is commonly referred to as the “godfather of AI.”
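To make the analogy with fit parameters concrete, here is a minimal sketch of minibatch Stochastic Gradient Descent applied to the simplest possible "network": a single linear unit y = w*x + b. The synthetic data, learning rate, batch size, and number of epochs are all assumptions picked for illustration, and the gradients are written out by hand rather than computed by Backpropagation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training data from y = 2x + 1 plus small noise (illustrative values)
x = rng.uniform(-1, 1, size=200)
y = 2.0 * x + 1.0 + rng.normal(scale=0.05, size=x.size)

w, b = 0.0, 0.0        # parameters to learn (analogous to fit parameters)
learning_rate = 0.1
batch_size = 20

for epoch in range(200):
    order = rng.permutation(x.size)             # reshuffle the data every epoch
    for start in range(0, x.size, batch_size):
        idx = order[start:start + batch_size]   # indices of one minibatch
        error = (w * x[idx] + b) - y[idx]       # prediction minus label
        # Gradients of the minibatch cost C = mean(error**2)
        grad_w = 2.0 * np.mean(error * x[idx])
        grad_b = 2.0 * np.mean(error)
        w -= learning_rate * grad_w             # step against the gradient
        b -= learning_rate * grad_b

print(f"w = {w:.3f}, b = {b:.3f}")
```

With these settings the learned w and b should end up close to the values 2 and 1 used to generate the data. A real neural network follows the same loop, only with far more parameters and with the gradients obtained through Backpropagation.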
Curve fitting has a wide range of applications in various fields, such as:
- Physics: modeling the behavior of physical systems and making predictions about the outcomes of experiments, such as the trajectory of a projectile or the oscillations of a pendulum
- Epidemiology: modeling the spread of diseases and predicting the future course of outbreaks
- Economics: modeling the relationship between variables such as prices, supply and demand, and economic growth
- Engineering: designing and optimizing systems and devices, such as machines and structures
- Medicine: modeling the effectiveness of treatments and predicting the progression of diseases
- Geology: modeling the behavior of geological systems and predicting the outcomes of natural disasters
- Environmental science: modeling the impact of human activity on the environment and predicting the consequences of environmental changes
These are just a few examples of the many applications of curve fitting in different fields.
3. Least Squares Fitting
There are many different types of curve fitting techniques, but one common approach is Least Squares (LSQ) fitting. LSQ fitting is a technique used to identify patterns in data by fitting a curve to a set of data points (Bishop, 2006). The objective is to find the curve that minimizes the error term, defined as the sum of the squared differences between the data points and the curve. To find the best-fitting curve, the parameters of the curve are iteratively adjusted and the error term is recalculated until its change falls below a specified tolerance.
To learn the algorithm, suppose you have a collection of N data points labeled by

$$(x_i, y_i), \quad i = 1, \dots, N.$$

Here y_i and x_i can be, for example, the number of COVID-19 cases and the number of days after a particular date, which can be used to fit the growth curve of the epidemic over time (see below).
Now, suppose you want to fit this data to an exponential growth curve that corresponds to the rapid spread of the virus at the beginning of the epidemic. We need a model (the Target Function), here exponential growth, which can be written as:

$$y(x) = a\,e^{b x} + c$$

where y represents the number of cases, x represents the time (in days), and a, b and c are constants to be determined by fitting the model to the data. To do that, one needs to minimize the sum of the squared differences between the actual data points and the values predicted by the curve (in this case y(x)). The formula for the sum of squared errors (S) is:

$$S = \sum_{i=1}^{N} \left[ y_i - y(x_i) \right]^2$$

where y_i is the actual value, y(x_i) is the predicted value, and Σ represents the sum over all the data points. The least squares method finds the coefficients of the curve or line that minimize S to within a specified tolerance.
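The sum of squared errors is straightforward to compute directly. The sketch below assumes the exponential target function y(x) = a*exp(b*x) + c (the same form used in the ChatGPT prompt later in this section); the function names and the noise-free test data are illustrative choices.

```python
import numpy as np

def model(x, a, b, c):
    """Exponential target function y(x) = a * exp(b * x) + c."""
    return a * np.exp(b * x) + c

def sum_squared_errors(params, x, y):
    """S = sum over i of (y_i - y(x_i))**2 for the given parameters."""
    a, b, c = params
    residuals = y - model(x, a, b, c)
    return np.sum(residuals**2)

# Illustrative data generated exactly from a=1, b=2, c=0 (no noise)
x = np.linspace(0, 1, 5)
y = model(x, 1.0, 2.0, 0.0)

s_true = sum_squared_errors((1.0, 2.0, 0.0), x, y)   # parameters that generated the data
s_off = sum_squared_errors((1.0, 1.5, 0.0), x, y)    # a deliberately worse guess
print(s_true, s_off)
```

Because the data here contain no noise, S is exactly zero at the generating parameters and positive for any other choice. Fitting amounts to searching parameter space for the smallest S.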
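To minimize the sum of squared differences (S), you need to calculate its partial derivatives with respect to the coefficients of the target function and set them to zero. For the exponential target function y(x) = a e^{bx} + c, this gives:

$$\frac{\partial S}{\partial a} = -2 \sum_{i=1}^{N} \left[ y_i - y(x_i) \right] e^{b x_i} = 0$$

$$\frac{\partial S}{\partial b} = -2 \sum_{i=1}^{N} \left[ y_i - y(x_i) \right] a\, x_i\, e^{b x_i} = 0$$

$$\frac{\partial S}{\partial c} = -2 \sum_{i=1}^{N} \left[ y_i - y(x_i) \right] = 0$$

Note that we used the chain rule for differentiation. We then end up with a system of equations in the unknown coefficients. For an LSQ Linear Regression with two coefficients and a small set of points, this system can be solved analytically, and even a scientific calculator can handle it. For non-linear curve fitting, however, the system generally has no closed-form solution, so we rely on numerical methods that minimize S iteratively, such as the Levenberg-Marquardt algorithm.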
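One way to check a chain-rule derivation like this is to compare the analytic partial derivatives with finite-difference estimates. The sketch below does so for the target function y(x) = a*exp(b*x) + c; the helper names, the step size h, and the trial parameters are assumptions made for the example.

```python
import numpy as np

def model(x, a, b, c):
    return a * np.exp(b * x) + c

def sse(params, x, y):
    """Sum of squared errors S for the given parameters."""
    a, b, c = params
    return np.sum((y - model(x, a, b, c))**2)

def sse_gradient(params, x, y):
    """Analytic partial derivatives of S with respect to a, b and c (chain rule)."""
    a, b, c = params
    r = y - model(x, a, b, c)      # residuals y_i - y(x_i)
    e = np.exp(b * x)
    dS_da = -2.0 * np.sum(r * e)
    dS_db = -2.0 * np.sum(r * a * x * e)
    dS_dc = -2.0 * np.sum(r)
    return np.array([dS_da, dS_db, dS_dc])

def numerical_gradient(params, x, y, h=1e-6):
    """Central finite differences, used only to cross-check the analytic result."""
    params = np.asarray(params, dtype=float)
    grad = np.zeros_like(params)
    for k in range(params.size):
        step = np.zeros_like(params)
        step[k] = h
        grad[k] = (sse(params + step, x, y) - sse(params - step, x, y)) / (2 * h)
    return grad

x = np.linspace(0, 1, 10)
y = model(x, 1.0, 2.0, 0.0)    # noise-free illustrative data
guess = (0.8, 1.5, 0.2)        # arbitrary trial parameters

analytic = sse_gradient(guess, x, y)
numeric = numerical_gradient(guess, x, y)
print(analytic)
print(numeric)
```

The two results should agree to several significant digits, and the gradient vanishes at the parameters that generated the data. Iterative solvers use exactly this kind of derivative information to decide how to update a, b and c.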
For the LSQ fit of non-linear functions there are several ways to perform this calculation, and most statistical and mathematical data analysis software, such as Microcal Origin, MATLAB, and SciDAVis, have this routine implemented. While we won’t go into the details of the algorithm, here’s an example of Python code (generated with a simple prompt in ChatGPT) that you can modify and improve to perform a least-squares fit for linear and non-linear functions.
ChatGPT Prompt> write a simple python code that generates 10 random (y,x) data that are around a simple exponential and a least square fit of these data with a y=a*exp(b*x)+c target function. the output should be a graphic with the original datapoints and the best fit, with best parameters and uncertainties writen in texts boxs in the graph with the R^2 numeric value
ChatGPT Output> Here is a sample code that generates 10 random data points around a simple exponential function, performs a least-squares fit using the curve_fit function from the scipy.optimize library, and produces a graph with the original data points and the best fit, along with the best-fit parameters, uncertainties, and R^2 value displayed in text boxes:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
# Define the target function
def exp_func(x, a, b, c):
    return a * np.exp(b * x) + c
# Generate some random data around the target function
np.random.seed(0)
xdata = np.linspace(0, 1, 10)
ydata = exp_func(xdata, 1, 2, 0) + np.random.normal(scale=0.1, size=len(xdata))
# Perform the least-squares fit
popt, pcov = curve_fit(exp_func, xdata, ydata)
# Extract the best-fit parameters and uncertainties
a, b, c = popt
da, db, dc = np.sqrt(np.diag(pcov))
# Calculate the R^2 value
residuals = ydata - exp_func(xdata, *popt)
ss_res = np.sum(residuals**2)
ss_tot = np.sum((ydata - np.mean(ydata))**2)
r_squared = 1 - (ss_res / ss_tot)
# Generate the plot
fig, ax = plt.subplots()
ax.scatter(xdata, ydata, label='Data')
xfit = np.linspace(0, 1, 100)
yfit = exp_func(xfit, *popt)
ax.plot(xfit, yfit, label='Best fit')
ax.text(0.1, 0.8, f'a = {a:.3f} +/- {da:.3f}', transform=ax.transAxes)
ax.text(0.1, 0.75, f'b = {b:.3f} +/- {db:.3f}', transform=ax.transAxes)
ax.text(0.1, 0.7, f'c = {c:.3f} +/- {dc:.3f}', transform=ax.transAxes)
ax.text(0.1, 0.65, f'R^2 = {r_squared:.3f}', transform=ax.transAxes)
ax.legend()
plt.show()
In this example, the target function is defined using the exp_func function, which takes the independent variable x and three parameters a, b, and c. The code generates 10 data points around the target function by adding random Gaussian noise to the function values. The curve_fit function is used to perform the least-squares fit; it returns the best-fit parameters together with their covariance matrix, and the uncertainties are taken as the square roots of the diagonal elements of that matrix. The R^2 value is also calculated using the residual sum of squares and the total sum of squares. Finally, the plot is generated using matplotlib, with the original data points, the best fit, and the text boxes displaying the best-fit parameters, uncertainties, and R^2 value.
The output, after running the code in JupyterLite, is shown below:

For the purpose of this paper, it is sufficient to understand the principles discussed above, as this approach is similar to the Gradient Descent used to train most NNs. As we will see in a future article, Gradient Descent is an optimization algorithm commonly used to minimize the cost function of a neural network during training.
In LSQ fitting, the cost function is the sum of squared errors between the predicted values and the actual values in the data. In NN training with Gradient Descent, the cost function is typically the average of the squared differences between the predicted and expected outputs of the neural network on the training data.
While the algorithms have some similarities, there are also differences. For example, linear LSQ fitting can be solved directly as a system of equations, and non-linear LSQ fitting typically relies on specialized iterative solvers such as Levenberg-Marquardt, while Gradient Descent is a general iterative optimization algorithm that adjusts the weights of a neural network to minimize a cost function, typically using Backpropagation to compute the gradients of the cost function with respect to the network’s weights. Additionally, LSQ fitting is typically used for curve fitting, while Gradient Descent is used in a wide range of machine learning tasks beyond curve fitting.
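The contrast between solving a system of equations and iterating with Gradient Descent can be seen on a straight-line fit, where both routes are available. The following sketch uses synthetic data and an arbitrarily chosen learning rate and iteration count; np.linalg.lstsq stands in for the direct solution.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 50)
y = 3.0 * x + 0.5 + rng.normal(scale=0.05, size=x.size)   # synthetic, illustrative data

# (1) Direct solution: solve the linear least-squares system in one step
A = np.column_stack([x, np.ones_like(x)])
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
slope_direct, intercept_direct = coeffs

# (2) Iterative solution: full-batch gradient descent on the mean squared error
slope, intercept = 0.0, 0.0
learning_rate = 0.5
for _ in range(5000):
    error = slope * x + intercept - y
    slope -= learning_rate * 2.0 * np.mean(error * x)
    intercept -= learning_rate * 2.0 * np.mean(error)

print(f"direct:           slope = {slope_direct:.4f}, intercept = {intercept_direct:.4f}")
print(f"gradient descent: slope = {slope:.4f}, intercept = {intercept:.4f}")
```

Both approaches minimize the same squared-error cost, so they should arrive at essentially the same line. The direct route is only available because the model is linear in its parameters; Gradient Descent needs many small steps but carries over unchanged to models where no such shortcut exists.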
4. Example of an LSQ Fitting Application: Predicting the Evolution of COVID-19 Cases
For a more realistic example of how LSQ fitting works and how predictions can be obtained from it, consider a scenario in which we want to predict the evolution of COVID-19 in China using official case count data from October 30, 2022 to December 7, 2022 (Worldometers, 2022).
The dataset represents the number of confirmed COVID-19 cases reported in China over that period. To model the spread of the disease, we consider an exponential growth curve and use LSQ fitting to find the values of the exponential function parameters that minimize the error term between the data points and the curve.
Through iterative adjustment of these parameters and recalculation of the error term until it reaches a specified tolerance, we obtain the parameter values that best fit the data and can use them to make predictions about the spread of COVID-19 in China.
The resulting curve fit, obtained using the free software SciDAVis, along with the original data points, can be visualized in a graph to assess the quality of the fit and of the predictions. Through curve fitting, we can model the spread of diseases like COVID-19 and estimate their future course. In this example, we estimated that the number of cases would exceed 8000 after two months without any intervention to control the spread of the disease. With more data points in a longer time series, the prediction could be more precise. However, as of January 6th, 2023, just one month after the last data point, the reported number of daily cases had already reached 7000, indicating a more rapid growth than predicted.
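The same workflow can be sketched in Python with scipy.optimize.curve_fit. The data below are synthetic: they are generated from hypothetical parameters and are not the Worldometers case counts, so the numbers printed have no epidemiological meaning. The initial guess p0 and the 30-day forecast horizon are likewise assumptions for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def exp_growth(t, a, b, c):
    """Exponential growth model: cases(t) = a * exp(b * t) + c."""
    return a * np.exp(b * t) + c

# SYNTHETIC daily case counts over 39 days (NOT the Worldometers data),
# generated from hypothetical parameters a=50, b=0.08, c=100 plus noise.
rng = np.random.default_rng(42)
days = np.arange(39)
cases = exp_growth(days, 50.0, 0.08, 100.0) + rng.normal(scale=20.0, size=days.size)

# A rough initial guess helps the iterative solver converge for exponentials
popt, pcov = curve_fit(exp_growth, days, cases, p0=(10.0, 0.1, 0.0))
perr = np.sqrt(np.diag(pcov))    # one-sigma parameter uncertainties

# Extrapolate 30 days past the last observation
future_day = days[-1] + 30
forecast = exp_growth(future_day, *popt)

print("best-fit parameters:", popt)
print("uncertainties:", perr)
print(f"forecast for day {future_day}: {forecast:.0f} cases")
```

Because the growth rate b sits inside an exponential, a small uncertainty in b translates into a large uncertainty in the extrapolated value, which is one reason forecasts far beyond the fitted range should be treated with caution.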

It’s important to note that the example we presented is just one way of using curve fitting in epidemic modeling. Different models and approaches may be more appropriate depending on the situation. For instance, some diseases may not follow an exponential growth pattern, in which case different curve fitting methods or models would be more suitable. Additionally, it is essential to consider the limitations and potential biases of the data being used. In this case, the official case count data for China may underestimate the true extent of the outbreak. Despite these limitations, curve fitting can be a powerful tool for understanding and predicting the spread of diseases. It can provide valuable insights to inform public health decisions and interventions.
In the context of pattern recognition and machine learning, the example of using curve fitting to model the spread of COVID-19 in China can be described as follows:
- Dataset: The training dataset used in this example includes the official number of COVID-19 cases reported in China between October 30, 2022 and December 7, 2022 (source: Worldometers, 2022). This dataset is the input data used to train the machine learning algorithm.
- Training Model: The model describes the spread of the disease with an exponential growth curve. The least squares fitting approach is applied to fit the curve to the data points.
- Method: Training consists of minimizing the sum of squared errors. This is done by finding the values of the parameters in the exponential growth curve that minimize the squared differences between the predicted values and the actual values in the dataset.
- Output: The output of the machine learning algorithm is the best-fitting curve, which is the target function with optimized parameters that describes the underlying trend in the data. The curve represents the predicted evolution of COVID-19 in China based on the input data. The resulting curve, along with the data points, can be visualized in a graph to assess the fit and accuracy of the predictions. Overall, this approach can provide valuable insights for predicting and understanding the spread of diseases, which is crucial for making informed public health decisions and interventions.
5. Conclusion
Curve fitting is a powerful tool in machine learning that enables the identification of patterns in data and creation of predictive models. By fitting curves to data points, we can gain insights into underlying relationships and make informed decisions. Mastering curve fitting techniques is essential for anyone working in machine learning, as it involves identifying a mathematical function that describes trends in the data and making predictions based on it. This process is similar to the principles used to train a neural network for pattern recognition.
In our next article in this series, we will show how a simple NN with one input layer, one hidden layer, and one output layer can approximate any smooth function without the need for a target function. In essence, it is curve fitting where you do not know the target function a priori, and the NN discovers one for you.
Here are some suggestions for further reading on the topics of curve fitting, pattern recognition, and machine learning:
- Goodfellow, Ian; Bengio, Yoshua; Courville, Aaron (2016). Deep Learning. Cambridge, Massachusetts: MIT Press.
- Bishop, Christopher M. (2006). Pattern Recognition and Machine Learning. New York: Springer.
- Russell, Stuart; Norvig, Peter (2020). Artificial Intelligence: A Modern Approach (4th ed.). Hoboken, New Jersey: Pearson.

Copyright 2026 AI-Talks.org