Curve fitting tutorial¶
Note
Curve fitting is the simple process of finding a surrogate function approximating a system dynamics, ie curve fitting. For example in epidemiology, cumulative deaths grow slowly at first, accelerate when the epidemics is at the peak and slows down to reach a plateau, this can be approximated using a logistic function.
Developer import¶
%matplotlib inline
%load_ext autoreload
%autoreload 2
# Developer import
import sys
sys.path.append("../../")
On Google Colab¶
Uncomment the following line to install the library locally
# !pip install pyepidemics
Verify the library is correctly installed¶
import pyepidemics
Introduction¶
Introduction to curve fitting¶
Epidemic curves often follow a particular shape :
- Cumulative deaths reach a plateau, we could approximate it with a logistic curve
- Infections, stay in hospital, often reach a peak before declining, we could approximate it with a gaussian or skewed gaussian curve
This is what curve fitting is about, once we have a surrogate function approximating the epidemic curve we can make predictions.
Yet it only works under the assumption that the situation is constant which is totally unrealistic.
However even if the model is wrong, that does not mean is not useful, curve fitting can easily be used at scale on many countries or used to capture the dynamics of epidemics curve to better calibrate compartmental models
Getting some data¶
from pyepidemics.dataset import fetch_daily_case_france
cases = fetch_daily_case_france()
cases[["D","H","ICU"]].plot(figsize = (15,4),title = "COVID19 cases (D=Dead,H=Hospital,ICU)")
plt.show()
Curve fitting in practice¶
from pyepidemics.curve_fitting import logistic_fn,gaussian_fn
from pyepidemics.curve_fitting import CurveFittingModel
Surrogate functions¶
Logistic function¶
A logistic function is useful to represent system dynamics reaching a plateau after an inflexion.
More on the wikipedia page https://en.wikipedia.org/wiki/Logistic_function
Formula is : f(x)=\frac{a}{1+e^{-k(x - \mu)}}
We can use the fonction pyepidemics.curve_fitting.logistic_fn(x,a,mu,k)
# Take x between 0 and 100
x = np.linspace(0,100)
# Prepare plot
plt.figure(figsize = (15,4))
# Plot many functions
for k in [0.05,0.1,0.2,0.5,1]:
plt.plot(logistic_fn(x,k = k,a = 10,mu = 50),label = f"k={k}")
# Display figure
plt.title("Logistic functions")
plt.legend()
plt.show()
Gaussian function¶
Formula is : f(x)=a e^{-0.5(\frac{x-\mu}{\sigma})^2}
We can use the fonction pyepidemics.curve_fitting.gaussian_fn(x,a,mu,sigma)
# Take x between 0 and 100
x = np.linspace(0,100)
# Prepare plot
plt.figure(figsize = (15,4))
# Plot many functions
for sigma in [5,10,20]:
plt.plot(gaussian_fn(x,sigma = sigma,a = 10,mu = 50),label = f"sigma={sigma}")
# Display figure
plt.title("Gaussian functions")
plt.legend()
plt.show()
Fitting surrogate functions to COVID cases¶
We will use the CurveFittingModel
which merely wraps curve_fit
function from scipy in a scikit learn fashion.
from pyepidemics.curve_fitting import CurveFittingModel
Fitting the death cases¶
Let's start by fitting the death cases with a logistic function
model = CurveFittingModel("logistic")
Fitting returns the parameters a,mu,k
of the logistics function
model.fit(cases["D"])
Let's predict for 20 days after the date and visualize the fit
pred = model.predict(20,show_fit = True)
Fitting the ICU cases with gaussian curve¶
model = CurveFittingModel("gaussian")
pred = model.fit_predict(cases["ICU"],show_fit = True)
It works ok, but actually ICU cases are not a proper gaussian curve, we have a step ascend in the number of cases, and a slower decline. Hence what we need here could simply be a skewed guassian
model = CurveFittingModel("skewed_gaussian")
pred = model.fit_predict(cases["ICU"],show_fit = True)
Now we have a much better fit
Advanced users¶
There are a few details you can use as an advanced user, deep dive into the code internals to learn more :
- You can specify the initial parameters p0 of the curve fitting optimization freely by passing
p0
argument to thefit
orfit_predict
function. Actually for the gaussian functions those initial parameters are automatically inferred from the dataset provided to facilitate convergence. - You can actually specify any function to fit in the curve fitting model as shown below
Fitting a custom function¶
Let's create a fake dataset with a simple linear function
# Preparing the true function and the function with noise
example_fn = lambda x,a,b : a * x + b
example_fn_random = lambda x,a,b : example_fn(x,a,b) + np.random.randn(len(x)) * 2
# Parameter we want to estimate
a = 2
b = 1
# Creating fake dataset
x = np.linspace(0,20)
y = example_fn_random(x,a,b)
# Visualizing fake dataset
plt.figure(figsize = (15,4))
plt.scatter(x,y)
plt.show()
And here we are, we have a Linear Regression model
model = CurveFittingModel(example_fn)
pred = model.fit_predict(y)
# Visualizing fake dataset
plt.figure(figsize = (15,4))
plt.scatter(x,y)
plt.plot(x,pred)
plt.show()