Skip to content

Curve fitting tutorial

Open In Colab

Note

Curve fitting is the simple process of finding a surrogate function approximating a system dynamics, ie curve fitting. For example in epidemiology, cumulative deaths grow slowly at first, accelerate when the epidemics is at the peak and slows down to reach a plateau, this can be approximated using a logistic function.

Developer import
%matplotlib inline
%load_ext autoreload
%autoreload 2

# Developer import
import sys
sys.path.append("../../")
On Google Colab

Uncomment the following line to install the library locally

# !pip install pyepidemics
Verify the library is correctly installed
import pyepidemics

Introduction

Introduction to curve fitting

Epidemic curves often follow a particular shape :

  • Cumulative deaths reach a plateau, we could approximate it with a logistic curve
  • Infections, stay in hospital, often reach a peak before declining, we could approximate it with a gaussian or skewed gaussian curve

This is what curve fitting is about, once we have a surrogate function approximating the epidemic curve we can make predictions.
Yet it only works under the assumption that the situation is constant which is totally unrealistic.

However even if the model is wrong, that does not mean is not useful, curve fitting can easily be used at scale on many countries or used to capture the dynamics of epidemics curve to better calibrate compartmental models

Getting some data

from pyepidemics.dataset import fetch_daily_case_france

cases = fetch_daily_case_france()
cases[["D","H","ICU"]].plot(figsize = (15,4),title = "COVID19 cases (D=Dead,H=Hospital,ICU)")
plt.show()

Curve fitting in practice

from pyepidemics.curve_fitting import logistic_fn,gaussian_fn
from pyepidemics.curve_fitting import CurveFittingModel

Surrogate functions

Logistic function

A logistic function is useful to represent system dynamics reaching a plateau after an inflexion.
More on the wikipedia page https://en.wikipedia.org/wiki/Logistic_function

Formula is : f(x)=\frac{a}{1+e^{-k(x - \mu)}}

We can use the fonction pyepidemics.curve_fitting.logistic_fn(x,a,mu,k)

# Take x between 0 and 100
x = np.linspace(0,100)

# Prepare plot
plt.figure(figsize = (15,4))

# Plot many functions
for k in [0.05,0.1,0.2,0.5,1]:
    plt.plot(logistic_fn(x,k = k,a = 10,mu = 50),label = f"k={k}")

# Display figure
plt.title("Logistic functions")
plt.legend()
plt.show()

Gaussian function

Formula is : f(x)=a e^{-0.5(\frac{x-\mu}{\sigma})^2}

We can use the fonction pyepidemics.curve_fitting.gaussian_fn(x,a,mu,sigma)

# Take x between 0 and 100
x = np.linspace(0,100)

# Prepare plot
plt.figure(figsize = (15,4))

# Plot many functions
for sigma in [5,10,20]:
    plt.plot(gaussian_fn(x,sigma = sigma,a = 10,mu = 50),label = f"sigma={sigma}")

# Display figure
plt.title("Gaussian functions")
plt.legend()
plt.show()

Fitting surrogate functions to COVID cases

We will use the CurveFittingModel which merely wraps curve_fit function from scipy in a scikit learn fashion.

from pyepidemics.curve_fitting import CurveFittingModel

Fitting the death cases

Let's start by fitting the death cases with a logistic function

model = CurveFittingModel("logistic")

Fitting returns the parameters a,mu,k of the logistics function

model.fit(cases["D"])
array([1.75350267e+04, 7.87308460e+01, 1.16910544e-01])

Let's predict for 20 days after the date and visualize the fit

pred = model.predict(20,show_fit = True)

Fitting the ICU cases with gaussian curve

model = CurveFittingModel("gaussian")
pred = model.fit_predict(cases["ICU"],show_fit = True)

It works ok, but actually ICU cases are not a proper gaussian curve, we have a step ascend in the number of cases, and a slower decline. Hence what we need here could simply be a skewed guassian

model = CurveFittingModel("skewed_gaussian")
pred = model.fit_predict(cases["ICU"],show_fit = True)

Now we have a much better fit

Advanced users

There are a few details you can use as an advanced user, deep dive into the code internals to learn more :

  • You can specify the initial parameters p0 of the curve fitting optimization freely by passing p0 argument to the fit or fit_predict function. Actually for the gaussian functions those initial parameters are automatically inferred from the dataset provided to facilitate convergence.
  • You can actually specify any function to fit in the curve fitting model as shown below

Fitting a custom function

Let's create a fake dataset with a simple linear function

# Preparing the true function and the function with noise
example_fn = lambda x,a,b : a * x + b
example_fn_random = lambda x,a,b : example_fn(x,a,b) + np.random.randn(len(x)) * 2

# Parameter we want to estimate
a = 2
b = 1

# Creating fake dataset
x = np.linspace(0,20)
y = example_fn_random(x,a,b)

# Visualizing fake dataset
plt.figure(figsize = (15,4))
plt.scatter(x,y)
plt.show()

And here we are, we have a Linear Regression model

model = CurveFittingModel(example_fn)
pred = model.fit_predict(y)

# Visualizing fake dataset
plt.figure(figsize = (15,4))
plt.scatter(x,y)
plt.plot(x,pred)
plt.show()