
Calibration tutorial

Open In Colab

Note

Estimating epidemiological parameters from real data is called *calibration*. In this tutorial we will see how pyepidemics can be used for this purpose, by calibrating a COVID19 model on French data.

Developer import
%matplotlib inline
%load_ext autoreload
%autoreload 2

# Developer import
import sys
sys.path.append("../../")
On Google Colab

If you are running on Google Colab, uncomment the following line to install the library

# !pip install pyepidemics
Verify the library is correctly installed
import pyepidemics

Getting data

Some data helpers have been built in a scikit-learn fashion to help you easily fetch data. For now only French case data is available, but feel free to contribute and add more data sources.

from pyepidemics.dataset import fetch_daily_case_france

The compartments are already prepared, and the cases have been interpolated and smoothed to account for reporting errors.

cases = fetch_daily_case_france()
cases.head()
            cas_confirmes  deces  gueris  hospitalises  reanimation        Is    D    R         H  ICU
2020-01-24            3.0    0.0     0.0           3.0          0.0  2.761419  0.0  0.0  2.761481  0.0
2020-01-25            3.0    0.0     0.0           3.0          0.0  2.964313  0.0  1.0  2.964423  1.0
2020-01-26            3.0    0.0     0.0           3.0          0.0  3.219465  0.0  1.0  3.219620  1.0
2020-01-27            3.0    0.0     0.0           3.0          0.0  3.586868  0.0  1.0  3.587032  1.0
2020-01-28            4.0    0.0     0.0           4.0          1.0  4.106566  1.0  1.0  4.106629  1.0
cases[["D","R","H","ICU"]].plot(figsize = (15,4),title = "Real data France");

First COVID19 model without calibration

We will use a COVID19 differential equation model with a prodromal phase and 3 levels of symptoms. This model has been validated by epidemiologists and is similar to other researchers' models, such as the one from INSERM.

Most parameters have default values taken from the INSERM and Pasteur research papers; the other parameters are as follows.

As shown in other tutorials, we will simulate the French lockdown with a β going from beta_high to beta_low at day 53 (the first data point for France is 2020-01-24 and the lockdown started on 2020-03-17, i.e. 53 days later).

N = 67e6 # French population
beta_high = 0.9 # R0 around 3.3
beta_low = 0.16 # R0 around 0.6
beta = [beta_high,[beta_low],[53]] # We go from R0 around 3.3 to 0.6 during lockdown starting at day 53
from pyepidemics.models import COVID19Prodromal

model = COVID19Prodromal(
    N = N,
    beta = beta,
    I0 = 5,
)

To tweak all other parameters, you can check the help

help(model.__init__)
Help on method __init__ in module pyepidemics.models.examples.covid_prodromal:

__init__(N, beta, incubation_rate=0.27027027027027023, prodromal_rate=0.6666666666666666, recovery_rate_asymptomatic=0.3125, recovery_rate_mild=0.3125, symptoms_to_hospital_rate=0.18181818181818182, symptoms_to_icu_rate=0.14285714285714285, proba_severe=0.071, proba_asymptomatic=0.2, proba_icu=0.182, recovery_rate_hospital=0.046, recovery_rate_icu=0.035, death_rate_hospital=0.0046, death_rate_icu=0.0087, isolation_ratio=0.25, I0=1, **kwargs) method of pyepidemics.models.examples.covid_prodromal.COVID19Prodromal instance
    COVID19 Compartmental Model

    Parameters:
    Default params are set according to INSERM research paper


Let's make a prediction

states = model.solve(n_days = 200)
states[["H","ICU","D"]].show(plotly = False)

We will want to compare the curves to the real data; this can be done easily with the following function

model.show_prediction(cases[["H","ICU","D"]])

The model is just ok: the peak and the scale are not far off, but the overall dynamics are not close enough to use it for predictions. That's why we need calibration: we will search over the set of possible parameters to find a better fit.

We can also observe the reproduction factor R0, which is a little low compared to epidemiologists' estimates of around 3.3.

model.R0(0.9)
2.5423081997918837
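
If you want a beta that matches this estimate more closely, and assuming R0 scales linearly with beta in this model (a standard property of compartmental models, not something stated by the library), you can rescale it:

# Sketch: rescale beta to target R0 ~ 3.3, assuming R0 is proportional to beta
target_R0 = 3.3
beta_high_adjusted = 0.9 * target_R0 / model.R0(0.9)
beta_high_adjusted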

Calibrating the COVID19 model on French data

Preparing for calibration

To extend a model for calibration, we need to subclass it and add a .reset() method that explains how to initialize the model with a candidate set of parameters.

N = 67e6

class CalibratedCOVID19(COVID19Prodromal):
    def __init__(self,params = None):
        if params is not None:
            self.reset(params)

    def reset(self,params):        
        beta = [params["beta_high"],[params["beta_low"]],[53]]
        super().__init__(N = N ,beta = beta,**params)
        self.calibrated_params = params

model = CalibratedCOVID19()

We also need a space of parameters to test. These ranges are chosen according to research papers on the topic for French data and should be adapted to other situations.
Some parameters, such as the probabilities at the hospital, are not calibrated because we are quite confident in their literature values.

space = {
    "beta_low":(0.01,0.4),
    "beta_high":(0.6,1.5),
    "I0":(0,100),
    "recovery_rate_asymptomatic":(1/8,1/2),
    "recovery_rate_mild":(1/8,1/2),
    "death_rate_hospital":(0.006,0.02),
    "death_rate_icu":(0.006,0.02),
    "recovery_rate_icu":(0.02,0.08),
    "recovery_rate_hospital":(0.02,0.08),
}
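
As a reminder, these rate parameters are the inverses of mean durations in days (the usual compartmental-model convention, assumed here), so the (1/8, 1/2) ranges correspond to mean durations between 2 and 8 days:

# Sketch: derive a rate range from a duration range, assuming rate = 1 / mean duration (in days)
def duration_to_rate_range(min_days, max_days):
    return (1 / max_days, 1 / min_days)

duration_to_rate_range(2, 8)  # -> (0.125, 0.5), i.e. (1/8, 1/2)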

Finally, because we know the expected value of the reproduction factor for COVID19, we can also pass a constraint callback to guide the parameter search. This constraint will increase the loss if the R0 is not in the intended range.

def constraint(model,loss):
    # Compute the R0 implied by the candidate beta_high
    R0 = model.R0(model.calibrated_params["beta_high"])
    # Penalize parameter sets whose R0 falls outside the expected 3.0-3.4 range
    if R0 < 3 or R0 > 3.4:
        return loss*10
    else:
        return loss

Calibration with Bayesian optimization

Once everything is set up, you can use the calibration helper function. In a scikit-learn fashion, the function is simply called .fit().
Within this function, a Bayesian optimization search is run, the same way you would look for optimal hyperparameters in a Machine Learning problem. This is done with the Optuna library.

We will fit on 3 epidemic curves: deaths (D), ICU and hospitalizations (H), because other data points such as the number of confirmed cases are unreliable due to asymptomatic cases and testing policies. The loss function combines the RMSE loss of each curve.

We will run for n iterations, with early stopping if the best set of parameters has not changed for a given number of trials. You can try running for a longer period of time to see if you find better parameters.
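
The loss itself is computed inside the library; a rough sketch of a per-curve RMSE combined into a single score could look like this (hypothetical helper, not the actual pyepidemics implementation):

# Sketch: RMSE per fitted curve, averaged into a single loss
# (assumes pred and true share the same date index)
import numpy as np

def composed_rmse(pred, true, columns=("D", "H", "ICU")):
    losses = []
    for col in columns:
        error = pred[col].values - true[col].values
        losses.append(np.sqrt(np.mean(error ** 2)))
    return np.mean(losses)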

model.fit(
    cases[["D","H","ICU"]],
    space,
    n = 200,
    early_stopping = 100,
#     constraint = constraint,
)
C:\ProgramData\Anaconda3\lib\site-packages\optuna\_experimental.py:111: ExperimentalWarning:

HyperbandPruner is experimental (supported from v1.1.0). The interface can change in the future.


... Early stopping - best value has not changed since 100 trials at 0.2330863710622714
... Found best solution {'beta_low': 0.014302655959612443, 'beta_high': 0.8145837089021701, 'I0': 13.112422954306595, 'recovery_rate_asymptomatic': 0.4132953732704384, 'recovery_rate_mild': 0.1278943950841368, 'death_rate_hospital': 0.009513039512692313, 'death_rate_icu': 0.010200156298856395, 'recovery_rate_icu': 0.027412014833810148, 'recovery_rate_hospital': 0.026502178667788198} for value 0.2330863710622714
... Parameters saved in yaml file calibration_params_1591011003.yaml
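
The best parameters are also saved to a yaml file, so you can reload them later and rebuild the calibrated model (a sketch, assuming the file simply stores the parameter dictionary):

# Sketch: reload calibrated parameters from the saved yaml file
# (assumes the file stores the plain parameter dictionary)
import yaml

with open("calibration_params_1591011003.yaml") as f:
    best_params = yaml.safe_load(f)

model = CalibratedCOVID19(best_params)  # relies on the .reset() defined above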

Visualizing the results

We can again compare the prediction with real data

model.show_prediction(cases[["H","ICU","D"]])

We can also better understand the internals of the optimization by visualizing the set of tested parameters in a parallel coordinates graph

model.opt.show_parallel_coordinates()

You can also estimate confidence intervals by sampling among the parameters tested during the Bayesian optimization.

params_dist = model.opt.estimate_params_distributions(summary = True)
params_dist
                                 best        min        max    mu_norm   std_norm   min_norm   max_norm
value                        0.233086   0.233390   0.277721   0.247968   0.015985   0.239976   0.255961
I0                          13.112423   7.047305  53.386638  32.130874  19.119277  22.571236  41.690513
beta_high                    0.814584   0.663269   0.808431   0.730971   0.053130   0.704406   0.757536
beta_low                     0.014303   0.015712   0.085959   0.051239   0.025229   0.038625   0.063854
death_rate_hospital          0.009513   0.008810   0.011041   0.009814   0.000718   0.009455   0.010173
death_rate_icu               0.010200   0.009105   0.012007   0.010289   0.000993   0.009792   0.010786
recovery_rate_asymptomatic   0.413295   0.139529   0.393604   0.243902   0.087005   0.200400   0.287405
recovery_rate_hospital       0.026502   0.024059   0.042127   0.029477   0.006597   0.026178   0.032775
recovery_rate_icu            0.027412   0.025075   0.058566   0.043626   0.013692   0.036781   0.050472
recovery_rate_mild           0.127894   0.156650   0.401347   0.332942   0.088425   0.288730   0.377154
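
To turn these distributions into a rough uncertainty band, one option is to sample parameter sets within the reported [min_norm, max_norm] ranges, solve the model for each sample and look at the spread of the curves. The sketch below assumes the parameters can be sampled independently and uniformly, which ignores their correlations:

# Sketch: rough uncertainty band for H, sampling parameters uniformly and independently
# within the [min_norm, max_norm] ranges (parameter correlations are ignored)
import numpy as np

param_rows = params_dist.drop(index="value")  # keep only the model parameters
simulations = []
for _ in range(20):
    sampled = {name: np.random.uniform(row["min_norm"], row["max_norm"])
               for name, row in param_rows.iterrows()}
    sampled_model = CalibratedCOVID19(sampled)
    simulations.append(sampled_model.solve(n_days=200)["H"].values)

band = np.percentile(np.stack(simulations), [5, 95], axis=0)  # 90% envelope for H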