## Data-driven COVID-19 case forecasts according to Ontario public health units (PHUs)

**Background**This daily report presents forecasts of the number of COVID-19 cases reported in each of the 34 public health units (PHUs) in Ontario, Canada as well as for
the province. The projections for each of the 34 PHUs and for Ontario can be viewed using the dropdown menu below. The fitting results and forward projection are provided, as
well as the parameter estimates for the mathematical model underlying the projections. An outline of the methodology used to produce the forecasts is included below.

## Methods

We fitted the exponential growth model y(t) = y_{0}e^{rt} to the daily reported cases in each of the 34 PHUs. We used 14 days of data (n = 14 data points), or 2 weeks, of daily reported case data to estimate the model parameters. For the parameter estimation, we used the maximum likelihood method with negative binomial likelihood. The method is as follows:

First, the negative binomial probability distribution function (pdf) is given by

NB(k|n, p) = \binom{k + n - 1}{n - 1} p^{n} (1 - p)^{k}.

We, instead, utilized the parameterization of the negative binomial distribution in terms of its mean *μ* and variance \theta \mu,
i.e.,

p(\theta) = \frac{1}{\theta}, n(\mu, \theta) = \frac{\mu}{\theta - 1}.

The mean value of y_{0} was assumed to be growing exponentially, that is, \mathbb{E}(y(t)) = y_{0}e^{rt}. With these assertions in place, the likelihood function over an observation period of n days is then

L(y |y_{0}, r, \theta) = \prod^{n}_{i=1}NB(y(t)|n(y_{0}e^{rt}, \theta), p(\theta)) = \prod^{n}_{i=1}NB(y(t), \frac{y_{0}e^{rt}}{\theta - 1}, \frac{1}{\theta}).

For each PHU, we maximized the negative log likelihood using the Python package scipy's functions *fmin* and *gammaln* (to emulate the optimization procedure *fmincon *in
Matlab) to estimate the parameters y_{0}, r, \theta. Note that in addition to the parameters associated with the
exponential model y_{0} and exponential growth rate r, the
dispersion parameter associated with the negative binomial distribution \theta was also estimated.

Using the parameters obtained y_{0}, r, \theta, we constructed N = 3000 re-sampled time series all of identical length n = 14. For each re-sampling, we performed the parameter estimation process specified above to obtain a corresponding estimate of y_{0}, r, \theta. We therefore obtained a set of parameter N estimates, which gave an empirical distribution of the parameters and a corresponding set of N exponential projections for the future.

The final projection for each PHU was obtained by taking, for each day in the future, the mean of the set of projections obtained through bootstrapping, and 95% confidence interval obtained by taking the 0.025 and 0.975 quantiles of the same distribution. This process was replicated for each of the 34 PHUs in Ontario to produce the projections. This methodology was first published in [3] and was used to accurately predict the peak time and case numbers for the 1st wave of COVID-19 in Ontario.

## The estimation of the reproduction number R from r

To compute the reproduction number R, we used estimates for the growth/decline rate r. The calculation is based on the relationship between r and R through the Euler-Lotka equation\begin{equation} \frac{1}{R} = \int_{0}^{\infty} e^{-r \tau} g(\tau)d\tau,\end{equation}

where g(\tau) is the probability distribution function for the generation interval [1]. To simplify the calculations, we first introduce notation M(z) to denote the moment generating function of the distribution g(\tau): \begin{equation} M(z) = \int_{0}^{\infty} e^{z\tau} g(\tau)d\tau \end{equation}

Now, we write Equation (1) in terms of the moment generating function of g(\tau) to find: \begin{equation} R = \frac{1}{M(-r)} \end{equation}

We considered the special case of COVID-19, where the gamma distribution has been generally adopted to model the generation interval. The gamma distribution with rate parameter b and shape parameter n has moment generating function \begin{equation} M(z) = {\left(\frac{b}{b-z}\right)}^{n} \end{equation}

From Equations (3) and (4), we find the relationship between the reproduction number R, growth rate r and the parameters associated with the generation interval distribution: \begin{equation} R = {\left(1+\frac{r}{b}\right)}^{n} \end{equation}

We then used the following equation as an estimator of the reproduction number \hat{R} \begin{equation} \hat{R} = {\left(1+\frac{r}{b}\right)}^{n} \end{equation}

To assess the reproduction number R using the estimated growth rate r, we utilized Equation (6) informed by two estimates of the generation interval distribution. The two estimates of the generation interval distributions used had mean 5.2 days (SD 1.72 days) and estimated mean 3.95 days (SD 1.51 days) [2]. Using the relationship between the gamma distribution shape and scale parameters to its median and variance, we computed n,b for each generation interval estimate. We found

n_{1} = \frac{{5.2}^2}{{1.72}^2}, b_{1} = \frac{5.2}{{1.72}^2} \text{ and } n_{2} = \frac{{3.95}^2}{{1.51}^2}, b_{2} = \frac{3.95}{{1.51}^2}.

We utilized both estimates of the generation interval distribution to reflect different plausible estimates of the reproduction number R. Each estimate of the growth/decline rate r was passed to the estimator, Equation (6) to compute the mean and 95% confidence interval of R.

Generation Interval | Reproduction number R, median (95% CI) |
---|---|

Mean 5.2 days (SD 1.72 days) [2] | |

Mean 3.95 days (SD 1.51 days) [2] |

## The estimation of the doubling/halving time

In the case of increasing daily cases (r > 0), the doubling time t_{d} is the estimated number of days it will take for the daily incidence to double. In the context of case decline (r < 0), the halving time t_{h} is estimated elapsed time in days for the daily incidence to be half of its current value. We calculated the doubling/halving time using the following formula:

t_{d} = \frac{ln (2)}{r}.

To obtain the median and 95% CI, we passed each estimate of the growth/decline rate through the equation above. The resultant doubling/halving time was estimated to be days.

## Assumptions and limitations

The main modelling assumption is that the number of newly reported cases each day is generated by an exponentially growing mean. This is combined with the negative binomial probability distribution function to construct the likelihood function over the observations during the fitting window.

We utilized the number of reported cases of COVID-19 in each PHU in this analysis, which may be impacted by factors including variations in testing. Reported case data is also subject to reporting delays, which we have partially addressed by omitting the most recent two days of case data in the model fitting. Also, the projections may not fully reflect the impacts of either the recent implementations or relaxations of public health interventions, given the delay between transmission events and case diagnosis as well as the model fitting methodology utilized. In this light, it is recommended that care is taken when interpreting the projections. The estimation of the growth rate is sensitive to large fluctuations in case counts, potentially resulting in wide CIs for the estimated reproduction number and doubling time. This sensitivity is particularly relevant for PHUs with typically low daily case counts.

## Data

To parameterize the model, we utilized the line listed data of confirmed positive cases of COVID-19 in Ontario according to public health unit. The time series of the daily cases of COVID-19 in each of the 34 PHUs in Ontario was generated using individual line listed data from the Ontario Ministry of Health, which was made available to us through the Ontario COVID-19 Modeling Consensus Table. This main source of data enabled the fitting of mathematical model and all subsequent projections.

## References

- Wallinga J, Lipsitch M. How generation intervals shape the relationship between growth rates and reproductive numbers. Proceedings of the Royal Society B: Biological Sciences. 2007 Feb 22;274(1609):599-604.
- Ganyani T, Kremer C, Chen D, Torneri A, Faes C, Wallinga J, Hens N. Estimating the generation interval for coronavirus disease (COVID-19) based on symptom onset data, March 2020. Eurosurveillance. 2020 Apr 30;25(17):2000257.
- Francesca Scarabel, Lorenzo Pellis, Nicola Luigi Bragazzi, Jianhong Wu, Canada needs to rapidly escalate public health interventions for its COVID-19 mitigation strategies, Infectious Disease Modelling, Volume 5, 2020, Pages 316-322, ISSN 2468-0427, https://doi.org/10.1016/j.idm.2020.03.004.