*By Jose Sealtiel Cruz*

The path forward is in the hands of mathematical predictions.

Numerous public and private organizations have stepped up to the task of collating the Department of Health’s (DOH) COVID-19 data into easy-to-digest infographics that inform the public of the day-to-day COVID-19 situation. Among them is the youth-led science organization Earth Shaker, whose daily updates have turned it into a social media fixture.

Though rudimentary figures help the public visualize how the country is progressing in combatting the pandemic, they offer little help to policymakers in creating rules whose effects will shape the future. For that, they turn to models.

Models are perhaps better known by their predictions, such as the one Harry Roque said the country beat in June. However, a prediction from the University of the Philippines (UP), no matter how commanding, is still a mere prediction made ahead of time.

To understand better, we look at the art – and science – used in making striking predictions the government is keen to beat.

**Creating a math model**

Math models describe situations using mathematical relationships to gain insights and understanding of real problems and forecast future behavior. Contrary to the usual image of mathematics requiring specific equations to bridge a problem to a desired answer – a perspective that is indeed dangerous – math modelling allows room for different approaches to a problem as long as these choices are backed up properly, thus allowing the possibility of wildly different answers.

Though there are many ways to approach a modelling problem, the process of mathematical modelling follows a logical flow [1].

*Defining the problem*

The questions math models seek to answer – or at least shed light on – are very much unlike standard word problems where one is fed pertinent information (except some) and a relationship can be easily drawn. Real world problems are broad, complex, and mostly open-ended; thus, a problem can be attacked in different ways and multiple answers can be valid. Defining the problem helps narrow down the scope of a modelling project.

*Setting assumptions*

Assumptions tell the reader the conditions under which a mathematical model is valid. These will follow naturally after the problem is defined, but some assumptions may emerge as the model is created. Despite the name, assumptions should be reasonable enough to mimic real, complex behavior as closely as possible while keeping the model simple if actual data is unavailable or inaccessible.

*Defining variables*

A variable, research-wise, is anything that varies in quantity or quality. Bliss et al. (2014) put it bluntly: “[t]he purpose of a [mathematical] model is to predict or quantify something of interest.” A variable may be *independent*, meaning it is subject to the modeler’s control; *dependent*, meaning it changes as the independent variable changes; or *constant*, meaning it is predefined. The model aims to derive a relationship between the variables of interest to gain some form of insight to answer a problem.

*Finding solutions*

Math is, of course, an inevitable part of modelling. Though mathematical modelling allows for creative approaches, the modeler should be certain of the approach they will use, which depends on many factors. The number of variables, the nature of the equations, the nature of the available data (which should itself be processed before use), the choice between analytical and numerical methods, the viability of approximations, and the type of representation are among the usual modelling questions. Some approaches are widely studied and readily available, while some problems call for novel ones. Numerous software packages are available to at least ease the task of creating a mathematical model.

*Analyzing the model*

Though it may take hard work and a lot of time to produce a model, its accuracy cannot be sacrificed. At the very least, the model should make sense: the values should be consistent with the nature of the variables (for instance, a model of people infected by a disease cannot reach negative values); the model’s behavior when graphed should be what is expected; and the resulting values should closely match real data (if available). The model’s strengths and weaknesses can also be brought up here.

*Communicating answers*

A model is only as effective as how it is communicated to its intended audience. Insights derived from a model should be reported in a way suited to the medium through which they will be delivered.

When done properly, a mathematical model can be a powerful tool to communicate solutions to a problem. It is used *ad nauseam* across many fields and industries to help make informed decisions that may make or break an institution’s future, and has been continuously tapped to shed light on the way forward in dealing with the current pandemic.

**Modelling a disease**

There are plenty of ways to model a disease to account for nuances in both the disease and the population covered by the project, but they are usually built upon the basic principles of one model: the SIR model.

An SIR model divides a population into three groups whose sizes are functions of time *t*: **s**usceptible individuals (S), **i**nfected individuals (I), and **r**ecovered individuals (R). It assumes that individuals form a single, large population and interact at random with each other, which allows the disease to initially spread exponentially [2]. The simplest SIR models also assume that

- an infectious person can pass on a disease to susceptible individuals at a constant rate; and
- infected persons recover at a rate that depends on how long a disease stays in their body, which is also constant.

It is expected that the total of susceptible, infected, and recovered individuals at any given time is constant. To relate all the said variables, we turn to differential equations.

A differential equation is, at its simplest, an equation involving a function and its derivative, where a derivative is the rate at which something changes – in our case, the number of individuals in any of the three groups. For reference, an SIR model’s differential equations can be found here.
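In its most common textbook form, the model's three rates of change can be written as below. The symbols are labels assumed here for illustration rather than taken from the linked source: β is the constant rate at which an infectious person passes on the disease, γ is the constant recovery rate, and N is the total population.

$$
\begin{aligned}
\frac{dS}{dt} &= -\frac{\beta S I}{N}, \\
\frac{dI}{dt} &= \frac{\beta S I}{N} - \gamma I, \\
\frac{dR}{dt} &= \gamma I.
\end{aligned}
$$

Adding the three equations gives zero, which is another way of saying that S + I + R stays constant over time.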

The differential equations may be simple enough to be solved analytically – that is, a specific equation can be obtained that can give out a number for any given time *t*. An example of a function for infected individuals over time can be found here at page 29.

However, if analytical solutions are unavailable, a graph can still be produced using numerical methods, which approximate an analytic curve. This can be done with COVID cases as they are reported daily – thus, the time difference between any two consecutive figures is fixed (one day). Either approach should create a graph that can then be analyzed, though with differing accuracy.
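As a rough sketch of the numerical approach, the snippet below steps a basic SIR model forward one day at a time using Euler's method, one of the simplest numerical schemes. The function name, the parameter names `beta` (transmission rate) and `gamma` (recovery rate), and the values used are all illustrative assumptions; they are not fitted to DOH data and are not the method of any team mentioned in this article.

```python
def simulate_sir(population, initial_infected, beta, gamma, days):
    """Approximate a basic SIR model with Euler steps of one day.

    Assumed (illustrative) equations:
        dS/dt = -beta * S * I / N
        dI/dt =  beta * S * I / N - gamma * I
        dR/dt =  gamma * I
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]  # one (S, I, R) tuple per day, starting at day 0
    for _ in range(days):
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical values chosen only to produce a visible epidemic curve.
history = simulate_sir(population=1_000_000, initial_infected=10,
                       beta=0.3, gamma=0.1, days=200)
peak_day = max(range(len(history)), key=lambda day: history[day][1])
print(f"Infections peak on day {peak_day} "
      f"with about {history[peak_day][1]:,.0f} people infected")
```

A one-day step is a natural fit for daily case reports, but it is also the source of the approximation error: smaller steps or higher-order methods would track the analytic curve more closely.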

There are numerous variants of the SIR model, such as the SEIR model, which adds a new group of **e**xposed individuals to account for the fact that

- a person may be exposed to a disease by an infectious individual, and
- the disease will take time to incubate (ring a bell?) before it becomes infectious;

and variations such as SIRS and SEIRS where a person who recovered from the disease may be susceptible to contracting the disease again. It can also be further modified to account for natural birth and death rates (see here) for a disease that stays within a population for extended periods of time.
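As a sketch of how the extra group changes the mathematics, a basic SEIR model without births, deaths, or re-infection is commonly written as below. The symbols are assumptions introduced for illustration: β is the transmission rate, σ is the rate at which exposed individuals become infectious (one over the average incubation period), γ is the recovery rate, and N = S + E + I + R is the total population.

$$
\begin{aligned}
\frac{dS}{dt} &= -\frac{\beta S I}{N}, \\
\frac{dE}{dt} &= \frac{\beta S I}{N} - \sigma E, \\
\frac{dI}{dt} &= \sigma E - \gamma I, \\
\frac{dR}{dt} &= \gamma I.
\end{aligned}
$$

Newly infected individuals now enter the exposed group first and only move to the infected group after incubation, which delays the epidemic curve compared with a plain SIR model.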

The SIR model, along with the others mentioned, is considered *compartmental*, as it divides a population into groups based on their infection status, and *deterministic*, as no randomness is involved or accounted for. The latter characteristic is a major flaw of SIR models: after all, there is some form of randomness in a real system – in this case, a human population.

**Simulating reality**

It must be remembered that a modelling problem has multiple viable solutions – thus, in the pursuit to “accurately” model the country’s COVID-19 figures, one can also look at multiple solutions.

Perhaps the easiest way out is to model smaller territories instead of the national population. This is plausible and is currently done in cities and provinces; however, it is demanding in terms of manpower and the sheer amount of data that must be collated for a macroscopic (national) view of the pandemic.

More categories can also be added to match how the country groups people in the pandemic. This may include groups of mild and asymptomatic individuals (different from exposed individuals) and isolated individuals who can no longer spread the disease. This is also done by the UP COVID-19 response team (see the documentation here). However, this addition alone does not address the assumption that people in the population move about and interact completely at random.

The random interaction assumption is always underlined. Why it is emphasized becomes clear with a simple thought: imagine an individual physically interacting with someone in Mindanao. Note, too, that a person will naturally interact more with their family, loved ones, or people in their place of work (issues in deterministic math models are explained in depth in this article).

The next best choice is to add “noise.”

A stochastic model takes into account other variables that may cause “randomness,” such as births, deaths, geographic location, mobility, time spent in a place, and others. It can also consider that the basic parameters used in deterministic models – infection rates, recovery rates, and re-infection rates – can change over time.

There are many methods of adding “noise,” or variability, to a model, such as stochastic differential equations, where one or more of the equations is a stochastic process (one that produces a collection of random variables indexed by time), and chaos, among a long list of feasible methods. Stochastic modelling is especially powerful when there is a relatively small number of infected people to record.
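One simple way to see such variability in action is a discrete-time stochastic SIR model, sketched below, in which each susceptible person is infected and each infected person recovers on any given day by chance. This is only one of many possible approaches and is not the method used by any team mentioned in this article; the function name, parameter names, and values are illustrative assumptions.

```python
import math
import random


def simulate_stochastic_sir(population, initial_infected, beta, gamma,
                            days, seed=None):
    """Simulate a daily stochastic SIR model with whole individuals.

    Each day, every susceptible person is infected with probability
    1 - exp(-beta * I / N), and every infected person recovers with
    probability 1 - exp(-gamma). These forms are assumed for illustration.
    """
    rng = random.Random(seed)  # a seeded generator makes a run repeatable
    s, i, r = population - initial_infected, initial_infected, 0
    history = [(s, i, r)]
    for _ in range(days):
        p_infect = 1.0 - math.exp(-beta * i / population)
        p_recover = 1.0 - math.exp(-gamma)
        # One random draw per person; fine for small populations.
        new_infections = sum(rng.random() < p_infect for _ in range(s))
        new_recoveries = sum(rng.random() < p_recover for _ in range(i))
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


# Same hypothetical parameters, different random seeds: with only a few
# initial cases, some runs may take off while others fizzle out early.
for seed in range(5):
    run = simulate_stochastic_sir(population=500, initial_infected=2,
                                  beta=0.3, gamma=0.1, days=120, seed=seed)
    print(f"seed {seed}: {run[-1][2]} people recovered by day 120")
```

Running the loop with several seeds shows why randomness matters most when few people are infected: identical parameters can lead to noticeably different outcomes.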

Note that, in theory, a stochastic model approaches a suitable deterministic model as the population grows infinitely large, for processes that involve jumping from one state to another (a fairly exhaustive mathematical proof is available here). This applies to COVID-19 models, as an individual can “jump” from one category to another.

Despite the amount of thought put into creating a model for COVID-19 infections, it can still be upended with ease.

**The limits of forecasting**

UP has sought to depict the COVID-19 situation in the country as accurately as possible through its model, which is documented on the endcov.com website and implemented in Berkeley-Madonna, a powerful differential equations solver. It has also deployed the model at smaller scales, obtained other widely used figures such as the reproduction number, and monitored medical capacity.

The University of Santo Tomas (UST) has adapted the DELPHI model conceptualized at the Massachusetts Institute of Technology (documented here), which also adds more subgroups to the SEIR model, makes key parameters vary over time, and quantifies government response. UST has likewise offered recommendations to the Inter-Agency Task Force against COVID-19 (IATF).

However, there is a very real possibility that models become inaccurate over longer time frames, much like how weather forecasts are less reliable in predicting the weather a week ahead than a day ahead.

UP has tweaked its predictions numerous times. For instance, its initial analysis last April for the National Capital Region produced wildly varied results: from a low of 140,000, predicted to peak in April-June, to a high of 550,000, predicted to peak by April.

The projection was tweaked on July 18 to be 85,000 by July-end, from a lower 60,000 projection made before the end of June. In the time between the two projections, the country has seen an average of 2,000 additional COVID-19 cases per day.

The latest projection stands at an average of almost 180,000 cases by the end of August, as the country watched COVID-19 cases breach 100,000 while registering a new single-day high of 6,352 on August 4, and as the medical community sent its pleas for help to the government for a “time-out.”

Plainly, a model is only accurate when the parameters it holds constant remain similar over the period it predicts.

While these changes affect future forecasts, some developments can completely throw off even the most experienced statistician. Consider how DOH reported 38,075 recoveries on July 30 after redefining how individuals with COVID-19 are declared “recovered,” bringing down the number of active cases by more than half and consequently receiving flak from netizens.

However, the highest hurdle in painting an accurate picture of what’s to come is the quality of DOH’s data. UP School of Statistics professor Peter Cayton, a proponent of the university’s pandemic response team, has been vocal in his pleas for open data, open science, and open methodology to improve both the reporting and forecasting of cases. He has also pointed out blatant errors in the data released by DOH, such as the lack of recorded recovery dates of individuals with COVID – apparently, only 12,297 recoveries have recorded dates of recovery.

George Box, a celebrated statistician, posited that “all models are wrong” – in the sense that all of them are approximations. The better question to ask now is if “the model [is] illuminating and useful”; if used at all, that is.

References

[1] Bliss, K. M., Fowler, K. R., & Galluzzo, B. J. (2014). *Math Modeling: Getting Started and Getting Solutions*. SIAM, Philadelphia.

[2] Chowell, G., Sattenspiel, L., Bansal, S., & Viboud, C. (2016). Mathematical models to characterize early epidemic growth: A review. *Physics of Life Reviews*, *18*, 66-97.