An algorithmic method to identify epidemic waves of COVID-19


The COVID-19 pandemic has brought epidemiology to the fore. Outbreaks, epidemic peaks and waves of transmission are all topics of discussion. However, there is no agreed universal definition of these concepts. The phrase “epidemic wave” can refer to anything from a well-defined attribute of a mathematical object to a loosely defined component of a time series. Despite the limitations of the definitions, these descriptive phrases are useful for planning and public health.

Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), the causative agent of the pandemic, has been spreading globally since it emerged in Wuhan, China in late December 2019. Governments around the world have implemented non-pharmaceutical interventions (NPIs) with varying rigor and speed in an attempt to prevent and reduce the importation and local spread of the virus. Unfortunately, these NPIs often come at a high price. It is therefore essential to determine how to reduce transmission as cost-effectively as possible. Moreover, given the many potential drivers of regional heterogeneity, understanding the epidemic in a single country is difficult; making meaningful comparisons between countries is even more difficult.

In this research paper, a team of scientists from various institutions in the UK and Poland provide contributions aimed at solving this problem. First, the authors clarify the multiple ways researchers use the term “epidemic wave.” Second, they present a technique that divides epidemic time series (of confirmed cases and deaths) into non-overlapping “observed waves.” The authors emphasize that this is not another definition of an epidemic wave, but rather an exercise in highlighting some of the traits that any viable definition should include. Following this analysis, the authors present a more nuanced interpretation of the data.

A preprint version of this study, which has not yet been peer-reviewed, is currently available on the medRxiv* server.

The study

The algorithm used in this study was applied to all countries for which data was available in the context of COVID-19. By applying the algorithm to time series of cases and deaths, the authors could use cross-validation to account for the confounding effect of shifting case ascertainment and improve the identification of waves of cases.

(A) Choropleth showing the number of days from the emergence of the first cases in China on December 31, 2019, until the cumulative number of deaths in each country exceeded 10. Countries with darker colors exceeded the threshold sooner than countries with lighter colors. After starting in China, outbreaks occurred in Europe, the Middle East and North America before moving to South America, Africa and the Pacific. (B) Scatterplot of the number of days until the epidemic threshold was reached in each country versus that country's GNI per capita, showing a negative trend, i.e. the pandemic spread first to countries with higher GNI per capita. The linear regression line is shown in purple with a shaded 95% confidence interval. (C) Time series of the daily number of confirmed cases (left) and deaths (right) per 10,000 population among countries that have evidence of a second wave (light gray), and the 7-day moving median of the average across countries (black line). For each country, time is measured relative to the date on which the epidemic took hold.
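The two quantities used in this figure, the number of days until cumulative deaths exceed a threshold and a 7-day moving median, are straightforward to compute. The following is a minimal sketch, not the authors' code: the function names, the trailing (rather than centered) window, and the input layout are assumptions made for illustration.

```python
from datetime import date
from itertools import accumulate
from statistics import median


def days_to_threshold(daily_deaths, start, origin=date(2019, 12, 31), threshold=10):
    """Days from `origin` until cumulative deaths first exceed `threshold`.

    daily_deaths[i] is the number of deaths reported on day `start` + i.
    Returns None if the threshold is never exceeded.
    """
    for offset, total in enumerate(accumulate(daily_deaths)):
        if total > threshold:
            return (start - origin).days + offset
    return None


def moving_median(values, window=7):
    """Trailing moving median (an assumed convention).

    The first window - 1 entries use a shorter window.
    """
    return [
        median(values[max(0, i - window + 1): i + 1])
        for i in range(len(values))
    ]
```

A median is less sensitive than a mean to single-day reporting spikes, which is one plausible reason to prefer it for smoothing daily counts.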

Only two of the trends identified are statistically significant at the 5% level. More waves are related to a longer response time (a one-tailed Mann-Whitney test suggests that countries with more than one wave responded significantly more slowly than countries with a single wave, p = 0.0002) and to gross national income (GNI) (p 0.0001). The relationship between population density and mortality is not statistically significant.
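For readers unfamiliar with the test, a one-tailed Mann-Whitney comparison of two groups can be sketched as follows. This is an illustrative implementation using the normal approximation with a continuity correction and no tie correction of the variance; it is not the authors' code, and the response times in the usage note are invented solely to show the call.

```python
import math


def mann_whitney_greater(x, y):
    """One-sided Mann-Whitney U test of H1: values in x tend to exceed those in y.

    Returns (U, p), where p comes from the normal approximation with a
    continuity correction. The variance is not corrected for ties, so the
    p-value is approximate and best suited to larger samples.
    """
    # Pool both samples, remembering group membership (0 = x, 1 = y).
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    n = len(pooled)
    rank_sum_x = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu - 0.5) / sigma
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail probability
    return u, p
```

With hypothetical response times in days, `mann_whitney_greater([30, 42, 55, 61], [10, 14, 21, 25])` would test whether the first group (say, multi-wave countries) responded more slowly than the second.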

The descriptions of the waves found are based on the idea that the time series of deaths is a more reliable and consistent indicator of patterns of viral activity than a simple time series of cases. Transmission and testing are the two main drivers of waves in the case incidence time series.

A surge in cases can be triggered by an increase in transmission, an increase in testing, or a combination of both if the testing regime changes during a transmission surge.

Therefore, it is often not possible to compare the incidence statistics of cases from two successive waves. However, at the very least, the presence or absence of an associated mortality incidence peak can be used to infer the relative difference between drivers. Furthermore, the authors identify a third type of wave on a national scale (spatially asynchronous waves). Countries with this wave typology could benefit from isolating local epidemic curves and developing local response measures.

In Italy, two separate waves of confirmed cases and two separate waves of deaths occur at nearly identical times. However, the case-to-death ratio around each peak differs significantly between the first and second waves, implying a downward trend in the case fatality rate (CFR) that requires further examination.

Identification of epidemic waves of COVID-19. A: Zambia shows a clear pattern with two waves (red circles) in the case data, while no wave is identified in the death data. B: The UK shows a pattern that could arguably have two or three waves, but sub-algorithm D combines the last two. C: In Ghana, sub-algorithm B filters out an early spike in cases. Visually, it is not clear whether this is noise or a significant epidemiological event; the algorithm can’t do better than the reader at determining this from simple inspection of a graph. No wave of deaths is identified due to the low absolute numbers. D: The number of cases in Costa Rica does not drop by 70% after the first wave, so the algorithm does not identify it as a wave. This shows how important the Prel parameter can be. However, cross-validation with the death time series helps identify the wave (yellow circle).
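The role of a relative-drop threshold such as the 70% figure in panel D can be illustrated with a deliberately simplified sketch. This is not the authors' algorithm and omits their sub-algorithms and the cross-validation against deaths; the function name, the `min_peak` filter, and the exact meaning given to `p_rel` here are assumptions. The input is assumed to be an already smoothed series.

```python
def find_waves(series, p_rel=0.7, min_peak=1.0):
    """Return indices of peaks that are followed by a relative drop of at least p_rel.

    A candidate peak is opened when the series is rising and at least
    `min_peak`; it is moved forward whenever a higher value appears, and it is
    confirmed as a wave once the series falls to (1 - p_rel) times the peak
    value or lower. Peaks without such a drop are absorbed into the next one.
    """
    waves = []
    peak = None  # index of the current candidate peak, if any
    for i in range(1, len(series)):
        rising = series[i] > series[i - 1]
        if peak is None:
            # Only open a new candidate while the series is climbing.
            if rising and series[i] >= min_peak:
                peak = i
        elif series[i] > series[peak]:
            peak = i
        elif series[i] <= (1 - p_rel) * series[peak]:
            waves.append(peak)
            peak = None
    return waves


# Two well-separated waves: both peaks are followed by a drop of at least 70%.
two_waves = [0, 10, 50, 100, 60, 25, 10, 5, 20, 80, 120, 90, 30, 10]
# An early peak that only falls by 40% before cases climb again.
shallow_dip = [0, 10, 50, 100, 80, 60, 70, 90, 150, 200, 120, 40]

print(find_waves(two_waves))    # [3, 10]
print(find_waves(shallow_dip))  # [9]
```

In the second series the early peak is never confirmed because the dip after it is too shallow, which mirrors the behaviour described for Costa Rica. Lowering `p_rel` to 0.3 would make the same early peak count as a separate wave.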

In the United States, three waves of cases and deaths can be seen visually, with the algorithm merging the first two into a single wave. Again, there is a noticeable disparity between the number of cases and deaths. In this case, the investigators noticed regional diversity between waves, with the outbreak concentrating in different locations at different times. This is an illustration of spatially asynchronous waves in action.
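A toy example shows how spatially asynchronous waves can arise purely from aggregation. The regional series below are invented for illustration, and `local_peaks` is a hypothetical helper rather than part of the authors' method: each region has a single peak, yet the national total shows two.

```python
def local_peaks(series):
    """Indices i where the series rises into i and does not rise immediately after."""
    return [
        i
        for i in range(1, len(series) - 1)
        if series[i - 1] < series[i] >= series[i + 1]
    ]


# Invented daily counts for two regions whose outbreaks peak at different times.
region_a = [0, 20, 60, 100, 60, 20, 5, 2, 1, 1, 0, 0]
region_b = [0, 0, 1, 2, 5, 10, 30, 70, 110, 70, 25, 5]
national = [a + b for a, b in zip(region_a, region_b)]

print(local_peaks(region_a))  # [3]
print(local_peaks(region_b))  # [8]
print(local_peaks(national))  # [3, 8]
```

Analysing the regional curves separately recovers one wave per region, which is why isolating local epidemic curves can be more informative than working with the national aggregate.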


It is possible to convert the intuitive visual perception of “waves” in a time series into simple mathematical procedures that can annotate many time series by objectively identifying their component waves. In the context of COVID-19, these waves can occur due to increased transmission, increased testing, or a combination of both. Waves can also form as a result of aggregating time series over a large geographic area, in which case an apparent second wave is actually a first wave in a different part of the country. When performing comparative analyses of the links between interventions and disease-related mortality, using the wave as the temporal unit of analysis can lead to more precise conclusions. The speed at which interventions are applied is critically related to the wave structure of the ensuing epidemic.

*Important Notice

medRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be considered conclusive, guide clinical practice/health-related behaviors, or treated as established information.

