Quick, if you had to guess, what do you think is more likely to end all life on Earth: a meteor impact, climate change, or a solar flare? (Choose carefully.)

A new statistical method could help to accurately analyze the risk of worst-case (or best-case) scenarios. Scientists have announced a new way to extract information about rare but very important events, such as pandemics and large insurance payouts.

This discovery helps statisticians use mathematics to determine the shape of the underlying distribution of a data set. It can help everyone from investors to government officials make informed decisions, and is especially useful when data is scarce, such as with major earthquakes.

“While by definition rare, such events do happen and they matter; we hope this is a useful set of tools to better understand and calculate these risks,” said mathematical biologist Joel Cohen, co-author of a new study published Nov. 16 in the Proceedings of the National Academy of Sciences. A visiting scholar in the Department of Statistics at the University of Chicago, Cohen is a professor at Rockefeller University and the Earth Institute at Columbia University.

Vary the questions

Statistics is the science of using limited data to learn more about the world and the future. Its questions range from “What is the best time of year to spray pesticides on a crop field?” to “How likely is a global pandemic to interrupt large swaths of public life?”

At a century old, the statistical theory of rare but extreme events is a relatively young field, and scientists are still cataloging the best ways to deal with different types of data. The choice of calculation method can significantly affect the conclusions, so researchers must carefully match their approach to the data.

Two powerful tools in statistics are the mean and the variance. You are probably familiar with the mean, or average: if one student scores 80 on a test and another scores 82, their average score is 81. Variance, on the other hand, measures how widely the scores are spread: you would get the same average if one student scored 62 and the other scored 100, but the classroom implications would be very different.
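The arithmetic in this example can be checked in a few lines. The sketch below uses Python's standard statistics module. The variable names are illustrative, and it assumes the population variance (dividing by the number of scores), which is only one of several common conventions.

```python
from statistics import mean, pvariance

# Two hypothetical pairs of test scores with the same average.
close_scores = [80, 82]
spread_scores = [62, 100]

close_mean = mean(close_scores)
spread_mean = mean(spread_scores)

# Population variance: average squared distance from the mean.
close_var = pvariance(close_scores)
spread_var = pvariance(spread_scores)

print("means:", close_mean, spread_mean)
print("variances:", close_var, spread_var)
```

Both pairs average 81, but the squared deviations are 1 for the first pair and 19 squared, or 361, for the second, which is what the variance captures.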

In most situations, the mean and variance are finite numbers, as in the example above. But things get weirder when you look at events that are very rare but have huge consequences when they happen. Most years, the sun does not produce a burst of activity large enough to fry all of Earth’s electronics, but if one did occur this year, the results could be catastrophic. Likewise, although the vast majority of tech startups die out, occasionally one becomes a Google or a Facebook.

“There’s a category where big events happen very rarely, but often enough to drive the mean and/or variance toward infinity,” Cohen said.

These situations, where the mean and variance approach infinity as more and more data is collected, require their own special tools. And understanding the risk of these types of events (which come from what statisticians call “heavy-tailed distributions”) is important to many people. Government officials need to know how much effort and money to invest in disaster preparedness, and investors want to know how to maximize returns.
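A small simulation can make this behavior concrete. The sketch below is not the study's method. It assumes a Pareto distribution as a stand-in for a heavy-tailed one, with a tail index of 0.5 (infinite mean) compared against a tail index of 3 (finite mean of 1.5). The function name, sample sizes, and seed are all illustrative choices.

```python
import random
from statistics import fmean

def sample_means(alpha, sizes, seed=0):
    """Return the sample mean of Pareto(alpha) draws for each sample size."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for n in sizes:
        draws = [rng.paretovariate(alpha) for _ in range(n)]
        means.append(fmean(draws))
    return means

sizes = [100, 10_000, 100_000]
heavy = sample_means(alpha=0.5, sizes=sizes)  # assumed heavy tail, infinite mean
light = sample_means(alpha=3.0, sizes=sizes)  # lighter tail, mean is 1.5

for n, h, l in zip(sizes, heavy, light):
    print(f"n={n:>7}  heavy-tailed mean={h:.3g}  lighter-tailed mean={l:.3g}")
```

In runs like this, the lighter-tailed sample mean settles near 1.5, while the heavy-tailed one tends to keep growing as a few enormous draws dominate the total.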

Cohen and his colleagues looked at a recently used mathematical method for calculating risk, which splits the variance at the mean and calculates the variance below the mean and the variance above the mean separately. This can give more information about downside and upside risks. For example, a technology company may be much more likely to fail (that is, end up below average) than succeed (end up above average), which an investor might like to know when deciding whether to invest. But the method had not been tested on distributions of low-probability, very-high-impact events with infinite mean and variance.
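As a rough illustration of splitting the variance at the mean, the sketch below computes a lower and an upper semi-variance for a made-up set of investment outcomes. It assumes one common convention, in which each part is divided by the total number of observations so that the two parts add up to the ordinary population variance. The function name and the data are hypothetical.

```python
from statistics import fmean, pvariance

def semivariances(values):
    """Split the population variance into parts below and above the mean."""
    m = fmean(values)
    n = len(values)
    lower = sum((x - m) ** 2 for x in values if x < m) / n
    upper = sum((x - m) ** 2 for x in values if x > m) / n
    return lower, upper

# Hypothetical outcomes: nine small losses and one large gain.
outcomes = [-1.0] * 9 + [9.0]

lower, upper = semivariances(outcomes)
total = pvariance(outcomes)

print("lower semi-variance:", lower)
print("upper semi-variance:", upper)
print("total variance:", total)
```

Here nine of the ten outcomes fall below the mean of zero, yet most of the variance sits above it, a difference that a single variance figure would hide.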

By running tests, the scientists discovered that the standard ways of working with these quantities, called semi-variances, do not yield much information. But they found other approaches that worked. For example, they could extract useful information by calculating the ratio of the logarithm of the mean to the logarithm of the semi-variance. “Without the logs, you get less useful information,” Cohen said. “But with logs, the limiting behavior for large data samples gives you insight into the shape of the underlying distribution, which is very useful.” This information can help inform decision-making.
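The log ratio described here can be sketched numerically. The code below is only an illustration under stated assumptions: it draws from a Pareto distribution with a tail index of 0.5, uses the upper semi-variance divided by the total sample size, and picks arbitrary sample sizes and a fixed seed. None of these choices come from the study, and no limiting value is claimed.

```python
import math
import random
from statistics import fmean

def log_mean_over_log_semivariance(values):
    """Ratio of log(sample mean) to log(upper semi-variance)."""
    m = fmean(values)
    n = len(values)
    upper = sum((x - m) ** 2 for x in values if x > m) / n
    return math.log(m) / math.log(upper)

rng = random.Random(0)  # fixed seed so the run is repeatable
alpha = 0.5             # assumed tail index, giving an infinite mean

ratios = {}
for n in (1_000, 10_000, 100_000):
    draws = [rng.paretovariate(alpha) for _ in range(n)]
    ratios[n] = log_mean_over_log_semivariance(draws)

for n, r in ratios.items():
    print(f"n={n:>7}  log(mean) / log(semi-variance) = {r:.3f}")
```

Both the mean and the semi-variance grow without bound in this setting, but the ratio of their logarithms stays on a modest scale, so values from different sample sizes can be compared directly.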

The researchers hope this will lay the groundwork for new and better exploration of risk.

“We think there are practical applications for financial mathematics, for agricultural economics and potentially even for epidemics, but because it’s so new, we don’t even know what the most useful areas might be,” Cohen said. “We just opened up this world. It’s just the beginning.”

Financial Crashes, Pandemics, Snow in Texas: How Mathematics Could Predict “Black Swan” Events