Study reveals flaws in popular genetic method



Assessing the accuracy of PCA clustering for a heterogeneous test population in a simulation of a GWAS setting. (A) The true distribution of the Cyan test population (n = 1000). (B) PCA of the test population with eight samples of equal size (n = 250) from reference populations. (C) PCA of the test population with Blue from the previous analysis shows minimal overlap between cohorts. (D) PCA of the test population with five samples of equal size (n = 250) from reference populations, including Cyan (marked with an arrow). Colors in (B), from top to bottom and left to right: Yellow [1,1,0], light red [1,0,0.5], Purple [1,0,1], dark purple [0.5,0,0.5], Black [0,0,0], dark green [0,0.5,0], Green [0,1,0], and blue [1,0,0]. Credit: Scientific Reports (2022). DOI: 10.1038/s41598-022-14395-4

The most common method of analysis in population genetics is deeply flawed, according to a new study from Lund University in Sweden. This may have led to incorrect results and misconceptions about ethnicity and genetic relationships. The method has been used in hundreds of thousands of studies, affecting results in medical genetics and even commercial ancestry tests. The study is published in Scientific Reports.

The rate at which scientific data can be collected is increasing exponentially, leading to massive and highly complex datasets, a development dubbed the “Big Data revolution”. To make these data more manageable, researchers use statistical methods that aim to compact and simplify the data while retaining most of the key information. Perhaps the most widely used of these methods is PCA (principal component analysis). By analogy, think of PCA as an oven, with flour, sugar, and eggs as the input data. The oven may always do the same thing, but the result, a cake, depends critically on the proportions of the ingredients and how they are combined.
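To make the idea of "compacting" data concrete, here is a minimal sketch of PCA computed with a singular value decomposition in NumPy. The toy dataset, the function name `pca`, and the choice of two components are illustrative assumptions, not anything taken from the study.

```python
import numpy as np

def pca(data, n_components=2):
    """Project the rows of `data` onto the top principal components."""
    centered = data - data.mean(axis=0)
    # SVD of the centered matrix: the rows of vt are the principal axes.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    # Share of total variance captured by each component.
    explained = singular_values**2 / np.sum(singular_values**2)
    return centered @ vt[:n_components].T, explained[:n_components]

rng = np.random.default_rng(0)
# Hypothetical toy data: 200 samples with 50 features that are driven
# by only 2 hidden factors plus a little noise.
latent = rng.normal(size=(200, 2))
loadings = rng.normal(size=(2, 50))
data = latent @ loadings + 0.1 * rng.normal(size=(200, 50))

scores, explained = pca(data)
print(scores.shape)     # 50 columns reduced to 2
print(explained.sum())  # fraction of variance kept by the 2 components
```

In this constructed case two components retain almost all of the variance, because the data were built from two factors. Real genetic data carry no such guarantee.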

“This method is expected to give correct results because it is so frequently used. But that is neither a guarantee of reliability nor does it produce statistically robust conclusions,” says Dr. Eran Elhaik, Associate Professor of Molecular Cell Biology at Lund University.

According to Elhaik, the method helped create old perceptions about race and ethnicity. It plays a role in shaping historical narratives about who people are and where they came from, narratives produced not only by the scientific community but also by commercial ancestry companies. A famous example is when a prominent American politician took an ancestry test ahead of the 2020 presidential campaign to support their ancestral claims. Another example is the misconception, driven by PCA results, of Ashkenazi Jews as a race or an isolated group.

“This study demonstrates that these results were not reliable,” says Eran Elhaik.

PCA is used in many fields of science, but Elhaik’s study focuses on its use in population genetics, where the explosion in dataset sizes is particularly acute because of the falling cost of DNA sequencing.

The field of paleogenomics, where researchers want to learn more about ancient peoples and individuals such as Copper Age Europeans, relies heavily on PCA. PCA is used to create a genetic map that positions the unknown sample alongside known reference samples. Until now, unknown samples have been assumed to be related to whichever reference population they overlap or lie closest to on the map.

However, Elhaik discovered that the unknown sample could be made to lie near virtually any reference population simply by changing the number and types of the reference samples. This generates virtually endless historical versions, all mathematically “correct”, of which only one can be biologically correct.
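The dependence on the reference panel can be illustrated with a small toy model. The sketch below is not Elhaik's analysis. It assumes six hypothetical populations placed on the axes of a three-dimensional space, a two-dimensional PCA map fitted on the references only, and an unknown sample that is projected onto that map afterwards. All names (`CENTERS`, `UNKNOWN`, `nearest_on_pca_map`) and numbers are invented for illustration.

```python
import numpy as np

# Hypothetical toy populations: six clusters on the axes of a 3-D space.
CENTERS = {
    "X+": (1, 0, 0), "X-": (-1, 0, 0),
    "Y+": (0, 1, 0), "Y-": (0, -1, 0),
    "Z+": (0, 0, 1), "Z-": (0, 0, -1),
}
# The unknown sample; in the full 3-D space its nearest center is "Z+".
UNKNOWN = np.array([0.6, 0.1, 0.7])

def nearest_on_pca_map(counts, seed=0):
    """Fit a 2-D PCA on a reference panel and return the population whose
    centroid lies closest to the projected unknown sample."""
    rng = np.random.default_rng(seed)
    labels, rows = [], []
    for name, n in counts.items():
        # n noisy samples scattered around each population center.
        rows.append(np.array(CENTERS[name]) + 0.05 * rng.normal(size=(n, 3)))
        labels += [name] * n
    panel = np.vstack(rows)
    labels = np.array(labels)
    mean = panel.mean(axis=0)
    # Principal axes from the SVD of the centered reference panel.
    _, _, vt = np.linalg.svd(panel - mean, full_matrices=False)
    axes = vt[:2].T
    ref_map = (panel - mean) @ axes
    unknown_map = (UNKNOWN - mean) @ axes
    distances = {
        name: np.linalg.norm(ref_map[labels == name].mean(axis=0) - unknown_map)
        for name in counts
    }
    return min(distances, key=distances.get)

# Same six populations in both panels; only the sample counts differ.
few_z = {"X+": 100, "X-": 100, "Y+": 100, "Y-": 100, "Z+": 5, "Z-": 5}
few_x = {"X+": 5, "X-": 5, "Y+": 100, "Y-": 100, "Z+": 100, "Z-": 100}
print(nearest_on_pca_map(few_z))
print(nearest_on_pca_map(few_x))
```

With these settings the first panel places the unknown sample nearest to "X+" and the second nearest to "Z+", although the sample and the populations are identical in both runs. The under-sampled populations contribute little variance, so the two-dimensional map drops the axis that separates them.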

In the study, Elhaik looked at the twelve most common population genetic applications of PCA. He used both simulated and real genetic data to show how flexible PCA results can be. According to Elhaik, this flexibility means conclusions based on PCA are unreliable because any changes to the reference or test samples will produce different results.

Between 32,000 and 216,000 scientific papers on genetics alone have used PCA to explore and visualize similarities and differences between individuals and populations and have based their conclusions on these results.

“I think these results need to be re-evaluated,” Elhaik says.

He hopes the new study will lead to a better approach to questioning results and thus help make science more reliable. He has spent a significant part of the last decade developing such methods, including Geographic Population Structure (GPS), which predicts biogeography from DNA, and the Pairwise Matcher, which improves the case-control matches used in genetic tests and drug trials.

“Techniques that offer such flexibility encourage bad science and are especially dangerous in a world of intense pressure to publish. If a researcher runs PCA multiple times, the temptation will always be to select the output that makes the best story,” adds Professor William Amos, from the University of Cambridge, who was not involved in the study.


More information:
Eran Elhaik, Principal Component Analyses (PCA)-based findings in population genetic studies are highly biased and must be reevaluated, Scientific Reports (2022). DOI: 10.1038/s41598-022-14395-4

Provided by Lund University

Citation: Study reveals flaws in popular genetic method (2022, August 30) retrieved August 31, 2022 from

This document is subject to copyright. Except for fair use for purposes of private study or research, no part may be reproduced without written permission. The content is provided for information only.

