Reproducibility and replicability are major concerns for chemists and life scientists involved in the quality control of products manufactured at various sites around the world. This is often called method transfer. When a new lab is added to the network, method transfer protocols often involve training the staff of the new lab in the originating lab to factor out variance associated with location. Plus, running assays in one lab with different people permits measurement of operator-associated variance.
However, experience also shows that this is not enough, and that prudent planning usually involves sending experienced staff from the originating lab to the new lab to debug the protocols running in the new lab. This usually leads to harmonization of results. Once the results are harmonized, they need to be monitored by periodic tests in round-robin mode. In addition, some enterprises monitor the QC data for all sites in real time to look for trends and compare laboratory performance. This is also critical in evaluating the robustness of the method.
Box 5-2 of the NAS report1 (page 71) presents two cases of non-replication of results, one arising from agitation by stirring versus shaking. In the second, a study with worms, operator technique in handling the worms was a major contributor to assay variability. Careful debugging of the methods resolved differences in how the zero point of the worms' timelines was defined. After the zero points were harmonized, the researchers found that the worms partitioned into two groups with different longevity. This raised the question of why, so the experimental design was modified. These examples illustrate how non-replicability can lead to advances in knowledge.
Complexity and controllability are also important variables (page 73 ff.). The laboratory sciences enjoy the advantages of studying less complex systems than the social sciences, which deal with human behavior. Plus, the laboratory sciences generally study systems that are much easier to control. We study chemical reactions where we simplify the system by using purified reagents reacting under controlled conditions. This improves replicability and the ability to discriminate between small differences. Dealing with whole humans means contending with all their idiosyncrasies, such as language, nonverbal communication, and hidden motives. The report concludes that it is important to study the sources of non-replicability because non-replicability reduces the efficiency and integrity of science.
The authors point out that one needs to be careful about rejecting previously reported values on the grounds of non-replication. The first report may be an outlier, and a subsequent report may be more representative (page 77). They warn that one-on-one comparisons are not effective in determining bias in replication studies.
Other criticisms of the status quo include misaligned academic incentives and inappropriate statistical inference (page 79). On the latter point, the report contrasts confirmatory hypothesis testing with exploratory research, which generates hypotheses; treating exploratory findings as if they were confirmatory leads to unexpectedly high false positive rates. I was impressed with the paragraph on HARKing (Hypothesizing After Results Are Known). Errors can creep in from almost any source and at almost any time.
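The inflation of false positives from exploratory testing can be made concrete with a short back-of-the-envelope sketch. This illustration is mine, not the report's: if m independent null hypotheses are each tested at significance level alpha, the probability of at least one false positive grows rapidly with m.

```python
# Illustrative sketch (not from the NAS report): how testing many
# hypotheses inflates the chance of a false positive. For m independent
# true-null hypotheses each tested at level alpha, the probability of
# at least one spurious "significant" result is 1 - (1 - alpha)**m.

def family_wise_false_positive_rate(m: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive among m independent tests."""
    return 1 - (1 - alpha) ** m

if __name__ == "__main__":
    for m in (1, 5, 20):
        rate = family_wise_false_positive_rate(m)
        print(f"{m:>2} tests at alpha=0.05 -> P(>=1 false positive) = {rate:.3f}")
```

With 20 exploratory comparisons at the conventional 0.05 level, the chance of at least one spurious "finding" is roughly 64 percent, which is why a hypothesis generated from the same data that suggested it must be confirmed in an independent study.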
In the lay press, the criticism of non-replicability raises the question of potential fraud. The frequency of fraud in published reports seems to be very small (4 in 10,000) compared to the huge amount of information being reported and curated. If supplemental information is posted openly, fraud will certainly be discovered if the work is significant enough to justify the expense. And, if it is not important, why take the risk?
The report is a thoughtful response to the general criticism of science and scientists. Lay people believe that science provides black or white answers. Depending upon the topic, this may or may not be true; usually it is not. Experimental complexity and controllability are very different between chemistry, physics, astronomy, and the life sciences on the one hand, and psychology, economics, and political science on the other. We have developed highly discriminating tools that help deliver useful new products, including improved health. Society is clearly benefiting. We need to develop similarly effective tools, especially to deal with problems of mental health and behavior.
I recommend this report for anyone concerned about the integrity of science. It should also be useful for people responsible for improving the scientific infrastructure such as data storage, retrieval, and longevity.
1. Reproducibility and Replicability in Science; The National Academies Press: Washington, DC, 2019; doi: https://doi.org/10.17226/25303.
Robert L. Stevenson, Ph.D., is Editor Emeritus, American Laboratory/Labcompare; e-mail: [email protected]