Adding Humans to Computational Repeatability

A prior blog post explored the concept of computational repeatability.

Following the unusual definitions of the NAS report,1 non-replicability adds human-intensive sources of variance to the problem of computational repeatability. The U.S. Congress charged the report's authors with examining what is known about non-replicability and what can be done to mitigate it.

The NAS report devotes several paragraphs to the reasons for non-replicability, including many causes inherent in statistical analysis, such as p values (a small sketch after the quoted recommendation below illustrates the point). The report recommends:

Recommendation 5-1: Researchers should, as applicable to the specific study, provide an accurate and appropriate characterization of relevant uncertainties when they report or publish their research. Researchers should thoughtfully communicate all recognized uncertainties and estimate or acknowledge other potential sources of uncertainty that bear on their results, including stochastic uncertainties and uncertainties in measurement, computation, knowledge, modeling, and methods of analysis.
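To make the p-value point concrete, here is a minimal simulation sketch in Python. It is not taken from the report, and the effect size, group size, and significance threshold are illustrative assumptions; it simply repeats the same two-group experiment many times and shows how widely p values scatter from one exact repeat to the next.

# Minimal sketch (illustrative assumptions, not from the NAS report):
# repeat the same two-group experiment many times and watch the p values scatter.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_effect = 0.4    # assumed standardized mean difference between groups
n_per_group = 30     # assumed sample size per group
alpha = 0.05         # conventional significance threshold
n_repeats = 1000     # number of exact repeats of the experiment

p_values = []
for _ in range(n_repeats):
    control = rng.normal(0.0, 1.0, n_per_group)
    treated = rng.normal(true_effect, 1.0, n_per_group)
    p_values.append(stats.ttest_ind(treated, control).pvalue)

p_values = np.array(p_values)
print(f"share of repeats significant at alpha={alpha}: {np.mean(p_values < alpha):.2f}")
print(f"p-value range across repeats: {p_values.min():.4f} to {p_values.max():.4f}")

With these assumed numbers, only about a third of the exact repeats reach significance, even though the true effect is identical in every one, so a failure to replicate a p < 0.05 finding need not mean that anything went wrong.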

One issue is that studies may have different purposes and end points. Comparing data across studies may mean repurposing data sets beyond their original intent. Time also elapses between studies, and with it come unavoidable changes in instrumentation, technique, and experimental design. Science is not static, especially given the pressure to publish advances. Would editors really want to publish a long list of reports confirming the discovery of a new protein using old, accepted technology? Not likely. What would that do to the impact factor of their journal?

Science today is built on the acceptance of the preponderance of evidence. Topics that merit published replication checks are either exceptionally important or unexpected, and probably both.

Recommendation 5-1 seems to ask scientists to conjecture about other possible meanings of their reported results. This would entail generating alternative models, including explanations and possibilities that the author considered but rejected. I think this will lengthen published papers and make them less readable. Current practice is to let another scientist propose a different model, concept, or test, and then design an experiment to differentiate between the new model and the previous one.

The report acknowledges that time spent checking prior reports is probably not worth the effort; it is better to try to expand scientific understanding. However, the authors then add, “Efforts to minimize avoidable and unhelpful sources of non-replicability warrant continued attention.”

Next, the authors address fraud and poor judgment, including poor execution. These boil down to intent to deceive, a simple mistake, or poor skills. It is better to point out unexpected and unexplained data and say that we are working to understand them.

Attributes of a particular line of scientific inquiry within any discipline can be associated with higher or lower rates of non-replicability. Susceptibility to non-replicability depends on:

  • the complexity of the system under study;
  • the number and relationship of variables within the system under study;
  • the ability to control the variables;
  • levels of noise within the system (or signal to noise ratios);
  • a mismatch between the scale of the phenomena and the scale at which they can be measured;
  • stability across time and space of the underlying principles;
  • fidelity of the available measures to the underlying construct under study (e.g., direct versus indirect measurements); and
  • the a priori probability (pre-experimental plausibility) of the scientific hypothesis.

It is hard to argue with the experimental difficulty of some fields. Quantum chemistry is one. Surveys, economics, and political science are others confounded by human variability.

Today, we are struggling to understand quantum effects such as the entanglement of quantum states over long distances, as well as the expected power of quantum computing. Quantum experts speak of technology that is very foreign to our experience. The infant state of quantum development means that one should expect experimental difficulty, including many failures. Failures do not imply fraud or poor skills; they are a consequence of the state of the art.

Reference

  1. Reproducibility and Replicability in Science; The National Academies Press: Washington, DC, 2019; doi: https://doi.org/10.17226/25303.

Robert L. Stevenson, Ph.D., is Editor Emeritus, American Laboratory/Labcompare; e-mail: [email protected]
