Improving Reproducibility and Replicability in the Sciences: Part 3

The National Academies of Sciences, Engineering, and Medicine formed the Committee on Reproducibility and Replicability in Science, which drafted 10 recommendations intended to improve both. This installment is the third to examine the NAS report.1 (See parts 1 and 2 at https://www.americanlaboratory.com/Blog/360851-A-Critical-Review-of-Reproducibility-and-Replicability-from-the-NAS-in-2019/ and https://www.americanlaboratory.com/Blog/360993-Adding-Humans-to-Computational-Repeatability/, respectively.) The recommendations listed below are each followed by my commentary.

Recommendation 6-1: All researchers should include a clear, specific, and complete description of how the reported result was reached. Different areas of study or types of inquiry may require different kinds of information. Reports should include details appropriate for the type of research, including:

  • a clear description of all methods, instruments, materials, procedures, measurements, and other variables involved in the study;
  • a clear description of the analysis of data and decisions for exclusion of some data and inclusion of others;
  • for results that depend on statistical inference, a description of the analytic decisions and when these decisions were made and whether the study is exploratory or confirmatory;
  • a discussion of the expected constraints on generality, such as which methodological features the authors think could be varied without affecting the result and which must remain constant;
  • reporting of precision or statistical power; and
  • a discussion of the uncertainty of the measurements, results, and inferences.

This seems to call for a modest expansion of current practice.

Recommendation 6-2: Academic institutions and institutions managing scientific work such as industry and the national laboratories should include training in the proper use of statistical analysis and inference. Researchers who use statistical inference analyses should learn to use them properly.

There is a movement to revise some statistical protocols and measures such as p values. It will be hard to describe and prescribe “proper use” when the topic is under critical review.

Recommendation 6-3: Funding agencies and organizations should consider investing in research and development of open-source, usable tools, and infrastructure that support reproducibility for a broad range of studies across different domains in a seamless fashion. Concurrently, investments would be helpful in outreach to inform and train researchers on best practices and how to use these tools.

This recommendation is hard to dispute, but the infrastructure should first fit the science at hand. Historical precedent should also be considered. Different segments of science have different ontologies; erasing these without full explanation and bridging will lead to confusion and mistakes.

Recommendation 6-4: Journals should consider ways to ensure computational reproducibility for publications that make claims based on computations, to the extent ethically and legally possible. Although ensuring such reproducibility prior to publication presents technological and practical challenges for researchers and journals, new tools might make this goal more realistic. Journals should make every reasonable effort to use these tools, make clear and enforce their transparency requirements, and increase the reproducibility of their published articles.

Recommendation 6-4 puts an extreme burden on the journals. Journal publishers would need access to computational experts, possibly with years of experience, which would be a large and potentially expensive expansion of scope. Open-access publishing will shrink the revenue stream of many publishers, so adding a new responsibility is unlikely to be welcomed.
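
To make concrete what a journal-level check might involve, here is a minimal sketch in Python of one common approach: re-execute the authors' analysis script and compare a checksum of the regenerated output against the value reported at submission. The script name, output file, and expected hash below are hypothetical placeholders, not anything specified in the report.

```python
import hashlib
import subprocess
from pathlib import Path

# Hypothetical artifacts a submission might declare.
ANALYSIS_SCRIPT = "analysis.py"            # authors' analysis code
OUTPUT_FILE = Path("results/table1.csv")   # file the script is expected to regenerate
EXPECTED_SHA256 = "..."                    # checksum recorded at submission time (placeholder)

def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def check_reproducibility() -> bool:
    # Re-run the declared analysis from scratch.
    subprocess.run(["python", ANALYSIS_SCRIPT], check=True)
    # A byte-identical output is the strictest criterion; numerical work
    # often needs a tolerance-based comparison instead.
    return sha256_of(OUTPUT_FILE) == EXPECTED_SHA256

if __name__ == "__main__":
    print("reproduced" if check_reproducibility() else "not reproduced")
```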

Another factor is that the code that controls the instrument and provides the data output may not be available to the user. Features such as auto-tuning, which changes the filtering as a function of time or instrument response, may not be under the operator's control and can vary from run to run or within a run.
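
Where the acquisition code itself cannot be inspected, one partial remedy is to capture whatever instrument settings and firmware identifiers the vendor software does expose and archive them with each run. The sketch below assumes a hypothetical metadata dictionary; the field names are illustrative and will vary by instrument.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def archive_run_metadata(run_id: str, instrument_metadata: dict, out_dir: Path) -> Path:
    """Write a timestamped JSON record of the instrument state for one run.

    `instrument_metadata` is whatever the vendor software exposes (firmware
    version, auto-tune state, filter settings, etc.); the exact fields are
    hypothetical and depend on the instrument.
    """
    record = {
        "run_id": run_id,
        "archived_at": datetime.now(timezone.utc).isoformat(),
        "instrument": instrument_metadata,
    }
    out_path = out_dir / f"{run_id}_instrument_metadata.json"
    out_path.write_text(json.dumps(record, indent=2))
    return out_path

# Example usage with hypothetical field names:
# archive_run_metadata("run_0421",
#                      {"firmware": "3.2.1", "auto_tune": True, "filter_hz": 5.0},
#                      Path("raw_data"))
```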

Recommendation 6-5: In order to facilitate the transparent sharing and availability of digital artifacts, such as data and code, for its studies, the National Science Foundation (NSF) should:

  • develop a set of criteria for trusted open repositories to be used by the scientific community for objects of the scholarly record.
  • seek to harmonize with other funding agencies the repository criteria and data-management plans for scholarly objects.
  • endorse or consider creating code and data repositories for long-term archiving and preservation of digital artifacts that support claims made in the scholarly record based on NSF-funded research. These archives could be based at the institutional level or be part of, and harmonized with, the NSF-funded Public Access Repository.
  • consider extending NSF’s current data-management plan to include other digital artifacts, such as software.
  • work with communities reliant on non-public data or code to develop alternative mechanisms for demonstrating reproducibility.

Science is built on archived studies, including data that needs to be preserved indefinitely. However, the rapid evolution of data storage technology makes this a formidable task, with a real danger that obsolescence will destroy access. Box 6-1 on page 98 discusses archival repositories that claim long-term preservation (10 years); Figshare and Zenodo are discussed, and Dataverse and Dryad are also mentioned. Ten years is not my idea of a long-term solution. How about 100, 1,000, or 10,000 years? Indefinite stewardship of files by future machines and code is discussed in more detail on page 99 ff. The ongoing costs are an unresolved issue.

The digital object identifier (DOI) is designed to provide a persistent link to material on the internet. Publishers can update the registered location, so the link continues to direct the user to the current version of the object.
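
As an illustration of how that redirection works in practice, the sketch below (Python, using the widely available requests library) asks the doi.org resolver where a DOI currently points. The DOI used is the report's own, cited in reference 1.

```python
import requests

def resolve_doi(doi: str) -> str:
    """Follow the doi.org redirect chain and return the current landing URL."""
    response = requests.head(f"https://doi.org/{doi}", allow_redirects=True, timeout=10)
    return response.url

if __name__ == "__main__":
    # DOI of the NAS report discussed in this article (reference 1).
    print(resolve_doi("10.17226/25303"))
```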

Recommendation 6-6: Many stakeholders have a role to play in improving computational reproducibility, including educational institutions, professional societies, researchers, and funders.

  • Educational institutions should educate and train students and faculty about computational methods and tools to improve the quality of data and code and to produce reproducible research.
  • Professional societies should take responsibility for educating the public and their professional members about the importance and limitations of computational research. Societies have an important role in educating the public about the evolving nature of science and the tools and methods that are used.
  • Researchers should collaborate with expert colleagues when their education and training are not adequate to meet the computational requirements of their research.
  • In line with its priority for “harnessing the data revolution,” the National Science Foundation (and other funders) should consider funding of activities to promote computational reproducibility.

The skills needed to increase computational reproducibility are rare today and lie outside the curricula of the graduate programs I am aware of. Our current apprenticeship model for training scientific professionals (academic and industrial) would need to be expanded in focus and duration. Such expansion would be expensive and would certainly decrease the number of grants.

Also, the focus of 6-6 is on American granting agencies, NSF in particular. Yet the problem is global. There needs to be international involvement and consensus, probably at the G-20 level.

Recommendation 6-7: Journals and scientific societies requesting submissions for conferences should disclose their policies relevant to achieving reproducibility and replicability. The strength of the claims made in a journal article or conference submission should reflect the reproducibility and replicability standards to which an article is held, with stronger claims reserved for higher expected levels of reproducibility and replicability.

Journals and conference organizers are encouraged to:

  • set and implement desired standards of reproducibility and replicability and make this one of their priorities, such as deciding which level they wish to achieve for each Transparency and Openness Promotion guideline and working towards that goal;
  • adopt policies to reduce the likelihood of non-replicability, such as considering incentives or requirements for research materials transparency, design, and analysis plan transparency, enhanced review of statistical methods, study or analysis plan preregistration, and replication studies; and
  • require as a review criterion that all research reports include a thoughtful discussion of the uncertainty in measurements and conclusions.

Recommendation 6-7 seems to miss the point that material presented as lectures and posters at conferences and technical meetings is necessarily limited in time. Presentations are also limited by the attention span of the audience. Most presentations are summaries of a body of work; details should be handled off-lectern. The newer practice of posting supplemental material on the web is probably the best way to archive and communicate detailed information.

Recommendation 6-8: Many considerations enter into decisions about what types of scientific studies to fund, including striking a balance between exploratory and confirmatory research. If private or public funders choose to invest in initiatives on reproducibility and replication, two areas may benefit from additional funding:

  • education and training initiatives to ensure that researchers have the knowledge, skills, and tools needed to conduct research in ways that adhere to the highest scientific standards; that describe methods clearly, specifically, and completely; and that express accurately and appropriately the uncertainty involved in the research; and
  • reviews of published work, such as testing the reproducibility of published research, conducting rigorous replication studies, and publishing sound critical commentaries.

Recommendation 6-8 advocates establishing a balance between exploratory and confirmatory research. Current practice favors exploratory science, with confirmatory studies limited to cases where data integrity is especially valuable, such as validated methods for clinical diagnostics and human therapeutics. It would be nice to have both high reproducibility and replicability, but each carries a cost as well as a value. Decision-making usually favors situations where the cost of replication is less than its value. Value is hard to evaluate rigorously, but the consumer (payer) is usually able to make the calculation instantaneously.

In a resource-constrained world, priorities will exist. They need to be stated clearly enough that all can understand them and most will agree with them.

Recommendation 6-9: Funders should require a thoughtful discussion in grant applications of how uncertainties will be evaluated, along with any relevant issues regarding replicability and computational reproducibility. Funders should introduce review of reproducibility and replicability guidelines and activities into their merit-review criteria, as a low-cost way to enhance both.

Funders such as NSF should inform grant applicants how review committees will be asked to evaluate the treatment of uncertainty in measured results. Perhaps Recommendation 6-9 could be handled as a series of boilerplate pages attached to the grant application and award.

Recommendation 6-10: When funders, researchers, and other stakeholders are considering whether and where to direct resources for replication studies, they should consider the following criteria:

  • The scientific results are important for individual decision-making or for policy decisions.
  • The results have the potential to make a large contribution to basic scientific knowledge.
  • The original result is particularly surprising, that is, it is unexpected in light of previous evidence and knowledge.
  • There is controversy about the topic.
  • There was potential bias in the original investigation, due, for example, to the source of funding.
  • There was a weakness or flaw in the design, methods, or analysis of the original study.
  • The cost of a replication is offset by (less than) the potential value in reaffirming the original results.
  • Future expensive and important studies will build on the original scientific results.

Where are we today? One recent data point, from a 2017 report, indicates that 20% of PLOS ONE articles have data or code in a repository, 60% have data in the main text or supplemental information, and 20% have restrictions on data access.2

Implementation of the 10 recommendations would probably improve data integrity, but at the expense of longer training periods for earning advanced degrees. Adoption would shorten individuals' productive careers by the same amount. Already, society gets only about 40 years of work product after 20-plus years of training (preschool to Ph.D.).

I think that there are other items that should be added to the curriculum for graduate students in the laboratory sciences, including laboratory safety, instrumentation, and computational qualification: installation (IQ), operational (OQ), and performance (PQ). Perhaps such a course could replace the traditional language exams, at least for English speakers.
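
To indicate the kind of material such a course might cover, here is a minimal sketch of an operational-qualification (OQ) style check in Python, with hypothetical readings and acceptance limits: the instrument measures a certified reference standard, and the result must fall within a stated tolerance and precision.

```python
from statistics import mean, stdev

def oq_check(readings: list[float], certified_value: float,
             tolerance: float, max_rsd_percent: float) -> bool:
    """Pass if the mean reading is within `tolerance` of the certified value
    and the relative standard deviation is within `max_rsd_percent`.

    The acceptance criteria here are illustrative; real IQ/OQ/PQ protocols
    are defined by the laboratory's quality system and vendor documentation.
    """
    m = mean(readings)
    rsd = 100.0 * stdev(readings) / m
    return abs(m - certified_value) <= tolerance and rsd <= max_rsd_percent

# Hypothetical example: five replicate measurements of a 10.00-unit standard.
print(oq_check([9.98, 10.02, 10.01, 9.97, 10.00],
               certified_value=10.00, tolerance=0.05, max_rsd_percent=1.0))
```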

References

  1. Reproducibility and Replicability in Science. The National Academies Press: Washington, DC, 2019; doi: https://doi.org/10.17226/25303.
  2. Byrne, M. Making progress toward open data: reflections on data sharing. PLOS ONE blog (EveryONE), May 8, 2017; https://blogs.plos.org/everyone/2017/05/08/making-progress-toward-open-data/.

Robert L. Stevenson, Ph.D., is Editor Emeritus, American Laboratory/Labcompare; e-mail: [email protected]
