Introduction
Computational reproducibility is defined as the ability to obtain consistent computational results using the same input data, computational steps, methods, code, and analysis conditions as those used in the original study. Computational reproducibility is a means of making scientific claims more transparent. It is imperative for verifying and building upon reported findings, preserving a complete scientific record, and enhancing pedagogical strategies for research methods training. For various reasons, the computational reproducibility standard has not been adopted as an integral part of normative scientific practice.
Computational reproducibility involves the assembly of a “reproducibility file bundle” or “research compendium” that includes all of the research artifacts (e.g., data, code, documentation) necessary for the computation. Prior to publishing or archiving the research compendium, the objects contained within require curation to ensure they meet quality standards for computational reproducibility.
The object of curation in the context of reproducibility is the scientific claim: We are curating the digital artifacts that underlie the claim.
Curation for FAIR and reproducible research is the process of reviewing and enhancing a research compendium for long-term reuse.
We curate to: a) assess whether computational reproducibility can be achieved using the digital artifacts contained within the research compendium; and b) ensure that the quality of the digital artifacts aligns with the FAIR principles and community standards for long-term archival preservation.
Both activities are essential aspects of curation in the context of reproducible research. We call these practices Curation for FAIR and Reproducible Research, or CURE-FAIR.
Curation is often carried out near the end of the research lifecycle by data and archive professionals who may also be subject matter experts. However, there are many data and code management actions that other key stakeholders can take earlier in the lifecycle to facilitate the production of FAIR and computationally reproducible research compendia.
Curators can be viewed as the first reusers of the research compendium. Prior to publication or archiving, curators can flag issues with a research compendium that preclude computational reproducibility, then take actions to remedy problems or recommend an appropriate course of action. For example, curators can make sure that software configuration and dependency information is well-documented so that an independent researcher can recreate the computational environment with the proper technical specifications.
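As one illustration of what documented dependency information can look like, the following is a minimal sketch, assuming the analysis is written in Python. The function name `describe_environment` and the choice of recorded fields are hypothetical and are not prescribed by these guidelines; other languages and tools have their own equivalents.

```python
import json
import platform
from importlib import metadata


def describe_environment():
    """Return a dictionary describing the current Python environment.

    Hypothetical helper: the recorded fields are an illustrative choice,
    not a requirement of any standard.
    """
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata lacks a name
            packages[name] = dist.version
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items())),
    }


if __name__ == "__main__":
    # Print the description as JSON so it can be saved alongside the code.
    print(json.dumps(describe_environment(), indent=2))
```

Saving this output in the research compendium gives an independent researcher a record of the interpreter version, operating system, and installed package versions that were present when the results were produced.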
This document includes standards-based guidelines for CURE-FAIR best practices in archiving and publishing computationally reproducible studies that rely on quantitative data, primarily in the social sciences. Our hope is that these “10 Things for Curating Reproducible and FAIR Research” will serve as a starting point for developing curatorial guidelines that extend beyond the specific concerns of the social sciences community to other domains and disciplines that use similar methods, and that can be adapted to the particular curatorial concerns and requirements of an archive or publisher.
Computational reproducibility takes a village. This document is primarily for data curators and information professionals who are charged with verifying that a computation can be executed and that it can reproduce prespecified results. Secondarily, it will be of interest to researchers, publishers, editors, reviewers, and others who have a stake in creating, using, sharing, publishing, or preserving reproducible research.
Additional resources can be found in the CURE-FAIR Zotero Library on Curating for Reproducibility.