Scientific Reproducibility and the LIS Professional
Overview
Teaching: 30 min
Exercises: 15 min
Questions
What is the role of the LIS professional in supporting reproducible research?
What are the requisite skills and knowledge for executing data curation for reproducibility workflows?
In what ways can data curation for reproducibility activities be incorporated into services?
Objectives
Explain how curation for reproducibility differs from common models of data curation that focus on data as the object of curation.
Describe what it means to be a data savvy librarian.
Provide examples of curation for reproducibility service implementation.
There are several stakeholders that contribute to reproducibility. The researcher who incorporates data management activities in their research workflows, the funding agencies who mandate data sharing, the repository that provides a platform for making the data available, and the journal editor who checks that authors provide information on how to access their data all play a role in promoting reproducibility. This episode explores why library and information science (LIS) professionals are also an important part of this endeavor and how they can support it.
The Role of the LIS Professional
Researchers are becoming more aware of services that support reproducibility standards, many of which fall within the domain of the LIS professional. They have noted that “libraries, with their long-standing tradition of organization, documentation, and access, have a role to play in supporting research transparency and preserving…‘the research crown jewels’” (Lyon, Jeng, & Mattern, 2017, p. 57). Indeed, appraising the value of materials, arranging and describing them, creating standardized metadata, assigning unique identifiers, and other common library and archives tasks are applicable to curation for reproducibility.
It is important to acknowledge that accepting this type of role places additional demands on the LIS toolbox. It calls for some degree of subject knowledge and technical skill to engage in rigorous data curation activities that support sustained access to, and use of, high-quality research materials in order to promote scientific reproducibility, i.e., data curation for reproducibility.
The “data savvy librarian” is one who can execute data curation for reproducibility workflows because they:
- can apply the fundamentals of digital preservation to ensure that research materials are discoverable, accessible, understandable, and reusable into the future;
- have the technical skills to assess the quality of research materials produced by computational research projects; and
- are familiar with the research lifecycle including the methods, workflows, and tools of the disciplinary domain in which data are collected or generated, transformed, and analyzed.
The diagram below illustrates how various areas of skills and knowledge apply to the primary components of data curation for reproducibility as they are defined by the Data Quality Review Framework (see Lesson 2 for an in-depth discussion of each component of the Data Quality Review Framework).
(Optional) Read more about the data savvy librarian and their role in supporting scientific reproducibility:
Barbaro, A. (2016). On the importance of being a data-savvy librarian. JEAHIL, 12(1), 25–27. http://ojs.eahil.eu/ojs/index.php/JEAHIL/article/view/100/104
Burton, M., Lyon, L., Erdmann, C., & Tijerina, B. (2018). Shifting to data savvy: The future of data science in libraries. University of Pittsburgh. http://d-scholarship.pitt.edu/id/eprint/33891
Kouper, I., Fear, K., Ishida, M., Kollen, C., & Williams, S. C. (2017). Research data services maturity in academic libraries. In Curating research data: Practical strategies for your digital repository (pp. 153–170). Association of College and Research Libraries. https://doi.library.ubc.ca/10.14288/1.0343479
Lyon, L., Jeng, W., & Mattern, E. (2017). Research transparency: A preliminary study of disciplinary conceptualisation, drivers, tools and support services. International Journal of Digital Curation, 12(1), 46. https://doi.org/10.2218/ijdc.v12i1.530
Pryor, G., & Donnelly, M. (2009). Skilling up to do data: Whose role, whose responsibility, whose career? International Journal of Digital Curation, 4(2), 158–170. https://doi.org/10.2218/ijdc.v4i2.105
Sawchuk, S. L., & Khair, S. (2021). Computational reproducibility: A practical framework for data curators. Journal of eScience Librarianship, 10(3), 1206. https://doi.org/10.7191/jeslib.2021.1206
Spotlight: CuRe Career Pathways
Curating for reproducibility requires a set of skills that are rare for any one person to have. Pryor and Donnelly (2009) remarked that careers in data curation are often “accidental” in the absence of established career pathways. Indeed, how people obtain the skills to fill professional roles that include data curation for reproducibility responsibilities can be very different. Consider these examples:
A researcher with years of experience engaged in computational science comes to understand the importance of data curation after responding to data sharing demands from journals, funding agencies, and other researchers. Experiencing the benefits of managing their data to enable sharing, the researcher makes a career transition to become a data manager for a research lab. In this new role, they use their domain expertise to communicate effectively with researchers as they incorporate data curation activities into the lab’s research workflows.
A librarian, who took several graduate-level courses in digital archives and records management, works for an academic library that is expanding its research support services. Because of the relevance of knowledge gained from their graduate studies, the librarian is assigned a data curation role that includes performing quality review and ingesting research data into the institutional repository. After engaging with researchers to understand their data needs and taking classes in statistical software, the librarian is able to incorporate curation for reproducibility activities into the repository ingest workflow.
Among the editorial staff of some scholarly journals is a data editor, who is responsible for enforcing the journal’s strict data policies that require authors to submit their research compendium for review prior to article publication. The data editor uses their domain expertise, computational skills, and understanding of data quality standards to evaluate results for reproducibility. In this role, the data editor also provides guidance to authors to encourage them to curate their research compendium before submitting it for review.
Regardless of the path that leads them to a career that involves curation for reproducibility, these professionals are part of an emerging workforce equipped with a unique set of skills that are highly sought by scientific stakeholders that demand reproducible research.
Reproducibility Services
Founded in 1947, the Roper Center is considered to be one of the earliest examples of an institution formalizing specific activities around data preservation and dissemination. Despite growth in the number of organizations dedicated to providing long-term access to research data assets, data curation is still early in its maturity as an established discipline. Data curation for reproducibility is an even more undeveloped area, with few individuals and groups actively engaged in the practice.
Those that have implemented data curation for reproducibility services can serve as models for academic libraries, data repositories, research institutions, and other groups planning to expand their services to support reproducibility. The three institutions highlighted below offer examples of how data curation for reproducibility services can be delivered.
Institution for Social and Policy Studies, Yale University
The Institution for Social and Policy Studies (ISPS) was established in 1968 by the Yale Corporation as an interdisciplinary center at the university to facilitate research in the social sciences and public policy arenas. ISPS is an independent academic unit within the university, including affiliates from across the social sciences. ISPS hosts its own digital repository meant to capture and preserve the intellectual output of and the research produced by scholars affiliated with ISPS, and strives to serve as a model for sharing and preserving research data by implementing the ideals of scientific reproducibility and transparency.
Datasets housed in the ISPS Data Archive have undergone a rigorous ingest process that combines data curation with data quality review to ensure materials meet quality standards that support computational reproducibility. The process is managed by the Yale Application for Research Data workflow tool, which structures and tracks curation and review activities to generate high quality data packages that are repository-agnostic.
Cornell Center for Social Sciences, Cornell University
The Cornell Center for Social Sciences (CCSS), founded in 1981, anticipates and supports the evolving computational and data needs of Cornell social scientists and economists throughout the entire research process and data lifecycle. CCSS is home to one of the oldest university-based social science data archives in the United States, which contains an extensive collection of public and restricted numeric data files in the social sciences with particular emphasis on demography, economics and labor, political and social behavior, family life, and health.
CCSS also offers a Data Curation and Reproduction of Results Service, known as R-squared or R2, through which researchers with papers ready to submit for publication can send their data and code to CCSS prior to submission for appraisal, curation, and replication. The service aims to ensure that published results are replicable and that data and code are well documented, reusable, packaged, and preserved in a trustworthy data repository for access by current and future generations of researchers.
Odum Institute for Research in Social Science, University of North Carolina at Chapel Hill
The Odum Institute for Research in Social Science at the University of North Carolina at Chapel Hill provides education and support for research planning, implementation, and dissemination. The Odum Institute hosts the UNC Dataverse, which provides open access to curated collections of social science research datasets, while also serving as a repository platform for researchers to preserve, share, and publish their data.
While the Odum Institute Data Archive has been curating research data to support discovery, access, and reuse since 1969, the archive has recently expanded its service model to include comprehensive data quality review. The Odum Institute provides this data review service to journals that wish to add a verification component to their data sharing policies. The Odum Institute model of curating data for reproducibility as a cost-based service to journals exemplifies a convergence of stakeholders around the principles of research transparency and reproducibility.
Discussion: CuRe Implementers
Visit the website for an organization below or any other organization that has implemented data curation for reproducibility services and/or workflows. Based on the information presented on the website, discuss the following:
- Overall mission of the organization
- How the service supports that mission
- Who within the organization provides the service
- The audience to whom the service is targeted
American Economic Association, Data Editor
https://aeadataeditor.github.io/aea-de-guidance/
Certification Agency for Scientific Code and Data (cascad)
https://www.cascad.tech/
CODECHECK, University of Twente
https://www.itc.nl/research/open-science/codecheck/
Cornell Center for Social Sciences: Results Reproduction (R-squared) Service
https://socialsciences.cornell.edu/research-support/R-squared
Institution for Social and Policy Studies, Yale University
https://isps.yale.edu/research/data
Validation by The Science Exchange
http://validation.scienceexchange.com/#/home
Smathers Libraries, University of Florida
https://arcs.uflib.ufl.edu/services/reproducibility/
Talking to others about the importance and benefits of curating for reproducibility can be intimidating when put on the spot, especially when asked to speak about it for the first time. Taking time to think through what services look like, or could look like, at your institution can go a long way toward articulating your ideas effectively. Be strategic, have fun, and get others excited about reproducibility by sharing your understanding and commitment and by meeting them where they are.
Exercise: Elevator Pitch
Using what you have learned about reproducibility, its importance and how the LIS profession is well situated to support researchers with curating for reproducibility, spend 5-10 minutes drafting an elevator pitch about piloting a service to your colleague, supervisor, or dean. What will you want to get across with just a few minutes of their time? As you draft your pitch consider the following:
- How does the service fit into your organization’s strategic plan?
- What value will it bring to the organization?
- Who are the stakeholders providing the service as well as receiving the service?
After you draft your pitch, practice saying it out loud a few times. The next time you have a chance to advocate for reproducibility, you will be ready!
Key Points
Data savvy librarians and other information professionals play an important role in supporting and promoting scientific reproducibility.
While LIS professionals already engage in many practices that support reproducibility, they may need to skill up to perform some critical curation for reproducibility tasks.
There are various models for implementing data curation for reproducibility services. It is important to think about what a service might look like at your organization so that you can articulate your ideas effectively when given the opportunity.