Thing 6: Access
Related FAIR4RS Principles: F1, F2, F4, A1, A1.1, A1.2, R1.1
Science is a community endeavor that sees maximum benefits when the materials produced in the course of research activities are made maximally available to community members to enable them to build upon, reuse, and verify scientific knowledge.
Fortunately, trustworthy repositories exist to preserve research artifacts for long-term discovery, access, and use. These repositories are equipped with the policies, infrastructure, expertise, and workflows needed to ensure that submitted research compendia are made publicly available alongside human- and machine-readable information (i.e., metadata) that facilitates access and use.
Ideally, the research compendium and the objects contained within it would be made as open as possible by placing the compendium in the public domain, which would permit anyone to freely use, modify, and redistribute the materials for any purpose. However, ethical and/or legal constraints may make it necessary to define and enforce acceptable uses of the research compendium or any of its component parts. This is best accomplished by assigning a formal license to the materials.
Beyond physical and legal access to the compendium, access also concerns the technology and tools necessary to use it. Requiring the use of costly, proprietary, obscure, or obsolete hardware or software to render compendium files, for example, impedes efforts to reproduce research results.
Because the research compendium serves as the evidence-base for published findings, enabling others to access it with minimal physical, legal, and technical barriers is essential for supporting and promoting computational reproducibility.
Access to the compendium and its components is best preserved when the compendium is placed in a trustworthy repository. When considering physical access to a compendium, address the following questions:
- Is the research compendium (and all of its component parts) made available in a trustworthy repository that provides long-term archival access to materials without undue burden?
- Is the research compendium assigned a digital object identifier (DOI) or other unique, persistent identifier that allows web access to the compendium, even if its online location should change?
- Is standardized metadata included with compendium materials to facilitate discovery, access, and use of the compendium?
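As one illustration of the persistent-identifier question, a DOI can be written as a web address through the doi.org resolver, which redirects to the compendium's current location even if that location changes. The sketch below is a minimal Python example under stated assumptions: the function name `doi_to_url` and the loose pattern check are hypothetical choices for illustration, and the sample DOI is the one cited for the TRUST Principles article among the resources in this section.

```python
import re

# Prefixes commonly written in front of a DOI; stripped before building the URL.
_DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)

# Loose shape check (an assumption for this sketch, not a full validator):
# "10.", a numeric registrant code, a slash, then a non-empty suffix.
_DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(identifier: str) -> str:
    """Return the https://doi.org/ resolver URL for a DOI string."""
    doi = identifier.strip()
    lowered = doi.lower()
    for prefix in _DOI_PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not _DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognizable DOI: {identifier!r}")
    return f"https://doi.org/{doi}"


print(doi_to_url("doi:10.1038/s41597-020-0486-7"))
# prints https://doi.org/10.1038/s41597-020-0486-7
```

Citing the resolver form of the identifier, rather than a repository landing-page address, is what keeps the reference usable if the hosting location changes.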
Whether a research compendium is placed in the public domain or has use restrictions applied, acceptable and/or conditional uses of the compendium and its individual contents should be declared using precise language in the form of a waiver or machine-readable license that addresses the following:
- Are there any compelling reasons that the compendium cannot be placed in the public domain to maximize access?
- If access to the materials must be restricted, who is permitted to request access?
- For what purposes may the materials be used?
- What types of uses of the materials are prohibited (e.g., commercial uses, modification, redistribution)?
- What are the specific protocols (if any) that one must follow to access and use the materials (e.g., IRB approval, use of a secure computing workstation)?
- What obligations must be fulfilled as conditions for accessing and using the materials (e.g., data citation, funder statement)?
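The answers to these questions can be recorded in a structured, machine-readable form alongside the license text. The following Python sketch shows one possible shape for such a record; the class `AccessTerms` and all of its field names are hypothetical and do not correspond to any particular metadata standard. The `license_id` value uses an SPDX-style license identifier purely as an example.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AccessTerms:
    """Hypothetical record of the access and use terms for a compendium."""

    license_id: str                       # e.g., "CC0-1.0" or "CC-BY-4.0"
    restricted: bool = False              # True when access must be requested
    who_may_request: str = "anyone"
    permitted_purposes: list = field(default_factory=list)
    prohibited_uses: list = field(default_factory=list)
    access_protocols: list = field(default_factory=list)
    obligations: list = field(default_factory=list)

    def unanswered(self) -> list:
        """Name the fields a restricted compendium still leaves empty."""
        if not self.restricted:
            return []
        checks = {
            "permitted_purposes": self.permitted_purposes,
            "access_protocols": self.access_protocols,
        }
        return [name for name, value in checks.items() if not value]


# An openly licensed compendium whose only condition is citation.
terms = AccessTerms(license_id="CC-BY-4.0", obligations=["data citation"])
print(json.dumps(asdict(terms), indent=2))
```

A restricted compendium would set `restricted=True` and fill in the purpose and protocol fields; `unanswered()` then lists whichever of those remain empty.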
As much as possible, the technology (i.e., hardware and software) required to render and use compendium files to reproduce the associated published results should be reasonably accessible by scholars for whom the research is relevant. Determine if this is the case by addressing the following:
- Is required software open-source or in common use by the research community?
- Is comprehensive, up-to-date documentation available to facilitate use of the technology?
- Does the technology have longevity, i.e., is it unlikely to become obsolete and therefore unusable in the near future?
- Are there alternatives that would be suitable for rendering and using compendium files should the technology become difficult or impossible to access? In other words, are file formats hardware-, operating system-, and software-agnostic?
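A first pass at the file-format question can be automated by flagging files whose extensions fall outside a list of formats the research community considers open. The Python sketch below assumes a small, purely illustrative allowlist (`OPEN_EXTENSIONS`) and a hypothetical function name (`flag_for_review`); neither is an authoritative statement of which formats are sustainable, and the file names are invented for the example.

```python
from pathlib import PurePosixPath

# Illustrative allowlist only (an assumption for this sketch),
# not an authoritative list of sustainable formats.
OPEN_EXTENSIONS = {".csv", ".tsv", ".txt", ".md", ".json", ".xml", ".py", ".r"}


def flag_for_review(paths):
    """Return the paths whose extension is not on the illustrative allowlist."""
    flagged = []
    for path in paths:
        # Compare case-insensitively so "analysis.R" matches ".r".
        suffix = PurePosixPath(path).suffix.lower()
        if suffix not in OPEN_EXTENSIONS:
            flagged.append(path)
    return flagged


# Hypothetical compendium contents; ".sav" is the SPSS data format.
files = ["data/survey.csv", "data/survey.sav", "code/analysis.R", "README.md"]
print(flag_for_review(files))
# prints ['data/survey.sav']
```

A flagged file is a prompt for human review rather than a verdict: the usual remedy is to deposit an open-format copy alongside the original.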
The resources below offer a closer look at mechanisms for supporting physical, legal, and technical access to research compendium files as a means of promoting research reproducibility.
A trustworthy repository has provisions in place to ensure the long-term discoverability, accessibility, and usability of research artifacts. Read more about the criteria repositories must meet to be considered trustworthy, and how to select a repository best suited for the types of materials contained in the compendium:
- Lin, D., Crabtree, J., Dillo, I., Downs, R. R., Edmunds, R., Giaretta, D., De Giusti, M., L’Hours, H., Hugo, W., Jenkyns, R., Khodiyar, V., Martone, M. E., Mokrane, M., Navale, V., Petters, J., Sierman, B., Sokolova, D. V., Stockhause, M., & Westbrook, J. (2020). The TRUST Principles for digital repositories. Scientific Data, 7(1), 144. https://doi.org/10.1038/s41597-020-0486-7
These authors, who are representatives of the digital repository community, present the TRUST (Transparency, Responsibility, User focus, Sustainability, and Technology) Principles that offer a framework for repository best practices.
- CoreTrustSeal. (n.d.). CoreTrustSeal. CoreTrustSeal. https://www.coretrustseal.org/
The CoreTrustSeal website provides a list of repositories that have undergone a thorough audit and review of their practices, services, and infrastructure and earned the CoreTrustSeal indicating trustworthiness.
- re3data. (n.d.). re3data.org: Registry of research data repositories. http://www.re3data.org/
Search for a repository using the re3data registry of research data repositories.
According to the Joint Declaration of Data Citation Principles developed by the Data Citation Synthesis Group (2014), “Sound, reproducible scholarship rests upon a foundation of robust, accessible data. For this to be so in practice as well as theory, data must be accorded due importance in the practice of scholarship and in the enduring scholarly record.” Learn more about data citation below:
- Data Citation Synthesis Group. (2014). Joint Declaration of Data Citation Principles. Force11. https://doi.org/10.25490/A97F-EGYK
The Data Citation Principles presented here emphasize the importance of data access to the scientific enterprise as they outline the purpose, function, and value of data citations.
- Callaghan, S., Donegan, S., Pepler, S., Thorley, M., Cunningham, N., Kirsch, P., Ault, L., Bell, P., Bowie, R., Leadbetter, A., Lowry, R., Moncoiffé, G., Harrison, K., Smith-Haddon, B., Weatherby, A., & Wright, D. (2012). Making data a first class scientific output: Data citation and publication by NERC’s environmental data centres. International Journal of Digital Curation, 7(1), 107–113. https://doi.org/10.2218/ijdc.v7i1.218
This article describes mechanisms for data citation and publication to promote research transparency in the context of NERC’s environmental data centres, though it is applicable to many disciplinary contexts more generally.
- National Research Council. (2012). For attribution: Developing data attribution and citation practices and standards: Summary of an international workshop. National Academies Press. https://doi.org/10.17226/13564
This National Academies publication summarizes a workshop during which participants addressed various questions on the current state of data citation practices; the importance of data citation; the technical, economic, legal, and cultural issues of data citation; and implementation of data citation practices and standards.
Whether a research artifact is placed in the public domain or not, licenses let others know how they can use, modify, or redistribute the materials. The resources below offer more insight into the why and how of licensing in the sciences:
- Stodden, V. (2009). Enabling reproducible research: Open licensing for scientific innovation. International Journal of Communications Law and Policy, 13, 22–47. Retrieved from https://ssrn.com/abstract=1362040
Victoria Stodden explains the necessity of a standard for research artifact licensing as a means of promoting sharing and attribution to support research reproducibility.
- Open Knowledge Foundation. (n.d.). Guide to open data licensing. Open Definition. https://opendefinition.org/guide/data/
This guide breaks down the practical aspects of licensing data and legal intellectual property rights for data in various country jurisdictions.
- Morin, A., Urban, J., & Sliz, P. (2012). A quick guide to software licensing for the scientist-programmer. PLoS Computational Biology, 8(7), e1002598. https://doi.org/10.1371/journal.pcbi.1002598
This article covers information on software licensing and considerations when choosing a software license.
- Hrynaszkiewicz, I., Busch, S., & Cockerill, M. J. (2013). Licensing the future: Report on BioMed Central’s public consultation on open data in peer-reviewed journals. BMC Research Notes, 6(318). https://doi.org/10.1186/1756-0500-6-318
This article provides responses to common questions about open licenses for data based on BioMed Central’s public consultation on Open Data.
It is important to understand the differences among various standard licenses to be able to determine which is most appropriate for the specific type of research artifact to be made publicly available. Below are resources and tools to help with license selection.
- Open Source Initiative. (n.d.). Licenses & standards. Open Source Initiative. https://opensource.org/licenses
The Open Source Initiative provides centralized access to information on various open source licenses along with an FAQ on open source licensing.
- Kamocki, P., Straňák, P., & Sedlák, M. (2015). Public license selector. Institute of Formal and Applied Linguistics. http://ufal.github.io/public-license-selector/
This web-based tool helps users choose a license for data or software from among various licenses, including Apache, CDDL, BSD, Creative Commons, GNU, and MIT licenses.
- GitHub. (n.d.). Choose an open source license. https://choosealicense.com/
This tool, developed by GitHub with contributions from the developer community, walks through the selection of an open source license.
- Ball, A. (2014, July 17). How to license research data. Digital Curation Centre. https://www.dcc.ac.uk/guidance/how-guides/license-research-data
This DCC How-to Guide explains the why and how of licensing for research data including specific information about various types of licenses and when their use is most appropriate.
Lack of access to the technology (i.e., hardware, software, computing systems) required to re-execute the research workflow precludes any attempt to confirm the reproducibility of results. The resources below provide information to help maximize the usability of technology over time.
- Library of Congress. (2021, November 12). Sustainability of digital formats: Planning for Library of Congress collections. Library of Congress. https://www.loc.gov/preservation/digital/formats/index.html
This Library of Congress website provides comprehensive information about the factors that affect the sustainability of various types of digital content.
- Library of Congress. (2021). Library of Congress recommended formats statement 2021-2022. Library of Congress. https://www.loc.gov/preservation/resources/rfs/
The Library of Congress maintains a recommended formats statement that outlines the necessary characteristics of file formats that ensure long-term preservation and usability along with a discrete list of recommended formats for various file types.
- See also Thing 10: Review to learn more about ways to ensure sustained access to the technology required for research reproducibility.
Public access to research artifacts is a cornerstone of Open Science. Learn more about the Open Science movement to situate knowledge and knowledge-making processes in the public domain.
- Open Knowledge Foundation. (n.d.). The open definition. Open Definition. http://opendefinition.org/
The Open Knowledge Foundation provides a precise definition of “openness” as it refers to research artifacts.
- Murray-Rust, P., Neylon, C., Pollock, R., & Wilbanks, J. (2010). Panton Principles: Principles for open data in science. http://pantonprinciples.org/
The Panton Principles are a set of recommendations for the adoption of specific practices to make data maximally open.
- Molloy, J. C. (2011). The Open Knowledge Foundation: Open data means better science. PLoS Biology, 9(12), e1001195. https://doi.org/10.1371/journal.pbio.1001195
In this article, the author defines open data and describes the impetus behind the open data movement in science.
- Hey, T., & Payne, M. C. (2015). Open science decoded. Nature Physics, 11(5), 367–369. https://doi.org/10.1038/nphys3313
This article focuses on reproducibility as the reason that open source code is as important as open data.
While Open Science is the goal, there are instances in which access and use of materials must be restricted due to their sensitive nature. Datasets that contain personal health information or other identifiable data collected from human participants, data collected with Indigenous partners, traditional knowledge, and precise location information of vulnerable species or protected sites often must be safeguarded against unauthorized access or disclosure. Learn more about measures for assessing and mitigating risks to human participant data from the resources below:
- Sensitive Data Expert Group. (2020). Sensitive data toolkit for researchers part 2: Human participant research data risk matrix. Zenodo. https://doi.org/10.5281/zenodo.4088954
This tool, developed by the Sensitive Data Expert Group of the Portage Network, helps to assess the level of risk to data containing personally identifiable information based on data content, context, and other factors impacting risk.
- Sensitive Data Expert Group. (2020). Sensitive data toolkit for researchers part 3: Research data management language for informed consent. Zenodo. https://doi.org/10.5281/zenodo.4107178
Planning for the public release of human participant data begins with the informed consent process. This resource explains how consent language can affect whether and how data can be shared.
- Darragh, J., Hofelich, A. M., Hunt, S., Woodbrook, R., Fearon, D., Moore, J., & Hadley, H. (2020). Human subjects data essentials data curation primer. Version 2.0. https://github.com/DataCurationNetwork/data-primers
The Data Curation Network offers a primer on human subjects data that includes curation approaches for minimizing disclosure risk while providing access to the data or public-use versions of the data.
- Steffensmeier, D., & Schwartz, J. (2021). 21st century corporate financial fraud, United States, 2005-2010. Inter-university Consortium for Political and Social Research [distributor]. https://doi.org/10.3886/ICPSR37328.v1
This ICPSR repository record is an example of how sensitive data can be made discoverable while limiting access to only those individuals who have followed specific protocols for access approval.
- Centre for Applied Data Ethics. (2021). Ethical considerations in the use of geospatial data for research and statistics. Confidentiality and disclosure risk. UK Statistics Authority. https://uksa.statisticsauthority.gov.uk/publication/ethical-considerations-in-the-use-of-geospatial-data-for-research-and-statistics/pages/3/
Geospatial data can present particular risks when the data pinpoint an individual address or when overlapping data narrow a location small enough to identify a certain population. This resource provides important information on protecting geospatial data.