This lesson is still being designed and assembled (Pre-Alpha version)

Trustworthy Repositories

Overview

Teaching: 0 min
Exercises: 0 min
Questions
  • What is a “trustworthy” repository?

  • What are the broad types of repositories?

  • How can you locate a trustworthy repository for your data?

  • What are things to consider as you decide on a repository?

Objectives
  • Identify the elements of a trustworthy repository

  • Navigate resources for finding trustworthy repositories across various disciplines

  • Learn the differences between generalist, disciplinary, and institutional repositories

  • Assess if a trusted repository is right for your data

What is a “Trustworthy” Repository?

As you have learned so far in this curriculum, the data curation process entails many steps that support the long-term sustainability and reuse of your research data. In this episode, you will learn why storing your research compendium package in an appropriate location is a crucial step in ensuring that other researchers can find your data and research outputs and reuse them in the future. With a steadily increasing number of data repositories available to researchers, it can be difficult to know which repository is right for your research data and will keep it safeguarded and preserved. Lin et al. (2020) define a set of principles, called the TRUST principles, that characterize digital repositories serving this function:

T - Transparency

R - Responsibility

U - User Focus

S - Sustainability

T - Technology

If a digital repository enacts these principles, you can reasonably consider it to be a trustworthy repository for your data. A helpful guiding framework for assessing whether a digital repository exemplifies the TRUST principles is the CoreTrustSeal certification, which offers a catalogue of requirements that characterize a trustworthy data repository. This catalogue includes 16 requirements against which a repository is assessed; a repository that meets them gains the CoreTrustSeal certification. The “Things to Consider When Choosing a Repository” section below details the process of assessing the trustworthiness of a repository based on the CoreTrustSeal requirements.

Spotlight: Data Sharing Platforms - Are They Trustworthy Repositories?

There are many platforms available to researchers for storing and sharing their data, such as GitHub, Open Science Framework, and ArcGIS Online. However, can these platforms be considered trustworthy repositories? Information scientists and librarians often do not place these kinds of data sharing platforms in the same category as trustworthy repositories, despite their heavy use for storing research compendia. Data curators may need to discuss with researchers why a trustworthy repository is a better choice for the long-term preservation of their research compendium, based on the following reasons:

  1. While these platforms may have high visibility in some research areas (such as geospatial scientists sharing data in ArcGIS Online), they often do not adhere to the CoreTrustSeal requirements and the TRUST principles. This means that the data may not have the same assurances for long-term preservation, governance, and transparency as a trustworthy repository.

  2. Trustworthy repositories have dedicated personnel to support data depositors, which may not be the case with these kinds of data sharing platforms. Therefore, researchers may find themselves having to navigate data deposit issues without the advice and guidance of a data curator.

While platforms such as GitHub, Open Science Framework, and ArcGIS Online are useful project management and collaboration platforms for the research process, researchers should strive to store their research compendia in trusted repositories that fit the TRUST principles and CoreTrustSeal requirements.

Finding a Repository

Now, you will learn more about the different types of repositories available to researchers, and how to assess their trustworthiness. As funding agencies and publishing outlets continue to push for more data sharing in academic research, the number of repositories available for this data has also grown. Choosing a trustworthy repository is important for safeguarding the long-term sustainability and reuse potential of the data. Beyond trustworthiness, some categories of repositories may be more suitable for the particular type of research data you are working with, and may support more exposure and reuse of your shared data. The Digital Curation Centre offers further resources and discussion on the benefits and considerations of each of these kinds of repositories.

Disciplinary and Subject-Focused Repositories

Disciplinary and subject-focused repositories cater to data and research outputs from specific areas of study and focus, such as political science, mechanical engineering, Indigenous data, and social sciences. Examples of disciplinary repositories include:

  1. National Center for Biotechnology Information (NCBI), which provides access to research outputs concerning biomedical and genomic information
  2. Database of Religious History, a repository containing quantitative and qualitative data pertaining to religious cultural history
  3. Center for International Earth Science Information Network, which provides access to data concerning human-environment interactions across the world
  4. Mukurtu, a platform which empowers Indigenous communities to manage, narrate, and share their digital heritage
  5. Inter-university Consortium for Political and Social Research (ICPSR), which maintains and provides access to an extensive archive of social science data for research and instruction purposes

Many disciplinary repositories can be found on re3data, a registry of research data repositories that allows users to search by research data subject. Note that inclusion on re3data.org does not automatically make a repository trustworthy; researchers must still evaluate it for trustworthiness.

Generalist Repositories

Generalist repositories store and provide access to a wide range of research data types and do not restrict content types by discipline. These repositories are particularly useful for researchers whose discipline does not have a repository dedicated to their area of study. Examples of curated generalist repositories include:

  1. Dataverse
  2. Dryad
  3. Zenodo

While many generalist repositories can be considered trustworthy repositories, researchers should still plan to evaluate each potential repository they are considering for their data.

Institutional Repositories

These repositories are associated with a particular institution, such as a university or college, research institute, or national laboratory, and are generally used to store and showcase the outputs of researchers within that institution (Callicott et al. 2016). These types of repositories generally accept research materials from all disciplines of research present at an institution, and thus can be considered generalist repositories as opposed to disciplinary repositories focusing on a specific area of research. Some schools have a single repository for all research products (such as Temple University’s TUScholarShare) while others may have separate repositories for data and other scholarly products (pre-prints, articles, Electronic Theses and Dissertations, etc.). Examples of institutional repositories include:

  1. KiltHub, the official institutional repository for Carnegie Mellon University, managed through the University Libraries
  2. Oxford University Research Archive, the institutional repository for researchers at the University of Oxford
  3. Stanford Digital Repository, managing scholarly outputs from researchers at Stanford University

Most institutional repositories have a dedicated support staff and a mission for preserving scholarly information, and often constitute a trustworthy digital repository. However, as is the case with disciplinary and generalist repositories, researchers should plan time in their research workflow to evaluate their repository of choice for trustworthiness.

Things to Consider When Choosing a Repository

Now that you have learned about different categories of repositories, what are some additional tips for choosing the right repository for your data? Researchers can look for the CoreTrustSeal logo when visiting the website of a potential repository they might use for storing and sharing their compendium package.

An example of the CoreTrustSeal on a digital repository

However, your repository of interest may not have the official CoreTrustSeal certification. This does not necessarily mean that the repository cannot be considered trustworthy! You can still use the 16 requirements of the CoreTrustSeal certification to review your potential repository for trustworthiness and adherence to the TRUST principles. The CoreTrustSeal certification is an example of a standard used to signal that a repository meets the requirements of the TRUST principles, but due to the extensive certification process required, some repositories may not (yet) have this certification. Therefore, reviewing a repository that does not have the CoreTrustSeal certification for its adherence to the TRUST principles is incredibly important.

Spotlight: Repository Checklist

When assessing a repository for trustworthiness and fit for your data, look for the CoreTrustSeal certification, or in the absence of this certification, consider the following questions based on the 16 CoreTrustSeal requirements. In the column with the header “TRUST Principle,” list the principle(s) that correspond to the goals of each requirement.

Question to Consider from CoreTrustSeal Requirements | TRUST Principle
Look for the mission/scope of the repository. Does the mission/scope discuss providing access to and preserving data?  
What are the licenses covering data access and use in the repository? Does the repository monitor compliance of data access and use in the repository?  
Does the repository have a continuity plan to ensure ongoing access to and preservation of the items within?  
Does the repository ensure that data are created, curated, accessed, and used in compliance with disciplinary and ethical norms?  
Does the repository have adequate funding, staff, and governance for carrying out the mission of the repository?  
Does the repository have access to expert guidance and feedback beyond the repository staff which can be applied to deposits in the repository?  
Does the repository guarantee the integrity and authenticity of the data?  
Does the repository have an appraisal process to determine if the data and metadata meet certain criteria levels for deposits?  
Does the repository have documented procedures and processes for managing archival copies of the deposits?  
Does the repository assume responsibility for long-term preservation?  
Does the repository have appropriate expertise to address technical data and metadata quality?  
Does archiving take place according to defined workflows from ingest to dissemination?
Does the repository enable users to discover the data and refer to them in a persistent way through proper citation?  
Does the repository enable reuse of the data over time, ensuring that appropriate metadata are available to support reuse?  
Does the repository function on well-supported operating systems and core infrastructural software?  
Does the repository have security functions which provide for the protection of the platform and its data, products, services, and users?  
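
If you want to keep track of your answers while reviewing a repository, the checklist above can be recorded in a small script. The following is only a minimal sketch, not part of the CoreTrustSeal process: the short labels are our own paraphrases of the 16 questions, and the `summarize` function and the "yes" / "no" / "unclear" answer scale are assumptions made for illustration.

```python
from collections import Counter

# Short labels paraphrasing the 16 checklist questions (wording is illustrative,
# not official CoreTrustSeal text).
CHECKLIST = [
    "Mission/scope covers access and preservation",
    "Licenses defined and compliance monitored",
    "Continuity plan for ongoing access and preservation",
    "Compliance with disciplinary and ethical norms",
    "Adequate funding, staff, and governance",
    "Access to expert guidance beyond repository staff",
    "Integrity and authenticity of data guaranteed",
    "Appraisal process for data and metadata",
    "Documented procedures for managing archival copies",
    "Responsibility for long-term preservation",
    "Expertise for technical data and metadata quality",
    "Archiving follows workflows from ingest to dissemination",
    "Data discoverable and persistently citable",
    "Reuse supported by appropriate metadata",
    "Well-supported operating systems and core software",
    "Security functions protect platform, data, and users",
]

# Assumed answer scale for this sketch.
VALID_ANSWERS = {"yes", "no", "unclear"}


def summarize(answers):
    """Tally answers and list the items that still need follow-up.

    `answers` maps a checklist label to "yes", "no", or "unclear".
    Items missing from `answers` are treated as "unclear".
    """
    for label, value in answers.items():
        if label not in CHECKLIST:
            raise KeyError(f"Unknown checklist item: {label}")
        if value not in VALID_ANSWERS:
            raise ValueError(f"Answer must be one of {sorted(VALID_ANSWERS)}: {value}")
    # Fill in unanswered items so every question appears in the summary.
    filled = {label: answers.get(label, "unclear") for label in CHECKLIST}
    counts = Counter(filled.values())
    follow_up = [label for label, value in filled.items() if value != "yes"]
    return {"counts": dict(counts), "follow_up": follow_up}


if __name__ == "__main__":
    # Hypothetical review of an unnamed repository: all "yes" except two items.
    review = {label: "yes" for label in CHECKLIST}
    review["Continuity plan for ongoing access and preservation"] = "no"
    review["Access to expert guidance beyond repository staff"] = "unclear"
    result = summarize(review)
    print(result["counts"])
    for label in result["follow_up"]:
        print("Follow up:", label)
```

A tally like this does not decide trustworthiness on its own. It simply highlights which requirements to raise with the repository staff or your research team before depositing.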

It is important to acknowledge that not every repository is a good fit for all data. Sometimes recommending a repository outside of your own institution is a better service than accepting something that is not a good fit. Take some time to explore a specific data set and repositories that would be a good fit for it. This hypothetical exercise will help get you out from under your repository biases.

Exercise: Locate a Trustworthy Repository for a Dataset

Dataset: Investing in Education in Europe: Attitudes, Politics and Policies (INVEDUC)

Now, let’s test what you have learned in this episode by locating a trustworthy repository for the above dataset. Imagine you are the creator of this dataset, and you are looking for a repository to store this data to preserve and share it with others. Evaluate the characteristics (discipline, data type, etc.) of this dataset and use re3data.org (https://www.re3data.org/) to identify three possible trustworthy repositories that are well-suited for the dataset.

Solution

In a group discussion, explain how each of your three possible repositories demonstrates trustworthiness and fit for this sample dataset. How would you decide which repository to choose?

If you are having trouble finding appropriate repositories, consider these potential candidate repositories:

Discussion: Assessing your Repository using CoreTrustSeal Requirements

Assess one of the repositories you identified in the previous exercise on its compliance with the 16 CoreTrustSeal requirements using the repository checklist. Discuss the pros and cons of using this repository based on its compliance with the requirements: would you and your research team still consider using this repository? Are there any special considerations for what you might include in your research compendium, such as additional necessary documentation?

Key Points

  • The CoreTrustSeal certification is one of the benchmarks that denotes trustworthy repositories where researchers can safely store and share their compendium packages.

  • Researchers can choose disciplinary, generalist, or institutional repositories to store their compendium packages.

  • If your repository of choice does not have the CoreTrustSeal certification, that does not necessarily mean that it is not considered trustworthy. You can still evaluate the trustworthiness of a repository through the CoreTrustSeal’s 16 requirements.