Research Data

Sharing Data

Why Share Research Data?

Researchers devote a large amount of physical and intellectual effort to collect, manage, collate, and analyse their data before publishing their results. Many of these datasets have significant value beyond the usage for the original research, and sharing the data can be seen as beneficial in a number of ways:

  • Research integrity and reproducibility: Publishing research data and citing its location in published research papers allows others to replicate, validate or build upon your results, thus improving the scientific record by encouraging scientific enquiry and debate. Openly sharing research data also encourages the improvement and validation of research methods and minimises the need for data re-collection.
  • Preservation of Research Data: Some research data will be unique and cannot be replaced if destroyed or lost. Sharing via a repository will mean that the repository will look after and preserve your data into the future, even after technology becomes obsolete.
  • Innovation: Data created for one research purpose may be re-invented or re-interpreted for future unrelated research and into contexts not currently envisaged. Data sharing and re-use across borders and disciplines can also promote innovation by potential new data users.
  • Impact: Others who re-use your data and cite it in their own research help to raise interest in your research and increase your impact within your field and beyond. “Open” data leads to increased citations of the data itself, and of associated papers.
  • Funder requirements: A growing number of funding bodies and research councils have adopted research data sharing policies and mandate or encourage researchers to share data and outputs to avoid duplication of effort and reduce data collection costs.
  • Journal publisher requirements: A growing number of journal publishers require data that underpin research findings to be published in open access repositories when manuscripts are submitted.

There may be reasons for not sharing your data, e.g. privacy and confidentiality issues or the commercial value of the data. Horizon 2020 has coined the phrase: “As open as possible, as closed as necessary.”

If you are unable to publicly share your data, consider the possibility that you may wish to make your data available internally to future researchers to facilitate follow-on research, and/or to create a metadata record in your chosen archives or repository. A metadata record will describe your data and aid others in knowing about it. In order to ensure this can happen you will need to manage your data.

 

Reasons for not sharing

There are legitimate reasons for not sharing some or all research data generated by a project. Funders who require data sharing will generally ask that researchers justify this in their Data Management Plan (DMP).

Researchers can generally choose not to share research data on any of the following grounds:

  • data are commercially sensitive
  • data are confidential (in connection with security issues)
  • sharing would break data protection regulations (though data which have been properly anonymised can be shared without breaching data protection regulations)
  • sharing would mean that the project's main aim might not be achieved
  • the project will not generate / collect any research data

This list has been adapted from the Horizon 2020 recommendations.

 

Access control

Sensitive and confidential data can be safeguarded by regulating or restricting access to and use of the data. Access controls should always be proportionate to the kind of data and level of confidentiality involved. The access controls you can put in place will be guided by those available from your chosen Archive or Repository so it's important to talk to them about your options.

Below we describe different levels of access for data:

  • Open data

Data that can be accessed by any user for any reason, including commercial. Data in this category should not contain personal information unless consent is given.

  • Safeguarded data

Data that are available only under certain conditions. This is for data that contain no personal information, but the data owner considers there to be a risk of disclosure resulting from linkage to other data.

ISSDA provides access to safeguarded quantitative data in the Social Sciences under certain conditions. For example, the user must be using the data for research or teaching purposes and must sign a legally binding End User License, which sets out additional terms and conditions.

  • Controlled data

This level of access control is suitable for data that may be disclosive, that is, data from which individuals could potentially be identified. Access is generally approved by a Data Access Committee, who may require that certain training has taken place or that the data are only available from certain computers in a controlled 'data room'.

  • Embargo

Most data repositories allow you to place a temporary embargo on your data. During the embargo period, only the description of the dataset is published. The data themselves will become available in open access after a certain period of time.
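
The access levels described above can be read as a rough decision rule. The Python sketch below is only an illustration: it assumes a dataset can be summarised by three yes/no questions, and the function name `suggest_access_level` and its parameters are hypothetical, not part of any repository's tooling. Your chosen archive or repository decides which levels are actually available.

```python
def suggest_access_level(has_personal_info: bool,
                         consent_to_share: bool = False,
                         linkage_risk: bool = False) -> str:
    """Suggest an access level based on the descriptions in this guide.

    Illustrative only: real decisions also depend on ethics approval,
    funder policy, and what your repository supports.
    """
    if has_personal_info and not consent_to_share:
        # Personal information without consent to share openly:
        # access should be approved case by case.
        return "controlled"
    if linkage_risk:
        # Risk of disclosure if the data are linked to other data.
        return "safeguarded"
    # No unconsented personal information and no linkage risk.
    return "open"
```

An embargo is not modelled here because it delays release rather than restricting who may access the data.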

 

Publishing and Sharing Sensitive Data

If you are conducting any study involving human participants and wish to make the data available at the end of the study, then you need to consider this from the very beginning, when designing the study. Enabling others to re-use your data will mean planning for this from the start of your research project. You will need to think critically about how research data can be shared, what might limit or prohibit data sharing (e.g. consent forms, confidentiality concerns), and whether any steps can be taken to remove such limitations. In particular, you will need to ensure you are asking for informed consent to share the data.

Key messages from the ANDS guide on publishing and sharing sensitive data:

  • The advantages of publishing your sensitive data will probably far outweigh any potential disadvantages when simple and appropriate steps are taken
  • Publishing your data, or just a description of your data (that is the metadata), means that others can discover and cite it
  • You can publish a description of your data without making the data itself openly accessible
  • You can place conditions around access to published data
  • Sensitive data that has been de-identified can be shared

Repositories for Sharing Data

Why Use a Data Repository to Share Your Data?

Using a data repository means that your data will be indexed by Google Dataset Search and is more likely to be found by those looking for data. If you use Box, Google Drive, or a website, your data won't be indexed by Google Dataset Search.

Very large datasets: You can have DOIT store your data in AWS and create a record in a data repository with information on your data and who to contact to get access to the data.

Sensitive data: You can keep your data on a secure server and create a record in a data repository with information on your data and the conditions under which it will be shared, such as only with qualified researchers with a data use agreement, and who to contact.

Funder Required Repository

Be sure to check your funder in case it requires that you share your data in a particular repository.

Find a Data Repository

The best place for your data is in a repository with similar data. Here are some tools for finding one:

Generalist Data Repositories
  • Zenodo is free for researchers worldwide to share their data. It provides DOIs and is indexed by Google Dataset Search. Total file size per record is 50 GB. Larger sizes can be requested and are granted on a case-by-case basis.
  • Figshare is free for researchers worldwide to share their data. It does not provide DOIs, but you can obtain a ScholarWorks@UMBC DOI linking to your data in Figshare by emailing a link to your data in Figshare to ScholarWorks@UMBC. It is indexed by Google Dataset Search. It accepts files up to 20 GB. Larger datasets can be stored on Figshare+ for a one-time fee starting at $450 for 100 GB (Figshare+ pricing info is here: https://knowledge.figshare.com/plus).
  • Harvard Dataverse is free for researchers worldwide to share their data. It provides DOIs for datasets and is indexed by Google Dataset Search. It accepts files up to 10 GB, with a 1 TB limit per researcher.
  • Open Science Framework is free for researchers worldwide to share their data. It provides DOIs and is indexed by Google Dataset Search. It accepts files and projects up to 5 GB, or up to 50 GB per project if the project is public. More storage is available with an institutional membership.
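
As a quick way to compare the size limits quoted above, the following Python sketch lists which of these free repositories would accept a dataset of a given size. It is a simplification under stated assumptions: the dataset is treated as a single upload, the Open Science Framework project is assumed to be public, and the names `SIZE_LIMITS_GB` and `repositories_that_fit` are made up for this example. Limits change, so check each repository's current policy before relying on these figures.

```python
# Size limits in gigabytes, as quoted in the list above.
SIZE_LIMITS_GB = {
    "Zenodo": 50,                  # total per record
    "Figshare": 20,                # per file
    "Harvard Dataverse": 10,       # per file (1 TB per researcher overall)
    "Open Science Framework": 50,  # per public project (5 GB otherwise)
}


def repositories_that_fit(dataset_size_gb: float) -> list[str]:
    """Return the repositories whose quoted limit covers the dataset size."""
    return [
        name
        for name, limit_gb in SIZE_LIMITS_GB.items()
        if dataset_size_gb <= limit_gb
    ]
```

For example, `repositories_that_fit(15)` returns Zenodo, Figshare and Open Science Framework, while a 60 GB dataset fits none of them without requesting extra space or paying for additional storage.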

These are other generalist data repositories that UMBC would need to subscribe to for our researchers to use:

  • Dryad requires an institutional membership. It provides DOIs for datasets and is indexed by Google Dataset Search. It accepts files up to 300 GB.
  • IEEE DataPort is free for posting a dataset accessible only to IEEE DataPort subscribers; posting a dataset open access costs $1950. It provides DOIs and is indexed by Google Dataset Search. It accepts datasets up to 2 terabytes (10 terabytes for institutional DataPort subscribers).