Why Share Research Data?
Researchers devote a great deal of physical and intellectual effort to collecting, managing, collating and analysing their data before publishing their results. Many of these datasets have significant value beyond their use in the original research, and sharing them can be beneficial in a number of ways:
- Research integrity and reproducibility: Publishing research data and citing its location in published research papers allows others to replicate, validate or build upon your results. This improves the scientific record by encouraging scientific enquiry and debate. Openly sharing research data also encourages the improvement and validation of research methods and minimises the need to re-collect data.
- Preservation of Research Data: Some research data are unique and cannot be replaced if destroyed or lost. If you share your data via a repository, the repository will look after and preserve them into the future, even after the technology used to create them becomes obsolete.
- Innovation: Data created for one research purpose may be re-used or re-interpreted in future, unrelated research and in contexts not currently envisaged. Sharing and re-using data across borders and disciplines can also promote innovation by reaching potential new users of the data.
- Impact: When others re-use your data and cite it in their own research, they help to raise interest in your research and increase your impact within your field and beyond. Open data tend to attract more citations, both of the data themselves and of the associated papers.
- Funder requirements: A growing number of funding bodies and research councils have adopted research data sharing policies and mandate or encourage researchers to share data and outputs to avoid duplication of effort and reduce data collection costs.
- Journal publisher requirements: A growing number of journal publishers require data that underpin research findings to be published in open access repositories when manuscripts are submitted.
There may be reasons for not sharing your data, e.g. privacy and confidentiality issues or the commercial value of the data. Horizon 2020 has coined the phrase: “As open as possible, as closed as necessary.”
If you are unable to share your data publicly, consider making them available internally to future researchers to facilitate follow-on research, and/or creating a metadata record in your chosen archive or repository. A metadata record describes your data and lets others know that they exist. To make either of these possible, you will need to manage your data.
Reasons for not sharing
There are legitimate reasons for not sharing some or all research data generated by a project. Funders who require data sharing will generally ask that researchers justify this in their Data Management Plan (DMP).
Generally, you may choose not to share research data on the following grounds:
- data are commercially sensitive
- data are confidential (in connection with security issues)
- sharing would break data protection regulations (though data which have been properly anonymised can be shared without breaching data protection regulations)
- sharing would jeopardise the achievement of the project's main aim
- the project will not generate / collect any research data
This list has been adapted from the Horizon 2020 recommendations.
Access control
Sensitive and confidential data can be safeguarded by regulating or restricting access to and use of the data. Access controls should always be proportionate to the kind of data and the level of confidentiality involved. The access controls you can put in place will depend on those offered by your chosen archive or repository, so it is important to discuss your options with them.
Below we describe different levels of access for data:
- Open access: data can be accessed by any user for any reason, including commercial use. Data in this category should not contain personal information unless consent has been given.
- Safeguarded access: data are available only under certain conditions. This level is for data that contain no personal information, but where the data owner considers there to be a risk of disclosure through linkage with other data. For example, ISSDA provides access to safeguarded quantitative data in the Social Sciences under certain conditions: users must be using the data for research or teaching purposes and must sign a legally binding End User License, which sets out additional terms and conditions.
- Controlled access: this level is suitable for potentially disclosive data, i.e. data that could identify individuals. Access is generally approved by a Data Access Committee, which may require that users have completed certain training or that the data are accessed only from designated computers in a controlled 'data room'.
- Embargo: most data repositories allow you to place a temporary embargo on your data. During the embargo period, only the description of the dataset is published. The data themselves become openly accessible after a set period of time.
Publishing and Sharing Sensitive Data
If you are conducting any study involving human participants and wish to make the data available at the end of the study, you need to plan for this from the very beginning, when designing the study. You will need to think critically about how the research data can be shared, what might limit or prohibit sharing (e.g. consent forms, confidentiality concerns), and whether any steps can be taken to remove such limitations. In particular, you will need to ensure that you ask participants for informed consent to share the data.
Key messages from the ANDS guide Publishing and Sharing Sensitive Data:
- The advantages of publishing your sensitive data will probably far outweigh any potential disadvantages when simple and appropriate steps are taken
- Publishing your data, or just a description of your data (that is, the metadata), means that others can discover and cite it
- You can publish a description of your data without making the data themselves openly accessible
- You can place conditions around access to published data
- Sensitive data that have been de-identified can be shared