You may be able to re-use data whether or not it is publicly available for free. When re-using data, always cite the data set(s) that you're re-using.

Here are some considerations when thinking about re-using data:

Is the data available and what are the terms of use?

Free online data may be under a Creative Commons license giving you permission to re-use the data in specific ways. More information on Creative Commons licenses and the permissions that they give you is available here: If the open data that you want to re-use isn't under a Creative Commons license, check it for information on terms and constraints on re-use. For example, if you find a dataset in ICPSR, the listing often states that the data is available to anyone affiliated with a member institution, and if you check their member list, you'll find that UMBC is a member.

The confidentiality of health and education data is protected by HIPAA and FERPA, which may limit its accessibility. A data use agreement or license may be required, limiting how the data can be used. See below for more information on data use agreements. Data may be under a Creative Commons license granting the public certain rights. See the bottom section of UMBC's Guide to Open Access to learn more about Creative Commons licenses.

Availability of documentation.

Ideally, researchers include high-quality documentation with their data, but if information is missing, you may need to contact the researcher for more information. If working with data provided by a commercial vendor, the documentation may be included with the license, and you may need to contact the person or department who purchased the data to obtain that documentation. For documentation of government or organizational data, you usually need to contact the agency or organization directly.

Who or what the data describes.

Assess how well the subject of the data fits your research question. Determine whether it includes the concept you want to measure and whether it states the unit of analysis or observation. Determine whether the data is aggregate, tabular, or individual records, and ensure that this form will work for your research. If the data comes from a survey or interviews and you need individual responses, ensure that they’re included.

The characteristics of the sample

The sample should be large enough and representative enough to support statistically valid conclusions. When applicable, it should cover the geographic locations and the years that you are researching. Check that any other limiting factors on the data are compatible with your research.

The reason why the data was collected.

Research data is more easily re-purposed for other research than data collected for other purposes. Data collected for other purposes, such as health records or data collected by the census, is called administrative data. Administrative data may not be adequately documented or structured for research. If administrative data can be used for research, it may require much more data cleaning and management.

The quality of the data

The data is likely to be of better quality when quality control procedures were in place during collection, so look for a description of these procedures in the documentation. If the data was the basis for a published research paper, it has been through peer review, which is an indicator of good quality. If the data is curated by an expert, it’s also more likely to be of better quality.

Who uses the data

If nobody is using the dataset, that is a red flag. By asking who is using this data (local or not), you can connect with other researchers to work out gaps in documentation and learn from their experience. In addition to looking at published works, look at working papers, unpublished works, and preprints to find out who is using the data.

Data Use Agreements

Some data you discover online may be restricted-use because it contains confidential or protected information. In these instances, a Data Use Agreement (DUA) may be required. Individual UMBC researchers aren't authorized to sign DUAs; only the UMBC Office of Sponsored Programs (OSP) can sign a DUA. For more information on DUAs and procedures for submitting one to OSP for review, approval, and signature, see the OSP Data Use Agreement page, here:

Citing Data and Statistics

Citations for data or statistical tables should include at least the following pieces of information, which you will need to arrange according to the citation style you use.  

  • Author or creator - the person(s), organization, or issuing agency or agencies responsible for creating the dataset
  • Date of publication - the year the dataset was published, posted, or otherwise released to the public (not the date of the subject matter)
  • Title or description - the complete title or, if no title exists, a brief description of the data that you create, including the time period covered in the data if applicable
  • Publisher - the entity (organization, database, archive, journal) responsible for hosting the data
  • URL or DOI - the unique identifier if the data set is online

Certain styles may also ask for additional information such as:

  • Edition or version
  • Date accessed online (Note: APA does not require this)
  • Format description e.g. data file, database, CD-ROM, computer software

Tips for finding additional citation guidance:

  • If there is a readme file with the data, there may be a recommended citation in it.
  • Check to see if the publisher or distributor of your dataset provides suggestions for citing their data. For example, data providers like OECD and repositories like ICPSR and Dryad offer guidance for formatting citations to the hundreds of data files they host or produce.
  • Look through your style manual for instructions on using a similar format such as citation styles for electronic resources, electronic references, web pages, or tables.

Unless otherwise noted, the basic elements and guidelines described here are from the Publication Manual of the American Psychological Association, 6th edition (McHenry Reference Desk BF 76.7 .P83 2010).  You may also wish to consult the Purdue OWL or How to Cite Data from Michigan State University for MLA examples and explanations.


1. Include format type in brackets [ ] to describe format, not title information (e.g. data set, data file and codebook). [See APA guidelines for "Nonroutine information in titles" (p. 186)]

2. Use “Available from” if the URL or DOI points you to a website or information on how to obtain or download data at a general site that houses data sets. Use “Retrieved from” if the URL or DOI takes you directly to the data table or database. (APA Style Manual, 2001 ed., p. 281 or Purdue OWL Electronic Sources: Data Sets)

I. Data sets:

Author/Rightsholder, A. A. (Year). Title of publication or data set (Version number if available) [Data File]. Retrieved from (or available from) http://xxxx

The title of the data set should be italicized unless the data set is included as part of a larger work or volume
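The data set template above can be assembled programmatically once the required elements are collected. The following is a minimal Python sketch, not an official APA tool: the class name DatasetCitation, the function format_dataset_citation, and the example values are all hypothetical, and the URL is the same http://xxxx placeholder used in the template. Plain text cannot show italics, so italicizing the title is left to you.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetCitation:
    """Citation elements listed in this guide (hypothetical container)."""
    author: str                      # author or rightsholder
    year: int                        # year the data set was released
    title: str                       # title or brief description
    url: str                         # URL or DOI
    version: Optional[str] = None    # version number, if available
    format_description: str = "Data file"
    direct_link: bool = True         # True if the URL opens the data itself


def format_dataset_citation(c: DatasetCitation) -> str:
    """Arrange the elements following the data set template in this guide."""
    version = f" ({c.version})" if c.version else ""
    # "Retrieved from" for a direct link to the data,
    # "Available from" for a general site that houses data sets.
    lead = "Retrieved from" if c.direct_link else "Available from"
    return (
        f"{c.author} ({c.year}). {c.title}{version} "
        f"[{c.format_description}]. {lead} {c.url}"
    )


# Made-up example values; http://xxxx is a placeholder, not a real address.
example = DatasetCitation(
    author="Example Agency",
    year=2020,
    title="Example survey data",
    url="http://xxxx",
    version="Version 2",
)
print(format_dataset_citation(example))
```

Setting direct_link=False switches the wording to "Available from", matching note 2 above. Other citation styles arrange the same elements differently, so treat this only as an illustration of the APA-style pattern shown here.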

Example of data set:

The World Bank, World Development Indicators (2012). GNI per capita, Atlas method  [Data file]. Retrieved from

Example of Table generated from an interactive data set:

Bureau of Economic Analysis, U.S. Department of Commerce (2013). U.S. Direct Investment Abroad, All U.S. Parent Companies 2009-2010 [Data file]. Available from

II. Table from a publication 

Author. (Year). Title of entry. In Editor (Edition), Title of publication (pp. xxx-xxx). Retrieved from http:// OR Location: Publisher OR doi:xxxx.

Example of a Table from a publication: (Note: Editor & Edition elements are not applicable in this example)

World Trade Organization. (2012). Table I.3: World merchandise trade and trade in commercial services by region and selected economy, 2005-2011.  In International Trade Statistics, 2012 (p. 22).  Retrieved from:

The title of the data set should be italicized unless the data set is included as part of a larger work or volume, as in the example above.