You may be able to re-use both data that is freely available to the public and data that is not. When re-using data, always cite the dataset(s) that you're re-using.
Here are some considerations when thinking about re-using data:
Is the data available and what are the terms of use?
Free online data may be under a Creative Commons license that gives you permission to re-use the data in specific ways. More information on Creative Commons licenses and the permissions that they give you is available here: https://creativecommons.org/about/cclicenses/. If the open data that you want to re-use isn't under a Creative Commons license, check it for information on terms and constraints on re-use. For example, datasets in ICPSR are often listed as available to anyone affiliated with a member institution, and if you check the ICPSR member list, you'll find that UMBC is a member.
The confidentiality of health and education data is protected by HIPAA and FERPA, which may limit its accessibility. A data use agreement or license may be required, limiting how the data can be used. See below for more information on data use agreements. Data may be under a Creative Commons license granting the public certain rights. See the bottom section of UMBC's Guide to Open Access to learn more about Creative Commons licenses.
Availability of documentation.
Ideally, researchers include high-quality documentation with their data, but if information is missing, you may need to contact the researcher for more information. If you are working with data provided by a commercial vendor, the documentation may be included with the license, and you may need to contact the person or department who purchased the data to obtain that documentation. For documentation of government or organizational data, you usually need to contact the agency or organization directly.
Who or what the data describes.
Assess how well the subject of the data meshes with your research question. Determine whether it includes the concept you want to measure and whether it states the unit of analysis or observation. Determine whether the data is aggregate, tabular, or individual records, and ensure that it will work for your research. If the data comes from a survey or interviews and you need individual responses, ensure that they're included.
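One quick way to check the unit of observation is to count how many rows share each identifier. The following is a minimal Python sketch, assuming the data has been loaded as a list of records; the field names (respondent_id, county, income) and the records themselves are hypothetical and stand in for whatever the dataset actually contains.

```python
from collections import Counter


def rows_per_unit(rows, id_field):
    """Count how many rows share each value of the identifier column."""
    return Counter(row[id_field] for row in rows)


# Hypothetical records standing in for a downloaded dataset.
records = [
    {"respondent_id": "r1", "county": "Baltimore", "income": "52000"},
    {"respondent_id": "r2", "county": "Baltimore", "income": "61000"},
    {"respondent_id": "r3", "county": "Howard", "income": "70000"},
]

per_respondent = rows_per_unit(records, "respondent_id")
per_county = rows_per_unit(records, "county")

# Exactly one row per respondent suggests individual-level records;
# several rows per county means the data is not aggregated by county.
one_row_per_respondent = all(n == 1 for n in per_respondent.values())
one_row_per_county = all(n == 1 for n in per_county.values())
print(one_row_per_respondent, one_row_per_county)
```

If neither check matches what the documentation says about the unit of observation, that is a good question to raise with the data provider.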
The characteristics of the sample
The sample should be large enough and representative enough to support statistically meaningful conclusions. When applicable, it should cover the geographic locations and the years that you are researching. Check that any other limiting factors on the data will mesh with your research.
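Coverage of years and locations can be checked before committing to a dataset. The following is a minimal Python sketch under stated assumptions: the function name coverage_gaps, the field names (year, state), and the sample records are all hypothetical and would need to be adapted to the actual data.

```python
def coverage_gaps(rows, needed_years, needed_places,
                  year_field="year", place_field="state"):
    """Return the years and places the research needs but the data lacks."""
    years_present = {int(row[year_field]) for row in rows}
    places_present = {row[place_field] for row in rows}
    return {
        "missing_years": sorted(set(needed_years) - years_present),
        "missing_places": sorted(set(needed_places) - places_present),
    }


# Hypothetical records standing in for a downloaded dataset.
sample_rows = [
    {"year": "2019", "state": "MD", "value": "10"},
    {"year": "2020", "state": "MD", "value": "12"},
    {"year": "2020", "state": "VA", "value": "7"},
]

gaps = coverage_gaps(sample_rows,
                     needed_years=range(2019, 2022),
                     needed_places=["MD", "VA", "DE"])
print(gaps)
```

An empty list for both keys would mean every needed year and place appears at least once; it does not show that every place is covered in every year, which would need a check on the combinations.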
The reason why the data was collected.
Data collected for research is more easily re-purposed for other research than data collected for other purposes. Data collected for other purposes, such as health records or data collected by the census, is called administrative data. Administrative data may not be adequately documented or structured for research. If administrative data can be used for research, it may require considerably more data cleaning and management.
The quality of the data
The data is likely to be of better quality when quality control procedures were in place during collection, so look for these in the documentation. If the data was the basis for a published research paper, it has been through peer review, which is an indicator of good quality. If the data is curated by an expert, it's also more likely to be of better quality.
Who uses the data
If nobody is using the dataset, that is a red flag. By asking who is using this data (locally or elsewhere), you can connect with other researchers to work out gaps in documentation and learn from their experience. In addition to looking at published works, look at working papers, unpublished works, and preprints to find out who is using the data.