
Research Data

UMBC Policies and Procedures

All human subject research at UMBC requires the approval of the Institutional Review Board (IRB), whose responsibility it is to advocate for ethical standards, safeguards, and protection of human research participants. For information on the IRB process, see The Office of Research & Creative Achievement's "Overview of the IRB Process" webpage.

For some types of human subject research, and for other confidential research that uses data from an outside agency, UMBC requires a Data Use Agreement (DUA). Information on the DUA process is available on the Office of Research & Creative Achievement's "Data Use Agreements" webpage.

Principles of Human Subject Research

The Belmont Report outlines three basic ethical principles to guide the protection of human subjects in research:

Principle | Description | Related Topics
Respect for persons | The autonomy of all participants in human subjects research must be respected. | Informed consent (UMBC Consent Guidelines and Templates), Anonymity and Confidentiality
Beneficence | Research should maximize benefits to human subjects and minimize harms. | Debriefing, Right to withdraw
Justice | Research should be well considered, non-exploitative, and administered fairly. | Inclusion/exclusion

UMBC Consent Guidelines and Templates


Data Anonymization

FDP Tool for Classifying Human Subjects Data

18 HIPAA Identifiers that comprise Personally Identifiable Information (PII)

HIPAA – Limited Data Set

FERPA – Personally Identifiable Information


PII can be used, alone or combined with other sources, to identify an individual. When PII is linked with medical records (including payments for medical care), it becomes Protected Health Information (PHI).

  1. Name (including initials)
  2. Address (all geographic subdivisions smaller than state: street address, city, county, zip code)
  3. All elements (except years) of dates related to an individual (including birthdate, admission date, discharge date, date of death, and exact age if over 89)
  4. Telephone numbers
  5. Fax number
  6. Email address
  7. Social Security Number
  8. Medical record number
  9. Health plan beneficiary number
  10. Account number
  11. Certificate or license number
  12. Any vehicle identifiers, including license plate
  13. Device identifiers and serial numbers
  14. Web URL
  15. Internet Protocol (IP) Address
  16. Finger or voice print
  17. Photographic image - Photographic images are not limited to images of the face
  18. Any other characteristic that could uniquely identify the individual
A data set containing any of these identifiers, or any part of one, is considered “identified.”
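As a rough first-pass check, a project could compare its column names against its own mapping of the identifiers listed above. The sketch below assumes hypothetical column names (such as `zip_code` or `ssn`); it only catches identifiers stored in clearly named columns, so identifiers buried in free text, or "any other characteristic" under item 18, still require human review.

```python
# Hypothetical column names standing in for the 18 HIPAA identifiers;
# a real project would map its own column names to the list above.
HIPAA_IDENTIFIER_COLUMNS = {
    "name", "initials", "street_address", "city", "county", "zip_code",
    "birth_date", "admission_date", "discharge_date", "death_date",
    "phone", "fax", "email", "ssn", "medical_record_number",
    "health_plan_id", "account_number", "license_number", "vehicle_id",
    "device_serial", "url", "ip_address", "fingerprint", "voiceprint", "photo",
}


def identifier_columns(columns):
    """Return the columns whose (case-insensitive) name matches a listed identifier."""
    return sorted(c for c in columns if c.strip().lower() in HIPAA_IDENTIFIER_COLUMNS)


def is_identified(columns):
    """Treat a data set with any identifier column as identified."""
    return bool(identifier_columns(columns))


cols = ["Study_ID", "ZIP_Code", "Diagnosis"]
print(identifier_columns(cols))  # ['ZIP_Code']
print(is_identified(cols))       # True
```

A name-based check like this can only flag obvious columns; it cannot prove a data set is not identified.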

A Limited Data Set must omit all of the HIPAA identifiers listed above except for the following:

  1. City, state, and ZIP code
  2. Dates of admission, discharge, and service; date of birth; and date of death
  3. Ages in years, months, days, or hours

To reiterate: initials are always considered PHI/PII.
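The rule above can be sketched as a filter that drops every identifier not on the allowed list. The field names here are hypothetical, and the removal set is an assumption about how a project might label its fields; it is not exhaustive and would need to match the project's actual data.

```python
# Hypothetical field names for identifiers a Limited Data Set must remove.
# Allowed identifiers (city, state, zip_code, the listed dates, ages) are
# simply not in this set, so they pass through along with non-identifying data.
LDS_REMOVED_IDENTIFIERS = {
    "name", "initials", "street_address", "phone", "fax", "email", "ssn",
    "medical_record_number", "health_plan_id", "account_number",
    "license_number", "vehicle_id", "device_serial", "url", "ip_address",
    "fingerprint", "voiceprint", "photo",
}


def to_limited_data_set(record):
    """Return a copy of one record (a dict) without the removed identifiers."""
    return {field: value for field, value in record.items()
            if field not in LDS_REMOVED_IDENTIFIERS}


patient = {"name": "J. Doe", "initials": "JD", "zip_code": "21250",
           "admission_date": "2023-04-02", "age": 47, "diagnosis": "asthma"}
print(to_limited_data_set(patient))
# {'zip_code': '21250', 'admission_date': '2023-04-02', 'age': 47, 'diagnosis': 'asthma'}
```

Note that `initials` is dropped along with the full name, in line with the reminder that initials are always PHI/PII.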


HIPAA – De-identified Data

For a data set to be considered de-identified, all 18 HIPAA identifiers listed above must be removed, with the following caveats:

  1. Geographic subdivisions smaller than a state must be removed, except that the initial three digits of a ZIP code may be kept if, according to current publicly available Census data: (1) the geographic unit formed by combining all ZIP codes with the same three initial digits contains more than 20,000 people; and (2) the initial three digits of a ZIP code for all such geographic units containing 20,000 or fewer people are changed to 000.
  2. Ages may be kept in years, but all ages over 89 must be aggregated into a single category of 90 or older.
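The ZIP code and age caveats above can be sketched as small generalization functions. This is a minimal illustration under stated assumptions: `zip3_population` is a hypothetical lookup that a project would build from Census data, the populations in the example are made up, and dates are reduced to the year except for people over 89, whose birth year would also reveal their age.

```python
from datetime import date


def safe_harbor_zip(zip_code: str, zip3_population: dict) -> str:
    """Generalize a ZIP code to its first three digits, or to '000' for small areas.

    zip3_population is a hypothetical lookup {first three digits: population}
    that a project would build from current publicly available Census data.
    """
    zip3 = zip_code[:3]
    # Prefixes missing from the lookup are treated as small, erring toward caution.
    if zip3_population.get(zip3, 0) > 20_000:
        return zip3
    return "000"


def safe_harbor_age(age_years: int):
    """Keep ages in years, but collapse every age over 89 into '90+'."""
    return "90+" if age_years > 89 else age_years


def safe_harbor_birth_year(birth_date: date, age_years: int):
    """Reduce a birth date to its year; withhold even the year for anyone over 89."""
    return None if age_years > 89 else birth_date.year


# Made-up populations for illustration only, not real Census figures.
populations = {"111": 45_000, "222": 8_000}
print(safe_harbor_zip("11101", populations))         # 111
print(safe_harbor_zip("22205", populations))         # 000
print(safe_harbor_age(93), safe_harbor_age(47))      # 90+ 47
print(safe_harbor_birth_year(date(1976, 5, 1), 47))  # 1976
```

ZIP codes are handled as strings so that leading zeros are preserved.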

In the context of FERPA, PII includes, but is not limited to:

  1. Student’s name
  2. The name of the student’s parent(s) or other family members
  3. Address of the student or student’s family
  4. Student’s personal identifiers, such as:
    1. Social Security Number;
    2. Student number; or
    3. Biometric record (e.g., fingerprint or voiceprint)
  5. Student’s other indirect identifiers, such as:
    1. Birthdate;
    2. Place of birth; or
    3. Mother’s maiden name
  6. Other information that, alone or in combination, is linked or linkable to a specific student that would allow a reasonable person in the school community, who does not have personal knowledge of the relevant circumstances, to identify the student with reasonable certainty
  7. Information requested by a person who the educational agency or institution reasonably believes knows the identity of the student to whom the education record relates
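Item 6 covers information that identifies a student only in combination. One common way to spot such combinations, offered here as an illustrative assumption rather than a FERPA requirement, is to count how many records share each combination of indirect identifiers (the idea behind k-anonymity). The field names and records below are made up.

```python
from collections import Counter


def small_groups(records, quasi_identifiers, k):
    """Return combinations of the given fields that are shared by fewer than k records."""
    counts = Counter(tuple(r[f] for f in quasi_identifiers) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}


students = [  # made-up example records
    {"birth_year": 2003, "birthplace": "Baltimore", "major": "Physics"},
    {"birth_year": 2003, "birthplace": "Baltimore", "major": "Physics"},
    {"birth_year": 2004, "birthplace": "Lima", "major": "History"},
]
print(small_groups(students, ["birth_year", "birthplace"], k=2))
# {(2004, 'Lima'): 1}
```

A combination held by a single record, like the one flagged here, is a candidate for generalization or suppression before data are shared.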


Free Data De-Identification Tools

Note that a person must review the output of these tools to confirm that all of the data has been properly de-identified.

NLM Scrubber

Files must be plain text. It works on free text such as medical histories and lab reports.

NLM Scrubber Website

NLM Scrubber User Manual

NLM Scrubber Product Guide 

CliniDeID

Files must be plain text or SQL. It works on free text such as discharge summaries. Java is required.

CliniDeID Website

The MITRE Identification Scrubber Toolkit

Files must be plain text. It works on free text such as lab reports and orders. It requires Java and Python.

MITRE Identification Scrubber Toolkit Website

ARX Anonymization Tool

It works on tabular data in SQL, CSV, or Excel files.

ARX Anonymization Tool Website
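The tools listed here use far more sophisticated methods than simple pattern matching. As a rough illustration of the basic idea, and of why human review is still needed, the toy scrubber below replaces a few identifiers that have a predictable format. The patterns are assumptions covering only US-style SSNs and phone numbers, email addresses, and IPv4 addresses; it is not how any of the listed tools works.

```python
import re

# Toy patterns for identifiers with a predictable format. Order matters:
# SSNs are replaced before the looser phone-number pattern runs.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE": re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-.]\d{4}\b"),
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def scrub(text):
    """Replace each pattern match with a bracketed label such as [SSN]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


note = "Pt John Smith, SSN 123-45-6789, email jsmith@example.com, call 410-555-0123."
print(scrub(note))
# Pt John Smith, SSN [SSN], email [EMAIL], call [PHONE].
```

The patient's name survives untouched, which is exactly the kind of miss that a human reviewer has to catch.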


Indigenous Data

The International Indigenous Data Sovereignty Interest Group (within the Research Data Alliance) is a network of nation-state-based Indigenous data sovereignty networks and individuals that developed the ‘CARE Principles for Indigenous Data Governance’ (Collective Benefit, Authority to Control, Responsibility, and Ethics) in consultation with Indigenous Peoples, scholars, non-profit organizations, and governments:

Principle | Description
Collective Benefit | Data ecosystems shall be designed and function in ways that enable Indigenous Peoples to derive benefit from the data.
Authority to Control | Indigenous Peoples' rights and interests in Indigenous data must be recognized and their authority to control such data be empowered. Indigenous data governance enables Indigenous Peoples and governing bodies to determine how Indigenous Peoples, as well as Indigenous lands, territories, resources, knowledge, and geographical indicators, are represented and identified within data.
Responsibility | Those working with Indigenous data have a responsibility to share how those data are used to support Indigenous Peoples' self-determination and collective benefit. Accountability requires meaningful and openly available evidence of these efforts and the benefits accruing to Indigenous Peoples.
Ethics | Indigenous Peoples' rights and wellbeing should be the primary concern at all stages of the data life cycle and across data ecosystems.