
Research Data

Active Storage Site Selection

"Active" or "working" storage refers to where you store your data while you're collecting and accessing it during the course of a project. Some storage options will meet your project's needs better than others.

UMBC has a campus-wide subscription to LabArchives for data collection and research documentation. It allows all research team members to work together and communicate, with granular access control, change history, offsite disaster-recovery backups, and more.

Box (UMBC FAQ), Google Drive (UMBC FAQ), and Microsoft OneDrive are available to UMBC faculty, staff, and students. DoIT has a chart comparing Box and Google Drive: https://wiki.umbc.edu/pages/viewpage.action?pageId=31916775

Factors to consider when choosing where to store your data:

  • Anticipated size of dataset--Will you exceed space quotas? Will a cloud service readily upload and download files of that size?
  • Computational requirements--Do you need high-performance processors for large-scale analysis? If so, consider using the UMBC High Performance Computing Facility (HPCF), which requires working in the Linux operating system. The HPCF isn't appropriate for storing data that doesn't need its processors. It also doesn't provide backups or version control the way cloud storage does, so you'll need to manage those yourself.
  • Sharing capabilities and permission settings--Do you have a project team that will need to access the data? Do you want to limit what student assistants or other project participants can do?
  • Version control--Will it be helpful to have a history of the changes made to your data? Will your storage do this automatically for you? Or do you need to design and use a version control table?
  • Backup--Will the storage back up your data automatically? Or do you need to set up a backup yourself?
  • Security--Will it meet IRB standards for storage of human subjects data? Is the data encrypted? Does it use secure transmission channels? Does it require strong passwords?
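
If your storage does not keep a change history for you, the "version control table" mentioned in the list above can be as simple as a CSV file with one row per saved version. The following is a minimal sketch, not a prescribed format: the column names, the `log_version` helper, and the use of a SHA-256 checksum to identify each version are all illustrative assumptions.

```python
import csv
import hashlib
from datetime import date
from pathlib import Path

# Hypothetical column layout for a version control table.
COLUMNS = ["version", "date", "author", "file", "sha256", "note"]


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def log_version(table: Path, data_file: Path, version: str,
                author: str, note: str) -> None:
    """Append one row describing the current state of data_file."""
    is_new = not table.exists()
    with table.open("a", newline="") as handle:
        writer = csv.writer(handle)
        if is_new:
            # Write the header only when the table is first created.
            writer.writerow(COLUMNS)
        writer.writerow([
            version,
            date.today().isoformat(),
            author,
            data_file.name,
            sha256_of(data_file),
            note,
        ])
```

Calling `log_version(Path("versions.csv"), Path("survey.csv"), "v2", "jdoe", "removed duplicate rows")` after each meaningful change would build up a readable history. The checksum lets you confirm later that a file still matches the version that was logged.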

Backup

3-2-1 RULE

To keep data safe, follow the 3-2-1 Rule: maintain three copies of your data on two different storage types, with one of those copies offsite.
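
The rule can be written down as a small check. This is only an illustrative sketch: the `Copy` record and the `satisfies_3_2_1` function are hypothetical names, and what counts as a distinct "storage type" is left to whatever labels you assign.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of a dataset (hypothetical record for illustration)."""
    label: str     # e.g. "laptop"
    medium: str    # storage type, e.g. "internal disk" or "cloud"
    offsite: bool  # True if stored away from your primary location


def satisfies_3_2_1(copies: list[Copy]) -> bool:
    """Check for 3+ copies, 2+ storage types, and 1+ offsite copy."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.medium for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite


plan = [
    Copy("laptop", "internal disk", offsite=False),
    Copy("external drive", "external disk", offsite=False),
    Copy("cloud storage", "cloud", offsite=True),
]
print(satisfies_3_2_1(plan))  # True
```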


3-2-1 WITH UMBC RESOURCES

Both Google Drive and Box have desktop applications (Google Filestream and Box Drive) that let you mount and access your files quickly. Once installed, each application creates a folder that looks just like a My Documents folder but is connected to your account on that service, so Google Drive or Box appears in your file explorer. It works like a two-way door: changes sync in both directions between your local computer and the cloud service.

This setup also makes it easy to follow the 3-2-1 rule:

  1. Sync data between the local copies (on each computer you use) and the Google Drive server located elsewhere.
    1. This gives you at least 2 copies on 2 different storage media, with 1 copy offsite.
  2. Back up the Google Drive folder on your laptop to an external hard drive whenever there are changes.
    1. This brings you to 3 copies on 2 media, with 1 copy offsite.
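
The second step above, backing up the synced folder to an external drive, can be done with dedicated backup software or with a short script. Below is a minimal sketch using only the Python standard library. The function name `backup_changed_files` and the example paths are hypothetical, and the script only copies new or changed files; it never deletes anything from the backup.

```python
import filecmp
import shutil
from pathlib import Path


def backup_changed_files(source: Path, destination: Path) -> list[Path]:
    """Copy new or changed files from source to destination.

    Returns the relative paths of the files that were copied.
    """
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        relative = src.relative_to(source)
        dst = destination / relative
        # Skip files whose backup copy already has identical contents.
        if dst.exists() and filecmp.cmp(src, dst, shallow=False):
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves timestamps
        copied.append(relative)
    return copied


# Example call (paths are placeholders for your own folders):
# backup_changed_files(Path.home() / "Google Drive",
#                      Path("/Volumes/Backup/Google Drive"))
```

Comparing full file contents is slow for very large datasets; for those, a tool built for incremental backups is likely a better fit.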

Data Collaboration and Sharing Tools

The Open Science Framework

The Open Science Framework (OSF) is a Web-based project management tool created by the Center for Open Science (COS). It was designed to promote research transparency, quality, and reproducibility. The OSF allows individuals or groups to develop a project workflow, organize data, develop documentation, and share all or part of a project with the greater research community.

  • You can set granular permissions (Administrator/Read+Write/Read) for every contributor on a project. Contributors can be from anywhere.
     
  • You can link in existing file storage and versioning tools such as Google Drive, Box, Amazon S3, Dropbox, and GitHub, as well as the bibliographic management tools Mendeley and Zotero. New integrations are added regularly. OSF also provides its own unlimited storage, though individual files must be 5GB or less.
     
  • Many file formats (Word, PowerPoint, Excel, PDF, JPEG) render in the OSF so that you can preview them.
     
  • OSF provides built-in version control that allows users to "check out" files for editing and check them back in before re-upload. All previous versions are accessible through the Revisions button.
     
  • You can register a project to provide a public "snapshot" of where it stands at a given point, which helps with transparency.
     
  • Because of the component structure, you can keep some parts of your project private (e.g. raw data) and others public (e.g. the analysis plan).
     
  • You can fork projects if you use the OSF to manage multiple, similarly structured projects (e.g. a lab template). You can also link projects (e.g. multi-site projects).
     
  • You can view analytics for each component of your project to see how many times it has been visited (unique views are tracked) and which pages are most popular.
     
  • You can send a "view only" link to those outside of the project (e.g. publishers, potential funders) for verification and transparency purposes.
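
Because OSF storage limits individual files to 5GB, it can help to scan a project folder for oversized files before uploading. The sketch below is one way to do that. The `files_over_limit` helper is a hypothetical name, and reading "5GB" as 5 × 10^9 bytes is an assumption; check OSF's own documentation for the exact threshold.

```python
from pathlib import Path

# Assumption: "5GB" is interpreted as decimal gigabytes.
OSF_FILE_LIMIT_BYTES = 5 * 10**9


def files_over_limit(folder: Path,
                     limit: int = OSF_FILE_LIMIT_BYTES) -> list[Path]:
    """Return every file under folder that is larger than limit bytes."""
    return sorted(
        path
        for path in folder.rglob("*")
        if path.is_file() and path.stat().st_size > limit
    )


# Example call (the folder name is a placeholder):
# for path in files_over_limit(Path("my_project")):
#     print(f"Too large for OSF storage: {path}")
```

Files flagged this way could be split, compressed, or kept in one of the linked storage services instead.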