"Discover Data" is borrowed from the NYU Libraries Data Management Planning LibGuide section titled "Selecting a Repository" and is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Inter-University Consortium for Political and Social Research (ICPSR) An international consortium of about 700 academic institutions and research organizations, ICPSR provides leadership and training in data access, curation, and methods of analysis for the social science research community. ICPSR maintains a data archive of more than 500,000 files of research in the social sciences. UMBC is a member, so all authorized library users have free, full access. Be sure to use your UMBC email address when creating an account so that you are authenticated as a member. Here is more of what ICPSR offers you as a member:
15,000+ Studies from 40 social and behavioral science disciplines: Find data here: https://www.icpsr.umich.edu/web/pages/ICPSR/index.html
5 million variables: Search and compare variables here: https://www.icpsr.umich.edu/web/pages/ICPSR/ssvd/
A bibliography of data-related publications containing 95,000+ publications: Find data-related publications here: https://www.icpsr.umich.edu/web/pages/ICPSR/citations/
Will receive, enhance, clean, and share your data and technical documentation: see the ICPSR Data Preparation Guide
A help desk with on-demand data support: Email icpsr-help@umich.edu
Tracks how often datasets are downloaded or accessed: The number of downloads and a link to a usage report appear with every dataset
Assists with data management plans: Information on data management and curation with ICPSR is available here: https://www.icpsr.umich.edu/web/pages/datamanagement/index.html
Workshops and webinars: See their events page for a schedule: https://www.icpsr.umich.edu/web/pages/about/events.html.
Lesson plans and exercises for faculty teaching data to undergraduates: See https://www.icpsr.umich.edu/web/pages/instructors/
Resources for students: See https://www.icpsr.umich.edu/web/pages/instructors/student-resources.html.
Qualitative Data Repository. QDR is a dedicated repository for preserving and sharing the digital assets associated with social science and mixed methods projects. It was founded with support from the National Science Foundation and the Center for Qualitative and Multi-Method Inquiry, a unit of the Maxwell School of Citizenship and Public Affairs at Syracuse University.
Australian Social Science Data Archive. From the Australian Demographic and Social Research Institute at the Australian National University.
CESSDA Data Portal. From the Council of European Social Science Data Archives (CESSDA).
National Neighborhood Data Archive (NaNDA). The National Neighborhood Data Archive (NaNDA) is a publicly available data archive containing measures of the physical, economic, demographic, and social environment at multiple spatial scales (e.g., census tract, ZIP code tabulation area, county). Each NaNDA dataset covers all or most of the nation (including both rural and urban areas) and represents a set of measures on a single topic of interest, including socioeconomic disadvantage, healthcare, housing, partisanship, and public transit, with temporal coverage dating back to 2000.
Digital Repositories E-Science Network (DReSNeT). From the UK Engineering & Physical Sciences Research Council (EPSRC). A network of social science repositories for texts and data.
Astronomy
Astronomical Data Archives Center From the National Astronomical Observatory of Japan.
Astrophysics Data System From the Smithsonian Astrophysical Observatory (SAO) and National Aeronautics and Space Administration (NASA).
National Space Science Data Center From the US National Aeronautics and Space Administration (NASA).
Biology
The Cell: An Image Library Images of all cell types from all organisms, including intracellular structures and movies or animations demonstrating functions. This project relies upon the cell biology community to populate the library. It is a freely accessible, easy-to-search public repository of reviewed and annotated images, videos, and animations of cells from a variety of organisms, showcasing cell architecture, intracellular functionalities, and both normal and abnormal processes.
DataBasin OA data in conservation. From the Conservation Biology Institute in partnership with Rhiza Labs.
GenBank The NIH genetic sequence database, an annotated collection of all publicly available DNA sequences (Nucleic Acids Research, 2013 Jan;41(D1):D36-42). GenBank is part of the International Nucleotide Sequence Database Collaboration, which comprises the DNA DataBank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at NCBI.
Global Biodiversity Information Facility (GBIF) "Free and open access to biodiversity data." Launched in 2007 by institutions in 17 countries under a non-binding inter-governmental agreement.
MorphoBank "Homology of phenotypes over the web." Hosted by the State University of New York at Stony Brook.
Morphbank Holds biological imaging documents from a wide variety of research, including specimen-based research in comparative anatomy, morphological phylogenetics, taxonomy, and related fields focused on increasing our knowledge about biodiversity. The project receives its main funding from the Biological Databases and Informatics program of the National Science Foundation (Grant DBI-0446224).
TreeBASE "A Database of Phylogenetic Knowledge." Released in March 2010 based on a prototype launched in 1994. Hosted by the Phyloinformatics Research Foundation.
Chemistry
The Cambridge Crystallographic Data Centre (CCDC) The CCDC is a non-profit, charitable institution whose objectives are the general advancement and promotion of the science of chemistry and crystallography for the public benefit.
Crystallography Open Database A joint project of the Mineralogical Society of America, Mineralogical Association of Canada, European Journal of Mineralogy, International Union of Crystallography, and the US National Science Foundation. Data are in the public domain.
ZINC "A free database of commercially-available compounds for virtual screening." From the Shoichet Laboratory in the Department of Pharmaceutical Chemistry at the University of California, San Francisco.
Computer Science
GitHub Keeps your public and private code available, secure, and backed up.
SourceForge 2.7 million developers create powerful software in over 260,000 projects. Our popular directory connects more than 46 million consumers with these open source projects and serves more than 2,000,000 downloads a day. SourceForge is where open source happens.
SNAP Stanford Large Network Dataset Collection. The SNAP library has been actively developed since 2004 and is organically growing as a result of our research pursuits in analysis of large social and information networks. The largest network we analyzed so far using the library was the Microsoft Instant Messenger network from 2006, with 240 million nodes and 1.3 billion edges.
Energy
DOE Data Explorer From the US Department of Energy (DOE). Data generated by DOE-sponsored research.
OpenEI: Open Energy Information Freely-available energy data, tools, models, and other resources.
Environmental Sciences
Climate Change Data Portal From the Environment Department of the World Bank.
The Marine Geoscience Data System (MGDS) The Marine Geoscience Data System (MGDS) provides access to data portals for the NSF-supported Ridge 2000 and MARGINS programs, the Antarctic and Southern Ocean Data Synthesis, the Global Multi-Resolution Topography Synthesis, and Seismic Reflection Field Data Portal.
National Ecological Observatory Network (NEON). A joint project of 50+ US universities and laboratories.
Geology
GSA Data Repository From the Geological Society of America.
IRIS (Incorporated Research Institutions for Seismology). From 100+ US universities and the National Science Foundation.
Geosciences & Geospatial Data
EarthChem Holds data systems and services for geochemical, geochronological, and petrological data, developed and maintained by EarthChem, including the EarthChem Library, the EarthChem Portal, PetDB, NAVDAT, SedDB, and Geochron. EarthChem is operated by a joint team of disciplinary scientists, data scientists, data managers and information technology developers who are part of the NSF-funded data facility Integrated Earth Data Applications (IEDA).
Geodata Repository From the Open Source Geospatial Foundation.
The Geosciences Network (GEON) This project is a collaboration among a dozen PI institutions and a number of other partner projects, institutions, and agencies to develop cyberinfrastructure in support of an environment for integrative geoscience research. GEON is funded by the NSF Information Technology Research (ITR) program.
National Geophysical Data Center An archive of national and international marine environmental and ecosystem datasets.
The National Space Science Data Center This serves as the permanent archive for NASA space science mission data. "Space science" means astronomy and astrophysics, solar and space plasma physics, and planetary and lunar science. As the permanent archive, NSSDC teams with NASA's discipline-specific space science "active archives," which provide data access to researchers and, in some cases, to the general public.
Medicine
All of Us Research Hub The Research Hub houses one of the largest, most diverse, and most broadly accessible datasets ever assembled. It also provides an interactive Data Browser where anyone can learn about the type and quantity of data that All of Us collects. Users can explore aggregate data including genomic variants, survey responses, physical measurements, electronic health record information, and wearables data.
Gene Expression Omnibus From the U.S. National Center for Biotechnology Information of the National Institutes of Health.
MIRAGE (Middlesex medical Image Repository with a CBIR ArchivinG Environment). From JISC and Middlesex University.
National Center for Biotechnology Information (NCBI) The National Center for Biotechnology Information advances science and health by providing access to biomedical and genomic information.
NeuroMorpho Neuronal morphology data. From the Krasnow Institute for Advanced Study at George Mason University.
Virginia Henderson Global Nursing e-Repository Nursing research data.
Physics
Blue Obelisk Data Repository Repository of isotope masses, under MIT license. From the Blue Obelisk. Described in 10.1021/ci050400b.
CERN Scientific Information Online particle physics data and information.
NIST Atomic Spectra Database The Atomic Spectra Database (ASD) contains data for radiative transitions and energy levels in atoms and atomic ions. Data are included for observed transitions of 99 elements and energy levels of 56 elements.
DataONE An international federation of data repositories containing earth observations data, including data from fields such as ecology, biology, evolution, and environmental sciences such as hydrology, oceanography, and atmospheric science. DataONE is a federation with participation from hundreds of field stations, universities, and government agencies through the DataONE Member Nodes.
Dryad An international repository of data underlying scientific and medical publications, particularly data for which no specialized repository exists. All material in Dryad is associated with a scholarly publication. Most data in the repository are associated with peer-reviewed articles, although data associated with non-peer reviewed publications from reputable academic sources, such as dissertations, are also accepted. Dryad is a non-profit organization.
Entrez databases A directory of chemical, biochemical, biomedical, and medical databases from the U.S. National Center for Biotechnology Information of the National Institutes of Health.
FigShare FigShare allows you to share all of your data, negative results and unpublished figures.
KNB The Knowledge Network for Biocomplexity (KNB) is an international data repository containing ecology, biology, and environmental science data with a global distribution. The KNB is a grass-roots partnership of collaborating field stations, laboratories, and research networks that openly publish and share data. The KNB is a Member Node within the DataONE data federation.
PANGAEA Stands for "Publishing Network for Geoscientific & Environmental Data". Open to deposits from any scientist. Most datasets are open; some are restricted. Hosted by the Alfred Wegener Institute for Polar and Marine Research and the University of Bremen's Center for Marine Environmental Sciences.
Public Data Sets on AWS from Amazon Web Services. The site already hosts OA datasets in biology, chemistry, and economics, and is willing to host them in any field.