GLEON Data Repositories

In collaboration with DataONE and CUAHSI, GLEON offers its members two solutions for archiving data and making them discoverable and reusable by others.

DataONE and CUAHSI provide distinct and complementary services to data users and providers. CUAHSI, the Consortium of Universities for the Advancement of Hydrologic Science, Inc., is a 501(c)(3) research organization representing more than 100 U.S. universities and international water science-related organizations. CUAHSI receives funding from the National Science Foundation (NSF) to advance water science in the United States. The Data Observation Network for Earth (DataONE, https://www.dataone.org/) is a distributed framework and sustainable cyberinfrastructure that meets the needs of science and society for open, persistent, robust, and secure access to well-described and easily discovered Earth observational data. DataONE is also funded by the NSF.

GLEON has recently established a DataONE member node (the GLEON Member Node), through which members can archive data in DataONE. The GLEON Information Technology working group recommends that sites and members archive static, harmonized datasets by submitting them to the GLEON DataONE Repository. Datasets in DataONE are discoverable and accessible to the public, but access can be restricted when necessary. Every DataONE dataset in the GLEON Repository is assigned a unique digital object identifier (DOI; http://en.wikipedia.org/wiki/Digital_object_identifier) that can be used to cite the dataset and give attribution to its authors. DataONE can archive any data type: time series of high-frequency streaming sensor data, data derivatives, combined streaming datasets, and manually collected datasets.

In contrast, the CUAHSI system is ideal for making streaming high-frequency sensor data accessible dynamically, offering near-real-time access along with subsetting, plotting, and editing functions. Users can discover the data through the HydroDesktop (http://hydrodesktop.codeplex.com/) mapping tool. Data providers perform an initial setup, after which streaming data are easily loaded into the system. Datasets within the CUAHSI system are harmonized, that is, they share a consistent structure and format, and can therefore easily be combined for multi-site analyses. Because of its rigorous metadata requirements, CUAHSI data holdings can be easily analyzed, centrally quality-controlled, and quickly visualized. Downstream applications developed for CUAHSI can be applied broadly across the user community.

Although data systems will evolve, data providers who invest now in archiving their data through DataONE and CUAHSI will be better positioned to adapt to those changes, because their data are well documented.

Characteristics of GLEON services to support reproducible research:

DataONE | CUAHSI
Datasets are immutable; updates create new versions of the dataset | Datasets can be changed by the data provider without notification to data users
Providers can submit metadata-only records with provider contact information, so that users can contact the data provider directly for access |
Datasets can be public, or access can be restricted to selected users | All data are public
Each dataset receives a unique DOI |
Current cost: free | Current cost: free
Major data wrangling needed to combine data from different providers | Little data wrangling needed by the user
Lower effort to submit metadata and data | Significant effort to set up metadata the first time
Dataset-by-dataset QA/QC, visualization, and analysis | Potential for streamlined QA/QC, visualization, and analysis across multiple datasets
Web search and access to datasets, or direct data access through scripting languages | Requires a Windows machine to run HydroDesktop for search and access, or direct data access through scripting languages
Any data type | Designed for high-frequency time series sensor data
Data downloads are tracked |
Metadata structure is defined, but content is free-form | Highly defined metadata structure and content
| Defined semantics and controlled vocabulary
Not queryable within datasets | Can search within and across datasets and download the combined results
Data are archived long-term on the GLEON member node and backed up on other nodes by DataONE | Datasets are stored in the cloud and backed up centrally by CUAHSI
No requirements for data structure or format (no harmonization required of the data provider); any schema may be used | Provider must submit data in a harmonized (consistent) structure, so no harmonization is required of the data user to combine datasets
Web services: REST API | Web services: SOAP, but not REST
Metadata in XML standards such as EML and FGDC; can hold any data type | Only WaterML data types
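
DataONE datasets can be reached directly from scripting languages through its REST API. As a minimal sketch only, the Python snippet below builds a search URL for DataONE's coordinating-node Solr query service, restricted to metadata records held by the GLEON member node. The endpoint URL, the Solr field names (datasource, formatType, title, identifier, dateUploaded), and the node identifier urn:node:GLEON are assumptions to check against current DataONE documentation, and build_dataone_query is a hypothetical helper, not part of any DataONE library.

```python
from urllib.parse import urlencode

# Assumption: DataONE coordinating-node Solr search endpoint (API v2).
CN_SOLR_ENDPOINT = "https://cn.dataone.org/cn/v2/query/solr/"

# Assumption: node identifier of the GLEON member node; verify before use.
GLEON_NODE_ID = "urn:node:GLEON"


def build_dataone_query(keyword, node_id=GLEON_NODE_ID, rows=10):
    """Hypothetical helper: build a Solr search URL for metadata records.

    It only constructs the URL; fetching it (for example with
    urllib.request.urlopen) requires network access.
    """
    params = {
        # Assumed fields: limit results to metadata records from one member
        # node whose title matches the keyword
        "q": f'datasource:"{node_id}" AND formatType:METADATA AND title:{keyword}',
        # Assumed fields to return for each matching record
        "fl": "identifier,title,dateUploaded",
        "rows": rows,
        # Ask Solr for a JSON response
        "wt": "json",
    }
    # urlencode percent-encodes the query values (spaces become '+')
    return CN_SOLR_ENDPOINT + "?" + urlencode(params)
```

For example, build_dataone_query("temperature", rows=5) returns a URL that can be opened in a browser or fetched from a script. If the assumptions above hold, the JSON response should list the identifiers and titles of matching GLEON records, and an identifier can then be used to cite or retrieve the dataset.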
