The NSF Arctic Data Center: A New Home for Arctic Research Data

By: Amber E. Budden, DataONE, Co-Principal Investigator Arctic Data Center; Matthew B. Jones, National Center for Ecological Analysis and Synthesis (NCEAS), Principal Investigator Arctic Data Center; and Mark P. Schildhauer, NCEAS, Co-Principal Investigator Arctic Data Center

In March 2016, the Arctic Data Center was launched and assumed preservation responsibility for Arctic research data from National Science Foundation (NSF) awards. The center serves as the NSF research community's primary repository for Arctic data preservation and data discovery, and is funded by the NSF via a five-year award. The Arctic Data Center currently lists 3,899 data sets spanning myriad research fields, such as plant ecology (Eissenstat 2016), glacial chemistry (McConnell 2015), oceanography (Aagaard et al. 2016), and limnology (Arp 2013), among many others.

The Arctic Data Center is the product of a national partnership led by the National Center for Ecological Analysis and Synthesis (NCEAS) at the University of California, Santa Barbara, together with the National Oceanic and Atmospheric Administration's National Centers for Environmental Information (NCEI) and the NSF-funded Data Observation Network for Earth (DataONE). Preserving data over the long term requires multiple partners that are institutionally stable and funded through diverse streams. The partnership behind the Arctic Data Center therefore ensures that multiple agencies are involved in data archival: NCEAS leads software development, while also enabling replication across the NOAA archives and participating in the large-scale DataONE federation.

Arctic Data Center Services

The NSF Arctic Data Center provides the data storage, curation, and discovery features needed to support NSF's Arctic research community. Though the Center is focused on data preservation, preservation is only one stage of a multi-stage data lifecycle. Data archival is embedded within a broader scientific mission, and many Arctic researchers and data producers have needs that go beyond preservation, including data management planning, data acquisition, analysis, and other stages of the data lifecycle. The Arctic Data Center therefore also provides tools that support researchers in those areas. Further, the Center can archive not just data but other research products, such as graphics, software, workflows, and provenance information, that encompass the entire research process.

The key deliverables for the Arctic Data Center, as laid out in the funding proposal to NSF, include: 1) a repository for NSF funded Arctic research data; 2) a user-friendly portal for data discovery and access; 3) tools to support data and metadata submission; 4) data recovery and support services; and 5) training, education, and outreach. Currently, the repository, search and discovery portal, and initial data submission tools are live, in addition to a team actively supporting investigators with their data upload and recovery needs. Over the next year, significant new capabilities and features supporting Arctic researchers will be added.

Data Discovery Portal and Upload Tool

Figure 1: Arctic Data Center discovery portal, showing the geographic distribution of data sets, the list of recently added data sets, and filtering tools for precisely searching for data of interest. Image courtesy of the Arctic Data Center.

The NSF Arctic Data Center interface allows users to search the extensive Arctic data collection using filters such as the name of the data creator, year, identifier, taxa, location, keywords, and others (see Figure 1). The discovery interface also provides a map-based overview of the spatial distribution of data sets and lets users zoom and pan to specific locations of interest, which helps in locating historical data for particular regions. Users can quickly see whether a record includes downloadable data, how many views the record has received, and a brief overview of its content. Opening a record displays rich metadata in a standardized, easy-to-read format, along with the option to download individual data files (see Figure 2). Users can also quickly copy citation information for each record.
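The same kinds of filters can, in principle, also be applied from a script. The sketch below is a minimal illustration, assuming the Center exposes the standard DataONE Member Node Solr query endpoint at the URL shown and that `origin` and `keywords` are the indexed creator and keyword fields; the endpoint path and field names are assumptions to verify against the Center's API documentation. The function only builds the query URL; fetching and parsing the results are left out.

```python
from urllib.parse import urlencode

# Assumed endpoint: the DataONE Member Node v2 Solr query API on arcticdata.io.
SOLR_ENDPOINT = "https://arcticdata.io/metacat/d1/mn/v2/query/solr/"


def build_search_url(creator=None, keyword=None, rows=10):
    """Build a Solr query URL that ANDs together optional creator and keyword filters.

    Note: values are wrapped in quotes but Solr special characters are not escaped,
    so this sketch is only suitable for simple names and keywords.
    """
    clauses = []
    if creator:
        # 'origin' is assumed to be the indexed data-creator field.
        clauses.append(f'origin:"{creator}"')
    if keyword:
        # 'keywords' is assumed to be the indexed keyword field.
        clauses.append(f'keywords:"{keyword}"')
    query = " AND ".join(clauses) if clauses else "*:*"  # no filters: match everything
    params = {
        "q": query,
        "fl": "id,title,origin",  # fields to return for each hit
        "rows": rows,             # maximum number of hits
        "wt": "json",             # ask Solr for a JSON response
    }
    return SOLR_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url(creator="Lynch", keyword="meteorology"))
```

Combining filters with `AND` mirrors how selecting several filters in the portal narrows the result list; a real client would send this URL with an HTTP library and read the `response.docs` array from the returned JSON.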

Figure 2: Example view of a metadata record describing meteorological data from Lynch 2016. Image courtesy of the Arctic Data Center.

Using the "Submit Data" button, authors are able to seamlessly upload and share their data from their desktop, contributing associated metadata and attaching data files. Once the data and metadata are reviewed and edited, Center staff assign a Digital Object Identifier so that the data are easily citable. For larger data sets, the Center supports automating uploads through scripting with common languages including R and MATLAB.
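Scripted uploads typically begin by describing each file to be submitted. DataONE-style repositories record a size and a checksum for every stored object, so a script usually computes these before sending anything. The sketch below, written in Python rather than R or MATLAB, only prepares those per-file descriptions; it does not contact the Center, and the function names and dictionary layout are hypothetical.

```python
import hashlib
import os
import tempfile


def describe_file(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the name, size in bytes, and hex checksum of one file.

    The file is read in fixed-size chunks so that large files need not fit in memory.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return {
        "name": os.path.basename(path),
        "size": os.path.getsize(path),
        "algorithm": algorithm,
        "checksum": digest.hexdigest(),
    }


def describe_directory(directory, algorithm="sha256"):
    """Describe every regular file directly inside `directory`, sorted by name."""
    entries = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):  # skip subdirectories and other non-files
            entries.append(describe_file(path, algorithm))
    return entries


if __name__ == "__main__":
    # Usage: describe a throwaway directory holding one small, made-up CSV file.
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "example.csv"), "w") as out:
            out.write("site,temp_c\nA,-3.5\n")
        for entry in describe_directory(tmp):
            print(entry)
```

Computing checksums locally lets a script confirm after upload that the repository received exactly the bytes that were sent.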

Community Participation and Training

To ensure that the tools and services developed by the Center meet the needs of the Arctic research community, a Steering Committee comprising leaders across domains has been established. This committee will guide the Center leadership in its activities and help set priorities for future development. The Center will also actively engage researchers at scientific society and other meetings, through webinars, and at training workshops to determine how to improve support for open, reproducible science in the Arctic.

Early-career researchers will be eligible to participate further in the Arctic Data Center through a fellowship program focused on data management and open science. Fellows will work in cohorts, receive training in data management and science communication, gain hands-on experience working within the Arctic Data Center environment, and participate in team meetings. We are currently recruiting for two new positions to oversee the fellows: one focused on training and outreach, and the other on data science activities.

The Arctic Data Center will also support an annual "Arctic Synthesis Science" Working Group to advance innovative, integrative Arctic science research, and to test and inform the usefulness of the Arctic Data Center repository for such research. The Working Groups follow the model of NCEAS synthesis working groups: scientists from multiple institutions carry out interdisciplinary work using existing data.

Finally, the Center will support annual data science training events, both at the Center in Santa Barbara and in association with meetings of the Arctic research community. These events will center on techniques and approaches for open science and data-driven synthesis, with topical foci on research computing, communication, collaboration, and other aspects of data science that support scientific synthesis. While our primary goal is to improve the research community's ability to use Arctic Data Center systems, we also anticipate making significant improvements to those systems based on feedback and usability testing conducted during these training events.

New and Upcoming Features

While the initial release of the Arctic Data Center's data systems is fully functional, we plan numerous new features to improve the usefulness, capabilities, and efficiency of the system for researchers. Current work focuses on evaluating and improving the submission system so that it accommodates complete metadata and offers a streamlined editing process. Over the course of the project we will add features for automated metadata and data quality checks and for advanced editing of provenance information, as well as support for analytical tools such as IDL and MATLAB to access and submit data to the system. These improvements will be driven by an extensive dialog with the Arctic research community, and we welcome suggestions for new features, services, and programs serving Arctic research.
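The planned automated metadata and data quality checks are not specified here. As a purely hypothetical sketch of the idea, the function below inspects a metadata record, represented as a plain Python dictionary, for missing required fields and for two simple consistency problems. The field names, the required-field list, and the record layout are invented for illustration and do not reflect the Center's actual metadata schema or checks.

```python
# Hypothetical required fields; not the Center's real schema.
REQUIRED_FIELDS = (
    "title",
    "creator",
    "abstract",
    "keywords",
    "geographic_coverage",
    "temporal_coverage",
)


def is_blank(value):
    """Treat None, whitespace-only strings, and empty containers as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() == ""
    if isinstance(value, (list, tuple, dict)):
        return len(value) == 0
    return False


def check_record(record):
    """Return a list of human-readable problems; an empty list means the record passed."""
    problems = [f"missing {field}" for field in REQUIRED_FIELDS if is_blank(record.get(field))]

    # Consistency check 1: ISO 8601 dates (YYYY-MM-DD) compare correctly as strings.
    temporal = record.get("temporal_coverage")
    if isinstance(temporal, dict) and all(k in temporal for k in ("start", "end")):
        if temporal["start"] > temporal["end"]:
            problems.append("temporal coverage starts after it ends")

    # Consistency check 2: latitudes must be valid and correctly ordered.
    bbox = record.get("geographic_coverage")
    if isinstance(bbox, dict):
        for key in ("north", "south"):
            lat = bbox.get(key)
            if lat is not None and not -90 <= lat <= 90:
                problems.append(f"{key} latitude {lat} out of range")
        north, south = bbox.get("north"), bbox.get("south")
        if north is not None and south is not None and south > north:
            problems.append("south bound lies north of north bound")

    return problems


if __name__ == "__main__":
    # Usage with a made-up record whose time range is reversed.
    example = {
        "title": "Example record",
        "creator": "A. Researcher",
        "abstract": "Illustrative abstract.",
        "keywords": ["example"],
        "geographic_coverage": {"north": 66.5, "south": 65.0},
        "temporal_coverage": {"start": "2002-01-01", "end": "2001-01-01"},
    }
    print(check_record(example))
```

Returning a list of messages rather than raising on the first failure lets a submitter see every problem with a record at once.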

Further information is available on the Arctic Data Center website.

References

Knut Aagaard, Rebecca Woodgate, and Thomas J. Weingartner. 2016. Alpha Helix 2001 Bering Strait Cruise Underway Data. NSF Arctic Data Center. doi:10.5065/D6N014NC.

Christopher D. Arp. 2013. Eastern Lake water hourly time series Temperature, depth, dissolved oxygen, conductivity, water level (CALON). NSF Arctic Data Center. doi:10.18739/A2K31V.

David Eissenstat. 2016. Greenland root phenology: warming and herbivory. NSF Arctic Data Center. doi:10.18739/A23H1N.

Amanda H. Lynch. 2016. Council Climate NCAR ISS 915 Mhz Profiler Winds (ASCII). NSF Arctic Data Center. doi:10.5065/D6474829, version: urn:uuid:76005b63-e42f-41ac-a76e-58e274781c61.

Joseph R. McConnell. 2015. Glaciochemical measurements in McCall Glacier (AK) upper cirque ice core. NSF Arctic Data Center. doi:10.18739/A2PS56.