CHAPEL HILL – RENCI and the School of Information and Library Science (SILS) at UNC Chapel Hill have joined a nationwide effort to build an integrated data management infrastructure that will help scientists better understand the complex processes of the planet Earth, from meteorology to solid earth sciences to hydrology and oceanography.
Earth Cube, a project of the National Science Foundation, seeks to transform geoscience research by creating a seamless data infrastructure that allows scientists to easily integrate, share and manage data and use a wide range of data analysis tools. The project launched in 2011 by seeking input from scientists and technologists. The NSF brought both groups together for a meeting in Washington, D.C., last November.
In early 2012, the NSF awarded several Early-concept Grants for Exploratory Research (EAGER) to research teams to develop plans for addressing the technical challenges of building a geoscience data infrastructure. One of those awards went to a team led by Reagan Moore, RENCI chief data scientist and a SILS professor. The team will use a relatively small NSF grant ($61,719) to develop its roadmap for implementing a data infrastructure.
“This will be a one-year effort to develop a roadmap that describes how we would implement a data infrastructure that supports collaboration, data sharing and preservation,” said Moore. “We are one of several groups looking at the problem and developing roadmaps to address it. It’s exciting to be involved in Earth Cube because we work closely with scientists who are at the forefront of important research related to the environment, climate change, water quality, and much more.”
Next year, the NSF will review the roadmaps and issue a call for proposals to develop prototype data infrastructures. Based on analysis of those prototypes, production-quality infrastructure will be developed that allows scientists to share models and analysis tools, collaborate over long distances, and share data created in different formats using different query terms and semantics.
Moore’s team, called the Layered Architecture team, will develop a plan for a geosciences data management environment that enables collaborative research. Scientists must be able to access community resources such as data repositories, information catalogs, processing pipelines, and web services while collaborating on research initiatives, according to Moore. “A layered architecture is needed that manages the multiple components of the collaboration environment,” he said.
The layers include a unifying persistent name space, which provides researchers access to data repositories; an agreed-upon set of properties that describe shared data collections so data management policies can be consistently enforced; and an environment that allows researchers to share their data analysis workflows and analysis results.
A variety of technologies would be used in the layered architecture, including the integrated Rule-Oriented Data System (iRODS), a rule-based system for organizing, managing and sharing digital data. iRODS was developed by the Data Intensive Cyber Environments research group at UNC’s SILS and the University of California, San Diego. Groups at RENCI and SILS continue to support iRODS development.
RENCI researcher Michael Shoffner will participate in the early-stage project by setting up an interoperability test bed that will identify existing cyberinfrastructure components used by geoscientists and determine their level of interoperability, or how well they work over different scientific domains that use different query terms and semantics.
Other participating organizations in the project are George Mason University, Woods Hole Oceanographic Institution, the Open Geospatial Consortium (a group aimed at building consensus on publicly available interface standards), the National Center for Supercomputing Applications at the University of Illinois, the Institute for the Environment at UNC Chapel Hill, the University of California, San Diego, and Colorado State University.