CHAPEL HILL, NC, Sept. 28, 2011 – The National Science Foundation has funded the University of North Carolina at Chapel Hill to lead a multi-institutional team that will build and deploy a prototype national data management infrastructure that addresses some of the key data challenges facing scientific researchers in the digital age.
The infrastructure will support collaborative multidisciplinary research through shared collections, data publication within digital libraries and reference collections within persistent archives.
The NSF awarded nearly $8 million over five years to the DataNet Federation Consortium (DFC), a group that spans seven universities. The DFC will address the data management needs of six science and engineering disciplines: oceanography, hydrology, engineering design, plant biology, cognitive science, and social science. About half the award will support research and development at UNC-Chapel Hill.
The Data Intensive Cyber Environments (DICE) research group in UNC’s School of Information and Library Science (SILS) leads the consortium, and RENCI (the Renaissance Computing Institute at UNC-Chapel Hill) is responsible for federating the consortium’s diverse data repositories to enable cross-disciplinary research. Federating data involves building a common name space for identifying files, providing a context for file meaning and relevance, providing a common access interface, and developing management policies across the distributed collection.
The DFC will use iRODS, the integrated Rule-Oriented Data System, to implement a policy-based data management infrastructure. iRODS, developed by UNC’s DICE Center and DICE researchers at the University of California at San Diego, enforces policies as computer-actionable rules to organize distributed data into sharable collections. Procedures to automate data management functions are cast as computer-executable workflows. Policies control data access, sharing and archiving. Research groups worldwide, including the NASA Center for Climate Simulations, the National Optical Astronomy Observatory, the Australian Research Collaboration Service, and the Texas Digital Libraries, use iRODS technology to manage their research data grids, implement digital libraries, and build persistent archives.
“Excelling in the digital age requires that scientific disciplines and government agencies have the ability to manage the enormous amount of data that are generated each day,” said UNC-Chapel Hill Vice Chancellor for Research Barbara Entwisle. “Scientists can only solve the important problems of our times if they can easily access, share, analyze, and preserve data for future researchers and students. This award is important beyond its dollar amount because it establishes Carolina as the leader in the worldwide research community in taming the data deluge and as the data federation hub for collaborative research. It’s a role that is essential for future discoveries and innovations.”
Experts in the DICE group and at RENCI will work with six NSF-supported national consortia to federate their distributed data repositories and create policies for retention, distribution, access and validation of critical data properties. Those communities are:
- The Ocean Observatories Initiative (OOI), an NSF-funded program led by the University of California at San Diego and the Scripps Institution of Oceanography in San Diego. The OOI researchers use data from environmental sensors to study the physical, chemical, geological and biological variables in the ocean and seafloor.
- The Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI), an organization led by the University of South Carolina. CUAHSI includes more than 130 partner organizations, including UNC’s Institute for the Environment, working to advance water science.
- CIBER-U, the Cyber-Infrastructure-Based Engineering Repositories for Undergraduates, an initiative led by Drexel University, which uses digital design repositories to enhance engineering instruction and learning.
- The iPlant Collaborative, a community of researchers and students led by the University of Arizona that is developing an integrated cyberinfrastructure to advance studies of plant biology.
- The Odum Institute for Research in Social Science, an interdisciplinary institute at UNC-Chapel Hill that focuses on teaching and research in the social sciences.
- The Temporal Dynamics of Learning Center (TDLC), an NSF Science of Learning Center based at the University of California at San Diego that studies the role of time and timing in learning in order to improve educational practices.
Arizona State University researchers will participate in the DFC by collaborating on policy-based data management systems, and Duke University researchers will collaborate on education and outreach initiatives to broaden the impact of the DFC.
“The data we will work with includes observational data from sensors, experimental and simulation data, engineering designs and both structured and unstructured data,” said Reagan Moore, Ph.D., the principal investigator for the consortium, director of the DICE Center, SILS professor and domain scientist for data management at RENCI. “The infrastructure we develop will address all stages in the community-based data collection lifecycle, from initial collection formation for a single project, to shared collections across institutions, to formation of data processing pipelines, to publication and long term preservation. We see this as the first step to building a data infrastructure that will accommodate collaborative research, new educational approaches and innovative problem solving in academic institutions, in federal agencies and across national boundaries.”
During the first 18 months of the grant, the consortium will focus on federating the data management cyberinfrastructure for the OOI, CUAHSI and CIBER-U. The work will include identifying federation requirements, integrating existing data management systems, deploying a federation hub, and developing policies and procedures for data sharing so that the data collections of these research communities can become the foundation of a national data cyberinfrastructure.
For more information: