RENCI teams with DICE group to tame the data deluge

CHAPEL HILL, NC, July 16, 2009—Almost a year after the Data Intensive Cyber Environments (DICE) research group moved from the University of California at San Diego to the University of North Carolina at Chapel Hill, the internationally recognized research group has established deep ties to the Renaissance Computing Institute (RENCI).

Together, DICE and RENCI are working to link data repositories across the state and make them accessible and easy for researchers, businesses and government to use. Through a variety of collaborations, the two groups hope to establish North Carolina as an international leader in solving the pressing problems of managing and sharing today’s deluge of digital data in ways that boost research, business and education.

DICE members, long based at UCSD’s San Diego Supercomputer Center, made the cross-country trek to UNC Chapel Hill early last fall, bringing with them a research portfolio in excess of $10 million. The researchers hold appointments in Carolina’s nationally recognized School of Information and Library Science (SILS) and also serve as chief scientists at RENCI, with research space at RENCI’s off-campus Chapel Hill location.

For more than 10 years, the DICE group’s data grid technologies have been used in research projects worldwide to manage large, distributed data collections and to support discovery, access, retrieval, replication, archiving and analysis tasks. The researchers most recently released version 2.1 of iRODS, the open source Integrated Rule-Oriented Data System, which introduced user-settable rules that automate complex management policies, helping users handle today’s mushrooming collections of digital data.

“Having RENCI in North Carolina was one of the main considerations in coming here,” said Reagan Moore, head of the DICE group, director of the new DICE Center at UNC-Chapel Hill and chief scientist for data-intensive cyber environments at RENCI. “RENCI works to provide the technology infrastructure that thriving universities and business communities need, and a data infrastructure is part of that. Working together, RENCI and DICE can develop the kind of technology infrastructure that opens up all kinds of opportunities.”

RENCI works closely with the DICE Center, which was launched in May with funding from the National Science Foundation, the National Archives and Records Administration and the National Historical Publications and Records Commission. The center draws on leading data management technology that is advanced and generic enough to support a remarkable array of uses: helping the National Archives digitally preserve the nation’s historical information, helping digital libraries cope with the ever-increasing size and complexity of digital knowledge, and enabling large-scale interdisciplinary research collaborations across the nation and the globe to share digital data.

The center is working to establish an interoperable data center that spans the 17-campus UNC system and allows campus researchers to easily access, search, share and mine vast stores of data. The bulk of the center’s data cache will be located at RENCI, which currently provides about 100 terabytes of data storage to the DICE Center. Eventually, said Moore, DICE hopes to have a data infrastructure, supported by RENCI, that provides more than 1 petabyte of data management and storage capacity.

“The indispensable role of digital data across society and the increasing size and complexity of data collections are reaching a critical point,” said Richard Marciano, executive director of the DICE Center and RENCI chief scientist for persistent archives and digital preservation. “With the growing need for practical digital data technologies, the new DICE Center is already collaborating with many important projects across UNC-Chapel Hill as well as national and international partners, helping them harness their digital data collections and working with them to efficiently create, share and preserve new knowledge.”

Regional hubs through Data Grid
A premier DICE-RENCI project is the RENCI Data Grid, which links RENCI’s engagement centers at UNC-Chapel Hill, Duke, NC State, UNC Charlotte, UNC Asheville and East Carolina University as interconnected regional data hubs managed by iRODS. The project began in late 2008 with the addition of multi-terabyte storage capacity at the RENCI engagement centers and at RENCI’s offices in Chapel Hill. Those data hubs will house RENCI visualization files, the data on which the visualizations are based, and the visual analytics software tools that enable many additional levels of insight into the data. But the Data Grid’s potential is much greater, according to Ray Idaszak, RENCI’s director of visualization and collaborative environments.

“Data Grid is the start of an effort by DICE and RENCI to give the state and its universities a data resource that will be a tool used for decision making, for increasing productivity or for examining data in the context of other data,” said Idaszak.

When completed, the Data Grid in action might work like this: Data on development patterns around North Carolina would be stored at RENCI at UNC Charlotte, where researchers at the RENCI engagement center study urban growth patterns and their implications. An urban planner in eastern North Carolina would be able to access that data as well as the software tools that allow it to be viewed in a visual, intuitive format. That same planner would also be able to access coastal floodplain maps and storm surge visualizations stored at other data hubs and use all of the information to plan sustainable coastal developments.

One of the largest datasets the Data Grid plans to offer will be orthophotographic images, an extremely high-resolution photographic map of the state with a data point every six inches across a statewide grid. When overlaid with other datasets, such as census data or the locations of healthcare facilities, the orthophotographs become useful tools for emergency managers, city planners, researchers and area businesses.

All of the Data Grid hubs will be operational by fall 2009, and new datasets will be added to the collection as more storage capacity becomes available.

Other projects on which RENCI and DICE collaborate include:
• Distributed Custodial Archival Preservation Environments (DCAPE), a project to build a distributed data preservation environment that meets the needs of archival repositories for trusted archival preservation services. DCAPE develops preservation policies for state and university archives and cultural institutions, using iRODS to implement and deliver services. Participants include the North Carolina State Archives and the State Library of North Carolina.

• EnginFrame engagement portals, an effort to integrate with the RENCI Data Grid EnginFrame’s portal environment, which lets users on organizational intranets run grid-enabled applications. EnginFrame is a major portal for Enabling Grids for E-sciencE (EGEE), the European grid infrastructure for scientists. The collaboration among RENCI, DICE and EnginFrame allows RENCI to improve its Science Gateway portal to the National Science Foundation’s TeraGrid. It will also provide researchers in the UNC system and beyond with user-friendly access to distributed data collections available through the Data Grid.

• TUCASI Data Infrastructure Project (TIP), an effort to deploy a federated data cyberinfrastructure across the campuses in the Research Triangle area. Funded by the Triangle Universities Center for Advanced Studies, Inc. (TUCASI), the project aims to meet the growing data storage and management needs of UNC-Chapel Hill, Duke and NC State through an interoperable data repository. Eventually, TIP will provide researchers at Triangle-area universities with over a petabyte of data storage capacity and may also link to business-operated storage “clouds.” As DICE’s Moore explained, “TIP may serve as a gateway linking business storage clouds and institutional repositories, and will make migration of data between these two worlds much more feasible.”

• Carolina Digital Repository, a project with the UNC-Chapel Hill library to store and provide access to digital collections. The project provides local storage managed by iRODS and a backup of records at RENCI.

• National Archives and Records Administration Transcontinental Persistent Archive Prototype (NARA TPAP). This project to develop, implement and test a nationwide data management infrastructure federates seven independent data grids. RENCI is one of the experimental nodes on the system and works with other NARA TPAP partners to test the persistent archive prototype and the tools and policies that support it.

• DataNet Federation Consortium, a five-year, $20 million project proposal recently submitted to the National Science Foundation that aims to create a nationwide data management infrastructure that will make it easier for researchers across a wide range of science and engineering disciplines to access, use and manage diverse datasets. The proposal addresses the need to create federated data collections that can be shared through the Internet across disciplines and institutions. If funded, the consortium will build consensus on how to manage, access and store the enormous data files being created daily by sensors and other real-time data streams, medical imaging equipment, genomic studies, statistical studies of populations and more.

More information:
DICE Center: http://dice.unc.edu
RENCI: http://www.renci.org/
SILS: http://sils.unc.edu/
iRODS: http://www.irods.org

Media contact: Karen Green, (919) 445-9648 (office), (919) 619-8213 (mobile), kgreen@renci.org