South Big Data Hub partners on development of new nationwide data storage network under NSF grant

The Open Storage Network will enable researchers to manage data more efficiently than ever before.


The South Big Data Hub is one of four regional big data hub partners awarded a $1.8 million grant from the National Science Foundation (NSF) for the initial development of a data storage network over the next two years. A collaborative team will combine their expertise, facilities, and research challenges to develop the Open Storage Network (OSN). The OSN will enable academic researchers across the nation to work with and share their data more efficiently than ever before, according to the NSF announcement.

The project, led by Alex Szalay of Johns Hopkins University in the South Hub region, leverages key data storage partners throughout the U.S. These partners include the National Data Service and members representing each of the four NSF-funded Big Data Regional Innovation Hubs (BD Hubs): the South Big Data Hub at the Renaissance Computing Institute (RENCI) and the Georgia Institute of Technology, the West Big Data Hub at the San Diego Supercomputer Center (SDSC), the Midwest Big Data Hub at the National Center for Supercomputing Applications (NCSA), and the Northeast Big Data Hub at the Massachusetts Green High Performance Computing Center (MGHPCC) and Pittsburgh Supercomputing Center (PSC).

Christine Kirkpatrick, executive director of the National Data Service and co-chair of the Big Data Hubs’ Data Sharing and Cyberinfrastructure Working Group, anticipates that the OSN will provide the technical connecting fabric that the working group has needed to join use cases brought forward by the hubs’ government, academic, non-profit, and industry partners. 

“There are hundreds of issues to solve before we have a ‘datanet’ as efficient and cooperatively well organized as the internet,” Kirkpatrick said. “But just as there are many complexities to solve in the long-term, these solutions can be facilitated by the simplest of additions – here the establishment of a connected storage network. Sharing, reproducibility, replication, and data management all fundamentally rely on a place to store data.”

NSF’s investment in the OSN builds on a seed grant by Schmidt Futures — a philanthropic initiative founded by former Google Chairman Eric Schmidt — to enable the data transfer systems for the new network. These systems are designed to be low-cost, high-throughput, large-capacity, and capable of matching the speed of a 100-gigabit network connection with only a small number of nodes. This configuration will help to ensure that OSN can eventually be deployed in many universities across the U.S. to leverage prior investments and establish sustainable management for the overall storage network.
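
As a rough back-of-the-envelope illustration of that sizing goal, the sketch below computes how many storage nodes it would take to keep a 100-gigabit link full. The per-node throughput values are hypothetical assumptions chosen for illustration, not figures from the OSN project.

```python
import math

LINK_GBIT_PER_S = 100  # network speed cited in the announcement


def nodes_needed(per_node_gbyte_per_s: float,
                 link_gbit_per_s: float = LINK_GBIT_PER_S) -> int:
    """Return the minimum number of nodes whose combined throughput matches the link."""
    link_gbyte_per_s = link_gbit_per_s / 8  # 8 bits per byte
    return math.ceil(link_gbyte_per_s / per_node_gbyte_per_s)


# Hypothetical sustained per-node throughputs, in gigabytes per second.
for rate in (1.0, 2.5, 5.0):
    print(f"{rate} GB/s per node -> {nodes_needed(rate)} nodes")
```

A 100-gigabit link carries 12.5 gigabytes per second, so under these assumed rates only a handful of nodes would be required.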

About the South Big Data Hub

The South Big Data Innovation Hub is a research coordination network that catalyzes and strengthens public-private partnerships to apply data science and analytics to scientific, societal, and economic challenges important to the region and the nation. The South Hub is one of four hubs across the country, launched by NSF in partnership with its host institutions, including the University of North Carolina at Chapel Hill and Georgia Tech.

What to expect at the 2018 iRODS User Group Meeting

Interested in iRODS? Register for the meeting at  irods.org/ugm2018

DURHAM, NC – The worldwide iRODS user community will gather here June 5 – 7 for the iRODS User Group Meeting (UGM), three days of learning, sharing of use cases, and discussions of new capabilities that have been added to the integrated Rule-Oriented Data System (iRODS) in the last year.

The meeting will take place at the Durham Convention Center and is sponsored by Western Digital, Quantum, DDN, RENCI, and the iRODS Consortium, the membership-based foundation that leads development and support of iRODS. Meeting attendees will learn about new features such as storage tiering, automated ingest, and OpenID authentication, according to Jason Coposky, executive director of the iRODS Consortium.

The first version of the iRODS storage tiering framework was released in February and allows iRODS to automatically move data between identified tiers of storage within a configured tiering group based on performance, availability, and data recovery requirements. Using this new framework, users can label selected storage resources with metadata tags to define their place in a storage tiering group as well as how long the data should reside in that tier before migrating to the next tier.
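
The tiering idea described above can be sketched in a few lines of Python. This is a simplified model only: it does not call iRODS, and the `Tier` class, the `target_tier` function, the resource names, and the time limits are all hypothetical names and values introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Tier:
    name: str                         # hypothetical storage resource name
    position: int                     # place in the tiering group (0 = first tier)
    max_age_seconds: Optional[float]  # time data may stay here; None = final tier


def target_tier(tiers: List[Tier], current_position: int, age_seconds: float) -> Tier:
    """Return the tier where data should live, given how long it has been in its current tier."""
    ordered = sorted(tiers, key=lambda t: t.position)
    current = next(t for t in ordered if t.position == current_position)
    # Data stays put if the tier has no time limit or the limit has not been reached.
    if current.max_age_seconds is None or age_seconds < current.max_age_seconds:
        return current
    # Otherwise it migrates to the next tier in the group, if there is one.
    later = [t for t in ordered if t.position > current_position]
    return later[0] if later else current


# Hypothetical three-tier group: fast disk, slower disk, archive.
group = [
    Tier("fast_resc", 0, 30 * 60),
    Tier("medium_resc", 1, 7 * 24 * 3600),
    Tier("archive_resc", 2, None),
]

print(target_tier(group, 0, 60).name)    # still within the first tier's time limit
print(target_tier(group, 0, 3600).name)  # past the limit, so it moves to the next tier
```

In this model, each tier carries two pieces of information, matching the two things the metadata tags are described as defining: the tier's place in the group and how long data remains there before migrating.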

The iRODS automated ingest framework provides an enterprise solution that solves two major data management challenges: putting existing data under management; and ingesting new incoming data from disparate sources. Based on the Python iRODS client and Redis Queue, a Python library for queueing and processing jobs, the framework can scale up to meet the demands of data coming off instruments, satellites, or parallel filesystems.
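
The queue-and-worker pattern behind this kind of ingest can be sketched with the Python standard library alone. The example below is not the iRODS automated ingest framework and does not use Redis Queue or the Python iRODS client; an in-process `queue.Queue` stands in for the job queue, a plain dictionary stands in for the catalog, and the function names are hypothetical.

```python
import os
import queue
import threading


def scan(root, jobs):
    """Enqueue one job per file found under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            jobs.put(os.path.join(dirpath, name))


def worker(jobs, catalog, lock):
    """Record each queued path and its size in the catalog until a stop signal arrives."""
    while True:
        path = jobs.get()
        if path is None:  # sentinel: no more work for this worker
            break
        size = os.path.getsize(path)
        with lock:
            catalog[path] = size


def ingest(root, num_workers=4):
    """Scan a directory tree and register every file using a pool of worker threads."""
    jobs = queue.Queue()
    catalog = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(jobs, catalog, lock))
        for _ in range(num_workers)
    ]
    for thread in threads:
        thread.start()
    scan(root, jobs)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker, queued after all real jobs
    for thread in threads:
        thread.join()
    return catalog


if __name__ == "__main__":
    for path, size in sorted(ingest(".").items()):
        print(f"{size:>10}  {path}")
```

Because the scanner and the workers communicate only through the queue, adding workers is the main way to scale such a design; a networked queue would extend the same pattern across machines.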

The OpenID authentication plugin allows users to log in to iRODS using their existing OpenID credentials. The OpenID system is a method for using a single username and password to sign in to multiple accounts. With the new plugin, iRODS now supports OpenID, GSI, Kerberos, PAM, and native password authentication on a per-user basis.

“A lot of our efforts in the last year have focused on improving, streamlining and simplifying the user experience for particular enterprise use cases,” said Coposky. “We are now using iRODS as the framework to create and ship flexible, off-the-shelf solutions.”

As always with the annual UGM, users will offer presentations about their organizations’ deployments of iRODS. This year’s meeting will feature 21 talks from users in the U.S. and Europe. Among the use cases and deployments to be featured are:

  • Implementing a Storage Abstraction Service with iRODS, Bibliothèque nationale de France (BnF, National Library of France). As part of its efforts to preserve, enrich, and make available the national heritage of France, BnF developed a system called SPAR (Système de Préservation et d’Archivage Réparti) to support and structure its digital preservation efforts. SPAR is now responsible for well over 8 million digital packages, which could be books, musical albums, videos, etc. SPAR uses a Storage Abstraction Service, or SAS, implemented with iRODS. SPAR with iRODS allows easy and transparent duplication of data among remote sites, so that if data is lost, entire documents can be recovered.
  • National Institute for Environmental Health Sciences (NIEHS) Data Commons, NIEHS, U.S. National Institutes of Health. The NIEHS Data Commons is a system for accessing, sharing, and integrating research data and metadata. An iRODS data grid provides the Commons with policy-based data management to support ingest, indexing, provenance tracking, and analysis of NIEHS data sets. To develop the Commons, NIEHS collaborates with the iRODS Consortium on issues such as the MetaLnx web interface, message queue-based indexing, metadata templates, and virtual collections.
  • The Brain Image Library, Pittsburgh Supercomputing Center, Carnegie Mellon University. The Brain Image Library (BIL) is a national public resource in the U.S. enabling researchers to deposit, analyze, mine, share, and interact with large data sets of brain images. It is part of a comprehensive brain cyberinfrastructure initiative by the U.S. National Institutes of Health. The BIL uses iRODS for data registration and metadata management for brain image data sets uploaded to the library. A team at the Pittsburgh Supercomputing Center, which is home to the library, has deployed a prototype iRODS filesystem scanner to rapidly register large multi-terabyte trees of microscopy data into the BIL iRODS database.
  • iRODS for Clinical and Instrument Data Lifecycle Management and Archiving, Genentech/Roche Molecular Systems. Genentech, the biotechnology company owned by the Swiss multinational healthcare corporation Roche, will present on its uses of iRODS automated ingest, the storage tiering framework, and data virtualization capabilities. Its use cases for these features include: integration with a large data transfer platform to support data replication; clinical data management to enable easy access to data and to streamline data management; and management of the instrument data lifecycle.

About the iRODS Consortium
The iRODS Consortium is a membership organization that supports the development of the Integrated Rule-Oriented Data System (iRODS), free open source software for data discovery, workflow automation, secure collaboration, and data virtualization. The iRODS Consortium provides a production-ready iRODS distribution and iRODS training, professional integration services, and support. The world’s top researchers in life sciences, geosciences, and information management use iRODS to control their data. Learn more at irods.org.

The iRODS Consortium is administered by founding member RENCI, a research institute for applications of cyberinfrastructure at the University of North Carolina at Chapel Hill. For more information about RENCI, please visit www.renci.org.


Report focuses on rethinking flood analytics

Aerial photo of flooding caused by Hurricane Katrina (2005).

Floods are the most common, most frequent and most costly type of disaster in the United States. A flood-resilient nation uses state-of-the-art analytics and data tools to help reduce or eliminate fatalities, minimize disruptions and reduce economic losses, according to a new report co-authored by the Coastal Resilience Center of Excellence (CRC).

Recommendations for advancing the current state of flood analytics are presented in a report, “Rethinking Flood Analytics: Proceedings from the 2017 Flood Analytics Colloquium,” written by the CRC and RENCI after last fall’s “Rethinking Flood Analytics” Colloquium. The event was sponsored jointly by the CRC, RENCI and the U.S. Department of Homeland Security (DHS) Science and Technology Directorate (S&T).

Experts from across various sectors and disciplines—some in the flood prevention and emergency response fields, others from as far afield as demography, satellite technology and news media—gathered to discuss what the future could hold at the Colloquium last November.

Attendees discussed solutions and approaches to flood prediction and impact analytics that ranged from enhancements to current systems to entirely new types of technology. The event centered around ways to improve work done as part of the DHS S&T’s Flood Apex Program. A multi-disciplinary group of technical specialists and end users discussed flood analytics through presentations, group discussions, and “open mic” sessions, while addressing challenges and gaps in the field.

“The method for quantifying flood risk is changing and the potential for doing a much better job of addressing that risk through analytics has increased dramatically,” according to the report. “Some techniques, such as numerical modeling, have been part of flood risk analysis for years. While important, evolutionary changes in the existing tools typically result in only incremental improvement.

“The most dramatic advances tend to come from techniques not previously considered in connection with flood risk analysis,” including big data, artificial intelligence, remote sensing, social media and the internet of things.

Among the conclusions for developing a more flood-resilient nation are:

  • Avoid risk by protecting the most important assets from flooding and by considering where residents live and where development takes place.
  • Invest in mitigation by understanding the real costs of flood disasters, valuing the real benefits of flood mitigation and implementing actions to achieve those benefits.
  • Transfer and accept risk by such actions as purchasing flood insurance and implementing mechanisms to cope with residual risk.

DHS S&T created the Flood Apex Program in 2014 to bring new and emerging technologies together to increase communities’ resilience to flood events and to provide predictive analytic tools for floods. The goals of the Program, which is managed by the First Responders Group of DHS S&T, are to reduce fatalities and property losses from future flood events, increase community resilience to flooding and develop better investment strategies to prepare for, respond to, recover from and mitigate flood hazards.


iRODS Consortium continues to grow, signs on to OpenSFS

Quantum, NetApp join consortium that helps sustain the iRODS open source platform

CHAPEL HILL, NC – Two companies involved in data storage and cloud-based data services recently became the 17th and 18th members of the iRODS Consortium, the membership-based foundation that leads development and support of the integrated Rule-Oriented Data System (iRODS).


Tracking entrepreneurship in the Research Triangle Park and beyond

by Anne Johnson

Platform offers insights on the forces that drive regional economies

Why do businesses spread like kudzu in some places but wither on the vine in others?

It’s an important question for anyone considering where to start a new business or seeking to cultivate a strong economy for their region. But it’s not an easy question to answer.

Data Matters™ Short Course Series Returns to Hunt Library in August

Registration now open at datamatters.org

We live in a data-driven world, and as researchers, business professionals, and government policymakers struggle to stay on top of the latest data science trends and practices, the Data Matters™ Short Course Series returns to offer a week full of education and training.


Registration now open for June iRODS User Group Meeting

Registration discounts through April 1; visit irods.org

DURHAM, NC – Users of the integrated Rule-Oriented Data System (iRODS) will come to Durham, NC from points around the globe to attend the 2018 iRODS User Group Meeting (UGM) June 5 – 7.



Delving into the data from Hurricane Maria

Data and water scientists aim to learn from an unparalleled natural disaster.

Among the many problems faced by residents of Puerto Rico in the aftermath of Hurricane Maria is a lack of clean drinking water; this poses health risks for people who have already endured unprecedented hardship.

The storm and its aftermath also provided a distinctive occasion for an interdisciplinary research team, including RENCI experts, to collect data to understand how the storm impaired the island’s water resources. Through a grant from the National Science Foundation (NSF), the team is developing a software system to archive and share information about drinking water quality in some of the most devastated areas of Puerto Rico, and assessing how disruption in services affects water quality and relates to disease outbreaks.


NSF-sponsored workshop to focus on data lifecycle training for grad students and postdocs

Travel and accommodations provided; applications due March 15

For today’s graduate and post-doctoral students, conducting research often starts by trying to make sense of the many tools, technologies, and work environments used in data-intensive research and computing.

Fortunately, there is help in navigating this new research landscape.


iRODS Consortium announces University of Groningen as newest member

The University of Groningen (UG) Center for Information Technology (CIT) is the newest member of the iRODS Consortium, the membership-based organization that leads efforts to develop, support, and sustain the integrated Rule-Oriented Data System (iRODS).

UG, a research university with a global outlook, is deeply rooted in the northern Netherlands town of Groningen, known as the City of Talent. The University ranks among the top 100 in several important ranking lists. It has about 30,000 students, both local and international, and employs 5,500 full-time faculty and staff. Its Center for Information Technology (CIT) serves as the university’s IT center and promotes the sophisticated use of IT in higher education and research. CIT’s 200 employees manage the IT facilities and support processes for all students and staff members.
