News | RENCI

RENCI and NOAA Collaborate to Advance Coastal Safety

Published: January 17, 2025

The University of North Carolina at Chapel Hill’s (UNC-CH) Renaissance Computing Institute (RENCI) and Coastal Resilience Center (CRC) are making waves in Earth data science through their collaboration with the National Oceanic and Atmospheric Administration (NOAA). The partnership improves methods for assessing and addressing coastal community risks. By incorporating observations into the ADCIRC storm surge model, the team more accurately predicts historical water levels along the Atlantic, Gulf of Mexico, and Caribbean coasts, increasing safety for the people who live there. Recent acknowledgements highlight their pivotal role in advancing coastal safety and community resilience.

Coastal Ocean Reanalysis (CORA)

At the heart of this collaboration is the Coastal Ocean Reanalysis (CORA) dataset, a joint effort between teams from NOAA, the University of Hawaii, and UNC-CH. RENCI played a pivotal role by contributing its computational expertise to generate the new dataset. CORA provides predicted hourly historical water level and wave information at 500-meter intervals along the coast, using observations dating back more than 40 years. Significantly, the model predictions, guided by observations, cover many areas where pre-existing data was sparse, capturing water level variability that will inform future assessments and needs in coastal regions.

Essentially, the data bridges gaps in existing coastal data, offering both precision and breadth to equip decision-makers with critical insights for flood risk and resilience planning and assessment.

CORA Gains Validation and Recognition

In June 2024, a NOAA article about CORA noted that researchers were able to provide preliminary validation for the dataset. A follow-up article in January 2025 noted that the dataset, which is publicly available via NOAA’s Open Data Dissemination platform, is already being used to fill gaps in available data and improve coastal community flood risk assessment and planning. The article also noted that future uses of the dataset will enhance NOAA tools such as the Sea Level Calculator and High Tide Flooding Outlooks, as well as the National Water Model for comprehensive coastal flood mapping. Additionally, NOAA is looking to expand the dataset to include flood risk assessments for the U.S. West Coast, Hawaii, and Alaska by 2026.

In addition to these online accolades, during the 2024 American Geophysical Union (AGU) meeting, Dr. Rick Spinrad, Under Secretary of Commerce for Oceans and Atmosphere and NOAA Administrator, delivered a keynote address celebrating NOAA’s many achievements. Among the highlights was the improved safety and awareness afforded to coastal communities thanks to CORA, a sentiment also published in NOAA’s 2024 Report, which focused on how the agency was working to build a Climate Ready Nation.

Moving Forward

As NOAA continues to refine and expand CORA, the already vast applications will increase, improving coastal floodplain evaluations, supporting resilience planning, and offering decision-makers the ability to better protect their communities. By providing actionable insights into sea level rise and coastal inundation, RENCI and others involved in CORA are setting a new standard for improved climate resilience.

UNC researchers awarded up to $10M to leverage data science to accelerate cancer diagnosis and optimize delivery of precision oncology

Published: December 5, 2024

A team of UNC-Chapel Hill researchers has been awarded up to $10 million in Advanced Research Projects Agency for Health (ARPA-H) funding to develop the Cancer Identification and Precision Oncology Center (CIPOC). The project is designed to improve cancer diagnosis and support personalized treatments by quickly aggregating and analyzing a wide range of health data, including electronic health records, histopathological and radiological images, insurance claims and geographic information.

Specifically, CIPOC will facilitate the development of an oncology learning health system that utilizes AI-ready data to generate real-time identification of new cancer cases, support patient recruitment for research, recommend precision cancer care, and help improve cancer care equity and quality. It also will create an accessible, adaptable system for health providers across diverse locations and resources.

The project is led by four principal investigators across Carolina:

Ashok Krishnamurthy, PhD, director of the Renaissance Computing Institute (RENCI) and data science core lead.
Jennifer Elston Lafata, PhD, professor in the Division of Pharmaceutical Outcomes and Policy at the UNC Eshelman School of Pharmacy and innovation and optimization partners lead.
Caroline Thompson, PhD, MPH, associate professor of epidemiology at UNC Gillings School of Global Public Health and rapid identification core lead.
Melissa Troester, PhD, MPH, professor of epidemiology at UNC Gillings and precision oncology core lead.

“CIPOC is a multi-disciplinary project that will significantly advance not just rapid cancer identification and precision oncology but also health data science and informatics,” said Krishnamurthy, a research professor of computer science at UNC-Chapel Hill. “The approaches we are developing can be used in other areas of health care, which is possible because CIPOC brings together diverse expertise across a number of fields to work together on a common goal.”

The project will organize and facilitate collaborative research conducted by faculty, staff and trainees from more than 12 schools, centers, departments and programs at UNC-Chapel Hill with a shared vision to create cutting-edge data tools researchers and practitioners can use at UNC – and in time across North Carolina and the United States – to improve the diagnosis and treatment of cancer.

“While precision oncology has made major advances in recent years, translation of these innovations to practice has lagged behind as has our ability to monitor, track, and therefore understand and plan for needed cancer-related services,” said Thompson, a UNC Lineberger Comprehensive Cancer Center member. “By accelerating the identification of cancer cases and developing innovative informatics tools to make improved, precision recommendations for care, this project can advance the provision of equitable care services and delivery.”

The three-year project will focus on building an oncology learning health system at UNC Health, with the potential to expand across North Carolina and nationally. A learning health system integrates scientific evidence, data and culture into daily care with a commitment to continuous improvement and innovation. The goal is to produce high-quality and high-value care that is equitable across diverse populations.

“As part of our efforts, we are forming a panel of nationally recognized experts and advisors. This panel will provide our team with ongoing feedback and serve as an independent sounding board. Their input is crucial to ensuring the usability and acceptability of our processes and products,” said Lafata, co-lead of the UNC Lineberger’s Cancer Care Quality Initiative. “This step is essential given our focus on accelerating academic discovery, optimizing cancer care delivery and supporting public health reporting. Additionally, these advisers will help us minimize any inherent biases in our work.”

CIPOC will utilize AI tools, including large language modeling, to quickly standardize, harmonize and link structured and unstructured data from multiple sources, enabling more precise tracking and treatment for different cancer types.

It also will develop an AI-driven virtual multidisciplinary tumor board to support the provision of precision oncology care. Studies have shown multidisciplinary tumor boards, in which a group of experts in different specialties review and discuss patients’ medical conditions and treatment options, can improve cancer outcomes. The board will use AI-ready datasets, including electronic health record-derived clinical data and histopathological and radiological images, to help inform prediction of risk and tumor progression as well as treatment decision making.

“We want to make precision oncology more widely available to North Carolinians. This project aims to develop tools that will use common medical record data to define care that responds to each patient’s unique tumor biology, reducing the need for additional, costly testing,” said Troester, co-leader of the UNC Lineberger Cancer Epidemiology Program.

CIPOC will make its data tools open source, allowing them to be scaled and adapted by health systems of any size, thus improving the use of clinical data for research and cancer care across a broad spectrum of communities. This innovation aligns with ARPA-H’s national goals to strengthen health care system resilience and equity.

The development and submission of the ARPA-H proposal were supported by the UNC Office of Research Development, with oversight by Nathan Blouin, MBC, CRA, assistant vice chancellor for research development, and Nate Warren, PhD, research development manager.

GRAU DATA joins the iRODS Consortium

Published: July 15, 2024

GRAU DATA, a software company headquartered in Schwaebisch-Gmuend, Germany, has joined the iRODS Consortium, the membership-based organization that leads development and support of the integrated Rule-Oriented Data System (iRODS).

Since 2007, GRAU DATA has developed software products that simplify the management and protection of data for companies, research institutions, and government agencies. With specialized solutions in data archiving, data protection, and metadata-driven search, the company focuses on security, scalability, and user-friendliness in its software products.

iRODS is open-source software used to store, manage, and share large amounts of data and metadata. By providing mechanisms for defining rules for data storage, processing, and distribution, iRODS supports interoperability and scalability of data infrastructures. The iRODS Consortium guides iRODS development priorities and facilitates support, education, and collaboration opportunities for iRODS users.

David Cerf, Chief Data Evangelist at GRAU DATA, highlights how GRAU DATA’s products dovetail with iRODS to enhance the services and functionalities available to the worldwide iRODS user community.

“Our solutions help iRODS users cut storage costs by approximately 50% and make better use of unstructured data for AI and analytics,” said Cerf. For example, a GRAU DATA product called MetadataHub helps iRODS users turn unstructured data and its embedded metadata into valuable insights for analytics. “Getting the most out of unstructured data is crucial for improving data quality and speeding up results,” Cerf noted. “MetadataHub automates data preparation to make it AI-ready, improves training models, and sets up downstream applications while maintaining data lineage and governance.”

Through its reporting feature, MetadataHub also gives users a comprehensive view of their data landscape to better manage storage resources, reduce costs, and save time on storage management. “Ultimately, these solutions provide iRODS users with a significant advantage in efficiency, insight, and strategic value,” said Cerf. “We’re excited to become part of the iRODS Consortium.”

Terrell Russell, Executive Director of the iRODS Consortium, expressed enthusiasm in welcoming the Consortium’s newest member. “GRAU DATA clearly has a deep well of expertise in understanding the challenges organizations face in handling, using, and protecting large data collections,” said Russell. “We look forward to further enhancing our collaboration to help organizations effectively leverage all of the strengths that iRODS has to offer.”

The iRODS software has been deployed at thousands of locations worldwide for long-term management of data in various industries such as the oil and gas industry, biosciences, physical sciences, archives, and media and entertainment. The development team of the iRODS Consortium is based at the Renaissance Computing Institute (RENCI), which is affiliated with the University of North Carolina at Chapel Hill, USA. To learn more about iRODS and the iRODS Consortium, please visit irods.org.

To learn more about GRAU DATA, please visit graudata.com.

Leading data science expert joins RENCI as deputy director

Published: June 5, 2024

Rebecca Boyles, MSPH, currently the founding director of the Center for Data Modernization Solutions at RTI International, will join the Renaissance Computing Institute (RENCI) at the University of North Carolina at Chapel Hill as deputy director on June 24, RENCI Director Ashok Krishnamurthy, PhD, announced today.

Boyles’ leadership of the Center for Data Modernization Solutions at RTI International focuses on bridging the research and information technology gap by applying a data ecosystem perspective that enables researchers to maximize the value of their data assets. Boyles has also worked closely with RENCI as a partner, in particular as a leader on both NHLBI BioData Catalyst and the NIH HEAL Data Stewardship Group, two important projects that help researchers harness the power of data.

As RENCI’s deputy director, Boyles will take responsibility for RENCI’s research division by managing and enhancing research partnerships with faculty at UNC-Chapel Hill, Duke University, and North Carolina State University; building relationships between RENCI and Triangle area businesses; and leading efforts to bring new federal research funding to RENCI and its partner institutions. She will also apply her trademark skills in developing fit-for-purpose solutions that enable researchers to use data for the public good.

“Rebecca is an exceptional leader with deep expertise in building data science teams and executing on innovative and impactful projects,” said Krishnamurthy. “We have worked with her on a number of joint projects, and this history shows us that she will be able to make significant strategic contributions at RENCI and in partnership with UNC and our broader research community.”

In addition to her passion for data science, research, and information technology, Boyles has also enabled strong strategic growth at organizations throughout her career. While a data scientist at the National Institute of Environmental Health Sciences, Boyles clarified the strategic vision for the environmental health science data ecosystem, leveraging existing data assets to respond to timely public health issues. She identified opportunities to catalyze scientific advancements in chemical safety and public health through interactions with broad stakeholder groups. She also liaised with NIH leadership and served as science officer on the Big Data to Knowledge (BD2K) program, including the Data Discovery Index, Frameworks for Community-Based Standards, and the Center for Predictive Computational Phenotyping.

“I am thrilled to join RENCI’s efforts to tackle intractable, long-standing problems by driving the future of scientific computing in collaboration with their partner institutions,” said Boyles. “I look forward to bringing my background in environmental health and biomedical research, along with my experience partnering with diverse groups, to contribute to the pursuit of novel and effective solutions.”

Boyles holds an MSPH in Environmental Science and Engineering from the Gillings School of Global Public Health at UNC-Chapel Hill, along with a BA in Biology from UNC-Chapel Hill. Her areas of expertise include data modernization, FAIR data principles, data and modeling applications, data analysis and data management, data integration, and data strategy and implementation.


What to expect at the iRODS 2024 User Group Meeting

Published: May 21, 2024

The worldwide iRODS community will gather in Amsterdam, NL from May 28-31

Members of the iRODS user community will meet at the Amsterdam Science Park in Amsterdam, NL for the 16th Annual iRODS User Group Meeting to participate in four days of learning, sharing use cases, and discussing new capabilities that have been added to iRODS in the last year.

The event, sponsored by SURF, RENCI, Globus, and Hays, will provide in-person and virtual options for attendance. An audience of over 100 participants representing dozens of academic, government, and commercial institutions is expected to join.

“We are excited to connect with our user community to learn more about the impact and utility of iRODS on a global scale in fields such as public health, materials science, biotechnology, and more,” said Terrell Russell, executive director of the iRODS Consortium. “In addition to learning from one another’s deployments and use cases, the 2024 iRODS User Group Meeting will provide opportunities to network with users around the world and sow the seeds for future collaboration.”

In May, the iRODS Consortium and RENCI announced the release of iRODS 4.3.2. Along with preparation for work on 5.0.0 and important bug fixes for the 4.3 series, notable updates include the new GenQuery2 parser allowing for richer metadata queries into the catalog, fixes for keyword combinations and bad inputs, a number of documentation additions, and a few new deprecation declarations.

Another new feature is the S3 API v0.2.0. Many software libraries, tools, and applications now read and write the S3 protocol directly. Last year, the iRODS Consortium announced that the then-new iRODS S3 API could present iRODS via the S3 protocol, and shared details about the requirements, design, and initial implementation. This year, users will hear about the first two releases, the implementation of various endpoints, and the state of multipart transfers.

During last year’s UGM, users were presented an overview and demonstration of exploratory work with further authentication services such as OAuth 2.0, OpenID Connect, and the iRODS HTTP API. At this year’s event, the iRODS Consortium will share updates through the first three releases of the HTTP API, including optimizations and setting the iRODS server up as an OpenID Connect Protected Resource.

As always with the annual UGM, in addition to general software updates, users will offer presentations about their organizations’ deployments of iRODS. This year’s meeting will feature over 15 talks from users around the world. Among the use cases and deployments to be featured are:

iRODS Security Challenges Within an Enterprise Environment. Dow. Dow’s focus on data security necessitates a tailored approach for their internal users, leading to the development of the Scientific Data Management System (SDMS) Query Tool (SQT) — a user-friendly tool designed to facilitate secure access to specific datasets. The current gap with Metalnx for general users is that there is too much control over modifying data and collections. Additionally, it is difficult to synchronize the iRODS users to our existing Azure Security groups for permission management. This talk outlines the development of a Querying Tool utilizing the iRODS C++ API as a backend to communicate with iRODS. The talk will highlight the need for robust security architecture for Enterprise scale applications and where we are hoping to take the project to in the future.

Sharing data in a multi-system multi-role environment centered on iRODS. SURF and Erasmus University Rotterdam. SURF, the cooperative association of Dutch educational and research institutions, offers data infrastructure and services to the research communities. Some of its services are based on iRODS and are often used as building blocks for data platforms. One increasingly common architectural component in those platforms is a web portal where researchers can discover data using project-specific queries. Once the data are found, they are made available to the researcher either directly, for example with a download link, or indirectly, by triggering a copy to a computing environment where they are analyzed. Implementing such a workflow is time consuming. Its long-term maintenance is often jeopardized by the limited support available within the project, and design choices too tailored to that use case make its adoption by other organizations difficult. We think that it is possible to model that workflow in a generic way as a reusable modular component that is flexible enough to support even the more stringent requirements associated with sensitive data. The component relies on iRODS and links together multiple web portals and repositories through an API layer based on FastAPI. We present here a proof of concept developed within the GUTS project, in collaboration with the project’s data management team and the research support.

Integration of iRODS in a Federated IT Service through HTTP and Python API. CC-IN2P3. The Federated IT Service (FITS) project, a collaborative endeavor between the IN2P3 computing center and the French national HPC center IDRIS, addresses the challenge of managing the escalating data volumes generated by research infrastructures. The project aims to consolidate computing and storage resources while maintaining control over hosting expenses and minimizing the ecological footprint of digital technologies. Within the FITS project, iRODS was selected as the storage pooling solution, leveraging its established use within the IN2P3 Computing Centre. This implementation enables project users to seamlessly access their data without being aware of its physical location.

iRODS-based system turbocharged next-gen sequencing analysis during pandemic and beyond. National Institute for Public Health and the Environment (RIVM). The Dutch National Institute for Public Health and the Environment (RIVM) has numerous projects in various scientific domains that generate next generation sequencing data. Bioinformatics plays an important role in analyzing and interpreting this sequencing data. To support these analyses, we developed a platform that consists of a High Performance Compute (HPC) cluster, a Linux Scientific Workspace for software development and a Data Management System (DMS) based on iRODS. On top of this DMS, we also created a Job Engine: a tightly integrated process automation tool that manages the automated analyses of sequencing data on the HPC.

Bookending this year’s UGM are two in-person events for those who hope to learn more about iRODS. On May 28, the Consortium is offering beginner and advanced training sessions. After the conference, on May 31, users have the chance to register for a troubleshooting session, devoted to providing one-on-one help with an existing or planned iRODS installation or integration.

Registration for both physical and virtual attendance will remain open until the beginning of the event. Learn more about this year’s UGM at irods.org/ugm2024.

About the iRODS Consortium

The iRODS Consortium is a membership organization that supports the development of the integrated Rule-Oriented Data System (iRODS), free open source software for data virtualization, data discovery, workflow automation, and secure collaboration. The iRODS Consortium provides a production-ready iRODS distribution and iRODS training, professional integration services, and support. The world’s top researchers in life sciences, geosciences, and information management use iRODS to control their data. Learn more at irods.org.

The iRODS Consortium is administered by founding member RENCI, a research institute for applications of cyberinfrastructure at the University of North Carolina at Chapel Hill. For more information about RENCI, visit renci.org.

UNC Advances Hurricane-driven Flood Prediction Capabilities for Coastal Communities

Published: March 1, 2024

On September 14, 2018, Hurricane Florence made landfall in the Wrightsville Beach area of coastal North Carolina. Although the storm was only a Category 1, it caused catastrophic flooding throughout much of the state. The record amount of rain from the system fell on already saturated soil. Rivers overflowed their banks, storm surge inundated coastal areas, and the water had nowhere to go. It was a rare compound flooding scenario that will be studied and remembered for a long time.

The impacts of compound flooding, in which fluvial (river), pluvial (surface flooding unrelated to rivers), and oceanic storm surge flooding interact, are difficult to model, yet communities in the path of tropical and extratropical storm systems face this scenario annually. Unfortunately, the difficulty of modeling and understanding these events impedes already difficult hurricane decision-making, leaving countless communities at increased risk, and there is evidence that these compound flooding events may occur more frequently in the future (e.g., Wahl, T., et al. “Increasing risk of compound flooding from storm surge and rainfall for major US cities.” Nature Climate Change 5.12 (2015): 1093-1097). But a new modeling approach for river representation in the widely used coastal model ADCIRC may help change that, providing predictions and insights to the decision-makers working to keep their communities safe during storm-related flood events.

The Renaissance Computing Institute (RENCI), University of North Carolina (UNC) Center for Natural Hazards Resilience, and Institute of Marine Sciences (IMS) at UNC-Chapel Hill combined efforts under a grant from the National Oceanic and Atmospheric Administration (NOAA) to develop a better modeling approach for the compound flooding caused by these interconnected water systems. The resulting model advancement will help scientists represent river channel size variations and provide better insights into interactions between river channels and floodplains.

Current Models

There are several models used to understand and predict coastal inundation scenarios, but two models are primarily used to understand flooding:

ADCIRC is developed by a consortium of researchers in academia, government, and industry, with activities centered and coordinated at both UNC-Chapel Hill and Notre Dame. It is the most widely used storm surge modeling and analysis platform. In fact, FEMA uses the model for coastal flood insurance studies, defining storm surge levels for coastal insurance rates. However, the standard trapezoidal river channel representation used in ADCIRC only accounts for structures down to 30 m, with smaller structures (small rivers, man-made waterways, inlets, estuaries, etc.) creating a more burdensome computation. This creates inaccuracies when modeling compound flood events.
HEC-RAS, a fluvial modeling system developed by the Army Corps of Engineers, accurately models river systems and has been the primary system used for real-time prediction of river flow and stage by the NOAA River Forecast Centers. It was originally developed as a model for inland river systems, where coastal waters do not reach.

As a result, we currently have two unique and independently accurate models, one for storm surge flooding, and one for fluvial systems, but neither adequately accounts for impacts captured by the other. This means communities that fall into both flood risk zones are left outside our current ability to model and understand their unique circumstances.

Modeling Compound Flooding

The team’s new riverine feature in ADCIRC, led by Dr. Shintaro Bunya (a research scientist with UNC-Chapel Hill’s IMS and DHS-funded Coastal Resilience Center) and Prof. Rick Luettich (Earth, Marine, and Environmental Sciences (EMES) faculty member, Director of UNC-Chapel Hill’s IMS, and principal investigator of the Coastal Resilience Center), represents fluvial channels and man-made waterways using elongated, one-dimensional elements in the channel direction. The depth of the river and the height of the river bank are then specified at the same location. Previously not possible in ADCIRC, this “discontinuous” elevation permits a more accurate simulation of water flow and more easily accounts for smaller structures. The new river feature seamlessly fits into existing two-dimensional ADCIRC models and is as accurate at modeling fluvial flooding as HEC-RAS. Details of the technique and its applications were recently published.
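One way to picture the “discontinuous” elevation described above is a single location that carries two elevations: the channel bed and the top of the bank. The Python sketch below is a conceptual illustration only, not ADCIRC code or its input format. The class `ChannelNode`, its field and method names, and the example elevations are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelNode:
    """Hypothetical node holding two elevations at one horizontal location."""

    bed_elevation_m: float   # bottom of the river channel
    bank_elevation_m: float  # top of the river bank at the same location

    def __post_init__(self):
        # A bank lower than the channel bed would not describe a channel.
        if self.bank_elevation_m < self.bed_elevation_m:
            raise ValueError("bank must not be lower than the channel bed")

    def flow_state(self, water_level_m: float) -> str:
        """Classify where the water sits relative to the bed and the bank."""
        if water_level_m <= self.bed_elevation_m:
            return "dry"
        if water_level_m <= self.bank_elevation_m:
            return "in_channel"
        return "overbank"

    def depth_above_bed_m(self, water_level_m: float) -> float:
        """Water depth measured from the channel bed (zero when dry)."""
        return max(0.0, water_level_m - self.bed_elevation_m)

    def depth_above_bank_m(self, water_level_m: float) -> float:
        """Water depth above the bank top (zero until the bank is overtopped)."""
        return max(0.0, water_level_m - self.bank_elevation_m)


# Assumed example values: bed 3 m below datum, bank 1.5 m above datum.
node = ChannelNode(bed_elevation_m=-3.0, bank_elevation_m=1.5)
for level in (-4.0, 0.0, 2.0):
    print(level, node.flow_state(level), node.depth_above_bank_m(level))
```

Keeping both elevations on one node is what lets the same location behave as a confined channel at low water and as part of the floodplain once the bank is overtopped.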

Already, the model has proven its worth. The new river feature was demonstrated in a real-world application (see the figure below) using a large, ocean-scale ADCIRC grid for detailed simulations along the North Carolina coastal region. The coastal river network, with about 200 m along-channel resolution in the Neuse River, is represented by the narrow elements, detailed in insets A and B. The entire ADCIRC grid is shown in inset C. The orange-red colors show the predicted maximum water level contour in a Hurricane Florence (2018) simulation, and the plot in the upper right shows a comparison of observed versus predicted high water marks along the Neuse River. The agreement between observations and predictions is very high, indicating that this new approach to river channel representation in ADCIRC will be highly beneficial in predicting future river flow conditions during floods and their impacts on coastal flooding.

Figure. Real-world example of the new channel network feature in ADCIRC.
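Agreement between observed and predicted high water marks, as in the comparison described above, is commonly summarized with a few error statistics. The following is a minimal Python sketch, not the authors’ validation code. The function name `agreement_stats` is hypothetical, and the sample values are made-up placeholders, not measurements from the Hurricane Florence simulation.

```python
import math


def agreement_stats(observed, predicted):
    """Return (bias, rmse, r) for paired water levels in meters.

    bias: mean of (predicted - observed); positive means over-prediction.
    rmse: root-mean-square error.
    r:    Pearson correlation coefficient.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length series with at least 2 points")

    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    bias = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant series")
    r = cov / math.sqrt(var_o * var_p)

    return bias, rmse, r


# Placeholder high water marks in meters (illustrative only, NOT real data).
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 4.0]
bias, rmse, r = agreement_stats(obs, pred)
print(f"bias={bias:.3f} m, rmse={rmse:.3f} m, r={r:.3f}")
```

A small bias and RMSE together with a correlation near 1 would correspond to the kind of close agreement the article reports. The thresholds that count as “very high” agreement are a judgment the source does not specify.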

This new model has the potential to provide better predictions for communities where evacuation decisions can be the hardest to make, in the hope that North Carolina and other coastal states are less likely to be caught off guard by the flood risks in these compound flooding events.

IT4Innovations National Supercomputing Center joins the iRODS Consortium

Published: February 8, 2024

IT4Innovations National Supercomputing Center at VSB – Technical University of Ostrava, which is based in the Czech Republic, has become the newest member of the iRODS Consortium. The consortium brings together businesses, research organizations, universities, and government agencies from around the world to ensure the sustainability of the iRODS software as a solution for distributed storage, transfer, and management of data. Members work with the consortium to guide further development and innovation, expand its user and developer communities, and provide adequate support and educational resources.

IT4Innovations is the leading research, development, and innovation center active in the fields of High-Performance Computing (HPC), High-Performance Data Analysis (HPDA), Quantum Computing (QC), and Artificial Intelligence (AI) and their application to other scientific fields, industry, and society. Since 2013, IT4Innovations has been operating the most powerful supercomputing systems in the Czech Republic, which are provided to Czech and foreign research teams from academia and industry.

The integrated Rule-Oriented Data System (iRODS) is open-source software used by research, commercial, and government organizations around the world. The iRODS software allows organizations to store, manage, and share large amounts of data, including its metadata, across different organizations and platforms, and provides a mechanism for defining rules for the data’s storage, processing, and distribution. iRODS is designed to support collaboration, interoperability, and scalability of data infrastructures.

Martin Golasowski, senior researcher at IT4Innovations, summarizes the benefits of membership in the iRODS Consortium: “The demand for a comprehensive solution for fast and efficient data transfer between locations is increasing across the European scientific community. Membership in the iRODS Consortium will enable us to communicate directly with the development team of this solution and provide us with access to the latest features and support in providing these tools not only to the scientific community.”

“iRODS provides a virtual file system for various types of data storage, metadata management, and, last but not least, a mechanism for federating geographically distant locations for data transfer. These features are used in the LEXIS Platform, which simplifies the use of powerful supercomputers to run complex computational tasks through a unified graphical interface or using a specialized application interface. The transfer of large volumes of data between supercomputers and data storage is then performed automatically and transparently for those using iRODS and other data management technologies,” adds Martin Golasowski.

“We are very excited to have our friends in the Czech Republic join the Consortium,” said Terrell Russell, Executive Director of the iRODS Consortium. “Their expertise and collaborative insights have already made iRODS better for everyone. We look forward to continued progress working alongside IT4Innovations.”

The iRODS software has been deployed at thousands of locations worldwide for long-term management of petabytes of data in industries such as oil and gas, biosciences, physical sciences, archives, and media and entertainment. The development team of the iRODS Consortium is based at the Renaissance Computing Institute (RENCI), which is affiliated with the University of North Carolina at Chapel Hill, USA. To learn more about iRODS and the iRODS Consortium, please visit irods.org.

To learn more about IT4Innovations National Supercomputing Center, please visit www.it4i.cz/en.

Exploring the power of distributed intelligence for resilient scientific workflows

Published: November 28, 2023

New project led by USC Information Sciences Institute seeks to ensure resilience in workflow management systems

*Image AI generated by author using DALL-E*.

Future computational workflows will span distributed research infrastructures that include multiple instruments, resources, and facilities to support and accelerate scientific discovery. However, the diversity and distributed nature of these resources make harnessing their full potential difficult. To address this challenge, a team of researchers from the University of Southern California (USC), the Renaissance Computing Institute (RENCI) at the University of North Carolina, and Oak Ridge, Lawrence Berkeley and Argonne National Laboratories has received a grant from the U.S. Department of Energy (DOE) to develop the fundamentals of a computational platform that is fault tolerant, robust to various environmental conditions and adaptive to workloads and resource availability. The grant is planned for five years and includes $8.75 million of funding.

“Researchers are faced with challenges at all levels of current distributed systems, including application code failures, authentication errors, network problems, workflow system failures, filesystem and storage failures and hardware malfunctions,” said Ewa Deelman, research professor, research director at the USC Information Sciences Institute and the project PI. “Making the computational platform performant and resilient is essential for empowering DOE researchers to achieve their scientific pursuits in an efficient and productive manner.”

A variety of real-world DOE scientific workflows will drive the research – from instrument workflows involving telescope and light source data to domain simulation workflows that perform molecular dynamics simulations. “Of particular interest are edge and instrument-in-the-loop computing workflows,” said co-PI Anirban Mandal, assistant director for network research and infrastructure at RENCI. “We expect a growing role for automation of these workflows executing on the DOE Integrated Research Infrastructure (IRI). With these essential tools, DOE scientists will be more productive and the time to discovery will be decreased.”

*Fig. 1: SWARM research program elements.*

Swarm intelligence

Key to the project is swarm intelligence, a term derived from the behavior of social animals (e.g., ants) that collectively achieve success by working in groups. Swarm Intelligence, or SI, in computing refers to a class of artificial intelligence (AI) methods used to design and develop distributed systems that emulate the desirable features of these social animals – flexibility, robustness and scalability.

“In Swarm Intelligence, agents currently have limited computing and communication capabilities and can suffer from slow convergence and suboptimal decisions,” said Prasanna Balaprakash, director of AI programs and distinguished R&D staff scientist at Oak Ridge, and co-PI of the newly funded project. “Our aim is to enhance traditional SI-based control and autonomy methods by exploiting advancements in AI techniques and in high-performance computing.”

The enhanced metasystem, called SWARM (Scientific Workflow Applications on Resilient Metasystem), will enable robust execution of DOE-relevant scientific workflows such as astronomy, genomics, molecular dynamics and weather modeling across a continuum of resources – from edge devices near sensors and instruments through wide-area networks to leadership-class systems.

Distributed workflows and challenges

The project develops a distributed approach to workflow development and profiling. The research team will develop an experimental platform where DOE scientists will submit jobs and workflows to a distributed workload pool. Once a set of workflows becomes available in the workflow pool, the agents will estimate each task’s characteristics and resource requirements with continual learning capability. “Such methods enhance the capabilities of the agents. The research will include mathematically rigorous performance modeling and online continual learning methods,” remarked Krishnan Raghavan, an assistant computer scientist in Argonne’s Mathematics and Computer Science division and a co-PI of SWARM.

In SWARM there is no central controller: the agents must reach a consensus on the best resource allocation. “In imitation of biological swarms, we will investigate how coalitions can adapt to various fault tolerance strategies and can reassign tasks, if necessary,” said Argonne senior computer scientist Franck Cappello, who is leading the development efforts on fault recovery and adaptation algorithms. Here the agents will coordinate decision-making for optimal resource allocation while minimizing communication between agents, for example by forming hierarchies and adopting adaptive communication strategies.
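To make the idea of agreement without a central controller concrete, here is a minimal sketch of pairwise "gossip" averaging, a classic decentralized consensus technique. It is not the SWARM project's algorithm, which the article does not describe; the function name `gossip_average`, the sample load estimates, and the round count are all illustrative assumptions.

```python
import random


def gossip_average(estimates, rounds=200, seed=0):
    """Pairwise gossip: in each round two randomly chosen agents
    exchange their current values and both adopt the average.
    No agent ever sees the full list of values.
    (Hypothetical helper for illustration only.)"""
    rng = random.Random(seed)          # seeded so the run is repeatable
    values = list(estimates)           # copy; leave the caller's list intact
    for _ in range(rounds):
        i, j = rng.sample(range(len(values)), 2)   # two distinct agents
        mean = (values[i] + values[j]) / 2.0
        values[i] = values[j] = mean   # both agents move to the midpoint
    return values


# Assumed example: each agent's local estimate of demand for one resource.
local_estimates = [4.0, 10.0, 1.0, 7.0, 3.0]
agreed = gossip_average(local_estimates)
spread = max(agreed) - min(agreed)     # how far the agents still disagree
print(f"agreed values: {agreed}, spread: {spread:.2e}")
```

Each exchange involves only two agents, yet the values drift toward the global average because every pairwise step preserves the total. Real systems would layer fault handling and smarter communication patterns on top of a scheme like this.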

Evaluation

To demonstrate the efficacy of the swarm intelligence-inspired approach, the team will evaluate the method through swarm simulations, emulation, and prototyping on testbeds. “We will re-imagine how workflows can be managed to improve both compute and networking at micro and macro levels,” said Mariam Kiran, Group Leader for Quantum Communications and Networking at ORNL.

This article was written in collaboration with USC ISI, RENCI, Oak Ridge National Laboratory, Lawrence Berkeley National Laboratory, and Argonne National Laboratory.

RENCI to showcase latest technological innovations at SC23

Published: November 11, 2023

Every sector of society is undergoing a historic transformation driven by big data. RENCI is committed to transforming data into discoveries by partnering with leading universities, government, and the private sector to create tools and technologies that facilitate data access, sharing, analysis, management, and archiving.

Each year, the Supercomputing conference provides the leading technical program for professionals and students in the HPC community, as measured by impact, at the highest academic and professional standards. RENCI will host a booth (#1663) at SC23 where team members will share collaborative research projects and cyberinfrastructure efforts aimed at helping people use data to drive discoveries.

A full schedule of sessions at the RENCI booth can be found on our website.

18th Workshop on Workflows in Support of Large-Scale Science

Anirban Mandal, the Assistant Director of Network Research & Infrastructure Group at RENCI and co-PI of the DOE-funded Poseidon project, will co-chair the 18th Workshop on Workflows in Support of Large-Scale Science (WORKS), taking place November 12-13. WORKS 2023 focuses on the many facets of scientific workflow management systems, ranging from actual execution to service management and the coordination and optimization of data, service, and job dependencies.

iRODS 4.3.1, HTTP, OIDC, and S3

The open source iRODS (Integrated Rule-Oriented Data System) data management platform presents a virtual filesystem, metadata catalog, and policy engine designed to give organizations maximum control and flexibility over their data management practices and enforcement. As iRODS has always defined its own protocol and RPC API, interoperability with other protocols has been left to application developers and administrators. This year’s releases of iRODS 4.3.1 as well as standalone APIs exposing iRODS via HTTP and S3 help new users use their existing, familiar tools to integrate with an iRODS Zone.

iRODS will host a free mini-workshop on Monday, November 13 at 9 AM ET to cover the above efforts and give a glimpse of where the team is headed next. Additionally, iRODS team members will present talks on these topics and be available for further discussion at the RENCI booth on the exhibit floor from November 14-16.

iRODS in the Cloud: Organizational Data Management

iRODS Executive Director Terrell Russell will give a talk at the Google booth (#443) on November 16 at 12:30 PM MT. This talk will give an overview of the philosophy of iRODS as well as some examples of how running iRODS in the Google Cloud can help get a handle on the metadata and bookkeeping associated with an enterprise deployment.

FABRIC Status and FPGA Drop-In

The NSF-funded FABRIC project recently completed installation of a unique network infrastructure connection called the TeraCore, a ring spanning the continental U.S. that boasts data transmission speeds of 1.2 Terabits per second (Tbps), or 1.2 trillion bits per second. FABRIC previously established preeminence with its cross-continental infrastructure, but the project has now hit another milestone as the only testbed capable of transmitting data at these speeds, the highest being twelve times faster than what was available before.
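For a sense of scale, the back-of-the-envelope sketch below compares how long a bulk transfer would take at the TeraCore rate and at one-twelfth of that rate. The one-petabyte payload is an arbitrary assumption, the earlier rate is inferred from the "twelve times faster" figure rather than stated in the article, and protocol overhead is ignored.

```python
def transfer_seconds(num_bytes, bits_per_second):
    """Ideal transfer time: payload size in bits divided by line rate."""
    return num_bytes * 8 / bits_per_second


PETABYTE = 10**15                   # assumed payload: 1 PB (decimal)
TERACORE_BPS = 1.2e12               # 1.2 Tbps, from the article
PREVIOUS_BPS = TERACORE_BPS / 12    # inferred from "twelve times faster"

fast = transfer_seconds(PETABYTE, TERACORE_BPS)
slow = transfer_seconds(PETABYTE, PREVIOUS_BPS)
print(f"TeraCore: {fast / 3600:.1f} h, previous: {slow / 3600:.1f} h")
```

Under these idealized assumptions the petabyte moves in a little under two hours at 1.2 Tbps, versus roughly 22 hours at the inferred earlier rate.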

FABRIC leadership team members Ilya Baldin and Paul Ruth will present a talk at the RENCI booth on the current status of the testbed and future plans for development at the times below. Each of the talks is followed by a 30-minute office hours session at the RENCI booth for anyone wanting a one-on-one discussion or help with account setup.

Tuesday, November 14 at 11:00 AM MT
Wednesday, November 15 at 2:00 PM MT
Thursday, November 16 at 10:30 AM MT

In conjunction with ESnet and IIT, the FABRIC team will host an FPGA drop-in at the RENCI booth on Wednesday, November 15 at 11:00 AM MT. Those interested in running FPGA-based experiments on FABRIC are encouraged to stop by for a discussion during the block. ESnet smartNIC, a fully open source P4 + FPGA development environment for FABRIC developers, is fully deployed in the NSF FABRIC testbed. Attendees will get a chance to meet the developers, ask questions and get a 1:1 explanation of how to do P4 development on FABRIC without any prior FPGA design experience. The team will cover everything from “hello world” tutorials to deep dives on the Verilog architecture, DPDK and other driver software.

FABRIC at INDIS 2023

FABRIC will be represented at the 2023 INDIS Workshop Technical Session on Tuesday, November 14 at 2 PM MT at the SCinet Theater on the exhibit floor. PI Ilya Baldin will talk about FABRIC as part of a panel and a number of FABRIC users will show demos of their FABRIC experiments.

Unleashing the Power within Data Democratization: Needs, Challenges, and Opportunities

On Thursday, November 16 at 1:30 PM MT, FABRIC PI Ilya Baldin will sit on a panel discussing the needs, challenges, and opportunities of the data science community leveraging the existing cyberinfrastructures and software tools while strategizing on what is missing to connect an open network of institutions, including resource-disadvantaged institutions.

A full list of FABRIC activities at SC23 is available on the FABRIC website.

About RENCI

RENCI (Renaissance Computing Institute) develops and deploys advanced technologies to enable research discoveries and practical innovations. RENCI partners with researchers, government, and industry to engage and solve the problems that affect North Carolina, our nation, and the world. RENCI is an institute of the University of North Carolina at Chapel Hill.

RENCI awarded NSF grant to develop cyberinfrastructure training program for X-ray scientists

Published: November 3, 2023

Enhancing the ability of scientists to use the latest computing and data tools will help quicken the pace of scientific discoveries

RENCI scientists and collaborators from Cornell University and University of Southern California (USC) have been awarded a $1 million, three-year grant from the National Science Foundation (NSF) to develop an innovative training program for scientists who use the Cornell High Energy Synchrotron Source (CHESS) X-ray facility. The program will be designed to help the scientists increase their computing skills, awareness and literacy with an ultimate goal of accelerating scientific innovations in synchrotron X-ray science.

A RENCI team headed by Anirban Mandal, assistant director of the Network Research & Infrastructure Group (NRIG), will lead the CyberInfrastructure Training and Education for Synchrotron X-Ray Science (X-CITE) project. It will bring together experts in cyberinfrastructure, X-ray science and other related areas from RENCI, Cornell University and USC to develop an innovative training program for researchers using CHESS, an NSF-supported high-intensity X-ray source at Cornell. CHESS is used to conduct research in materials science, physics, chemistry, biology, environmental science and other areas.

“Scientists don’t always have the computing and data expertise necessary to fully harness the instruments, data and computing tools available to transform data into insights and knowledge,” said Mandal. “We want to help reduce barriers so that scientists can effectively utilize computing capabilities and data resources at CHESS as well as cyberinfrastructure resources available through national computing and data services.”

Teaching scientists about computing tools

To get scientists up to speed on computing and data tools, the training program will cover programming essentials, systems fundamentals, distributed computing with the cyberinfrastructure ecosystem, X-ray science software, data curation, and application of the FAIR data principles of findability, accessibility, interoperability and reusability.

“As scientific instruments have become more sophisticated, there has been an explosion in the volume and rate of data produced by scientific facilities like CHESS,” said Mandal. “The data generated no longer fits on a laptop, and there are now computational models and AI methods that scientists can use to steer experiments based on the results they are getting. It is very difficult for scientists to keep pace with all these new capabilities.”

Mandal points out that it is important for scientists to get up to date on FAIR principles because federal research funding agencies are planning to roll out new mandates requiring scientists to share the data they generate. This will require designing metadata and figuring out how to push data into repositories in a way that makes it findable and usable by other researchers — tasks that scientists might not be accustomed to doing.

Drawing on RENCI’s expertise

The RENCI team will focus on developing common computer science modules for Python and other programming languages. This work will leverage RENCI’s expertise in this area, including Senior Research Software Developer Erik Scott’s experience as an instructor for the student program within the CI Compass project. The USC team, led by Research Professor of Computer Science Ewa Deelman, will contribute distributed computing training materials. Training materials for the specialized X-ray science software used at CHESS will be the focus of the Cornell team, which is led by Matthew Miller, associate director of CHESS.

The X-CITE training materials and activities will be available in several formats, including self-paced modules, videos, cyberinfrastructure catalogs, in-person instruction sessions, CHESS user workshops and tutorials offered at scientific conferences. The project team will also develop a coordination network to help disseminate the training materials, communicate the cyberinfrastructure needs of the X-ray science community and discuss best practices for training.