What to expect at the iRODS 2023 User Group Meeting

The worldwide iRODS community will gather in Chapel Hill, NC from June 13 – 16

Members of the iRODS user community will meet at UNC-Chapel Hill in North Carolina for the 15th Annual iRODS User Group Meeting to participate in four days of learning, sharing use cases, and discussing new capabilities that have been added to iRODS in the last year.

The event, sponsored by RENCI, Omnibond, Globus, and Hays, will provide in-person and virtual options for attendance. An audience of over 100 participants representing dozens of academic, government, and commercial institutions is expected to join.

“The robust list of presentations at the 2023 iRODS User Group Meeting illustrates the impact and utility of iRODS on a global scale, from talks on unique applications of the software to demos of innovative clients and integrations.” said Terrell Russell, executive director of the iRODS Consortium. “We are excited to host the user community in our hometown of Chapel Hill, NC and provide opportunities for learning, networking, and collaboration throughout the week.”

In May, the iRODS Consortium published the 2023 Technology Roadmap, which documents the state of the technical direction chosen for the iRODS data management software. A notable focus named in this plan is to make implementing the iRODS Protocol less complicated by designing a new HTTP API.

Other plans include updates to the GenQuery interface and the iCommands client. The iRODS GenQuery interface has long defined the way users and administrators can search the iRODS namespace, its storage systems, users, and metadata, while honoring the iRODS permission model. The next generation of GenQuery, GenQuery2, is now available for experimentation. The current iRODS iCommands are a culmination of many years of effort, but they are beginning to show their age, especially in terms of design and extensibility. The iRODS Consortium aims to create a brand new CLI that focuses on using modern libraries (iRODS or otherwise), modern C++, being extensible and modular, and providing a single binary.

During last year’s UGM, users learned about a new PAM (pluggable authentication module) plugin, a universal implementation for all authentication methods. At this year’s event, the iRODS Consortium will provide an overview and demonstration of exploratory work with further authentication services such as OAuth 2.0, OpenID Connect, and the new iRODS HTTP API, and how integrations with these services may work best in the future.

As always with the annual UGM, in addition to general software updates, users will offer presentations about their organizations’ deployments of iRODS. This year’s meeting will feature over 15 talks from users around the world. Among the use cases and deployments to be featured are:

  • GoCommands: A cross-platform command-line client for iRODS. CyVerse / University of Arizona. The diversity of scientific computing platforms has increased significantly, ranging from small devices like Raspberry Pi to large computing clusters. However, accessing iRODS data on these varied platforms remains a common but challenging requirement. The official command-line tool for iRODS, iCommands, is limited to a few platforms like CentOS7 and Ubuntu 18/20. As a result, users on other platforms like MacOS, Windows, and Raspberry Pi OS have no straightforward performant means of accessing iRODS. GoCommands is another command-line tool for iRODS designed to address the portability issue of iCommands. Written in Go programming language, building its executable for diverse platforms is straightforward. The tool is a single executable that does not require any dependency installation. Pre-built binaries for MacOS, Linux (any distros), and Windows, regardless of their CPU architectures, are already available. In addition, the tool does not require elevated privileges for installation and run. This makes it possible for users on nearly any platform to access iRODS.
  • iBridges: A comprehensive way of interfacing with iRODS. Utrecht University, Wageningen University, and University of Groningen. iRODS is a rich middleware providing means to facilitate data management for research. It implements all necessary concepts like resources, metadata, permissions, and rules. However, in research most of the concepts are still new. Hence, researchers and their support staff are challenged using the current interfaces and tools to 1) learn about those concepts and 2) familiarize themselves with the different APIs and command line interfaces. This creates the need for a steep learning curve for researchers and research supporters, slowing down the adoption of iRODS. To ease the usage of iRODS we present iBridges. iBridges is a standalone desktop application, written in Python, to provide users of Windows, Linux, and MacOS with a graphical user interface (GUI) to interact with iRODS servers. The tool is agnostic to any rules/policies in the server. Out-of-the-box iBridges supports three main functions: browsing and manipulating data objects, upload/download data, and searching data collections.
  • ManGO: A web portal and framework built on top of iRODS for active research data management. KU Leuven. At the University of Leuven. Belgium, we are building the infrastructure and software layers to leverage iRODS as a major building block in active research data management. This involves various workflows and processing of data and metadata during the lifetime of a research project. One of the important components consists of a modular and adaptable web portal built using the iRODS Python client. Given the wide range of use cases, the web framework employs some classical architectural patterns to decouple specialized domain specific needs from the core system. It also has features that make it behave like a content management system, including a (view) template override system that make the representation of collections and data objects dependent on for example specific metadata or collection structure. Metadata is a prime focus to steer many aspects of this framework along its core use for research data, and a considerable effort was also put in a user friendly metadata schema management system. In this talk, we will present the current status as well as near future plans.
  • iRODS Object Store on Galaxy Server: Application of iRODS to a Real Time, Multi-user System. Penn State University. Galaxy is an open-source platform for data analysis that enables users to 1) Use tools from various domains through its graphical web interface, 2) Run code in interactive environments such as Jupyter or RStudio, 3) Manage data by sharing and publishing results, workflows, and visualizations, and 4) Ensure reproducibility by capturing the necessary information to repeat data analyses. To store data Galaxy utilizes ObjectStore as its data virtualization layer. It abstracts Galaxy’s domain logic for data persistence technology. Currently, Galaxy mainly uses a Disk ObjectStore for data persistence. To extend Galaxy’s data persistence capabilities, we had previously extended Galaxy’s ObjectStore to support iRODS. In this work, we discuss the steps in deploying iRODS Object Store on the USA-based Galaxy server (usegalaxy.org) and the challenges we faced. To the best of our knowledge, after CyVerse, this is one of the few applications of iRODS to a real time, multi-user system.

Bookending this year’s UGM are two in-person events for those who hope to learn more about iRODS. On June 13, the Consortium is offering beginner and advanced training sessions. After the conference, on June 16, users have the chance to register for a troubleshooting session, devoted to providing one-on-one help with an existing or planned iRODS installation or integration.

Registration for both physical and virtual attendance will remain open until the beginning of the event. Learn more at this year’s UGM at irods.org/ugm2023

About the iRODS Consortium

The iRODS Consortium is a membership organization that supports the development of the integrated Rule-Oriented Data System (iRODS), free open source software for data virtualization, data discovery, workflow automation, and secure collaboration. The iRODS Consortium provides a production-ready iRODS distribution and iRODS training, professional integration services, and support. The world’s top researchers in life sciences, geosciences, and information management use iRODS to control their data. Learn more at irods.org.

The iRODS Consortium is administered by founding member RENCI, a research institute for applications of cyberinfrastructure at the University of North Carolina at Chapel Hill. For more information about RENCI, visit renci.org.

BioData Catalyst manuscript published in JAMIA

On May 16th, the Journal of the American Medical Informatics Association (JAMIA) published the BioData Catalyst (BDC) Consortium’s manuscript, authored by several members of the BDC Consortium, about the ecosystem and some of its early successes.

RENCI’s role in the BDC ecosystem is to develop tools and apps for machine learning; deep learning models; semantic search; and visualizing, annotating and analyzing biomedical images.

Reaching consensus around the manuscript’s content illustrates BDC’s commitment to collaboration and building a strong and thriving community of practice. The publication provides public acknowledgment of NHLBI’s continued investment and contributions to data science and establishes the seminal reference for publications of future scientific advances destined to come from research conducted in the BDC ecosystem.

Read more on the BDC website: https://biodatacatalyst.nhlbi.nih.gov/latest-updates/2023-05-19_bdc-marker-paper-announcement.

Ashok Krishnamurthy Named Director of RENCI

Ashok Krishnamurthy has been appointed director of the Renaissance Computing Institute (RENCI). Krishnamurthy previously served as RENCI’s deputy director and has been engaged in the leadership team that has guided RENCI over the past 10 years. In addition, he has served as interim director of RENCI since last July when previous director Stan Ahalt assumed deanship of the new UNC School of Data Science and Society.

“The momentum behind RENCI’s projects and research has grown rapidly over the last few years, and the road ahead is limitless,” said Ashok Krishnamurthy, director of RENCI. “I’m excited to step into this role and collaborate with our expanding workforce to build a collective vision and roadmap for the future.”

Krishnamurthy has decades of experience as both a researcher and an administrator, with special emphasis on forming and advancing interdisciplinary teams. He collaborates with researchers in informatics, health, and social sciences to develop projects and programs that leverage the power of data science and scalable computing to solve challenging problems. He advises students and mentors post-doctoral scholars and junior investigators and is deeply involved in managing and enhancing research partnerships with other institutions and businesses. 

Krishnamurthy holds a bachelor’s degree in electrical engineering from the Indian Institute of Technology, and a master’s and PhD in electrical and computer engineering from the University of Florida. Krishnamurthy’s research over the years has been funded by the National Science Foundation, the National Institutes of Health, the Department of Defense, the Defense Advanced Research Projects Agency, and the Department of Energy. 

The mission of RENCI is to develop and deploy data science cyberinfrastructure that helps researchers in academia, government, and business use data to drive discoveries, make informed decisions, and spur economic development. As director, Krishnamurthy is responsible for all operations that bring that mission to life, including managing five research groups of over 80 researchers and the Advanced Cyberinfrastructure Support Group (ACIS). In addition, he holds appointments as a research professor in the Department of Computer Science and as co-director for Informatics and Data Science (IDSci) at the North Carolina Translational and Clinical Sciences Institute (NC TraCS). 

“Over the next few years, in addition to our continued excellence in fields such as climate science, clinical informatics, and network research infrastructure, I see RENCI emerging as a leader in team science,” said Krishnamurthy. “With our years of experience in creating efficient interdisciplinary teams, RENCI is uniquely positioned to provide expertise on incorporating the most effective research practices, coordination and outreach efforts, and technology and tools to projects on UNC’s campus, in the Triangle, and beyond.”

“[Ashok] brings cross-cutting and highly technical expertise, deep collaborative ties across Carolina, the region, and the world, and a wonderful leadership style,” said Penny Gordon-Larsen, interim vice chancellor for research, in an email sent to UNC deans, department heads, and directors. “We are confident that under his leadership, RENCI will continue its trajectory of excellence in the development and deployment of advanced technologies to advance research discoveries and practical innovations.” 

Council for Scientific and Industrial Research joins iRODS Consortium

Collaboration supports data management for research and economic development in South Africa

CHAPEL HILL, NC – The Council for Scientific and Industrial Research (CSIR) has joined the iRODS Consortium, the membership-based organization that leads development and support of the integrated Rule-Oriented Data System (iRODS).

CSIR is a research organization that advances technologies to accelerate socioeconomic prosperity in South Africa. Established through an Act of Parliament in 1945, CSIR supports public and private sectors through directed research in areas such as health, mobility, agriculture, manufacturing, mining, and chemistry.

The iRODS Consortium is a membership-based organization that guides development and support of iRODS as free open-source software for data discovery, workflow automation, secure collaboration, and data virtualization.

Efficient and effective handling of data is crucial to the research enterprise and supporting data management is an important part of CSIR’s function. CSIR provides high-performance computing capability, high-speed network capacity, and a national research data infrastructure to provide seamless access for the research and education communities of South Africa through its National Integrated Cyber Infrastructure System (NICIS) initiative. NICIS includes the Centre for High Performance Computing (CHPC), which provides massive parallel processing capabilities and services; the South African National Research Network (SANReN), which provides high-speed connectivity and advanced networking services; and the Data Intensive Research Initiative of South Africa (DIRISA), which supports sound data management practices and efficient data-driven scientific and engineering discoveries.

“At DIRISA, we use iRODS to manage data stored in our infrastructure,” said Sthembiso Mkhwanazi, Senior Project Manager for DIRISA. “iRODS enables DIRISA to effectively manage users’ data.”

The iRODS Consortium provides a production-ready distribution and professional integration services, training, and support. The consortium is administered by founding member RENCI, a research institute for applications of cyberinfrastructure located at the University of North Carolina at Chapel Hill, USA.

“We have been working with CSIR for a few years and are excited they have joined the Consortium,” said Terrell Russell, executive director of the iRODS Consortium. “Their projects and collaborations are very important to the socioeconomic prosperity of South Africa and we are happy to be a part of that ongoing effort.”

In addition to CSIR, current iRODS Consortium members include Agriculture Victoria, Bayer, Bibiothèque et Archives nationales du Québec, CINES, CUBI at Berlin Institute of Health, DataDirect Networks,  InfiniteTactics, KU Leuven, Maastricht University, Minnesota Supercomputing Institute at the University of Minnesota, the National Institute of Environmental Health Sciences, NetApp, Omnibond, OpenIO, RENCI, SoftIron, the SURF cooperative, Texas Advanced Computing Center, University College London, University of Colorado, Boulder, University of Groningen, Utrecht University, Wellcome Sanger Institute, Western Digital, and four organizations that wish to remain anonymous.

To learn more about iRODS and the iRODS Consortium, please visit irods.org.

To learn more about CSIR, please visit https://www.csir.co.za.

ResearchSpace and iRODS Partner to Enable Virtual File Storage and Metadata Solutions for Institutional Research Data Management

ResearchSpace, provider of the RSpace electronic research notebook and sample management system, today announced an integration with the iRODS, a data discovery, workflow automation, secure collaboration, and data virtualization platform, to ensure the integrity of links between RSpace and the wide range of resources it connects with via the virtualized file tracking and metadata management enabled by iRODS.

RSpace is a digital research content creation and management system which integrates with a range of data sources and outputs such as file storage systems, specialist research tools, and generic apps like Microsoft 365 and Slack to enhance researcher workflow. The integration with iRODS addresses the Achilles’ heel of this otherwise powerful ecosystem; namely that a dependence on file paths results in a risk to the integrity of links between RSpace and connected tools. The integration with iRODS addresses this weakness by utilizing the capability of iRODS to assign identifiers to files. The next stage of the integration will focus on enabling support for management of metadata associated with content created and managed in RSpace.

 “We were excited to see the level of interest in an iRODS-RSpace integration at this year’s iRODS User Group Meeting. The initial integration addresses the ‘broken links problem’ universities view as a major concern with digital research platforms that enable connectivity with multiple resources,” said Terrell Russell, Executive Director of the iRODS Consortium. “Future work utilizing iRODS’ metadata management capability will make iRODS-RSpace an even more compelling end to end file and data solution supporting integrated management of files, data, and metadata. This will serve the growing demand for a comprehensive data management solution we’re seeing in universities around the world.”

 “We’ve been interested in a collaboration with iRODS for a long time,” said Rory Macneil, Chief Executive of ResearchSpace. The trigger to move forward was a recent tender for electronic research notebooks issued by a leading European university. The core requirement was for ‘an electronic research notebook linked to external storage and data management solutions designed so as to ensure maximum integrity of the links between the ERN and the external solutions via integration with the iRODS virtual file management system’. Following success with the tender and growing awareness of the iRODS integration we’ve seen a surge in interest from other universities that understand the importance of a comprehensive approach to research data management.”

About the iRODS Consortium

The iRODS Consortium is a membership-based organization that guides development and support of iRODS as free open-source software for data discovery, workflow automation, secure collaboration, and data virtualization. The iRODS Consortium provides a production-ready iRODS distribution and iRODS professional integration services, training, and support. The consortium is administered by founding member RENCI, a research institute for applications of cyberinfrastructure located at the University of North Carolina at Chapel Hill, USA.

About ResearchSpace

ResearchSpace provides a digital research platform comprised of tightly integrated electronic research notebook and sample management modules. RSpace integrates with a range of data sources and outputs such as file storage systems, specialist research tools, and generic apps like Microsoft 365 and Slack to enhance researcher workflows. The RSpace ERN & Inventory platform is currently in use at institutions across the globe.

Spectra Logic and iRODS Consortium Partner to Provide Glacier-Like Tier of Storage for Data-Driven Organizations

Spectra Logic, a global leader in data management and data storage solutions, today announced a collaboration with the iRODS Consortium to create a joint solution built upon Spectra Vail® software, Spectra BlackPearl® S3 storage and the iRODS data management platform. The combined solution enables customers to use industry-standard cloud interfaces for on-premises disk and on-premises glacier* storage with object tape, while unlocking multi-site/multi-cloud capabilities.

The iRODS integration with BlackPearl S3 allows organizations to leverage the performance and cost benefits of on-premises glacier storage as disk or tape to access “cold” data and automate workflows, while the integration with Vail provides access to cloud services across multiple clouds. Spectra Vail software and BlackPearl S3 storage have been tested with the iRODS S3 storage resource plugin to fully support the Amazon® S3 abstraction that iRODS delivers. The new functionality is available as part of the iRODS 4.2.11 release.

“Organizations that need an on-prem glacier tier will see many benefits with the interoperability between BlackPearl S3 and the iRODS data management platform,” said David Feller, Spectra Logic vice president of product management and solutions engineering. “Organizations will be able to take full advantage of on-prem storage and the public, private and hybrid cloud by leveraging the Vail and iRODS integration.”

“The combined Spectra Logic and iRODS solution will enable organizations that rely heavily on tape to archive petabytes of valuable digital data economically and efficiently in a glacier-like tier,” said Terrell Russell, executive director of the iRODS Consortium. “We look forward to a lasting collaboration with Spectra Logic that will help our mutual customers drive innovation and accelerate business results.”

*Amazon Glacier is a registered trademark of Amazon Technologies, Inc.

# # #

About the iRODS Consortium

The iRODS Consortium is a membership-based organization that guides development and support of iRODS as free open-source software for data discovery, workflow automation, secure collaboration, and data virtualization. The iRODS Consortium provides a production-ready iRODS distribution and iRODS professional integration services, training, and support. The consortium is administered by founding member RENCI, a research institute for applications of cyberinfrastructure located at the University of North Carolina at Chapel Hill, USA.

About Spectra Logic
Spectra Logic develops a full range of Attack Hardened™ data management and data storage solutions for a multi-cloud world. Dedicated solely to data storage innovation for more than 40 years, Spectra Logic helps organizations modernize their IT infrastructures and protect and preserve their data with a broad portfolio of solutions that enable them to manage, migrate, store and preserve business data long-term, along with features to make them ransomware resilient, whether on-premises, in a single cloud, across multiple clouds, or in all locations at once. To learn more, visit www.spectralogic.com.

RENCI named as partner in NSF award to study coastal community resilience

RENCI’s expertise in hurricane modeling will be a key asset in the University of Delaware’s $16.5 million NSF award to study interplay between coastal resilience, equity, and economic prosperity

Adapted with permission from University of Delaware.

It’s been five years since Hurricane Harvey brought catastrophic rain, extensive flooding and more than $125 billion in damages to Texas in August 2017.

Other hurricanes and hazards have occurred since, with increased frequency and intensity. Yet coastal communities nationwide continue to grapple with disaster resilience.

How to cope with these hazards is complicated. The issues are multi-faceted: people who live along the coast have homes, jobs, families, while businesses have infrastructure and employees, all of which contribute to the region’s economic prosperity. When disasters occur, not everyone is affected equally. And, as climate change worsens, the effects of hazards like hurricanes are increasingly felt farther inland.

Now, the Disaster Research Center at the University of Delaware has been awarded $16.5 million from the National Science Foundation to lead a multi-institutional effort exploring the tension and tradeoffs between a community’s goals of managing hurricane risk while also achieving equity and economic prosperity.

The UD-led hub—Coastal Hazards, Equity, Economic prosperity and Resilience (CHEER)—is one of five NSF-funded projects announced recently as part of the agency’s Coastlines and People program, which is infusing $51 million in research funding to protect the natural, social and economic resources of U.S. coasts, and to help create more resilient coastal communities.

The work will require intense input from public policy, sociology, meteorology, engineering and other disciplines.

“The Disaster Research Center at UD has a long and successful track record of interdisciplinary research, analysis and problem-solving focused on some of society’s most complex challenges, so it is fitting that they will lead this latest effort,” said UD President Dennis Assanis. “Through collaboration with institutions nationwide, the CHEER hub will help make coastal communities more resilient in the face of growing threats from climate change.”

The five-year project will be led by Rachel Davidson, a core DRC faculty member and UD professor of civil and environmental engineering. Co-principal investigators include Sarah DeYoung, core DRC faculty member and associate professor of sociology and criminal justice at UD; Linda Nozick, professor and director of civil and environmental engineering at Cornell University; Brian Colle, professor and division head of atmospheric sciences at Stony Brook University; and Meghan Millea, professor of economics at East Carolina University.

Davidson explained that addressing the desire for economic prosperity typically hasn’t been considered when examining resilience to past hazards.

“We’ve framed the problem narrowly and said people shouldn’t build in these places, they should just be smart … but minimizing risk is never a community’s only goal,” said Davidson. “The question we’re asking now is whether there are ways to facilitate the growth that communities want in a way that’s smart enough that we’re not creating dangerous situations down the road.”

Reframing the problem of hurricane risk

There is much to consider. Over 128 million people in the United States live along the coast, according to the National Oceanic and Atmospheric Administration’s Office for Coastal Management.

Researchers involved in the work will advance methods to model long-term hurricane hazards in a way that accounts for climate change and incorporates multiple hazards, such as wind, rain, storm surge and waves. They will develop a framework to design and evaluate different policy interventions for achieving sustainable equity, prosperity and resilience.

The project is about looking at these interactions holistically and reframing the problem of hurricane risk as part of a community’s normal activities and development to gain greater insight on possible solutions.

“If you only focus on the problem from shortly before the hurricane winds start to a little bit after the winds stop and the cleanup, you could miss important information, such as how that community has grown over the last 50 years,” said Davidson.

For example, development in the Houston, Texas area over recent decades fueled great economic growth, including in Harris County, where the population grew by 31% during a 15-year period. The area economy grew, and population increases led to greater resources being available to the region for resilience measures. However, that development also increased flooding and exacerbated losses during Hurricane Harvey in 2017, because many of the newer neighborhoods were built in floodplains and former natural areas were by then covered with asphalt, leaving the water nowhere to go.

There are many ways the government, the insurance industry or other agencies can intervene to help communities achieve better hurricane risk management (think buyouts, grants, national flood insurance programs or post-event response and recovery investments). But it’s not always obvious what combination of options makes the most sense.

The research team will create computer models to evaluate how different people and agencies interact and how specific policies will play out in real life, to help communities and other agencies make better decisions.

“We’re looking for a win-win situation. We’re looking for policies where homeowners in general are better off, the insurance industry is profitable, and government agencies don’t get stuck with large, unplanned expenditures,” said Davidson.

Equity in focus

Woven throughout the project is the issue of equity. It’s been well documented that poorer people, minorities and others, such as those who are medically fragile, tend to experience worse impacts and have a harder time recovering from losses. Even mitigation, response and recovery processes designed to help can be inequitable. Davidson and her collaborators want to change this.

“One of the things we’re focusing on is renters and how they experience disasters differently,” she said.

For example, if a policy goal is to avoid as much dollar loss as possible, in practice that might mean investing mostly in the wealthiest properties because strengthening a single house could save a lot of money, Davidson said. Whereas, if equity is a goal, perhaps minimizing the percentage of loss each household experiences is better or, in the case of renters, making sure as few people as possible are displaced from their homes.

This ability to add equity into the equation is new, a result of a shift in thinking, Davidson continued.

“We went from thinking about what would be best for a community overall to realizing that in real life, each household, government agency and insurer is making decisions from their own perspective,” she said. “We started looking at each stakeholder as an individual and representing how they make their own decisions and interact with others. This sets us up to address equity because we’re already explicitly looking at different viewpoints. Now we can easily ask how the loss will be distributed across different households and communities and whether it is equitable.”

Overarching this work is climate change, a dynamic factor that may look different 30 years from now than it does today. For instance, there is growing evidence that hurricanes are causing more inland precipitation and damage as our climate is changing, but hurricanes often have been considered a worry only for people living on the coast. Considering inland effects of hurricanes will help expand this viewpoint. Additionally, the researchers plan to use computer vision and machine learning to automate the ability to create detailed descriptions of existing houses, so that it can be used to better estimate losses.

Interdisciplinary and cross-institution expertise

Jointly funded with the Established Program to Stimulate Competitive Research (EPSCoR), the work will include contributions from researchers at UD, Cornell University, Boston University, University of Florida, University of North Carolina at Chapel Hill, Stony Brook, University of Oklahoma, East Carolina University, Texas A&M and North Carolina State University. 

In forging the team, Davidson said they started with a core group of researchers who had collaborated previously, then added expertise from UD and externally. In addition to Davidson and Joe Trainor, DRC core faculty and professor in the Biden School of Public Policy and Administration, a 2019 and 2020 disaster science cluster hire at UD added A.R. Siders, assistant professor in the Biden School and geography, Shangjia Dong, assistant professor of civil and environmental engineering, and DeYoung to DRC’s core faculty, all of whom bring critical perspectives to the project.

“That six UD faculty from four colleges are on the grant, three from the cluster hire, really highlights the successes that come from supporting interdisciplinary work at UD, for the students, the science, and for making an impact in communities,” said Tricia Wachtendorf, DRC co-director and sociology professor, who is also on the project.

The project will provide opportunities for postdocs to undergraduate students to participate in research-based mentoring and quick response fieldwork training, including summer fellows from the Bill Anderson Fund and the McNair Scholars, national organizations that support students from underrepresented groups.

The group is noticeably diverse, in terms of gender, race, age, discipline and geography—an effort Davidson called deliberate, to ensure the team’s values are reflected in “the way we work, the kind of work we do, and the students we bring on the project.”

A partnership with SimCenter, an NSF-funded center, will help ensure the tools created continue beyond the grant-funding period, while connections with organizations, such as FEMA, will help transfer the team’s results to practice and DRC IT! modules will help engage researchers and the public.

As the UNC-Chapel Hill participant in this large multi-institutional NSF Hub, the Renaissance Computing Institute (RENCI) will contribute to two main project thrusts: 1) characterization of coastal hazards and risks, and 2) management of knowledge used and generated by the Hub.  Computer modeling of coastal hazards and risk levels will provide core data inputs to the Hub’s physical structure losses and economic estimation models, leveraging RENCI’s long-standing expertise in applications of the ADCIRC storm surge model and in statistical modeling of hurricane impacts. The Hub will generate substantial amounts of data, information, and new knowledge that need to be shared across the Hub thrust areas, as well as to external groups and teams. RENCI’s expertise in data and knowledge management will be essential to the success of the CHEER Hub awardees working as an interdisciplinary team, in efforts to develop the broader impacts envisaged by the Hub, and in assessing project progress towards its goals.

“RENCI is very excited to continue our research collaboration with UD and expand our work and expertise into new areas of research with potentially very important outcomes,” said investigator Brian Blanton, Director of Earth Data Sciences (EDS) at RENCI. “Coastal North Carolina’s experiences with hurricane disasters will certainly help drive some of the research, with the potential to improve our living in the coastal zone with better understanding of how policies and science can better interact.”

NSF FABRIC Project Completes Phase 1, Enabling Early Testing of Unprecedented Large-scale Network Experiments

All Phase 1 sites and connections have been successfully installed to create the basis for the international FABRIC infrastructure

The NSF-funded FABRIC project has made steady progress establishing the groundbreaking network testbed infrastructure to reimagine the way large amounts of data are generated, stored, analyzed, and transmitted across the world. The team recently announced the completion of Phase 1 of the project, marking the successful installation of all Phase 1 sites after overcoming supply chain delays and other challenges due to COVID-19. With the required hardware, software, storage, and fiber optic connections in place, the FABRIC system is available for early users to build and test novel large-scale experiments. 

FABRIC aims to support a wide variety of cyberinfrastructure research activities aimed at reimagining what the future internet may do for distributed protocols, systems, cybersecurity, and science applications. Today, affordable advanced computational and storage technologies are far more accessible and pervasive than when the internet was first built, and FABRIC capitalizes on these technological advances to build an infrastructure where the new internet can be reimagined and tried at scale.

“FABRIC is based on the idea that the ‘intelligence’ of a network–storage and computational programmability–does not have to be limited to the edges, but rather, data storage and processing can be integrated into the network, something that the internet doesn’t support today,” said FABRIC principal investigator (PI) Ilya Baldin, Director of Network & Research Infrastructure at RENCI. Baldin further elaborated that incorporating data storage and processing into the infrastructure allows users unprecedented freedom to design new types of experimental networks with different properties and test for improvements over current networks against unique scientific workloads. 

The FABRIC infrastructure includes the development sites at the Renaissance Computing Institute (RENCI)/UNC-Chapel Hill, University of Kentucky (UK), and Lawrence Berkeley National Laboratory (Berkeley Lab) and the production sites at Clemson University, University of California San Diego (UCSD), Florida International University (FIU), University of Maryland/Mid-Atlantic Crossroad (MAX), University of Utah, University of Michigan, University of Massachusetts Amherst/Massachusetts Green High Performance Computing Center (MGHPCC), Great Plains Network (GPN), National Center for Supercomputing Applications (NCSA) at the University of Illinois Urbana-Champaign (UIUC), and Texas Advanced Computing Center (TACC). The FABRIC testbed is built on Energy Sciences Network’s (ESnet’s) network and fiber infrastructure, with production sites on its backbone in Washington, DC; Starlight; Salt Lake City; and Dallas. 

“The team has had a challenging job coordinating the construction of FABRIC over ESnet’s fiber network while the network itself was being upgraded simultaneously to ESnet6. The teamwork within the project and collaborations with the research and education network community have been very helpful in completing this phase,” said FABRIC co-PI Inder Monga, Executive Director of ESnet. 

FABRIC has over 200 users on the system testing the feasibility of new infrastructure and performing other experiments at the production sites. With the completion of Phase 1, the FABRIC team has made significant progress toward improving and enhancing the user experience–FABRIC now has operational user services, including graphical and programmatic user interfaces for accessing the system, user feedback processes, monthly tutorials to guide new users through registration, and an interactive user forum to encourage community support and engagement. Additionally, the FABRIC team has developed a measurement framework for the purpose of measuring facility operational parameters on a very fine level, and users play an active role in providing feedback on the features of the framework that they find useful. 

“This approach of engaging users throughout the development of FABRIC enables continuous optimization of the system, and significant improvements have been made over the past year alone thanks to user feedback,” said FABRIC co-PI Anita Nikolich, Director of Research and Technology Innovation and Research Scientist at the UIUC School of Information Sciences. 

Further emphasizing the significance of early users and other key contributors, FABRIC co-PI Kuang-Ching (KC) Wang, Professor and C. Tycho Howle Endowed Chair of Collaborative Computing at Clemson University, stated, “FABRIC’s true value resides in the scientific experiments it enables. Through engagement with the many early FABRIC users, we have gained valuable understanding of what the researchers need to be productive in developing and running experiments with confident and repeatable results. Our FABRIC student interns have also been highly instrumental in partnering with scientists from different disciplines and creating a wide range of templates that are ready to help early users now.”

Additionally, the FABRIC infrastructure can now support much richer experiments than what was possible just a year ago. With the installation of new sites and the completion of control software, experiments are more complex, robust, and realistic, allowing users to tap into more resources at more locations. 

“FABRIC is providing us an opportunity to explore ways to integrate AI-driven security algorithms into the lowest levels of the network infrastructure,” said FABRIC user Phil Porras, a Program Director, SRI Fellow, and leader of SRI’s Internet Security Group in the Computer Science Laboratory at SRI International. “We envision future networks with the intelligence to combat malicious traffic within the packet switching hardware itself, and FABRIC has been extremely useful in accelerating this research.”

Key principles of FABRIC’s design include flexibility, scalability, and expandability. In Phase 2, the FABRIC team plans to incorporate additional sites across the country and develop high-speed connectivity between them, allowing for increasingly richer experiments. Additionally, the team is building the functionality for hybrid operation to allow users to scale their experiments beyond the testbed and connect their experiments to the real world. While previous testbeds allowed for either isolated ‘sandbox’ experiments or observational real-world experiments on the internet, FABRIC will provide the ability for both; in addition, FABRIC will help bridge the experiments from a sandbox environment to real-world experiments on the Internet, allowing researchers to test their ideas in a controlled environment and then see how they play out in the real world. As more and more scientists want to work with real-time streaming data, FABRIC will become even more important by providing a place where experiments with scalable real-time in-network processing and filtering of data can be undertaken. This will pave the way to build future production networks friendly to scientific data needs, accelerating discovery and innovation in many disciplines.

“Connecting FABRIC to national research facilities, testbeds, cloud providers, and the current internet will enable a unique environment for experimentation with real-world users and data,” said FABRIC co-PI Jim Griffioen, Professor of Computer Science and Director of the Laboratory for Advanced Networking at UK. “By interconnecting existing facilities and infrastructure, FABRIC will encourage developers to imagine completely new types of services that can be deployed in support of real-world user communities.”

FABRIC is expected to be fully operational and open to researchers in October 2023. 

FABRIC is supported in part by a Mid-Scale RI-1 NSF award under Grant No. 1935966, and the core team consists of researchers from the Renaissance Computing Institute (RENCI) at UNC-Chapel Hill, University of Illinois-Urbana Champaign (UIUC), University of Kentucky (UK), Clemson University, Energy Sciences Network (ESnet) at Lawrence Berkeley National Laboratory (Berkeley Lab), and Virnao LLC. 

EduHeLx: A Cloud-based Programming Platform for Data Science Education

The EduHeLx pilot experiment informed future thinking about incorporating cloud-based technologies in UNC-CH courses, including courses in the new UNC-CH School of Data Science & Society (SDSS)

EduHeLx is an education-focused instance of HeLx, a scalable cloud-based computing platform developed by researchers at the Renaissance Computing Institute (RENCI), a data science research institute at UNC-Chapel Hill. HeLx offers a suite of tools, capabilities, and workspaces enabling research communities to deploy custom data science workspaces securely in the cloud. 

EduHeLx was developed to address the needs of courses with programming components and currently supports programming using Python and R. Previously, students were required to download a course’s programming software onto their own computers, and instructors had to work one-on-one with students to troubleshoot issues throughout the semester; this was so time-consuming that it took away from teaching time and derailed course schedules, especially in computer science courses with 250+ students. With EduHeLx, infrastructure setup is not required by instructors or students–students can access a course’s programming software in the cloud without the need to download it, thus saving a significant amount of class time. 

Read more

The Biomedical Data Translator Consortium Provides Progress Updates in Latest Companion Publications

The Translator Consortium details new features, functionality, and applications of the Translator system and its underlying data model, the Biolink Model

The Biomedical Data Translator (Translator) Consortium has announced recent progress in two companion publications–“Progress toward a universal biomedical data translator” and “Biolink Model: A universal schema for knowledge graphs in clinical, biomedical, and translational science”–in Clinical and Translational Science.

The Translator system is a knowledge graph-based platform built for combining, searching, and ‘reasoning’ over biomedical data to derive knowledge and accelerate clinical discovery. The Translator project, funded by the National Center for Advancing Translational Sciences (NCATS), addresses challenges presented by the exponential growth of diverse, siloed, and non-standardized data sets. In the Translator system, rich data from many heterogeneous sources are brought together in one place in a standardized format, allowing users to pose novel scientific inquiries and accelerating innovative translational research in ways not previously possible. The ability to search across different data types and knowledge sources is a result of the Translator Consortium’s adherence to common ontologies and standards, including the Biolink Model.

In “Progress toward a universal biomedical data translator,” the authors detail the system’s updated architecture and capabilities developed since the Consortium’s 2018 companion publications, “Toward a universal biomedical data translator” and “The Biomedical Data Translator Program: Conception, culture, and community.” The most notable update is the launch of a fully unified and harmonized system, whereas Translator previously consisted of disconnected knowledge graphs and tools. This achievement was accomplished largely through the adoption and implementation of standards and references across teams for the integration of new knowledge sources.

“What really sets Translator apart is the amount of data being integrated,” said Anne Thessen, PhD, semantic engineer at the University of Colorado Anschutz Medical Campus. “Bringing together data that already exists might seem easy, but all the work of deciphering meaning and interpreting the data on top of getting hundreds of collaborators working together is one of the most challenging projects I’ve worked on.”

Karamarie Fecho, RENCI collaborator and biomedical consultant at Copperline Professional Solutions added, “Translator is unique both sociologically and technically. Sociologically, the program has fostered an atmosphere of true collaboration, where Translator team members are eager to engage in discussion and share knowledge and resources, as well as to collectively troubleshoot Translator tools and services, regardless of who ‘owns’ them. In terms of technology, Translator is open source, and Translator team members have developed and implemented novel approaches for openly exposing patient data in a manner that is not only regulatory compliant, but completely devoid of regulatory hurdles from the end user perspective.”

In the companion publication, “Biolink Model: A universal schema for knowledge graphs in clinical, biomedical, and translational science,” the authors describe the Biolink Model and its role in the Translator project and other initiatives. Biolink Model provides a universal, open source data model intended to standardize ontologies, naming conventions for nodes/entities in knowledge graphs, and the relationships between entities. Additionally, it maps comparable elements between ontologies, allowing disparate data sets to be compared and searched across. 

“It is inspiring to see experts from a wide variety of domains communicate and collaborate on a shared model,” said Sierra Moxon, data architect and software developer at Lawrence Berkeley National Laboratory (LBNL). “Biolink Model establishes a common language to communicate with, and that’s the first step to solving hard problems together.”

“One of the main needs of Translator was a common dialect for organizing, representing, and exchanging knowledge between knowledge providers, subject matter experts, and machines,” said Deepak Unni, a former Software Developer at LBNL. “Biolink Model addresses this need by providing a harmonized data model that tackles challenges with knowledge representation and provides a foundation upon which intelligent applications can be built.”  

About the NCATS Biomedical Data Translator Program

The NCATS Biomedical Data Translator Program was launched in October 2016, with funding from the National Center for Advancing Translational Sciences, a center within the National Institutes of Health. This work is supported by the National Center for Advancing Translational Sciences of the National Institutes of Health under grant numbers: Other Transaction Awards OT2TR003434, OT2TR003436, OT2TR003428, OT2TR003448, OT2TR003427, OT2TR003430, OT2TR003433, OT2TR003450, OT2TR003437, OT2TR003443, OT2TR003441, OT2TR003449, OT2TR003445, OT2TR003422, OT2TR003435, OT3TR002026, OT3TR002020, OT3TR002025, OT3TR002019, OT3TR002027, OT2TR002517, OT2TR002514, OT2TR002515, OT2TR002584, OT2TR002520; and contract number 75N95021P00636. Additional funding was provided by the Intramural Research Program at NCATS (ZIA TR000276-05). For a complete list of Translator teams and collaborators, visit https://ncats.nih.gov/translator/projects. Any opinions expressed in this press release are those of the Translator community writ large and do not necessarily reflect the views of NCATS, individual Translator team members, or affiliated organizations and institutions.