Spectra Logic and iRODS Consortium Partner to Provide Glacier-Like Tier of Storage for Data-Driven Organizations

Spectra Logic, a global leader in data management and data storage solutions, today announced a collaboration with the iRODS Consortium to create a joint solution built upon Spectra Vail® software, Spectra BlackPearl® S3 storage and the iRODS data management platform. The combined solution enables customers to use industry-standard cloud interfaces for on-premises disk and on-premises glacier* storage with object tape, while unlocking multi-site/multi-cloud capabilities.

The iRODS integration with BlackPearl S3 allows organizations to leverage the performance and cost benefits of on-premises glacier storage, whether backed by disk or tape, to access “cold” data and automate workflows, while the integration with Vail provides access to cloud services across multiple clouds. Spectra Vail software and BlackPearl S3 storage have been tested with the iRODS S3 storage resource plugin to fully support the Amazon® S3 abstraction that iRODS delivers. The new functionality is available as part of the iRODS 4.2.11 release.
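As a rough sketch of how this fits together (the resource name, endpoint hostname, bucket, and credentials path below are illustrative placeholders, and the exact context-string options depend on the plugin version), an iRODS administrator could register an S3-compatible endpoint such as BlackPearl S3 as a storage resource:

```shell
# Register an S3-backed iRODS storage resource via the S3 resource plugin.
# All names, hostnames, and paths here are examples, not a tested deployment.
iadmin mkresc bp_s3 s3 "$(hostname):/my-bucket/irods/Vault" \
    "S3_DEFAULT_HOSTNAME=blackpearl.example.org;S3_AUTH_FILE=/var/lib/irods/s3_auth;S3_PROTO=HTTPS;S3_REGIONNAME=us-east-1;HOST_MODE=cacheless_attached"
```

Data objects placed into such a resource would land in the bucket through the S3 interface, while iRODS policies and rules continue to manage them as usual.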

“Organizations that need an on-prem glacier tier will see many benefits with the interoperability between BlackPearl S3 and the iRODS data management platform,” said David Feller, Spectra Logic vice president of product management and solutions engineering. “Organizations will be able to take full advantage of on-prem storage and the public, private and hybrid cloud by leveraging the Vail and iRODS integration.”

“The combined Spectra Logic and iRODS solution will enable organizations that rely heavily on tape to archive petabytes of valuable digital data economically and efficiently in a glacier-like tier,” said Terrell Russell, executive director of the iRODS Consortium. “We look forward to a lasting collaboration with Spectra Logic that will help our mutual customers drive innovation and accelerate business results.”

*Amazon Glacier is a registered trademark of Amazon Technologies, Inc.

# # #

About the iRODS Consortium

The iRODS Consortium is a membership-based organization that guides development and support of iRODS as free open-source software for data discovery, workflow automation, secure collaboration, and data virtualization. The iRODS Consortium provides a production-ready iRODS distribution and iRODS professional integration services, training, and support. The consortium is administered by founding member RENCI, a research institute for applications of cyberinfrastructure located at the University of North Carolina at Chapel Hill, USA.

About Spectra Logic
Spectra Logic develops a full range of Attack Hardened™ data management and data storage solutions for a multi-cloud world. Dedicated solely to data storage innovation for more than 40 years, Spectra Logic helps organizations modernize their IT infrastructures and protect and preserve their data with a broad portfolio of solutions. These solutions enable organizations to manage, migrate, store and preserve business data long-term, with ransomware-resilient features, whether on-premises, in a single cloud, across multiple clouds, or in all locations at once. To learn more, visit www.spectralogic.com.

RENCI named as partner in NSF award to study coastal community resilience

RENCI’s expertise in hurricane modeling will be a key asset in the University of Delaware’s $16.5 million NSF award to study interplay between coastal resilience, equity, and economic prosperity

Adapted with permission from University of Delaware.

It’s been five years since Hurricane Harvey brought catastrophic rain, extensive flooding and more than $125 billion in damages to Texas in August 2017.

Other hurricanes and hazards have occurred since, with increased frequency and intensity. Yet coastal communities nationwide continue to grapple with disaster resilience.

How to cope with these hazards is complicated. The issues are multi-faceted: people who live along the coast have homes, jobs, and families, while businesses have infrastructure and employees, all of which contribute to the region’s economic prosperity. When disasters occur, not everyone is affected equally. And, as climate change worsens, the effects of hazards like hurricanes are increasingly felt farther inland.

Now, the Disaster Research Center at the University of Delaware has been awarded $16.5 million from the National Science Foundation to lead a multi-institutional effort exploring the tension and tradeoffs between a community’s goals of managing hurricane risk while also achieving equity and economic prosperity.

The UD-led hub—Coastal Hazards, Equity, Economic prosperity and Resilience (CHEER)—is one of five NSF-funded projects announced recently as part of the agency’s Coastlines and People program, which is infusing $51 million in research funding to protect the natural, social and economic resources of U.S. coasts, and to help create more resilient coastal communities.

The work will require intense input from public policy, sociology, meteorology, engineering and other disciplines.

“The Disaster Research Center at UD has a long and successful track record of interdisciplinary research, analysis and problem-solving focused on some of society’s most complex challenges, so it is fitting that they will lead this latest effort,” said UD President Dennis Assanis. “Through collaboration with institutions nationwide, the CHEER hub will help make coastal communities more resilient in the face of growing threats from climate change.”

The five-year project will be led by Rachel Davidson, a core DRC faculty member and UD professor of civil and environmental engineering. Co-principal investigators include Sarah DeYoung, core DRC faculty member and associate professor of sociology and criminal justice at UD; Linda Nozick, professor and director of civil and environmental engineering at Cornell University; Brian Colle, professor and division head of atmospheric sciences at Stony Brook University; and Meghan Millea, professor of economics at East Carolina University.

Davidson explained that a community’s desire for economic prosperity typically hasn’t been considered when examining resilience to past hazards.

“We’ve framed the problem narrowly and said people shouldn’t build in these places, they should just be smart … but minimizing risk is never a community’s only goal,” said Davidson. “The question we’re asking now is whether there are ways to facilitate the growth that communities want in a way that’s smart enough that we’re not creating dangerous situations down the road.”

Reframing the problem of hurricane risk

There is much to consider. Over 128 million people in the United States live along the coast, according to the National Oceanic and Atmospheric Administration’s Office for Coastal Management.

Researchers involved in the work will advance methods to model long-term hurricane hazards in a way that accounts for climate change and incorporates multiple hazards, such as wind, rain, storm surge and waves. They will develop a framework to design and evaluate different policy interventions for achieving sustainable equity, prosperity and resilience.

The project is about looking at these interactions holistically and reframing the problem of hurricane risk as part of a community’s normal activities and development to gain greater insight on possible solutions.

“If you only focus on the problem from shortly before the hurricane winds start to a little bit after the winds stop and the cleanup, you could miss important information, such as how that community has grown over the last 50 years,” said Davidson.

For example, development in the Houston, Texas area over recent decades fueled great economic growth, including in Harris County, where the population grew by 31% during a 15-year period. The area economy grew, and population increases led to greater resources being available to the region for resilience measures. However, that development also increased flooding and exacerbated losses during Hurricane Harvey in 2017, because many of the newer neighborhoods were built in floodplains and former natural areas were by then covered with asphalt, leaving the water nowhere to go.

There are many ways the government, the insurance industry or other agencies can intervene to help communities achieve better hurricane risk management (think buyouts, grants, national flood insurance programs or post-event response and recovery investments). But it’s not always obvious what combination of options makes the most sense.

The research team will create computer models to evaluate how different people and agencies interact and how specific policies will play out in real life, to help communities and other agencies make better decisions.

“We’re looking for a win-win situation. We’re looking for policies where homeowners in general are better off, the insurance industry is profitable, and government agencies don’t get stuck with large, unplanned expenditures,” said Davidson.

Equity in focus

Woven throughout the project is the issue of equity. It’s been well documented that poorer people, minorities and others, such as those who are medically fragile, tend to experience worse impacts and have a harder time recovering from losses. Even mitigation, response and recovery processes designed to help can be inequitable. Davidson and her collaborators want to change this.

“One of the things we’re focusing on is renters and how they experience disasters differently,” she said.

For example, if a policy goal is to avoid as much dollar loss as possible, in practice that might mean investing mostly in the wealthiest properties, because strengthening a single house could save a lot of money, Davidson said. If equity is a goal, by contrast, perhaps minimizing the percentage of loss each household experiences is better or, in the case of renters, making sure as few people as possible are displaced from their homes.

This ability to add equity into the equation is new, a result of a shift in thinking, Davidson continued.

“We went from thinking about what would be best for a community overall to realizing that in real life, each household, government agency and insurer is making decisions from their own perspective,” she said. “We started looking at each stakeholder as an individual and representing how they make their own decisions and interact with others. This sets us up to address equity because we’re already explicitly looking at different viewpoints. Now we can easily ask how the loss will be distributed across different households and communities and whether it is equitable.”

Overarching this work is climate change, a dynamic factor that may look different 30 years from now than it does today. For instance, there is growing evidence that hurricanes are causing more inland precipitation and damage as the climate changes, but hurricanes often have been considered a worry only for people living on the coast. Considering inland effects of hurricanes will help expand this viewpoint. Additionally, the researchers plan to use computer vision and machine learning to automate the creation of detailed descriptions of existing houses, so that this information can be used to better estimate losses.

Interdisciplinary and cross-institution expertise

Jointly funded with the Established Program to Stimulate Competitive Research (EPSCoR), the work will include contributions from researchers at UD, Cornell University, Boston University, University of Florida, University of North Carolina at Chapel Hill, Stony Brook, University of Oklahoma, East Carolina University, Texas A&M and North Carolina State University. 

In forging the team, Davidson said they started with a core group of researchers who had collaborated previously, then added expertise from UD and externally. In addition to Davidson and Joe Trainor, DRC core faculty member and professor in the Biden School of Public Policy and Administration, a 2019–2020 disaster science cluster hire at UD added A.R. Siders, assistant professor in the Biden School and in geography; Shangjia Dong, assistant professor of civil and environmental engineering; and DeYoung to DRC’s core faculty, all of whom bring critical perspectives to the project.

“That six UD faculty from four colleges are on the grant, three from the cluster hire, really highlights the successes that come from supporting interdisciplinary work at UD, for the students, the science, and for making an impact in communities,” said Tricia Wachtendorf, DRC co-director and sociology professor, who is also on the project.

The project will provide opportunities for participants ranging from postdocs to undergraduate students to take part in research-based mentoring and quick-response fieldwork training, including summer fellows from the Bill Anderson Fund and the McNair Scholars program, national organizations that support students from underrepresented groups.

The group is noticeably diverse, in terms of gender, race, age, discipline and geography—an effort Davidson called deliberate, to ensure the team’s values are reflected in “the way we work, the kind of work we do, and the students we bring on the project.”

A partnership with SimCenter, an NSF-funded center, will help ensure the tools created continue beyond the grant-funding period; connections with organizations such as FEMA will help transfer the team’s results to practice; and DRC IT! modules will help engage researchers and the public.

As the UNC-Chapel Hill participant in this large multi-institutional NSF Hub, the Renaissance Computing Institute (RENCI) will contribute to two main project thrusts: 1) characterization of coastal hazards and risks, and 2) management of knowledge used and generated by the Hub. Computer modeling of coastal hazards and risk levels will provide core data inputs to the Hub’s physical structure losses and economic estimation models, leveraging RENCI’s long-standing expertise in applications of the ADCIRC storm surge model and in statistical modeling of hurricane impacts. The Hub will generate substantial amounts of data, information, and new knowledge that need to be shared across the Hub thrust areas, as well as with external groups and teams. RENCI’s expertise in data and knowledge management will be essential to the success of the CHEER Hub awardees working as an interdisciplinary team, in efforts to develop the broader impacts envisaged by the Hub, and in assessing project progress toward its goals.

“RENCI is very excited to continue our research collaboration with UD and expand our work and expertise into new areas of research with potentially very important outcomes,” said investigator Brian Blanton, Director of Earth Data Sciences (EDS) at RENCI. “Coastal North Carolina’s experiences with hurricane disasters will certainly help drive some of the research, with the potential to improve our living in the coastal zone with better understanding of how policies and science can better interact.”

NSF FABRIC Project Completes Phase 1, Enabling Early Testing of Unprecedented Large-scale Network Experiments

All Phase 1 sites and connections have been successfully installed to create the basis for the international FABRIC infrastructure

The NSF-funded FABRIC project has made steady progress establishing the groundbreaking network testbed infrastructure to reimagine the way large amounts of data are generated, stored, analyzed, and transmitted across the world. The team recently announced the completion of Phase 1 of the project, marking the successful installation of all Phase 1 sites after overcoming supply chain delays and other challenges due to COVID-19. With the required hardware, software, storage, and fiber optic connections in place, the FABRIC system is available for early users to build and test novel large-scale experiments. 

FABRIC aims to support a wide variety of cyberinfrastructure research activities that reimagine what the future internet may do for distributed protocols, systems, cybersecurity, and science applications. Today, affordable advanced computational and storage technologies are far more accessible and pervasive than when the internet was first built, and FABRIC capitalizes on these technological advances to build an infrastructure where the new internet can be reimagined and tried at scale.

“FABRIC is based on the idea that the ‘intelligence’ of a network (storage and computational programmability) does not have to be limited to the edges, but rather, data storage and processing can be integrated into the network, something that the internet doesn’t support today,” said FABRIC principal investigator (PI) Ilya Baldin, Director of Network & Research Infrastructure at RENCI. Baldin further elaborated that incorporating data storage and processing into the infrastructure allows users unprecedented freedom to design new types of experimental networks with different properties and to test for improvements over current networks against unique scientific workloads.

The FABRIC infrastructure includes the development sites at the Renaissance Computing Institute (RENCI)/UNC-Chapel Hill, University of Kentucky (UK), and Lawrence Berkeley National Laboratory (Berkeley Lab) and the production sites at Clemson University, University of California San Diego (UCSD), Florida International University (FIU), University of Maryland/Mid-Atlantic Crossroad (MAX), University of Utah, University of Michigan, University of Massachusetts Amherst/Massachusetts Green High Performance Computing Center (MGHPCC), Great Plains Network (GPN), National Center for Supercomputing Applications (NCSA) at the University of Illinois Urbana-Champaign (UIUC), and Texas Advanced Computing Center (TACC). The FABRIC testbed is built on Energy Sciences Network’s (ESnet’s) network and fiber infrastructure, with production sites on its backbone in Washington, DC; Starlight; Salt Lake City; and Dallas. 

“The team has had a challenging job coordinating the construction of FABRIC over ESnet’s fiber network while the network itself was being upgraded simultaneously to ESnet6. The teamwork within the project and collaborations with the research and education network community have been very helpful in completing this phase,” said FABRIC co-PI Inder Monga, Executive Director of ESnet. 

FABRIC has over 200 users on the system testing the feasibility of new infrastructure and performing other experiments at the production sites. With the completion of Phase 1, the FABRIC team has made significant progress toward improving and enhancing the user experience: FABRIC now has operational user services, including graphical and programmatic user interfaces for accessing the system, user feedback processes, monthly tutorials to guide new users through registration, and an interactive user forum to encourage community support and engagement. Additionally, the FABRIC team has developed a measurement framework that captures facility operational parameters at a fine granularity, and users play an active role in providing feedback on the features of the framework they find useful.

“This approach of engaging users throughout the development of FABRIC enables continuous optimization of the system, and significant improvements have been made over the past year alone thanks to user feedback,” said FABRIC co-PI Anita Nikolich, Director of Research and Technology Innovation and Research Scientist at the UIUC School of Information Sciences. 

Further emphasizing the significance of early users and other key contributors, FABRIC co-PI Kuang-Ching (KC) Wang, Professor and C. Tycho Howle Endowed Chair of Collaborative Computing at Clemson University, stated, “FABRIC’s true value resides in the scientific experiments it enables. Through engagement with the many early FABRIC users, we have gained valuable understanding of what the researchers need to be productive in developing and running experiments with confident and repeatable results. Our FABRIC student interns have also been highly instrumental in partnering with scientists from different disciplines and creating a wide range of templates that are ready to help early users now.”

Additionally, the FABRIC infrastructure can now support much richer experiments than what was possible just a year ago. With the installation of new sites and the completion of control software, experiments are more complex, robust, and realistic, allowing users to tap into more resources at more locations. 

“FABRIC is providing us an opportunity to explore ways to integrate AI-driven security algorithms into the lowest levels of the network infrastructure,” said FABRIC user Phil Porras, a Program Director, SRI Fellow, and leader of SRI’s Internet Security Group in the Computer Science Laboratory at SRI International. “We envision future networks with the intelligence to combat malicious traffic within the packet switching hardware itself, and FABRIC has been extremely useful in accelerating this research.”

Key principles of FABRIC’s design include flexibility, scalability, and expandability. In Phase 2, the FABRIC team plans to incorporate additional sites across the country and develop high-speed connectivity between them, allowing for increasingly rich experiments. Additionally, the team is building functionality for hybrid operation so users can scale their experiments beyond the testbed and connect them to the real world. While previous testbeds allowed for either isolated ‘sandbox’ experiments or observational real-world experiments on the internet, FABRIC will support both, bridging experiments from a sandbox environment to the Internet so that researchers can test their ideas in a controlled environment and then see how they play out in the real world. As more scientists work with real-time streaming data, FABRIC will become even more important by providing a place for experiments with scalable, real-time, in-network processing and filtering of data. This will pave the way to build future production networks friendly to scientific data needs, accelerating discovery and innovation in many disciplines.

“Connecting FABRIC to national research facilities, testbeds, cloud providers, and the current internet will enable a unique environment for experimentation with real-world users and data,” said FABRIC co-PI Jim Griffioen, Professor of Computer Science and Director of the Laboratory for Advanced Networking at UK. “By interconnecting existing facilities and infrastructure, FABRIC will encourage developers to imagine completely new types of services that can be deployed in support of real-world user communities.”

FABRIC is expected to be fully operational and open to researchers in October 2023. 

FABRIC is supported in part by a Mid-Scale RI-1 NSF award under Grant No. 1935966, and the core team consists of researchers from the Renaissance Computing Institute (RENCI) at UNC-Chapel Hill, University of Illinois Urbana-Champaign (UIUC), University of Kentucky (UK), Clemson University, Energy Sciences Network (ESnet) at Lawrence Berkeley National Laboratory (Berkeley Lab), and Virnao LLC.

EduHeLx: A Cloud-based Programming Platform for Data Science Education

The EduHeLx pilot experiment informed future thinking about incorporating cloud-based technologies in UNC-CH courses, including courses in the new UNC-CH School of Data Science & Society (SDSS)

EduHeLx is an education-focused instance of HeLx, a scalable cloud-based computing platform developed by researchers at the Renaissance Computing Institute (RENCI), a data science research institute at UNC-Chapel Hill. HeLx offers a suite of tools, capabilities, and workspaces enabling research communities to deploy custom data science workspaces securely in the cloud. 

EduHeLx was developed to address the needs of courses with programming components and currently supports programming in Python and R. Previously, students had to download a course’s programming software onto their own computers, and instructors had to work one-on-one with students to troubleshoot issues throughout the semester; this was so time-consuming that it took away from teaching time and derailed course schedules, especially in computer science courses with 250+ students. With EduHeLx, neither instructors nor students need to set up infrastructure: students can access a course’s programming software in the cloud without downloading it, saving a significant amount of class time.

Emphasizing EduHeLx’s benefits, Ashok Krishnamurthy, Interim Director at RENCI and professor of Computer Science at UNC-Chapel Hill, stated, “We could concentrate on the instructional material for the course rather than spending time debugging installations on students’ laptops or other technology problems that unexpectedly crop up during the semester.” Additionally, EduHeLx allows instructors to distribute all course material through the platform and enable auto-grading for exams and assignments, another time-saving capability that was not previously possible.

As a pilot experiment, UNC Information & Technology Services (ITS) assisted RENCI in applying EduHeLx as the educational platform in the UNC-Chapel Hill Computer Science course, COMP 116: Introduction to Scientific Programming, in Fall 2021 (Stan Ahalt/Ashok Krishnamurthy) and Spring 2022 (John Majikes). ITS provided technical support to deploy EduHeLx on UNC’s Google Cloud and assisted with adding 250+ student accounts; further, ITS provided financial support for the cloud costs to deploy EduHeLx and helped ensure security of the platform. RENCI and ITS both learned a great deal from this experiment, which informed ITS’ future engagement with cloud-based learning solutions.

ITS, which manages the University’s Google Cloud Platform (GCP) environment, set up monitoring and essential guardrails to protect University data and advised RENCI on best practices for efficiently managing the resources, said Chuck Crews, Manager of ITS Cloud Operations Group, and John Godehn, ITS Systems Programmer/Specialist.  

“One of the compelling reasons to deploy in the cloud is that you only pay for what you’re using,” instead of paying for resources to sit idle, Crews said. Working in the cloud allows for resources to be deployed, and undeployed, as needed. 

Given the innovative capabilities EduHeLx enables for data science education, the newly launched UNC-Chapel Hill School of Data Science & Society (SDSS) is considering making extensive use of EduHeLx for a range of courses. Dr. Stan Ahalt, Inaugural Dean of the SDSS, reported that the School hopes to use the platform as a mechanism to provide data and computation to students very early in the program, both in existing courses cross-listed with other departments and in new courses developed by the SDSS. Further elaborating on the novel utility of EduHeLx, Ahalt stated, “The ability to stand up an educational platform and reliably provision the data and computation through a relatively simple process will enable us to engage new students seamlessly, as well as provide a tool that will grow with them as they progress in their coursework and research.” 

One of the main focuses of the SDSS is preparing students for an evolving workforce that increasingly demands data science literacy, which necessitates an interdisciplinary approach to integrating data science programming into a wide range of courses, including courses in the humanities and social sciences. Additionally, the SDSS places significant emphasis on the last ‘S’ in its name: society. By introducing data science and its applications to students with diverse disciplinary interests, the SDSS can better prepare them to effectively apply data science in their careers of choice and maximize their impact on society. With its unique, accessible, and adaptable capabilities, EduHeLx has the potential to serve as a key resource in transforming the SDSS’ vision into reality.

The Biomedical Data Translator Consortium Provides Progress Updates in Latest Companion Publications

The Translator Consortium details new features, functionality, and applications of the Translator system and its underlying data model, the Biolink Model

The Biomedical Data Translator (Translator) Consortium has announced recent progress in two companion publications in Clinical and Translational Science: “Progress toward a universal biomedical data translator” and “Biolink Model: A universal schema for knowledge graphs in clinical, biomedical, and translational science.”

The Translator system is a knowledge graph-based platform built for combining, searching, and ‘reasoning’ over biomedical data to derive knowledge and accelerate clinical discovery. The Translator project, funded by the National Center for Advancing Translational Sciences (NCATS), addresses challenges presented by the exponential growth of diverse, siloed, and non-standardized data sets. In the Translator system, rich data from many heterogeneous sources are brought together in one place in a standardized format, allowing users to pose novel scientific inquiries and accelerating innovative translational research in ways not previously possible. The ability to search across different data types and knowledge sources is a result of the Translator Consortium’s adherence to common ontologies and standards, including the Biolink Model.

In “Progress toward a universal biomedical data translator,” the authors detail the system’s updated architecture and capabilities developed since the Consortium’s 2018 companion publications, “Toward a universal biomedical data translator” and “The Biomedical Data Translator Program: Conception, culture, and community.” The most notable update is the launch of a fully unified and harmonized system, whereas Translator previously consisted of disconnected knowledge graphs and tools. This achievement was accomplished largely through the adoption and implementation of standards and references across teams for the integration of new knowledge sources.

“What really sets Translator apart is the amount of data being integrated,” said Anne Thessen, PhD, semantic engineer at the University of Colorado Anschutz Medical Campus. “Bringing together data that already exists might seem easy, but all the work of deciphering meaning and interpreting the data on top of getting hundreds of collaborators working together is one of the most challenging projects I’ve worked on.”

Karamarie Fecho, RENCI collaborator and biomedical consultant at Copperline Professional Solutions added, “Translator is unique both sociologically and technically. Sociologically, the program has fostered an atmosphere of true collaboration, where Translator team members are eager to engage in discussion and share knowledge and resources, as well as to collectively troubleshoot Translator tools and services, regardless of who ‘owns’ them. In terms of technology, Translator is open source, and Translator team members have developed and implemented novel approaches for openly exposing patient data in a manner that is not only regulatory compliant, but completely devoid of regulatory hurdles from the end user perspective.”

In the companion publication, “Biolink Model: A universal schema for knowledge graphs in clinical, biomedical, and translational science,” the authors describe the Biolink Model and its role in the Translator project and other initiatives. Biolink Model provides a universal, open-source data model intended to standardize ontologies, naming conventions for nodes/entities in knowledge graphs, and the relationships between entities. Additionally, it maps comparable elements between ontologies, allowing disparate data sets to be compared and searched together.
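To make the idea concrete, here is a toy sketch (with made-up placeholder identifiers and names, not real Translator data or the actual Translator API) of how Biolink-style categories and predicates let edges from heterogeneous sources be queried with one shared vocabulary:

```python
# Toy Biolink-style knowledge-graph edges. Identifiers such as "CHEBI:0001"
# and names like "drug_a" are illustrative placeholders. Every node carries a
# Biolink category and every edge a Biolink predicate, so different sources
# can be searched with the same terms.
edges = [
    {
        "subject": {"id": "CHEBI:0001", "category": "biolink:SmallMolecule", "name": "drug_a"},
        "predicate": "biolink:treats",
        "object": {"id": "MONDO:0001", "category": "biolink:Disease", "name": "disease_x"},
    },
    {
        "subject": {"id": "HGNC:0002", "category": "biolink:Gene", "name": "gene_b"},
        "predicate": "biolink:gene_associated_with_condition",
        "object": {"id": "MONDO:0001", "category": "biolink:Disease", "name": "disease_x"},
    },
]

def neighbors(disease_id, predicate):
    """Return names of subjects linked to a disease by the given predicate."""
    return [e["subject"]["name"] for e in edges
            if e["predicate"] == predicate and e["object"]["id"] == disease_id]

print(neighbors("MONDO:0001", "biolink:treats"))  # -> ['drug_a']
```

Because every edge uses the same schema, the same lookup works whether the underlying knowledge came from a chemistry database or a gene–disease association resource.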

“It is inspiring to see experts from a wide variety of domains communicate and collaborate on a shared model,” said Sierra Moxon, data architect and software developer at Lawrence Berkeley National Laboratory (LBNL). “Biolink Model establishes a common language to communicate with, and that’s the first step to solving hard problems together.”

“One of the main needs of Translator was a common dialect for organizing, representing, and exchanging knowledge between knowledge providers, subject matter experts, and machines,” said Deepak Unni, a former Software Developer at LBNL. “Biolink Model addresses this need by providing a harmonized data model that tackles challenges with knowledge representation and provides a foundation upon which intelligent applications can be built.”  

About the NCATS Biomedical Data Translator Program

The NCATS Biomedical Data Translator Program was launched in October 2016, with funding from the National Center for Advancing Translational Sciences, a center within the National Institutes of Health. This work is supported by the National Center for Advancing Translational Sciences of the National Institutes of Health under grant numbers: Other Transaction Awards OT2TR003434, OT2TR003436, OT2TR003428, OT2TR003448, OT2TR003427, OT2TR003430, OT2TR003433, OT2TR003450, OT2TR003437, OT2TR003443, OT2TR003441, OT2TR003449, OT2TR003445, OT2TR003422, OT2TR003435, OT3TR002026, OT3TR002020, OT3TR002025, OT3TR002019, OT3TR002027, OT2TR002517, OT2TR002514, OT2TR002515, OT2TR002584, OT2TR002520; and contract number 75N95021P00636. Additional funding was provided by the Intramural Research Program at NCATS (ZIA TR000276-05). For a complete list of Translator teams and collaborators, visit https://ncats.nih.gov/translator/projects. Any opinions expressed in this press release are those of the Translator community writ large and do not necessarily reflect the views of NCATS, individual Translator team members, or affiliated organizations and institutions.

What to expect at the iRODS 2022 User Group Meeting

The worldwide iRODS community will gather in Leuven, Belgium, from July 5–8

Members of the iRODS user community will meet at KU Leuven in Belgium for the 14th Annual iRODS User Group Meeting to participate in four days of learning, sharing use cases, and discussing new capabilities that have been added to iRODS in the last year.

The event, sponsored by KU Leuven, RENCI, Vlaams Supercomputer Centrum, and Fujifilm, will provide in-person and virtual options for attendance. An audience of over 100 participants representing dozens of academic, government, and commercial institutions is expected to join.

“We are excited to meet in-person for the first time in three years to learn about the global impact of iRODS in fields such as life sciences, healthcare, cybernetics, and more,” said Terrell Russell, executive director of the iRODS Consortium. “In addition to hearing talks from our user community, the 2022 iRODS User Group Meeting will provide users the chance to network and collaborate throughout the week.”

In June, the iRODS Consortium and RENCI announced the release of iRODS 4.3.0. Along with support for two additional operating systems, the release includes a notable new feature: Delay Server Migration. The iRODS Delay Server can now be safely moved from one iRODS server to another without requiring a restart, giving administrators flexibility when the system is under continuous load.

Another new feature is programmable authentication workflows. In the past, iRODS supported authentication methods such as native authentication, GSI, Kerberos, and OpenID, with each new method implemented as shared libraries that had to be installed on both the client and the server, often requiring patches to existing client libraries. The iRODS Consortium, in collaboration with SURF, has implemented an authentication plugin for iRODS 4.3.0, “pam_interactive,” that enables fully fledged PAM (Pluggable Authentication Module) authentication flows.
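Fully fledged PAM flows mean the server can drive whatever authentication stack an administrator configures, including multi-step exchanges. As a purely illustrative sketch (the service file name and module choices below are hypothetical, not taken from iRODS documentation), a multi-step stack might look like a standard PAM service file:

```
# /etc/pam.d/irods -- hypothetical stack, for illustration only
auth    required   pam_unix.so                  # first factor: system password
auth    required   pam_google_authenticator.so  # second factor: one-time code
account required   pam_unix.so
```

With a plugin that can relay arbitrary PAM prompts to the client, a user can walk through both challenges interactively rather than being limited to a single password exchange.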

During last year’s UGM, users learned about the Python iRODS client 1.0.0 and the S3 Resource plugin. Version 1.1.4 of the Python iRODS client is now available and includes fixes for the XML protocol, connection reuse, the anonymous user, ticket enhancements, and compatibility with iRODS talking directly to S3. The iRODS S3 Resource Plugin has been extended to honor the Glacier semantics of an S3 storage system, including reacting appropriately to responses that indicate the requested data will be available later.
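The Glacier behavior described above boils down to a restore-then-poll pattern: a read of archived data first requests a restore, then retries until the object becomes readable. The sketch below illustrates that pattern only; `GlacierStub` and its responses are hypothetical stand-ins for an S3 storage system, not the plugin’s actual code.

```python
import time

class RestoreInProgress(Exception):
    """Raised when requested data is archived and not yet restored."""

class GlacierStub:
    """Hypothetical stand-in for an S3 system with a Glacier-like tier."""
    def __init__(self, polls_until_ready=2):
        self._polls_left = polls_until_ready

    def restore_object(self, key):
        # A real system would begin moving data back to a readable tier here.
        pass

    def get_object(self, key):
        # Signal "available later" until the restore completes.
        if self._polls_left > 0:
            self._polls_left -= 1
            raise RestoreInProgress(key)
        return b"archived bytes for " + key.encode()

def read_with_restore(store, key, poll_interval=0.01, max_polls=10):
    """Request a restore, then poll until the object is readable."""
    store.restore_object(key)
    for _ in range(max_polls):
        try:
            return store.get_object(key)
        except RestoreInProgress:
            time.sleep(poll_interval)  # data will be available later
    raise TimeoutError(f"{key} not restored after {max_polls} polls")

data = read_with_restore(GlacierStub(), "home/alice/results.dat")
```

The key design point is that a Glacier-aware client treats “not yet available” as an expected, retryable condition rather than an error, which is what “reacting appropriately” means in practice.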

As always with the annual UGM, in addition to general software updates, users will offer presentations about their organizations’ deployments of iRODS. This year’s meeting will feature over 20 talks from users around the world. Among the use cases and deployments to be featured are:

  • Data Management Environment at the National Cancer Institute. Frederick National Laboratory for Cancer Research. An efficient and cost-effective mechanism is required to store and manage the large heterogeneous datasets generated by high-throughput technologies such as Next Generation Sequencing, Cryo-Electron Microscopy, and High Content Imaging. Tier 1 storage is expensive, and Tier 2 devices used standalone do not lend themselves well to discovering and disseminating datasets. The Data Management Environment (DME), a data management platform for storing, sharing, and managing high-value scientific datasets, was developed at the National Cancer Institute to close this gap. DME addresses the long-term data management needs of research labs and cores at NCI per the FAIR (Findable, Accessible, Interoperable, and Reusable) guiding principles for data management. It supports S3-compatible object stores as well as file system-based storage. DME uses iRODS as the metadata management layer, enabling virtualization of backend storage, replacement of storage providers with zero impact on users, and transparent migration of data across providers. The granular permissions scheme provided by iRODS, coupled with DME’s authentication and authorization mechanism, enables researchers to share data with collaborators securely. This talk will give an overview of the capabilities and architecture of the Data Management Environment and discuss how DME has leveraged iRODS to deliver enhanced data management and storage management capabilities.
  • iRODS speaks SFTP: More ways to securely transfer your data. CyVerse / University of Arizona. Compliance and data encryption during transfer are strict requirements for many science domains working with confidential data. Recognizing this unmet need for secure, encrypted transfers for CyVerse users, the CyVerse team decided to implement Secure File Transfer Protocol (SFTP) access to iRODS. This approach complements the secure data transfer and authentication methods currently provided in iRODS via SSL and PAM authentication, which can be challenging to integrate into existing services or research workflows for multiple reasons: they may require changes on the iRODS server, firewall reconfiguration, and training users through complex client-side installations of iCommands. In this talk, the team introduces their work on adding iRODS as a backend storage option for SFTPGo, utilizing the Go iRODS library developed at CyVerse.
  • From SRB to iRODS: 20 years of data management at the petabyte scale. CC-IN2P3. CC-IN2P3, a data center hosting services such as computing and data storage for international projects, mainly in the fields of subatomic physics and astrophysics, has been using SRB and then iRODS in a wide variety of projects and use cases for the last 20 years. Data management has always been a key activity for a data center such as CC-IN2P3, due to the ever-growing size of the projects and their international dimension. This talk will trace the evolution of data management needs, the pitfalls, and the endless migration cycle (both hardware and software) over the years. It will also cover ongoing prospects, especially long-term data preservation needs and open science.
  • MrData: An iRODS Based Human Research Data Management System. Max Planck Institute for Biological Cybernetics. MrData is an iRODS-based archival system for research medical imaging data, built initially to automate the collection and archival of data flowing from a Siemens 9.4 Tesla MRI system. Of particular importance to this project was managing metadata related to human subject recruiting in a GDPR-compliant manner. The team chose Castellum, a Max Planck-developed system specifically for managing human subject data securely, and worked with that team to integrate it with the MrData system. An additional requirement was “mixed use” metadata: information necessary for both subject recruiting and scientific processing. Mixed use metadata, such as handedness, is managed by Castellum but made available by MrData for scientific and archival purposes securely and without manual intervention. The Max Planck team will present an overview of this project, including current production status and future directions.

Bookending this year’s UGM are two in-person events for those who hope to learn more about iRODS. On July 5, the Consortium is offering beginner and advanced training sessions. After the conference, on July 8, users have the chance to register for a troubleshooting session, devoted to providing one-on-one help with an existing or planned iRODS installation or integration.

Registration will remain open until the beginning of the event. Learn more about this year’s UGM at irods.org/ugm2022.

About the iRODS Consortium

The iRODS Consortium is a membership organization that supports the development of the integrated Rule-Oriented Data System (iRODS), free open source software for data virtualization, data discovery, workflow automation, and secure collaboration. The iRODS Consortium provides a production-ready iRODS distribution and iRODS training, professional integration services, and support. The world’s top researchers in life sciences, geosciences, and information management use iRODS to control their data. Learn more at irods.org.
The iRODS Consortium is administered by founding member RENCI, a research institute for applications of cyberinfrastructure at the University of North Carolina at Chapel Hill. For more information about RENCI, visit renci.org.

RENCI Leadership Assigned to Lead Implementation of the UNC School of Data Science & Society

RENCI Director, Stan Ahalt, and Chief Operating Officer, Jay Aikat, will take on secondary appointments as Inaugural Dean and Senior Associate Dean, respectively

RENCI leadership, in coordination with UNC-Chapel Hill leadership, recently announced that Director Stan Ahalt and Chief Operating Officer Jay Aikat will take on leadership roles as secondary appointments for the launch of the School of Data Science and Society (SDSS), planned for fall 2022. Ahalt will serve as the School’s Inaugural Dean, and Aikat will serve as Senior Associate Dean.

Ahalt and Aikat have both been instrumental in spearheading data science efforts on campus for many years, each serving as members or leads on various committees that led to the creation of the SDSS. Additionally, many others at RENCI have supported the path to the School through work such as developing curriculum for and teaching the new Introduction to Data Science course; serving on the committee developing the Data Science minor; organizing and supporting the seven subcommittees for data science in 2019-2020 that led to the initial feasibility plan for the SDSS; and more.

The announcement was made during a RENCI ‘All Hands’ meeting, where Chancellor Kevin Guskiewicz, Provost Chris Clemens, and Senior Associate Vice Chancellor for Research Andy Johns joined to say a few words. 

After the initial announcement of Ahalt and Aikat’s roles in the School, Guskiewicz emphasized why Ahalt is the right person for the job. “Stan is a global leader who we believe is well poised to lead this new school into the future,” said Guskiewicz. “[His] passion for using a team approach in applying data science to society’s most pressing challenges is exactly what we need for the new school.” 

Clemens added to Guskiewicz’s comments stating, “Through his research, teaching, and leadership of RENCI, Stan has a proven track record of bringing together diverse groups of people and collaborating across disciplines for the greater good.” Clemens went on to explain that the School will support the development of multi-disciplinary and flexible research clusters to utilize the variety of expertise and research at Carolina to address timely problems, making Stan’s unique experience crucial for this leadership role. 

“Our established prominence in the natural sciences, humanities, and social sciences uniquely positions us to build the SDSS as a vessel and venue for interdisciplinary collaboration,” said Clemens, adding that the School will utilize innovative techniques and team science approaches to develop solutions that improve communities.

In addition to sharing a vision for the School, Guskiewicz and Ahalt both emphasized the plan to establish a mutually beneficial relationship between RENCI and SDSS to seek out opportunities for collaboration and shared innovation. Ahalt noted that, since the beginning, RENCI’s work has focused on solving the most challenging problems affecting our society. 

“RENCI has demonstrated significant and consistent success in identifying pressing societal problems and applying a unique array of skills and expertise to develop and implement solutions,” said Ahalt.

Clemens stated that the SDSS will build upon this work and use RENCI as the premier model for a team science approach that produces real changes for people in our communities. 

Ahalt and Johns shared details for the changes that will happen at RENCI during this period. Ashok Krishnamurthy, currently RENCI’s Deputy Director, will serve as Interim Director. Asia Mieczkowska, currently Deputy Chief Operations Officer, will assume the role of Interim Chief Operations Officer. Ahalt also noted the possibility for RENCI researchers to take on new and interesting roles during this period and as the School develops, with unique opportunities for growth and creativity. 

InfiniteTactics joins iRODS Consortium

Organizations team up to keep pace with expanding data demands

CHAPEL HILL, NC – The iRODS Consortium, the membership-based foundation that leads development and support of the integrated Rule-Oriented Data System (iRODS), welcomes its newest member, InfiniteTactics.

InfiniteTactics is a veteran-owned IT consulting firm that produces high-end technical solutions in large-scale data sciences support, autonomous system engineering and technical guidance, and custom software solutions. The company supports a diverse range of clients from small business start-ups to the Department of Defense.

The iRODS Consortium is a membership-based organization that guides development and support of iRODS as free open-source software for data discovery, workflow automation, secure collaboration, and data virtualization.

The capability to efficiently manage, use, parse, store, and access data is crucial to InfiniteTactics’ business model, but this becomes more challenging as customers require access to increasingly large stores of data on a daily basis. InfiniteTactics AI Software Engineer Kyle Healy said iRODS offers an opportunity for the company to overcome key file system limitations and maintain excellent performance at competitive prices to support clients’ expanding data demands.

“Data management is an extremely important part of our business and the work we do because data is the integral backbone of data sciences support we provide,” said Healy. “We hope iRODS will help us implement a more efficient underlying file system we can utilize to bring a better product to our customers and provide a more efficient file system for the large-scale data sciences open-source community.”

The iRODS Consortium provides a production-ready distribution and professional integration services, training, and support. The consortium is administered by founding member RENCI, a research institute for applications of cyberinfrastructure located at the University of North Carolina at Chapel Hill, USA.

“We are very excited to bring InfiniteTactics into the fold,” said Terrell Russell, executive director of the iRODS Consortium. “They are integrating a number of interesting new technologies and their belief in the open-source philosophy made for a quick partnership.”

“The RENCI team is well known for their large scale storage solutions,” said Healy. “Aligning ourselves with a strong partner in the storage solution industry was a great strategic move for us. Secondly, being able to provide our expertise back to the Consortium and improve the overall footprint of the large-scale data sciences open-source community through the Consortium made the decision easy.”

In addition to InfiniteTactics, current iRODS Consortium members include Agriculture Victoria, Bayer, Bibliothèque et Archives nationales du Québec, CINES, CUBI at Berlin Institute of Health, DataDirect Networks, Emagine IT, KU Leuven, Maastricht University, Minnesota Supercomputing Institute at the University of Minnesota, the National Institute of Environmental Health Sciences, NetApp, Omnibond, OpenIO, RENCI, SoftIron, the SURF cooperative, Texas Advanced Computing Center, University College London, University of Colorado, Boulder, University of Groningen, Utrecht University, Wellcome Sanger Institute, Western Digital, and four organizations that wish to remain anonymous.

To learn more about iRODS and the iRODS Consortium, please visit irods.org.

To learn more about InfiniteTactics, please visit https://infinitetactics.com.

Data Matters short-course series is back for August 2022

Annual data science series returns via Zoom 

Now in its ninth year, Data Matters 2022, a week-long series of one- and two-day courses aimed at students and professionals in business, research, and government, will take place August 8–12 virtually via Zoom. The short course series is sponsored by the Odum Institute for Research in Social Science at UNC-Chapel Hill, the National Consortium for Data Science, and RENCI.

In recent years, employers’ expectations for a data-literate workforce have grown significantly. According to a 2021 Harvard Business Review report, while 90% of business leaders cite data literacy as key to company success, only 25% of workers feel confident in their data skills. Data Matters helps bridge this gap by giving attendees the chance to learn about a wide range of topics in data science, analytics, visualization, curation, and more from expert instructors.

“With the increase of data science tools being used in sectors such as business, research and government, it is essential that workers seek out educational opportunities that empower them to address new challenges in their field,” said Shannon McKeen, executive director of the National Consortium for Data Science. “Our short-course series has twelve courses that can be tailored to achieve individual data science goals, whether registrants are looking to refresh their knowledge or trying to learn something new in a welcoming, understanding environment.”

Data Matters instructors are experts in their fields from NC State University, UNC-Chapel Hill, Duke University, Cisco, and RENCI. Topics to be covered this year include information visualization, deep learning in Python, exploratory data analysis, statistical machine learning and programming in R, and more. Among the classes available are:

  • Introduction to Programming in R, Jonathan Duggins. Statistical programming is an integral part of many data-intensive careers, and data literacy and programming skills have become a necessary component of employment in many industries. This course begins with foundational concepts for new programmers—both general and statistical—and explores programming topics useful for any job that utilizes data.
  • Overview of AI and Deep Learning, Ashok Krishnamurthy. Many key advances in AI are due to advances in machine learning, especially deep learning. Natural language processing, computer vision, speech translation, biomedical imaging, and robotics are some of the areas that have benefited from deep learning methods. This course is designed to provide an overview of AI, and in particular, deep learning. Topics include the history of neural networks, how advances in data collection and computing have caused a revival in neural networks, different types of deep learning networks and their applications, and tools and software available to design and deploy deep networks.
  • Introduction to Statistical Machine Learning in R, Yufeng Liu. Statistical machine learning and data mining is an interdisciplinary research area closely related to statistics, computer science, engineering, and bioinformatics. Many statistical machine learning and data mining techniques and algorithms are useful in a variety of scientific areas. This two-day short course will provide an overview of statistical machine learning and data mining techniques, with applications to the analysis of real data.
  • Geospatial Analytics Using Python, Laura Tateosian. This course will focus on how to explore, analyze, and visualize geospatial data. Using Python and ArcGIS Pro, students will inspect and manipulate geospatial data, use powerful GIS tools to analyze spatial relationships, link tabular data with spatial data, and map data. In these activities, participants will use Python and the arcpy library to invoke key GIS tools for spatial analysis and mapping.

Data Matters offers reduced pricing for faculty, students, and staff from academic institutions and for professionals with nonprofit organizations. Head to the Data Matters website to register and to see detailed course descriptions, course schedules, instructor bios, and logistical information. 

Registration is now open at datamatters.org. The deadline for registration is August 3 for Monday/Tuesday courses, August 4 for Wednesday courses, and August 7 for Thursday/Friday courses.


About the National Consortium for Data Science (NCDS)

The National Consortium for Data Science (NCDS) is a collaboration of leaders in academia, industry, and government formed to address the data challenges and opportunities of the 21st century. The NCDS helps members take advantage of data in ways that result in new jobs and transformative discoveries. The organization connects diverse communities of data science experts to support a 21st century data-driven economy by building data science career pathways and creating a data-literate workforce, bridging the gap between data scientists in the public and private sectors, and supporting open and democratized data. Learn more at datascienceconsortium.org/.

The NCDS is administered by founding member RENCI, a research institute for data science and applications of cyberinfrastructure at the University of North Carolina at Chapel Hill. For more information about RENCI, visit renci.org.

RENCI and RTI International expand strategic partnership

CHAPEL HILL, N.C. — RTI International (RTI), a nonprofit research institute, has recently established a strategic partnership with UNC-Chapel Hill’s Renaissance Computing Institute (RENCI) to build upon the success of existing collaborations and jointly seek out new research opportunities in areas such as data modernization, data science, and team science solutions. RTI and RENCI have closely collaborated on multiple large-scale team science projects over the years, including NCATS Data Translator, NHLBI BioData Catalyst, and the NIH HEAL Initiative® Data Stewardship Group.

RTI and RENCI have both committed themselves to collaboratively secure funding in the aforementioned areas by:

  • Creating a joint identity for pursuit of opportunities;
  • Identifying and pursuing additional opportunities to expand on existing work;
  • Cooperating in the exchange of information and networking relevant to potential collaborations; and
  • Collaborating on business processes to streamline and simplify joint business development and project delivery.

“This partnership will allow RTI and RENCI to take full advantage of our well-established collaborative relationship, with a focus on strategically aligning ourselves and our expertise to create a unified identity to pursue future funding,” said Becky Boyles, Founding Director of the Center for Data Modernization Solutions at RTI. “It has become increasingly clear how well our organizations work together and complement each other, and we are looking forward to seeing further success with this partnership.”

Stan Ahalt, Director of RENCI, added, “RTI and RENCI have a synergistic relationship that has only strengthened over the years, and this feels like the right time to use this momentum to intentionally coordinate our efforts and make the biggest impact possible in the field of data science. We have shown time and again that our team science approach produces real results, and we know that our combined impact is greater than what we could achieve individually.”

The partnership will serve to enhance and streamline collaborations between the two organizations by creating standard processes, procedures, and marketing materials to emphasize their collective strengths. Karen Davis, Vice President of RTI’s Research Computing Division (RCD), noted, “RTI and RENCI have a long history of collaboration, and this MOU serves as a formal agreement between the organizations to continue expanding upon this groundwork while also signaling to other organizations the high value we place on team science and encouraging them to do the same.”

Ashok Krishnamurthy, Deputy Director of RENCI, further emphasized RENCI and RTI’s combined potential, stating, “This is a very exciting partnership, and we look forward to innovating together by applying data science to solving biological, environmental, and biomedical problems.”

RTI and RENCI are excited to establish this partnership to combine their individual strengths and resources and expand their collective scientific impact. As evidenced by the success of existing collaborations, this partnership will further facilitate the advancement of team science and scientific discovery in NC and beyond.

About RTI

RTI International is an independent, nonprofit research institute dedicated to improving the human condition. RTI’s vision is to address the world’s most critical problems with science-based solutions in pursuit of a better future.

About RENCI

The Renaissance Computing Institute (RENCI) is a research institute at UNC-Chapel Hill launched in 2004 that serves as a living laboratory fostering data science expertise, advancing software development tools and techniques, developing effective cross-disciplinary and cross-sector engagement strategies, and establishing sustainable business models for software and services.