iRODS and Fujifilm partner to provide an archive solution

FUJIFILM Recording Media U.S.A., Inc. and the iRODS Consortium today announce a collaboration and integration, creating a joint solution built upon FUJIFILM Object Archive software and the iRODS data management platform. This joint solution leverages the benefits of a tape storage tier for infrequently accessed “cold” data, providing an automated archiving workflow for research, commercial, and governmental organizations that require storing large – and in most cases, rapidly growing – amounts of data.

With this solution, FUJIFILM Object Archive becomes a deep-tier archive storage target while iRODS provides a data management platform for users who produce massive amounts of research and analytics data.

FUJIFILM Object Archive software has been tested with the iRODS S3 plugin and fully supports the AMAZON S3 abstraction that iRODS provides. In addition to regular AMAZON S3 compatibility, Fujifilm and the iRODS Consortium worked together to add functionality comparable to AMAZON GLACIER to the iRODS S3 Resource Plugin.
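For readers unfamiliar with GLACIER-style semantics, the following minimal Python sketch models the general pattern: an archived object cannot be read until a restore has been requested and completed. It does not call the iRODS S3 Resource Plugin or FUJIFILM Object Archive software. The class and method names (ArchivedObject, request_restore, complete_restore) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    ONLINE = "online"        # readable immediately
    ARCHIVED = "archived"    # on tape; must be restored before reading
    RESTORING = "restoring"  # restore requested, not yet complete


class NotRestoredError(Exception):
    """Raised when a read is attempted on an object that is not online."""


@dataclass
class ArchivedObject:
    """Hypothetical stand-in for an object held in a deep-archive tier."""
    key: str
    data: bytes
    tier: Tier = Tier.ARCHIVED

    def request_restore(self) -> None:
        # In a real archive this is asynchronous: it only starts the recall.
        if self.tier is Tier.ARCHIVED:
            self.tier = Tier.RESTORING

    def complete_restore(self) -> None:
        # Stand-in for the archive finishing the recall from tape.
        if self.tier is Tier.RESTORING:
            self.tier = Tier.ONLINE

    def read(self) -> bytes:
        # Reads succeed only once the object is back in the online tier.
        if self.tier is not Tier.ONLINE:
            raise NotRestoredError(f"{self.key} is {self.tier.value}")
        return self.data
```

In this sketch a client would call request_restore(), wait for the recall to finish, and only then call read(); a read attempted earlier raises NotRestoredError.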

This new functionality will be available as part of the upcoming iRODS 4.2.11 release.

Moving appropriate data to tape provides the benefits of air-gap security and scalability with lower data center operating costs and less electricity consumption when compared to other storage solutions. Additionally, FUJIFILM Object Archive software supports the new, higher-capacity LTO-9 tape technology, making the solution potentially even more efficient, economical, and scalable.

“We are very excited to be working with Fujifilm on the AMAZON GLACIER features,” said Terrell Russell, interim executive director of the iRODS Consortium. “Together, we are building a long-term relationship that will be good for our users, and for both organizations.”

“The new interoperability between Fujifilm’s Object Archive software and the iRODS data management platform will greatly benefit organizations who use both products, and potentially create new use cases as well,” said Tom Nakatani, vice president of sales & marketing at FUJIFILM Recording Media U.S.A., Inc. “We are pleased to successfully implement this joint solution for the benefit of our collaborators and users.”

Fujifilm is the world’s leading data tape manufacturer (based on market share). Its FUJIFILM Object Archive software allows objects to be seamlessly written to and read from data tape media with Fujifilm’s OTFormat. Using the industry-standard AMAZON S3-compatible API, Object Archive software offers the same operability as cloud storage and easy long-term retention of data similar to AMAZON GLACIER. By using FUJIFILM Object Archive software to optimize existing storage, organizations can eliminate egress fees, offload cold data to tape, maintain chain of custody, realize low ongoing storage costs, and help protect against cyber threats by providing a physical air-gap to data.

About the iRODS Consortium

The iRODS Consortium is a membership-based organization that guides development and support of iRODS as free open-source software for data discovery, workflow automation, secure collaboration, and data virtualization. The iRODS Consortium provides a production-ready iRODS distribution and iRODS professional integration services, training, and support. The consortium is administered by founding member RENCI, a research institute for applications of cyberinfrastructure located at the University of North Carolina at Chapel Hill, USA.

About Fujifilm

FUJIFILM Recording Media U.S.A., Inc. is FUJIFILM Corporation’s U.S.-based manufacturing, marketing and sales operation for data tape media and data management solutions. The company provides data center customers and enterprise industry partners with a wide range of innovative recording media products and archival solutions. Based on a history of thin-film engineering and magnetic particle science such as Fujifilm’s NANOCUBIC™ and Barium Ferrite technology, Fujifilm creates breakthrough data storage products. Worldwide, Fujifilm and its affiliates have surpassed the 170 million milestone for the number of LTO ULTRIUM data cartridges manufactured and sold since introduction, establishing the company as the leading global manufacturer of mid-range and enterprise data tape.

For more information on FUJIFILM Recording Media products, call 800-488-3854 or go to https://www.fujifilm.com/us/en/business/data-storage. For more information about FUJIFILM Object Archive software, visit http://fujifilmobjectarchive.com.

FUJIFILM Holdings Corporation, Tokyo, Japan, brings cutting edge solutions to a broad range of global industries by leveraging its depth of knowledge and fundamental technologies developed in its relentless pursuit of innovation. Its proprietary core technologies contribute to the various fields including healthcare, highly functional materials, document solutions and imaging products. These products and services are based on its extensive portfolio of chemical, mechanical, optical, electronic and imaging technologies. For the year ended March 31, 2021, the company had global revenues of $21 billion, at an exchange rate of 106 yen to the dollar. The Fujifilm global family of companies is committed to responsible environmental stewardship and good corporate citizenship. For more information, please visit: www.fujifilmholdings.com

FUJIFILM, OBJECT ARCHIVE, and NANOCUBIC are the trademarks and registered trademarks of FUJIFILM Corporation and its affiliates.

AMAZON, AMAZON GLACIER and AMAZON S3 are trademarks of Amazon.com, Inc. or its affiliates in the United States and/or other countries.

LTO and ULTRIUM are registered trademarks of Hewlett Packard Enterprise, IBM and Quantum in the United States and/or other countries.

© 2021 FUJIFILM Recording Media U.S.A. Inc. All Rights Reserved

RENCI named as partner in NSF institute to establish new field of imageomics

Imageomics Institute will advance computational methods for studying Earth’s biodiversity

RENCI has been named as a partner on an ambitious new effort to use images of living organisms as the basis for understanding biological processes of life on Earth. The project, to be led by faculty from The Ohio State University’s Translational Data Analytics Institute, has been awarded a $15 million grant from the National Science Foundation as part of NSF’s Harnessing the Data Revolution initiative.

The new entity, which will be called the Imageomics Institute, aims to establish imageomics as a new field of study that has the potential to transform biomedical, agricultural and basic biological sciences. Similar to genomics before it, which applied computation to the study of the human genome, imageomics will leverage computer science to help scientists extract meaning from an otherwise unwieldy amount of natural image data.

“There are many more species out there than scientists have been able to study in-depth,” said Jim Balhoff, a Senior Research Scientist at RENCI who will lead the RENCI component of the project. “If we can leverage machine learning to interpret images of living organisms, that would provide a scalable way to process large amounts of information about species, complementing the work of trained wildlife biologists.”

The Institute’s scientists will apply machine learning techniques to large collections of digital images from museums, labs and other institutions, as well as photos taken by scientists in the field, camera traps, drones and even members of the public who have uploaded their images to platforms such as eBird, iNaturalist and Wildbook. By training algorithms to extract biologically meaningful information from these images, researchers aim to generate new knowledge about organisms and species, including insights about how they evolve and interact within ecosystems.

Critical to this effort is the ability to categorize features of living organisms with standardized vocabularies, known as bio-ontologies, that can be “understood” by computers. Having served as a key contributor on the Phenoscape team for several previous NSF-funded projects, Balhoff is steeped in the art of encoding biological information in computable ways.

“There’s a lot of work going on with machine learning, and one of the key pieces of this project is to develop ways to incorporate ontology-based knowledge into machine learning processes,” said Balhoff. “We’re providing expertise in bio-ontologies to incorporate what we know about anatomical relationships into this image analysis system.”

This approach could ultimately enable a computer to identify key features in an image, such as an eye, mouth or dorsal fin, and then use automated reasoning to check that the interpretation makes anatomical sense. Repeating this process for large collections of images can give scientists a powerful platform for investigating new or previously understudied species or help them better understand the relationships between organisms.
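As a rough illustration of that kind of consistency check, the Python sketch below compares features reported by an image model against a tiny set of part-of relations. The PART_OF table and the function name are hypothetical and are not drawn from any real bio-ontology or from the Institute's software; an actual system would rely on a full ontology and a reasoner.

```python
# Hypothetical part-of relations: feature -> body regions it may belong to.
PART_OF = {
    "eye": {"head"},
    "mouth": {"head"},
    "dorsal fin": {"trunk"},
}


def implausible_detections(detections):
    """Return detections whose region contradicts the PART_OF relations.

    `detections` is a list of (feature, region) pairs, such as an image
    model might produce. Features missing from PART_OF are flagged as well,
    because nothing is known that would make them plausible.
    """
    return [
        (feature, region)
        for feature, region in detections
        if region not in PART_OF.get(feature, set())
    ]


detections = [("eye", "head"), ("dorsal fin", "head"), ("whisker", "head")]
print(implausible_detections(detections))
# prints [('dorsal fin', 'head'), ('whisker', 'head')]
```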

As an inaugural institute for data-intensive discovery in science and engineering within NSF’s Harnessing the Data Revolution initiative, the Imageomics Institute will be part of a broader effort to form a national collaborative research network dedicated to computation-enabled discovery.

In addition to The Ohio State University and RENCI, the project will involve biologists and computer scientists from Tulane University, Virginia Tech, Duke University, and Rensselaer Polytechnic Institute; senior personnel from Ohio State, Virginia Tech and six additional institutions; and collaborators from more than 30 universities and organizations around the world.

RENCI to join researchers in a collaboration to increase reliability and efficiency of DOE scientific workflows by leveraging artificial intelligence and machine learning methods

Poseidon will use AI/ML-based techniques to simulate, model, and optimize scientific workflow performance on large, distributed DOE computing infrastructures.

The Department of Energy’s (DOE) advanced Computational and Data Infrastructures (CDIs) – such as supercomputers, edge systems at experimental facilities, massive data storage, and high-speed networks – are brought to bear to solve the nation’s most pressing scientific problems, including assisting in astrophysics research, delivering new materials, designing new drugs, creating more efficient engines and turbines, and making more accurate and timely weather forecasts and climate change predictions.

Increasingly, computational science campaigns are leveraging distributed, heterogeneous scientific infrastructures that span multiple locations connected by high-performance networks, resulting in scientific data being pulled from instruments to computing, storage, and visualization facilities.

This image shows the terrain height – an important factor in weather modeling – across almost all of North America with spatial resolution of 4km. Poseidon tools will help improve workflows and lead to even more efficient weather forecasts through reliable and efficient execution of weather models.

Credit: Jiali Wang, Argonne National Laboratory

However, since these federated services infrastructures tend to be complex and managed by different organizations, domains, and communities, both the operators of the infrastructures and the scientists that use them have limited global visibility, which results in an incomplete understanding of the behavior of the entire set of resources that science workflows span. 

“Although scientific workflow systems like Pegasus increase scientists’ productivity to a great extent by managing and orchestrating computational campaigns, the intricate nature of the CDIs, including resource heterogeneity and the deployment of complex system software stacks, pose several challenges in predicting the behavior of the science workflows and in steering them past system and application anomalies,” said Ewa Deelman, research professor of computer science and research director at the University of Southern California’s Information Sciences Institute and lead principal investigator (PI). “Our new project, Poseidon, will provide an integrated platform consisting of algorithms, methods, tools, and services that will help DOE facility operators and scientists to address these challenges and improve the overall end-to-end science workflow.”

Under a new DOE grant, Poseidon aims to advance the knowledge of how simulation and machine learning (ML) methodologies can be harnessed and amplified to improve the DOE’s computational and data science.

Research institutions collaborating on Poseidon include the University of Southern California, the Argonne National Laboratory, the Lawrence Berkeley National Laboratory, and the Renaissance Computing Institute (RENCI) at the University of North Carolina at Chapel Hill.

Poseidon will add three important capabilities to current scientific workflow systems — (1) predicting the performance of complex workflows; (2) detecting and classifying infrastructure and workflow anomalies and “explaining” the sources of these anomalies; and (3) suggesting performance optimizations. To accomplish these tasks, Poseidon will explore the use of novel simulation, ML, and hybrid methods to predict, understand, and optimize the behavior of complex DOE science workflows on DOE CDIs. 

Poseidon will explore hybrid solutions where data collected from DOE and NSF testbeds, as well as from an ML simulator, will be strategically fed into an ML training system.

High-performance computing systems, such as the planned Aurora system at the Argonne Leadership Computing Facility, are integral pieces of DOE CDIs.

Credit: Argonne National Laboratory

“In addition to creating a more efficient timeline for researchers, we would like to provide CDI operators with the tools to detect, pinpoint, and efficiently address anomalies as they occur in the complex DOE facilities landscape,” said Anirban Mandal, Poseidon co-PI, assistant director for network research and infrastructure at RENCI, University of North Carolina at Chapel Hill. “To detect anomalies, Poseidon will explore real-time ML models that sense and classify anomalies by leveraging underlying spatial and temporal correlations and expert knowledge, combine heterogeneous information sources, and generate real-time predictions.”

RENCI will play a pivotal role in the Poseidon project. RENCI researchers Cong Wang and Komal Thareja will lead project efforts in data acquisition from the DOE CDI and NSF testbeds (FABRIC and Chameleon Cloud) and emulation of distributed facility models, enabling ML model training and validation on the testbeds and DOE CDI. Additionally, Poseidon co-PI Anirban Mandal will lead the project portion on performance guidance for optimizing workflows.

Successful Poseidon solutions will be incorporated into a prototype system with a dashboard that will be used for evaluation by DOE scientists and CDI operators. Poseidon will enable scientists working on the frontier of DOE science to efficiently and reliably run complex workflows on a broad spectrum of DOE resources and accelerate time to discovery.

Furthermore, Poseidon will develop ML methods that can self-learn corrective behaviors and optimize workflow performance, with a focus on explainability in its optimization methods. 

Working together, the researchers behind Poseidon will break down the barriers between complex CDIs, accelerate the scientific discovery timeline, and transform the way that computational and data science are done.

Please visit the project website for more information.

Data Matters short-course series returns in August 2021

Annual short-course series aims to bridge the data literacy gap

Now in its eighth year, Data Matters 2021, a week-long series of one- and two-day courses aimed at students and professionals in business, research, and government, will take place August 9 – 13 virtually via Zoom. The short-course series is sponsored by the Odum Institute for Research in Social Science at UNC-Chapel Hill, the National Consortium for Data Science, and RENCI.

Although the need for data literacy has grown exponentially for employers over the last few years, many academic institutions are struggling to keep up. According to a 2021 report from Forrester, 81% of recruiters rated data skills and data literacy as important capabilities for candidates, while only 48% of academic planners reported that their institution currently has specific data skills initiatives set up. Data Matters helps bridge this gap by providing attendees the chance to learn about a wide range of topics in data science, analytics, visualization, curation, and more from expert instructors.

“As our society becomes more data-driven, we’ve seen a greater need for workers in environments such as industry, health, and law to have a basic understanding of data science techniques and applications,” said Shannon McKeen, executive director of the National Consortium for Data Science. “The Data Matters short-course series allows us to meet the high demand for data science education and to provide pathways for both recent graduates and current professionals to bridge the data literacy gap and enrich their knowledge.”

Data Matters instructors are experts in their fields from NC State University, UNC-Chapel Hill, Duke University, Cisco, Blue Cross NC, and RENCI. Topics to be covered this year include information visualization, data curation, data mining and machine learning, programming in R, systems dynamics and agent-based modeling, and more. Among the classes available are:

  • Introduction to Programming in R, Jonathan Duggins. Statistical programming is an integral part of many data-intensive careers and data literacy, and programming skills have become a necessary component of employment in many industries. This course begins with necessary concepts for new programmers—both general and statistical—and explores some necessary programming topics for any job that utilizes data. 
  • Text Analysis Using R, Alison Blaine. This course explains how to clean and analyze textual data using R, including both raw and structured texts. It will cover multiple hands-on approaches to getting data into R and applying analytical methods to it, with a focus on techniques from the fields of text mining and Natural Language Processing.
  • Using Linked Data, Jim Balhoff. Linked data technologies provide the means to create flexible, dynamic knowledge graphs using open standards. This course offers an introduction to linked data and the semantic web tools underlying its use. 
  • R for Automating Workflow & Sharing Work, Justin Post. The course provides participants an introduction to utilizing R for writing reproducible reports and presentations that easily embed R output, using online repositories and version control software for collaboration, creation of basic websites using R, and the development of interactive dashboards and web applets. 

Data Matters offers reduced pricing for faculty, students, and staff from academic institutions and for professionals with nonprofit organizations. Head to the Data Matters website to register and to see detailed course descriptions, course schedules, instructor bios, and logistical information. 

Registration is now open at datamatters.org. The deadline for registration is August 5 for Monday/Tuesday courses, August 7 for Wednesday courses, and August 8 for Thursday/Friday courses.


About the National Consortium for Data Science (NCDS)

The National Consortium for Data Science (NCDS) is a collaboration of leaders in academia, industry, and government formed to address the data challenges and opportunities of the 21st century. The NCDS helps members take advantage of data in ways that result in new jobs and transformative discoveries. The organization connects diverse communities of data science experts to support a 21st century data-driven economy by building data science career pathways and creating a data-literate workforce, bridging the gap between data scientists in the public and private sectors, and supporting open and democratized data. The NCDS is administered by founding member RENCI. Learn more at datascienceconsortium.org/.


RENCI joins researchers across the US in supporting NSF Major Facilities with data lifecycle management efforts through new NSF-funded Center of Excellence

When it comes to research, having a strong cyberinfrastructure that supports advanced data acquisition, storage, management, integration, mining, visualization, and computational processing services can be vital. However, building cyberinfrastructures (CI) — especially ones that aim to support multiple varied and complex scientific facilities — is a challenge.

In 2018, a team of researchers from institutions across the country came together to launch a pilot program aimed at creating a model for a Cyberinfrastructure Center of Excellence (CI CoE) for the National Science Foundation’s (NSF) Major Facilities. The goal was to identify how the center could serve as a forum for the exchange of CI knowledge across varying fields and facilities, establish best practices for different NSF Major Facilities’ CI, provide CI expertise, and address CI workforce development and sustainability.

“Over the past few years, my colleagues and I have worked to provide expertise and support for the NSF Major Facilities in a way that accelerates the data lifecycle and ensures the integrity and effectiveness of the cyberinfrastructure,” said Ewa Deelman, research professor of computer science and research director at the University of Southern California’s Information Sciences Institute and lead principal investigator. “We are proud to contribute to the overall NSF CI ecosystem and to work with the NSF Major Facilities on solving their CI challenges together, understanding that our work may help support the sustainability and progress of the Major Facilities’ ongoing research and discovery.”

Five NSF Major Facilities were selected for the pilot: the Arecibo Observatory, the Geodetic Facility for the Advancement of Geoscience, the National Center for Atmospheric Research, the National Ecological Observatory Network, and the Seismological Facilities for the Advancement of Geoscience and EarthScope. As the pilot progressed, the program expanded to engage additional NSF Major Facilities.

The pilot found that Major Facilities differ in types of data captured, scientific instruments used, data processing and analyses conducted, and policies and methods for data sharing and use. However, the study also found that there are commonalities between the various Major Facilities in terms of the data lifecycle (DLC). As a result, the pilot developed a DLC model that captured the stages that data within a Major Facility goes through. The model includes stages for 1) data capture; 2) initial processing near the instrument(s); 3) central processing at data centers or clouds; 4) data storage, curation, and archiving; and 5) data access, dissemination, and visualization. Finding these commonalities helped the pilot program identify common challenges, develop standardized practices for establishing overarching CI requirements, and draft a blueprint for a CI CoE that can address the pressing Major Facilities DLC challenges.
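One way to picture the five-stage DLC model is as an ordered enumeration. The Python sketch below is illustrative only: the identifier names are hypothetical, and treating the stages as a strictly linear sequence is a simplifying assumption rather than something the pilot specified.

```python
from enum import IntEnum


class DLCStage(IntEnum):
    """The five data lifecycle stages, numbered as in the pilot's model."""
    DATA_CAPTURE = 1
    INITIAL_PROCESSING = 2            # near the instrument(s)
    CENTRAL_PROCESSING = 3            # at data centers or clouds
    STORAGE_CURATION_ARCHIVING = 4
    ACCESS_DISSEMINATION_VISUALIZATION = 5


def next_stage(stage):
    """Return the stage that follows `stage`, or None after the last one.

    Assumes a linear flow, which is a simplification for illustration.
    """
    if stage is DLCStage.ACCESS_DISSEMINATION_VISUALIZATION:
        return None
    return DLCStage(stage + 1)
```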

Now, with a new NSF award, the pilot program has begun phase two and become CI CoE: CI Compass, An NSF Center of Excellence dedicated to navigating the Major Facilities’ data lifecycle. CI Compass will apply its three years of initial evaluation and analysis to improve CI as needed for the NSF’s Major Facilities.

The research institutions collaborating on CI Compass include the University of Southern California, the Renaissance Computing Institute (RENCI) at the University of North Carolina at Chapel Hill, the University of Notre Dame, Indiana University, Texas Tech University, and the University of Utah.

RENCI will play a pivotal role in the success of CI Compass by leading working groups that offer expertise and services to NSF Major Facilities for processing, data movement, data storage, curation, and archiving elements of the Major Facilities DLC.   

“Cyberinfrastructure is a critical element for fulfilling the science missions for the NSF Major Facilities and a primary goal of CI Compass is to partner with Major Facilities to enhance and evolve their CI,” said Anirban Mandal, assistant director for network research and infrastructure at the Renaissance Computing Institute at the University of North Carolina at Chapel Hill, and co-principal investigator and associate director of the project. “In the process, CI Compass will not only act as a ‘knowledge sharing’ hub for brokering connections between CI professionals at Major Facilities, but also will disseminate the knowledge to the broader NSF CI community.”

RENCI team members, in particular Ilya Baldin, who is also PI for the NSF FABRIC project, will offer expertise in networking and cloud computing for innovative Major Facilities CI architecture designs. Under Mandal’s leadership as associate director of CI Compass, RENCI will also be responsible for continuous internal evaluation of the project and measuring the impact of CI Compass on the Major Facilities and the broader CI ecosystem. Erik Scott will take a lead role in CI Compass working groups for data storage, curation, archiving and identity management, while Laura Christopherson will lead the efforts in project evaluation.


Working together, the CI Compass team will enhance the overall NSF CI ecosystem by providing expertise where needed to enhance and evolve the Major Facilities CI, capturing and disseminating CI knowledge and best practices that power scientific breakthroughs for Major Facilities, and brokering connections to enable knowledge sharing between and across Major Facilities CI professionals and the broader CI community. 

Visit ci-compass.org to learn more about the project.


This project is funded by the NSF Office of Advanced Cyberinfrastructure in the Directorate for Computer and Information Science and Engineering under grant number 2127548. The pilot effort was funded by CISE/OAC and the Division of Emerging Frontiers in the Directorate for Biological Sciences under grant #1842042.

iRODS Consortium announces leadership transitions

The Renaissance Computing Institute (RENCI) – the founding member that administers the iRODS Consortium – announced today that Jason Coposky has officially resigned from his post as iRODS Consortium Executive Director effective June 11, 2021.

Coposky has been at RENCI for fifteen years and has served as the Executive Director of the Consortium for the last five and a half years and as Chief Technologist for five years before that. In these leadership roles, Coposky managed the software development team, directed the full software development lifecycle, and coordinated code hardening, testing, and application of formal software engineering practices. He also built and nurtured relationships with existing and potential consortium members and served as the chief spokesperson on iRODS development and strategies to the worldwide iRODS community. The Consortium has more than tripled in size under his leadership.  

In addition to growing the community, Coposky has been instrumental in turning the open source iRODS platform into enterprise software that is now deployed as a data management and data sharing solution at businesses, research centers, and government agencies in the U.S., Europe, and Asia. 

Terrell Russell, who has also been working on iRODS software since the development team transitioned to RENCI in 2008 and has held the role of Chief Technologist for the past five and a half years, has been named Interim Executive Director. 

For more information on iRODS and the iRODS Consortium, please visit irods.org.

Minnesota Supercomputing Institute joins iRODS Consortium

University of Minnesota institute aims to use iRODS to support HIPAA compliance

CHAPEL HILL, NC – The Minnesota Supercomputing Institute (MSI) of the University of Minnesota has become the newest member of the iRODS Consortium, the membership-based foundation that leads development and support of the integrated Rule-Oriented Data System (iRODS).

MSI provides state-of-the-art compute and storage solutions to accelerate scientific inquiry at the University of Minnesota and beyond. Its high-performance computing resources, specialized hardware, visualization tools, and dedicated consultants support data-intensive research in any area of science, engineering, and the humanities.

“Our users have large data sets so we are always interested in new ways to help users manage their data,” said Edward Munsell, system administrator at MSI. “After we started testing iRODS, we quickly saw how it could help us with our compliance concerns around our new HIPAA cluster. We will be using iRODS to help log user activity on files, ensure files are only accessed according to our policies, and to automate processing of data off of instruments.”

Looking ahead, the team plans to expand its use of iRODS to help users manage and share their data in MSI’s second tiered storage system. Munsell noted that the open source nature of iRODS makes it possible to customize the system for MSI’s specific needs by building in additional logging and permissions handling. In addition, the team plans to take advantage of iRODS’ capability to automatically extract and tag objects with metadata, helping MSI move closer to completely automating some user workflows.

The iRODS Consortium is a membership-based organization that guides development and support of iRODS as free open source software for data discovery, workflow automation, secure collaboration, and data virtualization. The iRODS Consortium provides a production-ready iRODS distribution and iRODS professional integration services, training, and support. The consortium is administered by founding member RENCI, a research institute for applications of cyberinfrastructure located at the University of North Carolina at Chapel Hill, USA.

“We are excited to welcome the University of Minnesota to the iRODS community,” said Jason Coposky, executive director of the iRODS Consortium. “The variety and complexity of their use cases requires a solution with significant flexibility, and we believe iRODS will be able to meet their current and future needs. UMN will be a valuable member of the community, and we look forward to a long relationship.”

“We decided to become a community member because we see the value that iRODS has and we want to help support the software,” said Munsell. “I have worked with many from the team while we were evaluating iRODS, and have been impressed with their responsiveness, knowledge, and willingness to help.”

In addition to UMN, current iRODS Consortium members include Agriculture Victoria, Bayer, Bibliothèque et Archives nationales du Québec, CINES, Cloudian, DataDirect Networks, KU Leuven, Maastricht University, the National Institute of Environmental Health Sciences, NetApp, OpenIO, RENCI, SoftIron, the SURF cooperative, SUSE, the Swedish National Infrastructure for Computing, Texas Advanced Computing Center, University College London, University of Colorado, Boulder, University of Groningen, Utrecht University, Wellcome Sanger Institute, Western Digital, and four organizations that wish to remain anonymous.

To learn more about iRODS and the iRODS Consortium, please visit irods.org.

To learn more about the Minnesota Supercomputing Institute, please visit https://msi.umn.edu.

What to expect at the 2021 iRODS User Group Meeting

iRODS users and consortium members will gather virtually from June 8-11 

The worldwide iRODS user community will connect online from June 8 – 11 for the 13th Annual iRODS User Group Meeting – four days of learning, sharing of use cases, and discussions of new capabilities that have been added to iRODS in the last year.

The virtual event, sponsored by Wellcome Sanger Institute, Globus, GRAU DATA, SoftIron, and RENCI, will be a collection of live talks and panels with Q&A. An audience of over 200 participants representing dozens of academic, government, and commercial institutions is expected to join.

“The 2021 iRODS User Group Meeting features an impressive list of presentations from our user community, including talks in fields such as healthcare, agriculture, and education,” says Jason Coposky, Executive Director at iRODS. “The structure of our virtual event ensures that attendees get plenty of opportunities to network and collaborate throughout the week, while learning how users have utilized iRODS across the globe.”

Meeting attendees will learn about new updates such as the Python iRODS client, C++ REST API, and the Zone Management Tool, according to Coposky. On June 11, the last day of the meeting, the Consortium team will run an iRODS Troubleshooting session, where participants can receive one-on-one help with an existing or planned iRODS installation or integration.

The iRODS Consortium and RENCI are gearing up to release iRODS 4.2.9. A notable addition in this release is logical locking, implemented through additional replica status values within the catalog. Previously, replicas in iRODS could only be marked ‘good’ or ‘stale,’ which did not capture the states in which data was in flight or incomplete. The new intermediate and locked states for iRODS replicas will be used to protect the system from uncoordinated writes.
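As a rough illustration of how intermediate and locked replica states can guard against uncoordinated writes, consider the minimal Python sketch below. It is not the iRODS implementation: the names ReplicaStatus, Replica, LockedError, open_for_write, and close_after_write are hypothetical, and the transition rules are a simplifying assumption (the replica being written becomes intermediate, its siblings become locked, and a second writer is refused until the first one finishes).

```python
from dataclasses import dataclass
from enum import Enum


class ReplicaStatus(Enum):
    # Illustrative names only; not taken from the iRODS catalog schema.
    STALE = "stale"
    GOOD = "good"
    INTERMEDIATE = "intermediate"  # data is in flight or incomplete
    LOCKED = "locked"              # a sibling replica is being written


@dataclass
class Replica:
    resource: str
    status: ReplicaStatus = ReplicaStatus.GOOD


class LockedError(RuntimeError):
    """Raised when a write is attempted while another write is in progress."""


def open_for_write(replicas, target):
    """Mark the target intermediate and lock its siblings, or refuse."""
    at_rest = (ReplicaStatus.GOOD, ReplicaStatus.STALE)
    if any(r.status not in at_rest for r in replicas):
        raise LockedError("another write is already in progress")
    for r in replicas:
        r.status = (ReplicaStatus.INTERMEDIATE if r is target
                    else ReplicaStatus.LOCKED)


def close_after_write(replicas, target):
    """Finish the write: the target becomes good, its siblings become stale."""
    for r in replicas:
        r.status = ReplicaStatus.GOOD if r is target else ReplicaStatus.STALE


# Hypothetical usage: one data object with replicas on two resources.
replicas = [Replica("disk"), Replica("tape")]
open_for_write(replicas, replicas[0])
# A second open_for_write(replicas, replicas[1]) here would raise LockedError.
close_after_write(replicas, replicas[0])
```

With only ‘good’ and ‘stale’ available, the second writer in this sketch would have no way to tell that a write was already under way.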

The iRODS NFSRODS 2.0.0 client was released last month. The NFSRODS client presents the iRODS virtual file system as NFSv4.1, which allows iRODS to be surfaced into any existing infrastructure with just a mount command, while still preserving the server-side policies enforced by the iRODS API. The new version provides significant performance improvements and caching, and resolves a few small bugs. 

During last year’s UGM, users learned about policy composition, which allows them to configure multiple policies together without touching the code. This feature is now ready and will be released with iRODS 4.2.9, along with a growing library of implemented policies that will include future support for indexing and publishing.

As always with the annual UGM, in addition to general software updates, users will offer presentations about their organizations’ deployments of iRODS. This year’s meeting will feature 30 talks from users around the world. Among the use cases and deployments to be featured are:

  • The Research Data Management System at the University of Groningen: architecture, solution engines, and challenges. University of Groningen. The University of Groningen has developed the RUG Research Data Management System, powered by iRODS, to store and share data and to support collaborative research projects. The system is built on open source solutions and provides access to the stored data through the command line, WebDAV, and a web-based graphical user interface. It offers a number of key functionalities and technical solutions, such as metadata template management, data policies, data provenance, and auditing, and it uses existing iRODS functionalities and tools, such as the iRODS audit plugin and the iRODS Python rule engine.
  • Frictionless Data for iRODS. Earlham Institute. The international wheat community has embraced the omics era and is producing larger and more heterogeneous datasets at a rapid pace in order to produce better varieties via breeding programs. These programs, especially in the pre-breeding space, have encouraged wheat communities to make these datasets available more openly. However, the consistent and standardized collection and dissemination of data based on rich metadata remains difficult as so much of this information is stored in papers and supplementary information. In response to this concern, the Earlham Institute has built Grassroots, an infrastructure including portals to deliver large scale datasets with semantically marked-up metadata to power FAIR data in crop research.
  • Go-iRODSClient, iRODS FUSE Lite, and iRODS CSI Driver: Accessing iRODS in Kubernetes. CyVerse / University of Arizona. As developers are increasingly adopting the cloud-native paradigm for application development, Kubernetes has become the dominant platform for orchestrating their cloud-native services. To facilitate iRODS access in Kubernetes, CyVerse has developed an iRODS Container Storage Interface (CSI) Driver, which provides on-demand data access using multiple connectivity modes to the iRODS server and exposes a file system interface to Kubernetes pods, thereby allowing cloud-native services to access iRODS without manually staging data within the containers. During this talk, the researchers will introduce the design and functionalities of the iRODS CSI Driver, as well as two sub-projects: Go-iRODSClient and iRODS FUSE Lite.
  • iRODS and NIEHS Environmental Health Science. NIEHS / NIH. NIEHS continues to leverage iRODS and has contributed to two important capabilities, indexing/pluggable search and pluggable publication. NIEHS will feature work on integrating search with the standard file and metadata indexing capability and describe how targeted search features are easily added. NIEHS will feature work on publishing and demonstrate how iRODS data collections and metadata can be published to the GEO repository. NIEHS will feature the ability to publish Data Repository Service bundles and serve them through a GA4GH-compliant interface. NIEHS will also discuss the NIH Gen3 platform and highlight opportunities and features of interest in the areas of rich metadata, metadata templates, and authorization and authentication via NIH Data Passport standards.

On Wednesday, the iRODS UGM will host a panel called “Storage Chargeback: Policy and Pricing,” featuring researchers from CyVerse, Wellcome Sanger Institute, and the iRODS Consortium discussing the opportunities, the costs, and the complexities involved in servicing customer requests to bring their storage into an existing managed software stack or environment.

Registration for the Virtual iRODS UGM will remain open throughout the week. See the registration page for details.


About the iRODS Consortium

The iRODS Consortium is a membership organization that supports the development of the integrated Rule-Oriented Data System (iRODS), free open source software for data virtualization, data discovery, workflow automation, and secure collaboration. The iRODS Consortium provides a production-ready iRODS distribution and iRODS training, professional integration services, and support. The world’s top researchers in life sciences, geosciences, and information management use iRODS to control their data. Learn more at irods.org.

The iRODS Consortium is administered by founding member RENCI, a research institute for applications of cyberinfrastructure at the University of North Carolina at Chapel Hill. For more information about RENCI, visit renci.org.

UNC-Chapel Hill and RTI International selected to provide data management, stewardship to NIH-funded researchers focused on the opioid and pain management public health crises

NIH HEAL Initiative Data Stewardship Group will support researchers in making their data FAIR (findable, accessible, interoperable and reusable)  

Healthcare data has become increasingly easy to create, collect and store over the last decade. However, the industry is still working toward next steps in unlocking the potential of that collected data: preparing the data in such a way that it can be found and accessed; breaking down storage silos while also maintaining patient privacy; and teaching researchers, policymakers, physicians and patients how to effectively analyze and make use of the wealth of data that can inform decisions and policy.  

The NIH Helping to End Addiction Long-term Initiative℠, or NIH HEAL Initiative℠, is an aggressive, transagency effort to speed scientific solutions to stem the national opioid public health crisis. Recognizing the need to capitalize on the data their researchers are gathering in support of this mission, the NIH HEAL Initiative is providing up to $21.4 million over five years to the Renaissance Computing Institute (RENCI) at the University of North Carolina at Chapel Hill and RTI International (RTI) to help researchers successfully and securely prepare and sustain data from more than 500 studies.

RENCI and RTI will work in partnership with the HEAL-funded team at the University of Chicago that is building a cloud-based platform to allow HEAL researchers, other investigators and advocates, health care providers, and policymakers to easily find NIH HEAL Initiative research results and data and use them to inform their own research, practice, policies and programs. 

According to Rebecca G. Baker, Ph.D., director of the NIH HEAL Initiative, data is the currency of lifesaving and evidence-based practice. 

“To be maximally useful, data must be findable to support new research and secondary analyses, as well as to guide education and policy about pain and addiction,” said Baker. “Preparing data to be easily discoverable can be a challenging and resource-intensive task. While most recognize the need to make data FAIR, not all research teams have the resources or expertise to do this. The RENCI/RTI group, in partnership with the Chicago team, will be available to HEAL-funded investigators to augment efforts where needed.”

Sharing HEAL-generated results and associated data as rapidly as possible will allow the broader community to ask and answer new research questions; conduct secondary analyses; and address fast-evolving challenges that surround pain management, opioid use and misuse, and overdose. NIH HEAL Initiative data are highly diverse and include imaging/microscopy, behavior, genomics, pharmacokinetics, and more. 

“Providing efficient and secure access for investigators to combine data from different studies should give us a much more accurate overall picture of how challenges around pain management and addiction can be addressed,” said Stan Ahalt, director of RENCI. “Given the urgency of HEAL’s mission, we are thankful to be able to provide expertise that can facilitate discovery of important elements hidden within the data.”  

To bring these hidden elements to light, the RENCI/RTI team will study the existing NIH HEAL Initiative data efforts and collaborations and, through engagement with HEAL investigators, will produce use cases and requirements for working across diverse data types.

“We will ensure that the ecosystem architecture is purpose-built and that the ecosystem team provides the on-hand expertise to address HEAL’s needs as the research evolves,” said Rebecca Boyles, director and senior scientist in the Research Computing Division at RTI International. 


About RENCI

The Renaissance Computing Institute (RENCI) develops and deploys advanced technologies to enable research discoveries and practical innovations. RENCI partners with researchers, government, and industry to engage and solve the problems that affect North Carolina, our nation, and the world. An institute of the University of North Carolina at Chapel Hill, RENCI was launched in 2004 as a collaboration involving UNC Chapel Hill, Duke University, and North Carolina State University.

About RTI International

RTI International is an independent, nonprofit research institute dedicated to improving the human condition. Clients rely on us to answer questions that demand an objective and multidisciplinary approach — one that integrates expertise across the social and laboratory sciences, engineering and international development. We believe in the promise of science, and we are inspired every day to deliver on that promise for the good of people, communities and businesses around the world. For more information, visit www.rti.org.

SoftIron® Joins the iRODS Consortium; certifies HyperDrive® Compatibility with iRODS Architecture

SoftIron Ltd., the leader in task-specific data center solutions, today announced that it has joined the Integrated Rule-Oriented Data System (iRODS) Consortium, which supports the development of free open source software for data discovery, workflow automation, secure collaboration, and data virtualization. In joining the consortium, whose data management platform is used globally by research, commercial and governmental organizations, SoftIron has certified that its open source, Ceph-based HyperDrive™ Storage Appliances are compatible with the iRODS Architecture.

“With the open-source nature of Ceph and its ‘Swiss Army Knife’ capabilities that combine file, block, and object storage within the same infrastructure, we think that SoftIron’s HyperDrive storage appliances are a perfect complement to organizations using iRODS, and who want to scale their storage in a supported, simplified, flexible way,” said Phil Straw, CEO of SoftIron. “And, we’re especially pleased to formalize our membership this week, to coincide with BioData World Congress.” The event hosts some of the world’s leading life science organizations, many of which use iRODS as a key data management platform in pharmaceutical research, enabling collaboration in their pursuit to solve some of the world’s great challenges. Straw continued: “These organizations are already using open source iRODS to advance their mission critical research, so we’re excited to showcase what SoftIron and Ceph can do to provide them with performance, flexibility and scalability gains, as well as reducing their total cost of ownership.”

“SoftIron and its strong orientation to open source is a great addition to the iRODS ecosystem,” said Jason Coposky, Executive Director of the iRODS Consortium. “Ceph has been gaining traction with both vendors and end-user organizations engaged with iRODS. Welcoming SoftIron, which purpose-builds hardware to optimize every aspect of Ceph, as a member brings immense value to that ecosystem. We look forward to collaborating with SoftIron as we work together to bring added capability, and flexibility to the iRODS community.”

In order to give iRODS users and others in life, bio and pharmaceutical sciences a perspective in using open source Ceph as part of their operational foundation, SoftIron’s Andrew Moloney, VP of Strategy, will be presenting this week during the BioData World Congress. His presentation, titled, “Redefining Software-Defined Storage – All the Performance, Without the Complexity,” will discuss some of the most important drivers of HPC storage growth, the operational challenges in storage infrastructure, and various infrastructural approaches for building software-defined storage architectures. Andrew’s talk will be available at 12.30pm GMT, November 9th, 2020. For more information, or a copy of the talk, please email info@softiron.com.

SoftIron® is the world leader in task-specific appliances for scale-out data center solutions. Their superior, purpose-built hardware is designed, developed and assembled in California, and they are the only manufacturer to offer auditable provenance. SoftIron’s HyperDrive® software-defined, enterprise storage portfolio runs at wire-speed and is custom-designed to optimize Ceph. HyperSwitch™ is their line of next-generation, top-of-rack switches built to maximize the performance and flexibility of SONiC. HyperCast™ is their high-density, concurrent 4K transcoding solution, for multi-screen, multi-format delivery. SoftIron unlocks greater business value for enterprises by delivering best-in-class products, free from software and hardware lock-in. For more information visit www.SoftIron.com.

The iRODS Consortium is administered by founding member RENCI, a research institute for applications of cyberinfrastructure located at the University of North Carolina at Chapel Hill. Current members of the iRODS Consortium, in addition to SoftIron, include RENCI, Bayer, the U.S. National Institute of Environmental Health Sciences, DataDirect Networks, Western Digital, the Wellcome Sanger Institute, Utrecht University, MSC, University College London, the Swedish National Infrastructure for Computing, University of Groningen, SURF, NetApp, Texas Advanced Computing Center (TACC), Cloudian, Maastricht University, University of Colorado, Boulder, SUSE, Agriculture Victoria, OpenIO, KU Leuven, the Bibliothèque et Archives nationales du Québec, CINES, and four additional anonymous members.