What to expect at the iRODS 2022 User Group Meeting

The worldwide iRODS community will gather in Leuven, Belgium, from July 5 – 8 

Members of the iRODS user community will meet at KU Leuven in Belgium for the 14th Annual iRODS User Group Meeting to participate in four days of learning, sharing use cases, and discussing new capabilities that have been added to iRODS in the last year.

The event, sponsored by KU Leuven, RENCI, Vlaams Supercomputer Centrum, and Fujifilm, will provide in-person and virtual options for attendance. An audience of over 100 participants representing dozens of academic, government, and commercial institutions is expected to join.

“We are excited to meet in-person for the first time in three years to learn about the global impact of iRODS in fields such as life sciences, healthcare, cybernetics, and more,” said Terrell Russell, executive director of the iRODS Consortium. “In addition to hearing talks from our user community, the 2022 iRODS User Group Meeting will provide users the chance to network and collaborate throughout the week.”

In June, the iRODS Consortium and RENCI announced the release of iRODS 4.3.0. Along with supporting two additional operating systems, a notable new feature in the release is Delay Server Migration. The iRODS Delay Server can now be safely moved from one iRODS server to another without requiring a restart, which will provide administrators with flexibility when the system is under continuous load.

Another new feature is programmable authentication workflows. In the past, iRODS has supported various authentication methods such as native authentication, GSI, Kerberos, and OpenID, with new authentication methods implemented as shared libraries that needed to be installed on both the client and server side, often requiring patches for existing client libraries. The iRODS Consortium, in collaboration with SURF, has implemented an authentication plugin for iRODS 4.3.0, “pam_interactive,” that enables the flexibility of fully-fledged PAM (pluggable authentication module) authentication flows.

During last year’s UGM, users learned about the Python iRODS 1.0.0 client and the S3 Resource plugin. Version 1.1.4 of the Python iRODS client is now available, and includes fixes for the XML protocol, connection reuse, the anonymous user, ticket enhancements, and compatibility with iRODS talking directly to S3. The iRODS S3 Resource Plugin has been extended to honor the Glacier semantics of an S3 storage system including reacting appropriately to responses that indicate the data requested will be available later. 

As always with the annual UGM, in addition to general software updates, users will offer presentations about their organizations’ deployments of iRODS. This year’s meeting will feature over 20 talks from users around the world. Among the use cases and deployments to be featured are:

  • Data Management Environment at the National Cancer Institute. Frederick National Laboratory for Cancer Research. An efficient and cost-effective mechanism is required to store and manage the large heterogeneous datasets generated by high throughput technologies such as Next Generation Sequencing, Cryo-Electron Microscopy, and High Content Imaging. Tier 1 storage is expensive, and Tier 2 devices used standalone do not lend themselves well to discovering and disseminating datasets. The Data Management Environment (DME), a data management platform for storing, sharing, and managing high-value scientific datasets, was developed at the National Cancer Institute to close this gap. DME addresses the long-term data management needs of research labs and cores at NCI per the FAIR (Findable, Accessible, Interoperable, and Reusable) guiding principles for data management. It supports S3-compatible object stores as well as file system-based storage. DME uses iRODS as the metadata management layer, enabling virtualization of backend storage, replacement of storage providers with zero impact on users, and transparent migration of data across providers. The granular permissions scheme provided by iRODS, coupled with DME’s authentication and authorization mechanism, enables researchers to share data with collaborators securely. This talk will give an overview of the capabilities and architecture of the Data Management Environment and discuss how DME has leveraged iRODS to deliver enhanced data management and storage management capabilities.
  • iRODS speaks SFTP: More ways to securely transfer your data. CyVerse / University of Arizona. Compliance and data encryption during transfer are strict requirements for many science domains that work with confidential data. Realizing this unmet need for secure and encrypted transfers for CyVerse users, the CyVerse team decided to implement Secure File Transfer Protocol (SFTP) access to iRODS. This approach complements the secure data transfer and authentication methods currently provided in iRODS via SSL and PAM authentication, which are challenging to integrate into existing services or research workflows for several reasons: they require changes on the iRODS server, firewall configuration, and user training for complex client-side installations of icommands. In this talk, the team introduces their work on adding iRODS as a backend storage option for SFTPGo utilizing the Go iRODS library developed at CyVerse.
  • From SRB to iRODS: 20 years of data management at the petabyte scale. CC-IN2P3. CC-IN2P3, a data center hosting services such as computing and data storage for international projects mainly in the fields of subatomic physics and astrophysics, has been using SRB and then iRODS in a wide variety of projects and use cases for the last 20 years. Data management has always been a key activity for a data center such as CC-IN2P3, due to the ever-growing size of the projects and their international dimension. This talk will emphasize the evolution of data management needs, the pitfalls, and the endless migration cycle (both hardware and software) over the years. It will also focus on ongoing prospects, especially long-term data preservation needs and open science.
  • MrData: An iRODS Based Human Research Data Management System. Max Planck Institute for Biological Cybernetics. MrData is an iRODS-based archival system for research medical imaging data, built initially to automate collection and archival of data flowing from a Siemens 9.4 Tesla MRI system. Of particular importance to this project was managing metadata related to human subject recruiting in a GDPR-compliant manner. The team chose Castellum, a system developed at Max Planck specifically for managing human subject data securely, and worked with the Castellum team to integrate it with the MrData system. An additional requirement was “mixed use” metadata: information necessary for both subject recruiting and scientific processing. Mixed use metadata, such as handedness, is managed by Castellum but made available by MrData for scientific and archival purposes securely and without manual intervention. The Max Planck team will present an overview of this project, including current production status and future directions.

Bookending this year’s UGM are two in-person events for those who hope to learn more about iRODS. On July 5, the Consortium is offering beginner and advanced training sessions. After the conference, on July 8, users have the chance to register for a troubleshooting session, devoted to providing one-on-one help with an existing or planned iRODS installation or integration.

Registration will remain open until the beginning of the event. Learn more about this year’s UGM at irods.org/ugm2022.

About the iRODS Consortium

The iRODS Consortium is a membership organization that supports the development of the integrated Rule-Oriented Data System (iRODS), free open source software for data virtualization, data discovery, workflow automation, and secure collaboration. The iRODS Consortium provides a production-ready iRODS distribution and iRODS training, professional integration services, and support. The world’s top researchers in life sciences, geosciences, and information management use iRODS to control their data. Learn more at irods.org.
The iRODS Consortium is administered by founding member RENCI, a research institute for applications of cyberinfrastructure at the University of North Carolina at Chapel Hill. For more information about RENCI, visit renci.org.

RENCI Leadership Assigned to Lead Implementation of the UNC School of Data Science & Society

RENCI Director, Stan Ahalt, and Chief Operating Officer, Jay Aikat, will take on secondary appointments as Inaugural Dean and Senior Associate Dean, respectively

RENCI leadership, in coordination with UNC-Chapel Hill leadership, has recently announced that Director, Stan Ahalt, and Chief Operating Officer, Jay Aikat, will be taking on leadership roles as secondary appointments for the launch of the School of Data Science and Society (SDSS) planned for fall 2022. Ahalt will serve as the School’s Inaugural Dean, and Aikat will serve as the Senior Associate Dean. 

Ahalt and Aikat have both been instrumental in spearheading data science efforts on campus for many years, each serving as members or leads on various committees that have led to the creation of the SDSS. Additionally, many others at RENCI have supported the path to the School through work such as: developing curriculum for and teaching the new Introduction to Data Science course; serving on the committee developing the Data Science minor; organizing and supporting the seven subcommittees for data science in 2019-2020 that led to the initial feasibility plan for the SDSS; and more. 

The announcement was made during a RENCI ‘All Hands’ meeting, where Chancellor Kevin Guskiewicz, Provost Chris Clemens, and Senior Associate Vice Chancellor for Research Andy Johns joined to say a few words. 

After the initial announcement of Ahalt and Aikat’s roles in the School, Guskiewicz emphasized why Ahalt is the right person for the job. “Stan is a global leader who we believe is well poised to lead this new school into the future,” said Guskiewicz. “[His] passion for using a team approach in applying data science to society’s most pressing challenges is exactly what we need for the new school.” 

Clemens added to Guskiewicz’s comments stating, “Through his research, teaching, and leadership of RENCI, Stan has a proven track record of bringing together diverse groups of people and collaborating across disciplines for the greater good.” Clemens went on to explain that the School will support the development of multi-disciplinary and flexible research clusters to utilize the variety of expertise and research at Carolina to address timely problems, making Stan’s unique experience crucial for this leadership role. 

“Our established prominence in the natural sciences, humanities, and social sciences uniquely positions us to build the SDSS as a vessel and venue for interdisciplinary collaboration,” said Clemens, adding that the School will utilize innovative techniques and team science approaches to develop solutions that improve communities.

In addition to sharing a vision for the School, Guskiewicz and Ahalt both emphasized the plan to establish a mutually beneficial relationship between RENCI and SDSS to seek out opportunities for collaboration and shared innovation. Ahalt noted that, since the beginning, RENCI’s work has focused on solving the most challenging problems affecting our society. 

“RENCI has demonstrated significant and consistent success in identifying pressing societal problems and applying a unique array of skills and expertise to develop and implement solutions,” said Ahalt.

Clemens stated that the SDSS will build upon this work and use RENCI as the premier model for a team science approach that produces real changes for people in our communities. 

Ahalt and Johns shared details for the changes that will happen at RENCI during this period. Ashok Krishnamurthy, currently RENCI’s Deputy Director, will serve as Interim Director. Asia Mieczkowska, currently Deputy Chief Operations Officer, will assume the role of Interim Chief Operations Officer. Ahalt also noted the possibility for RENCI researchers to take on new and interesting roles during this period and as the School develops, with unique opportunities for growth and creativity. 

InfiniteTactics joins iRODS Consortium

Organizations team up to keep pace with expanding data demands

CHAPEL HILL, NC – The iRODS Consortium, the membership-based foundation that leads development and support of the integrated Rule-Oriented Data System (iRODS), welcomes its newest member, InfiniteTactics.

InfiniteTactics is a veteran-owned IT consulting firm that produces high-end technical solutions in large-scale data sciences support, autonomous system engineering and technical guidance, and custom software solutions. The company supports a diverse range of clients from small business start-ups to the Department of Defense.

The iRODS Consortium is a membership-based organization that guides development and support of iRODS as free open-source software for data discovery, workflow automation, secure collaboration, and data virtualization.

The capability to efficiently manage, use, parse, store, and access data is crucial to InfiniteTactics’ business model, but this becomes more challenging as customers require access to increasingly large stores of data on a daily basis. InfiniteTactics AI Software Engineer Kyle Healy said iRODS offers an opportunity for the company to overcome key file system limitations and maintain excellent performance at competitive prices to support clients’ expanding data demands.

“Data management is an extremely important part of our business and the work we do because data is the integral backbone of data sciences support we provide,” said Healy. “We hope iRODS will help us implement a more efficient underlying file system we can utilize to bring a better product to our customers and provide a more efficient file system for the large-scale data sciences open-source community.”

The iRODS Consortium provides a production-ready distribution and professional integration services, training, and support. The consortium is administered by founding member RENCI, a research institute for applications of cyberinfrastructure located at the University of North Carolina at Chapel Hill, USA.

“We are very excited to bring InfiniteTactics into the fold,” said Terrell Russell, executive director of the iRODS Consortium. “They are integrating a number of interesting new technologies and their belief in the open-source philosophy made for a quick partnership.”

“The RENCI team is well known for their large scale storage solutions,” said Healy. “Aligning ourselves with a strong partner in the storage solution industry was a great strategic move for us. Secondly, being able to provide our expertise back to the Consortium and improve the overall footprint of the large-scale data sciences open-source community through the Consortium made the decision easy.”

In addition to InfiniteTactics, current iRODS Consortium members include Agriculture Victoria, Bayer, Bibliothèque et Archives nationales du Québec, CINES, CUBI at Berlin Institute of Health, DataDirect Networks, Emagine IT, KU Leuven, Maastricht University, Minnesota Supercomputing Institute at the University of Minnesota, the National Institute of Environmental Health Sciences, NetApp, Omnibond, OpenIO, RENCI, SoftIron, the SURF cooperative, Texas Advanced Computing Center, University College London, University of Colorado, Boulder, University of Groningen, Utrecht University, Wellcome Sanger Institute, Western Digital, and four organizations that wish to remain anonymous.

To learn more about iRODS and the iRODS Consortium, please visit irods.org.

To learn more about InfiniteTactics, please visit https://infinitetactics.com.

Data Matters short-course series is back for August 2022

Annual data science series returns via Zoom 

Now in its ninth year, Data Matters 2022, a week-long series of one- and two-day courses aimed at students and professionals in business, research, and government, will take place virtually via Zoom from August 8 – 12. The short course series is sponsored by the Odum Institute for Research in Social Science at UNC-Chapel Hill, the National Consortium for Data Science, and RENCI.

In recent years, employers’ expectations for a data literate workforce have grown significantly. According to a 2021 Harvard Business Review report, while 90% of business leaders cite data literacy as key to company success, only 25% of workers feel confident in their data skills. Data Matters helps bridge this gap by providing attendees the chance to learn about a wide range of topics in data science, analytics, visualization, curation, and more from expert instructors.

“With the increase of data science tools being used in sectors such as business, research and government, it is essential that workers seek out educational opportunities that empower them to address new challenges in their field,” said Shannon McKeen, executive director of the National Consortium for Data Science. “Our short-course series has twelve courses that can be tailored to achieve individual data science goals, whether registrants are looking to refresh their knowledge or trying to learn something new in a welcoming, understanding environment.”

Data Matters instructors are experts in their fields from NC State University, UNC-Chapel Hill, Duke University, Cisco, and RENCI. Topics to be covered this year include information visualization, deep learning in Python, exploratory data analysis, statistical machine learning and programming in R, and more. Among the classes available are:

  • Introduction to Programming in R, Jonathan Duggins. Statistical programming is an integral part of many data-intensive careers and data literacy, and programming skills have become a necessary component of employment in many industries. This course begins with necessary concepts for new programmers—both general and statistical—and explores some necessary programming topics for any job that utilizes data. 
  • Overview of AI and Deep Learning, Ashok Krishnamurthy. Many key advances in AI are due to advances in machine learning, especially deep learning. Natural language processing, computer vision, speech translation, biomedical imaging, and robotics are some of the areas that have benefited from deep learning methods. This course is designed to provide an overview of AI, and in particular, deep learning. Topics include the history of neural networks, how advances in data collection and computing have caused a revival in neural networks, different types of deep learning networks and their applications, and tools and software available to design and deploy deep networks.
  • Introduction to Statistical Machine Learning in R, Yufeng Liu. Statistical machine learning and data mining is an interdisciplinary research area closely related to statistics, computer science, engineering, and bioinformatics. Many statistical machine learning and data mining techniques and algorithms are useful in various scientific areas. This two-day short course will provide an overview of statistical machine learning and data mining techniques with applications to the analysis of real data.
  • Geospatial Analytics Using Python, Laura Tateosian. This course will focus on how to explore, analyze, and visualize geospatial data. Using Python and ArcGIS Pro, students will inspect and manipulate geospatial data, use powerful GIS tools to analyze spatial relationships, link tabular data with spatial data, and map data. In these activities, participants will use Python and the arcpy library to invoke key GIS tools for spatial analysis and mapping.

Data Matters offers reduced pricing for faculty, students, and staff from academic institutions and for professionals with nonprofit organizations. Head to the Data Matters website to register and to see detailed course descriptions, course schedules, instructor bios, and logistical information. 

Registration is now open at datamatters.org. The deadline for registration is August 3 for Monday/Tuesday courses, August 4 for Wednesday courses, and August 7 for Thursday/Friday courses.


About the National Consortium for Data Science (NCDS)

The National Consortium for Data Science (NCDS) is a collaboration of leaders in academia, industry, and government formed to address the data challenges and opportunities of the 21st century. The NCDS helps members take advantage of data in ways that result in new jobs and transformative discoveries. The organization connects diverse communities of data science experts to support a 21st century data-driven economy by building data science career pathways and creating a data-literate workforce, bridging the gap between data scientists in the public and private sectors, and supporting open and democratized data. Learn more at datascienceconsortium.org/.

The NCDS is administered by founding member RENCI, a research institute for data science and applications of cyberinfrastructure at the University of North Carolina at Chapel Hill. For more information about RENCI, visit renci.org.

RENCI and RTI International expand strategic partnership

CHAPEL HILL, N.C. — RTI International (RTI), a nonprofit research institute, has recently established a strategic partnership with UNC-Chapel Hill’s Renaissance Computing Institute (RENCI) to build upon the success of existing collaborations and jointly seek out new research opportunities in areas such as data modernization, data science, and team science solutions. RTI and RENCI have closely collaborated on multiple large-scale team science projects over the years, including NCATS Data Translator, NHLBI BioData Catalyst, and the NIH HEAL Initiative® Data Stewardship Group.

RTI and RENCI have both committed themselves to collaboratively secure funding in the aforementioned areas by:

  • Creating a joint identity for pursuit of opportunities;
  • Identifying and pursuing additional opportunities to expand on existing work;
  • Cooperating in the exchange of information and networking relevant to potential collaborations; and
  • Collaborating on business processes to streamline and simplify joint business development and project delivery.

“This partnership will allow RTI and RENCI to take full advantage of our well-established collaborative relationship, with a focus on strategically aligning ourselves and our expertise to create a unified identity to pursue future funding,” said Becky Boyles, Founding Director of the Center for Data Modernization Solutions at RTI. “It has become increasingly clear how well our organizations work together and complement each other, and we are looking forward to seeing further success with this partnership.”

Stan Ahalt, Director of RENCI, added, “RTI and RENCI have a synergistic relationship that has only strengthened over the years, and this feels like the right time to use this momentum to intentionally coordinate our efforts and make the biggest impact possible in the field of data science. We have shown time and again that our team science approach produces real results, and we know that our combined impact is greater than what we could achieve individually.”

The partnership will serve to enhance and streamline collaborations between the two organizations by creating standard processes, procedures, and marketing materials to emphasize their collective strengths. Karen Davis, Vice President of RTI’s Research Computing Division (RCD), noted, “RTI and RENCI have a long history of collaboration, and this MOU serves as a formal agreement between the organizations to continue expanding upon this groundwork while also signaling to other organizations the high value we place on team science and encouraging them to do the same.”

Ashok Krishnamurthy, Deputy Director of RENCI, further emphasized RENCI and RTI’s combined potential to make a big impact in stating, “This is a very exciting partnership, and we look forward to innovating together by applying data science to solving biological, environmental, and biomedical problems.”

RTI and RENCI are excited to establish this partnership to combine their individual strengths and resources and expand their collective scientific impact. As evidenced by the success of existing collaborations, this partnership will further facilitate the advancement of team science and scientific discovery in NC and beyond.

About RTI

RTI International is an independent, nonprofit research institute dedicated to improving the human condition. RTI’s vision is to address the world’s most critical problems with science-based solutions in pursuit of a better future.

About RENCI

The Renaissance Computing Institute (RENCI) is a research institute at UNC-Chapel Hill launched in 2004 that serves as a living laboratory fostering data science expertise, advancing software development tools and techniques, developing effective cross-disciplinary and cross-sector engagement strategies, and establishing sustainable business models for software and services.

Omnibond joins iRODS Consortium

Collaboration enhances synergies for improving end to end data integration

CHAPEL HILL, NC – The software company Omnibond has joined the iRODS Consortium, the membership-based foundation that leads development and support of the integrated Rule-Oriented Data System (iRODS).

Omnibond is a software technology company with four main product areas including CloudyCluster for cloud high-performance computing and data analytics, OrangeFS for research data solutions, NetIQ for identity and access management, and TrafficVision for computer vision and AI solutions for the transportation industry. Company leaders say that enhanced integration with iRODS will help provide better instrument-to-cloud data and computation management, in particular for CloudyCluster and OrangeFS software.

“We help our customers deal with large amounts of data, and collaborating with iRODS for these products will help our customers with better end to end data management,” said Omnibond President and CEO Boyd Wilson. “We are excited to work with the iRODS team going forward and we are impressed with their vision and capabilities.”

The iRODS Consortium is a membership-based organization that guides development and support of iRODS as free open-source software for data discovery, workflow automation, secure collaboration, and data virtualization. The iRODS Consortium provides a production-ready distribution and professional integration services, training, and support. The consortium is administered by founding member RENCI, a research institute for applications of cyberinfrastructure located at the University of North Carolina at Chapel Hill, USA.

“Omnibond can provide deployment and support services where we cannot, and their integration expertise extends the Consortium’s reach into new markets,” said Terrell Russell, executive director of the iRODS Consortium. “After working alongside one another for years, we are very happy to welcome Omnibond to the iRODS Consortium.”

Wilson noted that the open-source model makes iRODS a particularly good fit for Omnibond’s portfolio, which is focused around building synergies between research and open-source technologies. “We currently are the maintainers of OrangeFS, an open-source parallel file system that has been incorporated into the Linux kernel by the Linux kernel team, so we understand the value of open-source software and are excited to partner with the iRODS Consortium,” said Wilson.

In addition to Omnibond, current iRODS Consortium members include Agriculture Victoria, Bayer, Bibliothèque et Archives nationales du Québec, CINES, CUBI at Berlin Institute of Health, DataDirect Networks, Emagine IT, KU Leuven, Maastricht University, Minnesota Supercomputing Institute at the University of Minnesota, the National Institute of Environmental Health Sciences, NetApp, OpenIO, RENCI, SoftIron, the SURF cooperative, Texas Advanced Computing Center, University College London, University of Colorado, Boulder, University of Groningen, Utrecht University, Wellcome Sanger Institute, Western Digital, and four organizations that wish to remain anonymous.

To learn more about iRODS and the iRODS Consortium, please visit irods.org.

To learn more about Omnibond, please visit https://obz.io.

Emagine IT joins iRODS Consortium

Collaboration points to critical role of data management in advancing cybersecurity

Emagine IT (EIT) has joined the iRODS Consortium, the membership-based foundation that leads development and support of the integrated Rule-Oriented Data System (iRODS). In becoming the Consortium’s latest member, EIT brings a cybersecurity lens to driving data management solutions in collaboration with the broader iRODS community.

EIT provides IT modernization, cybersecurity, and full lifecycle IT services to the public and private sectors. Ensuring security and regulatory compliance for disparate confidential and personal data types poses complex challenges, making data management innovation a crucial part of EIT’s business.

“The recent ransomware attacks across the globe speak to the universal importance of secure data management at the intersection of IT operations and cybersecurity,” said Aaron Pendola, director of Health IT at EIT. “EIT believes iRODS has a unique capability to solve complex data challenges related to cybersecurity.”

For instance, EIT can use iRODS to advance common data standards and terminologies, helping to overcome some of the fragmentation that has historically hindered the development of cohesive, global cybersecurity solutions. Open-source software, such as iRODS, is at the forefront of technology innovation. While the idea may seem counterintuitive, Pendola says that open-source models are well positioned to improve data privacy and security by helping users and partners anticipate how technology will evolve.

“We fully recognize how open-source technologies like iRODS have led to profound mission impacts across the industries we serve,” said Pendola. “We are excited to participate in the continuous improvement of iRODS driving its evolution and enhancements by virtue of the open-source, collaborative consortium model.”

The iRODS Consortium is a membership-based organization that guides development and support of iRODS as free open-source software for data discovery, workflow automation, secure collaboration, and data virtualization. The iRODS Consortium provides a production-ready distribution and professional integration services, training, and support. The consortium is administered by founding member RENCI, a research institute for applications of cyberinfrastructure located at the University of North Carolina at Chapel Hill, USA.

“Emagine IT’s focus on federal, state and local, and commercial contracts with expertise in cybersecurity and IT modernization adds a new element to our membership,” said Terrell Russell, interim executive director of the iRODS Consortium. “We are excited to welcome them to the community and look forward to new collaborations.”

In addition to EIT, current iRODS Consortium members include Agriculture Victoria, Bayer, Bibliothèque et Archives nationales du Québec, CINES, CUBI at Berlin Institute of Health, DataDirect Networks, KU Leuven, Maastricht University, Minnesota Supercomputing Institute at the University of Minnesota, the National Institute of Environmental Health Sciences, NetApp, OpenIO, RENCI, SoftIron, the SURF cooperative, the Swedish National Infrastructure for Computing, Texas Advanced Computing Center, University College London, University of Colorado, Boulder, University of Groningen, Utrecht University, Wellcome Sanger Institute, Western Digital, and five organizations that wish to remain anonymous.

About Emagine IT

Emagine IT, Inc. (EIT) is an information technology services and consulting company based in the Washington, DC metropolitan area. EIT provides IT modernization, cybersecurity, and full lifecycle IT services to the public and private sectors. For more information, please visit their website at www.eit2.com.

iRODS and Fujifilm partner to provide an archive solution

FUJIFILM Recording Media U.S.A., Inc. and the iRODS Consortium today announced a collaboration and integration that creates a joint solution built upon FUJIFILM Object Archive software and the iRODS data management platform. This joint solution leverages the benefits of a tape storage tier for infrequently accessed “cold” data, providing an automated archiving workflow for research, commercial, and governmental organizations that need to store large – and in most cases, rapidly growing – amounts of data.

With this solution, FUJIFILM Object Archive becomes a deep-tier archive storage target while iRODS provides a data management platform for users who produce massive amounts of research and analytics data.

FUJIFILM Object Archive software has been tested with the iRODS S3 plugin and fully supports the AMAZON S3 abstraction that iRODS provides. In addition to regular AMAZON S3 compatibility, Fujifilm and the iRODS Consortium worked together to add functionality comparable to AMAZON GLACIER to the iRODS S3 Resource Plugin.

This new functionality will be available as part of the upcoming iRODS 4.2.11 release.

Moving appropriate data to tape provides the benefits of air-gap security and scalability with lower data center operating costs and less electricity consumption when compared to other storage solutions. Additionally, FUJIFILM Object Archive software supports the new, higher-capacity LTO-9 tape technology, making the solution potentially even more efficient, economical, and scalable.

“We are very excited to be working with Fujifilm on the AMAZON GLACIER features,” said Terrell Russell, interim executive director of the iRODS Consortium. “Together, we are building a long-term relationship that will be good for our users, and for both organizations.”

“The new interoperability between Fujifilm’s Object Archive software and the iRODS data management platform will greatly benefit organizations who use both products, and potentially create new use cases as well,” said Tom Nakatani, vice president of sales & marketing at FUJIFILM Recording Media U.S.A., Inc. “We are pleased to successfully implement this joint solution for the benefit of our collaborators and users.”

Fujifilm is the world’s leading data tape manufacturer (based on market share). Its FUJIFILM Object Archive software allows objects to be seamlessly written to and read from data tape media with Fujifilm’s OTFormat. Using the industry-standard AMAZON S3-compatible API, Object Archive software offers the same operability as cloud storage and easy long-term retention of data similar to AMAZON GLACIER. By using FUJIFILM Object Archive software to optimize existing storage, organizations can eliminate egress fees, offload cold data to tape, maintain chain of custody, realize low ongoing storage costs, and help protect against cyber threats by providing a physical air-gap to data.

About Fujifilm

FUJIFILM Recording Media U.S.A., Inc. is FUJIFILM Corporation’s U.S.-based manufacturing, marketing and sales operation for data tape media and data management solutions. The company provides data center customers and enterprise industry partners with a wide range of innovative recording media products and archival solutions. Based on a history of thin-film engineering and magnetic particle science such as Fujifilm’s NANOCUBIC™ and Barium Ferrite technology, Fujifilm creates breakthrough data storage products. Worldwide, Fujifilm and its affiliates have surpassed the 170 million milestone for the number of LTO ULTRIUM data cartridges manufactured and sold since introduction, establishing the company as the leading global manufacturer of mid-range and enterprise data tape.

For more information on FUJIFILM Recording Media products, call 800-488-3854 or go to https://www.fujifilm.com/us/en/business/data-storage. For more information about FUJIFILM Object Archive software, visit http://fujifilmobjectarchive.com.

FUJIFILM Holdings Corporation, Tokyo, Japan, brings cutting-edge solutions to a broad range of global industries by leveraging its depth of knowledge and fundamental technologies developed in its relentless pursuit of innovation. Its proprietary core technologies contribute to various fields including healthcare, highly functional materials, document solutions and imaging products. These products and services are based on its extensive portfolio of chemical, mechanical, optical, electronic and imaging technologies. For the year ended March 31, 2021, the company had global revenues of $21 billion, at an exchange rate of 106 yen to the dollar. The Fujifilm global family of companies is committed to responsible environmental stewardship and good corporate citizenship. For more information, please visit: www.fujifilmholdings.com

FUJIFILM, OBJECT ARCHIVE, and NANOCUBIC are the trademarks and registered trademarks of FUJIFILM Corporation and its affiliates.

AMAZON, AMAZON GLACIER and AMAZON S3 are trademarks of Amazon.com, Inc. or its affiliates in the United States and/or other countries.

LTO and ULTRIUM are registered trademarks of Hewlett Packard Enterprise, IBM and Quantum in the United States and/or other countries.

© 2021 FUJIFILM Recording Media U.S.A. Inc. All Rights Reserved

RENCI named as partner in NSF institute to establish new field of imageomics

Imageomics Institute will advance computational methods for studying Earth’s biodiversity

RENCI has been named as a partner on an ambitious new effort to use images of living organisms as the basis for understanding biological processes of life on Earth. The project, to be led by faculty from The Ohio State University’s Translational Data Analytics Institute, has been awarded a $15 million grant from the National Science Foundation as part of NSF’s Harnessing the Data Revolution initiative.

The new entity, which will be called the Imageomics Institute, aims to establish imageomics as a new field of study that has the potential to transform biomedical, agricultural and basic biological sciences. Similar to genomics before it, which applied computation to the study of the human genome, imageomics will leverage computer science to help scientists extract meaning from an otherwise unwieldy amount of natural image data.

“There are many more species out there than scientists have been able to study in-depth,” said Jim Balhoff, a Senior Research Scientist at RENCI who will lead the RENCI component of the project. “If we can leverage machine learning to interpret images of living organisms, that would provide a scalable way to process large amounts of information about species, complementing the work of trained wildlife biologists.”

The Institute’s scientists will apply machine learning techniques to large collections of digital images from museums, labs and other institutions, as well as photos taken by scientists in the field, camera traps, drones and even members of the public who have uploaded their images to platforms such as eBird, iNaturalist and Wildbook. By training algorithms to extract biologically meaningful information from these images, researchers aim to generate new knowledge about organisms and species, including insights about how they evolve and interact within ecosystems.

Critical to this effort is the ability to categorize features of living organisms with standardized vocabularies, known as bio-ontologies, that can be “understood” by computers. Having served as a key contributor on the Phenoscape team for several previous NSF-funded projects, Balhoff is steeped in the art of encoding biological information in computable ways.

“There’s a lot of work going on with machine learning, and one of the key pieces of this project is to develop ways to incorporate ontology-based knowledge into machine learning processes,” said Balhoff. “We’re providing expertise in bio-ontologies to incorporate what we know about anatomical relationships into this image analysis system.”

This approach could ultimately enable a computer to identify key features in an image, such as an eye, mouth or dorsal fin, and then use automated reasoning to check that the interpretation makes anatomical sense. Repeating this process for large collections of images can give scientists a powerful platform for investigating new or previously understudied species or help them better understand the relationships between organisms.

As an inaugural institute for data-intensive discovery in science and engineering within NSF’s Harnessing the Data Revolution initiative, the Imageomics Institute will be part of a broader effort to form a national collaborative research network dedicated to computation-enabled discovery.

In addition to The Ohio State University and RENCI, the project will involve biologists and computer scientists from Tulane University, Virginia Tech, Duke University, and Rensselaer Polytechnic Institute; senior personnel from Ohio State, Virginia Tech and six additional institutions; and collaborators from more than 30 universities and organizations around the world.

RENCI to join researchers in a collaboration to increase reliability and efficiency of DOE scientific workflows by leveraging artificial intelligence and machine learning methods

Poseidon will use AI/ML-based techniques to simulate, model, and optimize scientific workflow performance on large, distributed DOE computing infrastructures.

The Department of Energy’s (DOE) advanced Computational and Data Infrastructures (CDIs) – such as supercomputers, edge systems at experimental facilities, massive data storage, and high-speed networks – are brought to bear to solve the nation’s most pressing scientific problems, including assisting in astrophysics research, delivering new materials, designing new drugs, creating more efficient engines and turbines, and making more accurate and timely weather forecasts and climate change predictions.

Increasingly, computational science campaigns are leveraging distributed, heterogeneous scientific infrastructures that span multiple locations connected by high-performance networks, resulting in scientific data being pulled from instruments to computing, storage, and visualization facilities.

This image shows the terrain height – an important factor in weather modeling – across almost all of North America with a spatial resolution of 4 km. Poseidon tools will help improve workflows and lead to even more efficient weather forecasts through reliable and efficient execution of weather models.

Credit: Jiali Wang, Argonne National Laboratory

However, since these federated services infrastructures tend to be complex and managed by different organizations, domains, and communities, both the operators of the infrastructures and the scientists that use them have limited global visibility, which results in an incomplete understanding of the behavior of the entire set of resources that science workflows span. 

“Although scientific workflow systems like Pegasus increase scientists’ productivity to a great extent by managing and orchestrating computational campaigns, the intricate nature of the CDIs, including resource heterogeneity and the deployment of complex system software stacks, pose several challenges in predicting the behavior of the science workflows and in steering them past system and application anomalies,” said Ewa Deelman, research professor of computer science and research director at the University of Southern California’s Information Sciences Institute and lead principal investigator (PI). “Our new project, Poseidon, will provide an integrated platform consisting of algorithms, methods, tools, and services that will help DOE facility operators and scientists to address these challenges and improve the overall end-to-end science workflow.”

Under a new DOE grant, Poseidon aims to advance the knowledge of how simulation and machine learning (ML) methodologies can be harnessed and amplified to improve the DOE’s computational and data science.

Research institutions collaborating on Poseidon include the University of Southern California, the Argonne National Laboratory, the Lawrence Berkeley National Laboratory, and the Renaissance Computing Institute (RENCI) at the University of North Carolina at Chapel Hill.

Poseidon will add three important capabilities to current scientific workflow systems — (1) predicting the performance of complex workflows; (2) detecting and classifying infrastructure and workflow anomalies and “explaining” the sources of these anomalies; and (3) suggesting performance optimizations. To accomplish these tasks, Poseidon will explore the use of novel simulation, ML, and hybrid methods to predict, understand, and optimize the behavior of complex DOE science workflows on DOE CDIs. 

Poseidon will explore hybrid solutions where data collected from DOE and NSF testbeds, as well as from an ML simulator, will be strategically inputted into an ML training system.

High-performance computing systems, such as the planned Aurora system at the Argonne Leadership Computing Facility, are integral pieces of DOE CDIs.

Credit: Argonne National Laboratory

“In addition to creating a more efficient timeline for researchers, we would like to provide CDI operators with the tools to detect, pinpoint, and efficiently address anomalies as they occur in the complex DOE facilities landscape,” said Anirban Mandal, Poseidon co-PI, assistant director for network research and infrastructure at RENCI, University of North Carolina at Chapel Hill. “To detect anomalies, Poseidon will explore real-time ML models that sense and classify anomalies by leveraging underlying spatial and temporal correlations and expert knowledge, combine heterogeneous information sources, and generate real-time predictions.”

RENCI will play a pivotal role in the Poseidon project. RENCI researchers Cong Wang and Komal Thareja will lead project efforts in data acquisition from the DOE CDI and NSF testbeds (FABRIC and Chameleon Cloud) and emulation of distributed facility models, enabling ML model training and validation on the testbeds and DOE CDI. Additionally, Poseidon co-PI Anirban Mandal will lead the project portion on performance guidance for optimizing workflows.

Successful Poseidon solutions will be incorporated into a prototype system with a dashboard that will be used for evaluation by DOE scientists and CDI operators. Poseidon will enable scientists working on the frontier of DOE science to efficiently and reliably run complex workflows on a broad spectrum of DOE resources and accelerate time to discovery.

Furthermore, Poseidon will develop ML methods that can self-learn corrective behaviors and optimize workflow performance, with a focus on explainability in its optimization methods. 

Working together, the researchers behind Poseidon will break down the barriers between complex CDIs, accelerate the scientific discovery timeline, and transform the way that computational and data science are done.

Please visit the project website for more information.