Data Matters short-course series returns in August 2021

Annual short-course series aims to bridge the data literacy gap

Now in its eighth year, Data Matters 2021, a week-long series of one- and two-day courses aimed at students and professionals in business, research, and government, will take place August 9 – 13 virtually via Zoom. The short-course series is sponsored by the Odum Institute for Research in Social Science at UNC-Chapel Hill, the National Consortium for Data Science, and RENCI.

Although employers’ need for data literacy has grown rapidly over the last few years, many academic institutions are struggling to keep up. According to a 2021 report from Forrester, 81% of recruiters rated data skills and data literacy as important capabilities for candidates, while only 48% of academic planners reported that their institution currently has specific data skills initiatives set up. Data Matters helps bridge this gap by providing attendees the chance to learn about a wide range of topics in data science, analytics, visualization, curation, and more from expert instructors.

“As our society becomes more data-driven, we’ve seen a greater need for workers in environments such as industry, health, and law to have a basic understanding of data science techniques and applications,” said Shannon McKeen, executive director of the National Consortium for Data Science. “The Data Matters short-course series allows us to meet the high demand for data science education and to provide pathways for both recent graduates and current professionals to bridge the data literacy gap and enrich their knowledge.”

Data Matters instructors are experts in their fields from NC State University, UNC-Chapel Hill, Duke University, Cisco, Blue Cross NC, and RENCI. Topics to be covered this year include information visualization, data curation, data mining and machine learning, programming in R, systems dynamics and agent-based modeling, and more. Among the classes available are:

  • Introduction to Programming in R, Jonathan Duggins. Statistical programming is an integral part of data literacy and of many data-intensive careers, and programming skills have become a necessary component of employment in many industries. This course begins with foundational concepts for new programmers—both general and statistical—and explores programming topics relevant to any job that works with data.
  • Text Analysis Using R, Alison Blaine. This course explains how to clean and analyze textual data using R, including both raw and structured texts. It will cover multiple hands-on approaches to getting data into R and applying analytical methods to it, with a focus on techniques from the fields of text mining and Natural Language Processing.
  • Using Linked Data, Jim Balhoff. Linked data technologies provide the means to create flexible, dynamic knowledge graphs using open standards. This course offers an introduction to linked data and the semantic web tools underlying its use (a brief illustrative sketch follows this list).
  • R for Automating Workflow & Sharing Work, Justin Post. The course introduces participants to using R to write reproducible reports and presentations that easily embed R output, to use online repositories and version control software for collaboration, to create basic websites with R, and to develop interactive dashboards and web applets.
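For a taste of the linked data topic, here is a minimal sketch using the rdflib Python library (a toolkit choice assumed here purely for illustration; the course does not prescribe one) that builds a tiny knowledge graph from Turtle-formatted triples and queries it with SPARQL:

```python
from rdflib import Graph

# A tiny, made-up knowledge graph expressed as Turtle triples.
turtle_data = """
@prefix ex: <http://example.org/> .
ex:DataMatters ex:sponsoredBy ex:RENCI .
ex:DataMatters ex:sponsoredBy ex:NCDS .
ex:RENCI ex:locatedIn ex:ChapelHill .
"""

g = Graph()
g.parse(data=turtle_data, format="turtle")

# SPARQL query: who sponsors Data Matters?
results = g.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?sponsor WHERE { ex:DataMatters ex:sponsoredBy ?sponsor . }
""")
for row in results:
    print(row.sponsor)
```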

Data Matters offers reduced pricing for faculty, students, and staff from academic institutions and for professionals with nonprofit organizations. Head to the Data Matters website to register and to see detailed course descriptions, course schedules, instructor bios, and logistical information. 

Registration is now open at datamatters.org. The deadline for registration is August 5 for Monday/Tuesday courses, August 7 for Wednesday courses, and August 8 for Thursday/Friday courses.


About the National Consortium for Data Science (NCDS)

The National Consortium for Data Science (NCDS) is a collaboration of leaders in academia, industry, and government formed to address the data challenges and opportunities of the 21st century. The NCDS helps members take advantage of data in ways that result in new jobs and transformative discoveries. The organization connects diverse communities of data science experts to support a 21st century data-driven economy by building data science career pathways and creating a data-literate workforce, bridging the gap between data scientists in the public and private sectors, and supporting open and democratized data. The NCDS is administered by founding member RENCI. Learn more at datascienceconsortium.org/.


RENCI joins researchers across the US in supporting NSF Major Facilities with data lifecycle management efforts through new NSF-funded Center of Excellence

When it comes to research, having a strong cyberinfrastructure that supports advanced data acquisition, storage, management, integration, mining, visualization, and computational processing services can be vital. However, building cyberinfrastructures (CI) — especially ones that aim to support multiple varied and complex scientific facilities — is a challenge.

In 2018, a team of researchers from institutions across the country came together to launch a pilot program aimed at creating a model for a Cyberinfrastructure Center of Excellence (CI CoE) for the National Science Foundation’s (NSF) Major Facilities. The goal was to identify how the center could serve as a forum for the exchange of CI knowledge across varying fields and facilities, establish best practices for different NSF Major Facilities’ CI, provide CI expertise, and address CI workforce development and sustainability.

“Over the past few years, my colleagues and I have worked to provide expertise and support for the NSF Major Facilities in a way that accelerates the data lifecycle and ensures the integrity and effectiveness of the cyberinfrastructure,” said Ewa Deelman, research professor of computer science and research director at the University of Southern California’s Information Sciences Institute and lead principal investigator. “We are proud to contribute to the overall NSF CI ecosystem and to work with the NSF Major Facilities on solving their CI challenges together, understanding that our work may help support the sustainability and progress of the Major Facilities’ ongoing research and discovery.”

Five NSF Major Facilities were selected for the pilot: the Arecibo Observatory, the Geodetic Facility for the Advancement of Geoscience, the National Center for Atmospheric Research, the National Ecological Observatory Network, and the Seismological Facilities for the Advancement of Geoscience and EarthScope. As the pilot progressed, the program expanded to engage additional NSF Major Facilities.

The pilot found that Major Facilities differ in the types of data captured, scientific instruments used, data processing and analyses conducted, and policies and methods for data sharing and use. However, the study also found commonalities among the various Major Facilities in terms of the data lifecycle (DLC). As a result, the pilot developed a DLC model that captures the stages that data within a Major Facility goes through. The model includes stages for 1) data capture; 2) initial processing near the instrument(s); 3) central processing at data centers or clouds; 4) data storage, curation, and archiving; and 5) data access, dissemination, and visualization. Identifying these commonalities helped the pilot program pinpoint shared challenges, standardize practices for establishing overarching CI requirements, and develop a blueprint for a CI CoE that can address the most pressing Major Facilities DLC challenges.
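As a purely illustrative aid (not part of the pilot’s deliverables), the five stages of the DLC model described above could be encoded as follows, for example to tag a facility’s datasets with their current lifecycle stage; the stage names are paraphrases of the model:

```python
from enum import Enum

class DLCStage(Enum):
    """Illustrative encoding of the pilot's five data lifecycle stages."""
    CAPTURE = 1                   # data capture at the instrument(s)
    INITIAL_PROCESSING = 2        # initial processing near the instrument(s)
    CENTRAL_PROCESSING = 3        # central processing at data centers or clouds
    STORAGE_CURATION_ARCHIVE = 4  # data storage, curation, and archiving
    ACCESS_AND_DISSEMINATION = 5  # data access, dissemination, and visualization

# Example: track where a hypothetical facility dataset sits in the lifecycle.
dataset_stages = {"field_campaign_2021_03": DLCStage.CENTRAL_PROCESSING}
print(dataset_stages["field_campaign_2021_03"].name)
```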

Now, with a new NSF award, the pilot program has begun phase two and become CI CoE: CI Compass, an NSF Center of Excellence dedicated to navigating the Major Facilities’ data lifecycle. CI Compass will apply the evaluation and analyses from its three-year pilot to enhance CI, as needed, across the NSF’s Major Facilities.

The research institutions collaborating on CI Compass include the University of Southern California, the Renaissance Computing Institute (RENCI) at the University of North Carolina at Chapel Hill, the University of Notre Dame, Indiana University, Texas Tech University, and the University of Utah.

RENCI will play a pivotal role in the success of CI Compass by leading working groups that offer expertise and services to NSF Major Facilities for processing, data movement, data storage, curation, and archiving elements of the Major Facilities DLC.   

“Cyberinfrastructure is a critical element for fulfilling the science missions for the NSF Major Facilities and a primary goal of CI Compass is to partner with Major Facilities to enhance and evolve their CI,” said Anirban Mandal, assistant director for network research and infrastructure at the Renaissance Computing Institute at University of North Carolina at Chapel Hill, and co-principal investigator and associate director of the project. “In the process, CI Compass will not only act as a ‘knowledge sharing’ hub for brokering connections between CI professionals at Major Facilities, but also will disseminate the knowledge to the broader NSF CI community.”

RENCI team members, in particular Ilya Baldin, who is also PI for the NSF FABRIC project, will offer expertise in networking and cloud computing for innovative Major Facilities CI architecture designs. Under Mandal’s leadership as associate director of CI Compass, RENCI will also be responsible for continuous internal evaluation of the project and measuring the impact of CI Compass on the Major Facilities and the broader CI ecosystem. Erik Scott will take a lead role in CI Compass working groups for data storage, curation, archiving and identity management, while Laura Christopherson will lead the efforts in project evaluation.


Working together, the CI Compass team will enhance the overall NSF CI ecosystem by providing expertise where needed to enhance and evolve the Major Facilities CI, capturing and disseminating CI knowledge and best practices that power scientific breakthroughs for Major Facilities, and brokering connections to enable knowledge sharing between and across Major Facilities CI professionals and the broader CI community. 

Visit ci-compass.org to learn more about the project.


This project is funded by the NSF Office of Advanced Cyberinfrastructure in the Directorate for Computer and Information Science and Engineering under grant number 2127548. The pilot effort was funded by CISE/OAC and the Division of Emerging Frontiers in the Directorate for Biological Sciences under grant number 1842042.

iRODS Consortium announces leadership transitions

The Renaissance Computing Institute (RENCI) – the founding member that administers the iRODS Consortium – announced today that Jason Coposky has officially resigned from his post as iRODS Consortium Executive Director effective June 11, 2021.

Coposky has been at RENCI for fifteen years and has served as the Executive Director of the Consortium for the last five and a half years and as Chief Technologist for five years before that. In these leadership roles, Coposky managed the software development team, directed the full software development lifecycle, and coordinated code hardening, testing, and application of formal software engineering practices. He also built and nurtured relationships with existing and potential consortium members and served as the chief spokesperson on iRODS development and strategies to the worldwide iRODS community. The Consortium has more than tripled in size under his leadership.  

In addition to growing the community, Coposky has been instrumental in turning the open source iRODS platform into enterprise software that is now deployed as a data management and data sharing solution at businesses, research centers, and government agencies in the U.S., Europe, and Asia. 

Terrell Russell, who has also been working on iRODS software since the development team transitioned to RENCI in 2008 and has held the role of Chief Technologist for the past five and a half years, has been named Interim Executive Director. 

For more information on iRODS and the iRODS Consortium, please visit irods.org.

Minnesota Supercomputing Institute joins iRODS Consortium

University of Minnesota institute aims to use iRODS to support HIPAA compliance

CHAPEL HILL, NC – The Minnesota Supercomputing Institute (MSI) of the University of Minnesota has become the newest member of the iRODS Consortium, the membership-based foundation that leads development and support of the integrated Rule-Oriented Data System (iRODS).

MSI provides state-of-the-art compute and storage solutions to accelerate scientific inquiry at the University of Minnesota and beyond. Its high-performance computing resources, specialized hardware, visualization tools, and dedicated consultants support data-intensive research in any area of science, engineering, and the humanities.

“Our users have large data sets so we are always interested in new ways to help users manage their data,” said Edward Munsell, system administrator at MSI. “After we started testing iRODS, we quickly saw how it could help us with our compliance concerns around our new HIPAA cluster. We will be using iRODS to help log user activity on files, ensure files are only accessed according to our policies, and to automate processing of data off of instruments.”

Looking ahead, the team plans to expand its use of iRODS to help users manage and share their data in MSI’s second tiered storage system. Munsell noted that the open source nature of iRODS makes it possible to customize the system for MSI’s specific needs by building in additional logging and permissions handling. In addition, the team plans to take advantage of iRODS’ capability to automatically extract and tag objects with metadata, helping MSI move closer to completely automating some user workflows.
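As a rough illustration of the kind of metadata tagging MSI describes, the sketch below uses the python-irodsclient library to attach an attribute-value pair to a stored object and list its metadata. The connection details and paths are placeholders, and in practice the automated extraction MSI envisions would typically run as server-side policy rather than a manual client call.

```python
from irods.session import iRODSSession

# Placeholder connection details; a real deployment would use its own
# host, credentials, zone, and paths.
with iRODSSession(host='irods.example.org', port=1247, user='researcher',
                  password='secret', zone='exampleZone') as session:
    obj = session.data_objects.get('/exampleZone/home/researcher/scan_001.tif')

    # Attach an attribute-value pair describing the source instrument.
    obj.metadata.add('instrument', 'confocal-microscope-3')

    # List all metadata currently attached to the object.
    for avu in obj.metadata.items():
        print(avu.name, avu.value, avu.units)
```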

The iRODS Consortium is a membership-based organization that guides development and support of iRODS as free open source software for data discovery, workflow automation, secure collaboration, and data virtualization. The iRODS Consortium provides a production-ready iRODS distribution and iRODS professional integration services, training, and support. The consortium is administered by founding member RENCI, a research institute for applications of cyberinfrastructure located at the University of North Carolina at Chapel Hill, USA.

“We are excited to welcome the University of Minnesota to the iRODS community,” said Jason Coposky, executive director of the iRODS Consortium. “The variety and complexity of their use cases requires a solution with significant flexibility, and we believe iRODS will be able to meet their current and future needs. UMN will be a valuable member of the community, and we look forward to a long relationship.”

“We decided to become a community member because we see the value that iRODS has and we want to help support the software,” said Munsell. “I have worked with many from the team while we were evaluating iRODS, and have been impressed with their responsiveness, knowledge, and willingness to help.”

In addition to UMN, current iRODS Consortium members include Agriculture Victoria, Bayer, Bibliothèque et Archives nationales du Québec, CINES, Cloudian, DataDirect Networks, KU Leuven, Maastricht University, the National Institute of Environmental Health Sciences, NetApp, OpenIO, RENCI, SoftIron, the SURF cooperative, SUSE, the Swedish National Infrastructure for Computing, Texas Advanced Computing Center, University College London, University of Colorado, Boulder, University of Groningen, Utrecht University, Wellcome Sanger Institute, Western Digital, and four organizations that wish to remain anonymous.

To learn more about iRODS and the iRODS Consortium, please visit irods.org.

To learn more about the Minnesota Supercomputing Institute, please visit https://msi.umn.edu.

What to expect at the 2021 iRODS User Group Meeting

iRODS users and consortium members will gather virtually from June 8-11 

The worldwide iRODS user community will connect online from June 8 – 11 for the 13th Annual iRODS User Group Meeting – four days of learning, sharing of use cases, and discussions of new capabilities that have been added to iRODS in the last year.

The virtual event, sponsored by Wellcome Sanger Institute, Globus, GRAU DATA, SoftIron, and RENCI, will be a collection of live talks and panels with Q&A. An audience of over 200 participants representing dozens of academic, government, and commercial institutions is expected to join.

“The 2021 iRODS User Group Meeting features an impressive list of presentations from our user community, including talks in fields such as healthcare, agriculture, and education,” says Jason Coposky, executive director of the iRODS Consortium. “The structure of our virtual event ensures that attendees get plenty of opportunities to network and collaborate throughout the week, while learning how users have utilized iRODS across the globe.”

Meeting attendees will learn about new updates such as the Python iRODS client, C++ REST API, and the Zone Management Tool, according to Coposky. On June 11, the last day of the meeting, the Consortium team will run an iRODS Troubleshooting session, where participants can receive one-on-one help with an existing or planned iRODS installation or integration.

The iRODS Consortium and RENCI are gearing up to release iRODS 4.2.9. A notable addition in the release is the introduction of logical locking, which provides additional replica status values within the catalog. Previously, replicas in iRODS could only be marked ‘good’ or ‘stale,’ which did not capture states in which data was in flight or incomplete. The new intermediate and locked states for iRODS replicas will be used to protect against uncoordinated writes into the system.
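For context, replica status is already visible to clients today; the sketch below uses python-irodsclient to print the status of each replica of an object. Connection details and paths are placeholders, and the numeric codes assumed for the new intermediate and locked states are illustrative guesses based on the description above rather than documented values.

```python
from irods.session import iRODSSession

# 'stale' and 'good' are the pre-4.2.9 states; the codes shown for the new
# intermediate and locked states are assumptions for illustration only.
STATUS_LABELS = {'0': 'stale', '1': 'good', '2': 'intermediate', '3': 'locked'}

with iRODSSession(host='irods.example.org', port=1247, user='researcher',
                  password='secret', zone='exampleZone') as session:
    obj = session.data_objects.get('/exampleZone/home/researcher/results.csv')
    for replica in obj.replicas:
        label = STATUS_LABELS.get(str(replica.status), 'unknown')
        print(f'replica {replica.number} on {replica.resource_name}: {label}')
```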

The iRODS NFSRODS 2.0.0 client was released last month. The NFSRODS client presents the iRODS virtual file system as NFSv4.1, which allows iRODS to be surfaced into any existing infrastructure with just a mount command, while still preserving the server-side policies enforced by the iRODS API. The new version provides significant performance improvements and caching, and resolves a few small bugs. 

During last year’s UGM, users learned about policy composition, which allowed them to configure multiple policies together without the need to touch the code. This feature is now ready and will be released with iRODS 4.2.9, along with a growing library of implemented policies that will eventually include support for indexing and publishing.

As always with the annual UGM, in addition to general software updates, users will offer presentations about their organizations’ deployments of iRODS. This year’s meeting will feature 30 talks from users around the world. Among the use cases and deployments to be featured are:

  • The Research Data Management System at the University of Groningen: architecture, solution engines, and challenges. University of Groningen. The University of Groningen has developed the RUG Research Data Management System, powered by iRODS, to store and share data and to support collaborative research projects. The system is built on open source solutions and provides access to the stored data via the command line, WebDAV, and a web-based graphical user interface. It offers key functionality and technical solutions such as metadata template management, data policies, data provenance, and auditing, and it draws on existing iRODS functionality and tools such as the iRODS audit plugin and the iRODS Python rule engine.
  • Frictionless Data for iRODS. Earlham Institute. The international wheat community has embraced the omics era and is producing larger and more heterogeneous datasets at a rapid pace in order to produce better varieties via breeding programs. These programs, especially in the pre-breeding space, have encouraged wheat communities to make these datasets available more openly. However, the consistent and standardized collection and dissemination of data based on rich metadata remains difficult, as much of this information is stored in papers and supplementary materials. In response, the Earlham Institute has built Grassroots, an infrastructure including portals to deliver large-scale datasets with semantically marked-up metadata to power FAIR data in crop research.
  • Go-iRODSClient, iRODS FUSE Lite, and iRODS CSI Driver: Accessing iRODS in Kubernetes. CyVerse / University of Arizona. As developers are increasingly adopting the cloud-native paradigm for application development, Kubernetes has become the dominant platform for orchestrating their cloud-native services. To facilitate iRODS access in Kubernetes, CyVerse has developed an iRODS Container Storage Interface (CSI) Driver, which provides on-demand data access using multiple connectivity modes to the iRODS server and exposes a file system interface to Kubernetes pods, thereby allowing cloud-native services to access iRODS without manually staging data within the containers. During this talk, the researchers will introduce the design and functionalities of the iRODS CSI Driver, as well as two sub-projects: Go-iRODSClient and iRODS FUSE Lite.
  • iRODS and NIEHS Environmental Health Science. NIEHS / NIH. NIEHS continues to leverage iRODS and has contributed to two important capabilities: indexing/pluggable search and pluggable publication. NIEHS will present its work on integrating search with the standard file and metadata indexing capability and describe how targeted search features can easily be added. It will also demonstrate how iRODS data collections and metadata can be published to the GEO repository, and show the ability to publish Data Repository Service bundles and serve them through a GA4GH-compliant interface. NIEHS will also discuss the NIH Gen3 platform and highlight opportunities and features of interest in the areas of rich metadata, metadata templates, and authorization and authentication via NIH Data Passport standards.

On Wednesday, the iRODS UGM will host a panel called “Storage Chargeback: Policy and Pricing,” featuring researchers from CyVerse, Wellcome Sanger Institute, and the iRODS Consortium discussing the opportunities, the costs, and the complexities involved in servicing customer requests to bring their storage into an existing managed software stack or environment.

Registration for the Virtual iRODS UGM will remain open throughout the week. See the registration page for details.


About the iRODS Consortium

The iRODS Consortium is a membership organization that supports the development of the integrated Rule-Oriented Data System (iRODS), free open source software for data virtualization, data discovery, workflow automation, and secure collaboration. The iRODS Consortium provides a production-ready iRODS distribution and iRODS training, professional integration services, and support. The world’s top researchers in life sciences, geosciences, and information management use iRODS to control their data. Learn more at irods.org.
The iRODS Consortium is administered by founding member RENCI, a research institute for applications of cyberinfrastructure at the University of North Carolina at Chapel Hill. For more information about RENCI, visit renci.org.

UNC-Chapel Hill and RTI International selected to provide data management, stewardship to NIH-funded researchers focused on the opioid and pain management public health crises

NIH HEAL Initiative Data Stewardship Group will support researchers in making their data FAIR (findable, accessible, interoperable and reusable)  

Healthcare data has become increasingly easy to create, collect and store over the last decade. However, the industry is still working toward next steps in unlocking the potential of that collected data: preparing the data in such a way that it can be found and accessed; breaking down storage silos while also maintaining patient privacy; and teaching researchers, policymakers, physicians and patients how to effectively analyze and make use of the wealth of data that can inform decisions and policy.  

The NIH Helping to End Addiction Long-term Initiative℠, or NIH HEAL Initiative℠, is an aggressive, transagency effort to speed scientific solutions to stem the national opioid public health crisis. Recognizing the need to capitalize on the data their researchers are gathering in support of this mission, the NIH HEAL Initiative is providing up to $21.4 million over five years to the Renaissance Computing Institute (RENCI) at the University of North Carolina at Chapel Hill and RTI International (RTI) to help researchers successfully and securely prepare and sustain data from more than 500 studies.

RENCI and RTI will work in partnership with the HEAL-funded team at the University of Chicago that is building a cloud-based platform to allow HEAL researchers, other investigators and advocates, health care providers, and policymakers to easily find NIH HEAL Initiative research results and data and use them to inform their own research, practice, policies and programs. 

According to Rebecca G. Baker, Ph.D., director of the NIH HEAL Initiative, data is the currency of lifesaving and evidence-based practice. 

“To be maximally useful, data must be findable to support new research and secondary analyses, as well as to guide education and policy about pain and addiction,” said Baker. “Preparing data to be easily discoverable can be a challenging and resource-intensive task. While most recognize the need to make data FAIR, not all research teams have the resources or expertise to do this. The RENCI/RTI group, in partnership with the Chicago team, will be available to HEAL-funded investigators to augment efforts where needed.”

Sharing HEAL-generated results and associated data as rapidly as possible will allow the broader community to ask and answer new research questions; conduct secondary analyses; and address fast-evolving challenges that surround pain management, opioid use and misuse, and overdose. NIH HEAL Initiative data are highly diverse and include imaging/microscopy, behavior, genomics, pharmacokinetics, and more. 

“Providing efficient and secure access for investigators to combine data from different studies should give us a much more accurate overall picture of how challenges around pain management and addiction can be addressed,” said Stan Ahalt, director of RENCI. “Given the urgency of HEAL’s mission, we are thankful to be able to provide expertise that can facilitate discovery of important elements hidden within the data.”  

To bring these hidden elements to light, the RENCI/RTI team will study existing NIH HEAL Initiative data efforts and collaborations and, through engagement with HEAL investigators, produce use cases and requirements for working across diverse data types.

“We will ensure that the ecosystem architecture is purpose-built and that the ecosystem team provides the on-hand expertise to address HEAL’s needs as the research evolves,” said Rebecca Boyles, director and senior scientist in the Research Computing Division at RTI International. 


About RENCI

The Renaissance Computing Institute (RENCI) develops and deploys advanced technologies to enable research discoveries and practical innovations. RENCI partners with researchers, government, and industry to engage and solve the problems that affect North Carolina, our nation, and the world. An institute of the University of North Carolina at Chapel Hill, RENCI was launched in 2004 as a collaboration involving UNC Chapel Hill, Duke University, and North Carolina State University.

About RTI International

RTI International is an independent, nonprofit research institute dedicated to improving the human condition. Clients rely on us to answer questions that demand an objective and multidisciplinary approach — one that integrates expertise across the social and laboratory sciences, engineering and international development. We believe in the promise of science, and we are inspired every day to deliver on that promise for the good of people, communities and businesses around the world. For more information, visit www.rti.org.

SoftIron® Joins the iRODS Consortium; certifies HyperDrive® Compatibility with iRODS Architecture

SoftIron Ltd., the leader in task-specific data center solutions, today announced that it has joined the Integrated Rule-Oriented Data System (iRODS) Consortium, which supports the development of free open source software for data discovery, workflow automation, secure collaboration, and data virtualization. In joining the consortium, whose data management platform is used globally by research, commercial and governmental organizations, SoftIron has certified that its open source, Ceph-based HyperDrive™ Storage Appliances are compatible with the iRODS Architecture.

“With the open-source nature of Ceph and its ‘Swiss Army Knife’ capabilities that combine file, block, and object storage within the same infrastructure, we think that SoftIron’s HyperDrive storage appliances are a perfect complement to organizations using iRODS, and who want to scale their storage in a supported, simplified, flexible way,” said Phil Straw, CEO of SoftIron. “And, we’re especially pleased to formalize our membership this week, to coincide with BioData World Congress.” The event hosts some of the world’s leading life science organizations – many of whom use iRODS as a key data management platform in pharmaceutical research – enabling collaboration in their pursuit to solve some of the world’s great challenges. Phil continues: “These organizations are already using open source iRODS to advance their mission critical research, so we’re excited to showcase what SoftIron and Ceph can do to provide them with performance, flexibility and scalability gains, as well as reducing their total cost of ownership.”

“SoftIron and its strong orientation to open source is a great addition to the iRODS ecosystem,” said Jason Coposky, Executive Director of the iRODS Consortium. “Ceph has been gaining traction with both vendors and end-user organizations engaged with iRODS. Welcoming SoftIron, which purpose-builds hardware to optimize every aspect of Ceph, as a member brings immense value to that ecosystem. We look forward to collaborating with SoftIron as we work together to bring added capability, and flexibility to the iRODS community.”

In order to give iRODS users and others in the life, bio, and pharmaceutical sciences a perspective on using open source Ceph as part of their operational foundation, SoftIron’s Andrew Moloney, VP of Strategy, will be presenting this week during the BioData World Congress. His presentation, titled “Redefining Software-Defined Storage – All the Performance, Without the Complexity,” will discuss some of the most important drivers of HPC storage growth, the operational challenges in storage infrastructure, and various infrastructural approaches for building software-defined storage architectures. Andrew’s talk will be available at 12:30 p.m. GMT on November 9, 2020. For more information, or a copy of the talk, please email info@softiron.com.

SoftIron® is the world leader in task-specific appliances for scale-out data center solutions. Their superior, purpose-built hardware is designed, developed and assembled in California, and they are the only manufacturer to offer auditable provenance. SoftIron’s HyperDrive® software-defined, enterprise storage portfolio runs at wire-speed and is custom-designed to optimize Ceph. HyperSwitch™ is their line of next-generation, top-of-rack switches built to maximize the performance and flexibility of SONiC. HyperCast™ is their high-density, concurrent 4K transcoding solution, for multi-screen, multi-format delivery. SoftIron unlocks greater business value for enterprises by delivering best-in-class products, free from software and hardware lock-in. For more information visit www.SoftIron.com.

The iRODS Consortium is administered by founding member RENCI, a research institute for applications of cyberinfrastructure located at the University of North Carolina at Chapel Hill. Current members of the iRODS Consortium, in addition to SoftIron, include RENCI, Bayer, the U.S. National Institute of Environmental Health Sciences, DataDirect Networks, Western Digital, the Wellcome Sanger Institute, Utrecht University, MSC, University College London, the Swedish National Infrastructure for Computing, University of Groningen, SURF, NetApp, Texas Advanced Computing Center (TACC), Cloudian, Maastricht University, University of Colorado, Boulder, SUSE, Agriculture Victoria, OpenIO, KU Leuven, the Bibliothèque et Archives nationales du Québec, CINES, and four additional anonymous members.

NSF announces $3 million award to expand FABRIC cyberinfrastructure globally

Advanced network offers platform to reimagine the Internet and speed scientific discovery

A new $3 million grant from the National Science Foundation (NSF) will expand FABRIC, a project to build the nation’s largest cyberinfrastructure testbed, to four preeminent scientific institutions in Asia and Europe. The expansion represents an ambitious effort to accelerate scientific discovery by creating the networks needed to move vast amounts of data across oceans and time zones seamlessly and securely.

Science is fast outgrowing the capabilities of today’s Internet infrastructure. Fully capitalizing on big data, artificial intelligence, advanced computation, and the Internet of Things requires robust, interconnected computers, storage, networks, and software. Uneven progress in science cyberinfrastructure has led to bottlenecks that stymie collaboration and slow the process of discovery.

FABRIC, launched in 2019 with a $20 million grant from NSF, is building a cyberinfrastructure platform where computer scientists can reimagine the Internet and test new ways to store, compute, and move data. With the new NSF award, a sister project called FABRIC Across Borders (FAB) will link FABRIC’s nationwide infrastructure to nodes in Japan, Switzerland, the U.K. and the Netherlands.

“FAB allows collaborative international science projects to experiment with ways to do their science more efficiently,” said FAB Principal Investigator Anita Nikolich, Director of Technology Innovation at the University of Illinois School of Information Sciences and Cyber Policy Fellow at the Harris School of Public Policy at University of Chicago. “Sending large quantities of data long distances—across borders and oceans—is complicated when your science depends on real-time processing so you don’t miss once in a lifetime events. Being able to put FABRIC nodes in physically distant places allows us to experiment with the infrastructure to support new capabilities and also bring disparate communities together.”

FAB will be led by the University of Illinois along with core team members from RENCI at the University of North Carolina at Chapel Hill; the University of Kentucky; the Department of Energy’s Energy Sciences Network (ESnet); Clemson University; and the University of Chicago. Over three years, the team will work with international partners to place FABRIC nodes at the University of Tokyo; CERN, the European Organization for Nuclear Research in Geneva, Switzerland; the University of Bristol in the U.K.; and the University of Amsterdam.

The project is driven by science needs in fields that are pushing the limits of what today’s Internet can support. As new scientific instruments are due to come online in the next few years—generating ever larger data sets and demanding ever more powerful computation—FAB gives researchers a testbed to explore and anticipate how all that data will be handled and shared among collaborators spanning continents.

“FAB will offer a rich set of network-resident capabilities to develop new models for data delivery from the Large Hadron Collider (LHC) at CERN to physicists worldwide,” said Rob Gardner, Deputy Dean for Computing and research professor in the Physical Sciences Division at the University of Chicago and member of FAB’s core team. “As we prepare for the high luminosity LHC, the FAB international testbed will provide a network R&D infrastructure we’ve never had before, allowing us to consider novel analysis systems that will propel discoveries at the high energy frontier of particle physics.”

“FABRIC will tremendously help the ATLAS experiment in prototyping and testing at scale some of the innovative ideas we have to meet the high throughput and big data challenges ATLAS will face during the high luminosity LHC era,” said ATLAS computing coordinators Alessandro Di Girolamo, a staff scientist in CERN’s IT department, and Zach Marshall, an ATLAS physicist from Lawrence Berkeley National Laboratory. “The ATLAS physics community will be excited to test new ways of doing analysis, better exploiting the distributed computing infrastructure we run all around the world.”

To ensure the project meets the needs of the scientists it aims to serve, FAB will be built around use cases led by scientific partners in five areas:

  • Physics (high energy physics use cases at CERN’s Large Hadron Collider)

  • Space (astronomy and cosmology use cases in the Legacy Survey of Space and Time and the Cosmic Microwave Background-Stage 4 project)

  • Smart cities (sensing and computing use cases to advance smart, connected communities for the NSF SAGE project and work at the University of Antwerp and the University of Bristol)

  • Weather (use cases to improve weather and climate prediction at the University of Miami and Brazil’s Center for Weather Forecast and Climatic Studies)

  • Computer science (use cases in private 5G networks at the University of Tokyo; censorship evasion at Clemson University; network competition and sharing at the University of Kentucky; and software-defined networking and P4 programming at South Korea’s national research and engineering network, KREONET)

FAB will connect with existing U.S. and international cyberinfrastructure testbeds and bring programmable networking hardware, storage, computers, and software into one interconnected system. All software associated with FAB will be open source and posted in a publicly available repository: https://github.com/fabric-testbed/.


Cloud Computing Testbed Chameleon Launches Third Phase with Focus on IoT and Reproducibility

$10 million NSF grant funds next four years of multi-institutional project

Since it launched in 2015, Chameleon has enabled systems and networking innovations by providing thousands of computer scientists with the bare metal access they need to conceptualize, assemble, and test new cloud computing approaches. 

Under a new four-year, $10 million grant from the National Science Foundation (NSF), the cloud computing testbed will further broaden its scope, adding new features for reproducibility, IoT and networking experimentation, and GPU computation to its core mission. This multi-institutional initiative is led by the University of Chicago (UChicago) in collaboration with the Renaissance Computing Institute (RENCI), Texas Advanced Computing Center (TACC), and Northwestern University.

“Chameleon is a scientific instrument for computer science systems research,” said Kate Keahey, senior computer scientist at Argonne National Laboratory and the Consortium for Advanced Science and Engineering (CASE) of the University of Chicago, and principal investigator of the Chameleon project. “Astronomers have telescopes, biologists have microscopes, and computer scientists have Chameleon.”

In its first five years, Chameleon has attracted more than 4,000 users from over 100 institutions, working on more than 500 different research and education projects. Scientists have used the testbed to study power management, operating systems, virtualization, high performance computing, distributed computing, networking, security, machine learning, and more. Educators have used Chameleon for cloud computing courses, allowing college and high school students to build their own cloud and learn the inner workings of the technology. 

The upcoming phase of Chameleon will further develop work already begun, such as the popular CHameleon Infrastructure (CHI), which provides enhanced capabilities on top of the open source OpenStack project.

The team will also broaden connections to other mission-specific testbeds, which will allow experimenters to implement core contributions of testbeds beyond Chameleon into their work. For example, Chameleon will expand capabilities for connecting IoT technologies by integrating with testbeds such as SAGE.

RENCI’s contributions to Chameleon in the third phase of funding will support this cross-testbed capability by further enabling experimentation with advanced programmable networking devices and accelerators. The RENCI team will also develop new options for software-defined networking that will allow compatibility with FABRIC, an “everywhere programmable” nationwide instrument now under development, with large amounts of compute and storage interconnected by high-speed, dedicated optical links.

“The planned additions to Chameleon will allow academic researchers to experiment with advanced programmable networks in a large-scale cloud environment,” said Paul Ruth, assistant director of network research and infrastructure at RENCI and co-PI on the Chameleon project. “We are excited to extend Chameleon’s cloud experiments into RENCI’s FABRIC testbed, which will facilitate larger, more diverse networking experiments.” 

Finally, the Chameleon team will also add expanded tools for reproducible research, and they will add new hardware and storage resources at the project’s two primary sites, UChicago and TACC, as well as at a supplemental Northwestern University site.

“Chameleon is a great example of how shared infrastructure with over 4,000 users can save the academic community time and money while catalyzing new research results,” said Deepankar Medhi, program director in the Computer & Information Sciences & Engineering Directorate (CISE) at the National Science Foundation. “NSF is pleased to fund Chameleon for four more years in order to extend the platform with new capabilities, thus allowing researchers to conduct new lines of research and students to learn newer technologies.”

To learn more about the testbed or begin experimenting on it today, visit chameleoncloud.org.

What to expect at the 2020 iRODS User Group Meeting

iRODS users and consortium members will gather virtually from June 9-12

The worldwide iRODS user community will connect online this week for the 12th Annual iRODS User Group Meeting – four days of learning, sharing of use cases, and discussions of new capabilities that have been added to iRODS in the last year.

The virtual event, sponsored by the University of Arizona, Cloudian, and RENCI, will be a collection of live talks with Q&A. An audience of nearly 300 participants representing dozens of academic, government, and commercial institutions is expected to join.

“The annual iRODS User Group Meeting has always opened our eyes to the impact of iRODS worldwide, and this year’s meeting will be no different,” says Jason Coposky, executive director of the iRODS Consortium. “Although we are moving to a virtual platform, we intend to provide a similar experience to years past by ensuring there are plenty of opportunities for networking, discussion, and collaboration.”

Meeting attendees will learn about new updates such as hard links, direct streaming, and policy composition, according to Coposky. On June 12, the last day of the meeting, the Consortium team will run an iRODS Troubleshooting session, where participants can receive one-on-one help with an existing or planned iRODS installation or integration.

Last month, the iRODS Consortium and RENCI announced the release of iRODS 4.2.8. A notable addition within the release was a new C++ rule engine plugin that gives an iRODS system the ability to convey hard links to its users. An iRODS system stores a hard link when replicas of two different iRODS data objects with different logical paths share a common physical path on the same host. When this occurs, metadata is added to both logical data objects for bookkeeping.
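As a rough client-side illustration of what a hard link means in catalog terms, the sketch below compares the physical paths recorded for the replicas of two data objects with different logical paths; if they share a physical path, they behave as a hard link. The paths and connection details are placeholders, and the actual bookkeeping is performed by the new rule engine plugin rather than by client code.

```python
from irods.session import iRODSSession

# Placeholder logical paths for two data objects suspected of sharing storage.
PATH_A = '/exampleZone/home/researcher/raw/sample.fastq'
PATH_B = '/exampleZone/home/researcher/shared/sample_link.fastq'

with iRODSSession(host='irods.example.org', port=1247, user='researcher',
                  password='secret', zone='exampleZone') as session:
    obj_a = session.data_objects.get(PATH_A)
    obj_b = session.data_objects.get(PATH_B)

    # Physical (vault) paths recorded in the catalog for each object's replicas.
    physical_a = {replica.path for replica in obj_a.replicas}
    physical_b = {replica.path for replica in obj_b.replicas}

    shared = physical_a & physical_b
    print('hard-linked via' if shared else 'no shared physical path', shared or '')
```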

This year’s update on the iRODS S3 plugin covers the design and engineering underway to provide direct streaming into and out of S3-compatible storage. The rewrite uses the new iRODS IOStreams library and in-memory buffering to make efficient multipart transfers.

With the addition of a continuation code to the rule engine plugin framework, iRODS users are now able to configure multiple policies to be invoked for any given policy enforcement point. Policy developers can now separate the policy enforcement points from the policy itself. With this approach, multiple policies can be configured together, or composed, without the need to touch the code.
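A minimal sketch of how such composition is typically configured, assuming the server_config.json layout with an ordered rule_engines list (rendered here as a Python literal for readability); the first two policy plugin names are hypothetical placeholders, and only the native rule language entry reflects the stock distribution. The framework consults the entries in order, and a policy that returns the framework’s continuation code allows the next configured plugin to handle the same policy enforcement point.

```python
# Illustrative only: ordered rule engine plugin entries, in the shape used by
# the plugin_configuration section of server_config.json. The first two
# plugin names are hypothetical; the last is the stock rule language plugin,
# with its plugin_specific_configuration trimmed for brevity.
rule_engines = [
    {
        "instance_name": "irods_rule_engine_plugin-audit_policy-instance",        # hypothetical
        "plugin_name": "irods_rule_engine_plugin-audit_policy",                   # hypothetical
        "plugin_specific_configuration": {},
    },
    {
        "instance_name": "irods_rule_engine_plugin-replication_policy-instance",  # hypothetical
        "plugin_name": "irods_rule_engine_plugin-replication_policy",             # hypothetical
        "plugin_specific_configuration": {},
    },
    {
        "instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
        "plugin_name": "irods_rule_engine_plugin-irods_rule_language",
        "plugin_specific_configuration": {},
    },
]

# Each entry is consulted in order for a given policy enforcement point; a
# policy that signals continuation hands control to the next entry in the list.
for entry in rule_engines:
    print(entry["instance_name"])
```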

As always with the annual UGM, in addition to general software updates, users will offer presentations about their organizations’ deployments of iRODS. This year’s meeting will feature 23 talks from users around the world. Among the use cases and deployments to be featured are:

  • SmartFarm data management, Agriculture Victoria. Data management challenges increase with the large datasets generated by new sensing technologies, requiring standardised, automated, online, authenticated, and verifiable processes for uploading data for storage and analytics on computing facilities. Agriculture Victoria undertakes research and development in animal and plant production, chemistry, spatial information, and soil and water science. Working with iRODS, Agriculture Victoria is piloting new data management workflows for ‘SmartFarm’ data, and this talk will discuss lessons from small, medium, and high-volume Agriculture SmartFarm use cases using edge computing and collaborative data infrastructure, as well as the flow-on development of capability for AVR researchers.
  • Data management in autonomous driving projects, Aptiv. Aptiv is a global technology company that develops safer, greener, and more connected solutions that enable the future of mobility. The company deployed iRODS in production about a year and a half ago, at the start of the development phase of a large autonomous driving project. The researchers will share how iRODS has assisted in tracking and migrating data between partners and within engineering groups responsible for data collection and for manual and automated analysis.
  • Building a national Research Data Management (RDM) infrastructure with iRODS in the Netherlands, SURF. In the Netherlands, many universities are looking at iRODS to support their researchers, as they recognize the powerful potential of the tool in two areas: support for secure cooperation, and support over the entire research data life cycle. SURF, a national organization providing IT support and infrastructure for universities, is now working closely with six universities to build a national RDM infrastructure based on iRODS. Researchers from SURF will share a case study of using iRODS not for a specific research group but for an entire nation, enhancing support for researchers by working together on this iRODS-based infrastructure.
  • Keeping pace with science: The CyVerse Data Store in 2020 and the Future, CyVerse / University of Arizona. CyVerse, hosted at the University of Arizona, provides a national cyberinfrastructure for life science research as well as training scientists in using high-performance computing resources. This talk will describe the current features of the CyVerse Data Store and plans for its evolution. Since its inception in 2010, the Data Store has leveraged the power and versatility of iRODS by continually extending the functionality of CyVerse’s cyberinfrastructure. These features include project-specific storage, offsite replication, third-party service and application integrations, several data access methods, event stream publishing for indexing, and optimizations for accessing large sets of small files.

Registration for the Virtual iRODS UGM will remain open throughout the week. See the registration page for details.


About the iRODS Consortium

The iRODS Consortium is a membership organization that supports the development of the integrated Rule-Oriented Data System (iRODS), free open source software for data virtualization, data discovery, workflow automation, and secure collaboration. The iRODS Consortium provides a production-ready iRODS distribution and iRODS training, professional integration services, and support. The world’s top researchers in life sciences, geosciences, and information management use iRODS to control their data. Learn more at irods.org.

The iRODS Consortium is administered by founding member RENCI, a research institute for applications of cyberinfrastructure at the University of North Carolina at Chapel Hill. For more information about RENCI, visit renci.org.