Strong cyberinfrastructure (CI) that supports advanced data acquisition, storage, management, integration, mining, visualization, and computational processing services can be vital to research. However, building CI, especially CI that aims to support multiple varied and complex scientific facilities, is a challenge.
In 2018, a team of researchers from institutions across the country came together to launch a pilot program aimed at creating a model for a Cyberinfrastructure Center of Excellence (CI CoE) for the National Science Foundation’s (NSF) Major Facilities. The goal was to identify how the center could serve as a forum for the exchange of CI knowledge across varying fields and facilities, establish best practices for different NSF Major Facilities’ CI, provide CI expertise, and address CI workforce development and sustainability.
“Over the past few years, my colleagues and I have worked to provide expertise and support for the NSF Major Facilities in a way that accelerates the data lifecycle and ensures the integrity and effectiveness of the cyberinfrastructure,” said Ewa Deelman, research professor of computer science and research director at the University of Southern California’s Information Sciences Institute and lead principal investigator. “We are proud to contribute to the overall NSF CI ecosystem and to work with the NSF Major Facilities on solving their CI challenges together, understanding that our work may help support the sustainability and progress of the Major Facilities’ ongoing research and discovery.”
Five NSF Major Facilities were selected for the pilot: the Arecibo Observatory, the Geodetic Facility for the Advancement of Geoscience, the National Center for Atmospheric Research, the National Ecological Observatory Network, and the Seismological Facilities for the Advancement of Geoscience and EarthScope. As the pilot progressed, the program expanded to engage additional NSF Major Facilities.
The pilot found that Major Facilities differ in types of data captured, scientific instruments used, data processing and analyses conducted, and policies and methods for data sharing and use. However, the study also found that the various Major Facilities share commonalities in the data lifecycle (DLC). As a result, the pilot developed a DLC model that captures the stages that data within a Major Facility goes through. The model includes stages for 1) data capture; 2) initial processing near the instrument(s); 3) central processing at data centers or clouds; 4) data storage, curation, and archiving; and 5) data access, dissemination, and visualization. Finding these commonalities helped the pilot program identify common challenges, standardize practices for establishing overarching CI requirements, and develop a blueprint for a CI CoE that can address the pressing Major Facilities DLC challenges.
Now, with a new NSF award, the pilot program has begun phase two and become CI Compass, an NSF Center of Excellence dedicated to navigating the Major Facilities’ data lifecycle. CI Compass will apply its three years of initial evaluation and analysis to improve CI as needed across the NSF’s Major Facilities.
The research institutions collaborating on CI Compass include the University of Southern California, the Renaissance Computing Institute (RENCI) at the University of North Carolina at Chapel Hill, the University of Notre Dame, Indiana University, Texas Tech University, and the University of Utah.
RENCI will play a pivotal role in the success of CI Compass by leading working groups that offer expertise and services to NSF Major Facilities for the processing, data movement, data storage, curation, and archiving elements of the Major Facilities DLC.
“Cyberinfrastructure is a critical element for fulfilling the science missions for the NSF Major Facilities and a primary goal of CI Compass is to partner with Major Facilities to enhance and evolve their CI,” said Anirban Mandal, assistant director for network research and infrastructure at RENCI and co-principal investigator and associate director of the project. “In the process, CI Compass will not only act as a ‘knowledge sharing’ hub for brokering connections between CI professionals at Major Facilities, but also will disseminate the knowledge to the broader NSF CI community.”
RENCI team members, in particular Ilya Baldin, who is also principal investigator for the NSF FABRIC project, will offer expertise in networking and cloud computing for innovative Major Facilities CI architecture designs. Under Mandal’s leadership as associate director of CI Compass, RENCI will also be responsible for continuous internal evaluation of the project and for measuring the impact of CI Compass on the Major Facilities and the broader CI ecosystem. Erik Scott will take a lead role in CI Compass working groups for data storage, curation, archiving, and identity management, while Laura Christopherson will lead the project evaluation efforts.
Working together, the CI Compass team will enhance the overall NSF CI ecosystem by providing expertise where needed to enhance and evolve the Major Facilities CI, capturing and disseminating CI knowledge and best practices that power scientific breakthroughs for Major Facilities, and brokering connections to enable knowledge sharing between and across Major Facilities CI professionals and the broader CI community.
Visit ci-compass.org to learn more about the project.
This project is funded by the NSF Office of Advanced Cyberinfrastructure (OAC) in the Directorate for Computer and Information Science and Engineering (CISE) under grant number 2127548. The pilot effort was funded by CISE/OAC and the Division of Emerging Frontiers in the Directorate for Biological Sciences under grant number 1842042.