CHAPEL HILL, NC – RENCI will work with Indiana University and the Information Sciences Institute (ISI) at the University of Southern California on a new project to strengthen the integrity of data, giving researchers added assurance and trust in computational science.
The three-year project, Scientific Workflow Integrity with Pegasus, is funded by a $1 million grant from the National Science Foundation (NSF) as part of its Cybersecurity Innovation for Cyberinfrastructure (CICI) program. Von Welch, director of IU’s Center for Applied Cybersecurity Research (CACR), is the project’s principal investigator.
RENCI will receive $230,000 to support efforts that add information to data provenance, the metadata-based tracking of the location and ownership of data. The new information will describe every piece of infrastructure that processes or otherwise touches the data. It can then be used for purposes such as tracing problems with the data back to faulty hardware or performing forensic analysis on compromised data. RENCI will also help implement an improved process for selecting the best infrastructure for workflow analyses based on data location, data ownership and other attributes.
IU, the lead institution on the NSF grant, will receive $479,855 to strengthen cybersecurity in the Pegasus Workflow Management System. Pegasus is popular among the research community for its ability to easily structure and execute large-scale data analyses. It supports a wide range of scientific projects, including LIGO (the Laser Interferometer Gravitational-Wave Observatory), which earlier this year announced the first direct detection of gravitational waves, confirming a prediction of Einstein’s general theory of relativity.
The improvements will digitally sign the data that runs through Pegasus, helping to ensure that results from multiple workflows are consistent. They will also let users see whether their data has changed since a workflow last completed.
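As a rough illustration of the idea, the Python sketch below tags a piece of data with a keyed digest and later checks whether the data has changed. It is not Pegasus code: the function names `sign_data` and `verify_data` and the sample key are hypothetical, and HMAC-SHA256 from the standard library stands in for a true public-key digital signature, which would need a third-party library.

```python
import hashlib
import hmac


def sign_data(data: bytes, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag for data (a stand-in for a digital signature)."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_data(data: bytes, key: bytes, tag: str) -> bool:
    """Return True if data still matches the tag recorded earlier."""
    expected = sign_data(data, key)
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(expected, tag)


if __name__ == "__main__":
    key = b"demo-key-not-for-production"  # hypothetical key, for illustration only
    original = b"strain,time\n0.01,0\n0.02,1\n"
    tag = sign_data(original, key)
    print(verify_data(original, key, tag))                 # data unchanged
    print(verify_data(original + b"0.03,2\n", key, tag))   # data modified
```

A workflow system could record such a tag when a job finishes and recompute it before the next run, flagging any mismatch as a possible integrity problem.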
“Scientific data is a key part of scientific workflows and, ultimately, the science project,” said Welch. “By integrating support for data integrity into the popular workflow management tool Pegasus, we increase our trust in computational science in a manner that will be easy for scientists to use.”
Scientists from a variety of disciplines, including astronomy, bioinformatics, earthquake science, gravitational wave physics, ocean science and neuroscience, have used Pegasus to run more than 700,000 workflows in the last three years. The research team nonetheless aims for solutions generic enough to apply to other workflow systems and applications, helping an even broader range of researchers.
“This is the kind of work that could have impacts well beyond the life of the grant and far beyond the $1 million in funding,” said Ilya Baldin, director of RENCI’s network research and infrastructure group and co-principal investigator on the project. “Scientists from many disciplines use Pegasus and it is critical for them to feel confident that the data they run through Pegasus workflows will be safe and uncompromised. Our team at RENCI looks forward to working with IU and ISI on this important data integrity problem.”
One of the challenges of the new project will be to make sure that the cryptography used to ensure data integrity, such as digital signatures, scales to handle increasingly large scientific datasets. Steven Myers, an expert in cryptography in IU’s School of Informatics and Computing, will guide the selection, implementation and deployment of the cryptographic systems, making sure they are efficient and likely to remain secure over the long periods during which scientific data is referenced and used.
“Cryptography can provide strong assurances of data integrity and records of its origin and modifications over the long periods of time that much scientific data is used and must be maintained,” said Myers. “Given the experimental costs of some of this data, having strong assurances is critical, as some groups have definite motive to modify the data, and the experiments are incredibly costly to reproduce if the data’s integrity is questioned.”
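One common way to keep integrity checks practical for large datasets is to hash the data incrementally in fixed-size chunks, so memory use stays constant regardless of dataset size. The sketch below assumes SHA-256 and a 1 MiB chunk size, and the function name `digest_stream` is hypothetical; which algorithms the project will select has not been announced.

```python
import hashlib
import io
from typing import BinaryIO


def digest_stream(stream: BinaryIO, chunk_size: int = 1024 * 1024) -> str:
    """Hash a binary stream chunk by chunk and return the hex SHA-256 digest."""
    hasher = hashlib.sha256()
    # iter() with a sentinel keeps reading until read() returns b"" at end of stream
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        hasher.update(chunk)
    return hasher.hexdigest()


if __name__ == "__main__":
    payload = b"x" * (3 * 1024 * 1024 + 17)  # stand-in for a large dataset
    streamed = digest_stream(io.BytesIO(payload))
    # The chunked digest matches hashing the whole payload at once
    print(streamed == hashlib.sha256(payload).hexdigest())
```

For a file on disk, the same function can be called with a handle opened in binary mode, for example `open(path, "rb")`.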
The award began Sept. 1.