Scientific Data Analysis at Scale (SciDAS)

Overview

Many tools and resources exist to help scientists conduct research efficiently, including high performance computing in the cloud, data management and scientific workflow systems, and tools that enable collaboration and data sharing. However, scientists need to seek out these different tools and platforms, and national cyberinfrastructure does not currently operate as a seamless, integrated system.

SciDAS researchers are developing a more fluid and flexible cyberinfrastructure for working with and analyzing large-scale data. It will combine new and existing software into a system that is efficient, practical, and user-friendly. The SciDAS system will help researchers discover data, move it smoothly across advanced networks, and access national and international scientific resources more flexibly.

Domain scientists will be heavily involved in building the SciDAS system, providing real data sets and use cases that will help to refine the system. The first scientific use case will involve building gene co-expression networks for systems biology.

RENCI’s role

RENCI will lead the effort to integrate existing cyber tools and technologies into the new SciDAS infrastructure, which will be designed to support all aspects of distributed, data-driven research. Development of the SciDAS framework will involve integrating a number of NSF-funded CI systems (most of them developed at RENCI) into one package, including:

  • NSF CC-IIE RADII (Resource Aware Data-centric collaborative Infrastructure), an effort to couple data management (iRODS) and resource management (the ORCA control framework) from the ground up. Its tools and approaches allow scientists to easily map collaborative data-driven activities onto dynamically configurable cloud infrastructures.
  • iRODS: the Integrated Rule-Oriented Data System, which federates distributed and heterogeneous data into a single virtual file system for easier, safer data sharing and data management.
  • NSF SSI HydroShare, an open-source collaborative system for sharing hydrologic data and models. HydroShare enables scientists to easily discover and access data and models in the cloud or retrieve them to their desktops.
  • NSF CC-NIE ADAMANT (Adaptive Data-Aware Multi-Domain Application Network Topologies), which integrates the Pegasus workflow management system and the ORCA resource control framework. It leverages ExoGENI as well as national research and education networks to create elastic, isolated environments to execute complex distributed tasks.
  • NSF CICI SAFE, a project working to securely automate and monitor the creation of virtual super-facilities that link scientists to multiple resources. CICI-SAFE automates the authorization and security monitoring needed to keep these very fast and dynamic network links safe.

Project Team

Alex Feltus, Clemson University (Lead PI)
Claris Castillo (Lead RENCI PI)
Ray Idaszak (RENCI co-PI)
Melissa Smith, Clemson University
Stephen Ficklin, Washington State University

Partners
Clemson University
Washington State University

Funding
National Science Foundation, Office of Advanced Cyberinfrastructure, Grant No. 1659300

Related Links
RENCI News Release
Clemson News Release
SciDAS Video (produced by Clemson University)