DENVER, Colorado – Presentations about key RENCI projects, including iRODS and the iRODS Consortium, ExoGENI and advanced networking, the National Consortium for Data Science (NCDS), and the South Big Data Hub (SBDH), will be featured in the RENCI exhibit at SC17. We will also offer information and presentations on new initiatives, such as SciDAS (Scientific Data Analysis at Scale) and {x}DCI (cross-disciplinary data science cyberinfrastructure).
SC17 is the world’s premier conference for high performance computing, networking, storage, and analysis, attracting more than 10,000 participants each year. The RENCI booth (#437) will open at 7 p.m. Monday, Nov. 13 for a pre-conference gala and sneak peek at booth offerings. In addition to the Monday night opening gala, exhibit hours for the conference will be 10 a.m. – 6 p.m. Tuesday, Nov. 14 and Wednesday, Nov. 15, and 10 a.m. – 3 p.m. Thursday, Nov. 16.
Monday, Nov. 13
7 p.m. – 9 p.m.
Welcome SC17 attendees! The RENCI booth tonight will feature information on:
- Advanced networking
- The iRODS data management platform and the iRODS Consortium
- The National Consortium for Data Science (NCDS)
- The South Big Data Hub
Tuesday, Nov. 14
Data Management Design Patterns
iRODS Software and Consortium Update
iRODS Overview
Panorama 360: Performance Data Capture and Analysis for End-to-end Scientific Workflows
iRODS Software and Consortium Update
xDCI: Accelerating Discovery for Scientific Communities
Automating the Data Workflow and Distribution in Research Computing
iRODS Demonstrations
10:00 a.m. – 10:30 a.m.
Data Management Design Patterns
Presenter: Terrell Russell, iRODS Consortium | At the Western Digital/HGST Theater, booth #643
Description: The community of iRODS users has taught us about a variety of common use cases that tightly orbit around large data volumes, annotation, automation, auditing, and compliance issues.
Surveying diverse implementations is how mature domains refine their complex technologies into a common language. Design patterns are technology agnostic, structured descriptions of how a specific goal may be achieved.
This talk will describe our discovery process and share some Data Management Design Patterns as best open community practice. These design patterns can be used to clearly communicate organizational policy about your data lifecycle, both to others within an organization and to external interested parties.
********
10:30 a.m. – 11:30 a.m.
iRODS Software and Consortium Update
Presenters: Jason Coposky and Terrell Russell, iRODS Consortium
Description: This talk will discuss the history, the present, and the future of iRODS development – with a focus on the newest 4.2 software features. The iRODS team will also discuss the state of the iRODS Consortium, the organization that supports continued development of iRODS data management software as free open source software.
********
11:30 a.m. – 12:30 p.m.
iRODS Overview
Presenters: Jason Coposky and Terrell Russell, iRODS Consortium
Description: This talk will present an overview of iRODS, including its functions, architecture, use cases, and future directions. iRODS is open source data grid middleware that consolidates the management of heterogeneous data storage technologies. Equipped with configurable automation and metadata cataloging capabilities, iRODS manages over 100 PB of data worldwide. Example use cases include tracking gene sequencing workflows at several of the world’s preeminent research institutes and streaming terabytes of production video footage across the globe.
********
12:30 p.m. – 1:30 p.m.
Panorama 360: Performance Data Capture and Analysis for End-to-end Scientific Workflows
Presenters: Anirban Mandal and Cong Wang, RENCI; Ewa Deelman and Rafael Ferreira da Silva, USC/ISI; Mariam Kiran, LBNL; Vickie Lynch, ORNL
Description: We will present the DOE Panorama 360 project, which aims to provide a resource for the collection, analysis, and sharing of performance data about end-to-end scientific workflows executing on DOE facilities. We are developing a repository and associated capabilities for data collection, ingestion, and analysis for a broad class of DOE applications that span experimental and simulation science workflows.
In particular, our work focuses on workflows that include experimental data generation at DOE facilities, and it includes: 1) a distributed repository for different types of data (point measurements, time series, traces); 2) a set of data capture, curation, and publishing tools that can be used to interact with the repository; 3) a set of analysis, algorithms, and machine learning-based tools to analyze and characterize the data; and 4) best practices and recommendations for workflow evaluation, analysis, execution, and architectures.
********
1:30 p.m. – 2:30 p.m.
iRODS Software and Consortium Update
Presenters: Jason Coposky and Terrell Russell, iRODS Consortium
Description: This talk will discuss the history, the present, and the future of iRODS development – with a focus on the newest 4.2 software features. The iRODS team will also discuss the state of the iRODS Consortium, the organization that supports continued development of iRODS data management software as free open source software.
********
2:30 p.m. – 3:30 p.m.
xDCI: Accelerating Discovery for Scientific Communities
Presenter: Ray Idaszak, RENCI
Description: xDCI – or {cross disciplinary} Data Cyberinfrastructure – is a complete and customizable solution for scientific communities looking to use data cyberinfrastructure to kick start their research and accelerate the discovery process. Unlike typical software stacks, xDCI enables researchers who are new to using data science cyberinfrastructure to start from “ground zero” and quickly ramp up to form vibrant, connected research communities. This presentation will offer an overview of the xDCI software stack as well as tips on how to get started with the platform.
********
3:30 p.m. – 4:30 p.m.
Automating the Data Workflow and Distribution in Research Computing
Presenter: Dave Fellinger, iRODS Consortium
Description: A growing number of applications in high performance research computing require large scale data reductions or comparisons. Instruments such as the Large Hadron Collider and, more recently, the Square Kilometre Array produce huge amounts of sensor data, which must be analyzed on a parallel compute cluster. We will discuss the use of iRODS as a middleware tool that can automate the complete data migration process: from the external file system, to the parallel “scratch” file system adjacent to the compute cluster, and finally to a distributed file system providing a discoverable and searchable research archive. All of these operations are coordinated by the iRODS interface to the compute cluster machine scheduler, maximizing the efficiency of the entire workflow and achieving the ultimate goal of providing timely data to researchers.
********
4:30 p.m. – 5:30 p.m.
iRODS Demonstrations
Presenters: Jason Coposky and Terrell Russell, iRODS Consortium
Description: Using material from recently presented workshops, we will demonstrate several key features that have made iRODS a critical technology for research organizations worldwide. We will show how storage resource composition makes it easy to distribute and replicate data across multiple file systems; how the iRODS rule engine can automate searchable metadata annotation; and how federation allows users to access data at remote sites through a common interface.
Wednesday, Nov. 15
Data Management Design Patterns
SAFE Superfacilities using ExoGENI, Chameleon, and ScienceDMZs
SWIP: Scientific Workflow Integrity with Pegasus
Data Management Design Patterns
iRODS Overview
Status and Demo of Kanki, an Open Source Cross-platform Native iRODS Client Application
Big Data, Analytics, and iRODS
SciDAS
Object Storage for iRODS
xDCI: Accelerating Discovery for Scientific Communities
The South Big Data Regional Innovation Hub
10:30 a.m. – 11:00 a.m.
Data Management Design Patterns
Presenter: Terrell Russell, iRODS Consortium | At the Western Digital/HGST Theater, booth #643
Description: The community of iRODS users has taught us about a variety of common use cases that tightly orbit around large data volumes, annotation, automation, auditing, and compliance issues.
Surveying diverse implementations is how mature domains refine their complex technologies into a common language. Design patterns are technology agnostic, structured descriptions of how a specific goal may be achieved.
This talk will describe our discovery process and share some Data Management Design Patterns as best open community practice. These design patterns can be used to clearly communicate organizational policy about your data lifecycle, both to others within an organization and to external interested parties.
********
10:30 a.m. – 11:30 a.m.
SAFE Superfacilities using ExoGENI, Chameleon, and ScienceDMZs
Presenters: Paul Ruth, Mert Cevik, and Cong Wang, RENCI; Yuanjun Yao, Qiang Cao, and Jeff Chase, Duke
Description: Currently, superfacilities are purpose-built manually for a specific scientific application or community. However, recent advances in Science DMZs and federated Infrastructure-as-a-Service (IaaS), as in the NSF testbeds GENI, Chameleon, and CloudLab, provide the technical building blocks to construct dynamic superfacilities on demand. Compute and storage resources can be provisioned from any testbed and connected with dynamic L2 circuits (e.g., as in GENI slice dataplanes).
In this demo, we use SAFE as a security building block for a virtual software-defined exchange (vSDX) that provides transit services for customer slices on the ExoGENI and Chameleon cloud testbeds. The customers are tenants that deploy their own slices, each with an independent network forwarding policy expressed in SAFE. The tenant slices express policy using SAFE in order to create network flows between each other.
********
11:30 a.m. – 12:30 p.m.
SWIP: Scientific Workflow Integrity with Pegasus
Presenters: Ilya Baldin and Anirban Mandal, RENCI; Ewa Deelman, Karan Vahi, and Mats Rynge, USC/ISI; Von Welch, Randy Heiland, Steve Myers and Omkar Bhide, IU
Description: We will present the Scientific Workflow Integrity with Pegasus (SWIP) project, which aims to enable more trustworthy science by adding cryptographic data integrity checking and provenance information to the scientific workflow. The project includes enhancements to the Pegasus data management layer, auditing of those enhancements by security researchers, and testing of the new capabilities in an ExoGENI testbed sandbox capable of introducing data integrity problems at the infrastructure level. We will demonstrate how data integrity errors introduced by the novel “Chaos Jungle” software deployed on the ExoGENI infrastructure are detected by the new workflow integrity checking enhancements in Pegasus.
********
11:30 a.m. – 11:45 a.m.
Data Management Design Patterns
Presenter: Terrell Russell, iRODS Consortium | At the DDN Theater, booth #1325
Description: The community of iRODS users has taught us about a variety of common use cases that tightly orbit around large data volumes, annotation, automation, auditing, and compliance issues.
Surveying diverse implementations is how mature domains refine their complex technologies into a common language. Design patterns are technology agnostic, structured descriptions of how a specific goal may be achieved.
This talk will describe our discovery process and share some Data Management Design Patterns as best open community practice. These design patterns can be used to clearly communicate organizational policy about your data lifecycle, both to others within an organization and to external interested parties.
********
12:30 p.m. – 1:00 p.m.
iRODS Overview
Presenters: Jason Coposky and Terrell Russell, iRODS Consortium
Description: This talk will present an overview of iRODS, including its functions, architecture, use cases, and future directions. iRODS is open source data grid middleware that consolidates the management of heterogeneous data storage technologies. Equipped with configurable automation and metadata cataloging capabilities, iRODS manages over 100 PB of data worldwide. Example use cases include tracking gene sequencing workflows at several of the world’s preeminent research institutes and streaming terabytes of production video footage across the globe.
********
1:00 p.m. – 1:30 p.m.
Status and Demo of Kanki, an Open Source Cross-platform Native iRODS Client Application
Presenter: Ilari Korhonen, KTH Royal Institute of Technology
********
1:30 p.m. – 2:30 p.m.
Big Data, Analytics, and iRODS
Presenter: David Wade, IERA
Description: Starting from the conceptual model of a seemingly uncomplicated device, this talk will describe how a determined effort to answer one seemingly trivial question about the device’s behavior causes the data derived from analyzing that behavior to grow, unexpectedly (and unexceptionally), out of bounds in both quantity and complexity. It will then present the design of a system and show how iRODS would function within it to aid and abet the collection, analysis, and storage of such data.
********
2:30 p.m. – 3:30 p.m.
SciDAS
Presenters: Claris Castillo and Ray Idaszak, RENCI; Alex Feltus, Clemson University
Description: The NSF CC*Data Scientific Data Analysis at Scale (SciDAS) project (Award # 1659300) is a cyberinfrastructure- and cloud-agnostic software infrastructure whose main objective is to improve the flexibility and accessibility of the broad ecosystem of national CI and cloud resources, helping researchers use them more effectively and improve scientific productivity.
SciDAS enables fluid access to multiple national CI resources, including NSF Cloud, Open Science Grid, Cloud providers (AWS and Azure), and campus resources via novel open-source based middleware technologies. Central to SciDAS is the use of the integrated Rule-Oriented Data System (iRODS) and the ExoGENI dynamic networked infrastructure to support network-aware data management decisions and efficient use of network resources across data and compute resources.
In this demo, the SciDAS team will showcase how data-intensive biology workflows and HydroShare – a data sharing platform for the hydrology community – use SciDAS to dynamically deploy and execute scientific applications across public Clouds (AWS and Azure), NSF Clouds, and campus resources.
********
3:30 p.m. – 4:00 p.m.
xDCI: Accelerating Discovery for Scientific Communities
Presenter: Ray Idaszak, RENCI
Description: xDCI – or {cross disciplinary} Data Cyberinfrastructure – is a complete and customizable solution for scientific communities looking to use data cyberinfrastructure to kick start their research and accelerate the discovery process. Unlike typical software stacks, xDCI enables researchers who are new to using data science cyberinfrastructure to start from “ground zero” and quickly ramp up to form vibrant, connected research communities. This presentation will offer an overview of the xDCI software stack as well as tips on how to get started with the platform.
********
4:00 p.m. – 4:30 p.m.
Object Storage for iRODS
Presenter: David Tobin, HGST
********
4:30 p.m. – 5:30 p.m.
The South Big Data Regional Innovation Hub
Presenter: Stan Ahalt, RENCI
Description: The South Big Data Hub is part of a network of four regional Big Data Hubs, launched by the National Science Foundation and funded in part by host universities and other partners. Managed jointly by RENCI at the University of North Carolina at Chapel Hill and the Georgia Institute of Technology, the South Hub serves 16 states and the District of Columbia. This talk will provide an overview of Hub activities, milestones achieved, and information on how to get involved in the Hub community.
********
Thursday, Nov. 16
Data Management Design Patterns
SciDAS
Panorama 360: Performance Data Capture and Analysis for End-to-end Scientific Workflows
SAFE Superfacilities using ExoGENI, Chameleon, and ScienceDMZs
iRODS Demonstrations
10:30 a.m. – 11:00 a.m.
Data Management Design Patterns
Presenter: Terrell Russell, iRODS Consortium | At the Western Digital/HGST Theater, booth #643
Description: The community of iRODS users has taught us about a variety of common use cases that tightly orbit around large data volumes, annotation, automation, auditing, and compliance issues.
Surveying diverse implementations is how mature domains refine their complex technologies into a common language. Design patterns are technology agnostic, structured descriptions of how a specific goal may be achieved.
This talk will describe our discovery process and share some Data Management Design Patterns as best open community practice. These design patterns can be used to clearly communicate organizational policy about your data lifecycle, both to others within an organization and to external interested parties.
********
10:30 a.m. – 11:30 a.m.
SciDAS
Presenters: Claris Castillo and Ray Idaszak, RENCI; Alex Feltus, Clemson University
Description: The NSF CC*Data Scientific Data Analysis at Scale (SciDAS) project (Award # 1659300) is a cyberinfrastructure- and cloud-agnostic software infrastructure whose main objective is to improve the flexibility and accessibility of the broad ecosystem of national CI and cloud resources, helping researchers use them more effectively and improve scientific productivity.
SciDAS enables fluid access to multiple national CI resources, including NSF Cloud, Open Science Grid, Cloud providers (AWS and Azure), and campus resources via novel open-source based middleware technologies. Central to SciDAS is the use of the integrated Rule-Oriented Data System (iRODS) and the ExoGENI dynamic networked infrastructure to support network-aware data management decisions and efficient use of network resources across data and compute resources.
In this demo, the SciDAS team will showcase how data-intensive biology workflows and HydroShare – a data sharing platform for the hydrology community – use SciDAS to dynamically deploy and execute scientific applications across public Clouds (AWS and Azure), NSF Clouds, and campus resources.
********
11:30 a.m. – 12:30 p.m.
Panorama 360: Performance Data Capture and Analysis for End-to-end Scientific Workflows
Presenters: Anirban Mandal and Cong Wang, RENCI; Ewa Deelman and Rafael Ferreira da Silva, USC/ISI; Mariam Kiran, LBNL; Vickie Lynch, ORNL
Description: We will present the DOE Panorama 360 project, which aims to provide a resource for the collection, analysis, and sharing of performance data about end-to-end scientific workflows executing on DOE facilities. We are developing a repository and associated capabilities for data collection, ingestion, and analysis for a broad class of DOE applications that span experimental and simulation science workflows.
In particular, our work focuses on workflows that include experimental data generation at DOE facilities, and it includes: 1) a distributed repository for different types of data (point measurements, time series, traces); 2) a set of data capture, curation, and publishing tools that can be used to interact with the repository; 3) a set of analysis, algorithms, and machine learning-based tools to analyze and characterize the data; and 4) best practices and recommendations for workflow evaluation, analysis, execution, and architectures.
********
12:30 p.m. – 1:30 p.m.
SAFE Superfacilities using ExoGENI, Chameleon, and ScienceDMZs
Presenters: Paul Ruth, Mert Cevik, and Cong Wang, RENCI; Yuanjun Yao, Qiang Cao, and Jeff Chase, Duke
Description: Currently, superfacilities are purpose-built manually for a specific scientific application or community. However, recent advances in Science DMZs and federated Infrastructure-as-a-Service (IaaS), as in the NSF testbeds GENI, Chameleon, and CloudLab, provide the technical building blocks to construct dynamic superfacilities on demand. Compute and storage resources can be provisioned from any testbed and connected with dynamic L2 circuits (e.g., as in GENI slice dataplanes).
In this demo, we use SAFE as a security building block for a virtual software-defined exchange (vSDX) that provides transit services for customer slices on the ExoGENI and Chameleon cloud testbeds. The customers are tenants that deploy their own slices, each with an independent network forwarding policy expressed in SAFE. The tenant slices express policy using SAFE in order to create network flows between each other.
********
1:30 p.m. – 2:30 p.m.
iRODS Demonstrations
Presenters: Jason Coposky and Terrell Russell, iRODS Consortium
Description: Using material from recently presented workshops, we will demonstrate several key features that have made iRODS a critical technology for research organizations worldwide. We will show how storage resource composition makes it easy to distribute and replicate data across multiple file systems; how the iRODS rule engine can automate searchable metadata annotation; and how federation allows users to access data at remote sites through a common interface.
Show floor closes at 3 p.m.