SC18 Booth Schedule

DALLAS, Texas – Presentations about key RENCI projects – including iRODS and the iRODS Consortium, ExoGENI and advanced networking, the National Consortium for Data Science (NCDS), and the South Big Data Hub (SBDH) – will be featured in the RENCI exhibit at SC18. We will also offer information and presentations on new initiatives, such as COMET and a Pilot Study for a Cyberinfrastructure Center of Excellence.

SC18 is the world’s premier conference for high performance computing, networking, storage, and analysis, attracting more than 10,000 participants each year. The RENCI booth (#2238) will open at 7 p.m. Monday, Nov. 12 for a pre-conference gala and sneak peek at booth offerings. In addition to the Monday night opening gala, exhibit hours for the conference will be 10 a.m. – 6 p.m. Tuesday, Nov. 13 and Wednesday, Nov. 14, and 10 a.m. – 3 p.m. Thursday, Nov. 15.

Look for updates about our activities at SC18 on social media!

10:30 – 11:30
A Scalable HPC System Architecture Support for iRODS Data Ingest

Dave Wade (Integral Engineering)

Description: Integral Engineering is developing an innovative data ingest engine for clusters, optimized for highly scalable performance and throughput using COTS HPC system hardware and COTS HPC software libraries. We show how iRODS automated data ingest may be integrated into IERA’s hypervisor-based engine to further optimize the orderly cataloging of metadata and collection of data for analysis.


11:20 – 12:00
SC18 Cloud HPC Hack

Terrell Russell (The iRODS Consortium) @ the AC Hotel

Visit http://hackhpc.org/ for more information.


11:30 – 12:30
iRODS Capability: Automated Ingest

Alan King (The iRODS Consortium)

Description: This demo will show how iRODS can be configured to ingest data and watch for changes. Combined with additional policy, this capability can be used to clean data, stage data, and keep the catalog up to date.
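
For those curious about what such a configuration looks like in practice, the automated ingest framework is driven by a small Python event handler. Below is a minimal, hedged sketch assuming the irods_capability_automated_ingest package; the resource name is a hypothetical placeholder.

    # Minimal event handler sketch for the iRODS automated ingest framework
    # (irods_capability_automated_ingest). The resource name below is a
    # hypothetical placeholder.
    from irods_capability_automated_ingest.core import Core
    from irods_capability_automated_ingest.utils import Operation

    class event_handler(Core):

        @staticmethod
        def operation(session, meta, **options):
            # PUT_SYNC copies new files into iRODS and re-copies them when
            # the watched source files change, keeping the catalog current.
            return Operation.PUT_SYNC

        @staticmethod
        def to_resource(session, meta, **options):
            # Land new replicas on a designated ingest resource (hypothetical name).
            return "ingest_resc"

A sync job pointed at a landing directory and a target collection would then be started against this handler and scheduled to re-scan periodically, so new and modified files are picked up without manual intervention.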


2:00 – 2:20
Managing Data from the Edge to HPC

Terrell Russell (The iRODS Consortium) @ Western Digital #3901

Description: Data management has historically started at the point of ingest where users manually placed data into a system. While this process was sufficient for a while, the volume and velocity of automatically created data coming from sequencers, satellites, and microscopes have overwhelmed existing systems. In order to meet these new requirements, the point of ingest must be moved closer to the point of data creation.

iRODS now supports packaged capabilities which implement the necessary automation to scale ingest horizontally. This shifts the application of data management policy directly to the edge.

Once your data resides in the iRODS namespace, additional capabilities such as storage tiering, data integrity, and auditing may be applied. This includes tiering data to scratch storage for analysis, archiving, and collaboration. The combination of these capabilities plus additional policy allows for the implementation of the data-to-compute pattern, which can be tailored to meet your specific use cases.
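
One concrete reading of moving ingest to the edge, sketched below under the assumption that the irods_capability_automated_ingest package is deployed next to the instrument: rather than copying files at the point of creation, the event handler can register them in place on an edge resource and leave later data movement to tiering policy. Resource and class names are illustrative.

    # Sketch: register instrument data in place on an edge resource instead of
    # copying it at creation time. The resource name is an illustrative assumption.
    from irods_capability_automated_ingest.core import Core
    from irods_capability_automated_ingest.utils import Operation

    class event_handler(Core):

        @staticmethod
        def operation(session, meta, **options):
            # REGISTER_SYNC catalogs files where they already live (storage
            # adjacent to the instrument) and re-registers them on change.
            return Operation.REGISTER_SYNC

        @staticmethod
        def to_resource(session, meta, **options):
            # The edge resource that fronts the instrument's local storage.
            return "edge_resc"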


2:30 – 3:30
Pilot Study for a Cyberinfrastructure Center of Excellence (CI CoE)

Ewa Deelman (USC/ISI), Anirban Mandal (RENCI), Ilya Baldin (RENCI), Valerio Pascucci (Univ. of Utah), Rob Ricci (Univ. of Utah), Jaroslaw Nabrzyski (Notre Dame), Jane Wyngaard (Notre Dame), Susan Sons (IU), Karan Vahi (USC/ISI), Mats Rynge (USC/ISI), Erik Scott (RENCI), Paul Ruth (RENCI)

Description: The goal of the CI CoE Pilot project is to develop a model and a plan for a Cyberinfrastructure (CI) Center of Excellence that can serve as a core for knowledge sharing and community building around existing cyberinfrastructure models and best practices. Our approach is to (a) build a community centered around CI for NSF Large Facilities (LFs), (b) create a community-curated portal for knowledge sharing, (c) define a structure to address workforce development, training, retention, career paths, and diversity of CI personnel, and (d) engage with NEON to pilot the effort.


3:30 – 4:30
Panorama 360: Performance Data Capture and Analysis for End-to-end Scientific Workflows

Anirban Mandal (RENCI), Ewa Deelman (USC/ISI), Jeff Vetter (ORNL), Mariam Kiran (LBNL), Cong Wang (RENCI), George Papadimitriou (USC/ISI), Rafael Ferreira da Silva (USC/ISI), Karan Vahi (USC/ISI), Rajiv Mayani (USC/ISI), Vickie Lynch (ORNL)

Description: We will present the DOE Panorama 360 project, which aims to provide a resource for the collection, analysis, and sharing of performance data about end-to-end scientific workflows executing on DOE facilities. We are developing a repository and associated capabilities for data collection, ingestion, and analysis for a broad class of DOE applications that span experimental and simulation science workflows. In particular, our work focuses on workflows that include experimental data generation at DOE facilities and includes: 1) a distributed repository for different types of data (point measurements, time series, traces), 2) a set of data capture, curation, and publishing tools that can be used to interact with the repository, 3) a set of analysis algorithms and machine learning-based tools to analyze and characterize the data, and 4) best practices and recommendations for workflow evaluation, analysis, execution, and architectures. The presentation will have three sections: (1) an overview of the Panorama 360 project (20 minutes), (2) a demonstration of the repository (10 minutes), and (3) a presentation on ML methods for anomaly detection (30 minutes).

10:30 – 11:30
SWIP – Scientific Workflow Integrity with Pegasus

Ilya Baldin (RENCI), Anirban Mandal (RENCI), Von Welch (IU), Randy Heiland (IU), Raquel Loran (IU), Omkar Bhide (IU), Ewa Deelman (USC/ISI), Karan Vahi (USC/ISI), Mats Rynge (USC/ISI)

Description: We think of binary data as, well, binary. But in reality, our data is subject to both random and malicious threats to its integrity, meaning we cannot blindly trust in its immutability. The Scientific Workflow Integrity with Pegasus (SWIP) project, funded by the NSF, is improving the security and integrity of scientific data by integrating cryptographic integrity checking and provenance information into the Pegasus workflow management system (WMS). Complex workflows are commonplace in computational science and engineering applications; for example, one project using Pegasus is LIGO (the Laser Interferometer Gravitational-Wave Observatory), which announced in early 2016 the first direct detection of gravitational waves. The project is also developing the Chaos Jungle virtualized infrastructure as an integrity testbed to validate its approach, and is conducting research into blockchain as a high-assurance method of storing integrity data. Project leaders from USC/ISI, IU, and RENCI will present.
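
As a generic illustration of the kind of check SWIP integrates (not SWIP's actual code or the Pegasus API), the pattern is to hash each workflow output when it is produced, record the hash, and verify it before a downstream job consumes the file:

    # Generic illustration of cryptographic integrity checking in a workflow
    # (not SWIP's actual implementation): record a SHA-256 checksum when an
    # output is produced and verify it before downstream use.
    import hashlib
    import json
    from pathlib import Path

    def sha256sum(path: Path) -> str:
        h = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def record_checksum(path: Path, manifest: Path) -> None:
        # Store the checksum of a newly produced output in a JSON manifest.
        entries = json.loads(manifest.read_text()) if manifest.exists() else {}
        entries[str(path)] = sha256sum(path)
        manifest.write_text(json.dumps(entries, indent=2))

    def verify_checksum(path: Path, manifest: Path) -> bool:
        # Recompute the checksum and compare it against the recorded value.
        entries = json.loads(manifest.read_text())
        return entries.get(str(path)) == sha256sum(path)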


10:45 – 11:00
Managing Data from the Edge to HPC

Terrell Russell (The iRODS Consortium) @ DDN #3213

Description: Data management has historically started at the point of ingest where users manually placed data into a system. While this process was sufficient for a while, the volume and velocity of automatically created data coming from sequencers, satellites, and microscopes have overwhelmed existing systems. In order to meet these new requirements, the point of ingest must be moved closer to the point of data creation.

iRODS now supports packaged capabilities which implement the necessary automation to scale ingest horizontally. This shifts the application of data management policy directly to the edge.

Once your data resides in the iRODS namespace, additional capabilities such as storage tiering, data integrity, and auditing may be applied. This includes tiering data to scratch storage for analysis, archiving, and collaboration. The combination of these capabilities plus additional policy allows for the implementation of the data-to-compute pattern, which can be tailored to meet your specific use cases.


11:30 – 12:30
SAFE Superfacilities using ExoGENI, Chameleon, and ScienceDMZs

Yuanjun Yao (Duke), Qiang Cao (Duke), Cong Wang (RENCI), Mert Cevik (RENCI), Paul Ruth (RENCI), Jeff Chase (Duke)

Description: Currently, superfacilities are purpose-built manually for a specific scientific application or community. However, recent advances in Science DMZs and federated Infrastructure-as-a-Service (IaaS), as in the NSF testbeds GENI, Chameleon, and CloudLab, provide the technical building blocks to construct dynamic superfacilities on demand. Compute and storage resources can be provisioned from any testbed and connected with dynamic L2 circuits (e.g., as in GENI slice dataplanes). In this demo, we use SAFE as a security building block for a virtual software-defined exchange (vSDX) that provides transit services for customer slices on the ExoGENI and Chameleon cloud testbeds. The customers are tenants that deploy their own slices, each with an independent network forwarding policy expressed in SAFE; these policies govern the network flows created between tenant slices.


12:00 – 12:20
Managing Data from the Edge to HPC

Terrell Russell (The iRODS Consortium) @ DDN #3901

Description: Data management has historically started at the point of ingest where users manually placed data into a system. While this process was sufficient for a while, the volume and velocity of automatically created data coming from sequencers, satellites, and microscopes have overwhelmed existing systems. In order to meet these new requirements, the point of ingest must be moved closer to the point of data creation.

iRODS now supports packaged capabilities which implement the necessary automation to scale ingest horizontally. This shifts the application of data management policy directly to the edge.

Once your data resides in the iRODS namespace, additional capabilities such as storage tiering, data integrity, and auditing may be applied. This includes tiering data to scratch storage for analysis, archiving, and collaboration. The combination of these capabilities plus additional policy allows for the implementation of the data-to-compute pattern, which can be tailored to meet your specific use cases.


12:30 – 1:30
Utilizing iRODS to Automate the Workflow in High Performance Research Computing

Dave Fellinger (The iRODS Consortium)

Description: Early supercomputing environments had relatively simple data workflow requirements. Parallel applications were loaded onto the machine from a supplemental network, and the simulation or visualization result was written to an adjacent file system. Checkpoints for long runs were also written, so data traffic was 90% or more writes. The growth in the use of digital instrumentation and sensors changed that paradigm. It is now necessary to ingest huge amounts of data, migrate that data to a scratch file system adjacent to the compute cluster, trigger the mechanisms in the scheduler to begin an operation, and then migrate the result to a distribution file system. The supercomputer is just one step in a complex workflow. In this compute paradigm of “big data” collections, iRODS can enable complete workflow control and data lifecycle management, and can present discoverable data sets with assured traceability and reproducibility.


1:30 – 2:30
iRODS Capability: Storage Tiering

Justin James (The iRODS Consortium)

Description: This demo will show how iRODS can be configured to control data movement around the enterprise, defined only by resource hierarchies and metadata.
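
As a rough sketch of what that metadata-driven configuration looks like, the storage tiering capability groups resources into tiers via irods::storage_tiering attribute-value-units triples, following the capability's documented convention. The example below uses python-irodsclient; the resource names, group name, and connection details are hypothetical.

    # Sketch: tag two resources into a tier group for the iRODS storage tiering
    # capability. Resource/group names and connection details are placeholders.
    from irods.session import iRODSSession

    with iRODSSession(host="irods.example.org", port=1247, user="rods",
                      password="secret", zone="tempZone") as session:
        fast = session.resources.get("fast_resc")
        archive = session.resources.get("archive_resc")

        # Same tier group; the units field holds the tier's position (0 = hottest).
        fast.metadata.add("irods::storage_tiering::group", "example_group", "0")
        archive.metadata.add("irods::storage_tiering::group", "example_group", "1")

        # Objects unaccessed on the fast tier for 30 minutes (1800 seconds)
        # become candidates for migration to the next tier.
        fast.metadata.add("irods::storage_tiering::time", "1800")

The tiering rule engine plugin then periodically scans each group and queues violating objects for migration to the next tier, so data movement is governed entirely by the resource hierarchy and this metadata.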


2:30 – 3:30
DyNamo – Delivering a Dynamic Network-centric Platform for Data-driven Science

Anirban Mandal (RENCI), Ewa Deelman (USC/ISI), Michael Zink (UMass), Ivan Rodero (Rutgers), Cong Wang (RENCI), Komal Thareja (RENCI), Paul Ruth (RENCI), George Papadimitriou (USC/ISI), Eric Lyons (UMass), JJ Villalobos (Rutgers)

Description: The goal of DyNamo is to develop a novel network-centric platform that will enable high-performance, adaptive data flows and coordinated access to multi-campus CI facilities and community data repositories for observational science workflows. Our platform will be coupled with the Pegasus workflow management system. We will deploy our solutions for workflows from the following applications: the Ocean Observatories Initiative (OOI) and Collaborative Adaptive Sensing of the Atmosphere (CASA). We will present the project, followed by a demonstration of the integration of a representative CASA workflow, modeled by Pegasus, that leverages resources from ExoGENI and Chameleon.


3:30 – 4:30
COMET: A Distributed Meta-data Service for Federated Cloud Infrastructure

Cong Wang (RENCI), Komal Thareja (RENCI), Ilya Baldin (RENCI)

Description: COMET is a metadata system developed for multi-cloud environments. It provides simple and flexible authorization and is independent of any cloud provider. Using COMET, cloud tenants can configure individual compute instances or distributed applications deployed across multiple cloud providers. This demo will describe the capabilities of the COMET system and demonstrate its use on a deployment of a compute cluster across the ExoGENI and Chameleon cloud systems.

10:30 – 11:30
iRODS Scalable RDBMS Table View Construction

Dave Wade (Integral Engineering)

Description: We propose an RDBMS programming technique that employs iRODS to simulate the join of column table data from independent RDBMS instances into a single unified view across a compute cluster, suitable for analysis.


11:30 – 12:30
iRODS Capability: Automated Ingest

Alan King (The iRODS Consortium)

Description: This demo will show how iRODS can be configured to ingest data and watch for changes. Combined with additional policy, this capability can be used to clean data, stage data, and keep the catalog up to date.


1:30 – 2:30
Using iRODS for the Brain Image Library and HuBMAP HIVE

Derek Simmel (Pittsburgh Supercomputing Center)

Description: Funded by the U.S. National Institutes of Health (NIH), the Brain Image Library (BIL) is a 10PB brain image data archive with metadata search capability currently under construction at the Pittsburgh Supercomputing Center (PSC). BIL is using iRODS to manage metadata for the archive and to facilitate various forms of remote access to and querying of the archive. PSC, together with the University of Pittsburgh Department of Bioinformatics, is also participating in the NIH Human Biomolecular Atlas Program (HuBMAP) Integration, Visualization and Engagement (HIVE) collaboratory. PSC is building iRODS-based storage and metadata management infrastructure to support archiving of high-resolution, 3D biomolecular tissue data for HIVE.