Visit booth #781 to discover:
- The newest advances from RENCI’s networking and research infrastructure group, including the newly awarded $20 million NSF grant, FABRIC, and major projects such as DyNamo, IRIS, and Panorama 360.
- Updates on iRODS, open source data management software used by research, commercial, and governmental organizations worldwide.
- How to get involved in national data science initiatives, such as the South Big Data Hub and the National Consortium for Data Science (NCDS).
SC19 is the world’s premier conference for high performance computing, networking, storage, and analysis, attracting more than 10,000 participants each year. The RENCI booth will open at 7 p.m. Monday, Nov. 18 for a pre-conference gala and sneak peek at booth offerings. In addition to the Monday night opening gala, exhibit hours for the conference will be 10 a.m. – 6 p.m. Tuesday, Nov. 19 and Wednesday, Nov. 20, and 10 a.m. – 3 p.m. Thursday, Nov. 21.
Look for updates about our activities at SC19 on social media!
10:30 – 11:00
A Secure SCADA System with InfiniBand and iRODS
David Wade (Integral Engineering)
Description: Currently deployed SCADA systems suffer from major security design flaws. Most are also highly complex, proprietary, and built on antiquated design criteria. A robust, secure SCADA design is presented using HPC cluster techniques, InfiniBand, and iRODS.
11:30 – 12:00
iRODS S3 Resource Plugin: Cacheless and Detached Mode
Justin James (The iRODS Consortium)
Description: The iRODS S3 Resource Plugin can now be configured without being a child of a compound resource and without a sibling cache resource. This standalone functionality significantly reduces administrative overhead.
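For flavor, a standalone resource of this kind might be created with a single `iadmin mkresc` call. A minimal sketch, wrapped in Python; the resource name, host, bucket path, and credential file are hypothetical, and the `HOST_MODE` context key is assumed to be how the cacheless/detached behavior is selected:

```python
import subprocess

# Context string for the S3 plugin; values here are placeholders, and
# HOST_MODE=cacheless_detached is assumed to select the standalone mode.
context = ";".join([
    "S3_DEFAULT_HOSTNAME=s3.amazonaws.com",
    "S3_AUTH_FILE=/etc/irods/s3.keypair",
    "S3_REGIONNAME=us-east-1",
    "S3_PROTO=HTTPS",
    "HOST_MODE=cacheless_detached",
])

# Create a standalone S3 resource: no compound parent, no cache sibling.
subprocess.run(
    ["iadmin", "mkresc", "s3Resc", "s3",
     "ingest.example.org:/demo-bucket/irods/", context],
    check=True,
)
```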
11:45 – 12:45
DyNamo for Real-Time Weather Forecasting Workflows
Part of the SCinet Technology Challenge @ SC Theater booth (#981) next to the SCinet booth
Team: Cong Wang, Komal Thareja, Paul Ruth, Anirban Mandal (RENCI/UNC Chapel Hill); George Papadimitriou, Ewa Deelman (USC/ISI); Eric Lyons, Mike Zink (UMass Amherst); JJ Villalobos, Ivan Rodero (Rutgers University)
Description: Computational science today depends on many complex, data-intensive applications operating on distributed datasets that originate from a variety of scientific instruments and data repositories. A major challenge for these applications is the integration of data into the scientists’ workflow. In this demo, we present DyNamo, a network-centric cloud platform that facilitates inter-domain science workflows using SCinet and its partner network providers’ high-performance networks. We bring together two research cyberinfrastructures, Chameleon Cloud and ExoGENI, to instrument a real-time weather forecasting application, Collaborative and Adaptive Sensing of the Atmosphere (CASA). We show that high-performance networking and distributed, dynamic computing infrastructures can significantly improve CASA application performance, as well as enhance the effectiveness of workflow executions.
1:00 – 1:30
Data Locality is Key to Accelerating HPC & AI workloads
Glenn Haley (Cloudian, Director of Product Management)
Description: HPC is about generating lots of data; AI is about searching and inferring insights from massive amounts of data. Running AI workloads in a disaggregated manner at the edge, where the data objects reside, is the most efficient approach for processing data-intensive workloads. Utilizing a scale-out, distributed architecture such as Cloudian HyperStore provides a highly S3-compatible and cost-effective solution.
2:00 – 2:30
Storage Systems and Performance – 3 Criteria to Consider
Jean-François Smiglieski (OpenIO, CTO) and Romain Acciari (OpenIO, Head of Industrialization)
Description: It’s easy to exaggerate performance when talking about data storage. It’s not so easy to objectively compare the performance of different types of storage systems. And for good reason: performance, when it relates to data storage, is understood in at least three dimensions (capacity, bandwidth and latency), each one responding to specific use cases. Not many organizations can afford to store their data without limitations, and the highest price does not always guarantee the best performance. Limitations appear quickly if the usage is too different from what the system was originally designed to do. Storage system performance is a blurred concept, and companies need to realize this to see through the overly simplistic marketing that can bias their choices.
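To see why the three dimensions answer different questions, consider a back-of-the-envelope model, transfer time = latency + size / bandwidth; the numbers below are invented for illustration, not measurements of any product:

```python
def transfer_time(size_bytes, latency_s, bandwidth_bps):
    """Rough model: one round of latency plus streaming at full bandwidth."""
    return latency_s + size_bytes / bandwidth_bps

KB, GB = 1024, 1024**3

# Small objects are latency-bound: bandwidth barely matters here.
print(transfer_time(4 * KB, 0.0002, 1 * GB))     # ~0.0002 s
# Large objects are bandwidth-bound: latency barely matters here.
print(transfer_time(100 * GB, 0.0100, 10 * GB))  # ~10 s
```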
3:00 – 3:30
What is FABRIC?
Ilya Baldin (RENCI)
Description: Learn about FABRIC, an NSF-funded project that will create a unique national-scale research infrastructure to enable cutting-edge and exploratory at-scale research in networking, cybersecurity, distributed computing and storage systems, machine learning, and science applications.
FABRIC will be an ‘everywhere-programmable’ nationwide instrument composed of novel extensible network elements equipped with large amounts of compute and storage, interconnected by high-speed, dedicated optical links. It will connect a number of specialized testbeds (5G/IoT PAWR, NSF Clouds) and high-performance computing (HPC) facilities to create a rich fabric for a wide variety of experimental activities. Discover how you can get involved now.
3:30 – 4:00
ESnet6 High Touch Services: Real-Time Precision Telemetry and ML-based TCP Classification
Richard Cziva (ESnet)
Description: ESnet6’s High Touch Services will allow programmable network services to be deployed at over 30 locations of the next-generation ESnet network. One such service will gather precise, per-packet telemetry using P4-programmable telemetry producers and a high-performance collector and storage architecture.
This presentation covers our motivation, overall architecture, and system components, and presents visualization and analysis of the collected data. Additionally, we will present a way to use the collected telemetry data to identify TCP congestion control algorithms with 98% accuracy from as few as 5-20 packet observations.
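As a rough illustration of the classification step (not ESnet’s actual pipeline: the features, labels, and traffic below are synthetic), a classifier can be trained on per-packet features from a short observation window:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
N_PACKETS = 20  # per the talk, 5-20 packet observations can suffice

def fake_flow(label):
    # Stand-in for per-packet telemetry of one flow: inter-arrival times
    # plus payload sizes. Real congestion-control variants would leave
    # subtler signatures than this invented timing difference.
    base = 0.001 if label == 0 else 0.003
    iat = rng.exponential(base, N_PACKETS)
    sizes = rng.integers(64, 1500, N_PACKETS)
    return np.concatenate([iat, sizes])

labels = np.array([0, 1] * 200)               # two hypothetical variants
X = np.array([fake_flow(y) for y in labels])

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X[:300], labels[:300])
print("held-out accuracy:", clf.score(X[300:], labels[300:]))
```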
4:30 – 5:00
Panorama 360: Performance Data Capture and Analysis for End-to-end Scientific Workflows
George Papadimitriou (USC/ISI), Cong Wang (RENCI), Karan Vahi (USC/ISI), Rafael Ferreira da Silva (USC/ISI), Rajiv Mayani (USC/ISI), Anirban Mandal (RENCI), Mariam Kiran (LBL), Jeffrey Vetter (ORNL), Ewa Deelman (USC/ISI)
Description: We will present our accomplishments from the DOE Panorama 360 project. With the increased prevalence of workflows in scientific computing and a push towards exascale computing, it has become paramount that we are able to analyze characteristics of scientific applications to better understand their impact on the underlying infrastructure, and vice versa. Such analysis can help drive the design, development, and optimization of next-generation systems and solutions.
We will present the architecture and its integration with well-established and newly developed tools to collect online performance statistics of workflow executions from heterogeneous sources and publish them in a distributed database. Using this architecture, we are able to correlate online workflow performance data with data from the underlying infrastructure and present them in a useful and intuitive way via an online dashboard. Based on the data collected in Elasticsearch, we demonstrate that we can correctly identify anomalies. We will also present machine learning analyses of the data for detecting and diagnosing network problems.
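As an illustration of the publishing step, one workflow event might be indexed into Elasticsearch roughly as follows; the record shape, index name, and endpoint are hypothetical, and the call assumes the elasticsearch-py 8.x client:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local endpoint

# Hypothetical record for one workflow job; Panorama 360's schema may differ.
event = {
    "workflow_id": "casa-forecast-042",
    "job": "wrf-run",
    "site": "exogeni-uh",
    "duration_s": 841.7,
    "bytes_transferred": 3_221_225_472,
    "timestamp": "2019-11-19T10:30:00Z",
}

# Index the event; the dashboard and anomaly detection query this index.
es.index(index="panorama-workflow-events", document=event)
```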
4:30 – 5:00
iRODS Beyond Discoverability @ SUSE booth #1917
Terrell Russell (The iRODS Consortium)
Description: As commercial, governmental, and research organizations continue to move from manual pipelines to automated processing of their vast and growing datasets, they are struggling to find meaning in their repositories. Many products and approaches now provide data discoverability through indexing and aggregate counts, but few also provide the level of confidence needed for making strong assertions about data provenance. For that, a system needs enforced policy: a model for data governance that provides understanding of what is in the system and how it came to be.
With an open, policy-based platform, metadata can be elevated beyond assisting in just search and discoverability. Metadata can associate datasets, help build cohorts for analysis, coordinate data movement and scheduling, and drive the very policy that provides the data governance. Data management should be data centric and metadata driven.
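As a small sketch of this metadata-driven approach using the python-irodsclient (the zone, credentials, paths, and attribute names are placeholders), metadata is attached to a data object so that policy can later act on it:

```python
from irods.session import iRODSSession

with iRODSSession(host="irods.example.org", port=1247,
                  user="alice", password="secret", zone="tempZone") as session:
    obj = session.data_objects.get("/tempZone/home/alice/scan_001.nii")
    # Attach attributes that downstream policy can key on, e.g. for
    # replication, cohort building, or scheduling decisions.
    obj.metadata.add("study", "cohort-17")
    obj.metadata.add("stage", "qc-passed")
```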
10:30 – 11:00
Cyberinfrastructure Center of Excellence Pilot
Ewa Deelman (USC/ISI), Anirban Mandal (RENCI), Jaroslaw Nabrzyski (UND), Valerio Pascucci (U. Utah), Robert Ricci (U. Utah), Mats Rynge (USC/ISI), Ilya Baldin (RENCI), Jane Wyngard (UND), Steve Petruzza (U. Utah), Susan Sons (IU), Rafael Ferreira da Silva (USC/ISI), Karan Vahi (USC/ISI), Loic Pottier (USC/ISI), Laura Christopherson (RENCI), Erik Scott (RENCI), Charles Vardeman (UND), Marina Kogan (U. Utah)
Description: NSF’s major multi-user research facilities (large facilities) are sophisticated research instruments and platforms – such as large telescopes, interferometers and distributed sensor arrays – that serve diverse scientific disciplines from astronomy and physics to geoscience and biological science. Large facilities are increasingly dependent on advanced cyberinfrastructure (CI) – computing, data and software systems, networking, and associated human capital – to enable broad delivery and analysis of facility-generated data. As a result of these cyberinfrastructure tools, scientists and the public gain new insights into fundamental questions about the structure and history of the universe, the world we live in today, and how our plants and animals may change in the coming decades.
The goal of the CICoE Pilot project is to develop a model for a Cyberinfrastructure Center of Excellence (CICoE) that facilitates community building and sharing, and applies knowledge of best practices and innovative solutions for facility CI. The pilot project is exploring how such a center would facilitate CI improvements for existing facilities and inform the design of new facilities that exploit advanced CI architecture designs and leverage established tools and solutions. In this talk, we will present our accomplishments and plans for the CICoE Pilot project.
11:30 – 12:00
Integrity Introspection for Scientific Workflows (IRIS)
Yufeng Xin (RENCI), Erica Fu (RENCI), Anirban Mandal (RENCI), Mats Rynge (USC/ISI), Karan Vahi (USC/ISI), Ryan Tanaka (USC/ISI), Ewa Deelman (USC/ISI), Ishan Abhinit (IU), Von Welch (IU)
Description: We will report on our accomplishments from the NSF IRIS project. The goal of our work is to detect, diagnose, and pinpoint the source of unintentional integrity anomalies in scientific workflow executions on distributed cyberinfrastructure.
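A toy example of one signal involved, checking a file’s digest against the checksum recorded when the data was produced; IRIS goes well beyond this, correlating such signals across a distributed workflow to localize the source of an anomaly:

```python
import hashlib

def verify(path, expected_sha256):
    """Return True if the file's SHA-256 digest matches the recorded value."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == expected_sha256
```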
1:00 – 1:30
NFSRODS: Presenting iRODS as NFSv4.1
Kory Draughn (The iRODS Consortium)
Description: NFSRODS is an iRODS client that presents iRODS as NFSv4.1. It maps iRODS’s multi-owner permission model to POSIX permissions and has been deployed within an enterprise environment.
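Once an NFSRODS server fronts the iRODS zone, a client can mount it like any NFSv4.1 export; a minimal sketch (host, port, mount options, and mount point are placeholders, and the command needs root):

```python
import subprocess

# Mount the NFSRODS export; details vary by deployment.
subprocess.run(
    ["mount", "-t", "nfs4", "-o", "port=2049,sec=sys",
     "nfsrods.example.org:/", "/mnt/irods"],
    check=True,
)
```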
1:00 – 1:15
iRODS Beyond Discoverability @ DDN booth #617
Terrell Russell (The iRODS Consortium)
Description: As commercial, governmental, and research organizations continue to move from manual pipelines to automated processing of their vast and growing datasets, they are struggling to find meaning in their repositories. Many products and approaches now provide data discoverability through indexing and aggregate counts, but few also provide the level of confidence needed for making strong assertions about data provenance. For that, a system needs enforced policy: a model for data governance that provides understanding of what is in the system and how it came to be.
With an open, policy-based platform, metadata can be elevated beyond assisting in just search and discoverability. Metadata can associate datasets, help build cohorts for analysis, coordinate data movement and scheduling, and drive the very policy that provides the data governance. Data management should be data centric and metadata driven.
2:00 – 2:30
SUSE and iRODS
David Byte (SUSE)
Description: SUSE Enterprise Storage is an enterprise distribution of Ceph that provides block, file, and object interfaces with a cost that rivals that of the least expensive cloud services. This session will provide a brief introduction of SUSE Enterprise Storage, some applicable use cases, and the interfaces utilized with iRODS.
3:00 – 3:30
What is FABRIC?
Ilya Baldin (RENCI)
Description: Learn about FABRIC, an NSF-funded project that will create a unique national-scale research infrastructure to enable cutting-edge and exploratory at-scale research in networking, cybersecurity, distributed computing and storage systems, machine learning, and science applications.
FABRIC will be an ‘everywhere-programmable’ nationwide instrument composed of novel extensible network elements equipped with large amounts of compute and storage, interconnected by high-speed, dedicated optical links. It will connect a number of specialized testbeds (5G/IoT PAWR, NSF Clouds) and high-performance computing (HPC) facilities to create a rich fabric for a wide variety of experimental activities. Discover how you can get involved now.
4:30 – 5:00
iRODS S3 Resource Plugin: Cacheless and Detached Mode
Justin James (The iRODS Consortium)
Description: The iRODS S3 Resource Plugin can now be configured without being a child of a compound resource and without a sibling cache resource. This standalone functionality significantly reduces administrative overhead.
10:30 – 11:00
iRODS for PBSpro
Dave Wade (Integral Engineering)
Description: PBSpro, an open source scheduler for cluster computing, is a standard for scheduling resources on supercomputers. Submitted jobs run when licenses, processors, cores, and networks are ready. Jobs may still fail, however, if the data resources on file systems are unavailable, become unavailable, fill up, or are removed. A mechanism to integrate iRODS commands with PBSpro is presented to address these concerns. Along with processors, cores, and other resources, a user’s job then runs only when its data is ready, according to iRODS rules.
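A minimal sketch of the gating idea, checking data availability in iRODS via the python-irodsclient before submitting through PBSpro; the zone, credentials, paths, and job script name are hypothetical:

```python
import subprocess
from irods.session import iRODSSession

REQUIRED = "/tempZone/home/alice/inputs/run_042.dat"

with iRODSSession(host="irods.example.org", port=1247,
                  user="alice", password="secret", zone="tempZone") as session:
    if session.data_objects.exists(REQUIRED):
        subprocess.run(["qsub", "job.pbs"], check=True)  # submit to PBSpro
    else:
        print("input not staged in iRODS yet; holding submission")
```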
11:30 – 12:00
What is FABRIC?
Paul Ruth (RENCI)
Description: Learn about FABRIC, an NSF-funded project that will create a unique national research infrastructure to enable cutting-edge and exploratory at-scale research in networking, cybersecurity, distributed computing and storage systems, machine learning, and science applications.
FABRIC will be an ‘everywhere-programmable’ nationwide instrument composed of novel extensible network elements equipped with large amounts of compute and storage, interconnected by high-speed, dedicated optical links. It will connect a number of specialized testbeds (5G/IoT PAWR, NSF Clouds) and high-performance computing (HPC) facilities to create a rich fabric for a wide variety of experimental activities. Discover how you can get involved now.
1:00 – 1:30
NFSRODS: Presenting iRODS as NFSv4.1
Kory Draughn (The iRODS Consortium)
Description: NFSRODS is an iRODS client that presents iRODS as NFSv4.1. It maps iRODS’s multi-owner permission model to POSIX permissions and has been deployed within an enterprise environment.