Monday, November 17
7 p.m. – 9 p.m.
Welcome SC14 attendees! The RENCI booth tonight will feature information on:
- Advanced networking
- Networked Infrastructure as a Service (NIaaS)
- The iRODS data management platform and the iRODS Consortium
- The National Consortium for Data Science (NCDS)
- Data science research projects at RENCI
Tuesday, November 18
10:30 a.m. – 11:30 a.m.
iRODS Overview and Update
Presenters: The iRODS@RENCI team
Description: This talk will present an overview of the Integrated Rule-Oriented Data System, or iRODS (http://irods.org), including its functions, architecture, current use cases, and future directions. iRODS is open source data grid middleware that consolidates the management of heterogeneous data storage technologies. Equipped with configurable automation and metadata cataloging capabilities, iRODS manages over 100 PB of data worldwide. Example use cases include tracking gene sequencing workflows at several of the world’s preeminent research institutes and streaming terabytes of production video footage across the globe.
11:30 a.m. – 12:30 p.m.
iRODS Demonstrations
Presenters: The iRODS@RENCI team
Description: Members of the RENCI iRODS development team will demonstrate key iRODS features that have made this data management software critical to genomics sequencing centers, research data repositories, and many other organizations seeking to manage large amounts of data. Drawing from the experiences of active iRODS users, the team will demonstrate workflows that process, transmit, analyze, review, present, and archive data. They will show how iRODS maintains data integrity across a distributed grid, enforces access control, enables collaboration among workgroups, and maintains state and provenance information through metadata. These demonstrations will help data users explore new data management capabilities made possible with iRODS.
1:30 p.m. – 2:30 p.m.
Managing Dynamic Networked Cloud Infrastructure for Data Driven Scientific Workflows Using Proactive Introspection
Presenters: Anirban Mandal, Paul Ruth, Ilya Baldin, Yufeng Xin, and Claris Castillo, RENCI; Gideon Juve, Mats Rynge, and Ewa Deelman, University of Southern California Information Sciences Institute; Jeff Chase, Duke University
Description: This demonstration will showcase a novel, dynamically adaptable cloud infrastructure driven by the demands of a data-driven scientific workflow. It will use resources from ExoGENI – a Networked Infrastructure-as-a-Service (NIaaS) test bed funded through the National Science Foundation’s Global Environment for Network Innovations (GENI) project. The demonstration will connect compute and data resources in the RENCI SC14 booth to a large, dynamically provisioned ‘slice’ spanning multiple ExoGENI cloud sites interconnected using dynamically provisioned connections from Internet2 and ESnet. The slice will be used to execute a scientific workflow driven from a computer in the RENCI SC14 booth connected to the slice via SCinet. The demonstration team will show the features of “ShadowQ,” an entity that predicts the future resource needs of a workflow and runs alongside the Pegasus workflow management system. This workflow introspection feature will be used to adapt the slice to the demands of the workflow as it executes by adding on-ramps and adjusting the amount of resources used. The team will use a genomic workflow as a driving example to demonstrate how workflows can leverage NIaaS systems, how these workflow applications can provision resources automatically, and how they can be executed and monitored end-to-end on dynamic NIaaS.
2:30 p.m. – 3:30 p.m.
Executing Genomic Workflows on Federated IaaS Technology
Presenters: Charles Schmitt, Fan Jiang, Anirban Mandal, Paul Ruth, Ilya Baldin, Yufeng Xin, Claris Castillo, RENCI; Gideon Juve, Mats Rynge, Ewa Deelman, University of Southern California Information Sciences Institute; Jeff Chase, Duke University
Description: Research with large-scale genomic data involves intensive computations, orchestration of numerous long-running workflows, and data-centric collaborations with experts worldwide. In this work, the researchers explore federated Infrastructure-as-a-Service (IaaS) as an enabling technology. Federated IaaS offers multiple benefits for genomic research. Most importantly, it provides capabilities that allow researchers to write workflows in a general way and then port them onto federated NIaaS on demand. This work will demonstrate running genomic workflows through Pegasus and Condor on the ExoGENI infrastructure. It will also use novel NIaaS features and support software to execute, adapt and monitor genomic workflows.
3:30 p.m. – 4:30 p.m.
iRODS Workshop
Presenters: The iRODS@RENCI team
Description: Bring a friend. Bring a VM! This workshop is an opportunity to explore the distributed architecture and powerful features of iRODS. Through hands-on exercises, the iRODS development team will walk participants through the various ways to automate the management of big data and metadata. Come to ask questions, come to learn, come to watch some typing in a terminal. This workshop provides the opportunity to experiment with iRODS in real time, alongside the people who develop iRODS.
4:30 p.m. – 5 p.m.
DataBridge: Building Connections in Scientific Data Sets
Presenter: Howard Lander
Description: The DataBridge project is a National Science Foundation-funded effort to connect “dark” scientific data in a sociometric network. DataBridge uses standard and community-provided algorithms in network analysis and similarity detection to enable scientists to discover connections between data sets and maximize the utility of data sets that are otherwise difficult to find.
5 p.m. – 5:30 p.m.
Data Science@RENCI
Informational slide presentation
Description: RENCI, a research institute of UNC-Chapel Hill with links to Duke University and North Carolina State University, provides cyber tools and technologies that enable better research and business innovation. A key focus at RENCI is developing software and other tools that help researchers access, share, manage, analyze and archive data in order to make research discoveries and ensure that research data is accessible by future scientists and scientists in different domains. This presentation offers highlights of some of RENCI’s key data-related efforts, including the National Science Foundation-funded DataBridge and DataNet Federation Consortium, the Secure Medical Workspace, and the iRODS Consortium.
5:30 p.m. – 6 p.m.
The National Consortium for Data Science
Informational slide presentation
Description: The National Consortium for Data Science (NCDS) launched in April 2013 as a public-private partnership to address the challenges and opportunities posed by massive data sets being created by digital medicine, environmental sensors, scientific instruments, social networks, and more. Its goals include: encouraging collaboration among industry, academia and government on data science research and problem solving; providing members with access to experts in other fields and domains to help address their data challenges; encouraging data science research that spans academia, industry and government; facilitating improved data science education; and supporting technical, ethical and policy standards for data. Members include research universities in North Carolina, Drexel University, Cisco, Deloitte LLP, GE, IBM, MCNC and RTI International. This presentation provides an overview of the benefits of membership and NCDS activities aimed at advancing data science.
Wednesday, November 19
10:30 a.m. – 11:30 a.m.
iRODS Overview and Update
Presenters: The iRODS@RENCI team
Description: This talk will present an overview of the Integrated Rule-Oriented Data System, or iRODS (http://irods.org), including its functions, architecture, current use cases, and future directions. iRODS is open source data grid middleware that consolidates the management of heterogeneous data storage technologies. Equipped with configurable automation and metadata cataloging capabilities, iRODS manages over 100 PB of data worldwide. Example use cases include tracking gene sequencing workflows at several of the world’s preeminent research institutes and streaming terabytes of production video footage across the globe.
11:30 a.m. – 12:30 p.m.
iRODS Demonstrations
Presenters: The iRODS@RENCI team
Description: Members of the RENCI iRODS development team will demonstrate key iRODS features that have made this data management software critical to genomics sequencing centers, research data repositories, and many other organizations seeking to manage large amounts of data. Drawing from the experiences of active iRODS users, the team will demonstrate workflows that process, transmit, analyze, review, present, and archive data. They will show how iRODS maintains data integrity across a distributed grid, enforces access control, enables collaboration among workgroups, and maintains state and provenance information through metadata. These demonstrations will help data users explore new data management capabilities made possible with iRODS.
1:30 p.m. – 2:30 p.m.
Managing Dynamic Networked Cloud Infrastructure for Data Driven Scientific Workflows Using Proactive Introspection
Presenters: Anirban Mandal, Paul Ruth, Ilya Baldin, Yufeng Xin, and Claris Castillo, RENCI; Gideon Juve, Mats Rynge, and Ewa Deelman, University of Southern California Information Sciences Institute; Jeff Chase, Duke University
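Description: This demonstration will showcase a novel, dynamically adaptable cloud infrastructure driven by the demands of a data-driven scientific workflow. It will use resources from ExoGENI – a Networked Infrastructure-as-a-Service (NIaaS) test bed funded through the National Science Foundation’s Global Environment for Network Innovations (GENI) project. The demonstration will connect compute and data resources in the RENCI SC14 booth to a large, dynamically provisioned ‘slice’ spanning multiple ExoGENI cloud sites interconnected using dynamically provisioned connections from Internet2 and ESnet. The slice will be used to execute a scientific workflow driven from a computer in the RENCI SC14 booth connected to the slice via SCinet. The demonstration team will show the features of “ShadowQ,” an entity that predicts the future resource needs of a workflow and runs alongside the Pegasus workflow management system. This workflow introspection feature will be used to adapt the slice to the demands of the workflow as it executes by adding on-ramps and adjusting the amount of resources used. The team will use a genomic workflow as a driving example to demonstrate how workflows can leverage NIaaS systems, how these workflow applications can provision resources automatically, and how they can be executed and monitored end-to-end on dynamic NIaaS.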
2:30 p.m. – 3:30 p.m.
Executing Genomic Workflows on Federated IaaS Technology
Presenters: Charles Schmitt, Fan Jiang
Description: Research with large-scale genomic data involves intensive computations, orchestration of numerous long-running workflows, and data-centric collaborations with experts worldwide. In this work, we explore federated Infrastructure-as-a-Service (IaaS) as an enabling technology. Federated IaaS offers multiple benefits for genomic research, but to date researchers have lacked the underlying capabilities to write workflows in a general way and port them onto on-demand, federated IaaS. We demonstrate running genomic workflows through Pegasus and Condor on the ExoGENI infrastructure.
3:30 p.m. – 4 p.m.
The National Consortium for Data Science
Informational slide presentation
Description: The National Consortium for Data Science (NCDS) launched in April 2013 as a public-private partnership to address the challenges and opportunities posed by massive data sets being created by digital medicine, environmental sensors, scientific instruments, social networks, and more. Its goals include: encouraging collaboration among industry, academia and government on data science research and problem solving; providing members with access to experts in other fields and domains to help address their data challenges; encouraging data science research that spans academia, industry and government; facilitating improved data science education; and supporting technical, ethical and policy standards for data. Members include research universities in North Carolina, Drexel University, Cisco, Deloitte LLP, GE, IBM, MCNC and RTI International. This presentation provides an overview of the benefits of membership and NCDS activities aimed at advancing data science.
4:30 p.m. – 5 p.m.
DataBridge: Building Connections in Scientific Data Sets
Presenter: Howard Lander
Description: The DataBridge project is a National Science Foundation-funded effort to connect “dark” scientific data in a sociometric network. DataBridge uses standard and community-provided algorithms in network analysis and similarity detection to enable scientists to discover connections between data sets and maximize the utility of data sets that are otherwise difficult to find.
5 p.m. – 5:30 p.m.
Data Science@RENCI
Informational slide presentation
Description: RENCI, a research institute of UNC-Chapel Hill with links to Duke University and North Carolina State University, provides cyber tools and technologies that enable better research and business innovation. A key focus at RENCI is developing software and other tools that help researchers access, share, manage, analyze and archive data in order to make research discoveries and ensure that research data is accessible by future scientists and scientists in different domains. This presentation offers highlights of some of RENCI’s key data-related efforts, including the National Science Foundation-funded DataBridge and DataNet Federation Consortium, the Secure Medical Workspace, and the iRODS Consortium.
Thursday, November 20
10:30 a.m. – noon
iRODS Workshop
Presenters: The iRODS@RENCI team
Description: Bring a friend. Bring a VM! This workshop is an opportunity to explore the distributed architecture and powerful features of iRODS. Through hands-on exercises, the iRODS development team will walk participants through the various ways to automate the management of big data and metadata. Come to ask questions, come to learn, come to watch some typing in a terminal. This workshop provides the opportunity to experiment with iRODS in real time, alongside the people who develop iRODS.
1 p.m. – 2 p.m.
The National Consortium for Data Science
Informational slide presentation
Description: The National Consortium for Data Science (NCDS) launched in April 2013 as a public-private partnership to address the challenges and opportunities posed by massive data sets being created by digital medicine, environmental sensors, scientific instruments, social networks, and more. Its goals include: encouraging collaboration among industry, academia and government on data science research and problem solving; providing members with access to experts in other fields and domains to help address their data challenges; encouraging data science research that spans academia, industry and government; facilitating improved data science education; and supporting technical, ethical and policy standards for data. Members include research universities in North Carolina, Drexel University, Cisco, Deloitte LLP, GE, IBM, MCNC and RTI International. This presentation provides an overview of the benefits of membership and NCDS activities aimed at advancing data science.
2 p.m. – 3 p.m.
Data Science@RENCI
Informational slide presentation
Description: RENCI, a research institute of UNC-Chapel Hill with links to Duke University and North Carolina State University, provides cyber tools and technologies that enable better research and business innovation. A key focus at RENCI is developing software and other tools that help researchers access, share, manage, analyze and archive data in order to make research discoveries and ensure that research data is accessible by future scientists and scientists in different domains. This presentation offers highlights of some of RENCI’s key data-related efforts, including the National Science Foundation-funded DataBridge and DataNet Federation Consortium, the Secure Medical Workspace, and the iRODS Consortium.