SciDAC PERI

Overview

The Performance Evaluation Research Institute (PERI) is an Integrated Software Infrastructure Center (ISIC) dedicated to improving the performance of high-end computers. It was established under the Department of Energy’s Scientific Discovery through Advanced Computing (SciDAC) program. The center is developing a science for understanding the performance of scientific applications on high-end computing systems, along with engineering strategies for improving performance on these systems. The project integrates several active efforts in the high performance computing community and is forging alliances with application scientists working on DOE Office of Science missions to ensure that the resulting techniques and tools are truly useful to end users. Its focus is on how best to execute a specific application on a given platform.

PERI research emphasizes three areas: (1) performance modeling and prediction; (2) automatic performance optimization; and (3) performance engineering of high-profile applications. The performance modeling and prediction activity develops and refines performance models for given applications or computer systems, significantly reducing the cost of collecting the data on which the models are based while increasing model fidelity, speed, and generality. Spurred by strong demand from the HPC community, the institute is also devoting substantial effort to research on automatic tuning software. The goal is to provide methodologies for automatic performance tuning that build upon PERI's performance modeling and prediction techniques and substantially reduce the tuning burden on the application programmer. In addition, PERI commits significant resources to application engagement: working directly with selected application projects, targeting performance improvements on large-scale systems, and facilitating major scientific advances.

Power consumption and heat dissipation are growing challenges for large-scale systems. With tens to hundreds of thousands of processors, these systems consume megawatts of power, especially when executing computation-intensive DOE workloads. The RENCI team builds power and performance measurement infrastructure and develops power management and scheduling techniques that adaptively reduce power demand with minimal effect on execution time. The team also investigates the likelihood of system failures arising from large numbers of interacting components and develops a failure prediction toolkit that uses health measurement data to predict failures. Integrating fault tolerance with performance and power optimization, the team applies multi-objective functions to balance these competing goals and creates prototype software for runtime performance tuning and adaptation of DOE codes.

Funding

U.S. Department of Energy under Award No. DE-FC02-01ER25488 and No. DE-FC02-04ER25612

Project Directors

  • Robert Lucas, University of Southern California
  • David Bailey, Lawrence Berkeley National Laboratory

RENCI Project Team

  • Rob Fowler, RENCI project leader
  • Todd Gamblin
  • Min Yeol Lim
  • Anirban Mandal
  • Allan Porterfield
  • Dan Reed
  • Jeff Tilson
  • Bradley Viviano
  • Ying Zhang

Partner Institutions

Links

The official DOE SciDAC website
The SciDAC PERI website
PERI Publication Page

Publications

D. Gunter, K. Huck, K. Karavanic, J. May, A. Malony, K. Mohror, S. Moore, A. Morris, S. Shende, V. Taylor, X. Wu, and Y. Zhang, “Performance Database Technology for SciDAC Applications”, SciDAC 2007, Boston, MA, June 2007

Daniel A. Reed, C. Lu and C. L. Mendes. “Reliability Challenges in Large Systems,” Future Generation Computer Systems, Spring 2005.

Charng-da Lu, Daniel Reed, “Assessing Fault Sensitivity in MPI Applications,” SC2004 Best Technical Paper, Proceedings of Supercomputing 2004, Pittsburgh, PA, November 2004.

Karthik Pattabiraman. “Design and Evaluation of a Power-Aware Parallel I/O System,” Master’s Thesis in Computer Science, University of Illinois at Urbana-Champaign, 2004.

Daniel A. Reed, C. L. Mendes and Charng-da Lu. “Intelligent Application Tuning and Adaptation,” In I. Foster & C. Kesselman (Eds.) The Grid: Blueprint for a New Computing Infrastructure, chapter 1, 2nd edition, Morgan Kaufmann, November 2003.

Daniel A. Reed, Charng-da Lu and Celso Mendes. “Big Systems and Big Reliability Challenges,” Proceedings of Parallel Computing 2003, pages 729-736, Dresden, Germany, September 2003.

Charng-da Lu, Daniel A. Reed. “Compact Application Signatures for Parallel and Distributed Scientific Codes,” SC2002 Technical Paper, Proceedings of Supercomputing 2002, Baltimore, MD, November 2002.

Celso Mendes, Daniel A. Reed, “Monitoring Large Systems via Statistical Sampling,” Proceedings of the LACSI Symposium, Santa Fe, NM, October 2002.

Ying Zhang, “SvPablo: Performance Analysis on BlueGene/L,” SIAM PP2006, San Francisco, CA, February 2006.

Ying Zhang, Shirley Moore, “Performance Analysis Using SvPablo and KOJAK,” 6th Linux Cluster Institute, Chapel Hill, NC, April 2005.

Daniel A. Reed, “Computing – An Intellectual Lever for Multidisciplinary Discovery,” keynote address at SC2004, Pittsburgh, PA, November 2004.

Ying Zhang, “Methods for Performance Engineering of Scientific Application,” tutorial at SC2004, Pittsburgh, PA, November 2004.

Ying Zhang, “SvPablo: A Toolkit for Performance Analysis and Visualization,” demonstration at SC2004, Pittsburgh, PA, November 2004.

Phil Mucci, Celso Mendes and Bronis R. de Supinski, “PERI Tools for Performance Data Gathering and Analysis,” tutorial at SC2003, Phoenix, AZ, November 2003.

Ying Zhang, “PERI Tools – SvPablo,” tutorial at the Eighth IBM Scientific Computing User Group Meeting (ScicomP8), Minneapolis, MN, August 2003.

Ying Zhang, “SvPablo: A Toolkit for Performance Analysis on Parallel Systems,” Eighth IBM Scientific Computing User Group Meeting (ScicomP8), Minneapolis, MN, August 2003.

Celso Mendes, “Tools and Methods for Performance Modeling and Prediction,” tutorial at SC2002, Baltimore, MD, November 2002.

Celso Mendes, “The SvPablo Performance Analysis Tool,” Parallel Tools Consortium 2002 Annual Meeting, Knoxville, TN, September 2002.

Ying Zhang, “SvPablo: A Toolkit for Performance Tuning and Visualization,” tutorial at the Sixth Meeting of the IBM System Scientific Computing User Group (ScicomP6), Berkeley, CA, August 2002.