Overview
Grids hold great promise for researchers who need to connect to remote computers, databases and other resources. However, they can be complex and unreliable, and even low-level operations can require a huge time investment. The Virtual Grid Application Development Software (VGrADS) project was created to keep these problems from limiting the potential of grids and distributed resources.
Based on work from the earlier GrADS project, VGrADS collaborators developed software tools to simplify and accelerate the development of grid applications and services, while delivering high levels of performance and resource efficiency. RENCI’s role in the VGrADS project focused on using temporal-based reasoning to qualitatively assess, diagnose and adapt long-running applications. RENCI also investigated qualitative metrics for implementing a multi-level fault tolerance strategy, especially in the context of workflows that have strict deadlines, such as weather forecasting. RENCI was also involved in implementing and evaluating different fault-tolerance and recovery mechanisms for such workflows.
The VGrADS project completed its work in September 2009.
Funding
National Science Foundation Cooperative Agreement No. CCR-0331645, issued to Rice University, with a subagreement to the University of North Carolina at Chapel Hill.
Co-Principal Investigators
- Fran Berman, University of California at San Diego
- Henri Casanova, University of Hawaii
- Keith Cooper, Chuck Koelbel, Richard Tapia, Linda Torczon, Rice University
- Jack Dongarra, University of Tennessee at Knoxville
- Lennart Johnsson, University of Houston
- Carl Kesselman, University of Southern California Information Sciences Institute
- Richard Wolski, University of California at Santa Barbara
Project Team
- Daniel Reed (co-PI until Dec-07)
- Anirban Mandal (co-PI, Dec-07 to Sep-09)
- Gopi Kandaswamy
- Emma Buneci (Student)
Publications
L. Ramakrishnan, D. Nurmi, A. Mandal, C. Koelbel, D. Gannon, T. M. Huang, Y. S. Kee, G. Obertelli, K. Thyagaraja, R. Wolski, A. YarKhan and D. Zagorodnov, “VGrADS: Enabling e-Science Workflows on Grids and Clouds with Fault Tolerance,” in Proceedings of the IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC), November 2009.
R. Zhang, A. Mandal, C. Koelbel and K. Cooper, “Combined Fault Tolerance and Scheduling Techniques for Workflow Applications on Computational Grids,” in Proceedings of the IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid), pp. 244-251, May 2009.
G. Kandaswamy, A. Mandal and D. A. Reed, “Fault Tolerance and Recovery of Scientific Workflows on Computational Grids,” in Proceedings of the IEEE International Symposium on Cluster Computing and the Grid (CCGrid), pp. 777-782, May 2008.
F. Berman, H. Casanova, A. Chien, K. Cooper, H. Dail, A. Dasgupta, W. Deng, J. Dongarra, L. Johnsson, K. Kennedy, C. Koelbel, B. Liu, X. Liu, A. Mandal, G. Marin, M. Mazina, J. Mellor-Crummey, C. Mendes, A. Olugbile, M. Patel, D. Reed, Z. Shi, O. Sievert, H. Xia and A. YarKhan, “New Grid Scheduling and Rescheduling Methods in the GrADS Project,” in International Journal of Parallel Programming (IJPP), vol. 33(2-3), pp. 209-229, 2005.
Presentations
“VGrADS: Enabling e-Science Workflows on Grids and Clouds with Fault Tolerance” – Presentation (remote) at RENCI booth at Supercomputing (SC 2009), November 2009
Anirban Mandal, “Fault Tolerance and Recovery of Scientific Workflows on Computational Grids” – Presentation at International Symposium on Cluster Computing and the Grid (CCGrid 2008), May 2008
Anirban Mandal, “Fault-tolerance on Slots” – Presentation at VGrADS All Hands meeting, April 2008
Anirban Mandal, “Virtual Grid Execution System: Fault Tolerance Planning and Run-time Rescheduling of Scientific Workflows” – Presentation at RENCI booth at Supercomputing (SC 2007), November 2007
Anirban Mandal, Gopi Kandaswamy and Daniel Reed, “Fault tolerance and Recovery for Grid Workflow Systems” – Presentation at VGrADS All Hands meeting, April 2007
Daniel A. Reed, presentation on Fault Tolerance at VGrADS All Hands Meeting, September 2005
Lavanya Ramakrishnan, presentation on Linked Environments for Atmospheric Discovery at VGrADS All Hands Meeting, September 2005
Partners
- Rice University
- University of California at San Diego
- University of California at Santa Barbara
- University of Houston
- University of North Carolina at Chapel Hill
- University of Southern California Information Sciences Institute
- University of Tennessee at Knoxville
- RENCI