Understanding heterogeneous distributed job flow on computational grids can be a daunting task. Machine and network states constantly change as a function of job load, queue size, network utilization and other parameters. Obtaining peak efficiency in this environment requires visual monitoring and analysis tools that help make sense of the complex computational topography. One such tool, developed at RENCI, looks specifically at job flows on the Open Science Grid (OSG) and the Linked Environments for Atmospheric Discovery (LEAD) portal.
Through the OSG engagement project, RENCI built a layer on top of the OSG infrastructure that matches jobs to OSG resources. The target audience for the system is members of small labs and experiments who do not have the resources to host or maintain their own infrastructure. These images show the match-maker in action, including matching jobs to resources over high-speed networks across the United States, dynamically ranking the sites according to responsiveness and job failures, and handling failed jobs automatically by moving them to other sites.
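The ranking and failover behavior described above can be illustrated with a minimal Python sketch. It is not the RENCI match-maker itself: the `Site` class, the scoring formula, and the `submit` callback are all hypothetical, chosen only to show how responsiveness and job failures might feed a site rank and how a failed job might be moved to the next-best site.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical record of what a match-maker might track per site."""
    name: str
    avg_response_s: float  # assumed responsiveness measure; lower is better
    failures: int = 0
    completed: int = 0

    def score(self) -> float:
        """Higher is better. The formula is an illustrative assumption."""
        total = self.failures + self.completed
        failure_rate = self.failures / total if total else 0.0
        return (1.0 - failure_rate) / (1.0 + self.avg_response_s)


def rank_sites(sites):
    """Return the sites ordered from best to worst score."""
    return sorted(sites, key=lambda s: s.score(), reverse=True)


def run_with_failover(job, sites, submit):
    """Try the job at each site in rank order until one succeeds.

    `submit(job, site)` is a caller-supplied function that returns True
    on success. Returns the name of the site that ran the job, or None.
    """
    for site in rank_sites(sites):
        if submit(job, site):
            site.completed += 1
            return site.name
        site.failures += 1  # a failure lowers this site's future rank
    return None
```

Because each failure is recorded on the site, a later call to `rank_sites` reflects it, which is one simple way to obtain the dynamic ranking the text describes.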
The project connects to a live feed that provides details on work flow on the OSG and LEAD, grid computing environments that enable researchers to use computing resources at many locations. Computing sites are shown as circles anchored to their physical locations and colored by their site ranks. The rankings are used to determine the best locations to send jobs. Jobs are shown as cylinders colored by state (queued, running, and so forth).
The visualization shows how many jobs are running at a given site by stacking them at that site. LEAD work flows also show network connection bandwidth between sites by the thickness of the green lines connecting them. Data transfer is shown as streams of ellipsoids, scaled by the size of the data transfer.
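The visual encodings described above amount to a handful of mappings from data values to drawing attributes. The Python sketch below shows one way such mappings could be written. The color palette, pixel ranges, bandwidth ceiling, and logarithmic scaling are all assumptions made for illustration, since the description states only which quantity drives which visual attribute.

```python
import math

# Hypothetical RGB palette; the description says only that jobs are colored by state.
JOB_STATE_COLORS = {
    "queued": (1.0, 0.8, 0.0),
    "running": (0.0, 0.6, 1.0),
    "failed": (1.0, 0.0, 0.0),
}
UNKNOWN_STATE_COLOR = (0.5, 0.5, 0.5)


def job_color(state):
    """Look up a job cylinder's color, falling back to gray for unknown states."""
    return JOB_STATE_COLORS.get(state, UNKNOWN_STATE_COLOR)


def site_color(rank_fraction):
    """Blend from red (worst, 0.0) to green (best, 1.0) for a normalized site rank."""
    t = min(max(rank_fraction, 0.0), 1.0)
    return (1.0 - t, t, 0.0)


def link_thickness(bandwidth_mbps, min_px=1.0, max_px=8.0, max_mbps=10_000.0):
    """Scale line thickness linearly with bandwidth, clamped to a pixel range."""
    t = min(max(bandwidth_mbps / max_mbps, 0.0), 1.0)
    return min_px + t * (max_px - min_px)


def transfer_scale(size_bytes, base=0.2):
    """Grow ellipsoid scale with the log of the transfer size (an assumed choice)."""
    return base * (1.0 + math.log10(max(size_bytes, 1)))


def job_stack_heights(n_jobs, cylinder_height=1.0):
    """Heights at which to draw each job cylinder stacked at a site."""
    return [i * cylinder_height for i in range(n_jobs)]
```

Logarithmic scaling is used for transfer size here only so that very large transfers do not dwarf small ones on screen. A linear scale would match the description equally well.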
This visual application enables RENCI researchers, who are creating a more user-friendly layer on top of OSG and LEAD, to access the traffic on these grid environments, monitor them, and determine whether their software is running correctly. The application also helps explain how OSG and LEAD work.