Anirban Mandal, Ilia Baldine, Yufeng Xin, Paul Ruth, Chris Heerman, Technical Report TR-13-01, Enabling Persistent Queries for Cross-aggregate Performance Monitoring, Renaissance Computing Institute, 2013.
It is essential for distributed data-intensive applications to monitor the performance of the underlying network, storage, and computational resources. Increasingly, distributed applications need performance information from multiple aggregates, and tools need to make real-time steering decisions based on that performance feedback. With increasing scale and complexity, the volume and velocity of monitoring data are increasing, posing scalability challenges. In this work, we have developed a Persistent Query Agent (PQA) that provides real-time application and network performance feedback to clients/applications, thereby enabling dynamic adaptations. PQA enables federated performance monitoring by interacting with multiple aggregates and performance monitoring sources. Using a publish-subscribe framework, it sends triggers asynchronously to applications/clients when relevant performance events occur. The applications/clients register their events of interest using declarative queries and are notified by the PQA. PQA leverages a complex event processing (CEP) framework for managing and executing the queries, which are expressed in a standard SQL-like query language. Instead of saving all monitoring data for future analysis, PQA observes performance event streams in real time and runs continuous queries over streams of monitoring events. In this work, we present the design and architecture of the persistent query agent and describe some relevant use cases.
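The register-then-notify pattern described in the abstract can be illustrated with a small sketch. This is not the PQA implementation and does not use a CEP engine or an SQL-like query language; it is a toy stand-in written in plain Python, and every name in it (`ToyQueryAgent`, `Event`, `register`, `publish`, the `throughput_mbps` metric, the aggregate names) is hypothetical. It assumes one simple kind of continuous query: notify the subscriber whenever the mean of the last few values of a metric falls below a threshold. Only the current window of values is retained, which mirrors the idea of querying the stream rather than storing all monitoring data.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List


@dataclass
class Event:
    source: str   # hypothetical name of an aggregate or monitoring source
    metric: str   # hypothetical metric name, e.g. "throughput_mbps"
    value: float


@dataclass
class _Query:
    metric: str
    window: int
    threshold: float
    callback: Callable[[str, float], None]
    values: Deque[float]


class ToyQueryAgent:
    """Illustrative stand-in for a persistent query agent (not the PQA)."""

    def __init__(self) -> None:
        self._queries: List[_Query] = []

    def register(self, metric: str, window: int, threshold: float,
                 callback: Callable[[str, float], None]) -> None:
        """Subscribe to: mean of the last `window` values of `metric` < `threshold`."""
        self._queries.append(
            _Query(metric, window, threshold, callback, deque(maxlen=window))
        )

    def publish(self, event: Event) -> None:
        """Feed one monitoring event to every registered query."""
        for q in self._queries:
            if q.metric != event.metric:
                continue
            q.values.append(event.value)  # oldest value is dropped automatically
            if len(q.values) == q.window:
                mean = sum(q.values) / q.window
                if mean < q.threshold:
                    q.callback(q.metric, mean)  # the "trigger" sent to the client


# Usage: a client registers interest, then events arrive from two sources.
notifications = []
agent = ToyQueryAgent()
agent.register("throughput_mbps", window=3, threshold=100.0,
               callback=lambda metric, mean: notifications.append((metric, mean)))

readings = [("aggregate-a", 150.0), ("aggregate-b", 120.0), ("aggregate-a", 90.0),
            ("aggregate-b", 60.0), ("aggregate-a", 30.0)]
for source, value in readings:
    agent.publish(Event(source, "throughput_mbps", value))

print(notifications)
```

In this sketch the callback runs synchronously inside `publish`; an agent that sends triggers asynchronously, as the abstract describes, would instead hand the notification to a messaging layer so that event ingestion is not blocked by slow subscribers.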