TR-11-02 Scheduling OpenMP for Qthreads with MAESTRO

Allan Porterfield, Rob Fowler, Paul Horst, David O’Brien, Stephen Olivier, Kyle Wheeler, Brad Viviano, Technical Report TR-11-02, Scheduling OpenMP for Qthreads with MAESTRO, Renaissance Computing Institute, September, 2011.

Obtaining good performance from modern multi-core and many-core processors requires understanding the dynamic performance of the resources shared by multiple cores. On a single-core system, performance depends only on how threads interact with the core on which they are executing. Multi-core and many-core systems complicate this by adding resources (e.g., L3 cache, I/O, and network access) that are shared by multiple cores. The usage of these resources may be influenced by other threads within an application or by other concurrent programs. Efficient scheduling will therefore require a dynamic scheduler. Fortunately, the increasing number of cores (and the increasing frequency of non-ALU bottlenecks) gives the scheduler an opportunity to acquire the resources necessary for dynamic monitoring and modeling.

MAESTRO includes an experimental scheduler, built on top of the Qthreads runtime, to explore the potential benefits of dynamic performance monitoring and modeling for application performance. The idea is to use computational resources that would otherwise be idle (because of memory bottlenecks) to measure and model system performance. The scheduler communicates with a dynamic performance model to understand the dynamic state of the system, and scheduling decisions use that knowledge to better determine which threads should be executing and where. Qthreads already has a concept of locality (shepherds) with which to reason about shared resources. Building on the shepherd concept, MAESTRO supports hierarchical work stealing both within a shepherd (intra-shepherd) and between shepherds (inter-shepherd), improving dynamic cache hit rates while reducing the number of expensive remote steal operations.
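
The search order implied by hierarchical work stealing can be illustrated with a small, single-threaded sketch. It is not the MAESTRO or Qthreads implementation: the types, the function names (`push_task`, `next_task`), the fixed queue sizes, and the use of one queue per worker are all assumptions made for illustration, and a real scheduler would need synchronization. A worker first takes its own newest task, then steals the oldest task from a sibling in the same shepherd, and only then attempts a more expensive steal from a remote shepherd.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_SHEPHERDS 2
#define WORKERS_PER_SHEPHERD 2
#define QUEUE_CAP 16

/* Hypothetical types for illustration; not the Qthreads or MAESTRO API. */
typedef struct {
    int tasks[QUEUE_CAP];
    size_t count;
} work_queue;

/* Where a task was found, from cheapest to most expensive source. */
typedef enum {
    SRC_NONE,
    SRC_LOCAL,
    SRC_INTRA_SHEPHERD,
    SRC_REMOTE_SHEPHERD
} task_source;

/* One queue per worker, grouped by shepherd (an assumption of this sketch). */
static work_queue queues[NUM_SHEPHERDS][WORKERS_PER_SHEPHERD];

/* Append a task to a worker's queue; returns 0 if the queue is full. */
int push_task(int shep, int worker, int task) {
    assert(shep >= 0 && shep < NUM_SHEPHERDS);
    assert(worker >= 0 && worker < WORKERS_PER_SHEPHERD);
    work_queue *q = &queues[shep][worker];
    if (q->count == QUEUE_CAP) return 0;
    q->tasks[q->count++] = task;
    return 1;
}

/* The owner takes its newest task (LIFO), which is most likely still in cache. */
static int pop_newest(work_queue *q, int *task) {
    if (q->count == 0) return 0;
    *task = q->tasks[--q->count];
    return 1;
}

/* A thief takes the oldest task (FIFO) and shifts the rest down. */
static int steal_oldest(work_queue *q, int *task) {
    if (q->count == 0) return 0;
    *task = q->tasks[0];
    for (size_t i = 1; i < q->count; i++) q->tasks[i - 1] = q->tasks[i];
    q->count--;
    return 1;
}

/* Hierarchical search: own queue, then same shepherd, then other shepherds. */
task_source next_task(int shep, int worker, int *task) {
    if (pop_newest(&queues[shep][worker], task)) return SRC_LOCAL;

    for (int w = 0; w < WORKERS_PER_SHEPHERD; w++)
        if (w != worker && steal_oldest(&queues[shep][w], task))
            return SRC_INTRA_SHEPHERD;

    for (int s = 0; s < NUM_SHEPHERDS; s++) {
        if (s == shep) continue;
        for (int w = 0; w < WORKERS_PER_SHEPHERD; w++)
            if (steal_oldest(&queues[s][w], task))
                return SRC_REMOTE_SHEPHERD;
    }
    return SRC_NONE;
}
```

In this sketch, a caller would invoke `next_task(shep, worker, &task)` in a loop and use the returned `task_source` to count how often the expensive remote path is taken.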

MAESTRO also extends Qthreads with the XOMP interface. XOMP is generated by the ROSE source-to-source translator to handle OpenMP (version 3.0) input files. The ROSE/Qthreads extension allows most C and C++ OpenMP applications to use the Qthreads runtime.
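
As an illustration of the kind of OpenMP 3.0 input this path is meant to accept, the following is a minimal sketch of a task-parallel C function. It uses only standard OpenMP 3.0 directives (`parallel`, `single`, `task`, `taskwait`); the function names `fib` and `parallel_fib` are chosen for this example, and how such a file is translated by ROSE and linked against Qthreads is not shown here.

```c
#include <assert.h>

/* Recursive Fibonacci in which each recursive call becomes an OpenMP task. */
long fib(int n) {
    if (n < 2) return n;
    long a, b;
    /* a and b must be shared so the child tasks write the parent's variables. */
    #pragma omp task shared(a) firstprivate(n)
    a = fib(n - 1);
    #pragma omp task shared(b) firstprivate(n)
    b = fib(n - 2);
    /* Wait for both child tasks before combining their results. */
    #pragma omp taskwait
    return a + b;
}

/* Create a thread team, then let a single thread spawn the root of the task tree. */
long parallel_fib(int n) {
    assert(n >= 0);
    long result = 0;
    #pragma omp parallel
    {
        #pragma omp single
        result = fib(n);
    }
    return result;
}
```

Fine-grained recursive tasks like these produce many small units of work, which is the situation in which the choice of task scheduler matters most. If the file is compiled without OpenMP support, the pragmas are ignored and the code runs serially with the same result.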