TR-12-02 Adaptive Scheduling Using Performance Introspection

Allan Porterfield, Rob Fowler, Anirban Mandal, David O’Brien, Stephen L. Olivier, Michael Spiegel, Technical Report TR-12-02, Adaptive Scheduling Using Performance Introspection, Renaissance Computing Institute, 2012.

As energy becomes a driving force in High Performance Computing, determining when and how energy can be saved without impacting performance is a key goal for both HPC hardware and software. Scalability studies have shown that some memory-bound applications do not scale as the thread count increases, and in some cases performance degrades. Adaptive Scheduling recognizes when an application is in a memory-bound region and throttles the number of active hardware threads. Our RCRdaemon tool acquires hardware performance counter measurements in near real time. A simple hardware model added to the Qthreads runtime system reads the collected data to determine when memory contention exists. Using that information, our extension to the Qthreads scheduler reduces contention by throttling hardware threads. Adaptive Scheduling has very low performance impact both for memory-bound benchmarks (below 4.2%) and for compute-bound benchmarks (2.4% – 3.7%).
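The throttling decision described above can be illustrated with a minimal sketch. It is not the RCRdaemon or Qthreads interface: the class name `ThrottleModel`, the normalized contention value, the two watermarks, and the use of hysteresis are all assumptions made for illustration, and Python is used only for brevity (the Qthreads runtime itself is a C library).

```python
from dataclasses import dataclass


@dataclass
class ThrottleModel:
    """Hypothetical contention model; names and thresholds are illustrative."""
    max_threads: int        # hardware threads used in compute-bound regions
    throttled_threads: int  # hardware threads used in memory-bound regions
    high_watermark: float   # assumed contention level that triggers throttling
    low_watermark: float    # assumed contention level that lifts throttling
    throttled: bool = False

    def update(self, memory_contention: float) -> int:
        """Consume one contention sample and return the active thread count."""
        if not self.throttled and memory_contention >= self.high_watermark:
            self.throttled = True
        elif self.throttled and memory_contention <= self.low_watermark:
            self.throttled = False
        return self.throttled_threads if self.throttled else self.max_threads


def active_thread_trace(model: ThrottleModel, samples: list[float]) -> list[int]:
    """Apply the model to a sequence of contention samples."""
    return [model.update(sample) for sample in samples]


if __name__ == "__main__":
    model = ThrottleModel(max_threads=16, throttled_threads=12,
                          high_watermark=0.8, low_watermark=0.5)
    print(active_thread_trace(model, [0.2, 0.9, 0.7, 0.4, 0.6]))
```

Two separate watermarks are one plausible way to keep the thread count from oscillating when the measured contention hovers near a single threshold; the report's abstract does not say whether the actual model works this way.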

For these techniques to reduce energy costs, additional hardware energy features will be required. Applications using Adaptive Scheduling can transition from memory-bound to compute-bound regions hundreds of times a second. Hardware mechanisms or instructions to allow energy savings during the short memory-bound regions could be used effectively by multithreaded software to reduce the overall power requirements for memory-bound applications.