Abstract:
The RISC revolution has spurred the development of processors with increasing
levels of instruction-level parallelism (ILP). In order to realize the full
potential of these processors, multiple instructions must be issued and
executed in a single cycle. Consequently, instruction scheduling plays a
crucial role as an optimization in this context. While early attempts at
instruction scheduling were limited to compile-time approaches, the recent
trend is to provide dynamic support in hardware. In this paper, we present the
results of a detailed comparative study of the performance advantages offered
by a spectrum of instruction scheduling approaches: from limited
basic-block schedulers in the compiler, to novel and aggressive run-time
schedulers in hardware. A significant portion of our simulation-based
experimental study is devoted to understanding the performance advantages of
run-time scheduling. Our results indicate that it is effective in extracting
the ILP inherent in the program trace being scheduled, over a wide range of
machine and program parameters. We also show that this effectiveness can be
further enhanced by a simple basic-block scheduler in the compiler, which
optimizes for the presence of the run-time scheduler in the target; current
basic-block schedulers are not designed to take advantage of this feature. We
demonstrate this fact by presenting a novel enhanced basic-block scheduler in
this paper. Finally, we outline a simple analytical characterization of the
performance advantage that run-time schedulers offer.
Key words: Compile-time Optimizations, Dynamic Schedulers, Instruction
Scheduling, Program Traces, Scope, Superscalar Processors