Abstract: The RISC revolution has spurred the development of processors with increasing levels of instruction level parallelism (ILP). In order to realize the full potential of these processors, multiple instructions must be issued and executed in a single cycle. Consequently, instruction scheduling plays a crucial role as an optimization in this context. While early attempts at instruction scheduling were limited to compile-time approaches, the recent trend is to provide dynamic support in hardware. In this paper, we present the results of a detailed comparative study of the performance advantages to be derived by the spectrum of instruction scheduling approaches: from limited basic-block schedulers in the compiler, to novel and aggressive run-time schedulers in hardware. A significant portion of our experimental study via simulations, is devoted to understanding the performance advantages of run-time scheduling. Our results indicate it to be effective in extracting the ILP inherent to the program trace being scheduled, over a wide range of machine and program parameters. Furthermore, we also show that this effectiveness can be further enhanced by a simple basic-block scheduler in the compiler, which optimizes for the presence of the run-time scheduler in the target; current basic-block schedulers are not designed to take advantage of this feature. We demonstrate this fact by presenting a novel enhanced basic-block scheduler in this paper. Finally, we outline a simple analytical characterization of the performance advantage, that run-time schedulers have to offer.

Key words: Compile-time Optimizations, Dynamic Schedulers, Instruction Scheduling, Program Traces, Scope, Superscalar Processors