CSCI-UA.0436 - Prof. Grishman

Lecture 17: Pipelining (cont'd) and Multiple Issue

Pipelining:  Control Hazards (Text 4.8)

Conceptually, the simplest solution is to stall after a branch, waiting until the branch has been resolved (decided).  That exacts a heavy speed penalty.  Almost as simple is to assume that a branch is not taken and to continue issuing instructions along that path.  We will know whether the branch is taken before any subsequent instructions store results into registers or memory.  If it turns out that the branch is taken, we flush the pipeline (turn all operations in the pipeline into no-operations), reset the PC, and continue.

With the standard pipeline, we discard 3 instructions (lose 3 cycles) if a branch is taken.  P&H show how this can be reduced to one cycle by making the branch decision earlier.  But again, as the pipeline becomes longer, the problem gets worse.
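
As a rough illustration of what these penalties cost, the sketch below computes the effective CPI under the assume-not-taken policy, where only taken branches pay the flush penalty.  The workload numbers (20% branches, 60% of them taken) are made-up values for illustration, not figures from the text, and the function name effective_cpi is hypothetical.

```python
def effective_cpi(base_cpi, branch_fraction, taken_fraction, taken_penalty):
    """CPI when branches are assumed not taken.

    Only taken branches cause a flush, so the average extra cost per
    instruction is branch_fraction * taken_fraction * taken_penalty.
    """
    return base_cpi + branch_fraction * taken_fraction * taken_penalty


# Hypothetical workload: 20% of instructions are branches, 60% of those are taken.
cpi_late = effective_cpi(1.0, 0.20, 0.60, 3)   # branch decided late: 3 cycles lost
cpi_early = effective_cpi(1.0, 0.20, 0.60, 1)  # branch decided early: 1 cycle lost

print(f"{cpi_late:.2f} {cpi_early:.2f}")  # prints "1.36 1.12"
```

Under these assumed numbers, moving the branch decision earlier cuts the branch overhead from 0.36 to 0.12 cycles per instruction.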

The better we can predict whether a branch is taken, the smaller the penalty.  This requires dynamic branch prediction ... keeping track of which branches were previously taken with a branch history table.  Keeping 1 bit for each recent branch instruction already helps a lot;  a 2-bit history (Fig. 4.63) does even better.
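
To see why the second history bit helps, the sketch below simulates a single branch with a saturating counter of 1 or 2 bits.  This is a common saturating-counter formulation and may differ in detail from the state diagram in the text; the starting state (strongly not taken), the function name simulate, and the loop pattern are assumptions chosen for illustration.

```python
def simulate(outcomes, bits):
    """Count mispredictions for one branch using a saturating counter.

    outcomes: sequence of booleans, True meaning the branch was taken.
    The counter starts at 0 (strongly not taken, an assumption) and we
    predict "taken" when the counter is in the upper half of its range.
    With bits=1 this simply predicts whatever the branch did last time.
    """
    max_state = (1 << bits) - 1
    threshold = 1 << (bits - 1)
    state = 0
    misses = 0
    for taken in outcomes:
        predicted_taken = state >= threshold
        if predicted_taken != taken:
            misses += 1
        # Move toward "taken" or "not taken", saturating at the ends.
        state = min(state + 1, max_state) if taken else max(state - 1, 0)
    return misses


# A loop-closing branch: taken 9 times, then not taken once; the loop runs 10 times.
outcomes = ([True] * 9 + [False]) * 10

print(simulate(outcomes, 1))  # prints 20
print(simulate(outcomes, 2))  # prints 12
```

With 1 bit, every loop exit causes two mispredictions: one on the exit itself and one on the first iteration of the next run.  With 2 bits, a single not-taken outcome does not flip the prediction, so after warm-up only the exit is mispredicted.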

Some architectures (including, notably, ARM) provide conditional execution of individual instructions ("predication").  ARM has a 4-bit condition code;  each instruction has a 4-bit field indicating the conditions under which the instruction is to be executed.  In some cases, this avoids the need to flush the pipeline.

Note:  for the final exam, you will be expected to examine sequences of instructions and answer questions about the effect on performance:  when data forwarding is sufficient, when a stall is required, how many cycles are lost, whether a branch is correctly predicted.  You will not be asked about the details of how forwarding, stalling, and branch prediction are implemented in MIPS.

Multiple issue

Pipelining takes advantage of instruction-level parallelism ... the ability to execute more than one instruction at a time.

Some machines now try to go beyond pipelining to execute more than one instruction per clock cycle, producing an effective CPI < 1.  This is possible if we duplicate some of the functional parts of the processor (e.g., have two ALUs or a register file with 4 read ports and 2 write ports), and have logic to issue several instructions concurrently.  There are two general approaches to multiple issue:  static multiple issue (where the scheduling is done at compile time) and dynamic multiple issue (where the scheduling is done at execution time), also known as superscalar.  Intel Core 2 processors are superscalar and can issue up to 4 instructions per clock cycle.
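
The sketch below illustrates how issuing several instructions per cycle can push CPI below 1, and how data dependences limit the gain.  It is a deliberately simplified in-order model, not a description of any real processor:  it checks only read-after-write dependences within an issue group, assumes a result is available to any instruction issued in a later cycle, and ignores structural limits other than the issue width.  The function name issue_cycles and the register names are hypothetical.

```python
def issue_cycles(program, width):
    """Count cycles needed to issue `program` in order, up to `width` per cycle.

    Each instruction is a (dest, sources) pair of register names.
    An instruction may not issue in the same cycle as an earlier
    instruction whose result it reads; it starts the next cycle instead.
    """
    cycles = 0
    i = 0
    while i < len(program):
        written = set()  # registers written by instructions already in this group
        issued = 0
        while i < len(program) and issued < width:
            dest, sources = program[i]
            if written & set(sources):
                break  # depends on this group: wait for the next cycle
            written.add(dest)
            issued += 1
            i += 1
        cycles += 1
    return cycles


program = [
    ("r1", ("r2", "r3")),  # r1 = r2 + r3
    ("r4", ("r1", "r5")),  # r4 = r1 + r5   (needs r1 from the previous instruction)
    ("r6", ("r7", "r8")),  # r6 = r7 + r8   (independent)
    ("r9", ("r6", "r4")),  # r9 = r6 + r4   (needs r6 and r4)
]

for width in (1, 2):
    cycles = issue_cycles(program, width)
    print(width, cycles, cycles / len(program))
# prints:
# 1 4 1.0
# 2 3 0.75
```

In this example the 2-wide machine reaches a CPI of 0.75 rather than the ideal 0.5, because two of the four instructions must wait for a result produced by an instruction they would otherwise be paired with.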

Static Multiple Issue

Dynamic Multiple Issue (superscalar)

Intel superscalar microarchitectures