A Whirlwind Tour through Computer Architecture:  Part III

Honors Computer Systems Organization (Prof. Grishman)


The x86 is an example of a complex instruction set computer (CISC):  it has a large number of instructions, they come in various sizes, they have complex layouts, and some of them do multiple operations in one instruction (some even do loops, like the REP prefix on string instructions).

In contrast, reduced instruction set computers (RISC) have fewer instructions;  all instructions are the same size and have one or a few layouts;  each instruction does one operation;  and all arithmetic operations are done between registers (separate instructions load registers from memory and store registers into memory).  Because the instructions are much more uniform, and close to the basic machine model, it is simpler to create a pipeline for a RISC.  (If different instructions take different numbers of cycles, it is much harder to pipeline.)  The main RISC processors in use today are the SPARC processors used by Sun and the PowerPC processors used by Apple.

CISC machines like the x86 manage to create pipelines by translating complex instructions internally into more RISC-like operations.  By such 'on the fly' translation, they have been able to remain competitive with RISC machines in performance.  (Note the implication for code using simple vs. complex instructions, such as we tried for Assignment 5:  the simple instructions fit better into the pipelined framework, and so run as fast as, or faster than, their more complex counterparts.)
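The translation idea can be illustrated with a toy decoder.  This is only a sketch:  the micro-operation names (load, store), the temporaries tmp0 and tmp1, and the tuple encoding are all made up for illustration, and real x86 decoders are far more involved.  It shows how an instruction with a memory operand breaks into load / operate / store steps, while a register-to-register instruction is already 'RISC-like' and passes through as a single operation.

```python
# Toy model of 'on the fly' translation. The micro-op names and the
# tuple encoding are hypothetical, chosen only for illustration.

def translate(instr):
    """Split one CISC-style (op, dst, src) instruction into RISC-like micro-ops."""
    op, dst, src = instr
    uops = []
    # A memory source operand becomes a separate load into a temporary.
    if src.startswith("["):
        uops.append(("load", "tmp0", src))
        src = "tmp0"
    if dst.startswith("["):
        # Memory destination: load the old value, operate, store the result.
        uops.append(("load", "tmp1", dst))
        uops.append((op, "tmp1", src))
        uops.append(("store", dst, "tmp1"))
    else:
        # Register destination: the operation itself is the only step left.
        uops.append((op, dst, src))
    return uops

simple = translate(("add", "eax", "ebx"))          # one micro-op
complex_ = translate(("add", "[counter]", "eax"))  # load, add, store
print(simple)
print(complex_)
```

The simple instruction maps to one micro-operation and the memory-destination form maps to three, which is consistent with the observation that simple instructions run as fast as, or faster than, their complex counterparts.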


In an effort to squeeze even more performance out of the chip, modern CPUs employ superscalar design, which is one step beyond pipelining.  They have multiple ALUs, and issue more than one instruction in each clock cycle. In terms of our formula, the 'clock cycles per instruction' can go below 1.  But the logic to keep track of hazards becomes even more complex;  more logic is needed to schedule operations than to do them.  And even with complex logic, it is hard to schedule parallel operations 'on the fly'.
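As a small worked example, assume the formula referred to is the usual one, execution time = instruction count x clock cycles per instruction x clock cycle time (this form is an assumption;  the formula itself is not restated in these notes).  The instruction count, clock rate, and CPI values below are invented for illustration.

```python
# Assumed formula: time = instructions * CPI * cycle_time
#                       = instructions * CPI / clock_rate
def execution_time(instruction_count, cpi, clock_hz):
    """Return the run time in seconds under the assumed performance formula."""
    return instruction_count * cpi / clock_hz

n = 1_000_000_000   # one billion instructions (illustrative)
clock = 1e9         # 1 GHz clock (illustrative)

pipelined = execution_time(n, 1.0, clock)    # ideal pipeline: CPI = 1
superscalar = execution_time(n, 0.5, clock)  # two instructions per cycle: CPI = 0.5
print(pipelined, superscalar)
```

With the same program and the same clock, dropping the CPI from 1 to 0.5 halves the run time, which is the gain a superscalar design is after.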

EPIC:  IA-64

The limits of dynamic operation scheduling have led machine designers to consider a very different architecture, explicitly parallel instruction computing (EPIC), exemplified by the Itanium (IA-64 architecture).  EPIC machines have very large instruction words (for the Itanium, 128-bit bundles holding three operations each) which specify several operations to be done in parallel.  Thus the burden of scheduling is shifted from the processor to the compiler, which can spend much more time developing a good schedule (and analyzing data hazards).
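To make 'scheduling by the compiler' concrete, here is a minimal sketch of a greedy scheduler that packs operations into bundles.  It rests on several simplifying assumptions that are not from the notes:  each bundle holds up to three operations that all run in parallel, every operation takes one cycle, and each register is written only once (so only read-after-write hazards need checking).  The function name schedule and the operation format are hypothetical.

```python
def schedule(ops, width=3):
    """Pack ops into bundles of at most `width` parallel operations.

    ops is a list of (name, dest_register, source_registers) in program
    order. Assumes each register is written once, so the only hazard to
    respect is read-after-write.
    """
    bundles = []
    ready_at = {}  # register -> index of the first bundle that may read it
    for name, dest, sources in ops:
        # An operation cannot run before all of its inputs are available.
        slot = max((ready_at.get(r, 0) for r in sources), default=0)
        # Skip past bundles that are already full.
        while slot < len(bundles) and len(bundles[slot]) >= width:
            slot += 1
        while len(bundles) <= slot:
            bundles.append([])
        bundles[slot].append(name)
        ready_at[dest] = slot + 1  # result is usable in the next bundle
    return bundles

ops = [
    ("a", "r1", []),            # r1 = load x
    ("b", "r2", []),            # r2 = load y
    ("c", "r3", ["r1", "r2"]),  # r3 = r1 + r2
    ("d", "r4", []),            # r4 = load z
    ("e", "r5", ["r3", "r4"]),  # r5 = r3 * r4
]
bundles = schedule(ops)
print(bundles)
```

In this example the independent load d is hoisted into the first bundle alongside a and b, so five operations fit into three bundles.  A real compiler does the same kind of analysis with far more care, but it has the whole program and plenty of time in which to do it.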

To reduce the pipelining problems due to conditional branches, the IA-64 introduced predicated instructions.  Comparison instructions set predicate bits, much like they set condition codes on the x86 machine (except that there are 64 predicate bits).  Each operation specifies a predicate bit;  it is executed only if the predicate bit = 1.  In practice, all operations are performed, but the result is stored into the register file only if the predicate bit = 1.  The result is that more instructions are executed, but we don't have to stall the pipeline waiting for a condition.
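The effect of predication can be sketched in a few lines.  The following is a toy simulation, not IA-64 code:  the register names r1-r3, the predicate names p1 and p2, and the helper execute are hypothetical.  It computes the larger of two values with no branch in the simulated program;  both moves are issued, and the predicate only controls whether the result is written back.

```python
def run_predicated(a, b):
    """Return max(a, b) using two predicated moves instead of a branch."""
    regs = {"r1": a, "r2": b, "r3": 0}
    pred = {}

    # The comparison sets a predicate bit and its complement.
    pred["p1"] = a > b
    pred["p2"] = not pred["p1"]

    def execute(p, dest, value):
        # The operation has already produced `value`; the predicate acts
        # only as a write-enable on the register file.
        if pred[p]:
            regs[dest] = value

    # Both operations are issued; exactly one of them updates r3.
    execute("p1", "r3", regs["r1"])  # (p1) r3 = r1
    execute("p2", "r3", regs["r2"])  # (p2) r3 = r2
    return regs["r3"]

print(run_predicated(3, 7), run_predicated(9, 2))
```

Two moves are executed where a branching version would execute one, but the pipeline never has to wait to find out which way a branch goes.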

The Itanium was a joint effort of Intel and Hewlett-Packard to create a '64-bit architecture'.  Itanium chips have been available for about two years (since summer 2001).  The early chips were not competitive with top-of-the-line Pentium 4s.  Performance has been improving (1.5GHz Itanium 2 chips were recently released), and now the floating point performance considerably exceeds that of the fastest Pentiums, while integer performance is at least competitive.  Note that x86 code must be completely recompiled to obtain good performance on an Itanium.


AMD, the main competitor to Intel for x86-compatible machines, has taken a very different approach to a 64-bit architecture.  They developed an architecture, AMD64, which is a more natural extension of the x86 design, with 64-bit versions of the usual registers (RAX, RBX, etc.) and 8 additional 64-bit registers.