- A processor runs at 2 GHz. What is the length of its clock cycle? (Give your answer in microseconds, nanoseconds, or picoseconds.)

- A disk has an access time of 10 ms. Assuming the time for data transfer is negligible, how many disk accesses can be performed each second?
- A disk rotates at 6000 RPM. What is its average rotational latency?
- The access time of a disk is composed of ______________ and _____________.
- Suppose we have a loop of 10 machine instructions and we execute
this loop one billion times on a 2 GHz machine with a CPI of 2.0.
How long will the billion iterations of the loop take?
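
The timing questions above all reduce to a few unit conversions. As a way to check hand-computed answers, here is a minimal Python sketch. The function names are hypothetical, and it assumes that average rotational latency is half a rotation and that the loop's CPI applies uniformly to every instruction.

```python
def clock_period_ns(freq_hz):
    """Length of one clock cycle in nanoseconds."""
    return 1e9 / freq_hz

def accesses_per_second(access_time_s):
    """Disk accesses per second when transfer time is negligible."""
    return 1.0 / access_time_s

def avg_rotational_latency_ms(rpm):
    """Average rotational latency, assumed to be half of one rotation."""
    ms_per_rotation = 60_000.0 / rpm
    return ms_per_rotation / 2

def loop_time_s(instr_per_iter, iterations, cpi, freq_hz):
    """Execution time = instruction count * CPI / clock rate."""
    return instr_per_iter * iterations * cpi / freq_hz

if __name__ == "__main__":
    print(clock_period_ns(2e9), "ns per cycle")
    print(accesses_per_second(10e-3), "accesses per second")
    print(avg_rotational_latency_ms(6000), "ms average rotational latency")
    print(loop_time_s(10, 1e9, 2.0, 2e9), "s for the loop")
```

Work each answer by hand first, then use the script only to confirm the units and magnitude.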

- Design a fast circuit to compute the sum of two 2-bit positive numbers. Construct a truth table for such a circuit, and then convert the truth table into a sum-of-products logic formula for each output. Suppose you constructed this circuit using inverters and AND and OR gates with up to 4 inputs, where each inverter and gate has a delay of 500 ps. What is the propagation delay of this circuit, from input to output?

- Write a MIPS program with a loop which copies the 20 words (80 bytes) beginning at byte 1000 to the 20 words beginning at byte 2000.
- Give the 32-bit pattern for the instruction add $3, $2, $1.

- What is the purpose of the 'sign extend' circuit in the MIPS CPU you simulated? Suppose we didn't have a sign extend circuit; what limitation would there be on branch instructions?
- For which instruction is the Ainvert signal needed?

- On a pipelined MIPS machine, the instruction sequence

add $4, $2, $3

add $5, $4, $3

is an example of what type of hazard? What can we do to efficiently handle this problem?
- On a pipelined MIPS machine, the instruction sequence

lw $4, 100($0)

add $5, $4, $3

is an example of what type of hazard? What can we do to efficiently handle this problem?
- On a pipelined MIPS machine, the instruction sequence

bne $1, $2, quack

add $5, $4, $3

is an example of what type of hazard? What can we do to efficiently handle this problem?

- Consider two alternative caches, each of which has a capacity of 8 words and a block size of one word. Cache D is a direct mapped cache, and cache T is a two-way set associative cache. Suppose the cache is initially empty and we fetch the words at the following addresses in sequence: 1, 2, 9, 3, 1, 5, 9. Which of these fetches will result in cache hits?
- Suppose that we have a 2 ns cache (it takes 2 ns to access the data or identify a miss), and a memory system with a 40 ns access time. What is the average memory access time if the cache hit rate is 97%? If we built a larger cache, with a 4 ns access time but a hit rate of 98%, would the average memory access time increase or decrease?
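
The two cache questions above can be checked with a small simulation. The Python sketch below rests on assumptions not stated in the questions: word addresses map to sets by `address mod number_of_sets`, the set-associative cache uses LRU replacement, and a miss costs the cache access time plus the memory access time. The function names are hypothetical.

```python
from collections import OrderedDict

def simulate_cache(addresses, num_blocks=8, ways=1):
    """Return a list of booleans (True = hit) for each word address.

    ways=1 models a direct-mapped cache; ways=2 models a two-way
    set-associative cache. LRU replacement is an assumption.
    """
    num_sets = num_blocks // ways
    # Each set keeps its resident addresses ordered from least to most recently used.
    sets = [OrderedDict() for _ in range(num_sets)]
    results = []
    for addr in addresses:
        cache_set = sets[addr % num_sets]
        if addr in cache_set:
            cache_set.move_to_end(addr)         # mark as most recently used
            results.append(True)
        else:
            if len(cache_set) >= ways:
                cache_set.popitem(last=False)   # evict the least recently used block
            cache_set[addr] = None
            results.append(False)
    return results

def amat_ns(cache_ns, memory_ns, hit_rate):
    """Average memory access time: every access pays the cache time,
    and misses additionally pay the memory access time."""
    return cache_ns + (1.0 - hit_rate) * memory_ns

if __name__ == "__main__":
    trace = [1, 2, 9, 3, 1, 5, 9]
    print("cache D:", simulate_cache(trace, ways=1))
    print("cache T:", simulate_cache(trace, ways=2))
    print(amat_ns(2, 40, 0.97), amat_ns(4, 40, 0.98))
```

A different replacement policy or a different reading of the miss penalty would change the results, so compare them against the conventions used in your course text.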

- Suppose we have a disk which transfers 40 MB/sec and interrupts the CPU each time a byte is available. The CPU executes approximately 800 MIPS (million instructions per second), and the interrupt routine takes 15 instructions to transfer a byte to memory. What fraction of the CPU time will be occupied doing I/O with the disk? What could we do to reduce this overhead?
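
The overhead in the question above is the ratio of instructions spent in the interrupt routine per second to instructions the CPU can execute per second. Here is a minimal sketch of that ratio in Python, assuming 1 MB = 10^6 bytes. The function name and the `bytes_per_interrupt` parameter are hypothetical, added only to let you explore what happens when each interrupt moves more data.

```python
def interrupt_io_fraction(bytes_per_sec, instr_per_interrupt,
                          cpu_instr_per_sec, bytes_per_interrupt=1):
    """Fraction of CPU time spent servicing disk interrupts."""
    interrupts_per_sec = bytes_per_sec / bytes_per_interrupt
    io_instr_per_sec = interrupts_per_sec * instr_per_interrupt
    return io_instr_per_sec / cpu_instr_per_sec

if __name__ == "__main__":
    # One interrupt per byte, as described in the question.
    print(interrupt_io_fraction(40e6, 15, 800e6))
    # Hypothetical variant: one interrupt per 4-byte word.
    print(interrupt_io_fraction(40e6, 15, 800e6, bytes_per_interrupt=4))
```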

- Suppose you have a program which takes 100 seconds on a single processor. 10% of the time is consumed by code which is inherently sequential; the other 90% can be fully parallelized. Suppose we buy a 50-processor multiprocessor and parallelize the program as much as possible. How much time should the program take, ignoring communication overhead?
- The Quackers multiprocessor consists of 100 processors connected in a 10-by-10 2D grid with 4 Gb/sec bidirectional links. What is the bisection bandwidth of this network?
- Suppose you are given a (Runnable) Java class PredictRainfall. It has a constructor with one integer argument days and a run method which runs for several minutes and then prints "predicted rainfall in days days is nn inches". Suppose we get a machine with a dual core processor and want to compute the rainfall for tomorrow (days = 1) and the day after tomorrow (days = 2) in parallel. Set up a main method for PredictRainfall to do this.
- The text shows two versions of parallel code (for shared memory and message passing machines) to compute the sum of a 100,000-element array using 100 processors. What would have to be changed in this code to use 200 processors?
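
Two of the multiprocessor questions above come down to short formulas: the run time of a partly sequential program on several processors, and how an array is divided among processors. The Python sketch below is only a way to check your reasoning. The function names are hypothetical, it ignores communication overhead as the question states, and it assumes each processor receives one contiguous slice of the array (the code in the text may partition the array differently).

```python
def parallel_time(total_time_s, sequential_fraction, processors):
    """Run time when only the parallel part is sped up by the processor count."""
    sequential_part = total_time_s * sequential_fraction
    parallel_part = total_time_s * (1.0 - sequential_fraction)
    return sequential_part + parallel_part / processors

def chunk_bounds(array_len, processors, pid):
    """Half-open index range [start, end) summed by processor number pid."""
    start = pid * array_len // processors
    end = (pid + 1) * array_len // processors
    return start, end

if __name__ == "__main__":
    print(parallel_time(100, 0.10, 50), "seconds on 50 processors")
    print(chunk_bounds(100_000, 100, 0), chunk_bounds(100_000, 200, 0))
```

Comparing the slice sizes for 100 and 200 processors shows which constants in the text's code depend on the processor count.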