Computer Architecture

Start Lecture #26

Remark: I expect the final exam to be on the 7th floor like the midterm. A practice final is on the web.

Remark: Covered Tag Size and Division of Address Bits, which had been inadvertently omitted.

Controller Time

Not much to say. It is typically small. We will use 0ms (i.e., ignore this time).

Queuing Delays

This can be the largest component, but we will ignore it since it is not a function of the architecture, but rather of the load and OS.

Dependability, Reliability, and Availability

Reliability measures the length of time during which service is continuously delivered as expected.

An example reliability measure is mean time to failure (MTTF), which measures the average length of time that the system is delivering service as expected. Bigger values are better.

Another important measure is mean time to repair (MTTR), which measures how long the system is not delivering service as expected. Smaller values are better.

Finally we have mean time between failures (MTBF).
MTBF = MTTF + MTTR

One might think that having a large MTBF is good, but that is not necessarily correct. Consider a system with a certain MTBF and have the repair center deliberately add an extra hour to every repair. The MTTR rises by one hour, so the MTBF goes up by one hour as well, even though the system has become worse.
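The point about MTBF can be checked with a few lines of arithmetic. The sketch below uses made-up numbers and hypothetical helper names (mtbf, availability). The availability formula, MTTF / (MTTF + MTTR), is the usual textbook definition and is an assumption here, since these notes do not state it.

```python
def mtbf(mttf_hours, mttr_hours):
    """Mean time between failures: MTBF = MTTF + MTTR."""
    return mttf_hours + mttr_hours


def availability(mttf_hours, mttr_hours):
    """Fraction of time service is delivered (assumed textbook definition)."""
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical numbers, chosen only for illustration.
fast_repair = (999.0, 1.0)  # MTTF = 999 h, MTTR = 1 h
slow_repair = (999.0, 2.0)  # same MTTF, every repair deliberately slowed by 1 h

for mttf, mttr in (fast_repair, slow_repair):
    print(f"MTBF = {mtbf(mttf, mttr)} h, availability = {availability(mttf, mttr):.6f}")
```

Slowing the repair raises the MTBF from 1000 to 1001 hours while the availability drops, which is why MTBF alone is a poor figure of merit.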

RAID

The acronym was coined by Patterson and his students. It stands for Redundant Array of Inexpensive Disks. Now it is often redefined as Redundant Array of Independent Disks.

RAID comes in several flavors, often called levels.

No Redundancy (RAID 0)

The base, non-RAID, case from which the others are built.

Mirroring (RAID 1)

Two disks containing the same content.

Error Detecting and Correcting Code (RAID 2)

Often called ECC (error correcting code or error checking and correcting code). Widely used in RAM, but not used in practice for disk arrays.

Bit-Interleaved Parity (RAID 3)

Normally byte-interleaved or several-byte-interleaved. For most applications, RAID 4 is better.

Block-Interleaved Parity (RAID 4)

Striping a.k.a. Interleaving

To increase performance, rather than reliability and availability, it is a good idea to stripe or interleave blocks across several disks. In this scheme block n is stored on disk n mod k, where k is the number of disks. The integer quotient of n divided by k (i.e., n/k rounded down) is called the stripe number. For example, if there are 4 disks, stripe number 0 (the first stripe) consists of block 0, which is stored on disk 0, block 1 stored on disk 1, block 2 stored on disk 2, and block 3 stored on disk 3. Stripe 1 (like all stripes in this example) also contains 4 blocks. The first one is block 4, which is stored on disk 0.

Striping is especially good if one is accessing full stripes in which case all the blocks in the stripe can be read concurrently.
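The block-to-disk mapping just described can be sketched in a few lines. The function name locate is hypothetical. The arithmetic is the n mod k and integer-quotient rule from the text.

```python
def locate(block, num_disks):
    """Return (disk, stripe) for a block number under simple striping."""
    stripe, disk = divmod(block, num_disks)  # quotient is the stripe, remainder the disk
    return disk, stripe


# The 4-disk example from the text: blocks 0-3 form stripe 0, blocks 4-7 form stripe 1.
for n in range(8):
    disk, stripe = locate(n, 4)
    print(f"block {n}: disk {disk}, stripe {stripe}")
```

Because the four blocks of a stripe land on four different disks, a full-stripe access can proceed on all disks at once.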

RAID 4

RAID 4 combines striping and parity. In addition to the k so-called data disks used in striping, one has a single parity disk that contains the parity of the stripe.

Consider all k data blocks in one stripe. Extend this stripe to k+1 blocks by including the corresponding block on the parity disk. The block on the parity disk is calculated as the bitwise exclusive OR of the k data blocks.

Thus a stripe contains k data blocks and one parity block, which is the exclusive OR of the data blocks.

The great news is that any block in the stripe, parity or data, is the exclusive OR of the other k. This means we can survive the failure of any one disk.

For example, let k=4 and let the data blocks be A, B, C, and D.

  1. If the parity disk fails, we can easily recreate it since, by definition, the parity block for this stripe is
          A ⊕ B ⊕ C ⊕ D
    which is the exclusive OR of the other blocks.
  2. If a data disk fails, we can again recreate it since, by the commutative and associative properties of XOR and the fact that X ⊕ X = 0,
        A ⊕ B ⊕ C ⊕ the parity block = A ⊕ B ⊕ C ⊕ (A ⊕ B ⊕ C ⊕ D) = D
    and again the missing block is the exclusive OR of the remaining blocks.
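The two recovery cases above can be tried on small byte strings. This is a minimal sketch: the helper name xor_blocks and the 2-byte block contents are made up for illustration, and real blocks would be far larger.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bitwise XOR of equal-length byte blocks, computed byte by byte."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Four hypothetical 2-byte data blocks (k = 4).
A, B, C, D = b"\x0f\x10", b"\xf0\x01", b"\xaa\x55", b"\x3c\xc3"

# Case 1: the parity block is, by definition, the XOR of the data blocks.
parity = xor_blocks([A, B, C, D])

# Case 2: the disk holding D fails; XOR the survivors with the parity block.
recovered_D = xor_blocks([A, B, C, parity])
print(recovered_D == D)  # prints True
```

The same call recovers any other single missing block, since every block in the stripe is the XOR of the remaining four.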

Properties of RAID 4.

Distributed Block-Interleaved Parity (RAID 5)

Rotate the disk used for parity.

Again using our 4 data-disk example, we continue to put the parity for blocks 0-3 on disk 4 (the fifth disk) but rotate the assignment of which disk holds the parity block of different stripes.
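One possible rotation is sketched below. The notes fix only the first stripe (parity for blocks 0-3 on disk 4). The choice to move the parity one disk to the left on each subsequent stripe, the order in which data blocks fill the remaining disks, and the function name raid5_stripe_layout are all assumptions made for illustration. Real implementations use a variety of layouts.

```python
def raid5_stripe_layout(stripe, k):
    """Return a list of length k+1: entry d is 'P' or the data block held on disk d.

    Assumed layout: parity starts on the last disk for stripe 0 and moves
    one disk to the left for each following stripe, wrapping around.
    """
    parity_disk = k - (stripe % (k + 1))
    layout = []
    next_block = stripe * k  # first data block of this stripe
    for disk in range(k + 1):
        if disk == parity_disk:
            layout.append("P")
        else:
            layout.append(next_block)
            next_block += 1
    return layout


# Six stripes on 4 data disks plus 1 parity disk's worth of space.
for s in range(6):
    print(s, raid5_stripe_layout(s, 4))
```

With the parity spread over all five disks, no single disk has to absorb every parity update, which is the motivation for rotating it.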

RAID 1 and RAID 5 are widely used.

P + Q Redundancy (RAID 6)

Gives more than single error correction at a higher storage overhead.