Computer Architecture
1999-2000 Fall
MW 3:30-4:45
Ciww 109
Allan Gottlieb
gottlieb@nyu.edu
http://allan.ultra.nyu.edu/~gottlieb
715 Broadway, Room 1001
212-998-3344
609-951-2707
email is best
======== START LECTURE #25 ========
Obtaining bus access
- The simplest scheme is to permit only one bus
master.
- That is, on each bus only one device is permitted to
initiate a bus transaction.
- The other devices are slaves that only
respond to requests.
- With a single master, there is no issue of arbitrating
among multiple requests.
- One can have multiple masters with daisy
chaining of the grant line.
- Any device can assert the request line, indicating that it
wishes to use the bus.
- This is not trivial: the request line uses ``open collector drivers''.
- If no output drives the line, it will be ``pulled up'' to
5v, i.e., a logical true.
- If one or more outputs drive the line to 0v it will go to
0v (a logical false).
- So if a device wishes to make a request it drives the line
to 0v; if it does not wish to make a request it does nothing.
- This is (another example of) active low logic. The
request line is asserted by driving it low.
- When the arbiter sees the request line asserted (and the
previous grantee has issued a release), the arbiter raises the
grant line.
- The grant signal is passed from one device to another if the
first device is not requesting the bus. Hence
devices near the arbiter have priority and can starve the ones
further away.
- The device whose request is granted asserts the release line
when done.
- Simple, but neither fair nor high performance. (A minimal Python
sketch of this scheme appears after the list of arbitration schemes
below.)
- Centralized parallel arbiter: Separate request lines from each
device and separate grant lines. The arbiter decides which device
should be granted the bus.
- Distributed arbitration by self-selection: Requesting
devices identify themselves on the bus and decide individually
(and consistently) which one gets the grant.
- Distributed arbitration by collision detection: Each device
transmits whenever it wants, but detects collisions and retries.
Ethernet uses this scheme (but modern switched ethernets do not).
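To make the daisy-chain scheme concrete, here is a minimal Python sketch
(my own illustration, not code from the notes or the text). It models the
shared request line as an open-collector, active-low signal that goes to 0
whenever any device drives it low, and models the grant as entering at the
device nearest the arbiter and being passed along until a requesting device
keeps it. Names such as request_line and daisy_chain_grant are invented for
the example.

    def request_line(wants_bus):
        """Open-collector line: driven to 0 (asserted, active low) if any
        device pulls it low; otherwise pulled up to 1 (deasserted)."""
        return 0 if any(wants_bus) else 1

    def daisy_chain_grant(wants_bus):
        """Pass the grant from the arbiter down the chain (device 0 is
        nearest).  The first requesting device keeps the grant; the others
        pass it on.  Returns the granted device's index, or None."""
        if request_line(wants_bus) == 1:   # line high: no request asserted
            return None
        for dev, wants in enumerate(wants_bus):
            if wants:
                return dev                 # this device keeps the grant
            # otherwise it forwards the grant to the next device in the chain
        return None

    # Devices 0 and 2 both request; device 0 wins because it is nearer
    # the arbiter.
    print(daisy_chain_grant([True, False, True]))    # -> 0
    print(daisy_chain_grant([False, False, True]))   # -> 2
    print(daisy_chain_grant([False, False, False]))  # -> None (line stays high)

Note how device 0 wins whenever it requests; if it requests continuously,
devices further down the chain never see the grant, which is exactly the
fairness/starvation problem mentioned above.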
Option        | High performance             | Low cost                       |
bus width     | separate addr and data lines | multiplex addr and data lines  |
data width    | wide                         | narrow                         |
transfer size | multiple bus loads           | single bus loads               |
bus masters   | multiple                     | single                         |
clocking      | synchronous                  | asynchronous                   |
Do on the board the example on pages 665-666
- Memory and bus support two widths of data transfer: 4 words and 16
words
- 64-bit synchronous bus; 200MHz; 1 clock for addr; 1 for data.
- Two clocks of ``rest'' between bus accesses
- Memory access times: 4 words in 200ns; additional 4 word blocks in
20ns per block.
- Can overlap transferring data with reading next data.
- Find
- Sustained bandwidth and latency for reading 256 words using
both transfer sizes
- How many bus transactions per sec for each (addr+data)
- Four word blocks
- 1 clock to send addr
- 40 clocks to read memory (200ns at 5ns per clock)
- 2 clocks to send the data (4 words over the 64-bit bus)
- 2 idle clocks
- 45 total clocks
- 256/4=64 transactions needed so latency is 64*45*5ns=14.4us
- 64 trans per 14.4us = 64/14.4 trans per 1us = 4.44M trans per
sec
- Bandwidth = 1024 bytes per 14.4us = 1024/14.4 B/us = 71.11MB/sec
- Sixteen word blocks
- 1 clock for addr
- 40 clocks for reading first 4 words
- 2 clocks to send the data
- 2 clocks idle
- 4 clocks to read next 4 words. But this is free! Why?
Because it is done during the send and idle of previous block.
- So we only pay for the long initial read
- Total = 1 + 40 + 4*(2+2) = 57 clocks.
- 16 transactions needed; latency = 16*57*5ns=4.56us, which is
much better than with 4 word blocks.
- 16 transactions per 4.56us = 3.51M transactions/sec
- Bandwidth = 1024B per 4.56us = 224.56MB/sec
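The arithmetic above is easy to check with a short script. This is just a
sketch using the parameters stated in the example (64-bit bus at 200MHz,
hence 5ns per clock; 32-bit words, so 4 words take 2 data clocks; 40 clocks
for the first 4-word memory read; 2 idle clocks between accesses); the names
in it are invented for the illustration.

    CLOCK_NS = 5            # 200MHz bus -> 5ns per clock
    TOTAL_WORDS = 256
    BYTES_PER_WORD = 4      # 32-bit words, so 4 words = 2 clocks on a 64-bit bus

    def block_transfer(words_per_block):
        """Clocks per bus transaction and number of transactions for 256 words."""
        if words_per_block == 4:
            clocks = 1 + 40 + 2 + 2          # addr + mem read + send + idle = 45
        else:                                # 16-word blocks
            # only the first 4-word read (40 clocks) is exposed; later reads
            # overlap the 2 send + 2 idle clocks of the previous block
            clocks = 1 + 40 + 4 * (2 + 2)    # = 57
        txns = TOTAL_WORDS // words_per_block
        return clocks, txns

    for words in (4, 16):
        clocks, txns = block_transfer(words)
        latency_ns = clocks * txns * CLOCK_NS
        txns_per_sec = txns / (latency_ns * 1e-9)
        mb_per_sec = TOTAL_WORDS * BYTES_PER_WORD / (latency_ns * 1e-9) / 1e6
        print(f"{words:2d}-word blocks: {latency_ns/1000:.2f}us, "
              f"{txns_per_sec/1e6:.2f}M transactions/sec, {mb_per_sec:.2f}MB/sec")

Running it reproduces the numbers above: 14.40us, 4.44M transactions/sec and
71.11MB/sec for 4-word blocks versus 4.56us, 3.51M transactions/sec and
224.56MB/sec for 16-word blocks.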