Multiprocessors: shared address space across all processors.
Multicomputers: each processor has its own address space.

Tightly Coupled vs. Loosely Coupled

Tightly coupled means capable of cooperating closely on a single problem:
fine-grained communication (i.e. the processors communicate often) and large
amounts of data transferred.

Tightly coupled machines can normally emulate loosely coupled ones, but might
give up some fault tolerance.

One can solve a single large problem on loosely coupled hardware, but the
problem must be coarse grained. For example, various cryptographic challenges
have been solved over the internet with email as the only communication.

Bus-based multiprocessors

These are symmetric multiprocessors (SMPs): from the viewpoint of one
processor, all the others look the same; none are logically closer than
others. SMP also implies cache coherent (see below).

The hardware is fairly simple. Recall that a uniprocessor looks like

      Proc
       |
    Cache(s)                        Mem
       |                             |
    ------------------------------------  Memory Bus
                   |
               "Chipset"
                   |
    ------------------------------------  I/O Bus (e.g. PCI)
       |        |        |        |
     video    scsi    serial     etc

Complication: when a scsi disk writes to memory (i.e. a disk read is
performed), you need to keep the cache(s) up to date, i.e. CONSISTENT with the
data. If a cache had the old value before the disk read, the system can either
INVALIDATE the cache entry or UPDATE it with the new value.

Facts about caches. They can be WRITE THROUGH: when the processor issues a
store, the value goes into the cache and is also sent to memory. Or they can
be WRITE BACK: the value goes only into the cache. In the latter case the
cache line is marked dirty, and when it is evicted it must be written back to
memory.

To make a bus-based MP, just add more processor-cache pairs.

      Proc        Proc
       |           |
    Cache(s)    Cache(s)            Mem
       |           |                 |
    ------------------------------------  Memory Bus
                   |
               "Chipset"
                   |
    ------------------------------------  I/O Bus (e.g. PCI)
       |        |        |        |
     video    scsi    serial     etc

A key question now is whether the caches are automatically kept consistent
(a.k.a. coherent) with each other. When the processor on the left writes a
word, you can't let the old value hang around in the right-hand cache(s), or
the right-hand processor can read the wrong value. If the caches are write
back, then on an initial write a cache must claim ownership of the cache line
(invalidate all other copies). If the caches are write through, you can either
invalidate other copies or update them.

Because of the broadcast nature of the bus, it is not hard to design snooping
(a.k.a. snoopy) caches that maintain consistency. I call this the "Turkish
Bath" mode of communication because "everyone sees what everyone else has
got". (A toy simulation of write-invalidate snooping is sketched below.)

These machines are VERY COMMON.

DISADVANTAGE (really a limitation): they cannot be used for a large number of
processors, since the bandwidth needed on the bus grows with the number of
processors and becomes increasingly difficult and expensive to supply.
Moreover, the latency grows with the number of processors, for both
speed-of-light and more complicated (electrical) reasons.

Homework 9-2 (ask me next time about this one)

Other SMPs

From the software viewpoint, the key property of the bus-based MP is that it
is an SMP; the bus itself is just an implementation property. To support
larger numbers of processors, other interconnection networks can be used. The
book shows two possibilities, crossbars and omega networks. Ignore what is
said there as it is too superficial and occasionally more or less wrong.
Tannenbaum is not a fan of shared memory.
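To make the snoopy-cache idea above concrete, here is a minimal, single-
threaded toy simulation of write-invalidate snooping in C. It is not real
hardware and not any particular commercial protocol: the one-line "caches",
the names proc_read/proc_write, and the write-through policy are simplifying
assumptions made just for illustration.

    /* Toy, single-threaded simulation of write-invalidate snooping.
     * Each "cache" holds one line, either INVALID or VALID, for a single
     * shared memory word.  A write by one processor broadcasts an
     * invalidate to the other caches (the loop plays the role of the bus).
     * Write-through is assumed for simplicity. */
    #include <stdio.h>

    #define NCACHES 4
    enum state { INVALID, VALID };
    struct cache { enum state st; int value; };

    static struct cache caches[NCACHES];   /* all start INVALID (zeroed) */
    static int memory = 0;                 /* the single shared word     */

    static int proc_read(int p) {
        if (caches[p].st == INVALID) {     /* miss: fetch from memory    */
            caches[p].value = memory;
            caches[p].st = VALID;
        }
        return caches[p].value;
    }

    static void proc_write(int p, int v) {
        for (int q = 0; q < NCACHES; q++)  /* snoop: kill other copies   */
            if (q != p) caches[q].st = INVALID;
        caches[p].value = v;
        caches[p].st = VALID;
        memory = v;                        /* write through to memory    */
    }

    int main(void) {
        proc_write(0, 42);                      /* P0 stores 42              */
        printf("P1 reads %d\n", proc_read(1));  /* P1 misses, gets 42        */
        proc_write(2, 7);                       /* P2 invalidates P0 and P1  */
        printf("P1 reads %d\n", proc_read(1));  /* P1 misses again, gets 7   */
        return 0;
    }

When P2 writes, the loop over the other caches plays the role of the bus
broadcast: P1's copy is invalidated, so its next read misses and fetches the
new value, which is exactly the behavior a coherent SMP guarantees.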
(The Ultracomputer in figure 9.4 is an NYU machine; my main research activity
from 1980-95.)

Another name for SMP is UMA, for Uniform Memory Access, i.e. all the memories
are equally distant from a given processor.

NUMAs

For larger numbers of processors, NUMA (NonUniform Memory Access) MPs are
being introduced. Typically some memory is associated with each processor, and
that memory is close; the rest is further away.

    P--C---M        P--C---M
       |               |
    |----------------------------|
    |   interconnection network  |
    |----------------------------|

CC-NUMAs (Cache Coherent NUMAs) are programmed like SMPs, BUT to get good
performance you must try to exploit the memory hierarchy: have most references
hit in your local cache, and most of the others hit in the part of the shared
memory in your "node".

HOMEWORK 9.2' Will semaphores work here?

NUMAs that are not cache coherent are yet harder to program, since you must
maintain cache consistency manually (or with compiler help).

HOMEWORK 9.2'' Will semaphores work here?

Bus-based Multicomputers

The big difference is that this is an MC, not an MP: NO shared memory. In some
sense all the computers on the internet form one enormous MC. The interesting
case is when there is closer cooperation between processors, say the
workstations in one distributed systems research lab cooperating on a single
problem. With the current state of the practice, the application must tolerate
long-latency communication and modest bandwidth.

Recall the figure for a typical computer.

      Proc
       |
    Cache(s)                        Mem
       |                             |
    ------------------------------------  Memory Bus
                   |
               "Chipset"
                   |
    ------------------------------------  I/O Bus (e.g. PCI)
       |        |        |        |
     video    scsi    serial     etc

The simple way to get a bus-based MC is to put an ethernet controller on the
I/O bus of each machine and then connect them all with a bus called ethernet.
(Draw this on the board.) Better (higher performance) is to put the Network
Interface (possibly ethernet, possibly something else) on the memory bus. In
either case the processors cooperate only by sending explicit messages over
the network (a minimal sketch of such message passing appears at the end of
these notes).

Other (bigger) multicomputers

Once again buses are limiting, so switched networks are used. Much could be
said, but currently grids (SGI/Cray T3E) and omega networks (IBM SP2) are
popular.

HOMEWORK 9.2''' Will semaphores work here? Also 9.3.
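Since a multicomputer has no shared memory, the only way its nodes can
cooperate is by sending explicit messages over the interconnect (ethernet or
otherwise). Below is a minimal sketch of such message passing using ordinary
TCP sockets in C; the port number, the payload value 42, and the
server/client command-line convention are assumptions made only for
illustration, and error checking is omitted.

    /* Minimal sketch of multicomputer-style cooperation: the two "nodes"
     * share no memory, so they exchange an integer over a TCP connection.
     * The port number (5555) and program structure are invented for
     * illustration; error checking is omitted for brevity. */
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #include <unistd.h>
    #include <netdb.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    #define PORT 5555

    static void server(void) {                  /* run on one machine */
        int s = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in a = { 0 };
        a.sin_family = AF_INET;
        a.sin_addr.s_addr = htonl(INADDR_ANY);
        a.sin_port = htons(PORT);
        bind(s, (struct sockaddr *)&a, sizeof a);
        listen(s, 1);
        int c = accept(s, NULL, NULL);
        int32_t x;
        if (read(c, &x, sizeof x) == sizeof x)  /* receive the message */
            printf("server received %d\n", ntohl(x));
        close(c);
        close(s);
    }

    static void client(const char *host) {      /* run on another machine */
        int s = socket(AF_INET, SOCK_STREAM, 0);
        struct hostent *h = gethostbyname(host);
        struct sockaddr_in a = { 0 };
        a.sin_family = AF_INET;
        memcpy(&a.sin_addr, h->h_addr_list[0], h->h_length);
        a.sin_port = htons(PORT);
        connect(s, (struct sockaddr *)&a, sizeof a);
        int32_t x = htonl(42);
        write(s, &x, sizeof x);                 /* send the message */
        close(s);
    }

    int main(int argc, char *argv[]) {
        if (argc >= 2 && strcmp(argv[1], "server") == 0)
            server();
        else if (argc >= 3 && strcmp(argv[1], "client") == 0)
            client(argv[2]);
        else
            fprintf(stderr, "usage: %s server | client <host>\n", argv[0]);
        return 0;
    }

Run one copy as "server" on one machine and another as "client <host>" on a
second machine. Contrast this with the SMP case, where the same cooperation is
just a store and a load to shared memory plus some synchronization (e.g. a
semaphore); that contrast is the heart of the "will semaphores work here"
homework questions.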