Midterm exam a week from today (rule: must be returned by 13 March).

HOMEWORK Is the following code (it contains an additional binary
semaphore) also correct?  If it is correct, in what way is it better
than the original?  Let's discuss next time.

binary semaphore q,r initially open

   P(in out S) is                     V(in out S) is
   L: while S <= 0                       P(r); S++; V(r)
      P(q)
      if S > 0
         S--
         V(q)
      else
         V(q)
         goto L

Some authors, e.g. Tanenbaum, reserve the term semaphore for
context-switching (a.k.a. blocking) implementations and would not call
our busy-waiting algorithms semaphores.

HOMEWORK 2.2, 2.6 (note that 2.6 uses Tanenbaum's definition of
semaphore; you may ignore the part about monitors).

P/V Chunk: With a counting semaphore, one might want to reduce the
semaphore by more than one.  If each value corresponds to one unit of a
resource, this reduction is reserving a chunk of resources.  We call it
P chunk; similarly V chunk.

Why can't you just do P chunk by doing multiple P's?  Assume there are
3 units of the resource available and 3 tasks each need 2 units to
proceed.  If they each do P;P, you can get the case where no one gets
what they need and none can proceed.

   PChunk(in out S, in amt) is        VChunk(in out S, in amt) is
      while S < amt                      FAA(S,+amt)
      if FAA(S,-amt) < amt
         FAA(S,+amt)
         PChunk(S, amt)                -- try again

(A C sketch of PChunk/VChunk appears below, after the producer/consumer
discussion.)

Let's look at the case amt=1.

   PChunk1(in out S) is               VChunk1(in out S) is
      while S < 1                        FAA(S,+1)
      if FAA(S,-1) < 1
         FAA(S,+1)
         PChunk1(S)

Since S<1 is the same as S<=0, the above is identical to ElegantP.
There will be more about this below.

---------------- Producer Consumer (bounded buffer) ----------------

Problem: Producers produce items and consumers consume items (what a
surprise).  We have a bounded buffer capable of holding k items and
want to use it to hold items produced but not yet consumed.

Special case k=1.  What we want is alternation of the producer adding
to the buffer and the consumer removing.  Alternation, umm, sounds
familiar.

binary semaphore q initially open
binary semaphore r initially closed

   producer is                        consumer is
      loop                               loop
         produce item                       P(r)
         P(q)                               remove item from buffer
         add item to buffer                 V(q)
         V(r)                               consume item

Note that there can be many producers and many consumers, but the code
guarantees that only one will be adding to or removing from the buffer
at a time.  Hence we can use normal code to add to and remove from the
buffer.

Now the general case for arbitrary k.

We will need to allow some slop so that a few (up to k) producers can
proceed before a consumer removes an item.

We will need to put a binary semaphore around the add and remove code
(unless the code is special and can tolerate concurrency).

This is what counting semaphores are great for (semi-critical section).

counting semaphore e initially k   -- num EMPTY slots
counting semaphore f initially 0   -- num FULL slots
binary semaphore b initially open

   producer is                              consumer is
      loop                                     loop
         produce item                             P(f)
         P(e)                                     P(b); rem item from buf; V(b)
         P(b); add item to buf; V(b)              V(e)
         V(f)                                     consume item

Normally we want the buffer to be a queue.

HOMEWORK What would you do if you had two items that needed to be
consecutive on the buffer (assume the buffer is a queue)?

If we used FAA counting semaphores, there is no serial section in the
above except for the P(b) ... V(b) around the buffer manipulation.
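As a concrete (hedged) illustration of the general producer/consumer
solution just above, here is a minimal C sketch.  POSIX semaphores
(sem_t) stand in for the counting semaphores e and f and the binary
semaphore b; the capacity K, the circular-buffer layout, and the
function names are my own choices, not part of the notes.

   #include <semaphore.h>

   #define K 10                     /* buffer capacity (the k above)        */

   int   buf[K];                    /* the bounded buffer, used circularly  */
   int   in = 0, out = 0;           /* next slot to fill / to empty         */
   sem_t e;                         /* counts EMPTY slots, initially K      */
   sem_t f;                         /* counts FULL  slots, initially 0      */
   sem_t b;                         /* binary semaphore guarding the buffer */

   void buf_init(void) {
       sem_init(&e, 0, K);          /* e initially k     */
       sem_init(&f, 0, 0);          /* f initially 0     */
       sem_init(&b, 0, 1);          /* b initially open  */
   }

   void produce(int item) {
       sem_wait(&e);                /* P(e): wait for an empty slot   */
       sem_wait(&b);                /* P(b): enter the serial section */
       buf[in] = item;
       in = (in + 1) % K;
       sem_post(&b);                /* V(b)                           */
       sem_post(&f);                /* V(f): one more full slot       */
   }

   int consume(void) {
       sem_wait(&f);                /* P(f): wait for a full slot     */
       sem_wait(&b);                /* P(b): enter the serial section */
       int item = buf[out];
       out = (out + 1) % K;
       sem_post(&b);                /* V(b)                           */
       sem_post(&e);                /* V(e): one more empty slot      */
       return item;
   }

As in the pseudocode, the only serial section here is the short
P(b) ... V(b) region around the buffer manipulation.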
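Returning to PChunk/VChunk from earlier (the sketch promised above):
here is one possible C rendering using C11 atomics, where
atomic_fetch_add / atomic_fetch_sub play the role of FAA and return the
OLD value.  The names pchunk/vchunk and the bare spin loop are mine;
this is a busy-waiting sketch, not the notes' official code.

   #include <stdatomic.h>

   /* The counting semaphore S is just an atomic integer. */

   void pchunk(atomic_int *S, int amt) {          /* reserve amt units */
       for (;;) {
           while (atomic_load(S) < amt)
               ;                                  /* too few units: spin        */
           if (atomic_fetch_sub(S, amt) >= amt)   /* old value was big enough   */
               return;                            /* chunk reserved             */
           atomic_fetch_add(S, amt);              /* lost the race: undo, retry */
       }
   }

   void vchunk(atomic_int *S, int amt) {          /* release amt units */
       atomic_fetch_add(S, amt);
   }

With amt = 1 this is exactly PChunk1, i.e. ElegantP.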
NYU Ultracomputer critical-section-free queue algorithms

Implement the queue as a circular array.
FAA(tail,1) mod size gives the slot to use for insertions.
FAA(head,1) mod size gives the slot to use for deletions.
Will use I and D instead of tail and head below.

type queue is array 1 .. size of record
   natural   phase   -- will be explained later
   some-type data    -- the data to store

Since we do NOT want to have a critical section in the above code, we
have a thorny problem to solve.  The counting semaphores will guarantee
that when an insert gets past P(e), there is an empty slot somewhere.
But it might NOT be at the slot this insert was assigned!!  How can
this be?

If this were all, we could just force alternation at each queue
position with two semaphores at each slot.  But Boris (Lubachevsky)
also found a scenario where two inserts could be going after the same
slot, and thus you could ruin FIFO by having the first insert go
second.

So we have a phase at each slot.
The first (zeroth) insert at this slot is phase 0.
The first (zeroth) delete at this slot is phase 1.
Insert j is phase 2*j; delete j is phase 2*j+1.

The phase for an insert is I div size.
The slot is I mod size.
Most hardware calculates both at once when doing a division (quotient
and remainder).  If size is a power of 2, these are just two parts of
the number (i.e. mask and shift).  I use a (made up) function
(Div, Mod) that returns both values.

counting semaphore e initially size
counting semaphore f initially 0

   Insert is
      P(e)
      (MyPhase, MyI) <-- (Div, Mod) (FAA(I,1), size)
      while phase[MyI] < 2*MyPhase      -- we are here too early, wait
      data[MyI] <-- the-datum-to-insert
      FAA(phase[MyI],1)                 -- this phase is over
      V(f)

   Delete is
      P(f)
      (MyPhase, MyD) <-- (Div, Mod) (FAA(D,1), size)
      while phase[MyD] < 2*MyPhase+1    -- we are here too early, wait
      extracted-data <-- data[MyD]
      FAA(phase[MyD],1)                 -- this phase is over
      V(e)

This code went through several variations.  The current version was
discovered when I last taught this course in the spring of '94, but was
never written down before now.  Originally the insert and delete code
were complicated and looked quite different from each other.  At some
point I found a way to make them look very symmetric (but still
complicated).  At first I was a little proud of this discovery and
rushed to show Boris.  He was not impressed; indeed he remarked "it
must be so".  It was always hard to translate his "must", so I didn't
understand whether he was saying that my comment was trivial or
important.  His next, quite perceptive, remark showed that it was the
former and ended the discussion: of course they look the same,
"Deletion is insertion of empty space."

This queue algorithm uses size for two purposes:
   The maximum size of the queue.
   The maximum concurrency supported.
It would be natural for these two requirements to differ considerably
in the size required.  A system with 100 processors, each running no
more than 10 active threads using the queue, needs at most 1000-fold
concurrency, but if the traffic generation is bursty, many more than
1000 slots would be desired.

One can have a (serially accessed, i.e. critical-section) list instead
of a single slot associated with each MyI.

One can further enhance this by implementing these serially accessed
lists as linked lists rather than arrays.  This gives the usual
advantages of linked vs. sequentially allocated lists (as well as the
usual disadvantages).

This enhanced version is used in our operating system (Symunix),
written by Jan Edler.
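For concreteness, here is one possible C11 rendering of the phase-based
Insert/Delete above.  It uses atomic_fetch_add as FAA, busy-waiting
counting semaphores in the ElegantP style for P and V, and plain spin
loops for the phase waits; SIZE, the int payload, and the function
names are my choices, so treat it as a sketch of the idea rather than
the Ultracomputer/Symunix code.

   #include <stdatomic.h>

   #define SIZE 8                      /* max queue size AND max concurrency       */

   static int        data[SIZE];       /* the data to store                        */
   static atomic_int phase[SIZE];      /* insert j is phase 2j, delete j is 2j+1   */
   static atomic_int I, D;             /* insertion / deletion tickets (tail/head) */
   static atomic_int e = SIZE, f = 0;  /* busy-waiting counting semaphores         */

   static void P(atomic_int *s) {               /* ElegantP-style busy-waiting P */
       for (;;) {
           while (atomic_load(s) <= 0)
               ;                                /* spin until positive */
           if (atomic_fetch_sub(s, 1) > 0)
               return;                          /* got a unit          */
           atomic_fetch_add(s, 1);              /* undo and retry      */
       }
   }
   static void V(atomic_int *s) { atomic_fetch_add(s, 1); }

   void q_insert(int datum) {
       P(&e);                                             /* wait for an empty slot */
       int ticket  = atomic_fetch_add(&I, 1);
       int myphase = ticket / SIZE, myi = ticket % SIZE;  /* (Div, Mod)             */
       while (atomic_load(&phase[myi]) < 2*myphase)
           ;                                              /* here too early: wait   */
       data[myi] = datum;
       atomic_fetch_add(&phase[myi], 1);                  /* this phase is over     */
       V(&f);
   }

   int q_delete(void) {
       P(&f);                                             /* wait for a full slot   */
       int ticket  = atomic_fetch_add(&D, 1);
       int myphase = ticket / SIZE, myd = ticket % SIZE;
       while (atomic_load(&phase[myd]) < 2*myphase + 1)
           ;                                              /* here too early: wait   */
       int datum = data[myd];
       atomic_fetch_add(&phase[myd], 1);                  /* this phase is over     */
       V(&e);
       return datum;
   }

Note that the slots are indexed 0 .. SIZE-1 here, since C's % operator
gives 0-based remainders.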