Midterm exam a week from today (rule: must be returned by 13 March).

HOMEWORK Is the following code (it contains an additional binary
semaphore) also correct?  If it is correct, in what way is it better
than the original?  Let's discuss next time.

binary semaphore q,r initially open

   P(in out S) is                     V(in out S) is
   L: while S <= 0                       P(r); S++; V(r)
      P(q)
      if S > 0
         S--
         V(q)
      else
         V(q)
         goto L

Some authors, e.g. Tanenbaum, reserve the term semaphore for
context-switching (a.k.a. blocking) implementations and would not call
our busy-waiting algorithms semaphores.

HOMEWORK 2.2, 2.6 (note that 2.6 uses Tanenbaum's definition of
semaphore; you may ignore the part about monitors).

P/V Chunk: With a counting semaphore, one might want to reduce the
semaphore by more than one.  If each value corresponds to one unit of a
resource, this reduction is reserving a chunk of resources.  We call it
P chunk; similarly V chunk.

Why can't you just do P chunk by doing multiple P's?  Assume there are
3 units of the resource available and 3 tasks each need 2 units to
proceed.  If they each do P;P, you can get the case where no one gets
what they need and none can proceed.

   PChunk(in out S, in amt) is        VChunk(in out S, in amt) is
      while S < amt                      FAA(S,+amt)
      if FAA(S,-amt) < amt
         FAA(S,+amt)
         PChunk(S, amt)                -- try again

(A C sketch of PChunk/VChunk appears below, after the producer/consumer
discussion.)

Let's look at the case amt=1.

   PChunk1(in out S) is               VChunk1(in out S) is
      while S < 1                        FAA(S,+1)
      if FAA(S,-1) < 1
         FAA(S,+1)
         PChunk1(S)

Since S<1 is the same as S<=0, the above is identical to ElegantP.
There will be more about this below.

---------------- Producer Consumer (bounded buffer) ----------------

Problem: Producers produce items and consumers consume items (what a
surprise).  We have a bounded buffer capable of holding k items and
want to use it to hold items produced but not yet consumed.

Special case k=1.  What we want is alternation of the producer adding
to the buffer and the consumer removing.  Alternation, umm, sounds
familiar.

binary semaphore q initially open
binary semaphore r initially closed

   producer is                        consumer is
      loop                               loop
         produce item                       P(r)
         P(q)                               remove item from buffer
         add item to buffer                 V(q)
         V(r)                               consume item

Note that there can be many producers and many consumers, but the code
guarantees that only one will be adding to or removing from the buffer
at a time.  Hence we can use normal code to add to and remove from the
buffer.

Now the general case for arbitrary k.

We will need to allow some slop so that a few (up to k) producers can
proceed before a consumer removes an item.

We will need to put a binary semaphore around the add and remove code
(unless the code is special and can tolerate concurrency).

This is what counting semaphores are great for (semi-critical section).

counting semaphore e initially k   -- num EMPTY slots
counting semaphore f initially 0   -- num FULL slots
binary semaphore b initially open

   producer is                              consumer is
      loop                                     loop
         produce item                             P(f)
         P(e)                                     P(b); rem item from buf; V(b)
         P(b); add item to buf; V(b)              V(e)
         V(f)                                     consume item

Normally we want the buffer to be a queue.

HOMEWORK What would you do if you had two items that needed to be
consecutive on the buffer (assume the buffer is a queue)?

If we used FAA counting semaphores, there is no serial section in the
above except for the P(b) ... V(b) around the buffer manipulation.
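As a concrete (hedged) illustration of the general producer/consumer
solution just above, here is a minimal C sketch.  POSIX semaphores
(sem_t) stand in for the counting semaphores e and f and the binary
semaphore b; the capacity K, the circular-buffer layout, and the
function names are my own choices, not part of the notes.

   #include <semaphore.h>

   #define K 10                     /* buffer capacity (the k above)        */

   int   buf[K];                    /* the bounded buffer, used circularly  */
   int   in = 0, out = 0;           /* next slot to fill / to empty         */
   sem_t e;                         /* counts EMPTY slots, initially K      */
   sem_t f;                         /* counts FULL  slots, initially 0      */
   sem_t b;                         /* binary semaphore guarding the buffer */

   void buf_init(void) {
       sem_init(&e, 0, K);          /* e initially k     */
       sem_init(&f, 0, 0);          /* f initially 0     */
       sem_init(&b, 0, 1);          /* b initially open  */
   }

   void produce(int item) {
       sem_wait(&e);                /* P(e): wait for an empty slot   */
       sem_wait(&b);                /* P(b): enter the serial section */
       buf[in] = item;
       in = (in + 1) % K;
       sem_post(&b);                /* V(b)                           */
       sem_post(&f);                /* V(f): one more full slot       */
   }

   int consume(void) {
       sem_wait(&f);                /* P(f): wait for a full slot     */
       sem_wait(&b);                /* P(b): enter the serial section */
       int item = buf[out];
       out = (out + 1) % K;
       sem_post(&b);                /* V(b)                           */
       sem_post(&e);                /* V(e): one more empty slot      */
       return item;
   }

As in the pseudocode, the only serial section here is the short
P(b) ... V(b) region around the buffer manipulation.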
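Returning to PChunk/VChunk from earlier (the sketch promised above):
here is one possible C rendering using C11 atomics, where
atomic_fetch_add / atomic_fetch_sub play the role of FAA and return the
OLD value.  The names pchunk/vchunk and the bare spin loop are mine;
this is a busy-waiting sketch, not the notes' official code.

   #include <stdatomic.h>

   /* The counting semaphore S is just an atomic integer. */

   void pchunk(atomic_int *S, int amt) {          /* reserve amt units */
       for (;;) {
           while (atomic_load(S) < amt)
               ;                                  /* too few units: spin        */
           if (atomic_fetch_sub(S, amt) >= amt)   /* old value was big enough   */
               return;                            /* chunk reserved             */
           atomic_fetch_add(S, amt);              /* lost the race: undo, retry */
       }
   }

   void vchunk(atomic_int *S, int amt) {          /* release amt units */
       atomic_fetch_add(S, amt);
   }

With amt = 1 this is exactly PChunk1, i.e. ElegantP.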
NYU Ultracomputer critical-section-free queue algorithms

Implement the queue as a circular array.
FAA(tail,1) mod size gives the slot to use for insertions.
FAA(head,1) mod size gives the slot to use for deletions.
Will use I and D instead of tail and head below.

type queue is array 1 .. size of record
   natural   phase   -- will be explained later
   some-type data    -- the data to store

Since we do NOT want to have a critical section in the above code, we
have a thorny problem to solve.  The counting semaphores will guarantee
that when an insert gets past P(e), there is an empty slot somewhere.
But it might NOT be at the slot this insert was assigned!!  How can
this be?

If this were all, we could just force alternation at each queue
position with two semaphores at each slot.  But Boris (Lubachevsky)
also found a scenario where two inserts could be going after the same
slot, and thus you could ruin FIFO by having the first insert go
second.

So we have a phase at each slot.
The first (zeroth) insert at this slot is phase 0.
The first (zeroth) delete at this slot is phase 1.
Insert j is phase 2*j; delete j is phase 2*j+1.

The phase for an insert is I div size.
The slot is I mod size.
Most hardware calculates both at once when doing a division (quotient
and remainder).  If size is a power of 2, these are just two parts of
the number (i.e. mask and shift).  I use a (made up) function
(Div, Mod) that returns both values.

counting semaphore e initially size
counting semaphore f initially 0

   Insert is
      P(e)
      (MyPhase, MyI) <-- (Div, Mod) (FAA(I,1), size)
      while phase[MyI] < 2*MyPhase      -- we are here too early, wait
      data[MyI] <-- the-datum-to-insert
      FAA(phase[MyI],1)                 -- this phase is over
      V(f)

   Delete is
      P(f)
      (MyPhase, MyD) <-- (Div, Mod) (FAA(D,1), size)
      while phase[MyD] < 2*MyPhase+1    -- we are here too early, wait
      extracted-data <-- data[MyD]
      FAA(phase[MyD],1)                 -- this phase is over
      V(e)

This code went through several variations.  The current version was
discovered when I last taught this course in the spring of '94, but was
never written down before now.  Originally the insert and delete code
were complicated and looked quite different from each other.  At some
point I found a way to make them look very symmetric (but still
complicated).  At first I was a little proud of this discovery and
rushed to show Boris.  He was not impressed; indeed he remarked "it
must be so".  It was always hard to translate his "must", so I didn't
understand whether he was saying that my comment was trivial or
important.  His next, quite perceptive, remark showed that it was the
former and ended the discussion: of course they look the same,
"Deletion is insertion of empty space."

This queue algorithm uses size for two purposes:
   The maximum size of the queue.
   The maximum concurrency supported.
It would be natural for these two requirements to differ considerably
in the size required.  A system with 100 processors, each running no
more than 10 active threads using the queue, needs at most 1000-fold
concurrency, but if the traffic generation is bursty, many more than
1000 slots would be desired.

One can have a (serially accessed, i.e. critical-section) list instead
of a single slot associated with each MyI.

One can further enhance this by implementing these serially accessed
lists as linked lists rather than arrays.  This gives the usual
advantages of linked vs. sequentially allocated lists (as well as the
usual disadvantages).

This enhanced version is used in our operating system (Symunix),
written by Jan Edler.
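For concreteness, here is one possible C11 rendering of the phase-based
Insert/Delete above.  It uses atomic_fetch_add as FAA, busy-waiting
counting semaphores in the ElegantP style for P and V, and plain spin
loops for the phase waits; SIZE, the int payload, and the function
names are my choices, so treat it as a sketch of the idea rather than
the Ultracomputer/Symunix code.

   #include <stdatomic.h>

   #define SIZE 8                      /* max queue size AND max concurrency       */

   static int        data[SIZE];       /* the data to store                        */
   static atomic_int phase[SIZE];      /* insert j is phase 2j, delete j is 2j+1   */
   static atomic_int I, D;             /* insertion / deletion tickets (tail/head) */
   static atomic_int e = SIZE, f = 0;  /* busy-waiting counting semaphores         */

   static void P(atomic_int *s) {               /* ElegantP-style busy-waiting P */
       for (;;) {
           while (atomic_load(s) <= 0)
               ;                                /* spin until positive */
           if (atomic_fetch_sub(s, 1) > 0)
               return;                          /* got a unit          */
           atomic_fetch_add(s, 1);              /* undo and retry      */
       }
   }
   static void V(atomic_int *s) { atomic_fetch_add(s, 1); }

   void q_insert(int datum) {
       P(&e);                                             /* wait for an empty slot */
       int ticket  = atomic_fetch_add(&I, 1);
       int myphase = ticket / SIZE, myi = ticket % SIZE;  /* (Div, Mod)             */
       while (atomic_load(&phase[myi]) < 2*myphase)
           ;                                              /* here too early: wait   */
       data[myi] = datum;
       atomic_fetch_add(&phase[myi], 1);                  /* this phase is over     */
       V(&f);
   }

   int q_delete(void) {
       P(&f);                                             /* wait for a full slot   */
       int ticket  = atomic_fetch_add(&D, 1);
       int myphase = ticket / SIZE, myd = ticket % SIZE;
       while (atomic_load(&phase[myd]) < 2*myphase + 1)
           ;                                              /* here too early: wait   */
       int datum = data[myd];
       atomic_fetch_add(&phase[myd], 1);                  /* this phase is over     */
       V(&e);
       return datum;
   }

Note that the slots are indexed 0 .. SIZE-1 here, since C's % operator
gives 0-based remainders.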