Acks
    One per pkt vs one per msg
        Called stop-and-wait and blast
        In the former, wait for each ack
        In blast, keep sending packets until the msg is finished
    Could also do a hybrid
        Blast, but ack each packet
        Blast, but request only the missing packets instead of a general NAK
            Called selective repeat

Flow control
    Buffer overrun problem
        The Internet worm was caused by a buffer overrun that rewrote non-buffer space.  That is not the problem here.
        Can occur right at the interface chip, in which case the (later) packet is lost.
        More likely with blast, but can occur with stop-and-wait if there are multiple senders.
    What to do
        If the chip needs a delay to do back-to-back receives, have the sender delay that amount.
        If the receiver can only buffer n pkts, have the sender send only n and then wait for an ack.
        The above fails when there are simultaneous sends.  But hopefully that is not too common.
    This tuning to the specific hardware present is one reason why general protocols don't work as well as specialized ones.

Why so slow?
    Lots to do!
        Call stub
        Get msg buf
        Marshall params
        If using a standard protocol (UDP), compute checksum
        Fill in headers
        Poof
        Copy msg to kernel space (unless special kernel)
        Put in real destination addr
        Start DMA to comm device
    ---------------- wire time
        Process interrupt (or polling delay)
        Check packet
        Determine relevant stub
        Copy to stub addr space (unless special kernel)
        Unpoof
        Unmarshall
        Call server
    On the Paragon (Intel large MPP of a few years ago), the above (not exactly the same things) took 30us, of which 1us was wire time.

Eliminating copying
    Message transmission is essentially a copy, so the minimum is 1.
        This requires the network device to do its DMA from the user buffer (client stub) directly into the server stub.
        Hard for the receiver to know where to put the msg until it arrives and is inspected.
        Sounds like a copy is needed from the receiving buffer to the server stub.
        Can avoid this by fiddling with memory maps.
            Must be full pages (as that is what is mapped).
    Normally there are two copies on the receiving side
        From hardware buffer to a kernel buffer
        From kernel buffer to user space (server stub)
    Often two on the sender side
        User space (client stub) to kernel buffer
        Kernel buffer to buffer on device
        Then start the device
    The sender copies can be reduced
        The device can do DMA from the kernel buffer, which eliminates the second.
        Doing DMA from user space would eliminate the first, but this needs scatter-gather (just gather here) since the header must be in kernel space: the user is not allowed to set it.
    Eliminating the two on the receiver side is harder
        Can eliminate the first if the device writes directly into the kernel buffer.
        Eliminating the second requires the remapping trick.

Timers and timeout values
    Getting a good value for the timeouts is a black art.
        Too small and there are many unneeded retransmissions.
        Too large and you wait too long.
    Should be adaptive??
        If you find that you sent an extra msg, raise the timeout for this class of transmissions.
        If the timeout expires most of the time, lower the value for this class.
    How to keep timeout values
        If you know that almost all timers of this class are going to go off (alarms) and accuracy is important, then keep a list sorted by time to alarm (see the sketch below).
            Only have to scan the head for an expired timer (so can do it frequently).
            Additions must search for the place to add.
            Deletions (cancelled alarms) are presumed rare.
        If deletions are common and you can afford a not-so-accurate alarm, then sweep a list of all processes (not so frequently, since accuracy is not required).
            Deletions and additions are easy since the list is indexed by process number.
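A minimal C sketch (not from the notes) of the sorted-alarm-list scheme; timer_add and timer_tick are hypothetical names, and the list is a simple singly linked list kept ordered by expiration time:

    #include <stdlib.h>

    struct timer {
        long expires;            /* absolute expiration time, in ticks */
        void (*action)(void *);  /* what to do when the alarm goes off */
        void *arg;
        struct timer *next;
    };

    static struct timer *head;   /* kept sorted, earliest expiration first */

    /* Addition searches for its place in the sorted list. */
    void timer_add(struct timer *t)
    {
        struct timer **p = &head;
        while (*p && (*p)->expires <= t->expires)
            p = &(*p)->next;
        t->next = *p;
        *p = t;
    }

    /* Called frequently (say every tick): only the head needs checking. */
    void timer_tick(long now)
    {
        while (head && head->expires <= now) {
            struct timer *t = head;
            head = t->next;
            t->action(t->arg);
            free(t);
        }
    }

Cancelling an alarm means unlinking its entry, which requires a search; that is acceptable precisely because deletions are presumed rare in this scheme.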
Difficulties with RPC
    Global variables like errno inherently have shared-variable semantics, so they don't fit in a distributed system.
        One (remote) procedure sets the variable and the local procedure is supposed to see it.  But the setting is a normal store, so it is not seen by the communication system.
        So transparency is violated.
    Weak typing makes marshalling hard/impossible.
        How big is the object we should copy?
        What conversion is needed on a heterogeneous system?
        So transparency is violated.
    Doesn't fit all programming models
        Unix pipes
            pgm1 < f > g looks good: pgm1 is a client for stdin and stdout, doing RPCs to the file server for f and g.
            Similarly for pgm2 < x > y.
            But what about pgm1 < f | pgm2 > y?
                Both pgm1 and pgm2 are compiled to be clients, but they communicate with each other, so one must be a server.
                Could make pipes servers, but this is not efficient and somehow doesn't seem natural.
        Out-of-band (unexpected) msgs
            A program reading from a terminal is a client, and there is a terminal server to supply the characters.
            But what about ^C?  Now the terminal driver is supposed to be active, and servers are passive.

----------------
Group Communication
----------------

Groups give rise to ONE-TO-MANY communication: a msg sent to the group is received by all (current) members.
We have had just send-receive, which is point-to-point, i.e., one-to-one.

Some networks support MULTICASTING
    Broadcast is the special case where many = all.
    Multicasting gives the one-to-many semantics needed.
    Can use broadcast for multicast: each machine checks to see if it is a part of the group.
    If you don't even have broadcast, use multiple one-to-one transmissions, often called UNICASTING (see the sketch below).

Groups are dynamic
    Groups come and go; members come and go.
    Need algorithms for this.
    Somehow the name of the group must be known to the member-to-be.
        For an open group, just send an "I'm joining" msg.
        For a closed group, this is not possible.
        For a pseudo-closed group (one that allows join msgs), it is just like an open group.
    To leave a group, send a goodbye msg.
    If a process dies, the members have to notice it.
        "I am alive" msgs
        Piggyback on other msgs
        Once noticed, send a group msg removing the member.
            This msg must be ordered w.r.t. other group msgs (why?).
    If many members go down or the network severs, need to re-establish the group.

Message ordering
    A big deal.
    Want consistent ordering.
    Will discuss in detail later this chapter.
    Note that an interesting problem is ordering goodbye and join msgs with regular msgs.

Closed and open groups
    The difference is that in an open group an outsider can send to the group.
    Open for a replicated server.
    Closed for a group working on a single (internal) problem.
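A minimal C sketch (not from the notes) of falling back to unicasting when the network has neither multicast nor broadcast; struct group, send_unicast, and group_send are hypothetical names, with send_unicast standing in for whatever point-to-point send primitive the system provides:

    #define MAX_MEMBERS 32

    struct group {
        int nmembers;
        int member_addr[MAX_MEMBERS];   /* addresses of the current members */
    };

    /* Assumed to exist elsewhere: the point-to-point send primitive. */
    int send_unicast(int dest_addr, const void *msg, int len);

    /* One-to-many send: every current member gets its own copy. */
    void group_send(const struct group *g, const void *msg, int len)
    {
        for (int i = 0; i < g->nmembers; i++)
            send_unicast(g->member_addr[i], msg, len);
    }

With hardware multicast the loop collapses to a single transmission; with broadcast, the sender also transmits once, but every machine must check whether it is a member before delivering the msg.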