Acks
    One per pkt vs one per msg
        Called stop-and-wait and blast
        In the former, wait for each ack
        In blast, keep sending packets until the msg is finished
    Could also do a hybrid
        Blast, but ack each packet
        Blast, but request only the missing packets instead of a general NAK
            Called selective repeat

Flow control
    Buffer overrun problem
        The Internet worm was caused by a buffer overrun that rewrote non-buffer space.  That is not the problem here.
        Can occur right at the interface chip, in which case the (later) packet is lost.
        More likely with blast, but can occur with stop-and-wait if there are multiple senders.
    What to do
        If the chip needs a delay to do back-to-back receives, have the sender delay that amount.
        If the receiver can only buffer n pkts, have the sender send only n and then wait for an ack.
        The above fails when there are simultaneous sends.  But hopefully that is not too common.
    This tuning to the specific hardware present is one reason why general protocols don't work as well as specialized ones.

Why so slow?
    Lots to do!
        Call stub
        Get msg buf
        Marshall params
        If using a standard protocol (UDP), compute checksum
        Fill in headers
        Poof
        Copy msg to kernel space (unless special kernel)
        Put in real destination addr
        Start DMA to comm device
    ---------------- wire time
        Process interrupt (or polling delay)
        Check packet
        Determine relevant stub
        Copy to stub addr space (unless special kernel)
        Unpoof
        Unmarshall
        Call server
    On the Paragon (Intel large MPP of a few years ago), the above (not exactly the same things) took 30us, of which 1us was wire time.

Eliminating copying
    Message transmission is essentially a copy, so the minimum is 1.
        This requires the network device to do its DMA from the user buffer (client stub) directly into the server stub.
        Hard for the receiver to know where to put the msg until it arrives and is inspected.
        Sounds like a copy is needed from the receiving buffer to the server stub.
        Can avoid this by fiddling with memory maps.
            Must be full pages (as that is what is mapped).
    Normally there are two copies on the receiving side
        From hardware buffer to a kernel buffer
        From kernel buffer to user space (server stub)
    Often two on the sender side
        User space (client stub) to kernel buffer
        Kernel buffer to buffer on device
        Then start the device
    The sender copies can be reduced
        The device can do DMA from the kernel buffer, which eliminates the second.
        Doing DMA from user space would eliminate the first, but this needs scatter-gather (just gather here) since the header must be in kernel space: the user is not allowed to set it.
    Eliminating the two on the receiver side is harder
        Can eliminate the first if the device writes directly into the kernel buffer.
        Eliminating the second requires the remapping trick.

Timers and timeout values
    Getting a good value for the timeouts is a black art.
        Too small and there are many unneeded retransmissions.
        Too large and you wait too long.
    Should be adaptive??
        If you find that you sent an extra msg, raise the timeout for this class of transmissions.
        If the timeout expires most of the time, lower the value for this class.
    How to keep timeout values
        If you know that almost all timers of this class are going to go off (alarms) and accuracy is important, then keep a list sorted by time to alarm (see the sketch below).
            Only have to scan the head for an expired timer (so can do it frequently).
            Additions must search for the place to add.
            Deletions (cancelled alarms) are presumed rare.
        If deletions are common and you can afford a not-so-accurate alarm, then sweep a list of all processes (not so frequently, since accuracy is not required).
            Deletions and additions are easy since the list is indexed by process number.
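A minimal C sketch (not from the notes) of the sorted-alarm-list scheme; timer_add and timer_tick are hypothetical names, and the list is a simple singly linked list kept ordered by expiration time:

    #include <stdlib.h>

    struct timer {
        long expires;            /* absolute expiration time, in ticks */
        void (*action)(void *);  /* what to do when the alarm goes off */
        void *arg;
        struct timer *next;
    };

    static struct timer *head;   /* kept sorted, earliest expiration first */

    /* Addition searches for its place in the sorted list. */
    void timer_add(struct timer *t)
    {
        struct timer **p = &head;
        while (*p && (*p)->expires <= t->expires)
            p = &(*p)->next;
        t->next = *p;
        *p = t;
    }

    /* Called frequently (say every tick): only the head needs checking. */
    void timer_tick(long now)
    {
        while (head && head->expires <= now) {
            struct timer *t = head;
            head = t->next;
            t->action(t->arg);
            free(t);
        }
    }

Cancelling an alarm means unlinking its entry, which requires a search; that is acceptable precisely because deletions are presumed rare in this scheme.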
Difficulties with RPC
    Global variables like errno inherently have shared-variable semantics, so they don't fit in a distributed system.
        One (remote) procedure sets the variable and the local procedure is supposed to see it.  But the setting is a normal store, so it is not seen by the communication system.
        So transparency is violated.
    Weak typing makes marshalling hard/impossible.
        How big is the object we should copy?
        What conversion is needed on a heterogeneous system?
        So transparency is violated.
    Doesn't fit all programming models
        Unix pipes
            pgm1 < f > g looks good: pgm1 is a client for stdin and stdout, doing RPCs to the file server for f and g.
            Similarly for pgm2 < x > y.
            But what about pgm1 < f | pgm2 > y?
                Both pgm1 and pgm2 are compiled to be clients, but they communicate with each other, so one must be a server.
                Could make pipes servers, but this is not efficient and somehow doesn't seem natural.
        Out-of-band (unexpected) msgs
            A program reading from a terminal is a client, and there is a terminal server to supply the characters.
            But what about ^C?  Now the terminal driver is supposed to be active, and servers are passive.

----------------
Group Communication
----------------

Groups give rise to ONE-TO-MANY communication: a msg sent to the group is received by all (current) members.
We have had just send-receive, which is point-to-point, i.e., one-to-one.

Some networks support MULTICASTING
    Broadcast is the special case where many = all.
    Multicasting gives the one-to-many semantics needed.
    Can use broadcast for multicast: each machine checks to see if it is a part of the group.
    If you don't even have broadcast, use multiple one-to-one transmissions, often called UNICASTING (see the sketch below).

Groups are dynamic
    Groups come and go; members come and go.
    Need algorithms for this.
    Somehow the name of the group must be known to the member-to-be.
        For an open group, just send an "I'm joining" msg.
        For a closed group, this is not possible.
        For a pseudo-closed group (one that allows join msgs), it is just like an open group.
    To leave a group, send a goodbye msg.
    If a process dies, the members have to notice it.
        "I am alive" msgs
        Piggyback on other msgs
        Once noticed, send a group msg removing the member.
            This msg must be ordered w.r.t. other group msgs (why?).
    If many members go down or the network severs, need to re-establish the group.

Message ordering
    A big deal.
    Want consistent ordering.
    Will discuss in detail later this chapter.
    Note that an interesting problem is ordering goodbye and join msgs with regular msgs.

Closed and open groups
    The difference is that in an open group an outsider can send to the group.
    Open for a replicated server.
    Closed for a group working on a single (internal) problem.
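A minimal C sketch (not from the notes) of falling back to unicasting when the network has neither multicast nor broadcast; struct group, send_unicast, and group_send are hypothetical names, with send_unicast standing in for whatever point-to-point send primitive the system provides:

    #define MAX_MEMBERS 32

    struct group {
        int nmembers;
        int member_addr[MAX_MEMBERS];   /* addresses of the current members */
    };

    /* Assumed to exist elsewhere: the point-to-point send primitive. */
    int send_unicast(int dest_addr, const void *msg, int len);

    /* One-to-many send: every current member gets its own copy. */
    void group_send(const struct group *g, const void *msg, int len)
    {
        for (int i = 0; i < g->nmembers; i++)
            send_unicast(g->member_addr[i], msg, len);
    }

With hardware multicast the loop collapses to a single transmission; with broadcast, the sender also transmits once, but every machine must check whether it is a member before delivering the msg.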