Class 24
CS 372H
20 April 2010

On the board
------------

1. Networking

    Last time:
        A. Physical layer
        B. Big picture
        C. Link layer
        D. Network layer
        E. What do we mean by layering?
        F. ARP

    Today:
        G. Transport layer
        H. Application layer
        I. What is the interface to the networking stack?

2. Using networks to build distributed systems
    --Motivation for distributed transactions
    --Impossibility result: two generals' problem

---------------------------------------------------------------------------

0. Last time

    [DRAW PICTURE OF THE BIG PICTURE]

    --I think I mostly convinced you that:
        --if a computer knew the IP address of a local router; and
        --that computer knew the IP address of the destination; and
        --we have a network that knows how to route packets
    --Then, that computer could arrange for packets to travel to their
      destination

    [DRAW PICTURE OF ROUTING]

    --Poke around:
        --"arp -a" (Unix)
        --"ifconfig -a" (Unix)
        --"netstat -arn" (Unix)
        --"ipconfig /all" (Windows)
        --"route print" (Windows?)

1. Networking, continued

    --Several questions:
        --How does the computer get the IP address of a local router?
            --manual configuration
                --BTW, even edge routers get this thing configured
                  manually. A third-tier ISP is told: "here's the IP
                  address of the other end of this link."
                --If you have a cable modem, it does this
            --DHCP
        --How does the computer get the IP address of the remote machine?
        --NAT
            --Explain this

    So where are we?
        --have a way to get packets to a destination computer
        --but don't yet have a way to indicate what application or
          process on that destination computer gets the packet
        --also don't cleanly handle things like failure, congestion in
          the network, etc.

G. Transport Layer

    Motivation: failure, demultiplexing, flow control, etc.
    DRAW PICTURE:

        layer                          role
        -----                          ----
        TCP  UDP  ICMP("ping")         {flow control, port space}
        IP                             {forwarding}
        Ethernet                       {framing}
        radio  copper_wires  fiber     {signal propagation}

    Several types of error can affect packet delivery
        --Bit errors (e.g., electrical interference, cosmic rays)
        --Packet loss (packets dropped when queues fill on overload)
        --Link and node failure

    In addition, properly delivered frames can be delayed, reordered,
    even duplicated

    How much should OS (or the networking modules) expose to application?
        --Some failures cannot be masked (e.g., server dead)
        --Others can be (e.g., retransmit lost packet)
        --But masking errors may be wrong for some applications (e.g.,
          old audio packet no longer interesting if too late to play)

    UDP and TCP most popular protocols on IP
        --Both use 16-bit _port_ number as well as 32-bit IP address
        --Applications _bind_ to a port and receive traffic to that port
          (discuss later what the interface is)

    UDP -- User Datagram Protocol (an unreliable datagram service)
        --Exposes packet-switched nature of Internet
        --Sent packets may be dropped, reordered, even duplicated (but
          generally not corrupted). Application's problem to deal with
          these errors

    TCP -- Transmission Control Protocol
        --Provides illusion of a reliable "pipe" between two processes
          on two different machines
        --Masks lost and reordered packets so apps don't have to worry
        --Handles congestion and flow control

    Uses of TCP
        --Most applications use TCP
        --Easier interface to program to (reliability)
        --Automatically avoids congestion (don't need to worry about
          taking down network)

    Many issues involved in implementing TCP
        --TCP involves *acknowledgments* from receiver to sender
        --Key to good TCP throughput is to have the right number of
          outstanding (unacknowledged) packets. Call this value W, the
          window.
        --The effective throughput of the communication depends on W and
          the round-trip time (RTT). Specifically, the transmit rate =
          W/RTT.
          (convince yourself of this by drawing a picture in which you
          imagine that W packets are *not* acknowledged, so the sender
          can't send more than W over an RTT.)
        --But W can't be too big because TCP needs to react to
          congestion in the network
        --Specifically, every end-point is tasked with saving the
          network from congestion collapse
        --Approach: TCP "learns" a continuously varying and appropriate
          value of W:
            --Slowly increase transmission by one packet per acked
              window (corresponds to increasing W by 1/W for each ack).
            --When a packet is lost (as indicated by the
              acknowledgments), cut window size in half (corresponds to
              multiplying W by the factor 1/2)
            --This is called additive increase, multiplicative decrease
        --Connection set-up and tear-down is complicated
            --sender never knows if its last packet was lost
            --so has to keep state around after connection close
        --Tons of hacks for good performance
            --Initially ramp W up faster (but too fast caused collapse
              in 1986, so TCP had to be changed)
            --Fast retransmit when single packet lost

    Issues directly for OS too
        --Have to track unacknowledged data
            --Keep a copy around until recipient acknowledges it
            --Keep timer around to retransmit if no ack
            --Receiver must keep out of order segments and reassemble
        --When to wake process receiving data?
            --E.g., sender calls write (fd, message, 8000);
            --First TCP segment arrives, but is only 512 bytes
            --Could wake recipient, but useless w/o full message
            --TCP sets the PUSH bit at the end of the 8000 bytes, to
              force the written data to be delivered to the receiver
        --When to send short segment, vs.
          wait for more data
            --Usually send only one unacked short segment
            --But bad for some apps, so provide NODELAY option
              (TCP_NODELAY)
        --Must ack received segments very quickly
            --Otherwise, effectively increases RTT, increasing
              bandwidth-delay product but without increase in bandwidth
              --> useful throughput declines

    Servers typically listen on well-known ports
        SSH:          22
        Email (SMTP): 25
        Finger:       79
        Web / HTTP:   80

        --Example: Interacting with www.cs.utexas.edu
            --Browser resolves IP address of www.cs.utexas.edu
            --Browser connects to TCP port 80 on that IP address
            --Over TCP connection, browser requests and gets home page

H. Application layer

    Example: HTTP

    Normally, HTTP servers, otherwise known as Web servers, run on
    port 80.

    When your Web browser connects to a URL, it knows to make requests
    on port 80 by default, meaning it stamps "80" as the destination
    port in its packets.

    You can direct your Web browser to make requests on any port,
    though, like this:

        http://machine_name:port_num

    In that case, the browser itself will address its packets to the IP
    address that corresponds to the name of the machine and destination
    port port_num instead of destination port 80.

    Messages look like this:

        Browser --> Server:  "GET /pics/dog.jpg HTTP/1.0\r\n"

        Server --> Browser:  "HTTP/1.0 404 Not found\r\n"
          or
                             "HTTP/1.0 200 OK\r\n
                              header1: value1\r\n
                              header2: value2\r\n
                              \r\n
                              [the bytes in dog.jpg]"

    [Keep in mind that the above is happening inside TCP, and that TCP
    is presenting a reliable byte stream to the layers above it.]

    QUESTION: where does NFS sit in this picture?
        [answer: runs over UDP or TCP on some port, either well-known,
        or determined with a port mapping service running on the server]

I. What is the interface to the networking stack?

    Application programmer classically sees *sockets*.
    Inspired by pipes (which we'll come back to)

        int pipe(int fds[2])

        --Allow inter-process communication on one machine
        --Writes to fds[1] will be read on fds[0]
        --Can give each file descriptor to a different process (with
          fork)

    The idea is: let's do the same thing across machines: **SOCKETS**

        Write data on one machine, read it on another

    *sockets* can represent many different network protocols, but:
        --classically an interface to TCP/IP and UDP
        --sometimes an interface to IP or Ethernet (raw sockets)

    sockets API:

        /* senders and receivers */

        int sockfd = socket(AF_INET, SOCK_STREAM|SOCK_DGRAM|, 0);

        [note: with AF_INET in the first position, the setting of
        SOCK_STREAM vs SOCK_DGRAM controls whether the app's data is
        going to go over TCP or UDP].

        [with UDP sockets, send atomic messages that may be reordered or
        lost]

        [with TCP sockets, bytes written on one end are read on the
        other, provided no failures. but no guarantees that reads will
        return the full amount requested ... or that the data will be
        packetized according to the number of times the sender called
        send(). With TCP, you *must* sit there in a loop and keep
        reading.
        You know you're done because either (a) the application-level
        protocol is expected to understand where message boundaries
        begin and end or (b) the other machine closed its end of the
        connection]

        int rc = close(sockfd);

        /* wait until one of a set of descriptors is ready */
        int n = select(nfds, &readfds, &writefds, &exceptfds, &timeout);

        /* simplified view of the IPv4 socket address structure */
        struct sockaddr_in {
            short          sin_family;   /* AF_INET */
            unsigned short sin_port;     /* port, network byte order */
            uint32_t       sin_addr;     /* IP address, network byte order */
            char           sin_zero[8];  /* padding */
        };

        /* senders */
        int rc = connect(sockfd, &addr, addrlen);
        int rc = send(sockfd, buf, len, 0);
        int rc = sendto(sockfd, buf, len, 0, &addr, addrlen);

        /* receivers */
        int rc = bind(sockfd, &addr, addrlen);
        int rc = listen(sockfd, backlog_len);
        int rc = accept(sockfd, &addr, &addrlen);
        int rc = recv(sockfd, buf, len, 0);
        int rc = recvfrom(sockfd, buf, len, 0, &addr, &addrlen);

    NOTES:
        * connections are named by 5 components: protocol (TCP), local
          IP address, local port, remote IP address, remote port
        * UDP does not require connected sockets
        * OS tracks all of this state in a PCB (protocol control block).

    What does kernel see, and what interfaces does it invoke?

    TX direction:
        --usually gets payloads from higher levels and implements
          TCP/IP, UDP, IP, and part of Ethernet
        --usually hands most of an Ethernet frame to the network device
        --but not always: could imagine a Web server implemented
          entirely in the kernel, or even a Web server implemented on a
          network card
        --(in JOS, the entire networking stack is implemented in user
          space. that is the function of the lwip library.)

    RX direction:
        --when a packet arrives, use 5-tuple (above) to find PCB and
          figure out what to do with packet

    Note that to avoid lots of copies, OS may not actually store
    packets contiguously. May store linked list of buffers.
    Each buffer is either a packet header or a payload

    Network interface cards (NICs)
        --Used to be dumb
        --Now sometimes do lots of stuff
        --You are getting a network interface card working in lab 6

    Kernels also do *routing*
        --A machine has multiple NICs connected to different networks.
          The kernel gets a packet (either from one of the NICs or from
          an application). Now which NIC does it go out on?
        --kernel generally looks at the destination address of the
          packet and does a lookup in a table that it maintains:

            [IP address, prefix-length] --> next-hop

          next-hop is the physical interface to send the packet out

        This is the same routing function that Internet routers do

        there are data structures to make it efficient in time and
        space (radix trees are a decent first cut)

2. Using networks to build distributed systems

    A distributed system -- a system running across multiple machines
    -- is a key application of the network!

    Lots of issues to consider.....

    Note that previously, we had better modularity:
        --bug in user-level program --> process crashes
        --bug in kernel --> all processes crash
        --power outage --> all machines fail

    But in a distributed system, one machine can crash, others can stay
    up. Some machines can be slow. Some can crash and come back up.

    Lots of other issues to consider......computers can lose state,
    reboot, have partial state. Messages can be reordered, dropped,
    duplicated, delayed, etc......How do you build a system out of
    multiple processors and make the system *appear* to be tightly
    coupled (i.e., running in the same machine) even if it is not?

    "A distributed system is one in which the failure of a computer you
    didn't even know existed can render your own computer unusable."
        --Leslie Lamport
        http://research.microsoft.com/en-us/um/people/lamport/pubs/distributed-system.txt

A. Motivation for distributed transactions

    (i) want to coordinate actions across sites:

        --I write you a check for $100.
          My bank is Frost, yours is BoA
            --need to debit my account $100 and credit yours with $100
            --how the heck are we going to ensure that either both
              banks execute the transaction or neither does?

        --More complex example:
            --debit account on computer in New York with $1000
            --open cash drawer in San Francisco, give $500
            --credit account in Houston with another $500

        --File systems example:
            --move a file from directory A on server a to directory B
              on server b (better not do one and not the other)

    We want the abstraction of a multi-site (or _distributed_)
    transaction
        --but how the heck are we going to build a transaction if our
          messages are carried over a network that can lose them, delay
          them, or duplicate them? and given that some computers can
          fail, reboot, etc.?
        --and actually the situation is even worse.......

B. Two Generals' Problem (an impossibility result)

    [DRAW PICTURE: TWO ARMIES SEPARATED BY A VALLEY. RUNNERS GO BETWEEN
    THEM. RUNNERS CAN BE KILLED OR DELAYED. IF BOTH ARMIES ATTACK, THEY
    WIN. IF ONLY ONE ATTACKS, EVERYONE WHO ATTACKS DIES]

        ----->  "3:30 PM good?"
        <-----  "yeah, 3:30 PM is good."

    [at this point, both parties know that *if* there is an attack,
    they will attack at 3:30 PM. but the right-hand general cannot know
    that the left-hand general actually got the reply. so they need
    some more messages....a lot more.....]

        ----->  "so we're doing this thing, right?"
        <-----  "yeah, totally. but what if you don't get this ack?"

    [....in fact an infinite number of messages would be required.]

    Impossible to get the two generals to safely attack

    Conclusion: cannot use messages and retries over an unreliable
    network to synchronize two machines so that they are guaranteed to
    do the same operation at the same time.

    So are we out of business?
        Yes, if we need to actually solve Two Generals' Problem.
        No, if we are content with a weaker guarantee.

---------------------------------------------------------------------------

Admin notes

    --Guest lecture Thursday.
        --Will be tested on that material.
          Plus, it should be interesting and a bit of a change of pace
          from our usual material.

    --Project:
        --lab 7 has most of the details
        --email proposal and project team to us by 3:00 PM on Friday;
          cannot use late days for this mini-deadline.
        --evaluation will be on the quality of your project, *not*
          scaled to the number of people.
            --conclusion: very much in your best interest to work with
              a partner
            (--some of you didn't choose partners because of scheduling
              difficulties and so forth, but the project need not be
              programmed in pair style.)
        --the project is meant to be fun, and I think if you take
          something you're interested in, it can be reasonably fun.
        --try to do something that is manageable (so you can finish
          it), fun (so you can enjoy it), and cool/interesting (so it is
          worthwhile)