Class 22
CS 372H
12 April 2012

On the board
------------

1. Last time 
2. Crash course in networking, continued
3. Discuss "Design philosophy of the DARPA Internet protocols"

---------------------------------------------------------------------------

1. Last time

    A. Intro
    B. Physical layer
    C. Big picture
    D. Link layer
    E. Network layer
    F. What do we mean by layering?
    G. ARP

2. Networking, continued

    H. Zoom out
    I. Transport layer
    J. Application layer


    H. Zoom out: where are we?

	I hope to have convinced you that if

	    (a) a computer knew the IP address of a local router; and

	    (b) that computer knew the IP address of the destination; and

	    (c) we have a network that knows how to forward packets

	then

	    --that computer could arrange for packets to travel to their
	    destination

	Okay, but how do we get (a)--(c)?

	(a) two possibilities:

	    --manual configuration
		--BTW, even edge routers get this thing configured
		manually. A third-tier ISP is told: "here's the IP
		address of the other end of this link."
		--If you have a cable modem, it does this	

	    --DHCP

	(b) Naming system: Domain Name System (DNS)

	(c) [DRAW PICTURE OF ROUTING: BGP, OSPF, etc.; ANOTHER FUNCTION
	OF THE NETWORK LAYER]

	WHAT'S NEXT?

	--we do not yet have a way to indicate what application or process
	on the destination computer gets the packet

	--we also don't cleanly handle things like failure, congestion in
	the network, etc.

    I. Transport layer

	Motivation: failure, demultiplexing, flow control, etc.

	DRAW PICTURE:

			layer                          role

		TCP    UDP    ICMP("ping")	{flow control, port space}
			    IP			{forwarding}
			Ethernet		{framing}
		    radio  copper_wires  fiber  {signal propagation}
	    
	
	Several types of error can affect packet delivery

	    --Bit errors (e.g., electrical interference, cosmic rays)

	    --Packet loss (packets dropped when queues fill on overload)

	    --Link and node failure

	In addition, properly delivered frames can be delayed,
	reordered, even duplicated

	How much should OS (or the networking modules) expose to application?

	    --Some failures cannot be masked (e.g., server dead)

	    --Others can be (e.g., retransmit lost packet)

	    --But masking errors may be wrong for some applications (e.g.,
	    old audio packet no longer interesting if too late to play)

	UDP and TCP most popular protocols on IP

	    --Both use 16-bit _port_ number as well as 32-bit IP address

	    --Applications _bind_ to a port and receive traffic to that port
		(discuss later what the interface is)

	UDP: User Datagram Protocol

	    --Exposes packet-switched nature of Internet

	    --Sent packets may be dropped, reordered, even duplicated
	    (but generally not corrupted). Application's problem to deal
	    with these errors
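
	[Sketch, not from lecture: one reason UDP data is "generally not
	corrupted" is that UDP (like TCP and the IP header) carries a
	16-bit ones'-complement checksum. The function name
	inet_checksum below is made up for illustration. The real UDP
	checksum also covers a pseudo-header, which is omitted here.]

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Ones'-complement checksum over 16-bit big-endian words, in the
 * style of RFC 1071. Intended for packet-sized inputs (the 32-bit
 * accumulator cannot overflow for len <= 65535).
 */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((data[i] << 8) | data[i + 1]);
    if (len % 2 == 1)
        sum += (uint32_t)(data[len - 1] << 8);  /* pad odd byte with zero */

    while (sum >> 16)                           /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

	[A receiver that runs the same function over the data plus the
	transmitted checksum gets 0 if no bit error was detected.]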
  
	TCP: Transmission Control Protocol

	    --Provides illusion of a reliable "pipe" between two
	      processes on two different machines

	    --Masks lost and reordered packets so apps don't have to worry

	    --Handles congestion and flow control

	Uses of TCP

	    --Most applications use TCP

	    --Easier interface to program to (reliability)

	    --Automatically avoids congestion (don't need to worry about
	      taking down network)

	Servers typically listen on well-known ports
	    SSH: 22
	    Email (SMTP): 25
	    Finger: 79
	    Web / HTTP: 80

	--Example:  Interacting with www.cs.utexas.edu
	    --Browser resolves IP address of www.cs.utexas.edu 
	    --Browser connects to TCP port 80 on that IP address
	    --Over TCP connection, browser requests and gets home page


---------------------------------------------------------------------------

Aside:

NAT and lab 6

    --can think of NAT as something like a router; sits between the
    outside world and the internal computer

	creates an internal network: 10.0.2.0/24

	JOS gets: 10.0.2.15
	fake IP router gets: 10.0.2.2

    --in lab, QEMU runs with tcp:<some_port>::7 which means:

	--QEMU will listen on some_port

	--QEMU will forward connections that are to
	    ip_addr_of_machine:some_port to 
	    10.0.2.15:7

---------------------------------------------------------------------------

    J. Application layer

	Example: HTTP

	Normally, HTTP servers, otherwise known as Web servers, run on
	port 80

	when your Web browser connects to an http:// URL, it makes
	    requests on port 80 by default, meaning it stamps "80" as
	    the destination port in its packets
	you can direct your Web browser to make requests on any port,
	    though, like this:
		http://<name of some machine>:port_num

	    In that case, the browser itself will address its packets to
	    the IP address that corresponds to the name of the machine
	    and destination port port_num instead of destination port
	    80.
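
	[Sketch, not from lecture, of how a client might pick the
	destination port from a URL of the form above. The helper name
	http_url_port is hypothetical, and the parsing is deliberately
	simplified: no IPv6 literals, no user:password@ prefix.]

```c
#include <stdlib.h>
#include <string.h>

/*
 * Return the destination port implied by "http://host[:port][/path]":
 * the explicit port if one is given, otherwise the default, 80.
 * Returns -1 if the URL is not http:// or the port is malformed.
 */
int http_url_port(const char *url)
{
    const char *scheme = "http://";
    const char *host, *end, *colon;
    char *stop;
    long port;

    if (strncmp(url, scheme, strlen(scheme)) != 0)
        return -1;
    host = url + strlen(scheme);

    /* "host[:port]" ends at the first '/' or at the end of the string */
    end = strchr(host, '/');
    if (end == NULL)
        end = host + strlen(host);

    colon = memchr(host, ':', (size_t)(end - host));
    if (colon == NULL)
        return 80;                      /* no explicit port */

    port = strtol(colon + 1, &stop, 10);
    if (stop == colon + 1 || stop != end || port < 1 || port > 65535)
        return -1;
    return (int)port;
}
```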

	Messages look like this:

	    Browser --> Server:
		"GET /pics/dog.jpg HTTP/1.0\r\n
		 \r\n"
	    
	    Server --> Browser:  
		"HTTP/1.0 404 Not found\r\n"
		or

		 "HTTP/1.0 200 OK\r\n
		 header1: value1\r\n
		 header2: value2\r\n
		 \r\n
		 [the bytes in dog.jpg]"

	    [Keep in mind that the above is happening inside TCP, and
	    that TCP is presenting a reliable byte stream to the layers
	    above it.]
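
	[Sketch, not from lecture, of producing the request and reading
	the status code out of the first line of the response. The
	function names are hypothetical. The request ends with an empty
	line, which tells the server that the headers are done. Real
	clients usually send more headers than this.]

```c
#include <stdio.h>

/*
 * Format a minimal HTTP/1.0 GET request for `path` into buf.
 * Returns the request length, or -1 if buf is too small.
 */
int http_build_get(char *buf, size_t buflen, const char *path)
{
    int n = snprintf(buf, buflen, "GET %s HTTP/1.0\r\n\r\n", path);

    if (n < 0 || (size_t)n >= buflen)
        return -1;
    return n;
}

/*
 * Extract the numeric status code from a status line such as
 * "HTTP/1.0 404 Not Found\r\n". Returns -1 if the line does not parse.
 */
int http_status_code(const char *status_line)
{
    int major, minor, code;

    if (sscanf(status_line, "HTTP/%d.%d %d", &major, &minor, &code) != 3)
        return -1;
    return code;
}
```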

	QUESTION: where does NFS sit in this picture?
	    [answer: runs over UDP or TCP on some port, either
	    well-known, or determined with a port mapping service
	    running on the server]

    K. What is the interface to the networking stack?

     --Application programmer classically sees *sockets*. 

	Inspired by pipes 
	    int pipe(int fds[2])
		--Allow Inter-process communication on one machine
		--Writes to fds[1] will be read on fds[0]
		--Can give each file descriptor to a different process
		(with fork)
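
	[Sketch, not from lecture, of the pipe behavior just described:
	bytes written to fds[1] come back out of fds[0]. For brevity
	this stays in one process and assumes a message small enough to
	fit in the pipe's buffer. With fork, the two ends would go to
	different processes. The name pipe_roundtrip is made up.]

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Push msg through a fresh pipe and read it back into out.
 * Returns the number of bytes read, or -1 on error.
 */
ssize_t pipe_roundtrip(const char *msg, char *out, size_t outlen)
{
    int fds[2];
    ssize_t n;

    if (pipe(fds) < 0)
        return -1;

    if (write(fds[1], msg, strlen(msg)) < 0) {   /* write end is fds[1] */
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[1]);                   /* no writers left: reader can see EOF */

    n = read(fds[0], out, outlen);   /* read end is fds[0] */
    close(fds[0]);
    return n;
}
```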

	The idea is: let's do the same thing across machines:
	    **SOCKETS**

	Write data on one machine, read it on another

	*sockets* can represent many different network protocols, but:

	--classically an interface to TCP and UDP (both over IP)
	--sometimes an interface to IP or Ethernet (raw sockets)

	--sockets API:

	/* senders and receivers */
	int sockfd = socket(AF_INET, SOCK_STREAM /* or SOCK_DGRAM */, 0);
	    [note: with AF_INET in the first position, the setting of
	    SOCK_STREAM vs SOCK_DGRAM controls whether the app's data is
	    going to go over TCP or UDP].
	    
	    [with UDP sockets, send atomic messages that may be
	    reordered or lost]

	    [with TCP sockets, bytes written on one end are read on the
	    other, provided no failures. but no guarantees that reads
	    will return the full amount requested ... or that the data
	    will be packetized according to the number of times the
	    sender called send(). With TCP, you *must* sit there in a
	    loop and keep reading. You know you're done because either
	    (a) the application-level protocol defines where message
	    boundaries begin and end or (b) the other side closed the
	    connection, in which case the read returns 0]

	int rc = close(sockfd);
	int n = select(nfds, &readfds, &writefds, &exceptfds, &timeout);

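	struct sockaddr_in {
	    short sin_family;          /* AF_INET */
	    unsigned short sin_port;   /* port, in network byte order */
	    struct in_addr sin_addr;   /* IPv4 address, in network byte order */
	    char sin_zero[8];          /* padding */
	};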

	/* senders */
	int rc = connect(sockfd, (struct sockaddr *)&addr, addrlen);
	int rc = send(sockfd, buf, len, 0);
	int rc = sendto(sockfd, buf, len, 0, (struct sockaddr *)&addr, addrlen);

	/* receivers */
	int rc = bind(sockfd, (struct sockaddr *)&addr, addrlen);
	int rc = listen(sockfd, backlog_len);
	int connfd = accept(sockfd, (struct sockaddr *)&addr, &addrlen);
	int rc = recv(connfd, buf, len, 0);    /* TCP: read from the accepted socket */
	int rc = recvfrom(sockfd, buf, len, 0, (struct sockaddr *)&addr, &addrlen);
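
	[Sketch, not from lecture, of the "sit there in a loop and keep
	reading" rule for TCP sockets. The helper name read_full is
	hypothetical. It uses read(), which on a socket behaves like
	recv() with flags set to 0, and it works on any byte-stream file
	descriptor.]

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Keep reading until len bytes have arrived, the peer closes, or an
 * error occurs. Returns the number of bytes read (less than len only
 * if the peer closed first), or -1 on error.
 */
ssize_t read_full(int fd, void *buf, size_t len)
{
    size_t got = 0;

    while (got < len) {
        ssize_t n = read(fd, (char *)buf + got, len - got);

        if (n == 0)                 /* peer closed: no more data coming */
            break;
        if (n < 0) {
            if (errno == EINTR)     /* interrupted by a signal: retry */
                continue;
            return -1;
        }
        got += (size_t)n;           /* a short read is normal; keep going */
    }
    return (ssize_t)got;
}
```

	[The caller still has to know how many bytes to ask for, which
	is case (a) above: the application-level protocol defines the
	message boundaries.]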


	NOTES:

	* connections are named by 5 components:

	    protocol (TCP), local IP address, local port, remote IP
	    address, remote port

	* UDP does not require connected sockets

	* OS tracks all of this state in a PCB (protocol control block).

    --What does kernel see, and what interfaces does it invoke?

	TX direction:

	--usually gets payloads from higher levels and implements
	TCP, UDP, IP, and part of Ethernet

	--usually hands most of an Ethernet frame to the network device

	--but not always: could imagine a Web server implemented
	entirely in the kernel, or even a Web server implemented on a
	network card

	--(in JOS, the entire networking stack is implemented in user
	space. that is the function of the lwip library.)

	RX direction:

	--when a packet arrives, use 5-tuple (above) to find PCB and
	figure out what to do with packet
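
	[Sketch, not from lecture, of the RX-side lookup: match the
	packet's 5-tuple against the PCBs. The struct layouts and names
	are made up for illustration, and the linear scan is only the
	simplest possible version. Real kernels typically hash the
	tuple.]

```c
#include <stddef.h>
#include <stdint.h>

struct five_tuple {
    uint8_t  proto;         /* IP protocol number: 6 = TCP, 17 = UDP */
    uint32_t local_ip;
    uint16_t local_port;
    uint32_t remote_ip;
    uint16_t remote_port;
};

struct pcb {
    struct five_tuple key;  /* names the connection */
    int state;              /* placeholder for per-connection state */
};

static int tuple_eq(const struct five_tuple *a, const struct five_tuple *b)
{
    return a->proto == b->proto &&
           a->local_ip == b->local_ip && a->local_port == b->local_port &&
           a->remote_ip == b->remote_ip && a->remote_port == b->remote_port;
}

/* Return the PCB whose 5-tuple matches key, or NULL if there is none. */
struct pcb *pcb_lookup(struct pcb *table, size_t n, const struct five_tuple *key)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (tuple_eq(&table[i].key, key))
            return &table[i];
    return NULL;
}
```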

    Note that to avoid lots of copies, OS may not actually store packets
    contiguously. May store linked list of buffers. Each buffer is
    either a packet header or a payload
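
    [Sketch, not from lecture, of such a buffer chain. The struct and
    function names are hypothetical. Real stacks have their own
    versions, with more fields. Prepending a header is just linking a
    new buffer at the front, with no copy of the payload. The flatten
    function shows the copy that the chain lets the OS postpone or
    avoid.]

```c
#include <stddef.h>
#include <string.h>

struct netbuf {
    struct netbuf *next;    /* next piece of the same packet, or NULL */
    const void *data;       /* this piece: one header or the payload */
    size_t len;
};

/* Total packet length: the sum of the lengths of all pieces. */
size_t netbuf_total_len(const struct netbuf *b)
{
    size_t total = 0;

    for (; b != NULL; b = b->next)
        total += b->len;
    return total;
}

/*
 * Copy the whole chain into one contiguous buffer.
 * Returns the number of bytes copied, or 0 if dst is too small.
 */
size_t netbuf_flatten(const struct netbuf *b, void *dst, size_t dstlen)
{
    size_t off = 0;

    if (netbuf_total_len(b) > dstlen)
        return 0;
    for (; b != NULL; b = b->next) {
        memcpy((char *)dst + off, b->data, b->len);
        off += b->len;
    }
    return off;
}
```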

    Network interface cards (NICs)
 
	--Used to be dumb

	--Now sometimes do lots of stuff

	--You are getting a network interface card working in lab 6


    Kernels also do *routing*

	--Suppose a machine has multiple NICs connected to different
	networks, and the kernel gets a packet (either from one of the
	NICs or from an application). Which NIC should the packet go
	out on?
	    
	--kernel generally looks at the destination address of the
	packet and does a lookup in a table that it maintains:
	    [IP address, prefix-length] --> next-hop 

	next-hop is the physical interface to send the packet out

	This is the same routing function that Internet routers do

	there are data structures to make it efficient in time and space
	(radix trees are a decent first cut)
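
	[Sketch, not from lecture, of the table lookup: among entries
	whose prefix matches the destination, pick the one with the
	longest prefix. This is a plain linear scan to show the matching
	rule, not the radix tree mentioned above. The names, and the use
	of an integer interface id as the next hop, are assumptions.]

```c
#include <stddef.h>
#include <stdint.h>

struct route {
    uint32_t prefix;        /* IPv4 prefix, host byte order */
    int prefix_len;         /* number of significant bits, 0..32 */
    int ifindex;            /* hypothetical id of the outgoing interface */
};

/* Mask with the top `len` bits set (len = 0 gives an all-zero mask). */
static uint32_t prefix_mask(int len)
{
    return len == 0 ? 0 : (uint32_t)(0xffffffffu << (32 - len));
}

/*
 * Longest-prefix match: return the ifindex of the most specific route
 * that covers dst, or -1 if no route matches.
 */
int route_lookup(const struct route *tbl, size_t n, uint32_t dst)
{
    int best = -1, best_len = -1;
    size_t i;

    for (i = 0; i < n; i++) {
        uint32_t m = prefix_mask(tbl[i].prefix_len);

        if ((dst & m) == (tbl[i].prefix & m) && tbl[i].prefix_len > best_len) {
            best_len = tbl[i].prefix_len;
            best = tbl[i].ifindex;
        }
    }
    return best;
}
```

	[With routes 0.0.0.0/0, 10.0.0.0/8, and 10.0.2.0/24, a packet
	for 10.0.2.15 matches all three and goes out via the /24 entry.]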

3. Paper discussion

    Who is David Clark? What is context for this paper?

    ASK: why was this paper written?

    ASK: what is the most important goal listed by Clark?
	
	ASK: what high-level considerations is this goal capturing, as
	in, _why_ was this the most important goal?

    ASK: what is the least important goal listed by Clark?

    ASK: what's notably not on this list?

    A. The goal of survivability

	ASK: what does this mean?

	ASK: does traffic _really_ route around problems?

	ASK: what does Clark mean by this quotation?

	    "In other words, at the top of transport, there is only one
	    failure, and it is total partition. The architecture was to mask
	    completely any transient failure".

    B. The goal of generality

	ASK: what is a cost imposed by generality?
	    (answer: it would have been impossible to support all
	    desired traffic classes within, say, TCP.

	    Thus, the cost was an extra layer.)

	ASK: Explain this point about real-time digital speech (and
	media).

	What characteristics do we want and NOT want from the network
	protocols that carry such traffic?

	ASK: what is the problem with reliability in a network protocol?

    C. More about non-assumptions

	"There are a number of services which are explicitly not assumed
	from the network. These include reliable or sequenced delivery,
	network level broadcast or multicast, priority ranking of
	transmitted packet, [sic] support for multiple types of service,
	and internal knowledge of failures, speeds, or delays."

	ASK: what does the above mean, and what services could we build
	if the network had these features?

    D. Performance

	ASK: what does Clark mean by "It is a comment on the goal
	structure of the Internet architecture that a back of the
	envelope analysis, if done by a sufficiently knowledgeable
	person, is usually sufficient"?

    E. The datagram abstraction

	ASK: according to Clark:

	    * what is the misconception about why datagram service is
	    exposed?
		(that the higher levels benefit from that service.)

	    * what is the true reason that datagram service is exposed?
		(that it's the only building block that really makes
		sense in a packet switched network. against the
		misconception, Clark observes that, on the contrary,
		transport protocols might like to assume much _more_
		than datagrams, but they're not allowed to.)
    
    F. Accountability

	ASK: What is the effect of datagram service on accountability?
	    (Kind of rough. Each packet is logically part of some
	    communication or flow, but network elements don't have the
	    visibility into those higher-level things; all they see are
	    packets. As a result, accounting has to be done separately
	    on each packet, mostly by sampling.)
    
    G. ASK: What's soft state?