Class 22
CS 372H
12 April 2011

On the board
------------

1. Last time
2. NFS, continued
3. Other distributed file systems
4. Networking

---------------------------------------------------------------------------

1. Last time

    --client/server, RPC

    --NFS

2. NFS

    * [last time] Intro and background
    * [last time] How it works
    * [last time] Statelessness
    * [last time] Transparency
    * Security

    E. Security

	--Only security is via IP address 

	--Another case of non-transparency:

	    --On local system: UNIX enforces read/write protections
		Can't read my files w/o my password

	    --On NFS:
		--Server believes whatever UID appears in NFS request
		--Anyone on the Internet can put whatever they like in the request
		--Or you (on your workstation) can su to root, then su to me
		  --2nd su requires no password
		  --Then NFS will let you read/write my files

	    --In other words, to steal data, just adopt the uid of the
	    person whose files you're trying to read....or just spoof
	    packets.
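	    A toy sketch of the problem (not real NFS code; the file
	    table, the request dictionaries, and handle_read are all
	    made up for illustration). The server checks the UID in the
	    request against the file's owner, but nothing checks that
	    the sender really is that UID:

```python
# Toy model only: real NFS requests carry file handles and RPC credentials,
# but the trust problem is the same. All names here are hypothetical.
FILES = {"/home/alice/notes.txt": {"owner_uid": 1001, "data": "secret"}}

def handle_read(request):
    """Serve a read if the claimed UID owns the file. Nothing verifies the claim."""
    f = FILES[request["path"]]
    if request["uid"] != f["owner_uid"]:
        return "EACCES"
    return f["data"]

honest = {"uid": 1002, "path": "/home/alice/notes.txt"}
spoofed = {"uid": 1001, "path": "/home/alice/notes.txt"}  # sender simply claims 1001

print(handle_read(honest))   # permission check works against an honest client
print(handle_read(spoofed))  # and is useless against a client that lies
```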

	--So why aren't NFS servers ridiculously vulnerable?
	    --Hard to guess correct file handles.
	    --(Which rules out one class of attacks but not spoofed
	    UIDs)

	--Observe: the vulnerabilities are fixable
	    --Other file systems do it
	    --Require clients to authenticate themselves cryptographically.
	    --But very hard to reconcile with statelessness.


    F. Concluding note

	--None of the above issues prevent NFS from being useful.
	    --People fix their programs to handle new semantics.
	    --Or install firewalls for security.
	    --And get most advantages of transparent client/server.

    References
    --"RFC 1094": NFS v2
    --"RFC 1813": NFS v3

3. Other distributed file systems (disconnected operation, etc.)

    --disconnected operation: where have we seen this? (answer: git)

    --long literature on this (Coda, Bayou, Andrew File System, etc.,
    etc.)

---------------------------------------------------------------------------

words about lab 6

    gives you the sense you're programming a real piece of hardware.
	--complete with the confusing and frustrating manual (which is
	better than most)
	--this is actually a part of getting real hardware to work,
	unfortunately

    in fact, if you run JOS on a real E100-based network interface, your
    driver should work with it.

    this lab is a fair bit of work. you need to understand a bunch of
    things to make progress:

	 --> how all the different environments fit together

	 --> what the hardware expects from software

	 --> how to actually provide that in software

	 --> roughly what the sockets API is (roughly)

	 --> roughly what an HTTP GET message looks like (roughly)

	 --> how Web servers fit into this

words about lab 7

    --need to find project partners and write a proposal

guest lectures

    the material in them will be tested

    they are also, I am expecting, going to be very interesting!

---------------------------------------------------------------------------


4. Networking

    A. Intro
    B. Physical layer
    C. Big picture
    D. Link layer
    ---------------
	(next time)
    E. Network layer
    F. What do we mean by layering?
    G. ARP
    H. Zoom out
    I. Transport layer
    J. Application layer

    A. Intro

    --What's a network?
	--just a bunch of interconnected channels
	--railroad, highway, plumbing, communication, telephone
	--computer!!!!

    --computer networks are interesting
	--end-points highly programmable, middle kind of boring (only
	kind of).
	    --can program all of the nodes!
	    --extremely easy to innovate and develop new uses of the
	    network
	--contrast: telephone network: end-points ridiculously simple, middle has
	complexity.
	    --worse, can't program most phones, need FCC approval for
	    new devices, no visibility, etc.

    --Going to describe the various layers of a network. Case study will
    be what happens when you gain access to a single Web page or send a
    single RPC in NFS.

    --If you're interested in this stuff, take classes in networking! Or
    program away! Or read the RFCs (short for "Request For Comments";
    despite the name, many of them are standards). Few things are as open
    and well-documented as the various protocols that form, and run over,
    the Internet.

    --Network classically explained as being divided into sharply
    distinguished layers. In reality, things are messier. But still
    incredibly useful to think about layering.

	--So begin by talking about the lowest layer and then we'll come
	back to layers again a bit later 

    B. Physical layer

	--signals in a medium
	    --medium: coaxial cable, twisted pair (Ethernet), fiber, radio
	    --signals: endless innovation. different electrical profiles
	    correspond to different sets of bits

	--some media are point-to-point:
	    --fiber, twisted pair

	--some media are shared transmission medium (coax, radio)
	    --any message can be seen by all nodes
	    --but now there is contention

	--speed of light matters!
	    --300,000 km/sec in a vacuum, slower in fiber
	    --New York to CA: ~3000 miles = ~5000 km
	    --propagation time:
		5000 km / (300,000 km/sec) = ~17 msec
	    --round-trip: ~34 msec, assuming no computation
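	    The arithmetic above as a small sketch. The function name is
	    made up, and the fiber speed of 200,000 km/sec (about 2/3 of
	    c) is an assumed typical figure, not a number from these
	    notes:

```python
C_VACUUM_KM_PER_SEC = 300_000   # speed of light in a vacuum
C_FIBER_KM_PER_SEC = 200_000    # assumption: roughly 2/3 of c in glass

def one_way_delay_ms(distance_km, speed_km_per_sec=C_VACUUM_KM_PER_SEC):
    """Propagation delay only: no queueing, no computation."""
    return distance_km / speed_km_per_sec * 1000

NY_TO_CA_KM = 5000

one_way = one_way_delay_ms(NY_TO_CA_KM)
print(f"one-way, vacuum: {one_way:.1f} msec")
print(f"round-trip, vacuum: {2 * one_way:.1f} msec")
print(f"round-trip, fiber: {2 * one_way_delay_ms(NY_TO_CA_KM, C_FIBER_KM_PER_SEC):.1f} msec")
```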

	--Technology improvements are not going to fix this

	--But what the heck? I keep reading that networks keep getting
	faster....

	    --*delay* is never going to improve as long as the theory of
	    relativity stands

	    --throughput -- bits per second -- improves ridiculously
	    well

	    --so how do we take advantage of this?

		--concept: bandwidth-delay product
		    
		    [DRAW CYLINDER:
			bandwidth is the height, delay is the length]	
   
		--get full network utilization if you've got # bytes in
		flight = bandwidth*delay

		--but what if the network isn't doing bulk transfer?
		    
		    --then you'll get poor throughput. ping/pong (send a
		    packet, wait for a response) has terrible throughput

		    --this is one reason why concurrency is absolutely
		    critical for good network utilization: a bunch of
		    low-throughput flows may add up to good utilization
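		    A rough sketch of the two cases. The 1 Gbit/sec link
		    and 1500-byte packet are illustrative numbers, the
		    function names are made up, and "delay" is taken to
		    be the round-trip time (an assumption; that is the
		    relevant delay when the sender waits for responses):

```python
def bdp_bytes(bandwidth_bps, rtt_sec):
    """Bytes that must be in flight to keep the pipe full."""
    return bandwidth_bps * rtt_sec / 8

def ping_pong_throughput_bps(packet_bytes, bandwidth_bps, rtt_sec):
    """Throughput when each packet waits for a response before the next is sent."""
    bits = packet_bytes * 8
    time_per_exchange = bits / bandwidth_bps + rtt_sec  # transmit, then wait
    return bits / time_per_exchange

BANDWIDTH_BPS = 1e9   # illustrative 1 Gbit/sec link
RTT_SEC = 0.034       # ~34 msec cross-country round trip
PACKET_BYTES = 1500

single = ping_pong_throughput_bps(PACKET_BYTES, BANDWIDTH_BPS, RTT_SEC)
print(f"bandwidth-delay product: {bdp_bytes(BANDWIDTH_BPS, RTT_SEC):,.0f} bytes")
print(f"one ping/pong flow uses {single / BANDWIDTH_BPS:.4%} of the link")
print(f"concurrent flows needed to fill it: about {BANDWIDTH_BPS / single:,.0f}")
```

		    With these numbers a single ping/pong flow uses well
		    under 0.1% of the link, which is why many concurrent
		    flows (or many bytes in flight per flow) are needed.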

    
	Note that direct physical connectivity between two endpoints is rare.....

	    --instead, communications usually "hop" through multiple
	    devices

	    --[DRAW PICTURE:
		source --> bunch of switches --> destination ]

	    --Allows links and devices to be shared for multiple purposes

	    --Must determine which bits are part of which messages
	    intended for which destinations

	Two kinds of ways to create this indirect connectivity:

	    --Circuit-switched: provide virtual links. Dump bits in at
	    source, they come out at the destination
		--example: the old telephone network. dialing the number
		set up a virtual circuit (and before that, human
		operators set up an actual circuit)
	    
	    --Packet-switched:
		--Pack a bunch of bytes together intended for same destination
		--Slap a _header_ on packet describing where it should go
		--Most networks today are packet switched
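		A minimal sketch of "pack bytes together, slap a header
		on". The header layout (4-byte destination id, 2-byte
		payload length) and the function names are hypothetical,
		chosen only for illustration; real headers carry more:

```python
import struct

# Hypothetical header: 4-byte destination id, 2-byte payload length,
# both in network byte order.
HEADER = struct.Struct("!IH")

def packetize(data: bytes, dst: int, max_payload: int = 4):
    """Split data into packets, each prefixed with a header naming dst."""
    packets = []
    for i in range(0, len(data), max_payload):
        chunk = data[i:i + max_payload]
        packets.append(HEADER.pack(dst, len(chunk)) + chunk)
    return packets

def read_packet(packet: bytes):
    """What a switch or receiver does: read the header, then find the payload."""
    dst, length = HEADER.unpack_from(packet)
    return dst, packet[HEADER.size:HEADER.size + length]

packets = packetize(b"hello world", dst=7)
for p in packets:
    print(read_packet(p))
```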


    C. Before we go further, let's look at the big picture, or the
    classic Internet technologies:

	computer  - LAN - router - cloud [lots of routers] - router - LAN - computer

	[ Web browser         <transforms packet>
	 ------
	  TCP
	 -----
	  IP ]


    D. Link layer
	
	Ethernet: classic technology

	History:

	    developed at Xerox PARC, intended to help with the office of
	    the future, amazing technology. used constantly. however,
	    not used much in its original configuration (of shared
	    medium) because many links are now point-to-point.

		--but if you plug your computers into a hub, your hardware
		is still going to use Ethernet's key features.
	
	originally designed for shared medium (coaxial cable)

	Packets in Ethernet (and most link layers) are called **frames**
	    [header: 14 bytes. then frame payload, then CRC]
	    [preamble (8 bytes) dst src ethertype <payload> CRC]
	    
	    (DIX frames...Digital, Intel, Xerox)
	    ethertype = 0x0800 (IPv4), 0x0806 (ARP)

	    preamble: helps device recognize start of packet
	    
	    CRC: helps device throw away corrupted packets

	    payload: up to 1500 bytes (roughly)

	    the payload and the other fields are usually set by the OS
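	    A sketch of picking apart the 14-byte header described above.
	    It assumes the frame has already had its preamble and CRC
	    removed before software sees it, which is an assumption
	    about the setup and not something these notes state; the
	    function names are made up:

```python
import struct

# dst (6 bytes), src (6 bytes), ethertype (2 bytes, network byte order)
ETH_HEADER = struct.Struct("!6s6sH")
ETHERTYPES = {0x0800: "IPv4", 0x0806: "ARP"}

def mac_str(raw: bytes) -> str:
    """Format 6 raw bytes in the usual colon-separated hex notation."""
    return ":".join(f"{b:02x}" for b in raw)

def parse_frame(frame: bytes) -> dict:
    """Split a frame (preamble and CRC assumed already stripped) into fields."""
    dst, src, ethertype = ETH_HEADER.unpack_from(frame)
    return {
        "dst": mac_str(dst),
        "src": mac_str(src),
        "ethertype": ETHERTYPES.get(ethertype, hex(ethertype)),
        "payload": frame[ETH_HEADER.size:],
    }

# A made-up broadcast frame carrying an ARP ethertype and a dummy payload.
frame = bytes.fromhex("ffffffffffff" "020000000001" "0806") + b"dummy payload"
print(ETH_HEADER.size)
print(parse_frame(frame))
```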

	Where do Ethernet addresses, otherwise known as MAC addresses,
	come from?

	    [assigned *to* different hardware manufacturers, who then
	    install them in their products]

	    [but you can reset it, which is one reason why tying access
	    to MAC addresses is often easily circumvented: sniff the
	    wire, learn someone else's MAC address, and take that one
	    on.]

	Special Ethernet addresses for broadcast and multicast

	Medium Access Control (**MAC**) protocol governs access to coax
	    --don't transmit when someone else is
		--CSMA/CD (carrier sense, multiple access, collision
		detection)
	    --if you collide, can detect that, use randomized backoff
	    and try again
	    --need to transmit for at least RTT (measured from one end of extent to other)
	        --(above is a bit of a simplification)

	    Consequence: Ethernet has a maximum end-to-end extent and a
	    minimum frame size (these are specified in standards
	    documents). To see why.....
	
		The 10 Mbps Ethernet standard specified a maximum
		end-to-end extent of 2.5 km --> 
		    <model the signal speed in this medium as approx.
			1.25 x 10^5 km/sec because of delays in repeaters>
		    RTT = 5 km / (1.25 x 10^5 km/sec) = 40 microseconds
		    10 Mbps * 40 microseconds = 400 bits = 50 bytes
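		The same calculation as a sketch, using the model signal
		speed from above. The function name is made up, and the
		result is the simplified model's answer, not the figure
		from the standards documents:

```python
MODEL_SPEED_KM_PER_SEC = 1.25e5   # model speed from the notes (includes repeater delay)

def min_frame_bits(rate_bps, extent_km, speed_km_per_sec=MODEL_SPEED_KM_PER_SEC):
    """Bits a sender must transmit to still be sending when a collision
    from the far end of the network gets back to it (one RTT)."""
    rtt_sec = 2 * extent_km / speed_km_per_sec
    return rate_bps * rtt_sec

bits_10m = min_frame_bits(10e6, 2.5)
print(f"10 Mbps, 2.5 km: {bits_10m:.0f} bits = {bits_10m / 8:.0f} bytes")

# Keeping the same extent at 10x the rate needs 10x the bits, so a faster
# Ethernet must either shrink the extent or grow the minimum frame.
bits_100m = min_frame_bits(100e6, 2.5)
print(f"100 Mbps, 2.5 km: {bits_100m:.0f} bits = {bits_100m / 8:.0f} bytes")
```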
	
		note that the smallest *useful* packet size is 19 bytes,
		as we'll see below

		so what happened with "fast ethernet" of 100 Mbits/sec?
		and 1Gbps Ethernet?

		    --for FastE, they reduced the maximum network diameter to
		    200 meters

		    --for GigE, minimum packet size is 512 bytes

		    --as Ethernet gets faster, this will get more
		    ridiculous, but increasingly people aren't using
		    Ethernet for its ability to manage a shared medium, so
		    it's okay

	Ethernet is awesome, but it cannot scale to the world:

	    --limit on number of nodes
	    --limit on distance
	    --forwarding state doesn't scale
	    --want a lingua franca

	People address node limits and distance with **bridges** that
	connect two Ethernet networks.

	People also use **switches**, which connect lots more
	Ethernet networks

	--bridges/switches learn where all the devices are and avoid
	forwarding useless packets
	    [table: 
		dst_ether: link]
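	    A minimal sketch of how such a table might be learned and
	    used. The class and method names are made up, and real
	    bridges also handle broadcast addresses, entry timeouts, and
	    loops, none of which appear here:

```python
class LearningBridge:
    """Toy learning bridge: maps Ethernet address -> link it was last seen on."""

    def __init__(self, links):
        self.links = set(links)
        self.table = {}   # dst_ether: link

    def receive(self, src, dst, in_link):
        """Return the list of links the frame should be forwarded on."""
        self.table[src] = in_link            # learn where the sender lives
        out_link = self.table.get(dst)
        if out_link is None:
            # Unknown destination: flood on every link except the incoming one.
            return sorted(self.links - {in_link})
        if out_link == in_link:
            # Destination is on the link the frame came from: nothing to forward.
            return []
        return [out_link]

bridge = LearningBridge([1, 2, 3])
print(bridge.receive("A", "B", in_link=1))  # B unknown, so flood
print(bridge.receive("B", "A", in_link=2))  # A was learned on link 1
print(bridge.receive("A", "B", in_link=1))  # B is now known on link 2
```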

	--this technology is widely used in organizations, but it
	could never scale to the Internet (too many addresses)

	--moreover, we need a lingua franca, the **network layer** so
	that computers connected to different media (DSL, wireless,
	phone, whatever) can communicate