Class 22
CS 372H
12 April 2011

On the board
------------

1. Last time

2. NFS, continued

3. Other distributed file systems

4. Networking

---------------------------------------------------------------------------

1. Last time

    --client/server, RPC

    --NFS

2. NFS

    * [last time] Intro and background
    * [last time] How it works
    * [last time] Statelessness
    * [last time] Transparency
    * Security

    E. Security

        --Only security is via IP address

        --Another case of non-transparency:

            --On local system: UNIX enforces read/write protections
                Can't read my files w/o my password

            --On NFS:
                --Server believes whatever UID appears in NFS request
                --Anyone on the Internet can put whatever they like in
                  the request
                --Or you (on your workstation) can su to root, then su
                  to me
                    --2nd su requires no password
                --Then NFS will let you read/write my files

        --In other words, to steal data, just adopt the uid of the
          person whose files you're trying to read....or just spoof
          packets.

        --So why aren't NFS servers ridiculously vulnerable?
            --Hard to guess correct file handles.
            --(Which rules out one class of attacks but not spoofed
              UIDs)

        --Observe: the vulnerabilities are fixable
            --Other file systems do it
            --Require clients to authenticate themselves
              cryptographically.
            --But very hard to reconcile with statelessness.

    F. Concluding note

        --None of the above issues prevent NFS from being useful.
            --People fix their programs to handle new semantics.
            --Or install firewalls for security.
            --And get most advantages of transparent client/server.

    References
        --"RFC 1094": NFS v2
        --"RFC 1813": NFS v3

3. Other distributed file systems (disconnected operation, etc.)

    --disconnected operation: where have we seen this? (answer: git)

    --long literature on this (Coda, Bayou, Andrew File System, etc.,
      etc.)

---------------------------------------------------------------------------

words about lab 6

    gives you the sense you're programming a real piece of hardware.
        --complete with the confusing and frustrating manual (which is
          better than most)

        --this is actually a part of getting real hardware to work,
          unfortunately

    in fact, if you run JOS on a real EE100-based network interface,
    your driver should work with it.

    this lab is a fair bit of work. you need to understand a bunch of
    things to make progress:

        --> how all the different environments fit together
        --> what the hardware expects from software
        --> how to actually provide that in software
        --> roughly what the sockets API is
        --> roughly what an HTTP GET message looks like
        --> how Web servers fit into this

words about lab 7

    --need to find project partners and write a proposal

guest lectures

    the material in them will be tested

    they are also, I am expecting, going to be very interesting!

---------------------------------------------------------------------------

4. Networking

    A. Intro
    B. Physical layer
    C. Big picture
    D. Link layer
    --------------- (next time)
    E. Network layer
    F. What do we mean by layering?
    G. ARP
    H. Zoom out
    I. Transport layer
    J. Application layer

    A. Intro

        --What's a network?
            --just a bunch of interconnected channels
            --railroad, highway, plumbing, communication, telephone
            --computer!!!!

        --computer networks are interesting
            --end-points highly programmable, middle kind of boring
              (only kind of).
            --can program all of the nodes!
            --extremely easy to innovate and develop new uses of the
              network
            --contrast: telephone network: end-points ridiculously
              simple, middle has complexity.
                --worse, can't program most phones, need FCC approval
                  for new devices, no visibility, etc.

        --Going to describe the various layers of a network. Case
          study will be what happens when you access a single Web page
          or send a single RPC in NFS.

        --If you're interested in this stuff, take classes in
          networking! Or program away! Or read the RFCs (short for
          "Request For Comments" but despite the name, many of them
          are standards).
          Few things are as open and well-documented as the various
          protocols that form, and run over, the Internet.

        --Network classically explained as being divided into sharply
          distinguished layers. In reality, things are messier. But
          still incredibly useful to think about layering.

        --So begin by talking about the lowest layer and then we'll
          come back to layers again a bit later

    B. Physical layer

        --signals in a medium

            --medium: coaxial cable, twisted pair (Ethernet), fiber,
              radio

            --signals: endless innovation. different electrical
              profiles correspond to different sets of bits

        --some media are point-to-point:
            --fiber, twisted pair

        --some media are shared transmission media (coax, radio)
            --any message can be seen by all nodes
            --but now there is contention

        --speed of light matters!
            --300,000 km/sec in a vacuum, slower in fiber
            --New York to CA: ~3000 miles = ~5000 km
            --propagation time: 5000 km / (300,000 km/sec) = ~17 msec
            --round-trip: ~34 msec, assuming no computation
            --Technology improvements are not going to fix this

        --But what the heck? I keep reading that networks keep getting
          faster....
            --*delay* is never going to improve as long as the theory
              of relativity stands
            --throughput -- bits per second -- improves ridiculously
              well

        --so how do we take advantage of this?

            --concept: bandwidth-delay product
              [DRAW CYLINDER: bandwidth is the height, delay is the
              length]

            --get full network utilization if you've got
              # bytes in flight = bandwidth*delay

            --but what if the network isn't doing bulk transfer?
                --then you'll get poor throughput. ping/pong (send a
                  packet, wait for a response) has terrible throughput
                --this is one reason why concurrency is absolutely
                  critical for good network utilization: a bunch of
                  low-throughput flows may add up to good utilization

    Note that direct physical connectivity is rare.....
        --instead, communications usually "hop" through multiple
          devices
        --[DRAW PICTURE: source --> bunch of switches --> destination ]
        --Allows links and devices to be shared for multiple purposes
        --Must determine which bits are part of which messages
          intended for which destinations

    Two ways to create this indirect connectivity:

        --Circuit-switched: provide virtual links. Dump bits in at
          source, they come out at the destination
            --example: the old telephone network. dialing the number
              set up a virtual circuit (and before that, human
              operators set up an actual circuit)

        --Packet-switched:
            --Pack a bunch of bytes together intended for same
              destination
            --Slap a _header_ on packet describing where it should go
            --Most networks today are packet switched

    C. Before we go further, let's look at the big picture, or the
       classic Internet technologies:

        computer - LAN - router - cloud [lots of routers] - router - LAN - computer

        [ Web browser ------ TCP ----- IP ]

    D. Link layer

        Ethernet: classic technology

        History: developed at Xerox PARC, intended to help with the
        office of the future, amazing technology. used constantly.
        however, not used much in its original configuration (of
        shared medium) because many links are now point-to-point.

            --but if you plug your computers into a hub, your hardware
              is still going to use Ethernet's key features.

        originally designed for shared medium (coaxial cable)

        Packets in Ethernet (and most link layers) are called **frames**

            [header: 14 bytes. then frame payload, then CRC]

            [preamble (8 bytes) | dst | src | ethertype | payload | CRC]
            (DIX frames...Digital, Intel, Xerox)

            ethertype = 0x0800 (IPv4), 0x0806 (ARP)
            preamble: helps device recognize start of packet
            CRC: helps device throw away corrupted packets
            payload: up to 1500 bytes (roughly)

            the payload and the other fields are usually set by the OS

        Where do Ethernet addresses, otherwise known as MAC addresses,
        come from?
            [assigned *to* different hardware manufacturers, who then
            install them in their products]

            [but you can reset it, which is one reason why tying
            access to MAC addresses is often easily circumvented:
            sniff the wire, learn someone else's MAC address, and take
            that one on.]

        Special Ethernet addresses for broadcast and multicast

        Medium Access Control (**MAC**) protocol governs access to coax

            --don't transmit when someone else is
            --CSMA/CD (carrier sense, multiple access, collision
              detection)
            --if you collide, can detect that, use randomized backoff
              and try again
            --need to transmit for at least RTT (measured from one end
              of extent to other)
            --(above is a bit of a simplification)

        Consequence: Ethernet has a maximum end-to-end extent and a
        minimum frame size (these are specified in standards
        documents). To see why.....

            The 10 Mbps ethernet standard specified a maximum
            end-to-end extent of 2.5 kms

            --> RTT = (2 x 2.5 km) / (1.25 x 10^5 km/sec)
                    = 40 microseconds

            10 Mbps * 40 microseconds = 400 bits = 50 bytes
            (the minimum frame size under this simplified model)

            note that the smallest *useful* packet size is 19 bytes,
            as we'll see below

        so what happened with "fast ethernet" of 100 Mbits/sec? and
        1Gbps Ethernet?

            --for FastE, they reduced the maximum network diameter to
              200 meters
            --for GigE, minimum packet size is 512 bytes
            --as Ethernet gets faster, this will get more ridiculous,
              but increasingly people aren't using Ethernet for its
              ability to manage a shared medium, so it's okay

        Ethernet is awesome, but it cannot scale to the world:

            --limit on number of nodes
            --limit on distance
            --forwarding state doesn't scale
            --want a lingua franca

        People address node limits and distance with **bridges** that
        connect two Ethernet networks.
        People also use **switches**, which connect lots more Ethernet
        networks

            --bridges/switches learn where all the devices are and
              avoid forwarding useless packets
                [table: dst_ether: link]

            --this technology is widely used in organizations, but it
              could never scale to the Internet (too many addresses)

            --moreover, we need a lingua franca, the **network layer**
              so that computers connected to different media (DSL,
              wireless, phone, whatever) can communicate
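
        To make the [table: dst_ether: link] idea concrete, here is a
        minimal sketch of a learning switch in Python. It is an
        illustration under assumptions, not how any particular switch
        is built: the names LearningSwitch and handle_frame, the
        integer port numbers, and the example MAC addresses are all
        hypothetical. The sketch assumes the usual behavior of
        flooding a frame out every other port when the destination is
        unknown or is the broadcast address, and it leaves out details
        such as aging out old table entries.

```python
from typing import Dict, List

BROADCAST = "ff:ff:ff:ff:ff:ff"  # Ethernet broadcast address


class LearningSwitch:
    """Toy learning bridge/switch (hypothetical names throughout)."""

    def __init__(self, num_ports: int) -> None:
        self.num_ports = num_ports
        # The [table: dst_ether: link] from the notes:
        # MAC address -> port (link) on which that address was last seen.
        self.table: Dict[str, int] = {}

    def handle_frame(self, in_port: int, src: str, dst: str) -> List[int]:
        """Return the ports on which the frame should be sent out."""
        # Learn: the sender must be reachable via the arrival port.
        self.table[src] = in_port

        out_port = self.table.get(dst)
        if dst == BROADCAST or out_port is None:
            # Unknown or broadcast destination: flood to all other ports.
            return [p for p in range(self.num_ports) if p != in_port]
        if out_port == in_port:
            # Destination lives on the link the frame came from, so
            # forwarding it anywhere would be a useless packet.
            return []
        return [out_port]


# Small walk-through with made-up addresses A and B on a 3-port switch.
A = "aa:aa:aa:aa:aa:aa"
B = "bb:bb:bb:bb:bb:bb"
sw = LearningSwitch(num_ports=3)
first = sw.handle_frame(in_port=0, src=A, dst=B)   # B not yet learned
second = sw.handle_frame(in_port=1, src=B, dst=A)  # A was learned on port 0
third = sw.handle_frame(in_port=0, src=A, dst=B)   # B was learned on port 1
```

        In the walk-through, only the first frame is flooded; once
        both addresses have been seen, each frame goes out exactly one
        link. The table holds one entry per Ethernet address, which is
        the "too many addresses" problem at Internet scale.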