Class 24
CS 480-008
26 April 2016

On the board
------------

1. Last time
2. Non-scalable locks are dangerous
3. Network security
    A. Intro
    B. Sequence number and source IP forgery attacks
    C. Liveness, SYN flooding, DoS
    D. Attacks on routing protocols
    E. Scanning
    F. Discussion

---------------------------------------------------------------------------

1. Last time

--finished short peer-to-peer unit
--short intro to concurrency: critical sections, spinlocks
--whirlwind tour of a key primitive: spinlocks
--began discussion of "Non-scalable locks are dangerous"

2. Non-scalable locks are dangerous, continued

quick review of the cache coherence protocol:

    idea: there is a directory that contains, for every single cache
    line, the following info:

        [tag | state | core_ID]

    state can be Modified, Exclusive, Shared, or Invalid:

        Modified:  some core has dirty data
        Exclusive: some core has the cache line, but there is no dirty data
        Shared:    a bunch of cores have the line cached, and it matches DRAM
        Invalid:   no one has it cached

    loads and stores can change the state, and they generate cross-cache
    traffic; this is the cache coherence protocol in action. for example,
    a load of a cache line that is in the Modified state causes the cache
    coherence protocol to go get the latest value.

ASK: so what's going on in this paper? What does the Markov chain model?

    Answer: the queue of cores waiting for the lock (state k means k
    cores are waiting).

What's going on with the probability analysis?

    --idea: run the chain for a while; it enters a steady-state
      distribution
    --probabilities in that distribution are like freezing the process
      and asking, "what's the probability that the chain is in state 1,
      state 2, etc.?"
    --once those probabilities are known, one can compute the expected
      value of the distribution, which is the average number of waiting
      cores
    --the speedup is the total number of cores minus the number of
      waiting cores
    --the strange thing about this chain is that as n (# cores)
      increases, the number of waiting cores outpaces n. in other words,
      speedup collapses.

How do they compute the steady-state probabilities? here's an informal
argument:

    --in steady state, there is balance: arrival rate = departure rate
    --look at state 0: P_0*a_0 = P_1*s_0
    --look at state 1: P_0*a_0 + P_2*s_1 = P_1*s_0 + P_1*a_1,
      so P_1*a_1 = P_2*s_1
    --the pattern is (you can use induction to show this):

        P_k*a_k = P_{k+1}*s_k
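To make the derivation concrete, here is a minimal Python sketch (an
illustration, not code from the paper; the rate functions a(k) and s(k)
below are made-up stand-ins for the paper's model):

    # Solve the balance equations P_k * a_k = P_{k+1} * s_k for a
    # birth-death chain whose state k = number of waiting cores.
    def steady_state(n, a, s):
        """Return [P_0, ..., P_n] given rate functions a(k) and s(k)."""
        q = [1.0]                          # unnormalized probabilities
        for k in range(n):
            q.append(q[-1] * a(k) / s(k))  # Q_{k+1} = Q_k * a_k / s_k
        total = sum(q)
        return [x / total for x in q]

    def expected_waiters(n, a, s):
        """Average number of waiting cores in steady state."""
        return sum(k * p for k, p in enumerate(steady_state(n, a, s)))

    for n in (16, 64, 128):
        # Made-up rates: only the n-k non-waiting cores generate new
        # arrivals, and a non-scalable lock's hand-off slows down as the
        # queue grows, so the service rate s(k) decays with k.
        a = lambda k, n=n: (n - k) * 0.01
        s = lambda k: 1.0 / (1.0 + 0.1 * k)
        w = expected_waiters(n, a, s)
        print(f"n={n:4d}  avg waiters={w:7.2f}  speedup ~ {n - w:7.2f}")

With these stand-in rates, the average number of waiters grows much
faster than n once n is large -- that is the collapse. Swapping in a
constant s(k) (a hand-off time that does not depend on the queue, as
with MCS locks) makes the speedup saturate instead of falling.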
The big problem is that the bigger the queue, the longer it takes to
move something out of the queue. Why?

    Answer: because of serialization: the hand-off has to wait for, on
    average, half of the waiters' reads, and each read takes a while
    because of the directory protocol.

ASK: why is there a sudden collapse?

    Answer: because the service rate decays as the number of waiters
    increases; this effectively increases the length of a critical
    section. the system reaches a collapse point where a new arrival is
    likely to show up while the critical section (really the tail end of
    it) is being executed.

ASK: why are MCS locks better?

    Answer: because the time to hand off the lock doesn't depend on the
    number of waiters.

Critique: some "chi by eye": they say, "It looks close, therefore our
model is good." This is not how you validate statistical models.
(Better way: goodness-of-fit tests -- chi-squared, Kolmogorov-Smirnov,
etc.)

ASK: do we really need MCS locks? what about proportional locks? What's
better or worse about proportional locks compared to MCS locks?

    Better: lower overhead to acquire/release with proportional locks.
    See Figure 11. [Why is it lower overhead?]

    What's worse? Well, the authors don't really mount a strong critique
    of proportional locks. They seem to be saying, "We gave these locks
    the maximum benefit of the doubt, and there are cases where these
    locks wouldn't work." Meanwhile, "it might not work in principle" is
    not an ideal argument in the context of this paper, because the
    whole paper is about what does happen in realistic workload and
    hardware regimes.

3. Network security

A. Intro

a big open network (the Internet) invites many attacks, on:

    * authentication
    * liveness
    * privacy

both host/host and inside the network (routing, DNS, ARP, etc.)

today: attacks on network protocols
later: host-host cryptographic solutions

going to look at some old attacks. why?

    core Internet protocols were designed in the late 1970s / 1980s
    the network was small; the stakes were low; cryptography was expensive

surely old attacks on ancient protocols are no longer relevant? surely
modern protocols are vastly more secure?

    no one knows how to do fundamentally better than TCP/IP
    much progress in securing higher layers: Kerberos, SSH, TLS
    lots of fixes for specific low-level problems
    but the basic network-level security properties haven't changed much
    --> so it's worth understanding them

example application: remote login circa 1980

    in 1980, TCP but no cryptography -- like many applications today

    telnet -- just opens a TCP connection to the login program

    what can an attacker do?

        * steal the password, etc., by snooping on the network
        * modify data in flight
        * inject false data
        * redirect the entire conversation via routing

    BUT: all of these would have been hard on the ARPANET (the early
    Internet)
    BUT BUT: the advent of Ethernet made password sniffing a real danger

rlogin -- don't send the password

    the destination host has a list of trusted host names (the .rhosts
    file)
    it lets a user log in without a password if the source host is on
    the trusted list

    why did rlogin seem OK?

        the authors would not have claimed "secure" -- but perhaps
        "pretty good"

    big potential problem: an attacker could put a trusted client's IP
    address in the source address field

        BUT: TCP communication involves *both* directions
        if the attacker lies about the source, then the server's replies
        won't go back to the attacker, so the attacker won't be able to
        execute TCP correctly.

let's look at the details of TCP connection setup:

    Standard handshake: [below, "SN" = "sequence number"]

        C sends:                    src=C, dst=S, SYN(SN_c)
        S responds:                 src=S, dst=C, SYN(SN_s), ACK(SN_c)
        C finishes 3-way handshake: src=C, dst=S, ACK(SN_s)
        C sends data:               src=C, dst=S, data(SN_c), ACK(SN_s)

    The main point: set up initial sequence numbers for data packets.

    Why might one think the server can know it is talking to C?

        Only C should have been able to receive the second message.
        Thus, only C should know SN_s.
        The server accepts the third message only if it has the expected
        sequence numbers.

B. Sequence number and source IP forgery

TCP sequence number attack. Suppose adversary A wants to simulate a
connection to S from C. (Assume A knows C's IP address -- usually not a
big deal in practice.)

    A: SRC=C, DST=S, SYN(SN_c)
    S: SRC=S, DST=C, SYN(SN_s), ACK(SN_c)
    A: SRC=C, DST=S, ACK(SN_s)    <-- but how to guess SN_s?
    A: SRC=C, DST=S, data(SN_c)

How could the adversary guess SN_s?

    Many hosts kept an ISN variable, for use by the next connection.
    ("ISN" is "initial sequence number".)
    It was incremented by 128 each second and by 64 after each new
    connection, which helps keep old packets from interfering with a
    new connection. [Ref: RFC 1185 appendix]

    The adversary can make an ordinary connection to find out the
    current ISN, then guess the next one by adding 64.
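The guessing step can be made concrete with a short Python sketch (an
illustration of the historical scheme described above, not actual BSD
code):

    import time

    # Sketch of an old-style ISN generator: +128 per elapsed second,
    # +64 per new connection (per the description above).
    class OldISN:
        def __init__(self):
            self.isn = 0
            self.last = time.time()

        def new_connection(self):
            """Return the ISN assigned to a new connection."""
            now = time.time()
            self.isn = (self.isn + 128 * int(now - self.last) + 64) % 2**32
            self.last = now
            return self.isn

    server = OldISN()

    # Adversary step 1: open a legitimate connection to learn the
    # current ISN.
    observed = server.new_connection()

    # Adversary step 2: predict the ISN of the *next* connection (the
    # spoofed one), assuming it is opened within the same second.
    predicted = (observed + 64) % 2**32

    print(predicted == server.new_connection())  # True if the guess wins

The point is that the ISN is a counter, not a random value, so a single
probe pins down all near-future ISNs; modern TCP stacks randomize ISNs
for exactly this reason.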
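And here, equally hypothetical, is the blind-spoofing exchange from the
attack trace above written out as code; send_raw() is a made-up
placeholder for "emit a packet with a forged source address" (doing this
for real requires raw sockets and privileges), and the addresses and
sequence numbers are invented:

    # Hypothetical sketch of the attack trace above. send_raw() is a
    # made-up stand-in; it just logs what the attacker would emit.
    def send_raw(src, dst, **fields):
        print(f"src={src} dst={dst} {fields}")

    C = "10.0.0.1"     # trusted client's address (made up)
    S = "10.0.0.2"     # target server's address (made up)
    SN_c = 1000        # the attacker chooses its own sequence number
    SN_s = 0x1A2B3C4D  # the *guessed* server ISN (e.g., observed + 64)

    send_raw(C, S, SYN=SN_c)               # forged SYN "from" C
    # S now sends SYN(SN_s), ACK(SN_c) to the real C; the attacker never
    # sees it, which is why SN_s had to be guessed.
    send_raw(C, S, ACK=SN_s)               # blind ACK finishes the handshake
    send_raw(C, S, seq=SN_c, data=b"...")  # forged data "from" C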
What happens to the real packet that S sends to C (the second packet)?

    C would assume the packet is from an old connection and send a RST
    in response.
    Even if that RST were sent, the adversary could try to race to
    finish before the RST arrives.
    It turns out the attacker can suppress C; we will get to that later.

But why do sequence number attacks turn into a security problem?
[see next time]

---------------------------------------------------------------------------

Acknowledgment: Network security piece from 6.858 staff