Class 24
CS 480-008
26 April 2016

On the board
------------

1. Last time
2. Non-scalable locks are dangerous
3. Network security
    A. Intro
    B. Sequence number and source IP forgery attacks
    C. Liveness, SYN flooding, DoS
    D. Attacks on routing protocols
    E. Scanning
    F. Discussion

---------------------------------------------------------------------------

1. Last time

--finished short peer-to-peer unit
--short intro to concurrency: critical sections, spinlocks
--whirlwind tour of a key primitive: spinlocks
--began discussion of "Non-scalable locks are dangerous"

2. Non-scalable locks are dangerous, continued

quick review of the cache coherence protocol:

    idea: there is a directory that contains, for every single cache
    line, the following info:

        [tag | state | core_ID]

    state can be Modified, Exclusive, Shared, or Invalid:

        Modified:  some core has dirty data
        Exclusive: some core has the cache line, but there is no dirty data
        Shared:    a bunch of cores have the line cached, and it matches DRAM
        Invalid:   no one has it cached

    loads and stores can change the state, and they generate cross-cache
    traffic; this is the cache coherence protocol in action. for example,
    a load of a cache line that is in the Modified state causes the cache
    coherence protocol to go get the latest value.

ASK: so what's going on in this paper? What does the Markov chain model?

    Answer: the queue of cores waiting for the lock (state k means k
    cores are waiting).

What's going on with the probability analysis?

    --idea: run the chain for a while; it enters a steady-state
      distribution
    --probabilities in that distribution are like freezing the process
      and asking, "what's the probability that the chain is in state 1,
      state 2, etc.?"
    --once those probabilities are known, one can compute the expected
      value of the distribution, which is the average number of waiting
      cores
    --the speedup is the total number of cores minus the number of
      waiting cores
    --the strange thing about this chain is that as n (# cores)
      increases, the number of waiting cores outpaces n. in other words,
      speedup collapses.

How do they compute the steady-state probabilities? here's an informal
argument:

    --in steady state, there is balance: arrival rate = departure rate
    --look at state 0: P_0*a_0 = P_1*s_0
    --look at state 1: P_0*a_0 + P_2*s_1 = P_1*s_0 + P_1*a_1,
      so P_1*a_1 = P_2*s_1
    --the pattern is (you can use induction to show this):

        P_k*a_k = P_{k+1}*s_k
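To make the derivation concrete, here is a minimal Python sketch (an
illustration, not code from the paper; the rate functions a(k) and s(k)
below are made-up stand-ins for the paper's model):

    # Solve the balance equations P_k * a_k = P_{k+1} * s_k for a
    # birth-death chain whose state k = number of waiting cores.
    def steady_state(n, a, s):
        """Return [P_0, ..., P_n] given rate functions a(k) and s(k)."""
        q = [1.0]                          # unnormalized probabilities
        for k in range(n):
            q.append(q[-1] * a(k) / s(k))  # Q_{k+1} = Q_k * a_k / s_k
        total = sum(q)
        return [x / total for x in q]

    def expected_waiters(n, a, s):
        """Average number of waiting cores in steady state."""
        return sum(k * p for k, p in enumerate(steady_state(n, a, s)))

    for n in (16, 64, 128):
        # Made-up rates: only the n-k non-waiting cores generate new
        # arrivals, and a non-scalable lock's hand-off slows down as the
        # queue grows, so the service rate s(k) decays with k.
        a = lambda k, n=n: (n - k) * 0.01
        s = lambda k: 1.0 / (1.0 + 0.1 * k)
        w = expected_waiters(n, a, s)
        print(f"n={n:4d}  avg waiters={w:7.2f}  speedup ~ {n - w:7.2f}")

With these stand-in rates, the average number of waiters grows much
faster than n once n is large -- that is the collapse. Swapping in a
constant s(k) (a hand-off time that does not depend on the queue, as
with MCS locks) makes the speedup saturate instead of falling.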
The big problem is that the bigger the queue, the longer it takes to
move something out of the queue. Why?

    Answer: because of serialization: the hand-off has to wait for, on
    average, half of the waiters' reads, and each read takes a while
    because of the directory protocol.

ASK: why is there a sudden collapse?

    Answer: because the service rate decays as the number of waiters
    increases; this effectively increases the length of a critical
    section. the system reaches a collapse point where a new arrival is
    likely to show up while the critical section (really the tail end of
    it) is being executed.

ASK: why are MCS locks better?

    Answer: because the time to hand off the lock doesn't depend on the
    number of waiters.

Critique: some "chi by eye": they say, "It looks close, therefore our
model is good." This is not how you validate statistical models.
(Better way: goodness-of-fit tests -- chi-squared, Kolmogorov-Smirnov,
etc.)

ASK: do we really need MCS locks? what about proportional locks? What's
better or worse about proportional locks compared to MCS locks?

    Better: lower overhead to acquire/release with proportional locks.
    See Figure 11. [Why is it lower overhead?]

    What's worse? Well, the authors don't really mount a strong critique
    of proportional locks. They seem to be saying, "We gave these locks
    the maximum benefit of the doubt, and there are cases where these
    locks wouldn't work." Meanwhile, "it might not work in principle" is
    not an ideal argument in the context of this paper, because the
    whole paper is about what does happen in realistic workload and
    hardware regimes.

3. Network security

A. Intro

a big open network (the Internet) invites many attacks, on:

    * authentication
    * liveness
    * privacy

both host/host and inside the network (routing, DNS, ARP, etc.)

today: attacks on network protocols
later: host-host cryptographic solutions

going to look at some old attacks. why?

    core Internet protocols were designed in the late 1970s / 1980s
    the network was small; the stakes were low; cryptography was expensive

surely old attacks on ancient protocols are no longer relevant? surely
modern protocols are vastly more secure?

    no one knows how to do fundamentally better than TCP/IP
    much progress in securing higher layers: Kerberos, SSH, TLS
    lots of fixes for specific low-level problems
    but the basic network-level security properties haven't changed much
    --> so it's worth understanding them

example application: remote login circa 1980

    in 1980, TCP but no cryptography -- like many applications today

    telnet -- just opens a TCP connection to the login program

    what can an attacker do?

        * steal the password, etc., by snooping on the network
        * modify data in flight
        * inject false data
        * redirect the entire conversation via routing

    BUT: all of these would have been hard on the ARPANET (the early
    Internet)
    BUT BUT: the advent of Ethernet made password sniffing a real danger

rlogin -- don't send the password

    the destination host has a list of trusted host names (the .rhosts
    file)
    it lets a user log in without a password if the source host is on
    the trusted list

    why did rlogin seem OK?

        the authors would not have claimed "secure" -- but perhaps
        "pretty good"

    big potential problem: an attacker could put a trusted client's IP
    address in the source address field

        BUT: TCP communication involves *both* directions
        if the attacker lies about the source, then the server's replies
        won't go back to the attacker, so the attacker won't be able to
        execute TCP correctly.

let's look at the details of TCP connection setup:

    Standard handshake: [below, "SN" = "sequence number"]

        C sends:                    src=C, dst=S, SYN(SN_c)
        S responds:                 src=S, dst=C, SYN(SN_s), ACK(SN_c)
        C finishes 3-way handshake: src=C, dst=S, ACK(SN_s)
        C sends data:               src=C, dst=S, data(SN_c), ACK(SN_s)

    The main point: set up initial sequence numbers for data packets.

    Why might one think the server can know it is talking to C?

        Only C should have been able to receive the second message.
        Thus, only C should know SN_s.
        The server accepts the third message only if it has the expected
        sequence numbers.

B. Sequence number and source IP forgery

TCP sequence number attack. Suppose adversary A wants to simulate a
connection to S from C. (Assume A knows C's IP address -- usually not a
big deal in practice.)

    A: SRC=C, DST=S, SYN(SN_c)
    S: SRC=S, DST=C, SYN(SN_s), ACK(SN_c)
    A: SRC=C, DST=S, ACK(SN_s)    <-- but how to guess SN_s?
    A: SRC=C, DST=S, data(SN_c)

How could the adversary guess SN_s?

    Many hosts kept an ISN variable, for use by the next connection.
    ("ISN" is "initial sequence number".)
    It was incremented by 128 each second and by 64 after each new
    connection, which helps keep old packets from interfering with a
    new connection. [Ref: RFC 1185 appendix]

    The adversary can make an ordinary connection to find out the
    current ISN, then guess the next one by adding 64.
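The guessing step can be made concrete with a short Python sketch (an
illustration of the historical scheme described above, not actual BSD
code):

    import time

    # Sketch of an old-style ISN generator: +128 per elapsed second,
    # +64 per new connection (per the description above).
    class OldISN:
        def __init__(self):
            self.isn = 0
            self.last = time.time()

        def new_connection(self):
            """Return the ISN assigned to a new connection."""
            now = time.time()
            self.isn = (self.isn + 128 * int(now - self.last) + 64) % 2**32
            self.last = now
            return self.isn

    server = OldISN()

    # Adversary step 1: open a legitimate connection to learn the
    # current ISN.
    observed = server.new_connection()

    # Adversary step 2: predict the ISN of the *next* connection (the
    # spoofed one), assuming it is opened within the same second.
    predicted = (observed + 64) % 2**32

    print(predicted == server.new_connection())  # True if the guess wins

The point is that the ISN is a counter, not a random value, so a single
probe pins down all near-future ISNs; modern TCP stacks randomize ISNs
for exactly this reason.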
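And here, equally hypothetical, is the blind-spoofing exchange from the
attack trace above written out as code; send_raw() is a made-up
placeholder for "emit a packet with a forged source address" (doing this
for real requires raw sockets and privileges), and the addresses and
sequence numbers are invented:

    # Hypothetical sketch of the attack trace above. send_raw() is a
    # made-up stand-in; it just logs what the attacker would emit.
    def send_raw(src, dst, **fields):
        print(f"src={src} dst={dst} {fields}")

    C = "10.0.0.1"     # trusted client's address (made up)
    S = "10.0.0.2"     # target server's address (made up)
    SN_c = 1000        # the attacker chooses its own sequence number
    SN_s = 0x1A2B3C4D  # the *guessed* server ISN (e.g., observed + 64)

    send_raw(C, S, SYN=SN_c)               # forged SYN "from" C
    # S now sends SYN(SN_s), ACK(SN_c) to the real C; the attacker never
    # sees it, which is why SN_s had to be guessed.
    send_raw(C, S, ACK=SN_s)               # blind ACK finishes the handshake
    send_raw(C, S, seq=SN_c, data=b"...")  # forged data "from" C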
What happens to the real packet that S sends to C (the second packet)?

    C would assume the packet is from an old connection and send a RST
    in response.
    Even if that RST were sent, the adversary could try to race to
    finish before the RST arrives.
    It turns out the attacker can suppress C; we will get to that later.

But why do sequence number attacks turn into a security problem?
[see next time]

---------------------------------------------------------------------------

Acknowledgment: Network security piece from 6.858 staff