Class 25
CS 480-008
28 April 2016

On the board
------------

1. Last time
2. Network security
    A. Intro
    B. Sequence number and source IP forgery attacks
    C. Liveness, SYN flooding, DoS
    D. Attacks on routing protocols
    E. Scanning
    F. Discussion
3. Protecting network communications

---------------------------------------------------------------------------

1. Last time

    --finished discussion of "Non-scalable locks are dangerous"

    --began network security

2. Network security
    
    A. [last time] Intro

    B. Sequence number and source IP forgery

    Recall TCP sequence number attack.
      Suppose adversary A wants to simulate a connection to S from C.

        (Assume A knows C's IP address -- usually not a big deal in practice.)

        A: SRC=C, DST=S, SYN(SN_c)

        S: SRC=S, DST=C, SYN(SN_s), ACK(SN_c)

        A: SRC=C, DST=S, ACK(SN_s) <-- but how to guess SN_s?

        A: SRC=C, DST=S, data(SN_c)

      How could the adversary guess SN_s?
        Many hosts kept ISN variable, for use by next connection.
          Increment by 128 each second, 64 after each new connection.
          Helps avoid old packets from interfering with new connection.
          [ Ref: RFC 1185 appendix ]
        (ISN is "initial sequence number".)

        Adversary can make an ordinary connection to find out current ISN,
          then guess next one by adding 64.
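
        A minimal sketch of the guessing step above, assuming a toy model
        of the shared ISN counter. The class name LegacyISNGenerator and
        the starting value are made up for illustration; only the 128/64
        increments come from the notes.

```python
class LegacyISNGenerator:
    """Toy model of a single global ISN counter (hypothetical class)."""

    MOD = 2**32

    def __init__(self, start=0):
        self.isn = start % self.MOD

    def tick(self, seconds=1):
        # The counter advances by 128 for every elapsed second.
        self.isn = (self.isn + 128 * seconds) % self.MOD

    def new_connection(self):
        # Hand out the current value, then advance by 64 for the next one.
        isn = self.isn
        self.isn = (self.isn + 64) % self.MOD
        return isn


server = LegacyISNGenerator(start=1_000_000)

# Adversary opens an ordinary connection from its own address and reads SN_s.
observed = server.new_connection()

# Guess for the connection it is about to forge (no clock tick in between).
predicted = (observed + 64) % 2**32

# ISN the server actually picks for the forged SYN.
actual = server.new_connection()
print(predicted == actual)
```

        If a second elapses between the two connections, the adversary
        would also have to account for the +128 tick, which is still a
        small number of candidates.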

      What happens to the real packet that S sends to C (second pkt)?
        C would assume the packet is from an old conn, send RST in response.
        Even if that RST was sent, adversary could try to race before RST arrives.
        Turns out attacker can suppress C; will get to that later.

      But why do sequence number attacks turn into a security problem?

    (1) Forging IP source address to services that authenticated based on IP address.
    rlogin example: 
     --Attacker can pretend to be a host in rlogin trusted list, send commands
        without needing to know a password.
     --rlogin made a bad assumption about what the TCP layer provided.
       Assumed TCP conn from an IP address meant it really came from that host.
     --So IP-based authentication seems like a bad plan!
        --No longer used for remote login (now we use SSH)
        --But still used in other situations, since better security is complex.
            --example: access control to digital libraries
            --example: Web sites look at where you log in from

    (2) Hijack existing connections.
      --If you can guess seq #s, can inject data into an existing connection.
        I.e. wait for someone to log in, then take over the connection.
        [ Ref: Blind TCP/IP hijacking is still alive, by lkm@phrack.org, 2007 ]
      --(This is a generalization of (1). The difference is that this
         one targets the case that the server *was* using password
         protection.)

    (3) Denial of service attack: connection reset.
      --If attacker can guess SN_c, can send a RST packet, and interfere
        with an existing connection
            (server first checks sequence number if the state is not
            "listen", per RFC 793; thanks GW)
        --Worse yet: server will accept a RST packet for any SN_c value within window.
        --With a large window (~32K=2^15), only need 2^32/2^15 = 2^17 guesses.
      --How bad is a connection reset?
        One target of such attacks was the TCP connections between BGP routers.
        Causes routers to assume link failure, could affect traffic for minutes.
        Solutions:
          TTL hack (255).
          MD5 header authentication (very specialized for router-to-router links).
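
      The guess count in (3) can be checked with a few lines. This is
      only the arithmetic from the notes; the function name
      rst_guesses_needed is made up for illustration.

```python
SEQ_SPACE = 2**32  # size of the TCP sequence number space


def rst_guesses_needed(window_bytes):
    # Spacing guesses one window apart guarantees that one of them
    # lands inside the receiver's window.
    return SEQ_SPACE // window_bytes


guesses = rst_guesses_needed(2**15)
print(guesses)  # 131072, i.e. 2^17
```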


    How to mitigate attacks that forge IP source addresses?
      --Some applications now have end-to-end cryptographic authentication.
        E.g. SSH, SSL, Kerberos.
      --ISPs can filter packets with obviously forged IP source addresses.
        Often done today for small customers.
        Not straightforward for customers with complex networks, multihoming, ...

    How to harden TCP against forged IP source addresses?
      Make it harder for attacker to guess next ISN.
      Can't choose ISNs in a completely random way, without violating TCP spec.
        Need to avoid recently used sequence numbers for same host/port pair.
      Random increments?
        Can't increment too quickly; don't want to wrap very often.
        So not a huge amount of randomness (say, low 8 bits per increment).
      Aside: must be careful about how we generate random numbers!
        Common PRNG: linear congruential generator: R_k = A*R_{k-1}+B mod N.
        Not secure: given one pseudo-random value, can guess the next one!
        Lots of better cryptographically secure PRNGs are available.
          Ideally, use your kernel's built-in PRNG (/dev/random, /dev/urandom)
        [ Ref: http://en.wikipedia.org/wiki/Fortuna_(PRNG), or any stream cipher
          like http://en.wikipedia.org/wiki/RC4 ]
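
        A small sketch of why the linear congruential generator above is
        predictable. It assumes the adversary knows A, B, N (they are
        usually public constants of the library) and sees one full
        output. The constants and seed below are chosen only for
        illustration.

```python
# Hypothetical LCG parameters, picked for illustration.
A, B, N = 1664525, 1013904223, 2**32


def lcg_next(r):
    # R_k = A*R_{k-1} + B mod N
    return (A * r + B) % N


# Victim generates two values from a seed the adversary never sees.
secret_seed = 123456789
first = lcg_next(secret_seed)
second = lcg_next(first)

# Adversary observes only `first`, and applies the same public recurrence.
guess = lcg_next(first)
print(guess == second)
```

        The whole internal state is the last output, so one observed value
        is enough. A cryptographically secure PRNG is designed so that
        outputs do not reveal the state.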
      However, SN values for different src/dst pairs never interact!

      So, can choose the ISN using a random offset for each src/dst pair.
        ISN = M + MD5(srcip, srcport, dstip, dstport, secret),
        where M depends on the increasing timer
        [see 
            https://tools.ietf.org/html/rfc1948 
            https://tools.ietf.org/html/rfc6528
        ]
        and we will need to use only 32 bits of the MD5 hash

        Requires no extra state to keep track of per-connection ISNs.
        The point: attacker can no longer make an ordinary connection in order
          to guess current ISN for a different client.
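
        A minimal sketch of the ISN formula above, not the code of any
        real TCP stack. The function name isn, the byte layout fed to
        MD5, and the 4-microsecond timer for M are assumptions made for
        this example.

```python
import hashlib
import os
import socket
import struct
import time

SECRET = os.urandom(16)  # chosen once (e.g. at boot) and never revealed


def isn(src_ip, src_port, dst_ip, dst_port, now=None, secret=SECRET):
    now = time.time() if now is None else now
    # M: a timer that keeps increasing; here it ticks every 4 microseconds
    # (an assumption for this sketch).
    m = int(now * 250_000) % 2**32
    material = (
        socket.inet_aton(src_ip) + struct.pack("!H", src_port)
        + socket.inet_aton(dst_ip) + struct.pack("!H", dst_port)
        + secret
    )
    digest = hashlib.md5(material).digest()
    # Only 32 bits of the MD5 output are used as the per-pair offset.
    offset = struct.unpack("!I", digest[:4])[0]
    return (m + offset) % 2**32


a = isn("10.0.0.1", 1234, "10.0.0.2", 80, now=100.0)
b = isn("10.0.0.1", 1234, "10.0.0.2", 80, now=101.0)
print((b - a) % 2**32)  # same pair: ISN advances with the timer only
```

        For a fixed src/dst pair the ISN still increases with time, as the
        spec requires. For a different pair the offset is unrelated, so an
        adversary's own connection tells it nothing about another
        client's ISN unless it knows the secret.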

    Are forged source IP address attacks still relevant?
      Most operating systems implement the above per-connection ISN scheme.
        [ Ref: Linux secure_tcp_sequence_number in net/core/secure_seq.c ]
        [ Linux uses MD5 rather than SHA1. Good enough for the purposes
        at hand.]
      But other protocols suffer from similar problems -- e.g., DNS.
        DNS runs over UDP, no seq numbers, just ports, and dst port fixed (53).
        Client does basic sanity checks on reply packet.
        If adversary knows client is making a query, can fake a response.
          Just need to guess client port, often predictable.
        Popular attack starting in 2008.
          [ Ref: http://cr.yp.to/djbdns/forgery.html ]
          [ Ref: http://unixwiz.net/techtips/iguide-kaminsky-dns-vuln.html ]
        Solution: carefully take advantage of all possible randomness!
          DNS queries contain 16-bit query ID, and can randomize ~16 bit src port.
        Solution: DNSSEC (signed DNS records, including missing records).
          Problem: key distribution (who is allowed to sign each domain?)
          Problem: name enumeration (to sign "no such name" responses)
            Partially mitigated by NSEC3: http://tools.ietf.org/html/rfc5155
            [ see https://www.sidnlabs.nl/downloads/wp-2011-0x01-v2.pdf]

          Slow adoption, not much incentive to upgrade, non-trivial costs.
          Costs include both performance and administrative (key/cert management).
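
        A rough sketch of what randomizing the source port buys, in
        addition to the 16-bit query ID. It assumes the adversary sends
        distinct guesses and that ID and port are uniformly random. The
        function name and the count of 1000 forged replies are made up
        for illustration.

```python
def spoof_success_probability(forged_replies, id_bits=16, port_bits=0):
    # A forged reply is accepted only if it matches both the query ID
    # and the client's source port.
    space = 2 ** (id_bits + port_bits)
    return min(1.0, forged_replies / space)


# Predictable source port: only the 16-bit query ID must be guessed.
fixed_port = spoof_success_probability(1000, port_bits=0)

# ~16 bits of source port randomness on top of the query ID.
random_port = spoof_success_probability(1000, port_bits=16)

print(fixed_port, random_port)
```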

    C. Liveness, SYN flooding, DoS

    Liveness is another big problem area for the network layer.

      --Even when there are no authentication problems,
        we still rely on network protocols to actually deliver the data!

      --"Denial of Service" (DoS) can be annoying, or part of blackmail,
        or an ingredient in a larger attack.

    SYN flooding -- the first high-profile DoS attack.

      Server must be able to check client's ACK(SNs) in 3rd packet.
        Original implementation kept state for each "half-open" connection.
        Kept it for minutes in case client is slow, or network lossy.
        Only willing to remember e.g. 50 half-open connections, to avoid running out of memory.
        Silently ignored new connections if already had 50 waiting.

      The attack:
        Attacker sends SYN packets with forged random source IP addresses.
          Most of the forged addresses don't respond,
          so server never gets 3rd packet.
        Fills up server's 50 half-open slots.
        Now server ignores legitimate connection requests!

      Hard to track down:
        Forged random source addresses.
        Low rate -- attacker only needed to send a few SYN packets per
          second, since servers kept half-open connections for minutes.

      These attacks appeared in 1996 and were a big problem for a while.

    Defense against SYN flooding: SYN cookies.
      Idea: make the server stateless, until it receives that third packet (ACK).
        Then server won't have half-open connections, and thus won't run out.
      Why is this tricky?
        Half-open state helped ensure source IP address wasn't forged,
        by checking that 3rd packet had the right ACK.
      Use a bit of cryptography so server doesn't have to keep state.
      Encode server-side state into sequence number.
        ISNs = SNc + (timestamp || SHA1(src/dst addr+port, secret, timestamp))
        Timestamp is coarse-grained (e.g., minutes).
        ISNs wraps around slowly assuming legitimate client choice of SNc.
        ISNs per-client, so attacker can't guess for a forged IP address.
        ISNs hash part changes, so not useful for long if one is stolen.
        SHA1: we'll use only 24 bits of this 
        [ Detailed ref: http://cr.yp.to/syncookies.html ]
      Server computes seq as above when sending SYN-ACK response.
      Server can verify state is intact by verifying hash on ACK's seq.
      SYN cookies have successfully blunted low-rate SYN-flooding DoS attacks.
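
      A minimal sketch of the SYN cookie encoding above, not the layout
      used by any particular kernel. Assumptions for this example: an
      8-bit minute counter as the timestamp, 24 bits of SHA1, a fixed
      demo secret, and the function names make_cookie / check_cookie.

```python
import hashlib
import socket
import struct

SECRET = b"demo-secret"  # hypothetical; a real server would use random bytes


def _hash24(src_ip, src_port, dst_ip, dst_port, ts, secret):
    material = (
        socket.inet_aton(src_ip) + struct.pack("!H", src_port)
        + socket.inet_aton(dst_ip) + struct.pack("!H", dst_port)
        + secret + struct.pack("!B", ts)
    )
    # Keep only 24 bits of the SHA1 output.
    return int.from_bytes(hashlib.sha1(material).digest()[:3], "big")


def make_cookie(conn, sn_c, minute, secret=SECRET):
    # ISN_s = SN_c + (timestamp || hash); nothing is stored on the server.
    ts = minute % 256
    value = (ts << 24) | _hash24(*conn, ts, secret)
    return (sn_c + value) % 2**32


def check_cookie(conn, sn_c, acked_isn, minute, secret=SECRET, max_age=2):
    # acked_isn is the ISN_s recovered from the client's ACK
    # (acknowledgment number minus one); sn_c is likewise recovered
    # from the ACK's own sequence number.
    value = (acked_isn - sn_c) % 2**32
    ts, h = value >> 24, value & 0xFFFFFF
    age = (minute - ts) % 256
    return age <= max_age and h == _hash24(*conn, ts, secret)


conn = ("10.0.0.1", 40000, "10.0.0.2", 80)
cookie = make_cookie(conn, sn_c=12345, minute=1000)
print(check_cookie(conn, 12345, cookie, minute=1001))
```

      The server recomputes the hash from fields in the ACK packet
      itself, so it only allocates connection state after a client has
      shown it can receive packets at the claimed source address.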

    Another type of DoS attack: bandwidth amplification.
      Attacker's goal is to overwhelm server or link,
        so that legitimate traffic is discarded.
      Send ICMP echo request (ping) packets to the broadcast address of a network.
        E.g., 18.26.7.255.
        Used to be that you'd get an ICMP echo reply from all machines on network.
        What if you fake a packet from victim's address?  Victim gets all replies.
        Find a subnet with 100 machines on a fast network: 100x amplification!
        [ Ref: http://en.wikipedia.org/wiki/Smurf_attack ]
      Can we fix this?
        Routers now block "directed broadcast" (packets sent to broadcast address).
      Modern-day variant: DNS amplification.
        DNS is also a request-response service.
        With a small query, server might send back a large response.
        With DNSSEC, responses contain lots of signatures, so they're even larger!
        Since DNS runs over UDP, source address is completely unverified.
        [ http://blog.cloudflare.com/deep-inside-a-dns-amplification-ddos-attack ]
      Can we fix the DNS attack?
        Perhaps by fixing DNS servers to only respond to legitimate clients.
        Hard: many name servers must respond to open-ended set of clients.
        E.g. laptops off NYU campus, but configured with NYU DNS servers.
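
      The amplification factor in both variants is just bytes delivered
      to the victim divided by bytes the attacker sends. A tiny sketch,
      where the function name and all packet sizes are made-up
      illustrative numbers (only the 100-machine subnet comes from the
      notes):

```python
def amplification(request_bytes, response_bytes, responders=1):
    # Bytes arriving at the victim per byte sent by the attacker.
    return responders * response_bytes / request_bytes


# Smurf-style: 100 machines each echo a reply of the same size.
smurf = amplification(64, 64, responders=100)

# DNS-style: one small query, one large response (hypothetical sizes).
dns = amplification(60, 3000)

print(smurf, dns)
```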

    Another type of DoS attack: application-level 
        --Legitimate-looking requests but ultimately spurious
        --This is tough to defend against; hope that the site can tell
        apart the legitimate users from the attackers

    D. Attacks on routing protocols

    Routing protocols: overly-trusting of participants.
      ARP: within a single Ethernet network.
        To send IP packet, need the Ethernet MAC address of router / next hop.
        Address Resolution Protocol (ARP): broadcast a request for target's MAC.
        Anyone can listen to broadcast, send a reply; no authentication.
        Adversary can impersonate router, intercept packets, even on switched net.

      DHCP: again, within a single Ethernet network.
        Client asks for IP address by sending a broadcast request.
        Server responds, no authentication (some specs exist but not widely used).
          If you just plugged into a network, might not know what to expect.
        Lots of fields: IP address, router address, DNS server, DNS domain list, ...
        Adversary can impersonate DHCP server to new clients on the network.
          Can choose their DNS servers, DNS domains, router, etc.

      BGP: Internet-wide.
        BGP routing system is huge; attackers control ISPs and BGP routers.
        Any BGP participant router can announce route to any IP address.
        Attack: announce you have a path to NYU, people route through you,
          you can inspect/modify traffic, and then forward to NYU.
        Attack: spammer announces unused address, sends spam, then goes away.
          Gets around IP-level blacklisting of spam senders: choose almost any IP!
        How to fix? S-BGP, RPKI, BGPsec.
          Sign original announcements.
          Trusted database of who is allowed to announce what IP prefixes.
          Sign paths, so others can verify length.
          Getting some traction but still not widely deployed.
          Database of what is allowed is a weak point.

    E. Scanning

    The open Internet makes it easy for attackers to gather useful info.
      Which hosts are running vulnerable software / protocols?
        Probing:
          Check if a system is listening on a well-known port.
          Protocols / systems often send an initial banner message.
        nmap can guess OS by measuring various impl-specific details.
          [ Ref: http://nmap.org/book/man-os-detection.html ]
        Use DNS to look up the hostname for an IP address; may give hints.
      Which hosts exist, e.g. to explore indirect attacks,
          or to gather botnets?
        traceroute to find routers along the way, for BGP attacks.
        Can also just scan the entire Internet: only 2^32 addresses.
          1 Gbps (100 MB/s) network link, 64 byte minimum packets.
          ~1.5M packets per second.
          2^32=4B packets in ~2750 seconds, or about 45 minutes.
          zmap: implementation of this [ Ref: https://zmap.io/ ]
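
          The back-of-the-envelope scan time can be reproduced as follows.
          This only redoes the arithmetic with the notes' round numbers
          (100 MB/s usable, one 64-byte packet per address); it is not
          a measurement.

```python
LINK_BYTES_PER_SEC = 100e6  # "1 Gbps (100 MB/s)" as approximated in the notes
PACKET_BYTES = 64           # minimum-size packet
ADDRESSES = 2**32           # entire IPv4 address space

pps = LINK_BYTES_PER_SEC / PACKET_BYTES  # packets per second
seconds = ADDRESSES / pps                # one probe per address
minutes = seconds / 60

print(f"{pps:.0f} pkt/s, {seconds:.0f} s, {minutes:.1f} min")
```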

    F. Discussion
        Could one design Internet protocols that are "secure"?
          All packets have cryptographically verified source IP address?
          Track down DoS sources with these IP addresses?
          Require all users of TCP to use cryptography?

        How to improve security?
          Protocol-compatible fixes to TCP implementations.

          Firewalls.
            Only a partial fix, but still: widely used.
            Issue: adversary may be within firewalled network.
            Issue: hard to determine if packet is "malicious" or not.
            Issue: even for fields that are present (src/dst), hard to authenticate.
            TCP/IP's design not a good match for firewall-like filtering techniques.
            E.g., IP packet fragmentation: TCP ports in one packet, payload in another.

          Cryptographic security on top of TCP/IP: SSL/TLS, Kerberos, SSH, etc.
            A hard problem: protocol design, key distribution, trust, etc.
            Will talk about this more below

          Some kinds of security hard to provide on top: DoS-resistance, routing.

3. Protecting network communications

    Recall: two kinds of encryption schemes.
      E is encrypt, D is decrypt
      Symmetric key cryptography means same key is used to encrypt & decrypt
        ciphertext = E_k(plaintext)
        plaintext = D_k(ciphertext)
      Asymmetric key (public-key) cryptography: encrypt & decrypt keys differ
        ciphertext = E_PK(plaintext)
        plaintext = D_SK(ciphertext)
        PK and SK are called public and secret (private) key, respectively
      Public-key cryptography is orders of magnitude slower than symmetric

      Encryption provides data secrecy, often also want integrity.
      Message authentication code (MAC) with symmetric keys can provide integrity.
        Look up HMAC if you're interested in more details.
      Can use public-key crypto to sign and verify

    Strawman plan:
      Suppose A knows the public key of B.
      Don't want to use public-key encryption all the time (slow).
      Strawman protocol for establishing a secure connection between A and B:
        A generates a random symmetric session key S.
        A encrypts S for PK_B, sends to B.
        Now we have secret key S shared between A and B, can encrypt and
          authenticate messages using symmetric cryptography

    Good properties of this strawman protocol:
      A's data seen only by B:
        Only B (with SK_B) can decrypt S.
        Only B can thus decrypt data encrypted under S.

    What goes wrong with this strawman?

      Adversary can record and later replay A's traffic; B would not notice.
        Solution: have B send a nonce (random value).
        Incorporate the nonce into the final master secret S' = f(S, nonce).
        Often, S is called the pre-master secret, and S' is the master secret.
        This process to establish S' is called the "handshake".
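
      A sketch of how mixing in B's nonce defeats replay. The notes leave
      f unspecified; using HMAC-SHA256 keyed with S as f is an
      assumption of this example, as is the function name
      master_secret.

```python
import hashlib
import hmac
import os


def master_secret(pre_master, nonce):
    # S' = f(S, nonce), with f = HMAC-SHA256 (an assumption for this sketch).
    return hmac.new(pre_master, nonce, hashlib.sha256).digest()


S = os.urandom(32)  # pre-master secret chosen by A

# Original session: B picks a fresh nonce.
nonce_1 = os.urandom(16)
s_prime_1 = master_secret(S, nonce_1)

# Replay attempt: adversary resends A's recorded messages (same S),
# but B picks a new nonce for the new session.
nonce_2 = os.urandom(16)
s_prime_2 = master_secret(S, nonce_2)

print(s_prime_1 == s_prime_2)
```

      Recorded traffic was protected under the first S', so it does not
      decrypt or authenticate under the second one, and B notices.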

      Adversary can impersonate A, by sending another symmetric key to B
        and claiming that it (the adversary) is A. If B does not care who
        A is, this is not a problem. On the other hand, if B cares who A
        is, this is an attack, and there are many possible solutions:
            --example: B also chooses and sends a symmetric key to A, encrypted
            with PK_A; then both A and B use a hash of the two keys
            combined. This is roughly how TLS client certificates work.

      Adversary can later obtain SK_B, decrypt symmetric key and all messages.
        * Solution: use a key exchange protocol like Diffie-Hellman,
           which provides forward secrecy
         [Or see, e.g., http://vincent.bernat.im/en/blog/2011-ssl-perfect-forward-secrecy.html]
        * Another solution: client (A) generates an ephemeral public/secret
          key pair (PK_C, SK_C):
            A -> B: PK_C, E_{PK_B}({S_c})  # A knows SK_C
            B -> A: E_{PK_C}({S_s})
            S = SHA1(S_s||S_c)
            Now both parties use S as a symmetric key
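
      A toy sketch of the Diffie-Hellman option mentioned above. The
      group (prime 2^127 - 1, base 3) is far too small for real use and
      was picked only so the example runs with the standard library.
      The function names, and hashing the shared value with SHA-256,
      are assumptions.

```python
import hashlib
import secrets

P = 2**127 - 1  # a Mersenne prime; toy-sized, NOT secure
G = 3           # base chosen for illustration


def keypair():
    # Fresh (ephemeral) secret exponent and matching public value.
    sk = secrets.randbelow(P - 2) + 1
    return sk, pow(G, sk, P)


def shared_key(my_sk, their_pk):
    # Both sides arrive at G^(a*b) mod P, then hash it into a key.
    s = pow(their_pk, my_sk, P)
    return hashlib.sha256(s.to_bytes(16, "big")).digest()


a_sk, a_pk = keypair()  # A sends a_pk to B
b_sk, b_pk = keypair()  # B sends b_pk to A

print(shared_key(a_sk, b_pk) == shared_key(b_sk, a_pk))
```

      Forward secrecy comes from discarding a_sk and b_sk after the
      session: a later compromise of B's long-term key SK_B does not
      reveal them. In a real protocol the long-term key would still be
      used to sign the exchange, so that A knows it is talking to B.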

    Hard problem: what if neither computer knows each other's public key?
      Common approach: use a trusted third party to generate certificates.
      Certificate is tuple (name, pubkey), signed by certificate authority.
      Meaning: certificate authority claims that name's public key is pubkey.
      B sends A a pubkey along with a certificate.
      If A trusts certificate authority, continue as above.

---------------------------------------------------------------------------

Acknowledgment: 6.858 staff