Class 16
CS 480-008
31 March 2016

On the board
------------
1. Last time

2. Haven
    Intro
    Drawbridge
    Haven design
    Discussion

3. Some crypto concepts
    intro
    public key crypto
    Diffie-Hellman key exchange

---------------------------------------------------------------------------

1. Last time

    --defending against untrusted OSes
    --SGX

    Note that ARM processors also have security extensions: "TrustZone"
    technology. Supposedly, iOS (the OS on Apple iPhones) makes use of this.
    The protection offered looks more coarse-grained than what SGX provides:

        Processor is either in trusted mode, or not.
        Memory isn't encrypted by the processor.
        The "normal" OS has less work to do, in terms of managing resources.

    --review memory: does the division of labor between OS, enclave, and
      processor leak info? (yes: the OS sees faulting virtual page numbers)

2. Haven

    A. Intro

    Goal: "Our objective is to run existing server applications in the
    cloud with a level of trust and security roughly equivalent to a user
    operating their own hardware in a locked cage at a colocation facility."

    More specifically, the authors' goal is to execute *unmodified Windows
    applications* on a cloud platform, under the following

    Threat model:
        --System admins control cloud software
        --Remote attackers may control cloud software
        --OS may launch "Iago" attacks
            May pass arbitrary values to Haven
            May interrupt execution of Haven
        --Intel and the hardware are trusted:
            SGX design and fab is correct
            Intel's private key isn't compromised

    Unmodified Windows applications means that applications need an
    environment:
        send packets
        store files
        ...
        all the services of an operating system

    What's the challenge?
        --running binaries unmodified (so there need to be OS-like services
          somewhere), but
        --OS is untrusted (so the OS has to be assumed to misbehave)

    Response: SGX + Drawbridge (2x)

    B. Drawbridge

    What is this? An old idea with a new take on it: OS services in user
    space.
    High-level goal: isolate apps from each other as if they were running
    on different virtual machines, but without the overhead of virtual
    machines.

    Notice: the goal here is to protect the platform from the untrusted
    application (which is the usual goal of isolation, but the opposite of
    the starting goal for the Haven authors).

    Advantage of this approach versus virtual machines? (lighter weight)

    Advantage of this approach versus the isolation provided by normal
    process boundaries?
        Independent evolution of host OS and libOS
        Ability to migrate application state
        Stronger isolation; for example, different apps see different
        file systems

    How do they do it?

           win32, win8
            \       /
             \libOS/
              \   /
               \ /
               DABI
        --------------------
              Host OS

    libOS is a (very large) library that exposes the entire Windows
    interface to applications, but is implemented in terms of a much
    smaller set of primitives:

        DABI = Drawbridge ABI
        ABI = Application Binary Interface

    Notice that if the DABI equals "the x86 instruction set", then this
    picture becomes a representation of virtual machines, and the
    implementer of the DABI (the "Host OS") is a conventional VMM
    (= Virtual Machine Monitor).

    The small interface protects the host OS from the application.

    C. Haven design

    See figure 2

    Shield module
        implements API inside enclave
        interacts with host OS using a narrow, untrusted API
        untrusted API is a subset of Drawbridge's API (see figure 3)
            [the upcall that they remove is InitializeProcess]

    Untrusted runtime
        tunnels between shield in enclave and host kernel
        also used for bootstrap

    Host kernel contains SGX driver and Drawbridge host
        Drawbridge host implements the narrow API using OS calls
        Untrusted runtime calls SGX driver in host
            this driver is what calls ECREATE

    NOTICE: the Drawbridge ABI shows up *twice*:
        once to provide OS services, inside the enclave, in terms of
          something validated (libOS on shield)
        once to force calls through a narrow interface, where they can be
          validated (shield on top of untrusted runtime on top of host OS)
    NOTE: untrusted API/ABI is a subset of Drawbridge's ABI

    Shield services

    Virtual memory
        Enclave starts at 0 (to handle null pointer dereferences by app,
        libOS)
            otherwise: VA 0 is outside the enclave. The OS could create an
            association VA=0 --> PA=x, where x is a valid physical page.
            This would mean that normal NULL pointer dereferences no longer
            crash, and are now under the control of the OS.
        Tracking memory pages used by application/libOS
        Adding/removing memory pages from enclave
            Verifies that changes have been made correctly
            Never allows host to pick virtual-memory addresses
            Doesn't allow application and libOS to allocate pages outside
            of enclave

    Threads
        user-level scheduling (e.g., so that existing bugs in mutexes
        aren't triggered)
        multiplexes threads on a fixed number of threads created at startup
        Allocate a fixed number of TCSs at start

    *Question*: How do they handle the problem of an untrusted OS mounting
    Iago attacks?

    Two responses; both are arguably unsatisfying:
        (1) libOS is in the process. The assumption is that it was
            validated a priori. Okay.
        (2) The shield validates responses from the untrusted host system.
            We don't really find out how this is done, or get formal
            guarantees.
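    It is at least easy to sketch what such validation might look like for
    the memory rules above. A hypothetical Python sketch (the names
    `Enclave` and `alloc_pages` are invented for illustration; this is not
    Haven's code): the shield picks the virtual address, the host must echo
    it back exactly, and every page must lie inside the enclave.

    ```python
    PAGE = 4096

    class Enclave:
        def __init__(self, base, size):
            self.base = base          # first VA inside the enclave
            self.limit = base + size  # first VA past the enclave
            self.mapped = set()       # page-aligned VAs the shield knows about

        def alloc_pages(self, requested_va, npages, host_reply_va):
            """Validate the host's reply; never trust host-chosen addresses."""
            if requested_va % PAGE != 0:
                raise ValueError("unaligned request")
            # The shield picks the VA; the host must echo it back exactly.
            if host_reply_va != requested_va:
                raise RuntimeError("host tried to choose the address (Iago?)")
            for i in range(npages):
                va = requested_va + i * PAGE
                # Pages must lie entirely inside the enclave...
                if not (self.base <= va and va + PAGE <= self.limit):
                    raise RuntimeError("page outside enclave")
                # ...and must not silently replace pages already in use.
                if va in self.mapped:
                    raise RuntimeError("page already mapped")
            for i in range(npages):
                self.mapped.add(requested_va + i * PAGE)
    ```

    Even this toy version shows why the host is left with no useful
    discretion: every field of its reply is either dictated by the shield
    or rejected.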
    This could be a source of vulnerabilities... "The interface at the
    enclave boundary must allow the shield to verify the correctness of
    all operations.": can such verification be done for all calls in all
    cases? The authors have designed the interface to make validation
    simpler, but without formal guarantees, it's hard to know whether
    there are vulnerabilities.

    Running binaries unmodified
        main approach is to be careful about exceptions
        also have to emulate instructions
        [no magic here, just lots of work]

    Attestation
        more or less what you'd expect

    D. Discussion

    Q. What happens if the host OS cheats?
    A. Shield panics. (Their goal was never availability.)

    Q. Can Haven run unmodified apps?
        No: fork. Maybe a minor problem on Windows?
        No: Cannot map an enclave page at several virtual addresses.
        The authors needed to modify applications.

    Q. How do we know this is secure?

    Q. Should the authors do fuzz testing on the untrusted interface?

    Q. What is the relationship between this mechanism and privilege
       separation?

3. Some crypto concepts

    For millennia, cryptography was *symmetric-key*: two communicating
    parties share a key, and want the content of their messages to be
    hidden from any eavesdroppers.

        msg --> [Enc alg(k)] --> ciphertext --> [Dec alg(k)] --> msg

    The Enc and Dec algorithms are parameterized by a *secret key*, k.
    The key k is known only to the two parties.

    Aside: _Kerckhoffs's principle_: "The cipher method must not be
    required to be secret, and it must be able to fall into the hands of
    the enemy without inconvenience." (The quotation is from the textbook
    _Modern Cryptography_, by Katz and Lindell, 2008.)

    Idea: you **must** assume that your algorithms are public (and in
    fact, you should publicize them). The only secret should be the key
    itself.

    Why?
        --easier to secure a key than an algorithm (details of an
          algorithm can be leaked by the person who wrote the code or
          designed the system; details of the algorithm can also be
          inferred by reverse engineering; etc.)
        --if the key is compromised, it's easier to update a key than to
          come up with a new algorithm
        --easier to standardize (should every pair of people have to use
          different algorithms?!)

    What's the alternative? "Security through obscurity." This is not
    really an option. It's a terrible practice, and often leads to
    embarrassment and difficulty for organizations (like commercial
    companies who try to design their own crypto algorithms in secret):
        --public designs are reviewed; scrutiny leads to strength
        --if there are flaws, it's better if the ethical hackers find them
          first
        --trying to keep the algorithm private is inherently less secure,
          because more people are exposed to it (see above)

    Public key cryptography

    Motivation:
        With symmetric-key crypto, every pair of users needs a key...
        ...and needs to coordinate out of band to share the key.
        This is completely unworkable in open systems, like today's
        Internet, where users do not meet physically, and users are
        constantly in touch with servers they have never associated with
        previously.

    Other motivating points:
        What if an authority holds key pairs, or maybe a key per user,
        and uses the per-user key to set up shared keys on demand?
        Problem: the existence of an authority runs counter to the
        original goal ("only the two endpoints should be able to see the
        original message").

        It's much better if each user has to maintain a *single* secret,
        instead of a number of secrets proportional to the number of
        other users and services in the world.

        [Governments are willing to maintain a per-party secret; anecdote
        from Lindell about how US embassies decrypted communication.]

    So, the question: how can two parties, who have never met, communicate
    over *public channels*, to send each other *private* messages, with
    zero coordination?

    Until 1974, everyone assumed that it was impossible to do encryption
    without the two entities first sharing a secret. Merkle, in 1974,
    proposed public key cryptography. His paper was not understood at the
    time.
    Though his paper was submitted in 1974, it appeared only in 1978.
    Diffie and Hellman's 1976 paper "New Directions in Cryptography" also
    proposed public key cryptography, and was a thunderbolt. For that
    work, Diffie and Hellman were awarded this year's Turing Award.
    However, many (most?) experts believe that an injustice was done, and
    that the award should have included Merkle. (There is a famous picture
    of Diffie, Hellman, and Merkle; when the New York Times reported on
    the Turing Award, they cropped Merkle out of the photo!)

    It can be done! You have two keys: a private key and a public key.
    Everyone knows your public key. Knowing the public key allows others
    to encrypt messages to you but not to decrypt messages. Also, knowing
    the public key allows others to check that you signed a document but
    not to forge your signatures.

    [We're ignoring the issue of how users learn each other's public keys.
    This is THE thorny issue with public key cryptography, and --
    arguably -- it still hasn't been adequately solved. But it may be
    solved yet: Max Krohn's Keybase is promising.]

    We'll look at a few primitives:
        key exchange
        public-key encryption
        digital signatures

    Diffie-Hellman key exchange

    To avoid digressing into math, we are going to make some simplifying
    and wrong assumptions. We'll flag them below.

    Assume that we are working over the positive integers mod a prime p:
    {1,2,...,p-1}. g is selected to be an element that will *generate*
    this group, in the sense that:

        g^1 mod p, g^2 mod p, ..., g^{p-1} mod p

    will be a permutation of the integers {1,2,...,p-1}.

    Example: if p=5, our domain is {1,2,3,4}, and 2 is a generator,
    because:
        2^1 mod 5 = 2
        2^2 mod 5 = 4
        2^3 mod 5 = 3
        2^4 mod 5 = 1
    [And for all k >= 1, 2^k mod 5 = 2^{k mod 4} mod 5]

    p=5 is far too small for security. However, when p is of reasonable
    size (thousands of bits), the discrete log problem is assumed to be
    hard.
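    The p=5 example can be checked mechanically. A small sketch, using
    Python's built-in three-argument `pow` for modular exponentiation
    (the function name `is_generator` is ours):

    ```python
    # g is a generator mod p iff its powers g^1 .. g^{p-1} hit every
    # element of {1, ..., p-1} exactly once (i.e., are a permutation).
    def is_generator(g, p):
        powers = {pow(g, k, p) for k in range(1, p)}
        return powers == set(range(1, p))

    print(is_generator(2, 5))  # True: 2^1..2^4 mod 5 = 2, 4, 3, 1
    print(is_generator(4, 5))  # False: powers of 4 mod 5 only hit {4, 1}
    ```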
    To say that the discrete log problem is hard means, roughly, that if
    you are given g and g^x (mod p), you cannot compute x. In other words,
    you cannot take logs.

    FALSE ASSUMPTION: the protocol below is secure if we are working over
    a group where the discrete log problem is hard.

    In reality, for the protocol below to hold up, we need to be working
    over a different domain. We want a particular kind of subgroup of the
    domain above. (Technically, we want a subgroup where the Decisional
    Diffie-Hellman problem is assumed to be hard. "DDH is hard" is a
    stronger assumption, and hence of slightly lower quality, than
    "discrete log is hard". However, many existing cryptosystems are built
    on the assumption that DDH is hard, and it's considered to be a good
    quality (safe) cryptographic assumption.)

    Now, assume that Alice and Bob have agreed on p and a generator g of
    the integers {1,2,...,p-1} [this is again a simplification because the
    true domain won't be {1,...,p-1}]:

        Alice                       Bob
        -----                       -----
        choose x                    choose y
        public key: g^x             public key: g^y
        private key: x              private key: y

        a=g^x   ------------->
                <-------------      b=g^y

        k1=b^x (mod p)              k2=a^y (mod p)

    Notice:
        k1 = (g^y)^x = g^{yx} mod p
        k2 = (g^x)^y = g^{xy} mod p
        k1 = k2

    So Alice and Bob have a shared secret, based only on knowledge of each
    other's public keys (!!!)
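    The exchange above runs directly as code. A sketch with toy
    parameters (p=23, g=5 is a textbook choice; a real deployment uses a
    prime thousands of bits long from a standardized group):

    ```python
    import secrets

    p = 23   # toy prime; far too small for security
    g = 5    # a generator of {1, ..., 22} mod 23

    # Each party picks a private exponent and publishes g^exponent mod p.
    x = secrets.randbelow(p - 2) + 1   # Alice's private key
    y = secrets.randbelow(p - 2) + 1   # Bob's private key

    a = pow(g, x, p)   # Alice's public key, sent to Bob in the clear
    b = pow(g, y, p)   # Bob's public key, sent to Alice in the clear

    k1 = pow(b, x, p)  # Alice computes (g^y)^x mod p
    k2 = pow(a, y, p)  # Bob computes   (g^x)^y mod p

    assert k1 == k2    # both hold g^{xy} mod p: the shared secret
    ```

    An eavesdropper sees p, g, a, and b; recovering x or y from them is
    exactly the discrete log problem.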
---------------------------------------------------------------------------

References:

SGX overview:
    http://www.pdl.cmu.edu/SDI/2013/slides/rozas-SGX.pdf
SGX instructions overview:
    https://software.intel.com/sites/default/files/article/413936/hasp-2013-innovative-instructions-and-software-model-for-isolated-execution.pdf
SGX hardware:
    https://jbeekman.nl/blog/2015/10/sgx-hardware-first-look/
SGX security discussion:
    https://www.nccgroup.trust/uk/about-us/newsroom-and-events/blogs/2015/january/intel-software-guard-extensions-sgx-a-researchers-primer/
Iago attacks:
    https://cseweb.ucsd.edu/~hovav/dist/iago.pdf
Drawbridge:
    http://research.microsoft.com/pubs/141071/asplos2011-drawbridge.pdf
    http://research.microsoft.com/pubs/180156/bascule_eurosys13.pdf

Acknowledgments:
    MIT's 6.858 staff

Crypto:
    A good text is _Modern Cryptography_, by Jonathan Katz and Yehuda
    Lindell.