Class 12
CS 480-008
10 March 2016

On the board
------------

1. Last time
2. Access control
   --intro, review
   --DAC
   --MAC
   --capabilities (!= Linux capabilities)
3. admin notes
4. Capsicum
   --Description
   --Using Capsicum
   --Discussion

---------------------------------------------------------------------------

1. Last time

    --setuid, SFI
        
        --Note that extensions can still be wrecked by, say, a buffer
        overflow. There are techniques that protect against that (CFI,
        XFI, etc.)

        --So SFI cannot enforce the integrity of the originally intended
        control flow

        --What about an infinite loop? (The authors discuss)

2. Access control

    Reminder: we want access control so that we can build isolation
    boundaries, while still letting programs do what they need to do.
        
        example: OKWS

        other examples:

          Programs that deal with network input:
            Parsing code should not have much privilege -- it is likely to have bugs.
            But rest of app may need to read and write various files.

          Programs that manipulate potentially untrusted file content:
            (gzip, media codecs, etc.)
            Restrict access privileges of e.g. image manipulation code.
            Allow rest of application to read and write user's files.
            [don't want bug in image manipulation to give control over
            the user's files.]

          Untrusted software downloaded from the network
            e.g. JavaScript and extensions in browsers
            Needs to talk to main browser to display things
            Better not read/write my files -- even though browser can

      
    Recall:
    Approach 1: run the code on another machine
    Approach 2: run the untrusted code inside a virtual machine.
    Approach 3: run the untrusted code in a separate process.
    Approach 4: Approach 3 + OS access control techniques
    Approach 5: SFI and binary sandboxing

    Today: continue with Approach 4; describe alternative
    "sub-approaches" to OS access control (DAC, MAC, capabilities)

    A. Alternative 1: DAC

      Each object has a set of permissions (an access control list).
        E.g., Unix files, with rwx permission bits.
        "Discretionary" means applications set permissions on objects (e.g., chmod).

      Each program runs with privileges of some principals.
        E.g., Unix user/group IDs.

      When program accesses an object, check the program's privileges to decide.
        "Ambient privilege": process's privileges used implicitly for each access.

           Name              Process privileges
             |                       |
             V                       V
          Object -> Permissions -> Allow?
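
      [sketch: the picture above as a toy check, assuming Unix-style
      owner/group/other rwx bits. The names Process, FileObj, and
      dac_allows are made up for illustration, and the root shortcut is
      a simplification of what a real kernel does.]

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Process:
    uid: int
    gids: frozenset  # group IDs the process runs with


@dataclass(frozen=True)
class FileObj:
    owner: int
    group: int
    mode: int  # permission bits, e.g. 0o640


def dac_allows(proc, obj, want):
    """Return True if proc may perform every operation in want ("r", "w", "x").

    The caller presents no credential: the process's uid/gids are used
    implicitly, which is what "ambient privilege" means.
    """
    if proc.uid == 0:
        return True  # simplification: root bypasses the permission bits
    if proc.uid == obj.owner:
        shift = 6  # owner bits
    elif obj.group in proc.gids:
        shift = 3  # group bits
    else:
        shift = 0  # other bits
    granted = (obj.mode >> shift) & 0o7
    needed = sum({"r": 4, "w": 2, "x": 1}[c] for c in set(want))
    return granted & needed == needed
```

      With mode 0o640, the owner gets "rw", a member of the file's group
      gets "r" only, and everyone else gets nothing.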

      DAC is well-suited to time-sharing, where users own their own files,
        sometimes need to share, and programs unambiguously execute as
        a single specific user.

      What if code might be malicious, or exploited via buffer overflow?
        Don't want to run it with my full permissions!

      What if program acts as multiple principals, e.g. web server that
      uses its own files as well as fetching files for browsers?

      Problem: only root can create new principals, on most DAC systems.
           E.g., Unix, Windows.

      Problem: ambient authority makes it too hard to constrain malicious/buggy code.
        Too easy for some files to have the wrong permissions.

      Problem: ambient authority makes confused deputies all too likely.

      Problem: some objects might not have a clear configurable access control list.
        Unix: processes, network, ...
            (There is no real way in Unix to prevent a process from
            accessing the network; firewall rules are not specific to
            principals.)


    B. Alternative 2: MAC

      MAC enforces a set of policies (== rules) on an application.
        Rules set up by application writer, or administrator, or user.
        O/S enforces the policy; the program can't change it.

     [lots of versions of MAC; we're describing the abstract picture
     here.]

      Example policies:
        Can only access specified files, no others.
        Can access any file except specified files/directories.
        Cannot use the network.
        Can connect to host X over the network.

      The goal of MAC is to make it much harder for applications to make mistakes
        about what files etc. they access, or to be tricked into making mistakes.

      "Mandatory" in the sense that applications can't change this policy.
        (by contrast, in DAC, security policy is set by applications themselves
          (chmod, etc).)

           Name    Operation + caller process
             |               |
             V               V
          Object --------> Allow?
                             ^
                             |
          Policy ------------+

      MAC usually implemented by intercepting every system call
      (policy applied in a _reference monitor_.)
        Policies keyed by system call name and arguments.

      Each application has a policy file.
        Supplied by application vendor if more or less trusted.
        Supplied by admin or user otherwise.
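
      [sketch: a toy reference monitor, assuming a made-up policy format
      of ordered (syscall, argument pattern, verdict) rules where the
      first match wins and the default is deny. This is not the format
      of Seatbelt or any other real MAC system.]

```python
import fnmatch

# Hypothetical per-application policy: first matching rule wins.
POLICY = [
    ("open", "/home/user/secret/*", "deny"),
    ("open", "/home/user/*", "allow"),
    ("connect", "hostX:*", "allow"),
]


def monitor(policy, syscall, arg):
    """Decide one intercepted system call; the application cannot edit policy."""
    for name, pattern, verdict in policy:
        if name == syscall and fnmatch.fnmatchcase(arg, pattern):
            return verdict == "allow"
    return False  # nothing matched: default deny
```

      Note that the decision is made on the argument string alone. The
      monitor knows what name the program passed, not which object that
      name will resolve to.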

      Example: Mac OS X sandbox ("Seatbelt").
      Pro: any user can sandbox an arbitrary piece of code, finally!
      Pro: can be applied to existing applications (with some work).
      Pro: can be very restrictive, yet allow precise sharing.
      Con: some operations can only be filtered at coarse grain
        E.g. shared memory can be only allowed or forbidden; not specific sharing.
      Con: can be difficult to determine security impact of syscall based on args.
        What does a pathname refer to?  Symlinks, hard links, race conditions, ..
      Con: programmer must separately write the policy + application code.
      Con: static -- not a programming tool.
        Hard for program itself to use to set up sandboxes with
        dynamically-determined privileges.

      Is it a good idea to separate policy from application code?
        Depends on overall goal.
        Good if user/admin wants to look at or change policy.
        Awkward if app developer needs to maintain both code and policy.
        For app developers, might help clarify policy.

    C. Alternative 3: capabilities

      Different plan for access control: capabilities.

      If process has a handle for some object ("capability"), can access it.

          Capability --> Object

      Useful mental model: file descriptor that a process inherits.

      Characteristics of capability systems:

        The only access logic is "does the process have the capability".
          There is no ambient authority, and thus no global name spaces.

        All resources are uniformly accessible via capabilities.

        Capabilities can't be forged.

        A process can give a capability to another process.

          Holding a capability automatically grants access to
          corresponding object.

      Why is this attractive?

        A sandbox can be set up with exactly the capabilities it needs.
          Including any capabilities needed to share with other processes.
          No capability -> no access.

        There is only one access/permission scheme for all kinds of objects.

        No ambient authority, so no confused deputy.
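
        [sketch: a toy confused deputy, modeled on the classic story of
        a compiler service that can also write a billing file. The dict
        stands in for a file system; all names here are hypothetical.]

```python
DEPUTY_CAN_WRITE = {"billing.log", "out.txt"}  # the deputy's ambient authority
CLIENT_CAN_WRITE = {"out.txt"}                 # what the client may touch


def compile_ambient(files, out_name):
    """Name-based interface: the check uses the deputy's privileges."""
    if out_name not in DEPUTY_CAN_WRITE:
        raise PermissionError(out_name)
    files[out_name] = "compiled output"  # client chose the name, deputy's rights used


class WriteCap:
    """Unforgeable handle to one file; holding it is the permission."""

    def __init__(self, files, name):
        self._files, self._name = files, name

    def write(self, data):
        self._files[self._name] = data


def open_for_client(files, name):
    """The client can only mint capabilities for files it may write itself."""
    if name not in CLIENT_CAN_WRITE:
        raise PermissionError(name)
    return WriteCap(files, name)


def compile_with_cap(out_cap):
    """Capability interface: the deputy writes only where the client's handle points."""
    out_cap.write("compiled output")
```

        In the first interface the client can name "billing.log" and the
        deputy overwrites it. In the second, the client cannot obtain a
        handle for "billing.log", so it has nothing to pass.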


      Unix file descriptors are a form of capability.

          An FD refers to a specific file/socket/etc. (not a name, which
          might change).

          Holding a writable FD to a file allows a process to write it,
          even if the file's permissions have changed since the open.

          FDs can't be forged.

          FDs can be passed to other processes (inherited by fork, sent by sendmsg).
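
          [sketch: two of these properties on an ordinary Unix system.
          Assumes Python 3.9+ for socket.send_fds/recv_fds, which wrap
          sendmsg with SCM_RIGHTS. For brevity both ends of the socket
          live in one process; the function names are made up.]

```python
import os
import socket
import tempfile


def fd_survives_chmod():
    """Open a file for writing, revoke all permission bits, then write anyway."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "log.txt")
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
        os.chmod(path, 0o000)              # permissions change after the open
        os.write(fd, b"still writable\n")  # the FD, not the name, carries the right
        os.close(fd)
        os.chmod(path, 0o600)              # restore so we can read the result back
        with open(path, "rb") as f:
            return f.read()


def pass_fd_over_socket():
    """Send the write end of a pipe across a Unix socket and use the copy."""
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    r, w = os.pipe()
    received = []
    try:
        socket.send_fds(left, [b"x"], [w])
        _msg, received, _flags, _addr = socket.recv_fds(right, 16, 1)
        os.write(received[0], b"hello")    # new FD number, same underlying pipe
        return os.read(r, 5)
    finally:
        for fd in [r, w, *received]:
            os.close(fd)
        left.close()
        right.close()
```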

          Why aren't Unix FDs enough for sandboxing?
            Unix FDs allow operations like fchmod() that must be protected.
            Some Unix resources aren't addressed via FDs, e.g. processes.
            Many Unix system calls don't involve FDs, but must be protected.


    D. Alternative 4: pure capability-based OS (KeyKOS, etc.)
    
        Capabilities are really an idea for a totally different O/S
        design!

        Some O/S's have *only* capabilities (e.g. KeyKOS); interesting but hard.

        Message-passing channels (very much like file descriptors) are
        capabilities.

        Every application has to be written in a capability style.

        Capsicum claims to be more pragmatic: some applications need not
        be changed.


    E. NOTE: Linux capabilities are solving a different problem.

      Trying to partition root's privileges into finer-grained
      privileges.

      Represented by various capabilities: CAP_KILL, CAP_SETUID,
      CAP_SYS_CHROOT, ..

      Process can run with a specific capability instead of all of
      root's privs. This is not the same thing as what we meant by "capability"
      above. 
      
      Ref: capabilities(7), http://linux.die.net/man/7/capabilities


3. admin notes

    different definitions of "sandbox". "OS sandboxing" is what we're
    talking about today.

    lab 3c is relatively straightforward.... so you can take advantage
    of the stopped late hour clock to have a break...

4. Capsicum

    A. Description

    Capsicum adds capabilities to a name+DAC system.

    A process can be in normal mode, or in "capability mode".
        cap_enter() call switches to capability mode.
        Cannot exit capability mode!
        All children/descendants inherit capability mode.

        [analogy: fork()ing and then dropping privileges is like
        pdfork()ing and then calling cap_enter.]

    In capability mode:

        Access *only* allowed via capabilities.

        Capability is a kind of file descriptor, with some flags
        indicating allowed ops (read, write, seek, etc.).
            [why aren't file permissions enough?]
            [answer: this permits the ability to give some processes
            read access and some write access, etc., without using
            the mechanism of groups, which would be unwieldy or too
            coarse-grained]

        Lots of new system calls to allow access via capabilities.
          openat(fd, name, ...)
          unlinkat(fd, name, ...)
          fd = pdfork(); pdwait(fd); pdkill(fd);

        thus ability to pdkill() can be restricted, given away

    No root directory or current directory.

    General capsicum philosophy: no global namespaces.

      Why are the authors so fascinated with eliminating global namespaces?

      Global namespaces require some access control story (e.g., ambient privs).

      Hard to control access to objects in global namespaces.

   In capability mode, can only use file descriptors -- no global namespaces.

      Cannot open files by full path name: no need for chroot as in OKWS.

      Can still open files by relative path name, given fd for dir (openat).
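
      [sketch: openat-style lookup relative to a directory FD, using
      Python's dir_fd argument to os.open. This runs on a normal Unix
      system, so nothing here enforces capability mode: outside of it,
      an absolute name or ".." would still be honored. read_relative
      and demo are hypothetical names.]

```python
import os
import tempfile


def read_relative(dir_fd, name):
    """Open name relative to dir_fd (like openat) and return its contents."""
    fd = os.open(name, os.O_RDONLY, dir_fd=dir_fd)
    try:
        return os.read(fd, 4096)
    finally:
        os.close(fd)


def demo():
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "notes.txt"), "wb") as f:
            f.write(b"class 12")
        # Setup phase: obtain a directory FD while names are still usable.
        dir_fd = os.open(d, os.O_RDONLY | os.O_DIRECTORY)
        try:
            # From here on, only the FD and a relative name are needed.
            return read_relative(dir_fd, "notes.txt")
        finally:
            os.close(dir_fd)
```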


   Cannot use ".." in path names or in symlinks: why not?
      In principle, ".." might be fine, as long as ".." doesn't go too far.

      Hard to enforce correctly.

      Hypothetical design:
        Prohibit looking up ".." at the root capability.
        No more ".." than non-".." components in path name, ignoring ".".

      Assume a process has capability C1 for /foo.
      Race condition, in a single process with 2 threads:
        T1: mkdir(C1, "a/b/c")
        T1: C2 = openat(C1, "a")
        T1: C3 = openat(C2, "b/c/../..")   ## should return a cap for /foo/a
            Let openat() run until it's about to look up the first ".."
        T2: renameat(C1, "a/b/c", C1, "d")
        T1: Look up the first "..", which now goes to "/foo" (c was moved to /foo/d)
            Look up the second "..", which goes to "/" -- outside of C1
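
      [sketch: the hypothetical design as a purely lexical check. The
      function name is made up. It accepts "b/c/../..", the very path
      used in the race above, which is the point: the check reasons
      about the name, not about where the directories are at lookup
      time.]

```python
def dotdot_stays_inside(path):
    """Return True if, read left to right, ".." never outnumbers real components.

    "." and empty components are ignored. Absolute paths are rejected,
    since they do not start at the capability's directory.
    """
    if path.startswith("/"):
        return False
    depth = 0
    for comp in path.split("/"):
        if comp in ("", "."):
            continue
        if comp == "..":
            depth -= 1
            if depth < 0:
                return False  # would climb above the starting directory
        else:
            depth += 1
    return True
```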

  Do Unix permissions still apply?
      Yes -- can't access all files in dir just because you have a cap for dir.
      But intent is that sandbox shouldn't rely on Unix permissions.

  For file descriptors, add a wrapper object that stores allowed operations.
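
  [sketch: a user-level model of such a wrapper, assuming rights are
  plain strings like "read" and "write" (Capsicum's real rights are
  kernel constants, and the check happens in the kernel's fd lookup).
  CapFD and restrict are hypothetical names. Rights can only be
  narrowed, never widened.]

```python
import os


class CapFD:
    """A file descriptor plus the set of operations allowed through it."""

    def __init__(self, fd, rights):
        self._fd = fd
        self.rights = frozenset(rights)

    def _check(self, right):
        if right not in self.rights:
            raise PermissionError(f"capability lacks {right!r}")

    def read(self, n):
        self._check("read")
        return os.read(self._fd, n)

    def write(self, data):
        self._check("write")
        return os.write(self._fd, data)

    def restrict(self, rights):
        """Return a wrapper for the same FD with a subset of the current rights."""
        new = frozenset(rights)
        if not new <= self.rights:
            raise PermissionError("cannot add rights to a capability")
        return CapFD(self._fd, new)
```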

  Where does the kernel check capabilities?
      One function in kernel looks up fd numbers -- modified it to check caps.
      Also modified namei function, which looks up path names.
      Good practice: look for narrow interfaces, otherwise easy to miss checks.

  libcapsicum.
    Why do application developers need this library?
    Biggest functionality: starting a new process in a sandbox.

  fd lists.
    Mostly a convenient way to pass lots of file descriptors to child process.
    Name file descriptors by string instead of hard-coding an fd number.

    [also, partially helps deal with the fact that in Capsicum, delayed
    initialization doesn't work. This way: the parent creates all of the
    capabilities that are needed, entrusts them to a separate service or
    module, and then when the child needs a fd, it requests it from the
    service.]

  cap_enter() vs lch_start().

    What are the advantages of sandboxing using exec instead of
    cap_enter?

    Leftover data in memory: e.g., private keys in OpenSSL/OpenSSH.

    Leftover file descriptors that application forgot to close.

    Figure 7 in paper: tcpdump had privileges on stdin, stdout, stderr.

    Figure 10 in paper: dhclient had a raw socket, syslogd pipe, lease file.


  Advantages of Capsicum: any process can create a new sandbox.
    (Even a sandbox can create a sandbox.)

  Advantages: fine-grained control of access to resources (if they map to FDs).
    Files, network sockets, processes.

  Disadvantage: weak story for keeping track of access to persistent files.

  Disadvantage: prohibits global namespaces, requires writing code differently.

    B. Using Capsicum in applications

      General plan:
        Some setup in non-capability mode -- open needed directories, etc.
        Then switch to capability mode.
        From then on, application must use openat(), etc.
        -> applications need to be modified to intentionally constrain
           themselves with capsicum, and to use capabilities.

      tcpdump.
        tcpdump snoops on LAN, parses packets w/ complex code, juicy target!

        needs superuser to open "pcap" [stands for "packet capture"; no
        relation to cap-abilities] -- then should have almost no privileges!

        2-line version (Figure 6): just cap_enter() after opening all FDs.

        Used procstat to look at resulting capabilities.

        8-line version (Figure 7): also restrict stdin/stdout/stderr.

        Why?  E.g., avoid reading stderr log, changing terminal settings, ..

        the point: now tcpdump can't do *anything* other than read
          packets, compute, write stdout.

      dhclient.

        dhclient sends/receives "raw" network packets, and then
        configures network interfaces.

        so it needs to retain privilege.

        but it also parses DHCP packets from whoever, so risk of e.g.
        buffer overflow.

        So use privilege separation:
          fork()
          parent opens raw socket, cap_enter(), send/recv packets, notify child.
          child runs as root, waits for info from parent, configures network interface.

      gzip.

        compression/decompression code may have exploitable bugs; people
        often decompress files from untrusted sources.

        Fork/exec sandboxed child process, send it file capabilities
        over a pipe.

          Child in cap mode, has no other capabilities, thus can't see
          any files, etc.

        Substantial changes, mostly to marshal/unmarshal data for RPC: 409 LoC.

        Interesting bug: forgot to propagate compression level at first.
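
        [sketch: the shape of the gzip split on an ordinary Unix
        system. Assumptions: the child inherits its two FDs across exec
        (subprocess pass_fds) instead of receiving them over a pipe as
        the notes describe, and there is no real capability mode, so
        the child's confinement is by convention only. The compression
        level has to be passed explicitly, which is the kind of state
        that is easy to forget. sandboxed_gzip and CHILD are
        hypothetical names.]

```python
import gzip
import os
import subprocess
import sys
import tempfile

# Child program: it uses only the two inherited FDs and never opens a name.
# A real Capsicum port would enter capability mode before touching input.
CHILD = r"""
import os, sys, zlib
in_fd, out_fd, level = (int(a) for a in sys.argv[1:4])
comp = zlib.compressobj(level, zlib.DEFLATED, 31)  # wbits=31: gzip framing
with os.fdopen(in_fd, "rb") as src, os.fdopen(out_fd, "wb") as dst:
    for chunk in iter(lambda: src.read(65536), b""):
        dst.write(comp.compress(chunk))
    dst.write(comp.flush())
"""


def sandboxed_gzip(src_path, dst_path, level=6):
    """Parent: open files by name, then hand only the FDs (and level) to the child."""
    in_fd = os.open(src_path, os.O_RDONLY)
    out_fd = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        subprocess.run(
            [sys.executable, "-c", CHILD, str(in_fd), str(out_fd), str(level)],
            pass_fds=(in_fd, out_fd),  # every other FD is closed in the child
            check=True,
        )
    finally:
        os.close(in_fd)
        os.close(out_fd)
```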

      Chromium.

        Want to render HTML, run JS, etc. in separate sandboxed processes.
          They talk back to main browser process.

        Already privilege-separated on other platforms (but not on FreeBSD).

        ~100 LoC to wrap file descriptors for sandboxed processes.

      QUESTION: how would you apply Capsicum to OKWS?

    C. Discussion

    (1) How does this avoid the Confused Deputy?
        No ambient privilege

    (2) Does Capsicum achieve its goals?

      How hard/easy is it to use?

        Using Capsicum in an application almost always requires app changes.
          To open files with openat(), etc.
          One exception: Unix pipeline apps (filters) that just operate on FDs.

        Suggested plan: sandbox and see what breaks.
          Might be subtle: gzip compression level bug.

      What are the security guarantees it provides?

        Guarantees provided to app developers: sandbox can operate only on open FDs.

        Implications depend on how app developer partitions application, FDs.

        User/admin doesn't get any direct guarantees from Capsicum.
          Unlike MAC schemes, where user/admin directly specifies policy.

        Guarantees assume no bugs in FreeBSD kernel (lots of code), and
        that the Capsicum developers caught all ways to access a
        resource without using FDs.

      What are the performance overheads?  (CPU, memory)
        Minor overheads for accessing a file descriptor.
        Setting up a sandbox using fork/exec takes order of msecs, non-trivial.
        Privilege separation can require RPC / message-passing, perhaps noticeable.

      Adoption?
        In FreeBSD's kernel now, enabled by default (as of FreeBSD 10).
        A handful of applications have been modified to use Capsicum.
          dhclient, tcpdump, and a few more since the paper was written
          [ Ref: http://www.cl.cam.ac.uk/research/security/capsicum/freebsd.html ]
        Casper daemon to help applications perform non-capability operations.
          E.g., DNS lookups, look up entries in /etc/passwd, etc.
          [ Ref: http://people.freebsd.org/~pjd/pubs/Capsicum_and_Casper.pdf ]

    (3) What applications wouldn't be a good fit for Capsicum?

      Apps that need to handle human-oriented file/directory names.

        Names seem to require ambient authority, and fit badly with
        capabilities' direct refs.

      Apps that need to control access to non-kernel-managed objects.
        E.g.: imagine a windowing system at user level, where each
           window is a construct of the user-level window server (X
           Windows works this way).
        Capsicum treats pipe to a user-level server (e.g., X server) as one cap.
            So access to user-level server is all or nothing

      Apps that need to connect to specific TCP/UDP addresses/ports from sandbox.

        Capsicum works by only allowing operations on existing open FDs.

        Need some other mechanism to control what FDs can be opened.

        Possible solution: helper program can run outside of capability mode,
          open TCP/UDP sockets for sandboxed programs based on policy.

    (4) Compare OS mechanisms to non-OS mechanisms for sandboxing

      OS mechanisms for access control are great if privilege boundaries
      can be made to align well with OS objects (process boundaries,
      UIDs, etc.), and if file granularity matches desired granularity
      of permissions. But that's not always how things work.

      The non-OS mechanisms (SFI, etc.) may be a better fit, depending
      on the application
        --Those allow for finer-grained control (for example, access to
          different objects)
        --And they make it possible to consider isolating in a way that
        is OS-independent


References:
  http://reverse.put.as/wp-content/uploads/2011/09/Apple-Sandbox-Guide-v1.0.pdf
  http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/prctl/seccomp_filter.txt;hb=HEAD
  http://en.wikipedia.org/wiki/Mandatory_Integrity_Control

------------------

Acknowledgment: MIT 6.858 staff