Class 9
CS 480-008
23 February 2016

On the board
------------

1. Last time
2. Unix's mechanisms for isolation and controlled sharing
    UIDs, GIDs,
    setting process IDs (login, setresuid, /etc/passwd)
    files
    chmod, chown
    chroot
3. Admin and lab notes
4. OKWS

---------------------------------------------------------------------------

1. Last time

    --finished user authentication discussion

    --motivated privilege separation

2. Unix's mechanisms

  --Unix is the context for lab 3 and Tuesday's paper (OKWS)

  --Unix actions are taken by processes.

    A process is a running program.

    Processes are the most basic Unix tool for keeping code/data separate.

    A process's user ID (UID) controls many of its privileges.
      A UID is a small integer.
      Superuser (UID=0) bypasses most checks.
      A process also has a set of group IDs (GIDs) used in file permissions.
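
    A minimal sketch of how a process can inspect its own identity, using
    Python's standard os and pwd modules (Python being the language of the
    lab code). The function name describe_identity is made up for
    illustration.

```python
import os
import pwd


def describe_identity():
    """Return the calling process's user and group identity."""
    uid = os.getuid()
    try:
        # The name is only a lookup in the user database (/etc/passwd);
        # permission checks are made against the integer.
        name = pwd.getpwuid(uid).pw_name
    except KeyError:
        name = None  # this UID has no entry in the user database
    return {
        "uid": uid,
        "euid": os.geteuid(),
        "name": name,
        "gid": os.getgid(),
        "groups": sorted(os.getgroups()),
    }


if __name__ == "__main__":
    print(describe_identity())
```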

  --Sharing often depends on naming.

    If a process can name something, it can often access it.

    More important: if it *can't* name something, it usually *can't* use it.
      We can isolate a process by limiting what names it can use.
      (sounds simple, but it's a deep idea)

    So we want to know about the name-spaces Unix provides:
      PIDs, UIDs, memory, files, file descriptors, network connections.

  --What types of objects does Unix let processes manipulate?

    I.e. what do we need to control to enforce isolation, allow precise sharing?

    Processes.
      Processes with same UID can send signal, wait for exit & get
        status, debug (ptrace) each other.
      Otherwise not much direct interaction is allowed.
      Debugging, sending signals: must have same UID (almost).
        Various exceptions; this gets tricky in practice.
      Waiting / getting exit status: must be parent of that process.
      So: processes are reasonably well isolated for different UIDs.
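
      The two interactions above (signalling a process with the same UID,
      and a parent collecting its child's status) can be sketched as
      follows. This assumes a Unix system; spawn_and_signal is a
      hypothetical name.

```python
import os
import signal


def spawn_and_signal():
    """Fork a child, kill it with SIGKILL, and collect its status."""
    pid = os.fork()
    if pid == 0:
        # Child: sleep until a signal arrives. It has the parent's UID,
        # so the parent is allowed to signal it.
        signal.pause()
        os._exit(0)
    # Parent: same UID as the child, so kill() passes the permission check.
    os.kill(pid, signal.SIGKILL)
    # Only the parent may wait for this child and read its status.
    _, status = os.waitpid(pid, 0)
    return os.WIFSIGNALED(status), os.WTERMSIG(status)
```

      Sending the same signal to a process owned by a different,
      non-root UID would instead fail with a permission error.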

    Process memory.
      One process cannot directly name or access memory in another process.
      Exceptions: debug mechanisms (ptrace), memory mapped files.
      So: process memory is reasonably well isolated for different UIDs.

    How is a process's UID set?
      Superuser (UID 0) can call setuid(uid) and setgid(gid).
        (In the labs, you will use setresuid() and setresgid())
      Non-superuser processes can't change their UID (to first approx)
      UID/GID often initially set by login, from /etc/passwd
        (finds user's UID based on /etc/passwd,
         finds user's groups based on /etc/group)
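      login runs as root. After checking the password against /etc/passwd
        (using hashing, salt, etc.; modern systems keep the hashes in
        /etc/shadow), it:
        --calls setgroups(), setgid(), setuid() [thereby dropping
        privileges; setuid() goes last, because once the UID is no
        longer 0 the process cannot change its groups] and then
        --runs user's shell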
      UID inherited during fork(), exec().
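
      A minimal sketch of the privilege drop that a login-like program
      performs, assuming Linux and Python's os module. drop_privileges is
      a hypothetical helper, not code from login or from the lab. It only
      succeeds when the caller is the superuser.

```python
import os


def drop_privileges(uid, gid, groups=()):
    """Permanently switch the calling process to uid/gid.

    Order matters: supplementary groups and GID are changed while the
    process is still privileged, and the UID is changed last.
    """
    os.setgroups(list(groups))      # replace supplementary groups
    os.setresgid(gid, gid, gid)     # real, effective, saved GID
    os.setresuid(uid, uid, uid)     # real, effective, saved UID
    # Verify the drop took effect before running untrusted code.
    if os.getresuid() != (uid, uid, uid) or os.getresgid() != (gid, gid, gid):
        raise RuntimeError("privilege drop did not take effect")
```

      Setting all three IDs (real, effective, saved) is what makes the
      drop permanent: no privileged ID is left for the process to switch
      back to. An unprivileged caller gets PermissionError from the first
      call.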

    Files, directories.
      File operations: read, write, execute, change perms, ..
      Directory operations: lookup, create, remove, rename, change perms, ..
      Each inode (logical file) has an owner user and group.
      Each inode has read, write, execute perms for user, group, others.
        E.g. "george staff rwxr-x---"
      Who can change a file's permissions?  Only its owner (process UID
        matches the inode's owner) or the superuser.
      Execute for directory means being able to look up names (but not
        list them; ls needs r).
      Checks for process opening file /etc/passwd:
        Must be able to look up 'etc' in /, 'passwd' in /etc 
            (x permission on the directories).
        Must be able to open /etc/passwd (r or w permission on the file).

      Unix rwx scheme is simple but not very expressive;
        cannot e.g. have two owners, or permissions for specific users.
      
        Suppose you want a file readable only by users who are in both
            group1 and group2. Is it possible to implement this in Unix?

      So: can control which processes (UIDs) can access a specific file.
        But hard to control the set of files a specific process can access.

      **Useful tools:
         chmod, chown

         chmod: changes permissions. octal masks.
         chown: changes user owner (and optionally group)
         chgrp: changes group owner
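
         As an illustration of the octal masks, here is a small sketch
         that sets the "rwxr-x---" pattern from the example above on a
         temporary file and reads it back. chmod_demo is a made-up name;
         the calls are from Python's os, stat, and tempfile modules.

```python
import os
import stat
import tempfile


def chmod_demo():
    """Set mode 0750 on a scratch file and return how it reads back."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # 0o750: owner rwx (7), group r-x (5), others --- (0).
        os.chmod(path, 0o750)
        mode = os.stat(path).st_mode
        return stat.filemode(mode), oct(stat.S_IMODE(mode))
    finally:
        os.unlink(path)
```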

    File descriptors (FDs).
      A process has one FD per open file and open IPC/network connection.
      File access control checks performed at file open.
        Once process has an open file descriptor, can continue accessing.
      Processes cannot see or interfere with each others' FDs.
      Processes can pass file descriptors (via Unix domain sockets).
      So: FDs are well isolated -- process-local names, not global.
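
      A short sketch of "checks performed at file open": the process
      below keeps reading through an FD it already holds even after every
      permission bit on the file has been removed. read_after_revoke is a
      hypothetical name.

```python
import os
import tempfile


def read_after_revoke():
    """Open a file, strip its permissions, and read via the open FD."""
    fd, path = tempfile.mkstemp()       # opened read/write by this process
    try:
        os.write(fd, b"secret")
        os.chmod(path, 0)               # no one may open the file now
        os.lseek(fd, 0, os.SEEK_SET)
        # The permission check happened at open time, so this read is
        # not checked against the file's current mode.
        return os.read(fd, 100)
    finally:
        os.close(fd)
        os.unlink(path)
```

      This is why handing a process an already-open FD works as a grant
      of access: the holder needs no permission on the underlying name.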

    Local IPC -- "Unix domain sockets" -- socketpair().
      OKWS uses these for most of its inter-server communication.
      As used by OKWS, they have no names.
      A process can create a connection -- gets two FDs.
      It can then give the connection end FDs to other processes,
        either via fork()/exec() or by sending over existing connections.
      So: Unix domain connections are well isolated.
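
      A minimal sketch of FD passing over an unnamed socketpair, assuming
      Python 3.9 or later (for socket.send_fds and socket.recv_fds). For
      brevity both ends live in one process here; in OKWS-style use the
      two ends would be held by different processes. pass_fd_demo is a
      made-up name.

```python
import os
import socket


def pass_fd_demo():
    """Send the read end of a pipe across a socketpair and use it."""
    a, b = socket.socketpair()          # unnamed Unix domain connection
    r, w = os.pipe()
    try:
        # The FD travels as ancillary data next to a small message.
        socket.send_fds(a, [b"fd"], [r])
        msg, fds, _flags, _addr = socket.recv_fds(b, 16, 1)
        received = fds[0]               # a new FD for the same open pipe
        os.write(w, b"hello")
        data = os.read(received, 100)
        os.close(received)
        return msg, data, received != r
    finally:
        os.close(r)
        os.close(w)
        a.close()
        b.close()
```

      The receiver gets its own descriptor number for the same underlying
      object, which fits the point above that FDs are process-local
      names.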

    Networking.
      Operations:
        bind to a port
        connect to some address
        read/write a connection
        send/receive raw packets

      Rules:
        - only root (UID 0) can bind to ports below 1024;
          (e.g., arbitrary user cannot run a web server on port 80.)
        - any process can connect to any port as a client.
        - can only read/write data on connection that a process has an fd for.
          (not really true; bad people may snoop/inject on network)
          (So: servers have to be careful who they talk to.)
        - only root can send/receive raw packets.
      Additionally, firewall (possibly running on server itself)
         imposes its own checks, unrelated to processes.
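
      A small sketch of the unprivileged side of these rules: any process
      may bind a port the kernel picks for it and connect to it as a
      client. loopback_roundtrip is a hypothetical name, and the example
      assumes the loopback interface is available.

```python
import socket


def loopback_roundtrip():
    """Bind an unprivileged port, connect to it, and move one message."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Port 0 asks the kernel for a free unprivileged port. Binding
        # to 80 here would normally fail for a non-root process.
        srv.bind(("127.0.0.1", 0))
        srv.listen(1)
        port = srv.getsockname()[1]
        cli = socket.create_connection(("127.0.0.1", port))
        conn, _peer = srv.accept()
        try:
            cli.sendall(b"ping")
            data = conn.recv(16)        # readable only via this FD
        finally:
            cli.close()
            conn.close()
    finally:
        srv.close()
    return port, data
```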

    One more Unix isolation trick: chroot()
      Problem: it is too hard to ensure that there are no
        sensitive files that a program can read, or write;
        100,000+ files in a typical Unix install; applications
        are often careless about setting permissions.
      Solution: chroot(dirname)
        causes / to refer to dirname for this process and descendants,
        so they can't name files outside of dirname.
      e.g. chroot("/var/okws/run") causes subsequent absolute pathnames
        to start at /var/okws/run, not the real /.
        Thus the program can only name files/dirs under /var/okws/run.
      chroot() is typically used to prevent a process from interacting
        at all with other processes via files, i.e. complete isolation.
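
      A minimal sketch of entering a chroot jail, assuming a Unix system.
      enter_jail and listing_inside_jail are hypothetical helpers. The
      chroot call needs superuser privilege, so an unprivileged run
      reports "EPERM" instead of a listing.

```python
import os


def enter_jail(dirname):
    """Confine the calling process's file name space to dirname."""
    os.chroot(dirname)   # "/" now refers to dirname
    os.chdir("/")        # chroot does not move the working directory
    # A process that stays superuser can escape the jail, so a real
    # program would drop to an unprivileged UID right after this.


def listing_inside_jail(dirname):
    """Fork a child, jail it in dirname, and return what it sees in "/"."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(r)
            try:
                enter_jail(dirname)
                out = ",".join(sorted(os.listdir("/")))
            except PermissionError:
                out = "EPERM"           # caller was not superuser
            os.write(w, out.encode())
        finally:
            os._exit(0)
    os.close(w)
    chunks = []
    while True:
        chunk = os.read(r, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(r)
    os.waitpid(pid, 0)
    return b"".join(chunks).decode()
```

      The fork keeps the confinement away from the calling process,
      which matters because chroot cannot be undone from inside the jail
      by an unprivileged process.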

    Overall, Unix is awkward at precisely-controlled isolation+sharing:
      Many global name spaces: files, UIDs, PIDs, ports.
        Each may allow processes to see what others are up to.
        Each is an invitation for bugs or careless set-up.
      No idea of "default to no access".
        Thus hard for designer to reason about what a process can do.
      No fine-grained grants of privilege.
        Can't say "process can read only these three files."
        Privileges are coarse-grained, via UID, or implicit, e.g. wait() for children.
      Chroot() and setuid() can only be used by superuser.
        So non-superusers can't reduce/limit their own privilege.
        Awkward since security suggests *not* running as superuser.

    Why is it a security vulnerability if chroot is setuid root?
    I.e., what goes wrong if user processes can "confine" themselves?
      attack:
        --attacker sets up jail's directory /tmp/dir
        --within /tmp/dir, hard link to 
          passwd, su, login programs (not many restrictions placed on
          hard linking):
            /tmp/dir/sbin/passwd
            /tmp/dir/sbin/login
            etc.
        --create fake /tmp/dir/etc/passwd 
        --chroot() into /tmp/dir
        --the binaries are hard-coded to look at /etc/passwd. when they
        run in the jail, they will be looking at the wrong version
        --yet, they will have privilege (they are setuid)
        --result: they will apply their privilege to the wrong
        environment, and allow the attacker to, say, login as root...

3. Admin and lab notes

    admin:
        no class Thursday
        makeup on Friday?


    lab notes:

        make sure you understand the source code; do some code reading
        every day, to get familiar with it.
       
                         (exec)
        zookd --> zookfs ------> zoobar/index.cgi --> __init__.py :
                                                      registers
                                                           /users
                                                           /transfer
                                                           /login
                                                           etc.

        (CGI: mechanism for passing data between Web connections and
        normal OS processes.)

    result:

        browser visits
            http://localhost:8080/zoobar/index.cgi/transfer
        and
            a python script is running

        Question: what userid and groupid will that python script be
        running as?
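
        One way to check the answer empirically is to have a script
        report its own IDs. The sketch below is not part of the lab code;
        identity_report is a made-up helper that builds a plain-text CGI
        response from Python's os module.

```python
import os


def identity_report():
    """Build a CGI-style response describing this process's IDs."""
    ruid, euid, suid = os.getresuid()
    rgid, egid, sgid = os.getresgid()
    body = (
        f"uid={ruid} euid={euid} suid={suid}\n"
        f"gid={rgid} egid={egid} sgid={sgid}\n"
        f"groups={sorted(os.getgroups())}\n"
    )
    # A CGI response is headers, a blank line, then the body.
    return "Content-Type: text/plain\r\n\r\n" + body


if __name__ == "__main__":
    print(identity_report(), end="")
```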


4. OKWS

    --Background:
        thespark, Max, NYU, performance, ...
        OKWS was done at NYU (in what's now Prof. Jinyang Li's office!)
        Still running at okcupid...

    --Goal is least privilege

    --Fundamental technical problem:

        Unix makes it tricky to reduce privileges (chroot, UIDs, ..)

        Applications need to share state in complicated ways.

        Unix and SQL databases don't have fine-grained sharing control mechanisms.
       
    --How does OKWS partition the web server?

      Figure 1 in paper.

      How does a request flow in this web server?

        okld starts all other processes, from a config file.

        okd -> oklogd
            -> pubd
            -> svc -> dbproxy
                   -> oklogd

      How does this design map onto physical machines?

        Several DB machines (dbproxy, DB)

        A few machines for okld, okd, services (not many machines needed)

        Separate servers for static content (OKWS is solving problems
          that arise when content is dynamic)
    
           contrast to lab3: there, a single system handles static and
           dynamic content; in part A of the lab, you will have static
           and dynamic content handled in different _processes_.

   --Why is this a privilege separation arrangement?

      Most bugs will be in svc code.

      So think "attacker has injected code into a svc; what can attacker
      do now?"

      High-level picture: OKWS isolates each svc so it can access only
      relevant data.

        E.g. buffer overflow in e-mail service won't let attacker see passwords.


   --How do the components interact?

      okld sets up socketpairs (Unix domain sockets aka bidirectional
      pipes) for each service.

        One socketpair for control RPC requests (e.g., "get a new log socketpair").

        One socketpair for logging (okld has to get it from oklogd first via RPC).

        For HTTP services: one socketpair for forwarding HTTP connections.

        For okd: the server-side FDs for HTTP services' socketpairs (HTTP+RPC).

      Services talk to DB proxy over TCP (connect by port number).

        Most state in DB, most interaction via state in DB.

      okd listens on a separate socket for control requests (repub, relaunch).
        Seems to be port 11277 in Figure 1, but a Unix domain socket in OKWS code.
        For repub, okd talks to pubd to generate new templates,
          then sends generated templates to each service via RPC control channel.

    --What does it take for okld to launch a service?
        --Create socket pairs
        --Get new socket that is connected to oklogd
        --fork, setuid/setgid, exec the service
        --Pass control sockets to okd
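
    The launch steps above can be sketched roughly as follows. This is
    not OKWS code: launch_service and the SVC_FD environment variable are
    made-up conventions, only one socketpair is created, and the
    UID/GID switch is optional so that the sketch also runs without
    superuser privilege.

```python
import os
import socket
import sys


def launch_service(argv, uid=None, gid=None):
    """Fork and exec argv, handing it one end of a fresh socketpair.

    Returns (child_pid, parent_end). The child finds its descriptor
    number in the SVC_FD environment variable.
    """
    parent_end, child_end = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        try:
            parent_end.close()
            fd = child_end.fileno()
            os.set_inheritable(fd, True)      # keep the FD across exec
            if gid is not None:
                os.setgroups([])              # groups and GID first
                os.setresgid(gid, gid, gid)
            if uid is not None:
                os.setresuid(uid, uid, uid)   # UID last
            env = dict(os.environ, SVC_FD=str(fd))
            os.execve(argv[0], argv, env)
        finally:
            os._exit(127)   # reached only if setup or exec failed
    child_end.close()
    return pid, parent_end


if __name__ == "__main__":
    # Hypothetical service: announce readiness on the inherited socket.
    code = ("import os, socket; "
            "s = socket.socket(fileno=int(os.environ['SVC_FD'])); "
            "s.sendall(b'ready')")
    child, chan = launch_service([sys.executable, "-c", code])
    print(chan.recv(16))
    os.waitpid(child, 0)
```

    The launcher keeps parent_end; in the design described above it would
    pass such descriptors on to okd rather than use them itself.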

    Then:
        Components communicate via pipes (or rather, Unix domain socket
        pairs).

        File descriptor passing used to pass around HTTP connections.

    How does OKWS enforce isolation between components in Figure 1?

        [will cover this next time]

      * okld runs each service with a separate UID and GID.
        So services can't read/write each other's memory.

      * okld uses chroot to confine each process to a separate directory (almost).
        Services can't read/write *any* files (system files, app state, etc.).
        pubd and oklogd can only get at their own files.

      * Why is okld a separate process?
        Must run as superuser to bind to port 80, call chroot() and setuid().
        We want as little code as possible to run as superuser.

      * Why is okd a separate process?
        --We need a way to route HTTP requests to the right svc.
        --okd sees all requests, so we don't want to do anything else in okd.
        --note: okd does *not* run as superuser; it is given port 80 by okld
          (in the form of an already bound socket)

      * Why is oklogd a separate process?
        --We don't want corrupt svc to delete/overwrite log files.
        --More generally we don't want svcs to have any access to files
            (too bug-prone).
        --So all file accesses go through a separate process
        [explain result of attacked svc:
           overwrite (ruled out) versus append-with-noise (possible)]

      * Why is pubd a separate process?
        --Keeps file handling code out of svcs.

      * Why are database proxies separate? Why not let svcs talk to the DB?

        Force communication through narrow interface: RPC
            [idea: "RPC communication channel is less error-prone than
            HTTP messaging."]

        Ensure that each service cannot fetch wrong data, if it is compromised.

          DB proxy protocol defined by app developer, depending on what
          app requires.

          Proxy enforces overall query structure (select, update), but
          allows client to fill in query parameters.

        Where does the 20-byte token come from?  Passed as arguments to service.

        Who checks the token?  DB proxy has list of tokens (& allowed queries?)

        Who generates token?  Not clear; manual by system administrator?

        What if token disclosed?
          e.g. compromised chat svc could issue queries as e-mail service.
            and read/write any user's e-mail.
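
        A minimal sketch of the proxy idea, using Python's sqlite3 and an
        in-memory database as a stand-in for the real DB. Everything here
        is hypothetical: the DbProxy class, the table layout, the query
        names, and the two 20-byte tokens. It is not the OKWS proxy
        protocol, only an illustration of "the proxy fixes the query
        structure, the client fills in parameters."

```python
import hmac
import sqlite3

MAIL_TOKEN = b"M" * 20   # made-up 20-byte tokens, one per service
CHAT_TOKEN = b"C" * 20


class DbProxy:
    def __init__(self, db, acl):
        self.db = db
        self.acl = acl   # token -> {query name -> fixed SQL template}

    def call(self, token, query_name, params=()):
        """Run a named query if this token is allowed to issue it."""
        for known, queries in self.acl.items():
            if hmac.compare_digest(known, token):
                if query_name not in queries:
                    raise PermissionError("query not allowed for token")
                # The SQL text is fixed by the proxy; the caller only
                # supplies values for the "?" placeholders.
                return self.db.execute(queries[query_name], params).fetchall()
        raise PermissionError("unknown token")


def make_demo_proxy():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE mail (owner TEXT, body TEXT)")
    db.execute("CREATE TABLE chat (room TEXT, line TEXT)")
    db.execute("INSERT INTO mail VALUES ('alice', 'hi')")
    db.execute("INSERT INTO chat VALUES ('lobby', 'hello')")
    acl = {
        MAIL_TOKEN: {"get_mail": "SELECT body FROM mail WHERE owner = ?"},
        CHAT_TOKEN: {"get_lines": "SELECT line FROM chat WHERE room = ?"},
    }
    return DbProxy(db, acl)
```

        In this sketch the token is the only credential, so a service
        that learns another service's token can issue that service's
        queries, which is the disclosure risk noted above.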

---------------------------------------------------------------------------

Further reading on privilege separation:

    "Make Least Privilege a Right Not a Privilege",
        Krohn et al., HotOS 2005
        [picks up where OKWS left off. uses the hoops that OKWS has to
        jump through to motivate new OSes; see research OSes below]
        https://www.usenix.org/legacy/event/hotos05/final_papers/full_papers/krohn/krohn.pdf

    "Preventing Privilege Escalation",
        Provos, Friedl, Honeyman on Privilege-Separated OpenSSH,
        Proc. Usenix Security, 2003
        [This is a classic; it's how openssh is implemented. It also
        explains the mechanism of file descriptor passing, and relates
        it to capabilities. This paper is related to OKWS.]
        http://www.peter.honeyman.org/u/provos/papers/privsep.pdf

    "Privman: A Library for Partitioning Applications", 
        Kilpatrick, Usenix Technical, 2003
        [The name says it all.]
        https://www.usenix.org/legacy/events/usenix03/tech/freenix03/full_papers/kilpatrick/kilpatrick.pdf

    "Wedge: Splitting Applications into Reduced-Privilege Compartments",
        A. Bittau, P. Marchenko, M. Handley, and B. Karp,
        Proc. NSDI, 2008
        [A great read that works through the mechanics of fine-grained
        partitioning]
        https://www.usenix.org/legacy/event/nsdi08/tech/full_papers/bittau/bittau.pdf

    Research OSes that incorporate DIFC (Decentralized Information Flow
    Control):

        Asbestos
            "Labels and Events in the Asbestos Operating System",
                Efstathopoulos et al., SOSP 2005

        HiStar
            "Making Information Flow Explicit in HiStar",
                Zeldovich et al., OSDI 2006

        Flume
            "Information Flow Control for Standard OS Abstractions",
                Krohn et al., SOSP 2007

        One of the key motivating examples for these research OSes was
        OKWS, namely that in order to use Unix in a least-privileged
        way, one has to jump through hoops. That motivated the search
        for better, simpler primitives.

        This in turn led to a research craze on DIFC (lasting from
        2005-2009), which revisited ideas from the 1970s surrounding
        building hardened operating systems for defense applications. 

---------------------------------------------------------------------------

Acknowledgment: MIT's 6.858 staff