Class 12
CS 480-008
10 March 2016

On the board
------------

1. Last time

2. Access control
    --intro, review
    --DAC
    --MAC
    --capabilities (!= Linux capabilities)

3. admin notes

4. Capsicum
    --Description
    --Using Capsicum
    --Discussion

---------------------------------------------------------------------------

1. Last time

    --setuid, SFI

    --Note that extensions can still be wrecked by, say, buffer
    overflow. there are techniques that protect against that (CFI,
    XFI, etc.)

    --So SFI cannot enforce the integrity of the originally intended
    control flow

    --What about an infinite loop? (The authors discuss)

2. Access control

    Reminder: we want access control so that we can build isolation
    boundaries, while still letting programs do what they need to do.

    example: OKWS

    other examples:

        Programs that deal with network input:
            Parsing code should not have much privilege -- bugs.
            But rest of app may need to read and write various files.

        Programs that manipulate potentially untrusted file content:
            (gzip, media codecs, etc.)
            Restrict access privileges of e.g. image manipulation code.
            Allow rest of application to read and write user's files.
            [don't want bug in image manipulation to give control over
            the user's files.]

        Untrusted software downloaded from the network
            e.g. JavaScript and extensions in browsers
            Needs to talk to main browser to display things
            Better not read/write my files -- even though browser can

    Recall:
        Approach 1: run the code on another machine
        Approach 2: run the untrusted code inside a virtual machine.
        Approach 3: run the untrusted code in a separate process.
        Approach 4: Approach 3 + OS access control techniques
        Approach 5: SFI and binary sandboxing

    Today: continue with Approach 4; describe alternatives for three
    "sub-approaches"

    A. Alternative 1: DAC

        Each object has a set of permissions (an access control list).
            E.g., Unix files, with rwx permission bits.
            "Discretionary" means applications set permissions on objects (e.g., chmod).
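        [sketch, not from the lecture: the "discretionary" part can be
        seen by having a program change the rwx bits on a file it
        owns. The helper name set_and_read_mode and the scratch file
        are made up for illustration; os.chmod, os.stat, and
        stat.filemode are standard Python calls.]

```python
import os
import stat
import tempfile


def set_and_read_mode(path, mode):
    """chmod the file, then return its permission bits two ways:
    as an octal string and as the familiar rwx string."""
    os.chmod(path, mode)
    st_mode = os.stat(path).st_mode
    return oct(stat.S_IMODE(st_mode)), stat.filemode(st_mode)


# Hypothetical scratch file standing in for "an object owned by the user".
with tempfile.NamedTemporaryFile(delete=False) as f:
    demo_path = f.name

# The owning program decides the policy itself: first owner-only,
# then world-readable. Nothing outside the program had to approve this.
owner_only = set_and_read_mode(demo_path, 0o600)
shared_read = set_and_read_mode(demo_path, 0o644)

os.unlink(demo_path)
```

        [the same freedom is what makes DAC hard to rely on for
        sandboxing: any code running as the owner can loosen the
        bits.]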
        Each program runs with privileges of some principals.
            E.g., Unix user/group IDs.
        When program accesses an object, check the program's privileges to decide.
            "Ambient privilege": process's privileges used implicitly for each access.

                  Name      Process privileges
                    |               |
                    V               V
                  Object -> Permissions -> Allow?

        DAC is well-suited to time-sharing, where users own their own
        files, sometimes need to share, and programs unambiguously
        execute as a single specific user.

        What if code might be malicious, or exploited via buffer overflow?
            Don't want to run it with my full permissions!

        What if program acts as multiple principals, e.g. web server that
        uses its own files as well as fetching files for browsers?

        Problem: only root can create new principals, on most DAC systems.
            E.g., Unix, Windows.
        Problem: ambient authority makes it too hard to constrain malicious/buggy code.
            Too easy for some files to have the wrong permissions.
        Problem: ambient authority makes confused deputies all too likely.
        Problem: some objects might not have a clear configurable access control list.
            Unix: processes, network, ...
            (no real way in Unix to prevent processes from accessing the
            network. firewall rules are not specific to principals.)

    B. Alternative 2: MAC

        MAC enforces a set of policies (== rules) on an application.
        Rules set up by application writer, or administrator, or user.
        O/S enforces policy; program can't change it.
        [lots of versions of MAC; we're describing the abstract picture here.]

        Example policies:
            Can only access specified files, no others.
            Can access any file except specified files/directories.
            Cannot use the network.
            Can connect to host X over the network.

        The goal of MAC is to make it much harder for applications to
        make mistakes about what files etc. they access, or to be
        tricked into making mistakes.

        "Mandatory" in the sense that applications can't change this policy.
        (by contrast, in DAC, security policy is set by applications
        themselves (chmod, etc).)

                  Name      Operation + caller process
                    |                |
                    V                V
                  Object --------> Allow?
                                     ^
                                     |
                  Policy ------------+

        MAC usually implemented by intercepting every system call
        (policy applied in a _reference monitor_.)
        Policies keyed by system call name and arguments.
        Each application has a policy file.
            Supplied by application vendor if more or less trusted.
            Supplied by admin or user otherwise.

        Example: Mac OS X sandbox ("Seatbelt").

        Pro: any user can sandbox an arbitrary piece of code, finally!
        Pro: can be applied to existing applications (with some work).
        Pro: can be very restrictive, yet allow precise sharing.
        Con: some operations can only be filtered at coarse grain
            E.g. shared memory can be only allowed or forbidden; not specific sharing.
        Con: can be difficult to determine security impact of syscall based on args.
            What does a pathname refer to? Symlinks, hard links, race conditions, ..
        Con: programmer must separately write the policy + application code.
        Con: static -- not a programming tool.
            Hard for program itself to use to set up sandboxes with
            dynamically-determined privileges.

        Is it a good idea to separate policy from application code?
            Depends on overall goal.
            Good if user/admin wants to look at or change policy.
            Awkward if app developer needs to maintain both code and policy.
            For app developers, might help clarify policy.

    C. Alternative 3: capabilities

        Different plan for access control: capabilities.
        If process has a handle for some object ("capability"), can access it.

            Capability --> Object

        Useful mental model: file descriptor that a process inherits.

        Characteristics of capability systems:
            The only access logic is "does the process have the capability".
            There is no ambient authority, and thus no global name spaces.
            All resources are uniformly accessible via capabilities.
            Capabilities can't be forged.
            A process can give a capability to another process.
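            [sketch, not from the lecture: "giving a capability to
            another process" can be pictured with an ordinary Unix
            file descriptor sent over a Unix-domain socket. The
            function name pass_fd_to_child is made up for
            illustration; socket.send_fds and socket.recv_fds are
            standard in Python 3.9+ on Unix. The parent opens the
            file only after fork(), so the child never inherits it
            and never learns a name for it; its only access is the
            descriptor it is handed.]

```python
import os
import socket
import tempfile


def pass_fd_to_child(data: bytes) -> bytes:
    """Fork a child, hand it one open file descriptor over a socket,
    and return whatever the child read through that descriptor."""
    parent_sock, child_sock = socket.socketpair()  # AF_UNIX stream pair
    pid = os.fork()

    if pid == 0:
        # Child: holds no handle on the file yet, and knows no path to it.
        parent_sock.close()
        status = 1
        try:
            _msg, fds, _flags, _addr = socket.recv_fds(child_sock, 1024, 1)
            with os.fdopen(fds[0], "rb") as f:
                child_sock.sendall(f.read())
            status = 0
        finally:
            os._exit(status)  # closes child_sock, so the parent sees EOF

    # Parent: create the object, then delegate access to it explicitly.
    child_sock.close()
    with tempfile.TemporaryFile() as secret:
        secret.write(data)
        secret.flush()
        secret.seek(0)
        socket.send_fds(parent_sock, [b"fd"], [secret.fileno()])

        reply = b""
        while chunk := parent_sock.recv(4096):
            reply += chunk

    os.waitpid(pid, 0)
    parent_sock.close()
    return reply
```

            [calling pass_fd_to_child(b"some bytes") should return
            the same bytes, read by the child through the received
            descriptor. Plain Unix does not stop the child from
            opening other files by name; that missing restriction is
            what capability mode adds.]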
            Holding a capability automatically grants access to corresponding object.

        Why is this attractive?
            A sandbox can be set up with exactly the capabilities it needs.
                Including any capabilities needed to share with other processes.
            No capability -> no access.
            There is only one access/permission scheme for all kinds of objects.
            No ambient authority, so no confused deputy.

        Unix file descriptors are a form of capability.
            An FD refers to a specific file/socket/etc. (not a name, which might change).
            Holding a writable FD to a file allows process to write it, even
            if permissions have changed.
            FDs can't be forged.
            FDs can be passed to other processes (inherited by fork, sent by sendmsg).

        Why aren't Unix FDs enough for sandboxing?
            Unix FDs allow operations like fchmod() that must be protected.
            Some Unix resources aren't addressed via FDs, e.g. processes.
            Many Unix system calls don't involve FDs, but must be protected.

    D. Alternative 4: pure capability-based OS (KeyKOS, etc.)

        Capabilities are really an idea for a totally different O/S design!
        Some O/S's have *only* capabilities (e.g. KeyKOS); interesting but hard.
            Message-passing channels (very much like file descriptors) are capabilities.
            Every application has to be written in a capability style.
        Capsicum claims to be more pragmatic: some applications need not be changed.

    E. NOTE: Linux capabilities are solving a different problem.

        Trying to partition root's privileges into finer-grained privileges.
        Represented by various capabilities: CAP_KILL, CAP_SETUID, CAP_SYS_CHROOT, ..
        Process can run with a specific capability instead of all of root's privs.
        This is not the same thing as what we meant by "capability" above.
        Ref: capabilities(7), http://linux.die.net/man/7/capabilities

3. admin notes

    different definitions of "sandbox". "OS sandboxing" is what we're
    talking about today.

    lab 3c is relatively straightforward.... so you can take advantage
    of the stopped late hour clock to have a break...

4. Capsicum

    A. Description

        Capsicum adds capabilities to a name+DAC system.

        A process can be in normal mode, or in "capability mode".
            cap_enter() call switches to capability mode.
            Cannot exit capability mode!
            All children/descendants inherit capability mode.
            [analogy: fork()ing and then dropping privileges is like
            pdfork()ing and then calling cap_enter.]

        In capability mode:
            Access *only* allowed via capabilities.

        Capability is a kind of file descriptor, with some flags indicating
        allowed ops (read, write, seek, etc.).
            [why aren't file permissions enough?]
            [answer: this permits the ability to give some processes read
            access and some write access, etc., without using the
            mechanism of groups, which would be unwieldy or too
            coarse-grained]

        Lots of new system calls to allow access via capabilities.
            openat(fd, name, ...)
            unlinkat(fd, name, ...)
            fd = pdfork(); pdwait(fd); pdkill(fd);
                thus ability to pdkill() can be restricted, given away

        No root directory or current directory.

        General Capsicum philosophy: no global namespaces.
            Why are the authors so fascinated with eliminating global namespaces?
            Global namespaces require some access control story (e.g., ambient privs).
            Hard to control access to objects in global namespaces.

        In capability mode, can only use file descriptors -- no global namespaces.
            Cannot open files by full path name: no need for chroot as in OKWS.
            Can still open files by relative path name, given fd for dir (openat).

        Cannot use ".." in path names or in symlinks: why not?
            In principle, ".." might be fine, as long as ".." doesn't go too far.
            Hard to enforce correctly.
            Hypothetical design:
                Prohibit looking up ".." at the root capability.
                No more ".." than non-".." components in path name, ignoring ".".
            Assume a process has capability C1 for /foo.
            Race condition, in a single process with 2 threads:
                T1: mkdir(C1, "a/b/c")
                T1: C2 = openat(C1, "a")
                T1: C3 = openat(C2, "b/c/../..")  ## should return a cap for /foo/a
                    Let openat() run until it's about to look up the first ".."
                T2: renameat(C1, "a/b/c", C1, "d")
                T1: Look up the first "..", which goes to "/foo"
                    Look up the second "..", which goes to "/"

        Do Unix permissions still apply?
            Yes -- can't access all files in dir just because you have a cap for dir.
            But intent is that sandbox shouldn't rely on Unix permissions.

        For file descriptors, add a wrapper object that stores allowed operations.

        Where does the kernel check capabilities?
            One function in kernel looks up fd numbers -- modified it to check caps.
            Also modified namei function, which looks up path names.
            Good practice: look for narrow interfaces, otherwise easy to miss checks.

        libcapsicum.
            Why do application developers need this library?
            Biggest functionality: starting a new process in a sandbox.

        fd lists.
            Mostly a convenient way to pass lots of file descriptors to child process.
            Name file descriptors by string instead of hard-coding an fd number.
            [also, partially helps deal with the fact that in Capsicum,
            delayed initialization doesn't work. This way: the parent
            creates all of the capabilities that are needed, entrusts them
            to a separate service or module, and then when the child needs
            a fd, it requests it from the service.]

        cap_enter() vs lch_start().
            What are the advantages of sandboxing using exec instead of cap_enter?
            Leftover data in memory: e.g., private keys in OpenSSL/OpenSSH.
            Leftover file descriptors that application forgot to close.
            Figure 7 in paper: tcpdump had privileges on stdin, stdout, stderr.
            Figure 10 in paper: dhclient had a raw socket, syslogd pipe, lease file.

        Advantages of Capsicum: any process can create a new sandbox.
            (Even a sandbox can create a sandbox.)
        Advantages: fine-grained control of access to resources (if they map to FDs).
            Files, network sockets, processes.
        Disadvantage: weak story for keeping track of access to persistent files.
        Disadvantage: prohibits global namespaces, requires writing code differently.

    B. Using Capsicum in applications

        General plan:
            Some setup in non-capability mode -- open needed directories, etc.
            Then switch to capability mode.
            From then on, application must use openat(), etc.
            -> applications need to be modified to intentionally constrain
               themselves with Capsicum, and to use capabilities.

        tcpdump.
            tcpdump snoops on LAN, parses packets w/ complex code, juicy target!
            needs superuser to open "pcap" [stands for "packet capture"; no
            relation to cap-abilities] -- then should have almost no privileges!
            2-line version (Figure 6): just cap_enter() after opening all FDs.
                Used procstat to look at resulting capabilities.
            8-line version (Figure 7): also restrict stdin/stdout/stderr.
                Why? E.g., avoid reading stderr log, changing terminal settings, ..
            the point: now tcpdump can't do *anything* other than read
            packets, compute, write stdout.

        dhclient.
            dhclient sends/receives "raw" network packets, and then
            configures network interfaces.
            so it needs to retain privilege.
            but it also parses DHCP packets from whoever, so risk of e.g.
            buffer overflow.
            So use privilege separation:
                fork()
                parent opens raw socket, cap_enter(), send/recv packets, notify child.
                child runs as root, waits for info from parent, configures
                network interface.

        gzip.
            compression/decompression code may have exploitable bugs;
            people often decompress files from untrusted sources.
            Fork/exec sandboxed child process, send it file capabilities over a pipe.
            Child in cap mode, has no other capabilities, thus can't see any files, etc.
            Substantial changes, mostly to marshal/unmarshal data for RPC: 409 LoC.
            Interesting bug: forgot to propagate compression level at first.

        Chromium.
            Want to render HTML, run JS, etc. in separate sandboxed processes.
            They talk back to main browser process.
            Already privilege-separated on other platforms (but not on FreeBSD).
            ~100 LoC to wrap file descriptors for sandboxed processes.

        QUESTION: how would you apply Capsicum to OKWS?

    C. Discussion

    (1) How does this avoid the Confused Deputy?

        No ambient privilege

    (2) Does Capsicum achieve its goals?

        How hard/easy is it to use?
            Using Capsicum in an application almost always requires app changes.
                To open files with openat(), etc.
            One exception: Unix pipeline apps (filters) that just operate on FDs.
            Suggested plan: sandbox and see what breaks.
                Might be subtle: gzip compression level bug.

        What are the security guarantees it provides?
            Guarantees provided to app developers: sandbox can operate only on open FDs.
            Implications depend on how app developer partitions application, FDs.
            User/admin doesn't get any direct guarantees from Capsicum.
                Unlike MAC schemes, where user/admin directly specifies policy.
            Guarantees assume no bugs in FreeBSD kernel (lots of code), and
            that the Capsicum developers caught all ways to access a
            resource without using FDs.

        What are the performance overheads? (CPU, memory)
            Minor overheads for accessing a file descriptor.
            Setting up a sandbox using fork/exec takes order of msecs, non-trivial.
            Privilege separation can require RPC / message-passing, perhaps noticeable.

        Adoption?
            In FreeBSD's kernel now, enabled by default (as of FreeBSD 10).
            A handful of applications have been modified to use Capsicum.
                dhclient, tcpdump, and a few more since the paper was written
                [ Ref: http://www.cl.cam.ac.uk/research/security/capsicum/freebsd.html ]
            Casper daemon to help applications perform non-capability operations.
                E.g., DNS lookups, look up entries in /etc/passwd, etc.
                [ Ref: http://people.freebsd.org/~pjd/pubs/Capsicum_and_Casper.pdf ]

    (3) What applications wouldn't be a good fit for Capsicum?

        Apps that need to handle human-oriented file/directory names.
            Names seem to require ambient authority, fit badly with
            capability's direct refs.
        Apps that need to control access to non-kernel-managed objects.
            E.g.: imagine a windowing system at user level, where each
            window is a construct of the user-level window server (X
            Windows works this way).
            Capsicum treats pipe to a user-level server (e.g., X server) as one cap.
            So access to user-level server is all or nothing.
        Apps that need to connect to specific TCP/UDP addresses/ports from sandbox.
            Capsicum works by only allowing operations on existing open FDs.
            Need some other mechanism to control what FDs can be opened.
            Possible solution: helper program can run outside of capability
            mode, open TCP/UDP sockets for sandboxed programs based on policy.

    (4) Compare OS mechanisms to non-OS mechanisms for sandboxing

        OS mechanisms for access control are great if privilege
        boundaries can be made to align well with OS objects (process
        boundaries, UIDs, etc.), and if file granularity matches desired
        granularity of permissions. But that's not always how things work.

        The non-OS mechanisms (SFI, etc.) may be a better fit, depending
        on the application
            --Those allow for finer-grained control (for example, access
              to different objects)
            --And they make it possible to consider isolating in a way
              that is OS-independent

References:
    http://reverse.put.as/wp-content/uploads/2011/09/Apple-Sandbox-Guide-v1.0.pdf
    http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/prctl/seccomp_filter.txt;hb=HEAD
    http://en.wikipedia.org/wiki/Mandatory_Integrity_Control

------------------

Acknowledgment: MIT 6.858 staff