Class 8
CS 480-008
18 February 2016

On the board
------------
1. Last time
2. User authentication
   --the overall concern
   --passwords
   --criteria in the paper
   --schemes
   --discussion
3. Privilege separation and isolation
4. Unix's mechanisms for isolation and controlled sharing

---------------------------------------------------------------------------

1. Last time

Finished BROP
    clarify negative offset
    clarify why the buffer is small: the run from the ret_addr in
      input.read() to the end of random+RAN_LEN is not very long
Finished buffer overflow
Started user authentication and passwords
    Clarify cryptographic hash function: H: M --> D
        collision resistant: hard to find m1, m2 with m1 != m2 s.t. H(m1) = H(m2)
        pre-image resistant: given d, hard to find m s.t. H(m) = d
        second pre-image resistant: given m1, hard to find m2 != m1 s.t. H(m2) = H(m1)
    Clarify salt: why do you want it to be random?

2. passwords

A. [last time] The problem: user authentication
B. Passwords
C. Criteria in the paper
D. Schemes/alternatives
E. Discussion

A. The overall concern is _user authentication_
    --Underpinning of many security policies
    --Some interesting technical issues
    --Easy to do wrong on technical grounds
    --Also remains challenging on non-technical grounds, because security
      isn't just a technical problem

Authentication: who is the user?
    Challenging to know for sure
    User registers some secret --- but *who* registers it?
        At the scale of a university, we might be able to check the identity
          of the user when registering
    Typically settle for a weaker guarantee:
        Establish that the user who logs in has the secret chosen when registering
        If so, then assume it is the same user
        But we have no guarantee that we know the true identity of the user
        For many uses that is fine
            E.g., Amazon doesn't really care who you really are as long as you pay

Problem: how to authenticate users?
    Setting: user <-> computer <-> verifier server.
    Potential extra components might help authentication:
        A trusted third party.
        User's portable device (either dedicated or an app in a mobile phone).
        A proxy server.
    This paper proposes a number of criteria to evaluate authentication schemes.
        The proposed criteria are reasonable, though sometimes non-orthogonal,
          and not complete.
        Useful as a starting point to think about a new authentication scheme.

B. Passwords

* [last time] How to _store_?
* How to _transmit_?
* How to defend against guessing?
* What matters in password choice?
* Password recovery

--How to _transmit_ passwords?

    Poor idea: sending the password to the server in cleartext.
    Slightly better: send the password over an encrypted connection.
        Why is this still bad?
        --The connection may be intercepted.
        --Shared passwords mean that one server can use the password on
          another server.
    Strawman alternative: send a hash of the password, instead of the password.
        Not so great: the hash becomes a "password equivalent", and can still
          be resent.
    Better alternative: challenge-response scheme.
        User and server both know the password.
        Server sends a challenge R.
        User responds with H(R || password).
        Server checks if the response is H(R || password).
        If the server knew the password, the server is convinced the user
          knows the password (putting aside man-in-the-middle [MITM] attacks).
        If the server did not know the password, the server does not learn it.
        How to prevent the server from brute-force guessing the password
          based on the H() value?
            Expensive hash + salting (as discussed last time).
            Allow the client to choose some randomness too: guards against
              rainbow tables.
        To avoid storing the real password on the server, use a protocol
          like SRP.
            http://en.wikipedia.org/wiki/Secure_Remote_Password_protocol
            High-level idea: given a public parameter g (a generator), the
              client computes

                v = g^(hash(salt, password))

            and sends v and the salt to the server. The client and the
            server can then establish an ephemeral key using g and v.
            It is difficult for the attacker to compute discrete logarithms
            modulo N, so v does not reveal the password.
        Implementing challenge/response often means changing both the client
          and the server.

--How to defend against guessing?
    Guessing attacks are a problem because of the small key space.
        To get a sense, try Telepathwords
          (https://telepathwords.research.microsoft.com/)
            As you type each letter of a potential password, it tries to
              guess the next letter, using:
            -Common passwords (e.g., via leaks of password databases)
            -Popular phrases from web sites
            -Common user biases in selecting characters
                E.g., using adjacent keys for adjacent password characters
            see refs below
    Rate-limiting authentication attempts is important.
        Implement time-out periods after too many incorrect guesses.
        What to do after many failed authentication attempts?

--What matters in the user's password choice?

    --Many sites impose certain requirements on passwords (e.g., length, chars).
    --In reality, what matters is entropy.
    --Format requirements rarely translate into higher entropy.
        Defeats only the simplest dictionary attacks.
        Also has an unfortunate side effect of complicating password generation.
            E.g., no single password-generation algorithm satisfies every
              possible web site: conflicting length and symbol rules.
    --Password distributions' "key spaces" are quite small in practice [above].

--Password recovery.

    Important part of the overall security story.
        Recall the story with Sarah Palin's email account, etc.
    Think of this as yet another authentication mechanism.
    Composing authentication mechanisms is tricky: are both or either required?
        Recovery mechanisms are typically "either".
        Sometimes composing "both" is a good idea: token/paper + password/PIN, etc.

C. Criteria

In the reading, the authors propose a set of factors that can be used to
evaluate authentication schemes (the goal is to determine whether passwords
are as bad as they seem). The authors consider three high-level metrics:
(U)sability, (D)eployability, and (S)ecurity.

-Usability: How easy is it for users to interact with the authentication
 scheme?

    *Easy-to-Learn: "Users who don't know the scheme can figure it out and
     learn it without too much trouble."
        -This is a key reason why password schemes are so popular!

    *Infrequent errors: "The task that users must perform to log in usually
     succeeds when performed by a legitimate and honest user."
        -This is an important reason why users pick easy-to-guess passwords.

    *Scalable-for-Users: "Using the scheme for hundreds of accounts does not
     increase the burden on the user."
        -. . . explains why people often reuse passwords or create a simple
         per-site uniquifying scheme for a base password.

    *Easy recovery from loss of the authentication token
        -A win for passwords--they're easy to reset.

    *Nothing to carry
        -Another win for passwords.

-Deployability: How easy is it to incorporate the authentication method into
 real systems?

    *Server-Compatible: "At the verifier's end, the scheme is compatible
     with text-based passwords. Providers don't have to change their
     existing authentication setup to support the scheme."

    *Browser-Compatible: "Users don't have to change their client to support
     the scheme . . . schemes fail to provide this benefit if they require
     the installation of plugins or any kind of software whose installation
     requires administrative privileges."

    *Accessible: "Users who can use passwords are not prevented from using
     the scheme by disabilities or other physical (not cognitive) conditions."

    --Deployability is extremely difficult: it's hard to get users or
      servers to update en masse!
    --Passwords do well in this category by default, since the authors
      define "deployability" as how well a system integrates with current
      password infrastructure. However, passwords don't do very well in the
      next category . . .

-Security: What kinds of attacks can the authentication scheme prevent?

    *Resilient-to-Physical-Observation: "An attacker cannot impersonate a
     user after observing them authenticate one or more times. We grant
     Quasi-Resilient-to-Physical-Observation if the scheme could be broken
     only by repeating the observation more than, say, 10--20 times.
     Attacks include shoulder surfing, filming the keyboard, recording
     keystroke sounds, or thermal imaging of the keypad."
        -Passwords fail this test, since, e.g., they can be captured by
         filming the keyboard or recording keystroke sounds.

    *Resilient-to-Targeted-Impersonation: "It is not possible for an
     acquaintance (or skilled investigator) to impersonate a specific user
     by exploiting knowledge of personal details (birth date, names of
     relatives, etc.). Personal knowledge questions are the canonical scheme
     that fails on this point."
        -The authors say that passwords are "quasi-resistant" b/c they
         couldn't find any studies saying that your friends or acquaintances
         can easily guess your password.

    *Resilient-to-Throttled-Guessing: "An attacker whose rate of guessing is
     constrained by the verifier cannot successfully guess the secrets of a
     significant fraction of users . . . Lack of this benefit is meant to
     penalize schemes in which it is frequent for user-chosen secrets to be
     selected from a small and well-known subset."
        -Passwords fail because they have low entropy + skewed distributions.

    *Resilient-to-Unthrottled-Guessing: "An attacker whose rate of guessing
     is constrained only by available computing resources cannot
     successfully guess the secrets of a significant fraction of users. We
     might for example grant this benefit if an attacker capable of
     attempting up to 2^40 or even 2^64 guesses per account could still only
     reach fewer than 1% of accounts. Lack of this benefit is meant to
     penalize schemes where the space of credentials is not large enough to
     withstand brute force search from a small and well-known subset."
        -Passwords fail because they have low entropy + skewed distributions.
    *Resilient-to-Internal-Observation: "An attacker cannot impersonate a
     user by intercepting the user's input from inside the user's device
     (e.g., by keylogging malware) or eavesdropping on the cleartext
     communication between prover and verifier (we assume that the attacker
     can also defeat TLS if it is used, perhaps through the CA) . . . This
     penalizes schemes that are not replay-resistant, whether because they
     send a static response or because their dynamic response countermeasure
     can be cracked with a few observations. This benefit assumes that
     general-purpose devices like software-updatable personal computers and
     mobile phones may contain malware, but that hardware devices dedicated
     exclusively to the scheme can be made malware-free."
        -Passwords fail because they are static tokens: once you have one,
         you can use it until it expires or is revoked.

    *Resilient-to-Phishing: "An attacker who simulates a valid verifier
     (including by DNS manipulation) cannot collect credentials that can
     later be used to impersonate the user to the actual verifier. This
     penalizes schemes allowing phishers to get victims to authenticate to
     look-alike sites and later use the harvested credentials against the
     genuine sites."
        -Passwords fail: phishing attacks are very common!

    *No-Trusted-Third-Party: "The scheme does not rely on a trusted third
     party (other than the prover and the verifier) who could, upon being
     attacked or otherwise becoming untrustworthy, compromise the prover's
     security or privacy."
        -This property makes an important point: a lot of authentication
         problems would become easier if we could just trust one party to
         store passwords, run the password servers, etc. However, single
         points of failure are bad, since attackers can focus all of their
         energy on that point!

    *Resilient-to-Leaks-from-Other-Verifiers: "Nothing that a verifier could
     possibly leak can help an attacker impersonate the user to another
     verifier.
     This penalizes schemes where insider fraud at one provider, or a
     successful attack on one back-end, endangers the user's accounts at
     other sites."
        -This property is related to No-Trusted-Third-Party. To avoid a
         central point of failure, we'd like to introduce some notion of
         distributed authentication; however, does this mean that the system
         is only as strong as its weakest link? [Think back to HTTPS, and
         how a bad certificate authority can convince a browser to accept
         fake certificates for arbitrary sites. Security depends on the
         strength of the least secure CA!]
        -The authors say that passwords fail because people often reuse
         passwords across sites.

Goals mutually conflict:

    Memorywise-Effortless + Nothing-to-Carry.
    Memorywise-Effortless + Resilient-to-Theft.
        //Either the user remembers something, or
        //it can be stolen (except for biometrics).

    Server-Compatible + Resilient-to-Internal-Observation.
    Server-Compatible + Resilient-to-Leaks-from-Other-Verifiers.
        //Server-compatible means sending a password.
        //Passwords can be stolen on the user's machine, or
        //replayed by one server to another.

D. Schemes

D1. Biometrics: leverage the unique aspects of a person's physical
    appearance or behavior.

    -How big is the keyspace?
        Fingerprints: ~13.3 bits.
        Iris scan: ~19.9 bits.
        Voice recognition: ~11.7 bits.
        So, bits of entropy are roughly the same as passwords.
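As a quick sanity check on these numbers: for a secret chosen uniformly at
random from a key space of size K, the entropy is log2(K) bits. (This is
illustrative only; real user-chosen passwords are far more skewed than the
uniform bound suggests.)

```python
import math

def entropy_bits(keyspace_size):
    """Entropy in bits of a secret drawn uniformly from keyspace_size values."""
    return math.log2(keyspace_size)

# 8 truly random lowercase letters would give ~37.6 bits...
print(round(entropy_bits(26 ** 8), 1))
# ...but the ~13.3-bit fingerprint estimate corresponds to only about
# 10,000 distinct values -- comparable to real password distributions:
print(round(2 ** 13.3))
```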
Scorecard:

                            Passwords    Biometrics
    Easy-to-learn:          Yes          Yes
    Infrequent errors:      Quasi-yes    No
    Scalable for users:     No           Yes
    Easy recovery:          Yes          No
    Nothing to carry:       Yes          Yes
                            ---          ---
                            3.5          3

                            Passwords    Biometrics
    Server-compatible:      Yes          No
    Browser-compatible:     Yes          No
    Accessible:             Yes          Quasi-yes (entering biometrics is
                                                    error-prone)
                            ---          ---
                            3            0.5

                            Passwords    Biometrics
    Res-to-Phys-Obs:        No           Yes
    Res-to-Trgtd-Imp:       Quasi-yes    No (e.g., replaying voice
                                             recordings, lifting fingerprints
                                             from surfaces)
    Res-to-Thrtld-Guess:    No           Yes
    Res-to-UnThrtld-Guess:  No           No (key space isn't much bigger
                                             than that of passwords)
    Res-to-Internal-Obv:    No           No (captured biometric data can be
                                             replayed)
    Res-to-Phishing:        No           No
    No-trusted-3rd-Party:   Yes          Yes
    Res-Other-Ver-Leaks:    No           No (same biometrics are used by all
                                             verifiers)
                            ---          ---
                            1.5          3

    So, the final score is 8 vs 6.5. Of course, one could assign non-unity
    weights to each category, but the point is that it's not obvious that
    biometrics are "better" than passwords!

D2. CAP (Chip Authentication Program):

    -The CAP reader was designed by Mastercard to protect online banking
     transactions.
    -Usage:
        1) Put your credit card into the CAP reader (which looks like a
           hand-held calculator).
        2) Enter your PIN (bypassing keyloggers!).
        3) The reader talks to the card's embedded processor, and outputs an
           8-digit code which the user supplies to the web site.

                            CAP reader
    Easy-to-learn:          Yes
    Infrequent errors:      Quasi-yes
    Scalable for users:     No (users need one card+PIN per verifier)
    Easy recovery:          No
    Nothing to carry:       No
                            ---
                            1.5

                            CAP reader
    Server-compatible:      No
    Browser-compatible:     Yes
    Accessible:             No (blind people can't read the 8-digit code)
                            ---
                            1

                            CAP reader
    Res-to-Phys-Obs:        Yes \
    Res-to-Trgtd-Imp:       Yes  \__ One-time codes!
    Res-to-Thrtld-Guess:    Yes  /
    Res-to-UnThrtld-Guess:  Yes /
    Res-to-Internal-Obv:    Yes      Dedicated device
    Res-to-Phishing:        Yes      One-time codes
    No-trusted-3rd-Party:   Yes      Each site is its own verifier
    Res-Other-Ver-Leaks:    Yes      One-time codes
                            ---
                            8

    -So, passwords = 8 and CAP reader = 10.5.
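CAP's exact algorithm is proprietary, but its one-time codes belong to the
same family as HMAC-based one-time passwords. The sketch below follows RFC
4226 (HOTP), not CAP itself: a device-specific secret and a counter produce
a short numeric code that is useless for replay once the counter advances.

```python
import hashlib
import hmac
import struct

def one_time_code(secret: bytes, counter: int, digits: int = 8) -> str:
    """HMAC-based one-time code in the spirit of RFC 4226 (HOTP)."""
    msg = struct.pack(">Q", counter)                  # 8-byte big-endian counter
    mac = hmac.new(secret, msg, hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                           # RFC 4226 dynamic truncation
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

The verifier stores the same secret and counter, recomputes the code, and
advances the counter on success, so an observed code cannot be reused.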
    However, there are reasons why CAP readers haven't taken over the world
    (see the low usability and deployability scores).

    -In practice, deployability and usability are often more important than
     security.
        *Migration costs (coding+debugging effort, user training) make
         developers nervous!
        *The less usable a scheme is, the more users will complain (and try
         to pick easier authentication tokens that are more vulnerable to
         attackers).
    -Some situations may assign different weights to different evaluation
     metrics.
        *Ex: On a military base, the security benefits of a hardware-based
         token might outweigh the problems with usability and deployability.

Authors' conclusion: no scheme dominates passwords!

--Multi-factor authentication (MFA): defense in depth.
    --Requires users to authenticate themselves using two or more
      authentication mechanisms.
    --The mechanisms should involve different modalities!
        *Something you know (e.g., a password)
        *Something you possess (e.g., a cellphone, a hardware token)
        *Something you are (e.g., biometrics)
    --The idea is that an attacker must steal/subvert multiple
      authentication mechanisms to impersonate a user (e.g., an attacker
      might guess a password, but lack access to the user's phone).
    --Example: Google's two-factor authentication requires a password plus a
      cellphone which can receive authorization codes via text message.
    --MFA is a good idea, but empirical studies show that if users are given
      a second authentication factor in addition to passwords, users pick
      much weaker passwords!

E. Discussion

What other factors should we worry about in user authentication? [sec V.B]
    Continuous authentication, instead of only at session start.
    Migration cost from passwords / incentives for deployment (OpenID).
    Renewing credentials.
    Availability / DoS attacks.

Why aren't alternatives widely used?
    --No single answer.
    --But convenience explains a lot:
        --passwords are convenient...
        --...
          and in many scenarios, security isn't important enough to justify
          the switching cost: per-user cost on the server, on the user's
          end, software changes, etc.
    --Limited benefits of some alternative schemes.
    --Often hard for an individual user to improve his/her own security.
        Perhaps partially fixed with SSO, where users can choose a better IdP.

References:
    Full tech report:
        http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-817.pdf
    http://www.cl.cam.ac.uk/~jcb82/doc/B12-IEEESP-analyzing_70M_anonymized_passwords.pdf
    http://arstechnica.com/security/2013/10/how-the-bible-and-youtube-are-fueling-the-next-frontier-of-password-cracking/
    http://cynosureprime.blogspot.com/2015/09/how-we-cracked-millions-of-ashley.html

------------

3. Privilege separation and isolation

The problem is bugs.
    No one knows how to prevent programmers from making mistakes.
    Many bugs cause security problems.
    We need ways to limit the damage from as-yet-unknown bugs.

Example: traditional Web server setup (Apache)
    (a) Apache runs N identical processes, handling HTTP requests;
        all processes run as user 'www' (or 'httpd')
    (b) Each Apache process has all the application code:
        executes requests for many users;
        executes lots of different kinds of requests (log in, read e-mail, etc.);
        the process executes application code (PHP, for example)
    (c) Storage: a SQL database stores application state
          (passwords, cookies, messages, etc.)
        typically one connection with full access to the DB,
        so the entire app has access to the entire DB

In the event of bugs, this arrangement is very vulnerable:
    --if *any* component is compromised, the adversary gets all of the data
    --a buffer overflow / code injection gives access to all data
    --bugs in file handling may give access to sensitive files
        e.g., code might have open("/profiles/" + user)
        but now what if the attacker sets "user" to
          user=../etc/passwd or ../mail/george
    --SQL injection may let the attacker r/w all DB data
    --Attacks on Web *browsers* (cross-site scripting)
      [we will study these later in the semester]

Response: two big (related) ideas

    a. Privilege separation
        divide up the s/w and data to limit damage from bugs
        the designer must choose the separation scheme(s):
            by type of data (friend lists vs passwords)
            by user (my e-mail vs your e-mail)
            by bugginess (image resizing vs everything else)
            by exposure to direct attack (HTTP parsing vs everything else)
            by inherent privilege (hide superuser processes; hide the DB)

    b. Isolation
        construct walls between units of privilege separation
        to prevent exploits in one unit from spreading to others

These two ideas have been very successful. Examples:
    client/server systems
    virtual machines (each web site gets its own)
    SSH (sshd and the agent are separated)
    sandboxing
    Linux containers

Challenges:
    Separation vs sharing
    Separation vs performance
    Hard to use the O/S to enforce isolation and control sharing.
        [Need to use the OS carefully to set things up correctly]

4. Unix's mechanisms for isolation and controlled sharing
   [didn't cover in class but including it in the notes, for context.]

--Unix is the context for lab 3 and Tuesday's paper (OKWS)

--Unix actions are taken by processes. A process is a running program.
    Processes are the most basic Unix tool for keeping code/data separate.
    A process's user ID (UID) controls many of its privileges.
        A UID is a small integer.
        The superuser (UID=0) bypasses most checks.
    A process also has a set of group IDs (GIDs) used in file permissions.

--Sharing often depends on naming.
    If a process can name something, it can often access it.
    More important: if it *can't* name something, it usually *can't* use it.
    We can isolate a process by limiting what names it can use.
        (sounds simple, but it's a deep idea)
    So we want to know about the name spaces Unix provides:
        PIDs, UIDs, memory, files, file descriptors, network connections.

--What types of objects does Unix let processes manipulate? I.e.,
  what do we need to control to enforce isolation, yet allow precise sharing?

    Processes.
        Processes with the same UID can send signals to, wait for the exit
          of (& get status), and debug (ptrace) each other.
        Otherwise not much direct interaction is allowed.
        Debugging, sending signals: must have the same UID (almost).
            Various exceptions; this gets tricky in practice.
        Waiting / getting exit status: must be the parent of that process.
        So: processes are reasonably well isolated across different UIDs.

    Process memory.
        One process cannot directly name or access memory in another process.
        Exceptions: debug mechanisms (ptrace), memory-mapped files.
        So: process memory is reasonably well isolated across different UIDs.

    Files, directories.
        File operations: read, write, execute, change perms, ...
        Directory operations: lookup, create, remove, rename, change perms, ...
        Each inode has an owner user and group.
        Each inode has read, write, execute perms for user, group, others.
            E.g., "george staff rwxr-x---"
        Who can change a file's permissions? Only its owner (process UID).
        Execute for a directory means being able to look up names (but not ls).
        Checks for a process opening file /etc/passwd:
            Must be able to look up 'etc' in /, and 'passwd' in /etc
              (x permission).
            Must be able to open /etc/passwd (r or w permission).
        The Unix rwx scheme is simple but not very expressive; cannot, e.g.,
          have two owners, or permissions for specific users.
            Suppose you want a file readable to the intersection of group1
              and group2. Is it possible to implement this in Unix?
        So: can control which processes (UIDs) can access a specific file.
            But hard to control the set of files a specific process can access.

    File descriptors (FDs).
        A process has one FD per open file and per open IPC/network connection.
        File access-control checks are performed at file open.
            Once a process has an open file descriptor, it can continue
              accessing the file.
        Processes cannot see or interfere with each others' FDs.
        Processes can pass file descriptors (via Unix domain sockets).
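FD passing can be sketched with Python's socket module, which wraps the
underlying SCM_RIGHTS ancillary-data mechanism (socket.send_fds/recv_fds,
available since Python 3.9, Unix only). For brevity, both socket ends live
in one process here; normally the two ends would be held by different
processes after fork().

```python
import os
import socket

# A connected pair of Unix domain sockets (no name in the filesystem).
a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# Some FD worth sharing: the read end of a pipe with data in it.
r, w = os.pipe()
os.write(w, b"hello")

# Ship the read-end FD as ancillary data; the access check on the pipe
# happened when it was created, not when the FD is transferred.
socket.send_fds(a, [b"fd coming"], [r])
msg, fds, flags, addr = socket.recv_fds(b, 1024, maxfds=1)

# The receiver holds a *new* FD (a process-local name) for the same open pipe.
data = os.read(fds[0], 5)
print(msg.decode(), data.decode())   # prints: fd coming hello
```

This is the pattern OKWS relies on: a privileged process opens a resource,
then hands only that one capability-like FD to a less-privileged process.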
        So: FDs are well isolated -- process-local names, not global.

    Local IPC -- "Unix domain sockets" -- socketpair().
        OKWS uses these for most of its inter-server communication.
        As used by OKWS, they have no names.
        A process can create a connection -- it gets two FDs.
        It can then give the connection-end FDs to other processes, either
          via fork()/exec() or by sending them over existing connections.
        So: Unix domain connections are well isolated.

    Networking.
        Operations:
            bind to a port
            connect to some address
            read/write a connection
            send/receive raw packets
        Rules:
            - only root (UID 0) can bind to ports below 1024
              (e.g., an arbitrary user cannot run a web server on port 80).
            - any process can connect to any port as a client.
            - a process can only read/write data on a connection that it has
              an FD for.
              (not really true; bad people may snoop/inject on the network)
              (So: servers have to be careful who they talk to.)
            - only root can send/receive raw packets.
        Additionally, a firewall (possibly running on the server itself)
          imposes its own checks, unrelated to processes.

How is a process's UID set?
    The superuser (UID 0) can call setuid(uid) and setgid(gid).
    Non-superuser processes can't change their UID (to a first approximation).
    UID/GID often initially set by login, from /etc/passwd.
    UID inherited during fork(), exec().

Where does the user ID / group ID list come from?
    On a typical Unix system, the login program runs as root (UID 0).
        Checks the supplied user password against /etc/shadow.
        Finds the user's UID based on /etc/passwd.
        Finds the user's groups based on /etc/group.
        Calls setuid(), setgid(), setgroups() before running the user's shell.

How do you regain privileges after switching to a non-root user?
    Could use file descriptor passing (but have to write specialized code).
    Kernel mechanism: setuid/setgid binaries.
        When the binary is executed, the process UID or GID is set to the
          binary's owner.
        Specified with a special bit in the file's permissions.
        For example, the su / sudo binaries are typically setuid root.
    Even if your shell is not root, you can run "su otheruser".
        The su process will check the password, and run a shell as otheruser
          if it checks out.
    Many such programs exist on Unix, since root privileges are often needed.

Why might setuid binaries be a bad idea, security-wise?
    Many ways for the adversary (the caller of the binary) to manipulate the
      process.
    In Unix, an exec'ed process inherits environment variables, file
      descriptors, ...
    Libraries that a setuid program might use are not sufficiently paranoid.
    Historically, many vulnerabilities (e.g., pass $LD_PRELOAD, ...).

One more Unix isolation trick: chroot()
    Problem: it is too hard to ensure that there are no sensitive files that
      a program can read or write;
        100,000+ files in a typical Unix install;
        applications are often careless about setting permissions.
    Solution: chroot(dirname) causes / to refer to dirname for this process
      and its descendants, so they can't name files outside of dirname.
        e.g., chroot("/var/okws/run") causes subsequent absolute pathnames
          to start at /var/okws/run, not the real /.
        Thus the program can only name files/dirs under /var/okws/run.
    chroot() is typically used to prevent a process from interacting at all
      with other processes via files, i.e., complete isolation.

Overall, Unix is awkward at precisely controlled isolation+sharing:
    Many global name spaces: files, UIDs, PIDs, ports.
        Each may allow processes to see what others are up to.
        Each is an invitation for bugs or careless set-up.
        No idea of "default to no access".
        Thus it is hard for the designer to reason about what a process can do.
    No fine-grained grants of privilege.
        Can't say "this process can read only these three files."
        Privileges are coarse-grained, via UID, or implicit, e.g., wait()
          for children.
    chroot() and setuid() can only be used by the superuser.
        So non-superusers can't reduce/limit their own privilege.
        Awkward, since security suggests *not* running as superuser.

** Why would it be a security vulnerability if chroot were setuid root?
   I.e., what happens if user processes can "confine" themselves?
    Attack:
    --attacker sets up the jail's directory /tmp/dir
    --within /tmp/dir, hard-links the passwd, su, and login programs
      (not many restrictions are placed on hard linking):
        /tmp/dir/sbin/passwd
        /tmp/dir/sbin/login
        etc.
    --creates a fake /tmp/dir/etc/passwd
    --chroot()s into /tmp/dir
    --the binaries are hard-coded to look at /etc/passwd; when they run in
      the jail, they will be looking at the wrong version
    --yet, they will have privilege (they are setuid)
    --result: they will apply their privilege to the wrong environment, and
      allow the attacker to, say, log in as root...

---------------------------------------------------------------------------

Acknowledgment: MIT's 6.858 staff