Class 25
CS 202
30 April 2020

On the board
------------

1. Last time

2. Trusting trust

    A. Background
    B. Adding a new feature to a language
    C. Context
    D. Goal: bug login
    E. Self-reproducing programs
    F. Result
    G. Moral, discussion

3. Further thoughts on trust

---------------------------------------------------------------------------

1. Last time

    Unix protection model

    Attacks and problems

2. Trusting trust

    --first of all, the word "trust" is a bad thing in computer security
    (this is an unfortunate linguistic fact). to "trust" something means
    to "assume it correct", which in turn means "to be in trouble if the
    assumption is false". so "removing trust" is a good thing. so is
    making things "trust*worthy*" (that is, worthy of being assumed
    correct), but it is in general hard to make any given component
    truly trustworthy.

    --you'll notice that the "trusted computing" initiatives from various
    powerful interests subvert this word. who exactly is being trusted,
    and who exactly isn't? "trusted computing" sounds great
    linguistically, but "trusted computing platforms" do not actually
    mean what they sound like.

    A. Background on this paper by Thompson

        Thompson gave this lecture/paper after winning the Turing Award,
        which is considered by many to be the Nobel prize of Computer
        Science.

        The paper is stunning but takes patience and a few readings to
        understand.

        We're going to reproduce most of what Thompson did but will
        follow the ideas in an order different from the one in the
        paper.

    B. Adding a feature to a language

        What if we wanted to add a feature to Java?

        Say that the Java compiler is written in C, in a file called
        java.c. So we modify java.c, and rerun the C compiler on java.c,
        producing a new Java compiler that understands a new feature of
        Java.

        Now what if we wanted to add a feature to the C programming
        language?
        Well, for all practical purposes, the C compiler is also written
        in C, and let's assume that the entire C compiler is implemented
        in a file called "cc.c".

        To add a feature to the C programming language, we need to
        modify cc.c, and run the old C compiler on the new file. At this
        point, we have a new C compiler that understands a new feature
        of the language.

    C. Context

        As sometimes happens today, earlier versions of Unix were
        distributed with a full set of binaries and source for those
        binaries. This source included source for the compiler, the OS,
        the program 'login', etc.

        Because the system was quite small, it was common for people to
        make a change in one source file and then to recompile all of
        their programs. So program recompilation happened a lot.

    D. In this environment, how could someone as clever as Thompson add
    a bug to the login program without leaving a trace in the source
    files?

        **GOAL: have no source files hint at the bug, and meanwhile, the
        bug will persist across all recompilations

        [DRAW PICTURES]

    E. How can we write a self-reproducing program in pseudocode?

        X = "Output 'X'. Output '='. Output quote mark. Output X. Output quote mark. Output X."
        Output 'X'. Output '='. Output quote mark. Output X. Output quote mark. Output X.

        Run that, and you get itself.

        Here's a version that includes other instructions:

        X = "[execute whatever.] Output 'X'. Output '='. Output quote mark. Output X. Output quote mark. Output X."
        [execute whatever.] Output 'X'. Output '='. Output quote mark. Output X. Output quote mark. Output X.

        Here is a simpler version:

        Print this followed by its quotation: "Print this followed by its quotation".

        [BTW, the GNU General Public License (GPL) works like this. It's
        a self-replicating license! the license specifies that to make a
        copy of the code, you have to release the source **with the
        license itself included**. the license talks about itself, just
        as a self-replicating program must.]
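The pseudocode above can be sketched in Python. This is one common way to write a quine, not the construction from Thompson's paper. The variable s plays the role of X, and the %r format does the work of "Output quote mark. Output X. Output quote mark." The names QUINE and run are made up for this illustration: the quine is held as text so that we can run it and compare what it prints with its own source.

```python
import contextlib
import io

# The quine itself: two lines of Python, stored as text (hypothetical name).
# The raw string keeps the backslash-n exactly as it would appear in a file.
QUINE = r"""s = 's = %r\nprint(s %% s)'
print(s % s)
"""


def run(source: str) -> str:
    """Execute `source` as a Python program and return everything it prints."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(source, {})
    return buf.getvalue()


if __name__ == "__main__":
    output = run(QUINE)
    print(output, end="")
    # True when the program's output is exactly its own source text.
    print("reproduces itself:", output == QUINE)
```

As in the "[execute whatever.]" variant, extra statements could be added to the program as long as the same text is also added inside s.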
        Here's a self-replicating program in Scheme:

        ((lambda (x) `(,x ',x)) '(lambda (x) `(,x ',x)))

        Self-replicating programs in other languages:
        https://rosettacode.org/wiki/Quine

    F. Result: some well-known string in the C compiler source now
    compiles to binary that does the following:

        <<
        (1) if compiling "login", insert a bug
        (2) if you see the well-known string in the C compiler itself,
            replace it with everything between << >>
        >>

    G. What's the moral of the story?

        What if you disassembled the binaries? Would the attack be
        visible there?

        (Depends on whether the disassembler was also bugged.)

    H. Postscript, 1

        Russ Cox reports:

        "The original hack, by the way, did not work perfectly. It made
        the compiler just a little bigger each time it compiled itself.
        Eventually someone discovered this and tried to figure out why,
        and they compiled via an assembly listing (cc -S x.c; as x.s),
        and the hack disappeared. (It was not enabled when printing an
        assembly listing with -S.)."

        [follow-up: Ken Thompson reports: "it was a '\0' added to a
        string every time."]

    H. Postscript, 2

        While Ken Thompson's version of this bug was never widely
        distributed, the same attack was found in the wild:
        Win32.Induc.A, a worm identified in 2009, attacked the Delphi
        compiler in order to inject code into compiled applications.
        [https://www.veracode.com/blog/2009/08/trust-your-own-code-trust-your-own-compiler]

        As Ken Thompson notes in the paper, he was not the first one to
        come up with this attack. Karger and Schell had described a
        similar concern when analyzing MULTICS security, and labeled
        attacks of this type a "compiler trap door."
        Their original report on this can be found at:
        https://csrc.nist.gov/csrc/media/publications/conference-paper/1998/10/08/proceedings-of-the-21st-nissc-1998/documents/early-cs-papers/karg74.pdf
        (see pp51-52)

        They revisited this report 30 years later, and wrote up a new
        paper, "Thirty Years Later: Lessons from the Multics Security
        Evaluation", which can be found at:
        https://hack.org/mc/texts/classic-multics.pdf
        (see section 3.2.4)

        Compiler trap doors are only one of a large number of security
        vulnerabilities they identified. Several of these, including
        "installation and booting trap doors" (now known as rootkits)
        and "distribution trap doors", among others, continue to be
        concerns today.

3. Further thoughts on trust

    Question: what do you have to trust to be sure that no one is aware
    of, nor can ever be aware of, your IM conversation with someone
    else?

        no eavesdropping
        no dumping of data to be analyzed later
        IM binary isn't bugged

    Question: what if the hardware itself is buggy? What do we do then?
    (People are worried about this.)

    Question: can you ensure privacy through encryption?

        --answer: DEPENDS ON THE THREAT MODEL (and what you mean by
        privacy).

        --what if you know that two nations are talking?

        --what if you know that Coca Cola is talking to bankruptcy
        lawyers?

        (sometimes, you don't need to know *what* information is being
        relayed. sometimes, it's enough to know that two parties are
        communicating.)

    Why isn't Gmail encrypted?

4. Stray notes on security, left over from last time...

    F. Other issues:

        can create hard link to any file, even if we can't read or
        write it

        attacks:

        1. keep around buggy setuid code

            --attack: create local hard link to buggy code (call it
            something innocuous so no one notices). now you can keep
            using it as an attack vector, even after root rm's the
            buggy version and installs the fixed version. the attacker
            keeps using it through the innocuous name.

        2. keep around sensitive data

            --attack: use hard links to keep a copy of sensitive data around.
            then, later, if you are able to obtain root access, you can
            read the data.

        --another issue: (used to be) easy to escape from chroot with
        uid=0

            fd = open("/", O_RDONLY);
            chroot("/tmp");
            fchdir(fd);
            chroot("./../../../..");

---------------------------------------------------------------------------
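The hard-link attacks under "Other issues" rest on one fact: a hard link is a second name for the same inode, so removing the original name does not remove the file. Here is a minimal sketch of that mechanism only, run inside a throwaway temporary directory. The function name and the file names are made up for illustration, and nothing here involves setuid bits or another user's permissions.

```python
import os
import tempfile


def hard_link_survives_removal() -> str:
    """Return what is read through a hard link after the original name is replaced."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "buggy_prog")  # hypothetical name
        alias = os.path.join(d, "innocuous")      # hypothetical name

        with open(original, "w") as f:
            f.write("old buggy version")

        # Give the same inode a second, innocuous-looking name.
        os.link(original, alias)

        # The administrator removes the buggy file and installs a fix.
        # The fix is a new file (new inode) that reuses the old name.
        os.remove(original)
        with open(original, "w") as f:
            f.write("fixed version")

        # The old contents are still reachable through the alias.
        with open(alias) as f:
            return f.read()


if __name__ == "__main__":
    print(hard_link_survives_removal())
```

The old contents stay on disk until the last name pointing at that inode is removed, which is why deleting only the well-known name is not enough.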