Class 24
CS 202
29 April 2024

On the board
------------

1. Last time 
2. Trusting trust
    A. Background
    B. Adding a new feature to a language
    C. Context
    D. Goal: bug login
    E. Self-reproducing programs
    F. Result
    G. Moral, discussion
3. Further thoughts on trust

---------------------------------------------------------------------------

1. Last time

    Stack smashing
    
2. Trusting trust

    --first of all, the word "trust" is a bad thing in computer security
    (this is an unfortunate linguistic fact). to "trust" something means
    to "assume it correct", which in turn means "to be in trouble if the
    assumption is false". so "removing trust" is a good thing. so is
    making things "trust*worthy*" (that is, worthy of being assumed
    correct), but it is in general hard to make any given component
    truly trustworthy.

	--you'll notice that the "trusted computing" initiatives from
	various powerful interests subvert this word. who exactly is
	being trusted, and who exactly isn't being trusted? "trusted
	computing" sounds great linguistically, but "trusted computing
	platforms" do not actually mean what they sound like

    A. Background on this paper by Thompson:

	Thompson gave this lecture/paper after winning the Turing Award,
	which is considered by many to be the Nobel prize of Computer
	Science. The paper is stunning but takes patience and a few
	readings to understand. We're going to reproduce most of what
	Thompson did but will follow the ideas in an order different
	from the one in the paper. 

    B. Adding a feature to a language

	What if we wanted to add a feature to Java? Say that the Java
	compiler is written in C, in a file called java.c. So we modify
	java.c, and rerun the C compiler on java.c, producing a new Java
	compiler that understands a new feature of Java.

	Now what if we wanted to add a feature to the C programming
	language? Well, for all practical purposes, the C compiler is
	also written in C, and let's assume that the entire C compiler
	is implemented in a file called "cc.c". To add a feature to the
	C programming language, we need to modify cc.c, and run the old
	C compiler on the new file. At this point, we have a new C
	compiler that understands a new feature of the language.
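
        A toy sketch of this "old compiler builds the new compiler" step,
        loosely modeled on the "\v" escape example in Thompson's paper.
        Everything here is an assumption made for illustration: a compiler
        "binary" is just a table of escape letters, compiler "source" is a
        table too, and build() is a hypothetical stand-in for running the
        compiler. Python is used only because it is easy to run.

```python
# Toy model (all names hypothetical): a compiler "binary" is a table mapping
# escape letters to character codes. Compiler "source" is a table whose entries
# are either a raw number or an escape written in the language itself.

def build(compiler_source, old_binary):
    """Use old_binary to compile compiler_source into a new binary."""
    new_binary = {}
    for letter, value in compiler_source.items():
        if isinstance(value, int):
            new_binary[letter] = value          # a raw number needs no prior knowledge
        else:
            known = value.lstrip("\\")          # "\\v" (backslash, v) -> "v"
            if known not in old_binary:
                raise ValueError(f"old compiler does not understand {value!r}")
            new_binary[letter] = old_binary[known]
    return new_binary

binary0 = {"n": 10, "t": 9}                      # the compiler we start with
portable = {"n": "\\n", "t": "\\t", "v": "\\v"}  # the source we would like to have

try:
    build(portable, binary0)
except ValueError as err:
    print("first attempt fails:", err)

# Step 1: teach the compiler once, using the raw ASCII code for vertical tab.
training = {"n": "\\n", "t": "\\t", "v": 11}
binary1 = build(training, binary0)

# Step 2: binary1 understands "\v", so the source can drop the magic number.
binary2 = build(portable, binary1)
print(binary2)
```

        After step 2 the number 11 appears nowhere in the source; the
        knowledge lives only in the binary. Keep that in mind for what
        follows.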

    C. Context

	As sometimes happens today, early versions of Unix were distributed with
	a full set of binaries and source for those binaries. This source included
	source for the compiler, the OS, the program 'login', etc.

	Because the system was quite small, it was common for people to make a
	change in one source file and then to recompile all of their programs. So
	program recompilation happened a lot.

    D. In this environment, how could someone as clever as Thompson add
    a bug to the login program without leaving a trace in the source
    files?

	**GOAL: no source file hints at the bug, and yet the bug persists
	across all recompilations (including recompilations of the
	compiler itself)

	[DRAW PICTURES]

    E. How can we write a self-reproducing program in pseudocode?

	X = "Output 'X'. Output '='. Output quote mark. Output X. Output quote mark. Output X."
	Output 'X'. Output '='. Output quote mark. Output X. Output quote mark. Output X.

	Run that, and its output is the program itself.

        Here's a version that includes other instructions:
        
        X = "[execute whatever.] Output 'X'. Output '='. Output quote mark. Output X. Output quote mark. Output X."
	[execute whatever.] Output 'X'. Output '='. Output quote mark. Output X. Output quote mark. Output X.

	Here is a simpler version:

	    Print this followed by its quotation: "Print this followed
	    by its quotation".

	    [BTW, the GNU General Public License (GPL) works like this.
	    It's a self-replicating license! The license specifies that
	    to distribute a copy of the code, you have to release the
	    source **with the license itself included**. The license
	    talks about itself, just as a self-replicating program must.]

	Here's a self-replicating program in Scheme:

	    ((lambda (x) `(,x ',x))
	    '(lambda (x) `(,x ',x)))


	Self-replicating programs in other languages: 
		https://rosettacode.org/wiki/Quine
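
        The X = "..." pseudocode above translates almost line for line into
        Python. This is a minimal sketch, not anything from the paper: QUINE
        holds a two-line self-reproducing program as text, and
        run_and_capture is a hypothetical helper that runs a program and
        collects what it prints, so we can check that output equals source.

```python
import contextlib
import io

# The two-line program, stored as text. Its first line plays the role of
# X = "..." and its second line plays the role of the Output instructions.
QUINE = "s = 's = %r\\nprint(s %% s)'\nprint(s % s)\n"

def run_and_capture(source):
    """Run Python source text and return everything it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()

output = run_and_capture(QUINE)
print(QUINE, end="")
print("output equals source:", output == QUINE)   # True
```

        The %r does the "output quote mark, output X, output quote mark"
        step: it inserts the string in quoted form.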

    F. Result:

	some well-known string in the C compiler source now compiles to
	binary that does the following:

	    <<
	    (1) if compiling "login", insert a bug
	    
	    (2) if you see the well-known string in the C compiler
	    itself, replace it with everything between << >>
	    >>
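
        A toy simulation of the result, under heavy assumptions: this is not
        Thompson's code. "Source" is Python text, "compiling" means running
        that text and handing back the function it defines, and every name
        (LOGIN_SRC, CC_SRC, evil_cc, the password "ken") is made up. The
        trojan here recognizes its targets by name rather than by matching a
        pattern in the source.

```python
# Toy model, not Thompson's code. All names are hypothetical.

LOGIN_SRC = '''
def login(user, password):
    return password == "secret"
'''

CC_SRC = '''
def cc(source, name):
    scope = {}
    exec(source, scope)
    return scope[name]
'''

def bootstrap_honest_cc():
    """Build an honest compiler binary straight from the clean source."""
    scope = {}
    exec(CC_SRC, scope)
    return scope["cc"]

def evil_cc(source, name):
    """A trojaned compiler *binary*; nothing like it appears in CC_SRC."""
    scope = {}
    exec(source, scope)
    binary = scope[name]
    if name == "login":
        # (1) compiling login: add a master password the source never mentions
        return lambda user, password: password == "ken" or binary(user, password)
    if name == "cc":
        # (2) compiling the compiler: hand back the trojan, not the clean binary
        return evil_cc
    return binary

honest_login = bootstrap_honest_cc()(LOGIN_SRC, "login")

# Recompile the compiler from clean source twice; the trojan survives both times.
cc1 = evil_cc(CC_SRC, "cc")
cc2 = cc1(CC_SRC, "cc")
bugged_login = cc2(LOGIN_SRC, "login")

print(honest_login("root", "ken"))             # False
print(bugged_login("root", "ken"))             # True
print("ken" in LOGIN_SRC or "ken" in CC_SRC)   # False: no source hints at the bug
```

        The line "return evil_cc" is where this sketch cheats: a Python
        function can simply name itself. A real compiler binary has to emit
        a complete copy of the trojan's own code, which is why the
        self-reproducing trick from part E is needed.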

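    G. What's the moral of the story?

	Thompson's own answer: you can't trust code that you did not
	totally create yourself, and no amount of source-level
	verification protects you from untrusted tools.

	What if you disassembled the binaries? Would the attack be visible
	there? (Depends on whether the disassembler was also bugged.)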

    H. Postscript, 1

        We now have the original code:
            https://research.swtch.com/nih

        Context: Russ Cox asked Ken Thompson for it in October 2023, and
        got it running on a simulator of the relevant CPU (PDP-11) and
        version of Unix (Research Unix Sixth Edition). 

        Russ's post above is highly recommended reading. Among many
        other things, he shows that the attack works exactly as Thompson
        describes in his paper, figures out the timeline of the hack,
        connects it to Go (which he leads), and gives relevant context
        of the attack.

    I. Postscript, 2

        While Ken Thompson's version of this bug was never widely distributed,
        the same style of attack was identified in the wild in 2009:
        Win32.Induc.A infected the Delphi compiler in order to inject its
        code into the applications that compiler built.
        [https://www.veracode.com/blog/2009/08/trust-your-own-code-trust-your-own-compiler]

        Here's a more recent one (SUNSPOT, which tampered with the
        SolarWinds Orion build process):
            https://www.crowdstrike.com/blog/sunspot-malware-technical-analysis/

        As Ken Thompson notes in the paper, he was not the first one to
        come up with this attack. Karger and Schell had described a
        similar concern when analyzing MULTICS security, and labelled
        attacks of this type "compiler trap doors." Their
        original report on this can be found at:

        https://csrc.nist.gov/csrc/media/publications/conference-paper/1998/10/08/proceedings-of-the-21st-nissc-1998/documents/early-cs-papers/karg74.pdf
        (see pp51-52)

        They revisited this report 30 years later in a new paper,
        "Thirty Years Later: Lessons from the Multics Security Evaluation,"
        which can be found at https://hack.org/mc/texts/classic-multics.pdf
        (see section 3.2.4).

        Compiler trap doors are only one of a large number of security vulnerabilities
        they identified, several of which (including "installation and booting trap doors,"
        now known as rootkits, and "distribution trap doors") continue to be
        concerns today.

    J. Postscript, 3

        In March 2024, a sophisticated supply-chain attack on open-source
        software came to light: a backdoor in the xz compression library. See:
            https://www.openwall.com/lists/oss-security/2024/03/29/4
            https://research.swtch.com/xz-script
            https://research.swtch.com/xz-timeline

3. Further thoughts on trust

    Question: what do you have to trust to be sure that no one is aware
    of, nor can ever be aware of, your IM conversation with someone
    else?

        no eavesdropping

        no dumping of data to be analyzed later

        IM binary isn't bugged

    Question: what if the hardware itself is buggy? What do we do then?
    (People are worried about this.)

    Question: can you ensure privacy through encryption?

	--answer: DEPENDS ON THE THREAT MODEL (and what you mean by
	privacy).

	--what if you know that two nations are talking?

	--what if you know that Coca Cola is talking to bankruptcy
	lawyers?

	(sometimes, you don't need to know *what* information is being
	relayed. sometimes, it's enough to know that two parties are
	communicating.)
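
        A small sketch of that last point, with made-up parties and
        messages. The eavesdropper below never decrypts anything, and the
        encrypt function (a one-time pad with a throwaway key) is only a
        stand-in for whatever encryption the parties really use.

```python
import os
from collections import Counter

def encrypt(plaintext: bytes) -> bytes:
    """One-time pad with a throwaway key; the eavesdropper never sees the key."""
    key = os.urandom(len(plaintext))
    return bytes(p ^ k for p, k in zip(plaintext, key))

# What the eavesdropper records: (sender, receiver, ciphertext).
traffic = [
    ("coca-cola", "bankruptcy-lawyers", encrypt(b"can we meet on Friday?")),
    ("bankruptcy-lawyers", "coca-cola", encrypt(b"yes, bring the filings")),
    ("alice", "bob", encrypt(b"lunch?")),
]

def who_talks_to_whom(records):
    """Count messages per pair of parties, ignoring the contents entirely."""
    return Counter(frozenset((sender, receiver)) for sender, receiver, _ in records)

contacts = who_talks_to_whom(traffic)
for pair, count in contacts.items():
    print(sorted(pair), count)
```

        The ciphertexts are unreadable, but the contact pattern is not.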

    Question: why isn't Gmail end-to-end encrypted?