Class 14
CS 202
22 March 2023

On the board
------------

1. Last time
2. WeensyOS
3. Page faults: intro and mechanics
4. Page faults: uses

---------------------------------------------------------------------------


1. Last time

    - case study of x86-64: multilevel page tables

    - hardware "walks" those page tables

    - caches the result in the TLB

    some clarifications

        - correction: cr0-cr3 introduced in 80386. numbering is arbitrary

        - G bit in PTE: "Global". tells the processor not to invalidate
        the TLB entry corresponding to the page upon a MOV to CR3
        instruction. Bit 7 (PGE) in CR4 must be set to enable global
        pages.

2. WeensyOS

    [draw picture of the software stack: two instances of virtualization]

    advice: start now!!!

    processes, files with p-*

    kernel code, files with k-*

    processes just allocate memory. system call: sys_page_alloc().
    analogous to brk() or mmap() in POSIX systems.

        look at process.h for where the system call happens

        see exception_return() for where the return back into user space
        happens

        %rax holds the value returned to the application.

    - figures (the animated gifs) are from 32-bit version of the lab. so
    you'll see some differences.

    - you'll use the virtual_memory_map() function
        pay attention to the "allocator" argument
        (and make sure your allocator initializes the new page table)

    - how many page tables are allocated for 3MB? what's the structure?

        - 3MB virtual address space, but the L4 page table that handles
         [2MB, 3MB) is allocated only on demand.

            - thus, make sure when calling virtual_memory_map that you're
            passing in a non-NULL allocator when you're supposed to.

    - process control block (PCB): this is the "struct proc" in kernel.h

    - recall:
        register %rax is the system call return value
        register %rdi contains the system call argument

    - remember: bugs in earlier parts may show up only later

    - pageinfo array:
    
        typedef struct physical_pageinfo {
            int8_t owner;
            int8_t refcount;
        } physical_pageinfo;

        static physical_pageinfo pageinfo[PAGENUMBER(MEMSIZE_PHYSICAL)];

        one physical_pageinfo struct per _physical_ page.

    - x86_64_pagetable: array of 512 entries (each 8 bytes)

    - note: recall the picture from the last handout, where in Linux,
    the kernel is mapped at the top of every user-level process's
    address space (which has lately been modified, to address Meltdown
    and Spectre). lab 4 doesn't work like that: in lab 4, the kernel has
    its own separate page table.
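
    Returning to the "how many page tables for 3MB?" question above: a
    minimal sketch of the arithmetic, assuming 4 KB pages, 512 entries
    per table, and the lab's naming in which L1 is the top level and L4
    is the leaf level. The helper names (pt_index, tables_at_level,
    total_tables) are made up for illustration; they are not part of
    the lab code.

```c
#include <assert.h>
#include <stdint.h>

#define PAGESIZE          4096ULL
#define ENTRIES_PER_TABLE 512ULL   /* 512 entries * 8 bytes = one 4 KB page */

/* Index that virtual address `va` selects in the table at `level`
   (1 = top level, 4 = leaf). Each level consumes 9 bits of the
   address, above the 12-bit page offset. */
static unsigned pt_index(uint64_t va, int level) {
    assert(level >= 1 && level <= 4);
    int shift = 12 + 9 * (4 - level);
    return (unsigned) ((va >> shift) & (ENTRIES_PER_TABLE - 1));
}

/* Number of distinct tables at `level` needed to map [0, size). */
static uint64_t tables_at_level(uint64_t size, int level) {
    assert(level >= 1 && level <= 4);
    /* Bytes covered by one table at this level; an L4 table covers 2 MB. */
    uint64_t span = PAGESIZE * ENTRIES_PER_TABLE;
    for (int l = 4; l > level; --l) {
        span *= ENTRIES_PER_TABLE;
    }
    return (size + span - 1) / span;   /* round up */
}

/* Total number of page-table pages needed to map [0, size). */
static uint64_t total_tables(uint64_t size) {
    uint64_t total = 0;
    for (int level = 1; level <= 4; ++level) {
        total += tables_at_level(size, level);
    }
    return total;
}
```

    Under these assumptions, total_tables(3 << 20) counts one table
    each at L1, L2, and L3, plus two L4 tables: one for [0, 2MB) and a
    second for [2MB, 3MB), which is the one allocated on demand.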


    ----

    [what's below here are detailed notes from a prior recitation on
    lab4.]


    - Kernel virtual address
      Kernel is set up to use an identity mapping
      [0, MEMSIZE_PHYSICAL) -> [0, MEMSIZE_PHYSICAL)

    - Physical pages' metadata is recorded in the pageinfo array (one
        physical_pageinfo struct per page), whose elements contain
        refcount and owner
        owner can be kernel, reserved, free, or a pid

    - Process control block:
      * Process registers, process state
      * Process page table - a pointer (a kernel virtual address, which is
      identical to the physical address) to an L1 page table
                           L1 page table's first entry points to a page
                           table, and so on...
      Our job mainly consists of manipulating the page tables, and pageinfo array

    - High level evolution of the lab:
      We have five programs: kernel + 4 processes
      Ex1. All five programs share the same page table.
          all virtual addresses are mapped PTE_U | PTE_P | PTE_W
          Job: mark some of the addresses as (PTE_P | PTE_W), i.e. not user accessible

      Ex2. Each program uses its own page table.
          kernel already has a page table
          The job is to allocate and populate a page table for each process.
          The job breaks down into helper functions:
            1. allocate a new page for process pid, and zero it (important)
                [use memset to zero-out]
            2. populate the new page table. can memcpy kernel's, but
            easier to copy the kernel's mappings individually.
                  In order to achieve the screenshot, after memcpying, we have to
                  mark [prog_addr_start, virtual_addr_size) as not-present.

      Ex3. Physical page allocation
        Motivation:
          Before this, in sys_page_alloc, when a process asks for a specific virtual page,
          the identity mapping determines which physical page it gets.
          That is too restrictive: with virtual memory, the process does not really
          care which physical page it gets
        If we have implemented helper function 1 from Ex2 (allocate
            a free page), then we are mostly good to go and can just use that function.
        We also need to connect virtual to physical by setting the corresponding page table
            entry. Use virtual_memory_map

      Ex4. Overlapping virtual addresses
        Motivation:
          Since every process has its own page table & accessible virtual addresses (PTE_P portions),
          we don't need to restrict processes to use different parts of the virtual memory.
          They can overlap, as long as the physical pages backing them do not overlap.

        Easy to do: in process_setup, we use (MEMSIZE_VIRTUAL - PAGESIZE) as the
          process's stack page, instead of the arithmetic that computed it before.

      Ex5. Fork
        High level goal:
          Produce a mostly identical process (minus the register rax).
        What does it mean to be an identical process?? 
              1 same binary
              2 same process registers
              3 AND same memory state / contents
          3 basically covers 1 because the binary is loaded in memory too.
          2 is easy to achieve (copy the registers; can do this with a
            single line of C code)
          The goal here is mainly to achieve 3.
        Fork creates a copy: the memory state has to be a copy!
        Question: 
            What does it mean to make a copy of memory?
              - They are backed by physical pages, so we alloc new physical pages
                and copy the content to new pages (memcpy)
              - Then connect virtual to physical by setting the page table
            The address space is potentially 256 TB large, do we copy 256 TB? 
            How do we know which parts to copy?
              - Iterate over the virtual address space; find pages that are (PTE_P | PTE_U | PTE_W)
            Given a page table entry, how do you check if it is user RW-able?
                Fill in the blanks...
                    pte_val _ (PTE_P | PTE_W | PTE_U) == ___
            How do you find its corresponding physical page?
                PTE_ADDR

    - Useful functions to implement for said manipulations:
      * find a PO_FREE physical page and assign it to a process (Useful for ex2, 3, 4, 5)
      * allocate empty page dir + page table for a process (Ex2, 4)
      * make a copy of existing page table and assign it to a process (Ex2, 5)
      * implement your own helper functions as you see fit
    Tip: Zero the allocated page before using it!! (memset)

    - Some useful functions/macros:
       PTE_ADDR : PTE_ENTRY -> Physical address
       PAGENUMBER : a physical address -> corresponding index into pageinfo array
       PAGEADDR : PAGENUMBER^{-1}
       virtual_memory_lookup(pagetable, va)
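
    To make the macros and the "find a PO_FREE physical page" helper
    above concrete, here is a minimal user-space sketch. Everything in
    it is a stand-in: the macro bodies, the constant values (2 MB of
    physical memory, PO_FREE == 0), the simulated physmem array, and
    the helper name alloc_free_page are assumptions for illustration.
    Check the lab headers for the real definitions. In the kernel, the
    identity mapping lets you use the physical address directly as a
    pointer; the physmem array stands in for that here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in constants; the real ones come from the lab headers. */
#define PAGESIZE         4096
#define PAGEOFFBITS      12
#define MEMSIZE_PHYSICAL 0x200000   /* assumed: 2 MB of physical memory */
#define NPAGES           (MEMSIZE_PHYSICAL / PAGESIZE)
#define PO_FREE          0          /* assumed encoding of "free" */

/* Stand-in macros with the meanings listed in the notes. */
#define PAGENUMBER(addr) ((int) ((uintptr_t) (addr) >> PAGEOFFBITS))
#define PAGEADDR(pn)     ((uintptr_t) (pn) << PAGEOFFBITS)
#define PTE_ADDR(pte)    ((uint64_t) (pte) & ~0xFFFULL)  /* drop low flag bits */

typedef struct physical_pageinfo {
    int8_t owner;
    int8_t refcount;
} physical_pageinfo;

static physical_pageinfo pageinfo[NPAGES];
static unsigned char physmem[MEMSIZE_PHYSICAL];   /* simulated RAM */

/* Hypothetical helper: claim the first free physical page for `owner`,
   zero it, and return its physical address. Returns 0 if no page is
   free. Page 0 is never handed out, so 0 can signal failure. */
static uintptr_t alloc_free_page(int8_t owner) {
    assert(owner != PO_FREE);
    for (int pn = 1; pn < NPAGES; ++pn) {
        if (pageinfo[pn].owner == PO_FREE && pageinfo[pn].refcount == 0) {
            pageinfo[pn].owner = owner;
            pageinfo[pn].refcount = 1;
            /* Zero the page before use, per the tip above. */
            memset(&physmem[PAGEADDR(pn)], 0, PAGESIZE);
            return PAGEADDR(pn);
        }
    }
    return 0;
}
```

    Zeroing inside the allocator means no caller can forget it, which
    matters most when the new page is about to become a page table.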


3. Page faults: intro and mechanics

    We've discussed these a bit. Let's go into a bit more detail...

    Concept:
    
        a reference is illegal, either because it's not mapped in the
        page tables or because there is a protection violation.

        requires the OS to get involved

        this mechanism turns out to be hugely powerful, as we will see.

    Mechanics

        --what happens on the x86?

	    --processor constructs a trap frame and transfers execution to an
	    interrupt or trap handler

		    ss     [stack segment; ignore]
		    rsp    [former value of stack pointer]
		    rflags [former value of rflags]
		    cs     [code segment; ignore]
                    rip    [instruction that caused the trap]
           %rsp --> [error code] 

	    %rip now points to code to handle the trap
	        [how did processor know what to load into %rip?]

	    error code:

	        [see handout]

	        [ ................................ U/S | W/R | P]

	        U/S: 1 = fault occurred in user mode / 0 = supervisor mode
	        W/R: 1 = access was a write / 0 = access was a read
	        P:   0 = not-present page / 1 = protection violation

	    on a page fault, %cr2 holds the faulting virtual address

        --intent: when page fault happens, the kernel sets up the
        process's page entries properly, or terminates the process
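
    The error-code layout above can be decoded with a few bit tests. A
    minimal sketch: the PFERR_* names, the fault_info struct, and
    decode_pf_error are illustrative names chosen here, not taken from
    the handout. The bit positions (P = bit 0, W/R = bit 1, U/S = bit
    2) are the architectural ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Low three bits of the x86 page-fault error code. */
#define PFERR_PRESENT 0x1u   /* P:   0 = not-present page, 1 = protection violation */
#define PFERR_WRITE   0x2u   /* W/R: 0 = read,             1 = write */
#define PFERR_USER    0x4u   /* U/S: 0 = supervisor mode,  1 = user mode */

typedef struct fault_info {
    bool protection_violation;  /* false means the page was not present */
    bool was_write;             /* false means the access was a read */
    bool from_user;             /* false means the fault was in supervisor mode */
} fault_info;

/* Split a page-fault error code into its three low-order flags. */
static fault_info decode_pf_error(uint64_t error_code) {
    fault_info f;
    f.protection_violation = (error_code & PFERR_PRESENT) != 0;
    f.was_write            = (error_code & PFERR_WRITE) != 0;
    f.from_user            = (error_code & PFERR_USER) != 0;
    return f;
}

/* Usage example: 0x6 is a user-mode write to a not-present page. */
static void decode_example(void) {
    fault_info f = decode_pf_error(0x6);
    assert(f.from_user && f.was_write && !f.protection_violation);
}
```

    A handler would combine these flags with the faulting address in
    %cr2 to decide whether to fix up the page tables or terminate the
    process.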

4. Page faults: uses

    --Best example: overcommitting physical memory (the classical use of
    "virtual memory")

	--your program thinks it has, say, 64 GB of memory, but your
	hardware has only 16 GB of memory

	--the way that this worked is that the disk was (is) used to
	store memory pages

	--advantage: address space looks huge

	--disadvantage: accesses to "paged" memory (as memory pages that
	live on the disk are known) are sllooooowwwww

	--Rough implementation:

	    --on a page fault, the kernel reads in the faulting page

	    --QUESTION: what is listed in the page structures? how does
	    kernel know whether the address is invalid, in memory,
	    paged, what?

	    --kernel may need to send a page to disk (under what
	    conditions? answer: two conditions must hold for kernel to
	    HAVE to write to disk)

		(1) kernel is out of memory

		(2) the page that it selects to write out is dirty

	--Computers have lots of memory, so less common to hear the
	sound of swapping these days. You would need multiple large memory
	consumers running on the same computer.
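
    The two conditions in the rough implementation above can be seen in
    a small simulation. This is a sketch under stated assumptions: the
    frame struct, the disk_writes counter, and get_frame are
    hypothetical names, and the victim is passed in by the caller
    because the replacement policy is out of scope here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct frame {
    bool in_use;
    bool dirty;    /* modified since it was last written to disk */
} frame;

static int disk_writes = 0;   /* counts simulated write-backs */

/* Return the index of a frame that can hold an incoming page.
   A disk write happens only when both conditions hold:
   (1) no frame is free, and (2) the chosen victim is dirty. */
static size_t get_frame(frame *frames, size_t n, size_t victim) {
    for (size_t i = 0; i < n; ++i) {
        if (!frames[i].in_use) {       /* (1) fails: memory is available */
            frames[i].in_use = true;
            frames[i].dirty = false;
            return i;
        }
    }
    assert(victim < n);
    if (frames[victim].dirty) {        /* (2) holds: must save contents */
        ++disk_writes;
    }
    frames[victim].dirty = false;      /* frame now holds a fresh page */
    return victim;
}
```

    Evicting a clean page costs no disk write because an up-to-date
    copy already exists on disk (or the page can be recreated).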


    --Many other uses

	--store memory pages across the network! (Distributed Shared
	Memory)

	    --basic idea was that on a page fault, the page fault
	    handler went and retrieved the needed page from some
	    other machine

	--copy-on-write

	    --when creating a copy of another process, don't copy
	    its memory. just copy its page tables, mark the pages as
	    read-only

	    --QUESTION: do you need to mark the parent's pages
	    as read-only as well? 

	    --program semantics aren't violated when programs do
	    reads

	    --when a write happens, a page fault results. at that
	    point, the kernel allocates a new page, copies the
	    memory over, maps the new page writable in the faulting
	    process's page table, and restarts the faulting instruction
	    so that the write goes through

		--thus, memory is copied only when a write causes a
		fault

	    --this idea is all over the place; used in fork(), mmap(),
	    etc.

	--accounting

	    --good way to sample what percentage of the memory pages
	    are written to in any time slice: mark a fraction of
	    them not present, see how often you get faults

	--if you are interested in this, check out the paper
	"Virtual Memory Primitives for User Programs", by Andrew W.
	Appel and Kai Li, Proc. ASPLOS, 1991.

        --high-level idea: by giving kernel (or even user-level
        program) the opportunity to do interesting things on page
        faults, you can build interesting functionality
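
    As one concrete instance, the copy-on-write item above can be
    simulated in user space. This is a simplified sketch, not how any
    particular kernel does it: the pte struct, the refcount array, and
    the functions alloc_page, cow_share, and cow_write_fault are all
    hypothetical. It also assumes every read-only page it sees was
    writable before sharing; a real kernel must tell copy-on-write
    pages apart from pages that are genuinely read-only.

```c
#include <assert.h>
#include <string.h>

#define PAGESIZE 4096
#define NPAGES   8
#define PTE_P    0x1u
#define PTE_W    0x2u

static unsigned char physmem[NPAGES][PAGESIZE];  /* simulated physical pages */
static int refcount[NPAGES];                     /* mappings per page */

typedef struct pte {     /* simplified page table entry */
    int pn;              /* physical page number */
    unsigned flags;      /* PTE_P, PTE_W */
} pte;

/* Claim a free simulated page; returns -1 if none is left. */
static int alloc_page(void) {
    for (int i = 0; i < NPAGES; ++i) {
        if (refcount[i] == 0) {
            refcount[i] = 1;
            return i;
        }
    }
    return -1;
}

/* Share one page between two mappings instead of copying it. */
static void cow_share(pte *parent, pte *child) {
    parent->flags &= ~PTE_W;     /* writes must now fault */
    *child = *parent;            /* same page, same read-only flags */
    refcount[parent->pn]++;
}

/* Handle a write fault on a shared page.
   Returns 0 on success, -1 if out of memory. */
static int cow_write_fault(pte *e) {
    assert(e->flags & PTE_P);
    if (refcount[e->pn] > 1) {           /* still shared: make a private copy */
        int fresh = alloc_page();
        if (fresh < 0) {
            return -1;
        }
        memcpy(physmem[fresh], physmem[e->pn], PAGESIZE);
        refcount[e->pn]--;
        e->pn = fresh;
    }
    e->flags |= PTE_W;                   /* last user may write in place */
    return 0;
}
```

    The refcount check is what avoids a needless copy: once only one
    mapping of a page remains, the fault handler just restores write
    permission.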


    --Paging in day-to-day use

        --Demand paging: bring program code into memory "lazily"

        --Growing the stack (contiguous in virtual space, probably not
        in physical space)

        --BSS page allocation (BSS segment contains the part of the
        address space with global variables, statically initialized to
        zero. OS can delay allocating and zeroing a page until the
        program accesses a variable on the page.)

        --Shared text 

        --Shared libraries 

        --Shared memory