CS202 Review Session 6
Notes from [Xiangyu Gao](https://xiangyug.github.io/), TA from fall 2021
Edited by Khanh Nguyen, TA Spring 2022
Edited by Charlie Chen, TA Spring 2023
Edited by Sophia Watts, TA Spring 2024
Edited by Saeed Bafarat, TA Fall 2024
Edited by Andrew Hua and Yuxia Zhan, TA Fall 2025
Edited by Shiv Dhar, TA Spring 2026

1. Lab 4 intro
  1.1 How to read the map
  1.2 Objectives
2. Debugging
  2.1 tmux + gdb
  2.2 log_printf
3. Key tools and functions
  3.1 pageinfo
  3.2 Page tables
  3.3 Checks for invariants
4. Overview of steps
5. WeensyOS
6. Q+A


---------------------------------------------------------------------

1. Lab 4 intro
    - Lab 4 has you writing parts of the kernel of a lightweight but functional OS called 
      WeensyOS, implementing virtual memory...
      - in a sense from scratch since the existing kernel doesn't *really* provide "virtual" 
        memory
      - but not really since you have a bunch of helper functions and a neat visualization 
        to help you along, which make the problem a lot easier
    - As a result, Lab 4 requires solid conceptual understanding of virtual memory, ideally
      before you write your first line of code
      - The actual volume of code you need to write is pretty limited - my solution last
        semester added slightly more than 100 lines of new code to the provided template

1.1 How to read the map
    - Upper map represents physical memory, lower map cycles through every process' view of 
      virtual memory
      - Why is the lower map larger? How much should it be larger by?
    - Each of the maps is just a grid of all the possible pages - each cell represents a 
      page, and encodes two important pieces of information
      - First, the owner of the page is named in the cell 
        - pid for user-level processes
        - 'K' for kernel
        - 'R' for reserved
      - Second, in the lower map, a cell is in "reverse video" (black text and colored 
        background) if the process can access that page

1.2 Objectives
    - The WeensyOS kernel provided is functional, but is really casual with memory
      - Memory is not isolated -> every process can mess with everything, including kernel 
        memory
      - Memory is not well-utilized -> we are constrained by our choice to work with physical
        addresses, forcing us to block out a contiguous chunk of memory for each process
    - The goal is to fix these issues: we will fix permissions, and decouple virtual addresses
      from physical addresses to enable processes to use all the available memory
      - (And also implement fork!)

    - p-allocator.c
      - For context, we are dealing with processes that essentially do nothing but request 
        memory
      - All this program does is find the end of process code/data, and start requesting more
        and more memory from there, until the bottom of the stack (easy to define, stack is 
        just one page) is hit OR the page allocation syscall returns an error <- note that,
        before you implement VM, these conditions occur together

Process Memory Visualization for p-allocator.c:

Start:
┌────────────────┐
│      Stack     │
└────────────────┘ <- stack_bottom
|                |
                  
|                |
                  
|                |

|                |

|                |
                 
├────────────────┤ <- heap_top, end
│      Data      │
├────────────────┤
│  Program Code  │
└────────────────┘

Some time later:
┌────────────────┐ 
│      Stack     │
└────────────────┘ <- stack_bottom
|                |
                  
|                |
                  
├────────────────┤ <- heap_top
│                │
│                │
│      Heap      │
│                │
│                │
├────────────────┤ <- end
│      Data      │
├────────────────┤
│  Program Code  │
└────────────────┘

Eventually:
┌────────────────┐
│      Stack     │
├────────────────┤ <- stack_bottom, heap_top
│                │
│                │
│                │ 
│                │
│      Heap      │
│                │
│                │
│                │
│                │
│                │
├────────────────┤ <- end
│      Data      │
├────────────────┤
│  Program Code  │
└────────────────┘
After this, spin forever...
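
    - The growth pattern in the diagrams above can be sketched as a plain C loop. This is a
      toy model, not the real p-allocator.c: toy_page_alloc is a hypothetical stand-in for
      the page allocation syscall, and the page budget and addresses are made up.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGESIZE 4096u

// Hypothetical stand-in for the page-allocation syscall: succeeds while the
// toy kernel still has pages to hand out, fails (-1) afterwards.
static int toy_pages_left;

int toy_page_alloc(uintptr_t addr) {
    assert(addr % TOY_PAGESIZE == 0);   // requests must be page-aligned
    if (toy_pages_left == 0) {
        return -1;
    }
    --toy_pages_left;
    return 0;
}

// Grow the heap one page at a time from `end` toward `stack_bottom`.
// Stops when the stack is reached OR the allocation fails, and returns the
// final heap_top. The real program would then spin forever.
uintptr_t toy_grow_heap(uintptr_t end, uintptr_t stack_bottom, int budget) {
    toy_pages_left = budget;
    uintptr_t heap_top = end;
    while (heap_top < stack_bottom && toy_page_alloc(heap_top) == 0) {
        heap_top += TOY_PAGESIZE;
    }
    return heap_top;
}
```

    - With a generous budget the loop ends at stack_bottom; with a small budget it ends
      early because the allocation fails, which is the second exit condition.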

    
2. Debugging
    - Debugging for this lab involves slightly more effort than past labs - because we need the
      visualization to observe/evaluate the functioning of the lab, we have to redirect print
      statements and run gdb in a separate window

2.1 tmux + gdb
    - tmux allows you to use and manage multiple windows running independent shell instances -
      useful tool in general, but critical use case for this particular lab is being able to 
      see gdb and your console at the same time
    - Basic tmux commands
      - tmux - start tmux
      - Ctrl+b - prefix key, press this before any other commands except exit
      - % - split vertical (side-by-side panes)
      - " - split horizontal (panes one on top of the other)
      - arrow keys - navigate between panes
      - exit - close the current pane (typed at the shell, so no prefix key)
    - Running gdb is mildly more annoying here - gdb runs in one shell instance and "attaches"
      to the running program in a separate shell instance
      - In one pane, type in "make run-gdb", which instructs the lab4 code to wait for gdb to
        attach to it
      - In the other pane, type in ./gdb-wrapper.sh, which is a shell-script that helps invoke
        gdb while saving you from having to deal with the details

2.2 log_printf
    - Using print statements is still a good way to identify where/when your code is breaking,
      but output from regular 'printf' will not show up on the screen -> use log_printf instead,
      which writes the desired output to a temporary log that remains visible/accessible after
      the program has been terminated
      - Anywhere you'd want to print, just throw in a log_printf statement instead - the usage
        is the *same*, just a different function name
      - Once your program is done running (or, more likely if you're at this step, crashes), 
        navigate to /tmp (in the root directory of your Docker container) to find and inspect
        log.txt

    - Would personally recommend using both -> use print statements as a coarse-grained tool
      to identify roughly which function/loop/block your code is breaking in -> once identified, 
      set your breakpoint and use gdb to run until that breakpoint, then start single-stepping 
      so you can identify the actual instruction that breaks and determine what's wrong
      - There are a few loops in this lab - I personally liked using log_printf statements to 
        call out what the loop was doing on each iteration so I could quickly identify what 
        inputs were causing the instruction to error
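
    - A minimal sketch of the "log every iteration" habit. Since the lab's log_printf is not
      available outside WeensyOS, toy_log_printf below is a hypothetical stand-in with the
      same printf-style usage that appends to an in-memory buffer; toy_map_range is an
      invented loop, not a lab function.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// Hypothetical stand-in for log_printf: same calling convention as printf,
// but the text is appended to an in-memory "log" instead of a log file.
char toy_log[1024];
size_t toy_log_len;

void toy_log_printf(const char* format, ...) {
    va_list args;
    va_start(args, format);
    int n = vsnprintf(toy_log + toy_log_len, sizeof(toy_log) - toy_log_len,
                      format, args);
    va_end(args);
    assert(n >= 0);
    // vsnprintf reports the untruncated length, so clamp to the space left.
    size_t space = sizeof(toy_log) - toy_log_len - 1;
    toy_log_len += ((size_t) n < space) ? (size_t) n : space;
}

// Log the inputs of every iteration, so the last line in the log before a
// crash shows exactly which address was being handled.
void toy_map_range(uintptr_t start, uintptr_t end, uintptr_t pagesize) {
    for (uintptr_t va = start; va < end; va += pagesize) {
        toy_log_printf("mapping va=0x%lx\n", (unsigned long) va);
        // ... the operation under suspicion would go here ...
    }
}
```

    - In the lab you would call log_printf with the same format string and arguments, then
      read the last few lines of the log to see which input the loop died on.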


3. Key tools and functions
    - The two critical sets/types of data structures that we need to worry about are a) the 
      pageinfo array, which stores information about all physical pages, and b) our page tables,
      which map virtual addresses to physical addresses for each process

3.1 pageinfo
    - Just an array of structs, with the struct at index i storing data about the ith phys page
    - Specifically, each struct stores the id of the process that owns the page, and the number
      of times that physical page is referenced
      - We never explicitly mark pages as "free" or "allocated" -> how do we check this? 
        - refcount == 0 for free pages
    - assign_physical_page <- provided function to make the required modifications to pageinfo
      when a page is allocated to some process
      - Important to distinguish this step from the mapping step! We are only recording data
        about the physical page. Enabling the process to address that page through a va (as
        provided through the hardware) is a separate step
    - find/allocate physical page <- NOT provided, you may wish to write a function that 
      traverses pageinfo to find (and, potentially, assign) a free physical page, will be
      pretty handy throughout the lab
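
    - A toy sketch of the pageinfo idea and of a find-and-assign helper. Everything prefixed
      with toy_ or TOY_ is hypothetical: the real array, its size, and the real
      assign_physical_page live in the lab's kernel code and may differ in detail (this
      sketch works in page numbers rather than addresses).

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NPAGES 8   // assumption: a tiny physical memory

// One entry per physical page, mirroring the two fields described above.
typedef struct toy_pageinfo {
    int8_t owner;      // pid of the owning process
    int8_t refcount;   // number of references; 0 means the page is free
} toy_pageinfo;

toy_pageinfo toy_pages[TOY_NPAGES];

// Bookkeeping only: record that `owner` now holds page `pn`.
// Fails (-1) if the page is out of range or already in use.
// Note that nothing here touches a page table.
int toy_assign_page(int pn, int8_t owner) {
    if (pn < 0 || pn >= TOY_NPAGES || toy_pages[pn].refcount != 0) {
        return -1;
    }
    toy_pages[pn].owner = owner;
    toy_pages[pn].refcount = 1;
    return 0;
}

// Scan pageinfo for the first free page at or after `first`, assign it to
// `owner`, and return its page number (-1 if no page is free).
int toy_alloc_page(int first, int8_t owner) {
    if (first < 0) {
        first = 0;
    }
    for (int pn = first; pn < TOY_NPAGES; ++pn) {
        if (toy_pages[pn].refcount == 0) {
            int r = toy_assign_page(pn, owner);
            assert(r == 0);   // cannot fail: we just saw the page was free
            return pn;
        }
    }
    return -1;
}
```

    - The `first` parameter is one possible way to keep the search out of low (kernel)
      pages; mapping the returned page into a page table is still a separate step.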

3.2 Page tables
    - WeensyOS runs on (simulated) x86-64 architecture
      - How many levels of page tables do we have? -> 4
    - Two helper functions help us read from and write to the page tables really cleanly
    - virtual_memory_map
      - This function is the main tool for creating and modifying PTEs in this lab
      - To create a mapping in a given set of page tables, we provide the following fields
        to the function:
        - pagetable
        - va
        - pa
        - sz <- useful if you want to map a block that is multiple pages long
        - perm <- we provide an int that, in binary, has set and unset bits corresponding
          to the permissions that exist or do not
          - In the context of this lab, we have PTE_P (Present), PTE_W (Writable), PTE_U
            (User-Accessible)
          - Test a bit using & (bitwise-and)
          - Set a bit using | (bitwise-or)
        - allocator <- keep in mind that we may not always have the page tables we need to
          represent a given mapping! virtual addresses represent a bunch of concatenated
          *indices*, so the PTEs for a mapping can't just go anywhere in the page tables
          -> we give virtual_memory_map an allocator function it can call if/when it needs 
          to allocate new physical pages for additional page tables
    - virtual_memory_lookup
      - Now that we understand virtual_memory_map, this is straightforward -> tell the
        function what VA you want to inspect and which page table to look in, and it will
        return a struct with the physical page number, physical address (returning both
        is somewhat redundant, why?) and the permissions
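
    - A small sketch of the bit manipulation behind this section: testing and setting
      permission bits, splitting a VA into its four 9-bit page table indices, and relating
      a physical address to its page number. The flag values are the standard x86-64 ones;
      the toy_ helper names are hypothetical and are not the lab's macros.

```c
#include <assert.h>
#include <stdint.h>

// Low PTE flag bits on x86-64.
#define PTE_P 1   // present
#define PTE_W 2   // writable
#define PTE_U 4   // user-accessible

// Test whether every bit in `flags` is set in `perm`.
int toy_has_perm(int perm, int flags) {
    return (perm & flags) == flags;
}

// Index into the page table at `level` (1 = top level, 4 = last level).
// Each level consumes 9 bits of the address above the 12-bit page offset.
unsigned toy_pt_index(uint64_t va, int level) {
    assert(level >= 1 && level <= 4);
    return (unsigned) ((va >> (12 + 9 * (4 - level))) & 0x1FF);
}

// For a page-aligned address, the physical page number is just the
// physical address shifted right by the 12 offset bits.
uint64_t toy_page_number(uint64_t pa) {
    return pa >> 12;
}
```

    - Because the indices are fixed slices of the VA, the PTE for a given VA has exactly
      one possible home in each level, which is why missing tables must be allocated.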

3.3 Checks for invariants
    - check_page_table_mappings
      - Checks kernel memory to make sure it is identity-mapped and writable -> doesn't
        check user memory, but at least makes sure that none of your operations clobbered
        kernel memory
    - check_page_table_ownership
      - Checks ownership of the pages that hold the page tables -> manually traverses the
        given page table starting with L1, and checks that each of the physical pages 
        containing the page tables has the correct refcount and owner marked in pageinfo
    - check_virtual_memory
      - Runs the above two for every set of pagetables
      - Also checks that every physical page is either free or owned by an *active* process
        - If a physical page is owned by a process that has exited -> memory was not freed
          at exit and is now being wastefully occupied as a result -> memory leak, very bad
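
    - The leak condition can be illustrated with a toy check. This is not the lab's
      check_virtual_memory: the struct, the sizes, and the convention that non-positive
      owner values mean "not a user process" are all assumptions made for this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NPAGES 8   // assumption: tiny physical memory
#define TOY_NPROC 4    // assumption: pids 1..3 are user processes

typedef struct toy_pageinfo {
    int8_t owner;      // > 0: a user pid; <= 0: kernel/reserved/none (assumed)
    int8_t refcount;   // 0 means the page is free
} toy_pageinfo;

// Count pages that are still in use but whose owner is a user pid that is
// not active any more. Any nonzero result indicates leaked memory.
int toy_count_leaked(const toy_pageinfo* pages, const int* active) {
    assert(pages && active);
    int leaked = 0;
    for (int pn = 0; pn < TOY_NPAGES; ++pn) {
        if (pages[pn].refcount == 0) {
            continue;   // free pages are always fine
        }
        int owner = pages[pn].owner;
        if (owner > 0 && (owner >= TOY_NPROC || !active[owner])) {
            ++leaked;
        }
    }
    return leaked;
}
```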


4. Overview of steps

Step 1: isolate kernel memory
    - Initially, WeensyOS allows any process to access any valid page in memory
      - Lowest hanging fruit -> we want to ensure user-level processes don't touch
        kernel memory
      - Because we're just thinking about two classes of processes (kernel and user-level),
        we don't really need to make whole new sets of pagetables yet, because we have ways
        to write PTEs that treat the kernel and user-level processes differently
      - We're also still dealing with a shared pagetable for all, which simplifies the work
      - Just need to modify permissions (using virtual_memory_map) for kernel memory
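
    - The permission change itself is a one-bit operation. In the sketch below, a flat
      array of permission ints stands in for the page table (an assumption, the real
      tables are multi-level and are modified through virtual_memory_map), and which page
      range counts as kernel memory is left to the caller.

```c
#include <assert.h>
#include <stddef.h>

#define PTE_P 1   // present
#define PTE_W 2   // writable
#define PTE_U 4   // user-accessible

// Toy "page table": perms[i] holds the permission bits of virtual page i.
// Clear PTE_U on pages [first, last) while leaving the other bits untouched,
// so the kernel can still use the pages but user-level code cannot.
void toy_make_kernel_only(int* perms, size_t first, size_t last) {
    assert(perms && first <= last);
    for (size_t pn = first; pn < last; ++pn) {
        perms[pn] &= ~PTE_U;
    }
}
```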
 
Step 2: create process pagetables, implement process isolation
    - In reality, step 1 doesn't give us nearly enough granularity
      - Ensuring user-level processes don't mess with the kernel is critical for safety,
        but we also need to ensure user-level processes don't mess with *each other*
      - We now implement proper memory isolation -> permissions bits are not enough here,
        so we need to give each process its own page table

    - Giving a process a page table is not complicated - every process has a field that 
      holds a pointer to its L1 page table
    - All we need to do is modify the portion of process_setup that sets a process' page 
      table -> instead of handing every process a pointer to the kernel's L1 page table, 
      we now need to make a new set of page tables for the process
      - We do this by making a copy of the kernel page table, but adding in one key
        deviation -> everything in the user area of memory (PROC_START_ADDR onward) should
        be user-inaccessible
      - Can be done by keeping all those mappings in but setting permissions, but better to
        just get rid of those mappings entirely
        - Later in the lab, we will implement virtual page allocation, i.e. va != pa, and 
          having a bunch of identity mappings sitting in our pagetable isn't a very good 
          idea for consistency -> we could end up with the same physical page mapped twice
    
      - Now, when a process is allocated a page, that page gets mapped into its pagetable
        and access will be granted, but everything else in the user area that hasn't been
        mapped for that pagetable will remain inaccessible -> we've implemented process 
        isolation!


    - Main challenge of this step: copying the kernel pagetable to create a new per-process
      pagetable while finding a way to get rid of mappings in the user area
    - Remember, page tables live in memory, we need to worry about actually allocating 
      the page tables themselves, you can:
      - Allocate them all and link manually <- make sure you're thinking about how many 
        pages you need - weensy memory is very small
        - Can we just memcpy the old pagetables into new pages in memory?
           - No! The links between pagetables will be completely incorrect - the new L1 
             pagetable for our process will end up pointing to the *kernel's* L2 pagetable
      - Allocate just the L1, and let virtual_memory_map handle the rest, assuming you've
        written an allocator already
        - I *need* an L1 to have something to give virtual_memory_map, but don't need 
          anything beyond that -> the function can allocate the necessary L2-4 page tables 
          anytime it wants to map something for which all 4 pagetables don't exist yet 
          (pretty infrequent, for this lab)
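
    - The memcpy pitfall is easy to reproduce in miniature. The two-level structure below
      is a toy (hypothetical types, 4 entries per table, malloc instead of physical
      pages), but it shows the same failure: copying only the top level leaves the copy
      pointing at the original's lower-level tables.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_ENTRIES 4

// Toy two-level table: the top level holds pointers to second-level tables,
// which hold the actual (made-up) mappings.
typedef struct toy_l2 { int entry[TOY_ENTRIES]; } toy_l2;
typedef struct toy_l1 { toy_l2* next[TOY_ENTRIES]; } toy_l1;

// Shallow copy: duplicates only the top level, so both copies still point
// at the *same* second-level tables.
void toy_shallow_copy(toy_l1* dst, const toy_l1* src) {
    assert(dst && src && dst != src);
    memcpy(dst, src, sizeof(*dst));
}

// Deep copy: allocate a fresh second-level table for every one in use and
// copy its entries. Returns 0 on success, -1 if allocation fails.
int toy_deep_copy(toy_l1* dst, const toy_l1* src) {
    assert(dst && src && dst != src);
    memset(dst, 0, sizeof(*dst));
    for (int i = 0; i < TOY_ENTRIES; ++i) {
        if (!src->next[i]) {
            continue;
        }
        dst->next[i] = malloc(sizeof(toy_l2));
        if (!dst->next[i]) {
            // Undo partial work so nothing leaks.
            while (i-- > 0) {
                free(dst->next[i]);
                dst->next[i] = NULL;
            }
            return -1;
        }
        *dst->next[i] = *src->next[i];
    }
    return 0;
}
```

    - Writing through the shallow copy changes the original's second-level table; writing
      through the deep copy does not. Letting virtual_memory_map build the lower levels
      gets you the deep-copy behavior without linking tables by hand.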


Step 3: implement virtual page allocation (va != pa)
    - Now, we implement virtual page allocation, saying virtual addresses can refer to 
      totally different physical addresses
    - Let's step through the pre-existing page allocation protocol:
      - When a process wants a page, it'll invoke the kernel through a system call, and 
        pass in the desired address -> this address is how the process will see the page
        it wants -> i.e. this is the *virtual address*
      - Right now, INT_SYS_PAGE_ALLOC just attempts to assign the physical page at
        that address -> va = pa all the time, which really constrains how we can allocate
        memory
        - Because giving processes the illusion they have a big, contiguous block of memory
          will require *actually* allocating a big, contiguous block of physical memory
    - Instead, we rewrite INT_SYS_PAGE_ALLOC to find a physical page - *any* physical page
      in user memory - and just map the desired va to that pa using virtual_memory_map
    - This is a deceptively simple step for what it accomplishes - since we've already gone 
      to the trouble of implementing per-process page tables, this actual step just takes a 
      few lines of code to decouple VAs from PAs and suddenly memory looks very different
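
    - A toy model of the rewritten allocation path: pick any free physical page, record
      ownership, then map the requested virtual page to it. The flat per-process map and
      all toy_ names are assumptions for illustration; in the lab the mapping step goes
      through virtual_memory_map and the bookkeeping through pageinfo.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NPHYS 4          // assumption: tiny physical memory
#define TOY_NVIRT 8          // assumption: virtual space larger than physical
#define TOY_UNMAPPED (-1)

typedef struct toy_proc {
    int pid;
    int map[TOY_NVIRT];      // virtual page number -> physical page number
} toy_proc;

int8_t toy_owner[TOY_NPHYS]; // 0 = free, otherwise the owning pid

void toy_proc_init(toy_proc* p, int pid) {
    p->pid = pid;
    for (int vpn = 0; vpn < TOY_NVIRT; ++vpn) {
        p->map[vpn] = TOY_UNMAPPED;
    }
}

// Serve a request for virtual page `vpn` with *any* free physical page.
// Returns 0 on success, -1 if the request is invalid or memory is exhausted.
int toy_sys_page_alloc(toy_proc* p, int vpn) {
    assert(p && p->pid > 0);
    if (vpn < 0 || vpn >= TOY_NVIRT || p->map[vpn] != TOY_UNMAPPED) {
        return -1;
    }
    for (int ppn = 0; ppn < TOY_NPHYS; ++ppn) {
        if (toy_owner[ppn] == 0) {
            toy_owner[ppn] = (int8_t) p->pid;   // step 1: bookkeeping
            p->map[vpn] = ppn;                  // step 2: mapping
            return 0;
        }
    }
    return -1;
}
```

    - Two processes can ask for the same virtual page, even one numbered past the end of
      physical memory, and each gets a different physical page behind it.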

Step 4: assign overlapping virtual address spaces
    - Until now, we have worked with processes that have a pre-defined set of addresses
      assigned to them as their block, and that only ever request addresses in that block
      - Recall p-allocator.c
    - This was helpful when we were trying to assign the exact physical page requested
      every time, but not needed anymore -> processes can now request the same virtual 
      addresses as one another, or even request addresses beyond the limit of physical
      memory (but within the limit of virtual memory) because our procedure for finding
      a physical page to satisfy a request is completely disconnected from the virtual 
      address requested, which lets us make really good use of all of our physical memory 
      while keeping processes very happy
    - All this takes is throwing away our previously set limits and setting each process' 
      stack to start from the very end of virtual memory (stacks are only one page long, so 
      the effect is that the last page of VM is the stack, and everything in between the end 
      of program data/code and that last page is fair game for heap allocations -> a lot of 
      space!)

Step 5: implement fork()
    - What exactly does fork() need to do?
      - Create a new process that is essentially identical to the parent (forking) process
      - From the *system's perspective* - the child is a different process, gets a different 
        pid and proc struct
      - From the *parent's perspective* - business as usual, fork() returns the child's pid
        and execution continues
      - From the *child's perspective* - the child is a near-exact copy of the parent, with
        two minor deviations
        - What does it mean to be a near-exact copy? 
          - In terms of execution, we really care about the view of memory, the values in 
            the registers, and other stuff like file descriptors (that this lab won't address)
          - Convince yourself that if I give two processes the same view of memory, registers,
            and environment, they should execute identically
        - So, what are the deviations?
          - fork() returns different values in the parent and child
            - In the parent, fork() returns the child's pid
            - In the child, fork() returns 0
            - How do we achieve this?
              - fork() executes in the kernel, which is helpful in two ways: neither process
                is actually running, so we only need to deal with saved registers in memory
                rather than with active registers + we have a sort of "bird's eye" view of
                both processes and can easily manipulate their data
          - *View* of memory is the same, but parent and child should not write to the same
            physical pages -> anything writable should be on separate physical pages, but 
            look the same to the parent and child (i.e. have the same VA in their pagetables)
            - Remember that, at least immediately after the fork, the parent and child
              execute the same code -> both will execute instructions that reference the same
              virtual addresses
            - For anything read-only, separation is not needed!
            - Think about something like Brightspace (as an analogy, not an example) 
              - Things like my course content and announcements are all read-only, totally 
                fine to maintain one copy that everyone reads from
              - Things like my assignment submissions -> I navigate to these the same way
                as anyone else would, but uploading my assignment would of course have to 
                write to a different folder than what my classmates write to, which is the 
                same idea as above!
              - More concretely, when I write to some variable in my code, that variable
                is compiled down to some memory reference -> I need an instruction like
                "move 5 to 0x10000" to behave identically in the parent and child, while
                not *actually* writing to the same physical location


5. WeensyOS
    - Ideally, in doing this lab, you should attempt to understand as many of the moving
      parts of WeensyOS as possible, because it'll help you understand OS as a whole better
      - Some selected points below, but I'd encourage you to read more of the kernel code

    - Scheduler
      - Pretty simple round-robin, just cycles through the process table and runs the next
        runnable process it finds
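
      - A toy version of the round-robin pick. The array of runnable flags, the table
        size, and the choice to start searching just after the current pid are assumptions
        for this sketch; check the kernel's scheduler code for the real details.

```c
#include <assert.h>

#define TOY_NPROC 4   // assumption: size of the toy process table

// Starting just after `current`, return the first runnable pid, wrapping
// around the table. Returns -1 if nothing is runnable.
int toy_schedule(const int* runnable, int current) {
    assert(runnable && current >= 0 && current < TOY_NPROC);
    for (int step = 1; step <= TOY_NPROC; ++step) {
        int pid = (current + step) % TOY_NPROC;
        if (runnable[pid]) {
            return pid;
        }
    }
    return -1;
}
```

      - Searching from current + 1 rather than from 0 is what makes this round-robin:
        the process that just ran is considered last.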

    - p-fork.c
      - Kernel sets up a single process, and the process calls fork (the one you implement!)
        two times
      - How many processes should I end up with? Why?
         - 4, because fork is first called only in the parent, then called in both the parent
           and child, for 3 total forks -> 1 parent + 3 new children = 4 processes

    - process_setup
      - Call process_init to initialize certain special-purpose registers
      - Set the pagetable
      - Load code/data
      - Tell the process where its block of process memory is
      - Give the process a stack top, map it into the page table
      - Set as runnable

    - run
      - Make sure the target process is indeed runnable
      - Set the value of 'current' so the kernel knows what pid is active
      - Set pagetable
      - Set registers


6. Q+A