CS202 Review Session 4

Notes from [Xiangyu Gao](https://xiangyug.github.io/), TA from fall 2021
Edited by Khanh Nguyen, TA Spring 2022
Edited by Charlie Chen, TA Spring 2023
Edited by Sam Frank, TA Spring 2024

1. Background Knowledge
2. Lab 4 overview
3. Tips
4. Q&A

---------------------------------------------------------------------

1. Background Knowledge

**Virtual memory** is a component of the operating system that lets the OS safely run multiple applications on top of the same physical memory. Each process gets its own virtual address space, and its virtual addresses are mapped to specific physical addresses. This gives the process the illusion of a contiguous memory space in which only its own data exists.

**Kernel** is the core program of the OS. It runs with full machine privilege to manage processes and their virtual memory. Its main goals are to share machine resources among processes and to provide convenient, safe access to hardware while protecting the OS from malicious programs.

In Lab 4, you will write a mini OS, WeensyOS, that implements the virtual memory architecture and a few important system calls. The OS supports 3MB of virtual memory on top of 2MB of physical memory. Recall the point of virtualization: from its own perspective, each process has 3MB of memory, even though the machine has only 2MB of physical memory.

Q: Assume the page size is 4KB and each page table entry is 64 bits. How do we support 3MB of virtual memory? How many L4 pagetables do we need?
A: 2 L4 page tables. A 4KB page table with 64-bit (8-byte) entries holds 4096 / 8 = 512 entries, so one L4 page table maps 512 x 4KB = 2MB. Covering 3MB therefore takes 2 L4 page tables.

2. Lab 4 overview

Goal: Build a mini OS, called WeensyOS. We focus on building the kernel and seeing how it manages virtual memory. We will implement the ability to allocate memory and to fork. fork() is a system call that duplicates a process, including its virtual memory. For extra credit, you can work on freeing memory and sharing read-only memory.

a. WeensyOS data structures

- Pageinfo array: an array with one entry per physical page. Each entry is a struct of type physical_pageinfo, which holds the page's owner and reference count.
```kernel.c
typedef struct physical_pageinfo {
    int8_t owner;
    int8_t refcount;
} physical_pageinfo;

static physical_pageinfo pageinfo[PAGENUMBER(MEMSIZE_PHYSICAL)];
```

- processes array: an array of struct proc, indexed by pid. Each entry is a process control block. The struct contains the process ID, the process's state, its registers, and its pagetable.

```kernel.h
typedef struct proc {
    pid_t p_pid;                    // process ID
    x86_64_registers p_registers;   // process's current registers
    procstate_t p_state;            // process state (see above)
    x86_64_pagetable* p_pagetable;  // process's page table
} proc;
```

b. MACROS

- There are lots of helpful macros in lab 4. Read about them in the lab writeup and get familiar with them. They come in handy when converting VA -> PA, PA -> index, etc.

WeensyOS begins with the kernel and all processes sharing a single address space, which is defined by kernel_pagetable.

Examples [see handout]

The kernel's pagetable is identity-mapped: virtual address X maps to physical address X.

As you work through the project, you will shift processes to independent address spaces, where each process can access only a subset of physical memory.

Most of the work takes place in kernel.c, but it helps to understand some functions defined in other files. The general rule of thumb is:
k-*.c/.h defines kernel-related things.
p-*.c/.h defines process-related things.

Throughout the lab, you will need to use these functions, so it's best to understand how they work at a high level:

[code walkthrough]

- virtual_memory_lookup: look up the physical page that a virtual address maps to in the given pagetable.
- virtual_memory_map: map virtual address -> physical address in the provided pagetable. If an allocator function is provided, it will allocate L4 pagetables on demand.
- assign_physical_page: assign the owner to the provided physical page. Don't modify this directly.

Step 1: Kernel Isolation

Initially, an application is allowed to access all of the memory.
Any process can modify any part of physical memory, including memory that belongs to the kernel. Our job is therefore to add isolation to the kernel page table so that processes cannot access kernel memory. The lab writeup is enough to get this working.

Step 2:

After step 1, processes can no longer access kernel memory. But they still share the same address space, because they all use the kernel's pagetable.

```
processes[pid].p_pagetable = kernel_pagetable;
```

In this step, we will give each process its own independent page table through which it accesses its own pages.

High level logic:
- Allocate a new physical page for process pid. Zero out the page (Hint: memset)
- Iterate through each entry, and use virtual_memory_lookup and virtual_memory_map with an allocator function to map the addresses accordingly.

Note: the owner and refcount of a physical page need to be set properly (Hint: Use assign_physical_page)

3. Tips

- Read the instructions (maybe multiple times)!
- Have a mental image in mind
    - Just as we did in the handout
- Read the descriptions of the functions
    - Header files (*.h) are helpful
    - Use shortcuts to go to the definitions of variables/functions
- Don't use ChatGPT
    - It writes buggy code. If you are not familiar with the concepts, it's hard to spot the errors in the code. It may be harmful to your understanding.
    - If you are thinking about how to "train" ChatGPT to work, you have less brain power left to understand and think deeply through the lab.
- Think and ask yourself
    - What data structure am I looking for?
    - What functions/MACROS can I use?
    - What do I aim to implement?
- Patience and focus
    - Try to set aside at least one hour whenever you work on the lab. One focused hour is more efficient than two half-hour sessions.
    - If you get stuck, don't be frustrated. It happens! Maybe come back later and it will become clearer (remember to come back though).

Q: Do we need to write our own allocator?
A: Yes, you do.
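
To make the idea of an allocator concrete, here is a minimal sketch of one over a toy version of the pageinfo array. It is not the WeensyOS code or the lab solution: every name prefixed with toy_ or TOY_ is made up for this example, and the sizes are assumptions (2MB of physical memory in 4KB pages). In the lab itself, the hint above is to set owner and refcount through assign_physical_page rather than by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Toy model only: these names and sizes are assumptions, not the lab's.
#define TOY_PAGESIZE 4096
#define TOY_NPAGES   512   // 2MB of physical memory / 4KB per page
#define TOY_FREE     0     // hypothetical "no owner" marker

typedef struct toy_pageinfo {
    int8_t owner;     // which process (or the kernel) owns this page
    int8_t refcount;  // how many mappings refer to this page
} toy_pageinfo;

// Zero-initialized, so every page starts out unowned and unreferenced.
static toy_pageinfo toy_pages[TOY_NPAGES];

// Scan for a page that nobody references, claim it for `owner`, and
// return its physical address. Returns -1 when every page is in use.
static long toy_alloc_page(int8_t owner) {
    assert(owner != TOY_FREE);
    for (size_t pn = 0; pn < TOY_NPAGES; ++pn) {
        if (toy_pages[pn].refcount == 0) {
            toy_pages[pn].owner = owner;
            toy_pages[pn].refcount = 1;
            return (long) (pn * TOY_PAGESIZE);
        }
    }
    return -1;
}
```

Calling toy_alloc_page(1) and then toy_alloc_page(2) on a fresh array hands out physical addresses 0 and 4096. A first-fit linear scan is the simplest policy; it is slow on large memories, but with only 512 pages that hardly matters here.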
Q: I followed the code, but I can't see where alloc or fork is handled in the kernel. Where can I find them?
A: The handler is defined in the exception() function in kernel.c
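
As a rough picture of what "handled in exception()" means, the sketch below shows the general pattern of a single trap entry point that dispatches on an interrupt number. It is a toy model under stated assumptions, not the lab's kernel.c: the interrupt numbers, the toy_registers struct, and both handler functions are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical interrupt numbers for this sketch; the lab defines its own.
enum { TOY_INT_PAGE_ALLOC = 1, TOY_INT_FORK = 2 };

typedef struct toy_registers {
    uint64_t intno;  // which system call trapped into the kernel
    uint64_t arg;    // first system call argument
    uint64_t ret;    // value handed back to the calling process
} toy_registers;

// Stand-in for a page allocation handler: accept only page-aligned addresses.
static uint64_t toy_sys_page_alloc(uint64_t addr) {
    return (addr % 4096 == 0) ? 0 : (uint64_t) -1;
}

// Stand-in for a fork handler: pretend the child always gets pid 2.
static uint64_t toy_sys_fork(void) {
    return 2;
}

// One entry point receives every trap and dispatches on the interrupt number.
static void toy_exception(toy_registers* reg) {
    assert(reg != NULL);
    switch (reg->intno) {
    case TOY_INT_PAGE_ALLOC:
        reg->ret = toy_sys_page_alloc(reg->arg);
        break;
    case TOY_INT_FORK:
        reg->ret = toy_sys_fork();
        break;
    default:
        reg->ret = (uint64_t) -1;  // unknown system call
        break;
    }
}
```

When reading kernel.c, look for the corresponding switch inside exception() and find the cases for page allocation and fork; that is where the system call's work starts.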