CS202 Review Session 5

Notes from [Xiangyu Gao](https://xiangyug.github.io/), TA from Fall 2021
Edited by Khanh Nguyen, TA Spring 2022
Edited by Charlie Chen, TA Spring 2023
Edited by Sam Frank, TA Spring 2024
Edited by Alex Liu, TA Fall 2024
Edited by Yash Pazhianur, TA Spring 2025

1. Background Knowledge
2. Lab 4 Intro
3. Lab 4 Overview
4. Tips
5. Q&A

---------------------------------------------------------------------

1. Background Knowledge

Does everybody understand conceptually how virtual memory works?

Why virtual memory:
- So the OS can safely run multiple processes using the same physical memory.
- It gives each process the illusion of a contiguous memory space where only its own data exists.
- Each process gets its own virtual address space.
- Virtual addresses map to specific physical addresses.

What is the kernel:
- The core program of the OS
- Runs with full machine privilege
- Manages processes and their virtual memory

Main goals:
- Share machine resources among processes
- Provide convenient and safe access to hardware while protecting the OS from malicious programs

2. Lab 4 Intro

In Lab 4, you will:
- Write a mini OS (WeensyOS)
- Implement its virtual memory architecture and a few important system calls

The OS supports 3MB of virtual memory on top of 2MB of physical memory. We will implement the ability to allocate memory and fork. For extra credit, you can work on freeing memory and sharing read-only memory.

3. Lab 4 Overview

a. WeensyOS data structures

- pageinfo array: each entry is a struct of type physical_pageinfo.

```c
// kernel.c
typedef struct physical_pageinfo {
    int8_t owner;       // who owns this physical page (see values below)
    int8_t refcount;    // how many references exist to this page
} physical_pageinfo;

static physical_pageinfo pageinfo[PAGENUMBER(MEMSIZE_PHYSICAL)];  // one entry per physical page
```

  Owner values: FREE, RESERVED, KERNEL, or a user process (identified by its PID).

- processes array: an array of process descriptors, indexed by pid.

```c
// kernel.h
typedef struct proc {
    pid_t p_pid;                     // process ID
    x86_64_registers p_registers;    // saved registers
    procstate_t p_state;             // process state (see values below)
    x86_64_pagetable* p_pagetable;   // this process's page table
} proc;

extern proc processes[NPROC];        // indexed by pid
```

  procstate_t values: FREE, RUNNABLE, BLOCKED, BROKEN.

b. MACROS

- There are lots of helpful macros in Lab 4.
  Read them in the lab writeup and get familiar with them. They come in handy when converting between VA -> PA, PA -> page index, etc.

x86-64.h:

    #define PAGESIZE            (1 << PAGEOFFBITS)
    #define NPAGETABLEENTRIES   (1 << PAGEINDEXBITS)
    #define PAGENUMBER(ptr)     ((int) ((uintptr_t) (ptr) >> PAGEOFFBITS))
    #define PAGEADDRESS(pn)     ((uintptr_t) (pn) << PAGEOFFBITS)
    #define PTE_P 1             // present
    #define PTE_W 2             // writable
    #define PTE_U 4             // user-accessible

kernel.h:

    #define NPROC               16
    #define PROC_START_ADDR     0x100000
    #define MEMSIZE_PHYSICAL    0x200000
    #define MEMSIZE_VIRTUAL     0x300000
    #define NPAGES              (MEMSIZE_PHYSICAL / PAGESIZE)

WeensyOS begins with the kernel and all processes sharing a single address space, defined by kernel_pagetable. The kernel's page table is identity-mapped: virtual address X maps to physical address X. As you work through the project, you will move processes to independent address spaces, where each process can access only a subset of physical memory.

Most of the work takes place in kernel.c, but it's helpful to understand some functions defined in other files. The general rule of thumb:
- k-*.c/.h define kernel-related things.
- p-*.c/.h define process-related things.

You will need these functions throughout the lab, so it's best to understand how they work at a high level. [share screen - code]
- virtual_memory_lookup: look up the physical page that a virtual address maps to in a given page table.
- virtual_memory_map: map a virtual address to a physical address in the given page table. If you pass an allocator function, it uses it to allocate L4 page tables on demand.
- assign_physical_page: assign an owner to the given physical page (rather than setting the owner directly).

[share screen - gifs]

Step 1: Kernel Isolation
Initially, all processes can access and modify all memory, even kernel memory. Your job: modify the kernel page table to add isolation so user processes can't access kernel pages. This part is straightforward; just follow the lab instructions.

Step 2: Now processes can no longer access kernel memory.
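To make "can no longer access kernel memory" concrete: roughly, Step 1 leaves PTE_U off the kernel's mappings, and the hardware then refuses user-mode accesses to those pages. Below is a minimal toy model of that check, assuming a single page-table entry (real x86-64 hardware combines the flags from every level of the page-table walk). user_access_allowed is a hypothetical helper for illustration, not a WeensyOS function.

```c
#include <stdbool.h>
#include <stdint.h>

// Flag values as in x86-64.h.
#define PTE_P 1  // present
#define PTE_W 2  // writable
#define PTE_U 4  // user-accessible

// Hypothetical helper (not part of WeensyOS): a toy model of the check the
// MMU performs when user-mode code touches a page. It looks at one entry
// only; real hardware checks the flags at every level of the walk.
bool user_access_allowed(uint64_t pte, bool is_write) {
    if (!(pte & PTE_P)) {
        return false;  // page not mapped: page fault
    }
    if (!(pte & PTE_U)) {
        return false;  // kernel-only page: fault in user mode
    }
    if (is_write && !(pte & PTE_W)) {
        return false;  // read-only page: writes fault
    }
    return true;
}
```

For example, a kernel page mapped with PTE_P | PTE_W is rejected even for a user read, while a process's own page mapped with PTE_P | PTE_W | PTE_U allows both reads and writes.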
However, they still share the kernel page table, so they share the same address space. In this step, you will give each process its own independent page table, through which it accesses only its own pages.

High-level logic:
- Allocate a new physical page for process pid's page table and zero it out (Hint: memset).
- Iterate through each entry, using virtual_memory_lookup and virtual_memory_map (with an allocator function) to map the addresses accordingly.
- Note: the "owner" and "refcount" of each physical page need to be set properly (Hint: use assign_physical_page).

Step 3: So far, WeensyOS processes use physical page allocation for process memory: a process asks for virtual page X, and we use identity mapping to find and map the same physical page X. This is extremely inflexible; a process shouldn't care which physical page it gets. We now "automate" the process of finding a free physical page, then use virtual_memory_map to map the requested virtual address to the newfound physical page.

A common mistake is not checking the validity of the requested memory. A process can request a bad memory address, so remember to do a sanity check.

Step 4: Processes are isolated, but they are not taking full advantage of virtual memory. Isolated processes don't have to use disjoint virtual addresses: each process can use the same virtual address for different physical memory.

For instance (what we have now):

    Process 1: VA 500 -> PA 1000
    Process 2: VA 600 -> PA 2000

VA 500 is free in Process 2, but Process 2 doesn't use it to map PA 2000 because Process 1 already uses VA 500. Since each process has its own page table, this restriction is unnecessary; each process should take advantage of its own page table.

In this step, we change things so that the stack page is mapped at the same virtual address in every process's page table: each process's stack grows down from 0x300000 (== MEMSIZE_VIRTUAL).

Step 5: Fork()

The fork system call creates a new child process by duplicating the calling parent process.
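For intuition about what "duplicating the calling process" means, here is a sketch of fork on an ordinary POSIX system such as Linux (not WeensyOS, whose fork you write yourself). The helper fork_copies_memory is hypothetical: it checks that a write made in the child does not show up in the parent, because the child runs on its own copy of the parent's memory. Linux implements this copy lazily (copy-on-write), but the observable effect is the same as copying.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: returns 1 if the child's write was invisible to the
// parent, 0 if it was visible, and -1 on error.
int fork_copies_memory(void) {
    int value = 1;           // lives in this process's memory
    pid_t pid = fork();
    if (pid < 0) {
        return -1;           // fork failed
    }
    if (pid == 0) {
        // Child: fork returned 0. Change the child's copy of `value`
        // and report it to the parent through the exit status.
        value = 42;
        _exit(value);        // _exit skips stdio flushing in the child
    }
    // Parent: fork returned the child's pid.
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) {
        return -1;
    }
    // The child saw 42, but the parent's copy of `value` is still 1.
    return WEXITSTATUS(status) == 42 && value == 1;
}
```

Calling fork_copies_memory() returns 1: the child exits with status 42, while the parent's value is unchanged.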
The fork system call appears to return twice, once in each process: it returns 0 to the child and the child's pid to the parent.

High-level logic:
- Copy the parent process's page table into a new page table for the child.
  - Map using virtual_memory_lookup and virtual_memory_map.
  - If a page is writable, allocate a new physical page, copy the memory's contents into that new page (Hint: memcpy), and map it in the child's page table.
  - Set the owner and refcount correctly.

4. Tips

- Start on time, which is much, much earlier than you think!!! This lab takes much longer than you might realize.
- Read the instructions (maybe multiple times)!
- Keep mental images in mind
  - Just like what we have done in the handout
- Read the descriptions of the functions
  - Header files (*.h) are helpful
  - Use shortcuts to go to the definitions of variables/functions
- Don't use ChatGPT
  - It writes buggy code. If you are not familiar with the concepts, it's hard to spot the errors in the code, and it may harm your understanding.
  - If you are thinking about how to "train" ChatGPT to work, you have less brain power to understand and think deeply through the lab.
- Think and ask yourself
  - What data structure am I looking for?
  - What functions/macros can I use?
  - What do I aim to implement?
- Patience and focus
  - Try to allocate at least one hour whenever you work on the lab. A one-hour grind is more efficient than two half-hour intervals.
  - If you are stuck, don't be frustrated. It happens! Come back later and it may become clearer (remember to come back, though).

5. Q&A

Q: Do we need to write our own allocator?
A: Yes, you do.

Q: I followed the code, but I can't see where alloc or fork is handled in the kernel. Where can I find them?
A: They are handled in the exception() function in kernel.c.
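On the allocator question: one simple way to "automate" finding a free physical page (Step 3) is a linear scan over a page-info array. The sketch below is a self-contained toy model, not WeensyOS code. The toy_pageinfo type, the toy_pages array, the TOY_* constants, and toy_alloc_page are all hypothetical, and it assumes 4096-byte pages, 2MB of physical memory, and 0 as the "free" owner value.

```c
#include <stdint.h>

// Toy constants (assumptions): 4096-byte pages, 2MB of physical memory.
#define TOY_PAGESIZE  4096
#define TOY_MEMSIZE   0x200000
#define TOY_NPAGES    (TOY_MEMSIZE / TOY_PAGESIZE)  // 512 pages
#define TOY_FREE      0   // assumed owner value for an unused page

typedef struct {
    int8_t owner;     // TOY_FREE, or the (positive) pid of the owner
    int8_t refcount;  // number of mappings that use this page
} toy_pageinfo;

// Zero-initialized, so every toy page starts out free.
static toy_pageinfo toy_pages[TOY_NPAGES];

// Hypothetical allocator: find the first free physical page, give it to
// `pid`, and return its physical address, or (uintptr_t) -1 if none is free.
uintptr_t toy_alloc_page(int8_t pid) {
    for (int pn = 0; pn < TOY_NPAGES; ++pn) {
        if (toy_pages[pn].owner == TOY_FREE) {
            toy_pages[pn].owner = pid;
            toy_pages[pn].refcount = 1;
            return (uintptr_t) pn * TOY_PAGESIZE;  // page number -> address
        }
    }
    return (uintptr_t) -1;  // out of physical memory
}
```

In the lab itself, the page you find would then be mapped at the requested virtual address with virtual_memory_map, and the request should be sanity-checked first, as Step 3 describes.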