CS202 Review Session 6

Notes from [Xiangyu Gao](https://xiangyug.github.io/), TA from Fall 2021
Edited by Khanh Nguyen, TA Spring 2022
Edited by Charlie Chen, TA Spring 2023
Edited by Sophia Watts, TA Spring 2024
Edited by Saeed Bafarat, TA Fall 2024

1. Lab 4 overview
2. Tips
3. Q&A

1. Lab 4 overview

Recap of the main points of the previous review session:

Goal: Build a mini OS called WeensyOS. We focus on building the kernel and seeing how it manages virtual memory. We will implement the ability to allocate memory and to fork. fork() is a system call that creates a new process by duplicating the calling process's virtual memory. For extra credit, you can implement freeing memory and sharing read-only memory.

a. WeensyOS data structures
- pageinfo array: an array with one entry per physical page. Each entry is a struct of type physical_pageinfo, which records the page's owner and reference count.
- processes array: an array of process structs, indexed by pid. Each entry is a process control block containing the process ID, the process's state, its registers, and its page table.

b. MACROS
- Lab 4 has lots of helpful macros. Read about them in the lab writeup and get familiar with them; they come in handy when converting VA -> PA, PA -> page index, etc.

WeensyOS begins with the kernel and all processes sharing a single address space, defined by kernel_pagetable. The kernel's page table is identity-mapped: virtual address X maps to physical address X. As you work through the project, you will move processes to independent address spaces in which each process can access only a subset of physical memory.

Most of the work takes place in kernel.c, but it helps to understand some functions defined in other files. The general rule of thumb: k-*.c/.h files define kernel-related things, and p-*.c/.h files define process-related things. You will use the following functions throughout the lab, so it's best to understand at a high level how they work:
- virtual_memory_lookup: look up the physical page that a virtual address maps to in a given page table.
- virtual_memory_map: map a virtual address to a physical address in the given page table. If you pass an allocator function, it allocates missing page-table pages (e.g., L4 page tables) on demand.
- assign_physical_page: assign an owner to the given physical page. Use it rather than setting the owner directly.

Step 1: Kernel Isolation
Initially, an application can access all of memory: any process can modify any part of physical memory, including memory that belongs to the kernel. Our job is to add isolation to the kernel page table so that processes cannot access kernel memory. The lab writeup is enough to get this working. Look at the virtual_memory_map function and the arguments it requires to understand how to use it.

Step 2:
After step 1, processes can no longer access kernel memory, but they still share one address space, because every process uses the kernel's page table:
```
processes[pid].p_pagetable = kernel_pagetable;
```
In this step, we give each process its own independent page table that maps its own pages.

High-level logic:
- Allocate a new physical page for process pid's page table and zero it out (Hint: memset).
- Iterate through each entry, using virtual_memory_lookup and virtual_memory_map (with an allocator function) to map the addresses accordingly.

To do this you need:
- An allocator function that looks through the pageinfo array for a free page, then claims it using assign_physical_page.
- A copy_pagetable function that allocates a new page using your allocator and then copies the mappings using virtual_memory_lookup and virtual_memory_map.

Note: the owner and refcount of each physical page must be set correctly (Hint: use assign_physical_page).

Step 3:
So far, WeensyOS has allocated process memory by physical address: when a process asks for virtual page X, we use identity mapping and give it physical page X. However, this is not necessary.
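To make the contrast with identity mapping concrete, here is a minimal toy model in C. It is not WeensyOS code: every name in it (toy_pageinfo, toy_pagetable, toy_find_free_page, toy_page_alloc, TOY_NVPAGES, and so on) is hypothetical, physical memory is reduced to an array of owner/refcount records, and a page table is a flat array from virtual page number to physical page number. The sketch rejects out-of-range requests, then backs the requested virtual page with the first free physical page rather than the page with the same number.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model for illustration only: every name here is hypothetical,
   NOT part of the WeensyOS API. */
#define TOY_NPAGES  8    /* physical pages in the toy machine */
#define TOY_NVPAGES 8    /* virtual pages per process (the role MEMSIZE_VIRTUAL plays) */
#define TOY_KERNEL  (-1) /* owner id for kernel-reserved pages */

typedef struct {
    int8_t owner;    /* owning pid, TOY_KERNEL, or 0 when free */
    int8_t refcount; /* how many mappings point at this page; 0 = free */
} toy_pageinfo;

/* A toy page table: virtual page number -> physical page number (-1 = unmapped). */
typedef struct {
    int map[TOY_NVPAGES];
} toy_pagetable;

static toy_pageinfo pageinfo[TOY_NPAGES];

/* Mark every physical page free, then reserve physical page 0 for the kernel. */
static void toy_init(void) {
    memset(pageinfo, 0, sizeof pageinfo);
    pageinfo[0].owner = TOY_KERNEL;
    pageinfo[0].refcount = 1;
}

/* Start a process with an empty page table. */
static void toy_pagetable_init(toy_pagetable *pt) {
    for (int v = 0; v < TOY_NVPAGES; v++) {
        pt->map[v] = -1;
    }
}

/* Allocator: claim the first free physical page for `owner`; -1 if memory is full. */
static int toy_find_free_page(int8_t owner) {
    for (int pn = 0; pn < TOY_NPAGES; pn++) {
        if (pageinfo[pn].refcount == 0) {
            pageinfo[pn].owner = owner;
            pageinfo[pn].refcount = 1;
            return pn;
        }
    }
    return -1;
}

/* Step 3 idea: back virtual page `vpn` with ANY free physical page,
   instead of the identity choice "physical page vpn". */
static int toy_page_alloc(toy_pagetable *pt, int8_t pid, int vpn) {
    if (vpn < 0 || vpn >= TOY_NVPAGES) {
        return -1; /* sanity check: reject out-of-range requests before mapping */
    }
    int pn = toy_find_free_page(pid);
    if (pn < 0) {
        return -1; /* out of physical memory */
    }
    pt->map[vpn] = pn; /* record the mapping vpn -> pn */
    return 0;
}
```

For example, after toy_init(), if two processes each request virtual page 5, the first gets physical page 1 and the second gets physical page 2 (page 0 is reserved for the kernel). The same virtual page number ends up backed by different physical pages, which is the freedom that separate page tables provide.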
Given its own page table, a process doesn't have to care which physical page it gets. If you did step 2 correctly, the change for step 3 is straightforward: find a free physical page, much as your step 2 allocator does, then use virtual_memory_map to map the requested virtual address to that newly found physical page.

A common mistake is not checking the validity of the requested address. A process can request a bad memory address, so remember to do a sanity check. Example: suppose a process requests to map virtual page 0x400000. Before mapping, first check whether it falls within MEMSIZE_VIRTUAL; otherwise the mapping is invalid.

Step 4:
Processes are isolated, but they are not taking full advantage of virtual memory. Isolated processes don't need disjoint virtual addresses: each process can use the same virtual address for different physical memory. For instance (made-up addresses):
- Process 1: VA 500 -> PA 1000
- Process 2: VA 600 -> PA 2000

VA 500 is free in process 2, but process 2 doesn't use it to map PA 2000 because process 1 already uses VA 500. Since each process has its own page table, this restriction is unnecessary; each process should take advantage of its own page table. In this step, we change things so that the stack page is mapped at the same virtual address in each process's page table. This is done in process_setup.

Step 5: fork()
The fork system call creates a new child process by duplicating the calling (parent) process. fork appears to return twice, once in each process: it returns 0 to the child process and the child's pid to the parent process.

High-level logic:
- Copy the parent process's page table into a new page table for the child.
- Map using virtual_memory_lookup and virtual_memory_map.
- Allocate a new physical page if that page is writable.
- For each writable page, copy the contents of the parent's page into the child's new physical page (Hint: memcpy).
- Copy the parent's registers.
- Set the owner and refcount correctly.
- Remember that fork returns twice: once for the child process and once for the parent.

2. Tips
- Start on time, which means much, much earlier than you think! This lab takes much longer than you might expect.
- Read the instructions (maybe multiple times)!
- The lab doesn't require many lines of code, so if you are writing a lot, you might be overcomplicating it.
- Keep a mental picture of what is going on, just as we did in the handout.
- Read the descriptions of the functions
  - Header files (*.h) are helpful.
  - Use your editor's shortcuts to jump to the definitions of variables/functions.
- Don't use ChatGPT
  - It writes buggy code. If you are not familiar with the concepts, it's hard to spot the errors, and it may harm your understanding.
  - If you are thinking about how to "train" ChatGPT to work, you have less brain power left to understand and think deeply through the lab.
- Think and ask yourself:
  - What data structure am I looking for?
  - What functions/MACROS can I use?
  - What am I trying to implement?
- Patience and focus
  - Try to set aside at least one hour whenever you work on the lab. One hour of focused work is more efficient than two half-hour sessions.
  - If you get stuck, don't be frustrated. It happens! Come back later and it may become clearer (just remember to come back).

3. Q&A
Q: Do we need to write our own allocator?
A: Yes, you do.

Q: I follow the code, but I can't see where alloc or fork is handled in the kernel. Where can I find them?
A: The handlers are in the exception() function in kernel.c.

Q: What should we consider when copying the parent's page table for fork()?
A: Only allocate new physical pages for writable pages, so that read-only segments are not duplicated.
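The fork() steps from Step 5 and the last answer above can be sketched in the same toy style. Again, this is a hypothetical, self-contained model, not the WeensyOS API: toy_proc, toy_pte, physmem, toy_fork, and the rax field (a stand-in for the saved return-value register) are all made up for illustration. Writable pages get a fresh physical page with the parent's bytes copied over; read-only pages are shared and their refcount is bumped; the child's saved return value is set to 0 and the parent's to the child's pid.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model for illustration only: every name here is hypothetical,
   NOT part of the WeensyOS API. */
#define TOY_NPAGES   8
#define TOY_NVPAGES  4
#define TOY_PAGESIZE 16 /* tiny pages keep the example small */
#define TOY_NPROC    4

typedef struct {
    int8_t owner;
    int8_t refcount; /* 0 = free */
} toy_pageinfo;

typedef struct {
    int pn;       /* physical page number, -1 = unmapped */
    int writable; /* 1 if the process may write this page */
} toy_pte;

typedef struct {
    int in_use;
    int pid;
    long rax;                /* stand-in for the saved return-value register */
    toy_pte pt[TOY_NVPAGES]; /* this process's own page table */
} toy_proc;

static toy_pageinfo pageinfo[TOY_NPAGES];
static unsigned char physmem[TOY_NPAGES][TOY_PAGESIZE];
static toy_proc procs[TOY_NPROC];

/* Allocator: claim the first free physical page for `owner`; -1 if memory is full. */
static int toy_find_free_page(int8_t owner) {
    for (int pn = 0; pn < TOY_NPAGES; pn++) {
        if (pageinfo[pn].refcount == 0) {
            pageinfo[pn].owner = owner;
            pageinfo[pn].refcount = 1;
            return pn;
        }
    }
    return -1;
}

/* Reset the machine and create parent pid 1 with a read-only page (vpn 0)
   and a writable page (vpn 1) that holds "hello". */
static void toy_init(void) {
    memset(pageinfo, 0, sizeof pageinfo);
    memset(physmem, 0, sizeof physmem);
    memset(procs, 0, sizeof procs);
    toy_proc *p = &procs[1];
    p->in_use = 1;
    p->pid = 1;
    for (int v = 0; v < TOY_NVPAGES; v++) {
        p->pt[v].pn = -1;
    }
    p->pt[0] = (toy_pte){ toy_find_free_page(1), 0 };
    p->pt[1] = (toy_pte){ toy_find_free_page(1), 1 };
    strcpy((char *) physmem[p->pt[1].pn], "hello");
}

/* fork sketch: returns the child's pid, or -1 on failure. */
static int toy_fork(int parent_pid) {
    int child_pid = -1;
    for (int i = 1; i < TOY_NPROC; i++) { /* find a free process slot */
        if (!procs[i].in_use) { child_pid = i; break; }
    }
    if (child_pid < 0) return -1;

    toy_proc *p = &procs[parent_pid];
    toy_proc *c = &procs[child_pid];
    *c = *p;            /* copies the "registers" and page-table entries */
    c->pid = child_pid;

    for (int v = 0; v < TOY_NVPAGES; v++) {
        int pn = p->pt[v].pn;
        if (pn < 0) continue; /* unmapped */
        if (p->pt[v].writable) {
            /* writable: child gets its own physical page with a copy of the data */
            int npn = toy_find_free_page((int8_t) child_pid);
            if (npn < 0) {
                c->in_use = 0; /* sketch only: pages/refcounts changed so far are not rolled back */
                return -1;
            }
            memcpy(physmem[npn], physmem[pn], TOY_PAGESIZE);
            c->pt[v].pn = npn;
        } else {
            pageinfo[pn].refcount++; /* read-only: share the page, bump its refcount */
        }
    }
    c->rax = 0;         /* fork "returns" 0 in the child */
    p->rax = child_pid; /* and the child's pid in the parent */
    return child_pid;
}
```

After toy_init() and toy_fork(1), the child (pid 2) shares the parent's read-only page, whose refcount becomes 2, and has its own copy of the writable page, so later writes by one process are not visible to the other.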