CS202 Review Session 5
Andrew Hua, TA
Spring 2026

0. Introduction
1. Why Virtual Memory?
1.1 The Lie
1.2 Paging
1.3 Address Translation
2. Page Tables
2.1 Linear
2.2 Multi-Indexed
3. Conclusion
4. Extra: Huge Pages
5. Problems
6. Up next
7. Q&A
8. References
9. Answers to Problems

----------------------------------------

0. Introduction
- Reinforce understanding of how virtual memory works
- Will be a combination of explanations and problems building upon them
- All notes, extras, and recordings will be posted to their usual locations

1. Why Virtual Memory?

1.1 The Lie
- The OS's goal with virtual memory is to give applications (and the programmers who write them) the illusion that they control a large amount of "virtual" memory, while the OS manages the physical memory and provides it to processes
- A process, when accessing a memory address, does not know which physical address the virtual address corresponds to
- To create this illusion, the OS and the computer hardware have to work together
- Whenever a memory address is dereferenced by a program (instruction fetch, read/write), the MMU has to translate VA -> PA
- The OS maintains data structures, such as page tables, which the MMU reads from; the OS also handles bad requests (faults)

1.2 Paging
- Imagine that for every byte of virtual memory, we stored how that byte maps to a physical byte. What's bad about this?
- First: inefficient in terms of overhead; we might need more bytes to store the mapping than the memory it describes
- Second: it fails to take advantage of spatial locality
- So, paging is the idea that we treat the "page" as the base unit of virtual memory operations
- Group memory into chunks of fixed size, then only care about mapping virtual pages to physical pages
- This means we need to keep track of far fewer mappings
- Page size varies, but can align nicely with the hardware of RAM and disks

1.3 Address Translation
- So, given these pages, our question is: how do we map a virtual address to a physical address?
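Concretely, with 4 KiB pages the bottom 12 bits of an address are the offset within the page, and the remaining bits are the virtual page number. A small sketch (assuming 4 KiB pages and 48-bit virtual addresses; the helper name `split` is just for illustration):

```python
# Sketch: splitting a virtual address into (VPN, offset),
# assuming 4 KiB (2**12-byte) pages and 48-bit virtual addresses.
PAGE_SIZE = 4096     # 2**12 bytes per page
OFFSET_BITS = 12     # log2(PAGE_SIZE)

def split(va):
    vpn = va >> OFFSET_BITS         # which virtual page
    offset = va & (PAGE_SIZE - 1)   # byte within that page
    return vpn, offset

vpn, offset = split(0x7FFF12345ABC)
print(hex(vpn), hex(offset))  # 0x7fff12345 0xabc
```

Translation then only has to map the VPN; the offset is copied through unchanged.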
- At the most abstract level, we can imagine the MMU's job as using a mapping from VPN to PPN to translate addresses
- Each process has some mapping function, which requires extra information about the mapping function to be stored somewhere
- Importantly, if each process has its own mapping function, then a VA in one process bears no relation to the same VA in another process
- Ignore the top 16 bits; take the bottom 48 bits as the virtual address
- How many bytes are addressable?
- Given that pages are 4 KiB (2^12 bytes), can we tell how many virtual pages there are?
- Can we tell how many physical pages? (No, that depends on the system)
- Translate the VPN to a PPN, and copy the offset unchanged

2. Page Tables
- Big question: how does the OS store this mapping between VPN and PPN? How do we make it space-efficient? Time-efficient? Both?

2.1 Linear
- What is the simplest way to store a list?
- First idea: create an array, and use the VPN as an index to get the physical page number
- If each process had one of these arrays, then the same VA in two processes could map to different PAs just by switching this data structure
- Emphasize this point: this is what creates separation between processes
- Wait, how many pages are there? If each entry in this array needs 8 bytes (take as given), how many bytes is that? For just *one* process?
- Okay, this method is clearly infeasible; we need a way to avoid storing the redundant, unused regions

2.2 Multi-Indexed
- Mike mentioned creating a 512-ary tree (at most 512 children per node); what does that mean?
- If we start from a root table, view it as a root node with entries that link to other pagetable nodes, which have entries that link to further nodes, etc.
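One way to picture that tree: each node is a fixed-size table whose entries are either empty or point to a child table, with leaf entries holding PPNs. A toy Python sketch (not how hardware stores tables; the two-level shape and all values here are illustrative only):

```python
FANOUT = 512  # each pagetable node has up to 512 children

def new_table():
    return [None] * FANOUT  # empty table: nothing mapped yet

# Build a tiny 2-level tree mapping VPN indices (3, 7) -> a PPN.
root = new_table()
root[3] = new_table()  # only one child table exists, at index 3
root[3][7] = 0xCAFE    # leaf entry: the PPN for indices (3, 7)

def lookup(table, indices):
    """Walk one index per level; None means the mapping is invalid."""
    for i in indices:
        if table is None:
            return None
        table = table[i]
    return table

print(hex(lookup(root, (3, 7))))  # 0xcafe
print(lookup(root, (4, 7)))       # None: no child table at index 4
```

The payoff of this shape is that entire empty subtrees cost nothing beyond one `None` entry in the parent.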
- Each layer of the walk hones in on the actual pagetable entry with the actual PPN we care about
- Remember, this is all just taking a VPN and translating it to a PPN
- Now, instead of treating the VPN as one large number, split it into chunks
- For example, the x86-64 architecture splits the 36-bit VPN into 4 indices of 9 bits each
- Use each index to figure out which child to descend into; this gives the traversal of the pagetable tree
- Index "walk" (HANDOUT)
- Imagine that you are the MMU, and that I have given you a pagetable for accessing pages relevant to certain topics
- The pagetable has 2 layers: the first index is the first letter, and the second index is everything that follows (we ignore the offset)
- Now, let's work through translating the "VPN" Algol to its "PPN"
- Split the "VPN" into the indices "A" and "lgol"
- Use the first index to index into the top pagetable and find a page; use the second index to index into that sub-pagetable and find the PPN
- If you have an 'A' copy, what page is Algol found at? If a 'B' copy?
- Different pagetables => different mappings for the same virtual address
- What does invalid mean?
- Pagetables can have different tree structures; note how 'A' has an "S" page while 'B' has a "T" page
- Analogously, how does walking through the x86-64 structure actually work?
- Use the first index to index into the L1 pagetable; the pagetable entry yields the PPN of an L2 pagetable
- How many entries are there in an L1 pagetable?
- Use the second index to index into the L2 pagetable; the pagetable entry yields the PPN of an L3 pagetable
- How many possible pages are accessible from a given L2 pagetable?
- Use the third index to index into the L3 pagetable; the pagetable entry yields the PPN of an L4 pagetable
- How many possible L3 pagetables are there if the virtual address space is fully used?
- Use the last index to index into the L4 pagetable; the pagetable entry yields the desired PPN
- How many memory accesses are needed to walk the tables and then access memory?
- Extra question: if each pagetable entry uses 8 bytes, why is a 9-bit index extra nice?
- What are some differences between this analogy and x86-64?
- 2 levels instead of 4
- Offset nonexistent (already mentioned, but want to emphasize the difference)
- Virtual page space is a different type from physical page space (translate "leading character" to the 9 leading bits of the VPN)
- Different way of splitting the "VPN"
- Pagetables separate from physical memory; you'll see that real pagetables are actually stored in physical memory
- Why is multi-indexing so useful?
- Sparsity: we don't need a page for 'B' entries, nor for 'D', ...
- Let's say each pagetable takes a page; in our index, how many pages would we need for the full alphabet?
- How many pages do we actually use in this case?
- Correspondingly: we don't need L2 pagetables for the blank regions of the address space
- This is really important!
- Problems:
- How many pages does the x86-64 architecture need to allocate just one byte of memory?
- How many pages are needed to allocate 5 pages' worth of memory?
- 2^9? (both best and worst case)
- 2^9 + 1? (best case)
- What if I spawned 32 processes, each of which allocates 8 pages (best case)?
- Finally, 2 processes, one of which allocates 1 GiB (2^30 bytes) and another 1 KiB (2^10 bytes) (again, best case)
- What are the costs of multi-indexed pagetables?
- A greater number of memory accesses is required per address if we translate solely using pagetables
- With no caching, how many memory accesses are needed to execute a simple instruction?
- 0x500 movq 0x200000, %rax
- The TLB solves this by caching the page translations that are used the most
- If both previous page translations are cached, but the actual memory is not, how many memory accesses are needed to then execute:
- 0x504 movq 0x200008, %rbx
- Time-space-complexity tradeoff

3. Conclusion
- What do we get from all this?
- Complexity is unnecessary without payoff, so what does all this complexity buy us?
- Programmability
- Each process gets its own mapping without having to worry about other processes; the OS and hardware handle all the messiness
- Process isolation
- Each process has its own mapping, so there is no way for it to reference another process's memory
- Over-allocation of memory
- Will only touch on this briefly, but does a valid VA always need to point to a valid PA?
- Demand paging is just the OS + hardware extending the lie, saying pages "exist" in memory when they are not actually there
- To go back to the index analogy, imagine if I told you it was page 200, but I needed to fetch that page from a filing cabinet
- This raises more questions
- This is another degree of caching: we have a large but slow swap space and a fast but small RAM, so how do we manage swapping to avoid slow disk accesses?

4. Huge Pages [if time]
- There are ways to define pages that are larger than our usual 4 KiB size, for caching reasons (see the TLB in upcoming classes)
- If an entry in an L3 pagetable pointed straight to a huge page, what size would make the most sense for this huge page?
- How many such pages could we allocate at once?
- If we wanted to allocate 2 huge pages and 2000 regular pages, what is the minimum number of pages that we'd need to allocate?
- We can go even further beyond: we can define a 1 GiB (2^30 byte) page
- What level of pagetable makes the most sense for entries that point to these gigantic pages?

5. Problems
- Collection of problems from the review session, along with a few new ones
- Problems I consider harder are marked (*)
- Unit reference (using binary units to be precise; if you don't understand this, ignore the 'i'):
- 1 KiB = 2^10 bytes ~ 10^3 bytes
- 1 MiB = 2^20 bytes ~ 10^6 bytes
- 1 GiB = 2^30 bytes ~ 10^9 bytes
- 1 TiB = 2^40 bytes ~ 10^12 bytes
- 1 PiB = 2^50 bytes ~ 10^15 bytes
- Warmup: Given the following number of bits in a virtual address and physical address, and the given pagesize, calculate the following values:
- VA: 56 bits, PA: 48 bits, pagesize: 16 KiB
- VPN bits
- PPN bits
- Offset bits
- How many virtual pages are possible?
- How many physical pages are possible?
- x86-64 pagetable rapid-fire from 2.2 (Multi-Indexed):
- How many entries are there in an L1 pagetable?
- How many possible pages are accessible from a given L2 pagetable?
- How many possible L3 pagetables are there if the virtual address space is fully used?
- How many memory accesses are needed to access a given memory address, assuming no faults or errors?
- (*) If each pagetable entry uses 8 bytes, why is a 9-bit index extra nice?
- Page allocation from 2.2 (Multi-Indexed):
- How many pages does the x86-64 architecture need to allocate just one byte of memory? [draw pagetable tree]
- How many pages are needed to allocate 5 pages' worth of memory? [extend tree by appending 4 more pages]
- 2^9? (both best and worst case) [prompt for both optimal and pessimal cases; ask what the pagetable tree looks like in each case]
- 2^9 + 1? (best case)
- What if I spawned 32 processes, each of which allocates 8 pages (best case)?
- (*) Finally, 2 processes, one of which allocates 1 GiB (2^30 bytes) and another 1 KiB (2^10 bytes) (again, best case)
- Memory reads from 2.2 (Multi-Indexed):
- A greater number of memory accesses is required per address if we translate solely using pagetables
- With no caching, how many memory accesses are needed to execute a simple instruction?
- 0x0FF8 movq 0x200000, %rax
- The TLB solves this by caching the page translations that are used the most
- If both previous page translations are cached, but the actual memory is not, how many memory accesses are needed to then execute:
- 0x1000 movq 0x200008, %rbx
- (*) Huge Pages from 4 (Huge Pages):
- If an entry in an L3 pagetable pointed straight to a huge page, what size would make the most sense for this huge page?
- How many such pages could we allocate at once?
- If we wanted to allocate 2 huge pages and 2000 regular pages, what is the minimum number of 4 KiB pages needed? (treat allocating 1 huge page as allocating the same amount of memory's worth of regular pages)
- We can go even further beyond: we can define a 1 GiB (2^30 byte) "gigantic" page
- What level of pagetable makes the most sense for entries that point to these gigantic pages?
- x86 Mutations:
- In the context of the x86-64 architecture, if each pagetable entry takes 8 bytes, how many bytes would be needed to store a complete linear pagetable? (binary and human form) How does this compare to modern-day computer capabilities?
- The x86 32-bit architecture, obviously, uses 32 bits for virtual addressing
- Instead of 4 pagetable layers, it has 2, while pages remain at 4 KiB
- How many bits are given to each pagetable index?
- How many entries would each pagetable have?
- How large is each pagetable, assuming each entry takes 4 bytes?
- (*) In each pagetable entry, how many bits are unused by the PPN, assuming PPNs are 20 bits?
- (*) Imagine we changed the x86-64 architecture a little
- Instead of 4 PTE layers, we set up 3
- Uneven: the L1 index takes 18 bits, while L2 and L3 each take 9 bits
- How large would the L1 pagetable be? (binary and human form)
- If I wanted to allocate 1 page under this regime, how many physical pages would be needed (physical pages remain 4 KiB)?
- 513? (best case)

6. Up next
- Next lecture, Mike will go over the exact details of x86-64 virtual memory workings and the TLB
- After the midterm, we will cover exactly how demand paging works, including page faults and eviction policies

7. Q&A
- Midterm studying
- Use the practice midterms
- Make the cheat sheet yourself; by working on it, you get to know what you're weak at
- Deadlocks

8. References
- x86-64 reference (1st page of https://cs.nyu.edu/~mwalfish/classes/25sp/lectures/handout09.pdf)
- Blog with more huge-pages explanation: https://www.hudsonrivertrading.com/hrtbeat/low-latency-optimization-part-1/

9. Answers to Problems
- Warmup
- 56 - log_2(16 * 1024) = 56 - 14 = 42 VPN bits
- 48 - 14 = 34 PPN bits
- log_2(16 * 1024) = 14 offset bits
- 2^42 ~ 4 trillion virtual pages
- 2^34 ~ 16 billion physical pages
- Rapid-fire
- 512 = 2^9 entries in an L1 pagetable
- 2^27 = 128 million: an L2 can reach 2^9 L3 pagetables, each of which can reach 2^9 L4 pagetables, each of which can reach 2^9 pages, so 2^(9+9+9)
- 2^18 = 256 thousand: the L1 can reach 2^9 L2 pagetables, each of which can reach 2^9 L3 pagetables, so 2^(9+9)
- 5: walk L1 -> L4, then read the actual page
- (*) There are 512 entries per pagetable, so at 8 bytes each, a pagetable takes 2^12 bytes, which is exactly a page
- Page allocation
- 5 (L1 -> L2 -> L3 -> L4 -> actual page)
- 9 (L1 -> L2 -> L3 -> L4 -> 5 actual pages)
- 2^9 + 4 in best case (ditto); 4 * 2^9 + 1 = 2^11 + 1 = 2049 in worst case (each page requires its own L2, L3, L4)
- 2^9 + 6 (a second L4 pagetable is needed)
- 32 * (8 + 4) = 2^7 * 3 = 384; each process needs 12 pages to allocate 8 pages
- (*) 2^18 + 2^9 + 8 (big process: 1 L1 + 1 L2 + 1 L3 + 2^9 L4 + 2^18 actual pages; small process: 5 pages)
- Memory reads
- 2 * 5 = 10; both the instruction fetch and the data access require a full walk of the pagetables (4 reads) plus the access itself
- 6; the TLB caches the data page's translation, but 0x1000 sits on a new page, so the instruction fetch needs a fresh walk (4 pagetable reads + 1 instruction fetch + 1 data read)
- (*) Huge Pages
- 2^21 bytes: transfer the 9 L4 index bits to the offset, giving 2^(12+9)
- 2^27 (all 2^27 PTEs across all possible L3 pagetables point to a huge page)
- 3 + 2 * 512 + 4 + 2000 = 3031 (1 L1 + 1 L2 + 1 L3 -> 2 huge pages and 4 L4 pagetables which point to the 2000 regular pages)
- L2: by the same logic as the first question, a 2^30 byte page needs 30 offset bits, so take over the L3 and L4 index bits; thus an L2 entry points directly to the gigantic page
- x86 Mutations
- x86-64 linear: 2^36 virtual pages * 2^3 bytes per PTE = 2^39 bytes = 512 GiB, which vastly outstrips personal computers' memory
- x86 32-bit
- 10 bits (20 VPN bits / 2)
- 2^10 = 1024 PTEs
- 4096 bytes (1024 PTEs * 4 bytes per PTE)
- (*) 12 bits (32 bits in a PTE - 20 bits for the PPN)
- Collapsed indices
- 2^21 bytes = 2 MiB (2^18 PTEs * 2^3 bytes per entry)
- 515 (512 pages for the L1 + 1 L2 + 1 L3 + 1 actual page)
- 1028 (512 pages for the L1 + 1 L2 + 2 L3 + 513 actual pages)
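The best-case page-allocation answers can be sanity-checked with a short script (a sketch assuming x86-64's 4-level, 512-entry-per-table layout; `best_case_pages` is a hypothetical helper, not course code):

```python
import math

ENTRIES = 512  # 2^9 pagetable entries per table in x86-64

def best_case_pages(n_data_pages):
    # Best case: the data pages are contiguous, so each pagetable
    # level needs as few tables as possible.
    total = n_data_pages           # the data pages themselves
    tables = n_data_pages
    for _ in range(4):             # L4, L3, L2, L1 tables above them
        tables = math.ceil(tables / ENTRIES)
        total += tables
    return total

print(best_case_pages(1))         # 5 (L1 -> L2 -> L3 -> L4 -> page)
print(best_case_pages(5))         # 9
print(best_case_pages(2**9))      # 516 = 2^9 + 4
print(best_case_pages(2**9 + 1))  # 518 = 2^9 + 6
# Two processes, 1 GiB (2^18 pages) + 1 KiB (1 page), separate trees:
print(best_case_pages(2**18) + best_case_pages(1))  # 2^18 + 2^9 + 8
```

Each level divides the number of required tables by 512 (rounded up), which is why the pagetable overhead stays tiny relative to the data pages.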