CS202 Review Session 7

Notes from [Xiangyu Gao](https://xiangyug.github.io/), TA from fall 2021
Edited by Khanh Nguyen, TA Spring 2022
Edited by Jinli Xiao, TA Spring 2023
Edited by Sam Frank, TA Spring 2024
Edited by Saeed Bafarat, TA Fall 2024
Edited by Yash Pazhianur, TA Spring 2025

1. Background knowledge
    1.1 An example
    1.2 File system abstractions
    1.3 inodes
    1.4 Hard link vs soft link
2. Lab 5 overview
    2.1 File system
    2.2 Practice Problems
    2.3 inode_block_walk and inode_get_block
    2.4 FUSE

---------------------------------------------------------------------

1. Background knowledge

1.1 An example

When you enter `cat ~/Documents/notes.txt`, this is what happens:
    - start from the home directory
    - go to the directory `Documents`
    - read the file `notes.txt`
    - display its contents on the console

The data for `notes.txt` is stored on the hard drive of the local machine.
You don't have to remember where the data is physically located (e.g. the platter, track, or sector).
Instead, you access the data through the intuitive interface provided by a file system.

Perspective:
    - user's view: a path/filename points to a byte array
    - file system's view: a group of disk blocks

1.2 File system abstractions

There are at least three key abstractions: file, filename, directory

File:
    - reality: the data of a file might not be placed contiguously on the disk
    - abstraction: to users, a file can be thought of as one contiguous string of bytes

Filename:
    - reality: the file lives at some physical location on disk (for instance, the sector number where it starts), which users don't have to remember
    - abstraction: the user assigns the file an intuitive name

Directory:
    - a container of other files or directories, for nested/multilevel organization

Q: How do we achieve this?
A: In Unix, we use inodes, which are kept in a fixed-size array of inodes. An inode is a data structure that resembles an imbalanced tree.
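To make the imbalanced-tree shape concrete, here is a minimal C sketch of an inode. It assumes the Lab 5 layout described in section 2.1 (10 direct pointers, 1 indirect pointer, 1 double indirect pointer, 4KB blocks) and assumes 4-byte block numbers, which is what gives 1024 entries per indirect block. The struct, its field names, and the helper functions are hypothetical and chosen for illustration; they are not the lab's actual definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants mirroring the layout described in section 2.1. */
#define BLKSIZE    4096u                          /* bytes per block */
#define N_DIRECT   10u                            /* direct pointers per inode */
#define N_INDIRECT (BLKSIZE / sizeof(uint32_t))   /* 1024 block numbers per block */

/* Illustrative inode layout; field names are assumptions, not the lab's. */
struct sketch_inode {
    uint32_t mode;              /* permissions */
    uint32_t nlink;             /* link count */
    uint64_t size;              /* file size in bytes */
    uint32_t atime, mtime;      /* access / modify timestamps */
    uint32_t direct[N_DIRECT];  /* block numbers of the first 10 data blocks */
    uint32_t indirect;          /* block holding 1024 data block numbers */
    uint32_t double_indirect;   /* block holding 1024 indirect block numbers */
};

/* Maximum number of data blocks reachable from one inode. */
static uint64_t sketch_max_blocks(void) {
    return (uint64_t)N_DIRECT + N_INDIRECT + (uint64_t)N_INDIRECT * N_INDIRECT;
}

/* Maximum file size in bytes under this layout. */
static uint64_t sketch_max_file_size(void) {
    return sketch_max_blocks() * BLKSIZE;
}

/* Number of data blocks needed to hold `size` bytes (rounds up). */
static uint64_t sketch_blocks_for_size(uint64_t size) {
    assert(size <= sketch_max_file_size());
    return (size + BLKSIZE - 1) / BLKSIZE;
}
```

Under these assumptions, one inode can reach 10 + 1024 + 1024 * 1024 = 1,049,610 data blocks, which is roughly 4GB. A small file (up to 40KB) never touches the indirect levels, which is the point of the imbalance.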
1.3 inodes

An inode contains metadata such as:
    - permissions
    - timestamps (access/modify)
    - link count
    - size

It also contains pointers to:
    - data blocks (direct pointers)
    - an indirect block
    - a double indirect block
    - a triple indirect block (if necessary)

[draw picture of inode data structure - see whiteboard]

The inode is intentionally imbalanced to handle both small and large files:
    - if the file is small, all of its data fits in the direct data blocks
    - otherwise, the remaining data is reached through the indirect pointer or the double indirect pointer

Each inode is referred to by its i-number:
    i-number -> inode data structure

Directory contents are implemented similarly to files. Each entry is a mapping:
    filename -> inode

To look up the file `~/Documents/notes.txt`, the file system must:
    - look into the home directory
    - query the name `Documents` (it maps to an inode)
    - look into the `Documents` inode (using its i-number)
    - query the name `notes.txt`
    - the file is reached

1.4 Hard link vs soft link

hard link:
    - has the same inode as the file it refers to
    - can't refer to a directory or to a file on a different file system

soft link:
    - is allocated its own inode
    - its content is the path to the file it refers to
    - can refer to a file on a different file system
    - can refer to a directory

[draw picture of both link structures - see whiteboard]

2. Lab 5 overview

2.1 File system

- There is only 1 region, in which both inodes and data blocks reside. Usually, a file system is divided into 2 separate regions: an inode region and a data block region.
- Each inode is allocated its own disk block instead of being packed alongside other inodes in a single disk block.
    - this means lots of wasted space
- A disk sector is 512B. The file system reads in blocks of 4KB, which is 8 sectors.
- The superblock is block 0. It holds metadata about the FS and a pointer to the root directory.
- The bitmap is an array of bits. The bit at index i indicates whether block i is allocated: 1 means it's free and 0 means it's used.
- Each inode contains:
    - 10 direct data block pointers. Each data block is 4KB.
    - 1 indirect pointer. It points to a block holding 1024 direct block pointers.
    - 1 double indirect pointer. It points to a block holding 1024 indirect block pointers.

2.2 Practice Problems

Note: the calculations below use a 1-based index. This is different from lab 5, where we use a 0-based index.

Q: Where would the `filebno`th file block be found, for `filebno` =
- 5?
  A: In the 5th direct data block.
- 1032?
  A: 1032 - 10 = 1022. In the indirect block, at the 1022nd entry.
- 2060?
  A: 2060 - 10 - 1024 = 1026. In the double indirect block. Each indirect block holds 1024 entries and 1026 = 1024 + 2, so it is in the 2nd indirect block of that mapping, at the 2nd entry of that indirect block.

2.3 `inode_block_walk` and `inode_get_block`

Purpose of `inode_block_walk`:
    - Locate the slot holding the disk block number that corresponds to a specific file block (`filebno`) in the inode (`ino`)
    - Return a pointer (`*ppdiskbno`) to that slot

High Level Logic:
    Direct Blocks:
        - If `filebno` is within the range of direct blocks, return the corresponding slot directly
    Indirect Blocks:
        - If `filebno` falls within the range of indirect blocks:
            Check if the indirect block exists
            - If not and allocation is allowed, allocate and clear the indirect block
            - Then return a pointer to the appropriate slot in the indirect block
    Double-Indirect Blocks:
        - If `filebno` requires traversal into double-indirect blocks:
            Check if the double-indirect block exists
            - If not and allocation is allowed, allocate and clear it
            Navigate to the appropriate indirect block within the double-indirect block
            - If the target indirect block doesn't exist and allocation is allowed, allocate and clear it
            - Then return a pointer to the appropriate slot in the indirect block

Hints:
    - Draw Visuals:
        - visualize the direct, indirect, and double-indirect block structures
        - how would you get to a specific block within these data structures?
    - Memory Initialization:
        - Ensure any newly allocated block is cleared before use
    - Helper Functions:
        - Consider extracting common logic into helper functions

Purpose of `inode_get_block`:
    - Obtain the in-memory address of the `filebno`th block for a given inode (`ino`)
    - Allocate the block if it does not already exist

High Level Logic:
    Ensure `filebno` is within the valid range for the inode
    Use `inode_block_walk` to locate (or allocate the path to) the block slot for `filebno`
    If the block slot is empty:
        Allocate a new block using `alloc_disk_block` and initialize it
        Then update the block slot with the new block number
    Return the memory address of the block

Hints:
    - Use `inode_block_walk` to handle block lookup and allocation logic
    - Use `diskblock2memaddr` to get the memory address
    - Clear any newly allocated block to ensure it is properly initialized

2.4 FUSE

You will be writing a REAL file system in lab 5, including the "disk".
How? FUSE: Filesystem in Userspace.

[draw picture - see whiteboard]

Good luck!
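
Appendix: checking the 2.2 practice problems

As a companion to the practice problems in 2.2, the index arithmetic can be sketched in C. This is only an illustration of the arithmetic, not the lab's `inode_block_walk`: it touches no disk blocks and does no allocation. All names here (`sketch_locate`, `struct sketch_loc`, the `SK_*` constants) are hypothetical, and the layout assumed is the one from 2.1 (10 direct slots, 1024 entries per indirect block). `sketch_locate` takes a 0-based `filebno` as in lab 5; `sketch_locate_1based` converts from the 1-based numbering used in 2.2.

```c
#include <assert.h>
#include <stdint.h>

#define N_DIRECT   10u     /* direct slots in the inode (assumed from 2.1) */
#define N_INDIRECT 1024u   /* entries per indirect block (assumed from 2.1) */

enum sketch_level { SK_DIRECT, SK_INDIRECT, SK_DOUBLE, SK_OUT_OF_RANGE };

struct sketch_loc {
    enum sketch_level level;
    uint32_t outer;  /* 0-based index into the double-indirect block (SK_DOUBLE only) */
    uint32_t slot;   /* 0-based index into the direct array or an indirect block */
};

/* Classify a 0-based file block number and compute its 0-based indices. */
static struct sketch_loc sketch_locate(uint32_t filebno) {
    struct sketch_loc loc = { SK_OUT_OF_RANGE, 0, 0 };

    if (filebno < N_DIRECT) {
        loc.level = SK_DIRECT;
        loc.slot = filebno;
        return loc;
    }
    filebno -= N_DIRECT;              /* skip past the direct range */

    if (filebno < N_INDIRECT) {
        loc.level = SK_INDIRECT;
        loc.slot = filebno;
        return loc;
    }
    filebno -= N_INDIRECT;            /* skip past the indirect range */

    if (filebno < N_INDIRECT * N_INDIRECT) {
        loc.level = SK_DOUBLE;
        loc.outer = filebno / N_INDIRECT;  /* which indirect block */
        loc.slot = filebno % N_INDIRECT;   /* which entry inside it */
    }
    return loc;                       /* SK_OUT_OF_RANGE if too large */
}

/* Same lookup for the 1-based numbering used in the practice problems. */
static struct sketch_loc sketch_locate_1based(uint32_t n) {
    assert(n >= 1);
    return sketch_locate(n - 1);
}
```

For example, `sketch_locate_1based(2060)` yields level `SK_DOUBLE` with `outer == 1` and `slot == 1`. Both indices are 0-based, so this is the 2nd indirect block and the 2nd entry within it.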