CS202 Review Session 7

Notes from [Xiangyu Gao](https://xiangyug.github.io/), TA Fall 2021
Edited by Khanh Nguyen, TA Spring 2022
Edited by Jinli Xiao, TA Spring 2023
Edited by Sam Frank, TA Spring 2024
Edited by Saeed Bafarat, TA Fall 2024

1. Background knowledge
   1.1 An example
   1.2 File system abstractions
   1.3 inodes
   1.4 Hard link vs soft link
2. Lab 5 overview
   2.1 File system
   2.2 Practice Problems
   2.3 Inode_block_walk and Inode_get_block
   2.4 FUSE

---------------------------------------------------------------------

1. Background knowledge

1.1 An example

When you enter `cat ~/a/b.txt`, you are issuing a command to read the file b.txt in directory a, which is in your home directory. The file system starts at the home directory, goes into directory a, and then finds the file named b.txt. This is very convenient. The data I am looking for is on the hard disk of my local machine, but I don't have to remember where it physically is (e.g., which platter, track, or sector). Instead, I access the data through an intuitive interface provided by the file system.

Perspective:
- user's view: named bytes in storage
- file system's view: a collection of disk blocks

1.2 File system abstractions

There are at least three key abstractions: file, filename, and directory (folder).

**File**: in reality, the data of one file might not be stored contiguously on the disk. To users, however, a file looks like one long, contiguous sequence of bytes.

**Filename**: users don't have to remember any physical details of how the file is stored (for instance, the sector number where it starts). Instead, they give the file an intuitive name.

**Directory**: a container of other files and directories that helps organize them.

Q: How do we achieve this?
A: In Unix, each file is described by an inode, and inodes are kept in an inode table, which is a fixed-size array of inodes. An inode is a data structure that resembles an imbalanced tree.
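To make "imbalanced tree" concrete, here is a minimal C sketch of an inode with the lab 5 shape listed in 2.1 (4 KB blocks, 10 direct pointers, one indirect pointer, one double-indirect pointer). The struct, its field names, and both functions are hypothetical illustrations, not the lab's code. `extra_block_reads` counts how many pointer blocks must be read, beyond the inode itself, to find a 0-based file block, and `max_file_blocks` gives the largest file one such inode can describe; the `static_assert` checks at compile time that 1024 four-byte block numbers fill exactly one block.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout, taken from the lab 5 description in 2.1:
 * 4 KB blocks, 10 direct pointers, 1 indirect, 1 double-indirect. */
#define BLOCK_SIZE     4096u
#define N_DIRECT       10u
#define PTRS_PER_BLOCK (BLOCK_SIZE / sizeof(uint32_t)) /* 1024 block numbers per block */

static_assert(PTRS_PER_BLOCK * sizeof(uint32_t) == BLOCK_SIZE,
              "an indirect block is exactly one block of block numbers");

/* Hypothetical on-disk inode; field names are illustrative only. */
struct inode {
    uint32_t mode;             /* permissions and file type */
    uint32_t nlink;            /* link count */
    uint64_t size;             /* file size in bytes */
    uint32_t direct[N_DIRECT]; /* block numbers of file blocks 0..9 */
    uint32_t indirect;         /* block holding the next 1024 block numbers */
    uint32_t double_indirect;  /* block holding 1024 indirect-block numbers */
};

/* How many pointer blocks must be read, beyond the inode itself, to find
 * 0-based file block filebno. Returns -1 if no file can be that large.
 * The uneven depth (0, 1 or 2) is what makes the tree "imbalanced". */
int extra_block_reads(uint64_t filebno) {
    if (filebno < N_DIRECT)
        return 0;                            /* pointer is in the inode */
    filebno -= N_DIRECT;
    if (filebno < PTRS_PER_BLOCK)
        return 1;                            /* read the indirect block */
    filebno -= PTRS_PER_BLOCK;
    if (filebno < (uint64_t)PTRS_PER_BLOCK * PTRS_PER_BLOCK)
        return 2;                            /* double-indirect, then indirect */
    return -1;                               /* beyond the largest file */
}

/* Largest file, in blocks, that one inode can describe. */
uint64_t max_file_blocks(void) {
    return N_DIRECT + PTRS_PER_BLOCK + (uint64_t)PTRS_PER_BLOCK * PTRS_PER_BLOCK;
}
```

With these numbers, `max_file_blocks()` is 10 + 1024 + 1024 × 1024 = 1,049,610 blocks, or 4,299,202,560 bytes at 4 KB per block. Small files (up to 10 blocks) never pay for the extra indirect reads.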
1.3 inodes

An inode contains metadata such as file permissions, access and modification times, and the link count. It also contains the pointers that locate the file's data blocks: direct pointers, an indirect pointer, and a double-indirect pointer. Some file systems also have a triple-indirect pointer for even larger files.

[Draw picture of inode data structure - see whiteboard]

The inode is intentionally imbalanced to handle both small and large files. A small file fits entirely in the blocks named by the direct pointers; a larger file also uses the indirect pointer and, if needed, the double-indirect pointer.

Each inode is referred to by its i-number.

Mental model:
-> A directory is implemented much like a file. Its contents are entries, each mapping a name to an i-number: name -> i-number
-> To look up b.txt in directory a under home, the file system (FS) searches the home directory for the name a, which maps to an i-number. Using that i-number, it goes to directory a's inode and searches a's entries for the name b.txt. That entry gives b.txt's i-number, so the FS reaches the file's inode, and the user can carry out file operations.

1.4 Hard link vs soft link

- Hard link:
  + shares the same inode as the file it refers to
  + cannot refer to a directory, or to a file on a different file system
- Soft link:
  + is allocated its own inode. Its content is the path of the file it refers to (think of it as an indirection, like a pointer)
  + can refer to a file on a different file system
  + can refer to a directory

[Draw a picture - see whiteboard]

2. Lab 5 overview

2.1 File system

- In lab 5, there is only one region, in which both inodes and data blocks reside. Usually, file systems divide the disk into two separate regions: an inode region and a data block region.
- Each inode is allocated its own disk block instead of being packed together with other inodes into a single disk block.
- A disk sector is 512 B. The file system reads in 4 KB blocks, each of which is 8 sectors.
- The superblock is block 0. It holds metadata about the FS and a pointer to the root directory.
- The bitmap is an array of bits. The bit at index i indicates whether block i is allocated.
  1 means the block is free and 0 means it is in use.
- Each inode contains:
  + 10 direct pointers, each naming one 4 KB data block.
  + 1 indirect pointer, naming a block that holds 1024 direct pointers (4096 B / 1024 = 4 B per block number).
  + 1 double-indirect pointer, naming a block that holds 1024 indirect pointers.

2.2 Practice Problems

Note: the calculations below use 1-based indexing. This is different from lab 5, which uses 0-based indexing.

Q: For each file block number n below, which pointer (and which entry within it) locates that block?
- n = 5?
  A: The 5th direct pointer.
- n = 1032?
  A: 1032 - 10 = 1022, so it is the 1022nd entry of the indirect block.
- n = 2060?
  A: 2060 - 10 - 1024 = 1026, so it is the 1026th block reachable through the double-indirect pointer. Each indirect block covers 1024 blocks and 1026 = 1024 + 2, so it is the 2nd entry of the 2nd indirect block listed in the double-indirect block.

2.3 Inode_block_walk and Inode_get_block

Purpose of inode_block_walk:
- Locate the slot holding the disk block number of a specific file block (filebno) in the inode (ino).
- Return a pointer to that slot through *ppdiskbno.

High-level logic:
- Direct blocks: if filebno is within the range of direct blocks, return the corresponding slot in the inode directly.
- Indirect blocks: if filebno falls within the range of the indirect block:
  + check whether the indirect block exists;
  + if not, and allocation is allowed, allocate and clear the indirect block;
  + then return a pointer to the appropriate slot in the indirect block.
- Double-indirect blocks: if filebno requires traversal into the double-indirect block:
  + check whether the double-indirect block exists;
  + if not, and allocation is allowed, allocate and clear it;
  + navigate to the appropriate indirect block within the double-indirect block;
  + if that indirect block doesn't exist and allocation is allowed, allocate and clear it;
  + then return a pointer to the appropriate slot in that indirect block.

Hints:
- Draw visuals: sketch the direct, indirect, and double-indirect block structures and trace how you would reach a specific block within them.
- Memory initialization: clear any newly allocated block before use.
- Helper functions: consider extracting common logic into helper functions.

Purpose of inode_get_block: obtain the in-memory address of block filebno of a given inode (ino), allocating the block if it does not already exist.

High-level logic:
- Ensure filebno is within the valid range for the inode.
- Use inode_block_walk to locate (or allocate) the slot for filebno.
- If the slot is empty: allocate a new block using alloc_disk_block, clear it, and store the new block number in the slot.
- Return the memory address of the block.

Hints:
- Use inode_block_walk to handle block lookup and allocation.
- Use diskblock2memaddr to get the memory address.
- Clear any newly allocated block so that it is properly initialized.

2.4 FUSE

You will be writing a REAL file system in lab 5, including the "disk". How? FUSE: Filesystem in Userspace. FUSE lets an ordinary user-space program implement a file system; the kernel forwards operations on the mounted file system to that program.

[Draw picture - see whiteboard]

Good luck!
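The inode_block_walk and inode_get_block logic from 2.3 can be sketched in C as follows. This is a minimal, self-contained sketch under stated assumptions, not the lab's implementation: the 64-block in-memory "disk", the `in_use` array standing in for the bitmap (true means used, unlike the lab's 1 = free), the error codes, the inode struct, and the signatures of inode_block_walk, inode_get_block, alloc_disk_block, and diskblock2memaddr are all hypothetical. It uses 0-based file block numbers and assumes block number 0 means "not allocated yet", since block 0 is the superblock. Following the hints, one helper (`ensure_block`, also hypothetical) allocates and clears any missing block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy setup for illustration only: a 64-block in-memory "disk".
 * Block 0 is the superblock, so block number 0 is assumed to mean
 * "no block allocated yet". Error codes are hypothetical. */
#define BLOCK_SIZE     4096u
#define N_BLOCKS       64u
#define N_DIRECT       10u
#define PTRS_PER_BLOCK (BLOCK_SIZE / sizeof(uint32_t)) /* 1024 */
#define E_NO_DISK   (-1)  /* no free block left */
#define E_INVAL     (-2)  /* filebno beyond the largest file */
#define E_NOT_FOUND (-3)  /* pointer block missing and alloc is false */

static uint32_t disk[N_BLOCKS][PTRS_PER_BLOCK]; /* each row is one 4 KB block */
static bool in_use[N_BLOCKS] = { true };        /* block 0 is reserved */

struct inode {                 /* hypothetical, pointer fields only */
    uint32_t direct[N_DIRECT];
    uint32_t indirect;
    uint32_t double_indirect;
};

/* Hypothetical stand-ins for the lab helpers with these names. */
void *diskblock2memaddr(uint32_t blockno) {
    assert(blockno < N_BLOCKS);
    return disk[blockno];
}

int alloc_disk_block(void) {
    for (uint32_t b = 1; b < N_BLOCKS; b++) {
        if (!in_use[b]) {
            in_use[b] = true;
            return (int)b;
        }
    }
    return E_NO_DISK;
}

/* Shared helper: make *slot name a block, allocating and clearing one if allowed. */
static int ensure_block(uint32_t *slot, bool alloc) {
    if (*slot != 0)
        return 0;
    if (!alloc)
        return E_NOT_FOUND;
    int b = alloc_disk_block();
    if (b < 0)
        return b;
    memset(diskblock2memaddr((uint32_t)b), 0, BLOCK_SIZE); /* clear before use */
    *slot = (uint32_t)b;
    return 0;
}

/* Set *ppdiskbno to the slot holding the disk block number of 0-based file block filebno. */
int inode_block_walk(struct inode *ino, uint32_t filebno, uint32_t **ppdiskbno, bool alloc) {
    int r;
    if (filebno < N_DIRECT) {                             /* direct: slot is in the inode */
        *ppdiskbno = &ino->direct[filebno];
        return 0;
    }
    filebno -= N_DIRECT;
    if (filebno < PTRS_PER_BLOCK) {                       /* indirect */
        if ((r = ensure_block(&ino->indirect, alloc)) < 0)
            return r;
        uint32_t *ind = diskblock2memaddr(ino->indirect);
        *ppdiskbno = &ind[filebno];
        return 0;
    }
    filebno -= PTRS_PER_BLOCK;                            /* double-indirect */
    if (filebno >= PTRS_PER_BLOCK * PTRS_PER_BLOCK)
        return E_INVAL;
    if ((r = ensure_block(&ino->double_indirect, alloc)) < 0)
        return r;
    uint32_t *dind = diskblock2memaddr(ino->double_indirect);
    uint32_t *ind_slot = &dind[filebno / PTRS_PER_BLOCK]; /* which indirect block */
    if ((r = ensure_block(ind_slot, alloc)) < 0)
        return r;
    uint32_t *ind = diskblock2memaddr(*ind_slot);
    *ppdiskbno = &ind[filebno % PTRS_PER_BLOCK];          /* which entry in it */
    return 0;
}

/* Set *blk to the in-memory address of file block filebno, allocating it if needed. */
int inode_get_block(struct inode *ino, uint32_t filebno, char **blk) {
    uint32_t *slot;
    int r = inode_block_walk(ino, filebno, &slot, true); /* also rejects out-of-range filebno */
    if (r < 0)
        return r;
    if ((r = ensure_block(slot, true)) < 0)              /* empty slot: allocate and clear a data block */
        return r;
    *blk = diskblock2memaddr(*slot);
    return 0;
}
```

Converted to 0-based numbers, the 2.2 practice blocks 5, 1032, and 2060 become filebno 4, 1031, and 2059. This walk places them in direct[4], in entry 1021 (the 1022nd) of the indirect block, and in entry 1 of the indirect block stored in slot 1 of the double-indirect block, i.e. the 2nd entry of the 2nd indirect block.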