CS202 Review Session 5
Notes from [Xiangyu Gao](https://xiangyug.github.io/), TA from fall 2021
Edited by Khanh Nguyen, TA Spring 2022
Edited by Jinli Xiao, TA Spring 2023

1. Background knowledge
    1.1 An example
    1.2 File system abstractions
    1.3 inodes
    1.4 Hard link vs soft link
2. Lab 5 overview
    2.1 File system
    2.2 Pointers to pointer mental model
    2.3 Essential Functions
    2.4 FUSE
3. Q&A

---------------------------------------------------------------------

1. Background knowledge

1.1 An example

When you enter `cat ~/a/b.txt`, you are issuing a command to read b.txt in directory a, which is in the home directory. The lookup starts at the home directory, goes to directory a, and then reaches the file named b.txt.

This is very convenient. In this case, the data I am looking for is on the hard disk of my local machine. I don't have to remember where the data physically is (e.g., which platter, track, or sector). Instead, I can access the data through an intuitive interface provided by the file system.

Perspective:
- user's view: named bytes in storage
- file system's view: groups of disk blocks

1.2 File system abstractions

There are at least three key abstractions: file, filename, directory

**File**: in reality, the data of one file might not be placed contiguously on the disk. To users, however, the data of a file looks like one big, long, contiguous string of bytes.

**Filename**: users don't have to remember any of the physical information about how the file is stored, for instance the sector number where it starts. Instead, users can give the file an intuitive name.

**Directory**: a container of other files or directories that helps organize them.

Q: How do we achieve this?
A: In Unix, we use inodes, which are stored in a fixed-size array of inodes. An inode is a data structure that resembles an imbalanced tree.

1.3 inodes

An inode contains metadata such as file permissions, file access times, link count, etc.
Additionally, it contains pointers to data blocks: direct pointers, an indirect pointer, and a double indirect pointer. There is also a triple indirect pointer if necessary.

[Draw picture of inode data structure - see whiteboard]

The inode is intentionally imbalanced so that it handles both small and large files. A small file fits into the blocks reached by the direct pointers. A larger file additionally uses the indirect pointer, and then the double indirect pointer.

Each inode is referred to by its i-number.

Mental model:
-> Directories are implemented similarly to files. A directory's contents are entries, each holding a mapping: name -> i-number
-> To look up file b.txt in directory a under home: the FS (file system) looks into the home directory and queries the name a, which maps to an i-number. Using that i-number, it goes to directory a's inode and queries the name b.txt. Eventually the file is reached and the user can carry out file operations.

1.4 Hard link vs soft link

- Hard link:
    + has the same inode as the file it refers to
    + can't refer to a directory or to a file on a different file system
- Soft link:
    + is allocated its own inode. Its content is the path to the file it refers to (can be thought of as an indirection, pointer-like)
    + can refer to a file on a different file system
    + can refer to a directory

[Draw a picture - see whiteboard]

2. Lab 5 overview

2.1 File system

- There is only 1 region, in which both inodes and data blocks reside. Usually, a file system has 2 separate regions: an inode region and a data block region.
- Each inode is allocated its own disk block instead of being packed alongside other inodes in a single disk block.
- A disk sector is 512 B, the unit of a disk read. The file system reads in blocks of 4 KB, which is 8 sectors.
- The superblock is block 0. It holds metadata about the FS and a pointer to the root directory.
- The bitmap is an array of bits. The bit at index i indicates whether block i is allocated: 1 means free and 0 means used.
- Each inode contains:
    - 10 direct data block pointers. Each data block is 4 KB.
    - 1 indirect pointer.
      It points to a block of 1024 direct block pointers.
    - 1 double indirect pointer. It points to a block of 1024 indirect block pointers.

2.2 Pointers to pointer mental model

    int a = 1;
    int *b = &a;
    int **c = &b;

    0x100 [   1   ] a
    0x104 [ 0x100 ] b
    0x108 [ 0x104 ] c

Think of each variable as a box whose content may be a memory address. A dereference means following the address stored in the box to the box at that address.

2.3 Essential Functions

- inode_block_walk: point to the 'fileblock'th block and allocate if necessary.
- inode_get_block: point to where the 'fileblock'th block points to and allocate if necessary.

Note: the calculations below use a 1-based index. This is different from lab 5, where we use a 0-based index.

Q: In which table would the 'fileblock'th block be, for fileblock numbered
- 5?
  A: In the 5th direct data block.
- 1032?
  A: 1032 - 10 = 1022. In the indirect block, at the 1022nd entry.
- 2060?
  A: 2060 - 10 - 1024 = 1026. In the double indirect block: 1026 is past the 1024 entries of its first indirect block, so it is in the second indirect block, at the 2nd entry (1026 - 1024 = 2).

2.4 FUSE

You will be writing a REAL file system in lab 5, including the "disk". How?
FUSE: Filesystem in Userspace.

[Draw picture - see whiteboard]

3. Q&A

Good luck!
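
As a supplement to the block-index questions in section 2.3, here is a minimal sketch of the same arithmetic in C. It assumes the layout from section 2.1 (10 direct pointers, 1024 pointers per pointer block) and uses a 0-based index, as lab 5 does. The names `locate`, `struct location`, `N_DIRECT`, and `N_INDIRECT` are hypothetical and chosen for illustration; they are not claimed to match the lab's code.

```c
/* Assumed constants, taken from the inode layout in section 2.1. */
#define N_DIRECT   10UL    /* direct block pointers in an inode */
#define N_INDIRECT 1024UL  /* block pointers held by one 4 KB pointer block */

enum level { LVL_DIRECT, LVL_INDIRECT, LVL_DOUBLE, LVL_TOO_BIG };

struct location {
    enum level level;     /* which pointer of the inode is followed */
    unsigned long outer;  /* LVL_DOUBLE only: which indirect block (0-based) */
    unsigned long inner;  /* entry in the direct array or indirect block (0-based) */
};

/* Hypothetical helper, not part of the lab code: maps a 0-based
 * file block index to its position in the inode's pointer tree. */
static struct location locate(unsigned long fileblock)
{
    struct location loc = { LVL_TOO_BIG, 0, 0 };

    if (fileblock < N_DIRECT) {
        loc.level = LVL_DIRECT;
        loc.inner = fileblock;
        return loc;
    }
    fileblock -= N_DIRECT;          /* skip the direct range */

    if (fileblock < N_INDIRECT) {
        loc.level = LVL_INDIRECT;
        loc.inner = fileblock;
        return loc;
    }
    fileblock -= N_INDIRECT;        /* skip the single indirect range */

    if (fileblock < N_INDIRECT * N_INDIRECT) {
        loc.level = LVL_DOUBLE;
        loc.outer = fileblock / N_INDIRECT;
        loc.inner = fileblock % N_INDIRECT;
    }
    return loc;                     /* LVL_TOO_BIG if nothing matched */
}
```

To check a 1-based question against this sketch, subtract 1 first. For example, the 2060th block of a file is `locate(2059)`, which gives the double indirect level with outer index 1 and inner index 1 (both 0-based).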