CS202 Review Session 7
Notes from [Xiangyu Gao](https://xiangyug.github.io/), TA from fall 2021
Edited by Khanh Nguyen, TA Spring 2022
Edited by Jinli Xiao, TA Spring 2023
Edited by Sam Frank, TA Spring 2024
Edited by Saeed Bafarat, TA Fall 2024
Edited by Yash Pazhianur, TA Spring 2025

1. Background knowledge
  1.1 An example
  1.2 File system abstractions
  1.3 inodes
  1.4 Hard link vs soft link
2. Lab 5 overview
  2.1 File system
  2.2 Practice Problems
  2.3 `inode_block_walk` and `inode_get_block`
  2.4 FUSE


---------------------------------------------------------------------


1. Background knowledge


1.1 An example

When you enter `cat ~/Documents/notes.txt`

this is what happens:
	- start from home directory
	- go to directory `Documents`
	- read file `notes.txt`
	- display to console

data for `notes.txt` is stored on the hard drive of the local machine
you don't have to remember where the data is physically located (e.g. which platter, track, or sector)
instead, you access the data through the intuitive interface provided by a file system

Perspective:
	- user's view: a path/filename points to a byte array
	- file system's view: a group of disk blocks


1.2 File system abstractions

There are at least three key abstractions: file, filename, directory

File:
	- reality: a file's data might not be placed in adjacent blocks on the disk
	- abstraction: to users, a file can be thought of as one contiguous string of bytes

Filename:
	- reality: a file is located by physical information about how it is stored, the sector number where it starts for instance
	- abstraction: the user gives the file an intuitive name and never has to remember that physical information

Directory:
	- containers of other files or directories for nested/multilevel organization

Q: How do we achieve this?
A: In Unix, we use inodes. The file system keeps a fixed-size array of inodes, and each inode is a data structure that resembles an imbalanced tree.


1.3 inodes

inode contains metadata such as:
	permissions
	timestamps (access/modify)
	link count
	size

also contains pointers to:
	direct data blocks
	an indirect block
	a double indirect block

and a triple indirect block (if necessary)

[draw picture of inode data structure - see whiteboard]

inode is intentionally imbalanced to handle both small and large files
if the file is small, all of its data fits in the direct data blocks
otherwise, the remaining data is reached through the indirect pointer or the double indirect pointer
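
A minimal sketch of what such an inode could look like in C. The struct name, field names, and field widths are hypothetical and are not the lab's definitions; the count of 10 direct slots is borrowed from the lab layout in section 2.1. `fits_in_direct` is an illustrative helper showing the "small file" case.

```c
#include <assert.h>
#include <stdint.h>

#define N_DIRECT 10u   /* assumption: number of direct slots, as in section 2.1 */

/* Hypothetical inode layout; every name here is illustrative only. */
struct sketch_inode {
    uint32_t mode;                /* permissions and file type */
    uint32_t nlink;               /* link count */
    uint32_t size;                /* file size in bytes */
    uint32_t atime;               /* last access time */
    uint32_t mtime;               /* last modification time */
    uint32_t direct[N_DIRECT];    /* block numbers of the first data blocks */
    uint32_t indirect;            /* block number of a block full of block numbers */
    uint32_t double_indirect;     /* block number of a block full of indirect blocks */
};

/* Returns 1 if a file of `size` bytes needs only the direct slots, else 0. */
int fits_in_direct(uint32_t size, uint32_t block_size) {
    assert(block_size > 0);
    /* Round up: a partially filled last block still occupies a whole block. */
    uint32_t blocks_needed = size / block_size + (size % block_size != 0);
    return blocks_needed <= N_DIRECT;
}
```

With 4KB blocks, any file up to 40KB stays in the direct slots; one more byte forces the indirect block to be used.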

each inode is referred to by its i-number
		i-number   ->   inode data structure

directory contents are implemented similarly to files
each entry is a mapping
		filename   ->   i-number

to look up the file `~/Documents/notes.txt`
the file system must:
	- look into the home directory
	- query the name `Documents` (it maps to an i-number)
	- look into the `Documents` inode (found using that i-number)
	- query the name `notes.txt`
	- the file's inode is reached
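
The lookup steps above can be sketched over a toy in-memory inode table. Everything here is hypothetical (the `toy_node` layout, the fixed-size entry array, and the convention that the index into the table is the i-number); a real file system reads directory entries from disk blocks instead.

```c
#include <assert.h>
#include <string.h>

#define MAX_ENTRIES 4
#define NAME_LEN    16

/* One directory entry: filename -> i-number. */
struct toy_dirent { char name[NAME_LEN]; int inum; };

/* Toy inode: either a directory with entries, or a plain file. */
struct toy_node {
    int is_dir;
    int n_entries;
    struct toy_dirent entries[MAX_ENTRIES];
};

/* Look up one name (given as pointer + length) in directory `dir`.
 * Returns the i-number it maps to, or -1 if absent or `dir` is not a directory. */
int dir_lookup(const struct toy_node *table, int dir, const char *name, size_t len) {
    const struct toy_node *d = &table[dir];
    if (!d->is_dir)
        return -1;
    for (int i = 0; i < d->n_entries; i++) {
        const char *candidate = d->entries[i].name;
        if (strlen(candidate) == len && strncmp(candidate, name, len) == 0)
            return d->entries[i].inum;
    }
    return -1;
}

/* Resolve a '/'-separated path one component at a time, starting at `start`.
 * Returns the i-number of the final component, or -1 on failure. */
int path_walk(const struct toy_node *table, int start, const char *path) {
    int cur = start;
    assert(table != NULL && path != NULL);
    while (*path != '\0') {
        while (*path == '/')           /* skip separators */
            path++;
        if (*path == '\0')
            break;
        size_t len = strcspn(path, "/");   /* length of the next component */
        cur = dir_lookup(table, cur, path, len);
        if (cur < 0)
            return -1;
        path += len;
    }
    return cur;
}
```

If the home directory is node 0, `Documents` is node 1, and `notes.txt` is node 2, then `path_walk(table, 0, "Documents/notes.txt")` visits exactly the nodes listed in the steps above.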


1.4 Hard link vs soft link

hard link:
  - has the same inode as the file it refers to
  - can't refer to a directory or to a file on a different file system

soft link:
  - is allocated its own inode
  - its content is the path of the file it refers to
  - can refer to a file on a different file system
  - can refer to a directory

[draw picture of both link structures - see whiteboard]
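
The difference can be observed on a POSIX system with `link`, `symlink`, `stat`, `lstat`, and `readlink`. This is a small sketch, not part of the lab: `link_demo`, `struct link_report`, and the `rs7_*.txt` file names are made up for illustration, and `dir` is assumed to be a writable directory on a file system that supports both kinds of link.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

struct link_report {
    int  hard_shares_inode;   /* 1 if the hard link has the target's i-number */
    int  soft_has_own_inode;  /* 1 if the symlink itself has a different i-number */
    int  link_count;          /* target's link count after the hard link is made */
    char soft_content[600];   /* the path stored inside the symlink */
};

/* Creates a file, a hard link to it, and a soft link to it inside `dir`,
 * records what the file system reports, then cleans up.
 * Returns 0 on success, -1 on any error. */
int link_demo(const char *dir, struct link_report *out) {
    char target[512], hard[512], soft[512];
    struct stat st_target, st_hard, st_soft;

    assert(dir != NULL && out != NULL);
    snprintf(target, sizeof target, "%s/rs7_target.txt", dir);
    snprintf(hard,   sizeof hard,   "%s/rs7_hard.txt",   dir);
    snprintf(soft,   sizeof soft,   "%s/rs7_soft.txt",   dir);

    /* Remove leftovers from an earlier run; failures are ignored on purpose. */
    unlink(hard); unlink(soft); unlink(target);

    FILE *f = fopen(target, "w");
    if (f == NULL) return -1;
    fputs("hello\n", f);
    fclose(f);

    if (link(target, hard) != 0) return -1;      /* new name, same inode */
    if (symlink(target, soft) != 0) return -1;   /* new inode holding a path */

    if (stat(target, &st_target) != 0) return -1;
    if (stat(hard, &st_hard) != 0) return -1;
    if (lstat(soft, &st_soft) != 0) return -1;   /* lstat inspects the link itself */

    ssize_t n = readlink(soft, out->soft_content, sizeof out->soft_content - 1);
    if (n < 0) return -1;
    out->soft_content[n] = '\0';                 /* readlink does not terminate */

    out->hard_shares_inode  = (st_hard.st_ino == st_target.st_ino);
    out->soft_has_own_inode = (st_soft.st_ino != st_target.st_ino);
    out->link_count         = (int)st_target.st_nlink;

    unlink(hard); unlink(soft); unlink(target);
    return 0;
}
```

After a successful call, the report shows the hard link sharing the target's i-number (and the link count rising to 2), while the soft link has its own inode whose content is just the target's path.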


2. Lab 5 overview


2.1 File system

- There is only one region, in which both inodes and data blocks reside.
	Usually, file systems divide them into 2 separate regions: an inode region and a data block region.

- Each inode is allocated its own disk block instead of being packed alongside other inodes in a single disk block.
	This means lots of wasted space.

- The disk reads and writes in units of 512B sectors.
	The file system reads in chunks of one 4KB block, which is 8 sectors.

- Superblock is block 0.
	It holds metadata about the FS and pointer to the root directory.

- Bitmap is an array of bits.
	The bit at index i indicates whether block i is allocated: 1 means the block is free and 0 means it is in use.

- Each inode contains:
		- 10 direct data block pointers. Each data block is 4KB.
		- 1 indirect pointer. The indirect block holds 1024 direct block pointers.
		- 1 double indirect pointer. The double indirect block holds 1024 indirect block pointers.
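
Two of the facts above can be checked in a few lines of C: the bitmap convention (1 = free, 0 = used) and how many blocks one inode can address. This is a sketch under assumptions, not the lab's code: the function names are made up, the bitmap is assumed to be stored as 32-bit words, and block numbers are assumed to be 4 bytes (which is what makes 1024 of them fit in a 4KB block).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLKSIZE    4096u
#define N_DIRECT   10u
#define N_INDIRECT (BLKSIZE / sizeof(uint32_t))   /* 1024 entries per table block */

/* Bitmap convention from the notes: bit i is 1 if block i is free, 0 if in use. */
int block_is_free(const uint32_t *bitmap, uint32_t blockno) {
    assert(bitmap != NULL);
    return (bitmap[blockno / 32] >> (blockno % 32)) & 1u;
}

/* Clear the bit: the block is now in use. */
void mark_block_used(uint32_t *bitmap, uint32_t blockno) {
    bitmap[blockno / 32] &= ~(1u << (blockno % 32));
}

/* Set the bit: the block is free again. */
void mark_block_free(uint32_t *bitmap, uint32_t blockno) {
    bitmap[blockno / 32] |= 1u << (blockno % 32);
}

/* Number of data blocks reachable from one inode:
 * direct slots + one indirect block + one double indirect block. */
uint64_t max_file_blocks(void) {
    uint64_t n = N_INDIRECT;
    return N_DIRECT + n + n * n;
}

/* The same limit expressed in bytes. */
uint64_t max_file_bytes(void) {
    return max_file_blocks() * BLKSIZE;
}
```

With these numbers the pointer structure reaches 10 + 1024 + 1024 * 1024 = 1,049,610 blocks, a little over 4GB. The lab may enforce a smaller limit; this is only what the pointers can address.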


2.2 Practice Problems

Note: the calculations below use 1-based indexing. This is different from
lab 5, where we use 0-based indexing.

Q: Where (in which table, at which entry) would the 'fileblock'th block be found, for fileblock =

- 5?

A: In the 5th direct data block.

- 1032?

A: 1032 - 10 = 1022. In the indirect block, at the 1022nd entry.

- 2060?

A: 2060 - 10 - 1024 = 1026 = 1024 + 2. In the double indirect block: the first indirect block it points to covers offsets 1 to 1024, so this is the second indirect block, at its 2nd entry.
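
Answers like these can be checked mechanically. The sketch below uses the same 1-based numbering as the practice problems and converts to a 0-based index internally; `locate_1based`, `struct block_location`, and the `REGION_*` names are hypothetical helpers, not lab functions.

```c
#include <assert.h>
#include <stdint.h>

#define N_DIRECT   10u
#define N_INDIRECT 1024u

enum region { REGION_DIRECT, REGION_INDIRECT, REGION_DOUBLE, REGION_TOO_BIG };

struct block_location {
    enum region region;
    uint32_t outer;   /* 1-based entry in the double indirect block (0 if unused) */
    uint32_t inner;   /* 1-based entry in the direct slots or in an indirect block */
};

/* Classify a 1-based file block number. */
struct block_location locate_1based(uint32_t fileblock) {
    struct block_location loc = { REGION_TOO_BIG, 0, 0 };
    assert(fileblock >= 1);
    uint32_t idx = fileblock - 1;          /* 0-based from here on */

    if (idx < N_DIRECT) {
        loc.region = REGION_DIRECT;
        loc.inner = idx + 1;
        return loc;
    }
    idx -= N_DIRECT;                       /* offset past the direct slots */
    if (idx < N_INDIRECT) {
        loc.region = REGION_INDIRECT;
        loc.inner = idx + 1;
        return loc;
    }
    idx -= N_INDIRECT;                     /* offset past the indirect block */
    if (idx < N_INDIRECT * N_INDIRECT) {
        loc.region = REGION_DOUBLE;
        loc.outer = idx / N_INDIRECT + 1;  /* which indirect block */
        loc.inner = idx % N_INDIRECT + 1;  /* which entry inside it */
    }
    return loc;
}
```

For example, `locate_1based(1032)` reports the indirect region with `inner == 1022`. Dropping the `+ 1` and `- 1` adjustments gives the 0-based arithmetic that lab 5 uses.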


2.3 `inode_block_walk` and `inode_get_block`

Purpose of `inode_block_walk`:
	- Locate the disk block number corresponding to a specific file block (`filebno`) in the inode (`ino`)
	- Return, through `*ppdiskbno`, a pointer to the slot that holds that disk block number

High Level Logic:
	Direct Blocks:
		- If `filebno` is within the range of direct blocks, return the corresponding slot directly
	Indirect Blocks:
		- If `filebno` falls within the range of the indirect block, check whether the indirect block exists
		- If it does not and allocation is allowed, allocate and clear the indirect block
		- Then return a pointer to the appropriate slot in the indirect block
	Double-Indirect Blocks:
		- If `filebno` requires traversal into the double-indirect block, check whether the double-indirect block exists
		- If it does not and allocation is allowed, allocate and clear it
		- Navigate to the appropriate indirect block within the double-indirect block
		- If the target indirect block doesn't exist and allocation is allowed, allocate and clear it
		- Then return a pointer to the appropriate slot in the indirect block
	Hints:
		- Draw Visuals:
				- visualize direct, indirect, and double-indirect block structures
				- how would you go to a specific block within these data structures
		- Memory Initialization:
				- Ensure any newly allocated block is cleared before use
		- Helper Functions:
				- Consider extracting common logic into helper functions 
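
To make the walk concrete, here is a toy version over an in-memory array standing in for the disk. It is a sketch of the general technique, not the lab's code: `toy_disk`, `toy_alloc_block`, `toy_table`, `toy_block_walk`, and the plain `-1` error return are all made up, block number 0 is assumed to mean "no block", and indexing is 0-based as in the lab.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLKSIZE     4096u
#define N_DIRECT    10u
#define N_INDIRECT  (BLKSIZE / sizeof(uint32_t))   /* 1024 slots per table block */
#define TOY_NBLOCKS 64u                            /* size of the toy disk */

/* Toy "disk": block 0 is never handed out, so 0 can mean "no block". */
static uint32_t toy_disk[TOY_NBLOCKS][BLKSIZE / sizeof(uint32_t)];
static uint32_t toy_next_free = 1;

struct toy_inode {
    uint32_t direct[N_DIRECT];
    uint32_t indirect;
    uint32_t double_indirect;
};

/* Hypothetical allocator: hands out blocks in order, returns 0 when full. */
uint32_t toy_alloc_block(void) {
    return toy_next_free < TOY_NBLOCKS ? toy_next_free++ : 0;
}

/* Shared helper: make sure *slot names a table block. If it is empty and
 * `alloc` is set, allocate one and clear it. Returns the table, or NULL. */
uint32_t *toy_table(uint32_t *slot, int alloc) {
    if (*slot == 0) {
        if (!alloc)
            return NULL;
        uint32_t b = toy_alloc_block();
        if (b == 0)
            return NULL;
        memset(toy_disk[b], 0, BLKSIZE);   /* new tables must start empty */
        *slot = b;
    }
    return toy_disk[*slot];
}

/* Point *ppdiskbno at the slot for file block `filebno`.
 * Returns 0 on success, -1 if the slot cannot be reached. */
int toy_block_walk(struct toy_inode *ino, uint32_t filebno,
                   uint32_t **ppdiskbno, int alloc) {
    assert(ino != NULL && ppdiskbno != NULL);
    if (filebno < N_DIRECT) {                      /* direct slot */
        *ppdiskbno = &ino->direct[filebno];
        return 0;
    }
    filebno -= N_DIRECT;
    if (filebno < N_INDIRECT) {                    /* slot in the indirect block */
        uint32_t *table = toy_table(&ino->indirect, alloc);
        if (table == NULL)
            return -1;
        *ppdiskbno = &table[filebno];
        return 0;
    }
    filebno -= N_INDIRECT;
    if (filebno < N_INDIRECT * N_INDIRECT) {       /* two levels of tables */
        uint32_t *outer = toy_table(&ino->double_indirect, alloc);
        if (outer == NULL)
            return -1;
        uint32_t *inner = toy_table(&outer[filebno / N_INDIRECT], alloc);
        if (inner == NULL)
            return -1;
        *ppdiskbno = &inner[filebno % N_INDIRECT];
        return 0;
    }
    return -1;                                     /* beyond the maximum file size */
}
```

Note how `toy_table` is used three times: extracting the "check, allocate, clear" step into a helper keeps each case short. The real lab functions have their own signatures and error codes, so treat this only as a picture of the control flow.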

Purpose of `inode_get_block`:
	- Obtain the in-memory address of the `filebno`th block for a given inode (`ino`).
	- Allocate the block if it does not already exist

High Level Logic:
	- Ensure `filebno` is within the valid range for the inode
	- Use `inode_block_walk` to locate or allocate the slot holding the disk block number for `filebno`
	- If the block slot is empty, allocate a new block using `alloc_disk_block` and initialize it
	- Then update the block slot with the new block number and return the block's memory address
	Hints:
		- Use `inode_block_walk` to handle block lookup and allocation logic
		- Use `diskblock2memaddr` to get the memory address
		- Clear any newly allocated block to ensure it is properly initialized
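
The last two steps (fill an empty slot, then turn a block number into an address) can be pictured with a toy disk image held in one contiguous buffer. This is only a sketch: `mini_block_addr` and `mini_get_block` are made-up stand-ins, and the assumption that the address is "base of the mapping + block number * block size" is an illustration of the idea behind `diskblock2memaddr`, not a statement about how the lab implements it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLKSIZE      4096u
#define MINI_NBLOCKS 16u

/* Stand-in for a disk image mapped into memory as one contiguous region. */
static uint8_t  mini_diskmap[MINI_NBLOCKS * BLKSIZE];
static uint32_t mini_next_free = 1;   /* block 0 reserved: 0 means "no block" */

/* Assumed mapping: block b lives at offset b * BLKSIZE in the image. */
void *mini_block_addr(uint32_t blockno) {
    assert(blockno < MINI_NBLOCKS);
    return mini_diskmap + (size_t)blockno * BLKSIZE;
}

/* Given a slot found by a block walk, return the block's address.
 * If the slot is empty, allocate a block, zero it, and record it in the slot.
 * Returns NULL if the toy disk is full. */
void *mini_get_block(uint32_t *slot) {
    assert(slot != NULL);
    if (*slot == 0) {
        if (mini_next_free >= MINI_NBLOCKS)
            return NULL;
        uint32_t b = mini_next_free++;
        memset(mini_block_addr(b), 0, BLKSIZE);   /* clear before first use */
        *slot = b;
    }
    return mini_block_addr(*slot);
}
```

Calling `mini_get_block` twice on the same slot returns the same address: the first call allocates, the second only translates the stored block number.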


2.4 FUSE

You will be writing a REAL file system in lab 5, including the "disk". How?

FUSE: Filesystem in Userspace. 

[draw picture - see whiteboard]


Good luck!