CS202 Review Session 7
Notes from [Xiangyu Gao](https://xiangyug.github.io/), TA from fall 2021
Edited by Khanh Nguyen, TA Spring 2022
Edited by Jinli Xiao, TA Spring 2023
Edited by Sam Frank, TA Spring 2024
Edited by Saeed Bafarat, TA Fall 2024


1. Background knowledge
  1.1 An example
  1.2 File system abstractions
  1.3 inodes
  1.4 Hard link vs soft link
2. Lab 5 overview
  2.1 File system
  2.2 Practice Problems
  2.3 Inode_block_walk and Inode_get_block
  2.4 FUSE


---------------------------------------------------------------------

1. Background knowledge

1.1 An example

When you enter `cat ~/a/b.txt`, you are issuing a command to read the file b.txt
in directory a, which is in your home directory.

The file system starts from the home directory, goes to directory a, and then
finds the file named b.txt. This is very convenient. Here, the data I am looking
for is on the hard disk of my local machine, but I don't have to remember where
it physically is (e.g., which platter, track, or sector). Instead, I access the
data through an intuitive interface provided by the file system.

Perspective:
- user's view: named bytes in storage
- file system's view: a group of disk blocks

1.2 File system abstractions

There are at least three key abstractions: file, filename, and directory.

**File**: in reality, the data of one file might not be placed contiguously on
the disk. But to users, a file looks like one big, long, contiguous string of
bytes.

**Filename**: users don't have to remember any physical information about how
the file is stored, such as the sector number where it starts. Instead, users
can give the file an intuitive name.

**Directory**: a container of other files or directories that helps organize them.

Q: How do we achieve this?
A: In Unix, we use inodes, stored in a fixed-size array of inodes (the inode
table). An inode is a data structure that resembles an imbalanced tree.

1.3 inodes

An inode contains metadata such as file permissions, access and modification
times, link count, etc. Additionally, it contains direct pointers to data
blocks, an indirect pointer, and a double indirect pointer. A triple indirect
pointer can also be added if necessary.

[Draw picture of inode data structure - see whiteboard]

The inode is intentionally imbalanced to handle both small and large files. A
small file fits entirely in the data blocks named by the direct pointers. A
larger file additionally uses the indirect pointer, and then the double
indirect pointer.

Each inode is referred to by its i-number. Mental model:
<i-number> -> <inode data structure>

Directories are implemented similarly to files. A directory's contents are a
list of entries, each containing a mapping:

<Filename> -> <i-number>

To look up file b.txt in directory a under home, the FS (file system) looks in
the home directory for the name a, which maps to an i-number. Using that
i-number, it goes to directory a's inode and looks up the name b.txt. Eventually
the file's inode is reached, and the user can carry out file operations.
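A minimal C sketch of this lookup, assuming a toy in-memory model: the inode table is an array indexed by i-number, and each directory inode holds a small array of (name, i-number) entries. The names `inode_toy`, `dir_lookup`, `path_lookup` and `dir_add` are hypothetical and are not lab 5's structures.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical toy model, not the lab 5 structures. */
#define NINODES    16
#define MAXENTRIES 8
#define NAMELEN    32

enum itype { T_FREE, T_FILE, T_DIR };

struct dirent_toy {
    char name[NAMELEN];
    int inum;                       /* i-number this name maps to */
};

struct inode_toy {
    enum itype type;
    int nentries;                   /* used only when type == T_DIR */
    struct dirent_toy entries[MAXENTRIES];
};

static struct inode_toy itable[NINODES];   /* <i-number> -> <inode> */

/* Scan one directory for a name; return its i-number, or -1 if absent. */
int dir_lookup(int dir_inum, const char *name) {
    assert(dir_inum >= 0 && dir_inum < NINODES);
    const struct inode_toy *dir = &itable[dir_inum];
    if (dir->type != T_DIR)
        return -1;                  /* can't look inside a regular file */
    for (int i = 0; i < dir->nentries; i++)
        if (strcmp(dir->entries[i].name, name) == 0)
            return dir->entries[i].inum;
    return -1;
}

/* Resolve a '/'-separated path one component at a time, starting from
   start_inum (e.g. the home directory). Returns -1 if a component is missing. */
int path_lookup(int start_inum, const char *path) {
    char buf[128];
    strncpy(buf, path, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    int cur = start_inum;
    for (char *comp = strtok(buf, "/"); comp != NULL; comp = strtok(NULL, "/")) {
        cur = dir_lookup(cur, comp);
        if (cur < 0)
            return -1;
    }
    return cur;
}

/* Add a name -> i-number entry to a directory. */
void dir_add(int dir_inum, const char *name, int inum) {
    struct inode_toy *dir = &itable[dir_inum];
    assert(dir->type == T_DIR && dir->nentries < MAXENTRIES);
    strncpy(dir->entries[dir->nentries].name, name, NAMELEN - 1);
    dir->entries[dir->nentries].name[NAMELEN - 1] = '\0';
    dir->entries[dir->nentries].inum = inum;
    dir->nentries++;
}
```

For example, if home is i-number 1 with an entry a -> 2, and directory 2 has an entry b.txt -> 3, then `path_lookup(1, "a/b.txt")` returns 3.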

1.4 Hard link vs soft link

- Hard link:
  + has the same inode as the file it refers to (it is just another directory
  entry mapping a name to the same i-number).
  + can't refer to a directory or to a file on a different file system.

- Soft link:
  + is allocated its own inode. Its content is the path of the file it refers
  to (think of it as an indirection, like a pointer).
  + can refer to a file on a different file system.
  + can refer to a directory.
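The difference is visible on a real Unix system through the POSIX calls `link()`, `symlink()`, `stat()` (follows symlinks) and `lstat()` (does not). A minimal sketch, assuming a POSIX system and a writable scratch directory; the `link_report` struct and `compare_links` function are hypothetical names made up for this demo.

```c
#define _XOPEN_SOURCE 700           /* expose mkdtemp(), symlink(), lstat(), S_ISLNK */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical result struct for this demo. */
struct link_report {
    int hard_same_inode;  /* hard link has the same i-number as the file */
    int soft_same_inode;  /* the symlink itself (lstat) has the file's i-number */
    int soft_is_symlink;  /* lstat sees a separate inode of type "symlink" */
    int soft_follows;     /* stat() through the symlink reaches the file's inode */
    long nlink;           /* file's link count after the hard link is made */
};

/* Creates dir/b.txt, a hard link dir/hard and a symlink dir/soft, fills *r,
   then removes all three. Returns 0 on success, -1 on error. */
int compare_links(const char *dir, struct link_report *r) {
    char file[512], hard[512], soft[512];
    snprintf(file, sizeof file, "%s/b.txt", dir);
    snprintf(hard, sizeof hard, "%s/hard", dir);
    snprintf(soft, sizeof soft, "%s/soft", dir);

    int fd = open(file, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    close(fd);

    /* Hard link: a second directory entry pointing at the SAME inode. */
    if (link(file, hard) != 0)
        return -1;
    /* Soft link: a new inode whose content is the path "b.txt",
       resolved relative to the directory containing the link. */
    if (symlink("b.txt", soft) != 0)
        return -1;

    struct stat sf, sh, sl, sfollow;
    int rc = -1;
    if (stat(file, &sf) == 0 && stat(hard, &sh) == 0 &&
        lstat(soft, &sl) == 0 && stat(soft, &sfollow) == 0) {
        r->hard_same_inode = sf.st_ino == sh.st_ino;
        r->soft_same_inode = sf.st_ino == sl.st_ino;
        r->soft_is_symlink = S_ISLNK(sl.st_mode) != 0;
        r->soft_follows = sf.st_ino == sfollow.st_ino;
        r->nlink = (long)sf.st_nlink;
        rc = 0;
    }
    unlink(soft);
    unlink(hard);
    unlink(file);
    return rc;
}
```

Run against a fresh directory from `mkdtemp()`, one would expect the hard link to share the file's i-number and raise its link count to 2, while `lstat()` on the soft link shows a different inode that `stat()` follows back to the file.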

[Draw a picture - see whiteboard]

2. Lab 5 overview

2.1 File system

- In lab 5, inodes and data blocks reside in a single shared region. Usually,
the disk is divided into 2 separate regions: an inode region and a data block
region.

- Each inode occupies its own disk block instead of being packed together with
other inodes into a shared disk block.

- A sector is 512B, the unit the disk reads. The file system reads in chunks of
4KB blocks, so each block is 8 sectors.

- The superblock is block 0. It holds metadata about the FS and a pointer to the
root directory.

- The bitmap is an array of bits. The bit at index i indicates whether block i
is allocated: 1 means it's free and 0 means it's in use.

- Each inode contains:
 - 10 direct pointers, each to a 4KB data block.
 - 1 indirect pointer, to a block holding 1024 direct pointers.
 - 1 double indirect pointer, to a block holding 1024 indirect pointers.
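The numbers in this list can be checked with a little arithmetic. A minimal C sketch, assuming block numbers are 32-bit (which is what lets a 4KB block hold 1024 of them); the constant and function names are hypothetical, not necessarily the lab's.

```c
#include <assert.h>
#include <stdint.h>

/* Parameters from the notes; names are hypothetical. */
#define SECTOR_SIZE    512u
#define BLOCK_SIZE     4096u
#define N_DIRECT       10u
#define PTRS_PER_BLOCK (BLOCK_SIZE / sizeof(uint32_t))  /* 1024 32-bit block numbers */

static_assert(BLOCK_SIZE % SECTOR_SIZE == 0, "a block must be a whole number of sectors");

/* A block spans BLOCK_SIZE / SECTOR_SIZE = 8 sectors, so block b starts at sector 8*b. */
uint64_t block_to_first_sector(uint32_t blockno) {
    return (uint64_t)blockno * (BLOCK_SIZE / SECTOR_SIZE);
}

/* Most data blocks one inode can address: direct + indirect + double indirect. */
uint64_t max_file_blocks(void) {
    return N_DIRECT + PTRS_PER_BLOCK + (uint64_t)PTRS_PER_BLOCK * PTRS_PER_BLOCK;
}

/* Largest file size in bytes. */
uint64_t max_file_bytes(void) {
    return max_file_blocks() * BLOCK_SIZE;
}
```

This gives 10 + 1024 + 1024*1024 = 1,049,610 addressable blocks, i.e. 4,299,202,560 bytes, just over 4 GiB per file. Almost all of that capacity comes from the double indirect pointer.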
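A toy bitmap allocator using the same convention (bit set = free) might look like the sketch below. The 64-block disk, the first-fit search, and all function names are assumptions for illustration, not the lab's code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy disk size; bit i of the bitmap describes block i. */
#define NBLOCKS 64

static uint32_t bitmap[NBLOCKS / 32];

/* Returns 1 if block b is free (bit set), 0 if it is in use. */
int block_is_free(uint32_t b) {
    assert(b < NBLOCKS);
    return (bitmap[b / 32] >> (b % 32)) & 1u;
}

void mark_free(uint32_t b) { assert(b < NBLOCKS); bitmap[b / 32] |= 1u << (b % 32); }
void mark_used(uint32_t b) { assert(b < NBLOCKS); bitmap[b / 32] &= ~(1u << (b % 32)); }

/* "Format" the disk: every block free except the first nreserved
   (e.g. the superblock and the bitmap itself). */
void bitmap_init(uint32_t nreserved) {
    memset(bitmap, 0xff, sizeof bitmap);
    for (uint32_t b = 0; b < nreserved; b++)
        mark_used(b);
}

/* First-fit allocation: find the lowest free block, mark it used, and
   return its number; -1 if the disk is full. */
int64_t alloc_block_toy(void) {
    for (uint32_t b = 0; b < NBLOCKS; b++) {
        if (block_is_free(b)) {
            mark_used(b);
            return b;
        }
    }
    return -1;
}
```

With `bitmap_init(2)`, the first call to `alloc_block_toy()` returns block 2, and freeing block 2 with `mark_free(2)` makes it the next block handed out again.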


2.2 Practice Problems

Note: the calculations below use a 1-based index. This is different from
lab 5, where we use a 0-based index.

Q: In which table (direct, indirect, or double indirect) and at which entry is each of the following file block numbers stored?

- 5?

A: In the 5th direct data block.

- 1032?

A: 1032 - 10 = 1022. It is the 1022nd entry of the block pointed to by the indirect pointer.

- 2060?

A: 2060 - 10 - 1024 = 1026, so it is in the double indirect region. Each
indirect block covers 1024 entries, and 1026 = 1024 + 2, so it is in the 2nd
indirect block of the double indirect block, at the 2nd entry of that indirect
block.
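The same calculation can be written as a small function. This sketch uses the notes' parameters (10 direct pointers, 1024 entries per indirect block) and the same 1-based numbering as the practice problems; `locate` and `struct location` are hypothetical names.

```c
#include <assert.h>

/* Parameters from the notes; 1-based numbering as in the practice problems. */
#define N_DIRECT  10
#define NINDIRECT 1024

enum level { DIRECT, INDIRECT, DOUBLE_INDIRECT, OUT_OF_RANGE };

struct location {
    enum level level;
    long outer;   /* DOUBLE_INDIRECT only: which indirect block (1-based) */
    long entry;   /* 1-based slot in the direct array or in an indirect block */
};

struct location locate(long fileblock) {
    struct location loc = { OUT_OF_RANGE, 0, 0 };
    assert(fileblock >= 1);
    if (fileblock <= N_DIRECT) {             /* blocks 1..10 */
        loc.level = DIRECT;
        loc.entry = fileblock;
        return loc;
    }
    long r = fileblock - N_DIRECT;           /* 1-based within the indirect region */
    if (r <= NINDIRECT) {
        loc.level = INDIRECT;
        loc.entry = r;
        return loc;
    }
    r -= NINDIRECT;                          /* 1-based within the double indirect region */
    if (r <= (long)NINDIRECT * NINDIRECT) {
        loc.level = DOUBLE_INDIRECT;
        loc.outer = (r - 1) / NINDIRECT + 1;
        loc.entry = (r - 1) % NINDIRECT + 1;
    }
    return loc;
}
```

`locate(5)` gives the 5th direct entry, `locate(1032)` the 1022nd indirect entry, and `locate(2060)` the 2nd entry of the 2nd indirect block under the double indirect pointer, matching the answers above.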


2.3 Inode_block_walk and Inode_get_block

Purpose of Inode_block_walk:
Locate the slot that holds the disk block number for file block filebno in the inode (ino)
Set *ppdiskbno to point at that slot (in the inode itself or in an indirect block)
High Level Logic:
	Direct Blocks:
		If filebno is within the range of direct blocks, return a pointer to the corresponding slot in the inode directly
	Indirect Blocks:
		If filebno falls within the range of indirect blocks: Check if the indirect block exists
		If not and allocation is allowed, allocate and clear the indirect block
		Then return a pointer to the appropriate slot in the indirect block
	Double-Indirect Blocks: 
		If filebno requires traversal into double-indirect blocks: Check if the double-indirect block exists
		If not and allocation is allowed, allocate and clear it. Navigate to the appropriate indirect block within the double-indirect block
		If the target indirect block doesn't exist and allocation is allowed, allocate and clear it
		Then return a pointer to the appropriate slot in the indirect block
	Hints:
	Draw Visuals: Visualize direct, indirect, and double-indirect block structures and how you would go to a specific block within these data structures
	Memory Initialization: Ensure any newly allocated block is cleared before use
	Helper Functions: Consider extracting common logic into helper functions 

Purpose of Inode_get_block:
Obtain the in-memory address of block filebno of a given inode (ino). Allocate the block if it does not already exist
High Level Logic:
	Ensure filebno is within the valid range for the inode
	Use inode_block_walk to locate or allocate the corresponding disk block for filebno
	If the block slot is empty: Allocate a new block using alloc_disk_block and initialize it
	Then store the new block number in the slot and return the block's memory address
	Hints:
	Use inode_block_walk to handle block lookup and allocation logic
	Use diskblock2memaddr to get the memory address
	Clear any newly allocated block to ensure it is properly initialized
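Putting 2.3 together, here is a minimal C sketch of both functions on a toy in-memory disk. Everything in it is hypothetical: `toy_inode`, `toy_alloc`, `blk_addr` (standing in for diskblock2memaddr), the bump allocator, and the 64-block disk are made up, and the lab's real structures and signatures may differ. Block 0 is never handed out, so a slot value of 0 means "no block yet". Like the lab, it uses a 0-based filebno.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy disk, not the lab's code. */
#define BLOCK_SIZE 4096
#define NBLOCKS    64
#define N_DIRECT   10
#define NINDIRECT  (BLOCK_SIZE / sizeof(uint32_t))   /* 1024 slots per block */

static uint32_t disk[NBLOCKS][NINDIRECT];   /* each row is one 4KB block */
static uint32_t next_free = 1;              /* bump allocator; block 0 unused */

struct toy_inode {
    uint32_t direct[N_DIRECT];
    uint32_t indirect;
    uint32_t double_indirect;
};

/* Stands in for diskblock2memaddr. */
static char *blk_addr(uint32_t b) { assert(b < NBLOCKS); return (char *)disk[b]; }

/* Allocate and zero a block; returns 0 when the toy disk is full. */
static uint32_t toy_alloc(void) {
    if (next_free >= NBLOCKS)
        return 0;
    uint32_t b = next_free++;
    memset(disk[b], 0, BLOCK_SIZE);
    return b;
}

/* Helper: make sure *slot names a block (allocating and clearing one if alloc
   is true), then return that block viewed as an array of block numbers. */
static uint32_t *get_table(uint32_t *slot, bool alloc) {
    if (*slot == 0) {
        if (!alloc)
            return NULL;
        uint32_t b = toy_alloc();
        if (b == 0)
            return NULL;
        *slot = b;
    }
    return disk[*slot];
}

/* Point *pslot at the slot holding file block filebno's disk block number.
   Returns 0 on success, -1 if out of range or a table is missing/unallocatable. */
int toy_block_walk(struct toy_inode *ino, uint32_t filebno, uint32_t **pslot, bool alloc) {
    if (filebno < N_DIRECT) {
        *pslot = &ino->direct[filebno];
        return 0;
    }
    filebno -= N_DIRECT;
    if (filebno < NINDIRECT) {
        uint32_t *ind = get_table(&ino->indirect, alloc);
        if (ind == NULL)
            return -1;
        *pslot = &ind[filebno];
        return 0;
    }
    filebno -= NINDIRECT;
    if (filebno < NINDIRECT * NINDIRECT) {
        uint32_t *dbl = get_table(&ino->double_indirect, alloc);
        if (dbl == NULL)
            return -1;
        uint32_t *ind = get_table(&dbl[filebno / NINDIRECT], alloc);
        if (ind == NULL)
            return -1;
        *pslot = &ind[filebno % NINDIRECT];
        return 0;
    }
    return -1;
}

/* Return the memory address of file block filebno, allocating (and zeroing)
   the data block if its slot is still empty; NULL on failure. */
char *toy_get_block(struct toy_inode *ino, uint32_t filebno) {
    uint32_t *slot;
    if (toy_block_walk(ino, filebno, &slot, true) != 0)
        return NULL;
    if (*slot == 0) {
        uint32_t b = toy_alloc();
        if (b == 0)
            return NULL;
        *slot = b;
    }
    return blk_addr(*slot);
}
```

The `get_table` helper follows the "helper functions" hint: the same "allocate and clear if missing" step is needed for the indirect block, the double indirect block, and each indirect block inside it.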

2.4 FUSE

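You will be writing a REAL file system in lab 5, including the "disk". How?

FUSE: Filesystem in Userspace. The kernel's FUSE module forwards file operations
on the mounted directory (open, read, write, ...) to a user-space program, which
implements them.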

[Draw picture - see whiteboard]


Good luck!