CS202 Review Session 3

Notes from [Xiangyu Gao](https://xiangyug.github.io/), TA from fall 2021
Edited by Khanh Nguyen, TA Spring 2022
Edited by Charlie Chen, TA Spring 2023
Edited by Alex Liu, TA Fall 2024
Edited by Noah Golub, TA Spring 2026 (incorporated content from Yuxia Zhan, TA Fall 2025)

1. Lab 2 overview
    1.1. Motivation
    1.2. What is ls?
    1.3. File definition
    1.4. Flags
    1.5. File permissions
    1.6. Functions
    1.7. Helper functions
    1.8. Using stat()
    1.9. Test output
2. My Advice

---------------------------------------------------------------------

Introduction:

Hello everyone, welcome to RS3! My name is Noah Golub, and I am the head TA for CS202 this semester. A bit about me: I graduated from Stanford in 2018 and worked in several industry positions before returning to NYU as a PhD student in Mike's group, so I have experience both as a student and as a software engineer in industry.

Today, I will give you an overview of lab 2, cover some key technical points that will help you implement the lab, and give you some higher-level meta advice on how to approach and tackle the lab.

1. Lab 2 overview

1.1. Motivation
- You will implement ls, which is a command-line tool, in lab 2. The origins of ls trace back to Multics, where ls stood for "list segments". It was later adopted by Unix, and has been a staple of Unix-like operating systems ever since.
- A command-line tool is a user-space program that interacts with the operating system through system calls and with other user-space programs through their APIs.
- This lab will give you practice reading man pages, working with system calls, toying with system-call and helper-function APIs, and designing/implementing/refactoring/debugging a non-trivial program.

1.2. What is ls?
- The best way to learn about a command-line tool is to play around with it in the terminal! Type `man` followed by the tool's name to see its manual, which documents its usage.
- `ls -1` prints one entry per line.
- `ls -1a` also lists hidden files (i.e., entries whose names start with `.`). They are `hidden` mainly for neatness and convenience, not for security. It's more of a historical convention than anything else.
- `ls -1l` also prints the metadata of each entry (an entry could be a file or a directory).
- `ls -1R` recursively lists all files (including those inside subdirectories).
- You can combine flags together, e.g., `ls -1alR`.

That covers 3 out of the 4 flags you need to implement in lab 2, the fourth being the `-n` flag, which counts entries instead of listing them (this is a custom flag for this lab, not part of the real ls command).

1.3. File definition
- Normal files can be: "test.txt", "main.c", ...
- Normal directories can be: "foo/", "bar/", "foo/bar/", ...
- Some files and directories start with ".". If you are curious, you can inspect ".git" in your repository. This is where git stores your information and commit history. These are usually hidden from users when they invoke `ls`. To see them, you need `ls -a`.
- In addition, "." also means the current directory, ".." means the parent directory, and "~" means the home directory. Every directory you create will contain the 2 "pseudo-directories" "." and ".." ("~", by contrast, is expanded by the shell).
- With that, there are also relative and absolute paths:
  + A relative path is a path from the current directory. You can prefix it with "." and "..". For instance: `./my/relative/path/to/file`.
  + An absolute path is a path from the root directory, so it starts with "/". For instance: `/home/user/path`.

1.4. Flags
- Anatomy of a command line:

      ./ls      -alR    foo/ bar/
      program   flags   args

- Flags refer to -a, -l, -R. You will need to support combinations of them as well.
- To make this clearer, let's look at getopt() (learn more with `man 3 getopt`).
**====> Example: man 3 getopt <====**

    int getopt(int argc, char* const argv[], const char *optstring)

- argc: number of arguments supplied (from main)
- argv: array of arguments supplied (from main)
- optstring: the flags we want to parse
- optind: a global variable specifying the index in argv to parse the next time around, initialized to 1
- optarg: a global variable pointing to the argument of the current option, if it exists

If getopt() finds another option character, from the optstring you provided, in the arguments supplied, it returns that character, updating the variable optind so the next call to getopt() can resume the scan. Once there are no more options to parse, it returns -1.

**====> Example: getopt code (modified from man pages)<====**

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(int argc, char *argv[]) {
        int x_flag, y_flag, z_flag;
        char *z_argument;
        char *optstring = "xyz:"; // options x, y without argument, z with argument
        int identified_opt;

        x_flag = 0;
        y_flag = 0;
        z_flag = 0;
        z_argument = NULL;

        // getopt() returns one option character per call, and -1 when done
        while ((identified_opt = getopt(argc, argv, optstring)) != -1) {
            switch (identified_opt) {
            case 'x':
                x_flag = 1;
                break;
            case 'y':
                y_flag = 1;
                break;
            case 'z':
                z_flag = 1;
                z_argument = optarg; // optarg points at the argument of -z
                break;
            default: /* '?' for unrecognized option */
                printf("Error. Usage: %s [-z arg] [-xy] name\n", argv[0]);
                exit(EXIT_FAILURE);
            }
        }

        printf("x_flag=%d; \ny_flag=%d; \nz_flag=%d;\nz_argument: %s;\ncurrent "
               "optind=%d\n",
               x_flag, y_flag, z_flag, z_argument, optind);

        // optind now indexes the first non-option argument, if there is one
        if (optind >= argc) {
            printf("Error. Expected `name` argument after options\n");
            exit(EXIT_FAILURE);
        }

        printf("Get `name` argument = %s\n", argv[optind]);

        exit(EXIT_SUCCESS);
    }

- Will parse -x, -y, and -z (with its argument), in any permutation.
- Expects at least 1 argument after the flags.
- Try to compile, run and observe the output: `./a.out -x Noah`, `./a.out -y Noah`, `./a.out -xy`, `./a.out -n foo` (the last two hit the error paths: a missing `name` and an unrecognized option).
- With an argument: `./a.out -x -z arg Noah`
- This is magic! Having to handle all permutations would be tedious, so getopt() comes to the rescue.
- getopt_long is quite similar. The difference is that getopt_long also lets us parse flags in long format, which are prefixed with `--` instead of `-` (like --help and --hack).
- Read about getopt_long: `man 3 getopt_long`

1.5. File permissions
- Permissions are a key part of the metadata of a file. They determine who can read, write, or execute a file.
- Permissions are important for access control. On a multi-user system, you don't want just anyone to be able to read or modify your files.
- Even on a single-user system, permissions can help prevent accidental modifications or deletions of important files (e.g., OS files).
- Lab 2 will have you print the permissions of a file, which have the form: -rwxr-xr--. There are 10 chars.
  - First character (-): indicates the type of file. `-` is a regular file, `d` is a directory.
  - Next three characters (rwx): permissions for the owner (read, write, execute).
  - Next three characters (r-x): permissions for the group (read, no write, execute).
  - Last three characters (r--): permissions for others (read, no write, no execute).
- Numeric representation (e.g., 777): each octal digit corresponds to 3 characters in the string representation.

**====> Example: permission bits <====**

    700 -> 7,0,0 -> 111,000,000 -> rwx------

1.6. Functions:
- The main building blocks of your program are: opendir, readdir and closedir.
- Read through their man pages carefully.
- Don't forget to call closedir when you are finished with a directory. We don't want resources to be leaked.
**====> Code Example: [`man 3p readdir`](https://man9.org/linux/man-pages/man3/readdir.3p.html) example code <====**

```
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

static void lookup(const char *arg)
{
    DIR *dirp;
    struct dirent *dp;

    if ((dirp = opendir(".")) == NULL) {
        perror("couldn't open '.'");
        return;
    }

    do {
        /* readdir() returns NULL at end-of-directory AND on error;
           clearing errno first lets us tell the two apart afterwards */
        errno = 0;
        if ((dp = readdir(dirp)) != NULL) {
            if (strcmp(dp->d_name, arg) != 0)
                continue;

            (void) printf("found %s\n", arg);
            (void) closedir(dirp);
            return;
        }
    } while (dp != NULL);

    if (errno != 0)
        perror("error reading directory");
    else
        (void) printf("failed to find %s\n", arg);
    (void) closedir(dirp);
    return;
}
```

- `opendir()` to get the `DIR* dirp`.
- Iterate over the directory's entries (`struct dirent* dp`) using `readdir()`.
- struct dirent contains the filename and other fields.
- Don't forget to `closedir()` to prevent resource leaks.

1.7. Helper functions:
- We provide lots of handy helper functions and macros such as:
  - PRINT_PERM_CHAR: prints the permission character (e.g., 'r', 'w', 'x') if a certain permission exists, and '-' otherwise
  - uname_for_uid: returns the human-readable user name, given a numeric uid
  - group_for_gid: returns the human-readable group name, given a numeric gid
  - date_string: converts a timespec (the file's modification time from struct stat) to a human-readable date string
- Please read through them carefully. Lots of students try to reinvent their own way of doing this, which is more painful than necessary. In fact, I did this myself when attempting the assignment with PRINT_PERM_CHAR!

1.8. Using stat():

**====> Example: man 3 stat <====**

- The stat() function is a system call in Unix-like operating systems that retrieves information about a file or directory. It provides detailed metadata, such as file size, permissions, timestamps, and more.
- Important fields in struct stat:
    st_mode: File type and mode (permissions).
    st_nlink: Number of hard links to the file.
    st_uid: User ID of the file's owner.
    st_gid: Group ID of the file's owning group.
    st_size: Size of the file in bytes.
    st_mtime: Time of last modification.

1.9. Test output:
- This info is mostly drawn from the lab page itself, so you can use that as a second reference.
- Simplest: run `make test` to run the tests automatically, with their output printed in the terminal.

**====> Testing: make test <====**

- Execute `make test` to show example results.
- Test discrepancies will be in the form of a diff, which describes the differences between two files.
- We run diff between your output (./ls) and the system output (ls).
- Next, you can dissect the individual tests to see what is going on under the hood.
- `./mktest.sh [DIR_NAME]` creates the directory used for testing in the specified location.
  - We recommend `/tmp/test` for consistency.
  - If you need to remove it, use rm -r [DIR_NAME] (be careful with lab files).

**====> Testing: test.bats <====**

- Each section describes a test.
- The first 2 lines are the important ones: result executes your ls, and compare executes the correct ls.
- Unpack the commands pipe by pipe to understand the process.
- Note that you can always redirect with > to get the output in a text file (show with final compare result).

**====> Testing: Debugging <====**

- If you want to use gdb with this, you need to set an option to make it debug-friendly:
  - ASAN_OPTIONS=detect_leaks=0 gdb ls

**====> Testing: Diff <====**

```
cs202-user@3c6f6a19ff9d:~/cs202-labs$ diff a.txt b.txt
1c1,2
< Hello I am Noah!
---
> Hello I am Noah
> Good luck!
3,4d3
< Happy hacking!
< Goodbye
```

- 1c1,2 means change line 1 in a.txt into lines 1-2 of b.txt.
- 3,4d3 means delete lines 3-4 in a.txt to match b.txt.

2. My Advice.

Here are some recommendations for approaching the lab:

1. Prepare:
- Read the entire lab handout carefully to understand the requirements.
- Ensure that the starter code compiles and that you can run the provided tests.
- Play around with the `ls` command in your terminal to see how it works, particularly with the flags you'll be implementing.
- Create a battle plan: Decompose the lab into smaller tasks and set milestones for each part.
- (Optional, but recommended): Create a `playground.c` file where you can test out small snippets of code, especially when working with unfamiliar functions or system calls.

2. Implement:
- Implement one feature/flag at a time, testing thoroughly (both on the command line and using the provided test cases) before moving on to the next (hint: the recommended sequence is a good one).
- Always compare your output with the system `ls` output to ensure correctness.
- Don't reinvent the wheel! Make sure that you are using the provided helper functions whenever possible.

3. Debug:
- If you are stuck because you encounter a bug, use `gdb` to step through your code or add print statements to isolate the issue.
- If you are stuck because you don't understand how a function/system call works, create small test cases in `playground.c` to isolate and understand the behavior. If you are feeling ambitious, create your own test cases in test.bats.
- If you are stuck because you don't understand the lab requirements, re-read the lab handout carefully or clarify any questions with TAs or instructors.

Word to the wise: the reason we say start "earlier than you think" isn't that there are 14000 lines of code to write and you must be writing 1k lines a day to finish on time (this isn't a race). Rather, it is because it takes time to synthesize the information, requirements, and APIs. If you are rushed and you are trying to comprehend requirements AND code AND debug at the same time, you will find yourself overwhelmed.

So here is a challenge: Start earlier than you think you need to, and on day 1, hold off from coding ENTIRELY. Spend the day reading the lab handout, exploring the system ls command, and taking detailed notes on any particularly scary/unfamiliar details or assumptions you are uncertain about. On day 2, start coding, but not the assignment!
Spend the day writing small test cases in `playground.c` to understand the APIs you will be using. If recursion scares the daylights out of you, write some simple recursive functions that operate on a test directory. If system calls are your worst nightmare, become an expert in them. Read the man pages thoroughly, and write some small programs that use system calls in exhaustive and creative ways. If you are feeling really ambitious, do test-driven development: write your own test cases ahead of time!

Good luck!