CS202 Review Session 3

Notes from [Xiangyu Gao](https://xiangyug.github.io/), TA from fall 2021
Edited by Khanh Nguyen, TA Spring 2022
Edited by Charlie Chen, TA Spring 2023
Edited by Alex Liu, TA Fall 2024

1. C review
    1.1. Stack and Heap memory allocation
    1.2. Array and string
    1.3. String concatenation
    1.4. Struct
2. Lab 2 overview
    2.1. Motivation
    2.2. File definition
    2.3. File permissions
    2.4. Flags
    2.5. Functions
    2.6. Helper functions
    2.7. Using stat()
    2.8. Test output
3. Q&A

---------------------------------------------------------------------

"There's no magic in systems"

Personally, I think this is very reassuring. It gives me lots of confidence in learning about different systems and reading through the code. Hopefully, it gives you more confidence in reading, learning and building systems as well!

1. C review

A large part of this course is programming (primarily in C), so a solid understanding of C can greatly reduce the time you spend on labs. If you are unsure about your C skills, I recommend looking at K&R: The C Programming Language.

1.1. Stack and Heap memory allocation

- Variables in C can be stored in three different memory regions: the stack, the heap, and the data segment. Note that these memory regions are unrelated to the data structures "stack" and "heap" (although the stack of a C process does support push and pop, as the data structure does).
- Variables stored on the stack are temporary and belong to a function call. They are automatically freed once the function returns. This makes them relatively safe and an ideal choice for most scenarios.
- Variables stored on the heap are allocated dynamically and persist after the function that allocated them returns, until they are freed. This provides greater flexibility, but also brings the risk of memory leaks if not managed properly. As a result, use the heap with caution and only when necessary, e.g. for the nodes of a linked list.
- It is generally recommended to prefer the stack, as it is safer than the heap.
- Global and static variables are stored in neither the stack nor the heap, but in the "data segment", a region of memory that persists for the lifetime of the program.

Examples:
```
#include <stdlib.h>

int main() {
    // Stack allocation
    int a = 0;
    char b[10];

    // Heap allocation
    int *p = (int*) malloc(sizeof(int));
    *p = 10;
    char *q = (char*) malloc(5*sizeof(char));
    q[0] = 'a';

    // In heap, we need to free the memory manually.
    free(p);
    free(q);
    return 0;
}
```

[Whiteboard drawing for memory layout. Check the scribbles]

1.2. Array and string

- There is no built-in type string in C :O
- There are 2 ways to access an element in an array: subscript (`a[i]`) and pointer arithmetic (`*(a + i)`).
- A character array used as a string is terminated with a null byte, indicating the end of the string.

E.g:

    // compiles in C (some compilers warn), but this is not correct:
    // there is no room left for the null byte
    char name[5] = "Alice";

    // correct way to initialize string with length 5. Need 1 more for null byte
    char name[6] = "Alice";

    /*
     * A fancier and safer way to do this would be:
     *   const char* name = "Alice";
     * or
     *   char name[] = "Alice";
     */

    /*
     * "Alice" is a string constant in read-only region. Write/modify can lead to
     * undefined behavior.
     */
    char * name = "Alice";

1.3. String concatenation

- There are many ways to concatenate strings: strcat, strncat, etc...
- I'd avoid using strcat because it can lead to buffer overflow if you are not careful. Lab 2 suggests using snprintf (learn more by `man snprintf`), which I think is very handy.

    int snprintf(char *str, size_t size, const char *format, ...)

    - char *str: buffer to write INTO (or in other words, the output after concatenation)
    - size_t size: maximum number of bytes to write into the buffer, including the terminating null byte
    - const char *format: format for the concatenation, similar to how you would use it in printf
    - ...: optional arguments. Variables to be formatted in the "format" string, also similar to the syntax for "printf".
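One property of the `size` argument is easier to see in code: snprintf never writes more than `size` bytes, and its return value is the length the full result would have had, so truncation can be detected. Below is a minimal sketch; `join_path` is a hypothetical helper made up for illustration, not part of the lab code.

```c
#include <stdio.h>

/*
 * join_path is a hypothetical helper for illustration (not part of the lab).
 * Writes "dir/name" into out, which holds out_size bytes.
 * Returns 0 on success, -1 if the result did not fit (or snprintf failed).
 */
int join_path(char *out, size_t out_size, const char *dir, const char *name) {
    /* snprintf returns the length the full string WOULD have,
     * not counting the terminating null byte. */
    int needed = snprintf(out, out_size, "%s/%s", dir, name);
    if (needed < 0 || (size_t) needed >= out_size) {
        return -1;  /* truncated: out holds only a null-terminated prefix */
    }
    return 0;
}
```

With a 16-byte buffer, `join_path(buf, sizeof(buf), "foo", "bar.txt")` should succeed and leave "foo/bar.txt" in `buf`; with a 4-byte buffer the same call should report failure instead of overflowing.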
Example:
```
char buffer[50];
char a[] = "folder";
char b[] = "file";
int c = 0;
snprintf(buffer, 50, "file location: %s/%s/%d", a, b, c);
printf("%s\n", buffer);
```

1.4. Struct

- C doesn't have a concept of class, like in Java, but it has `struct`. A struct is a collection of data items grouped into a single thing.
- You can define a struct as:
```
struct student {
    int age;
    char* name;
};
```
- To initialize:
```
struct student alice;
alice.age = 22;
alice.name = "Alice";
```
- A struct can contain any number of variables, including pointers. In lab 2, you will deal with struct pointers.
- For example:
```
struct student *palice = &alice;
```
- A struct is a type, so a function can return a struct or a pointer to one. For example, the signature of a function can be
```
struct student * get_student(char* name);
```

Mental model:

    | age = 22       | <---- alice. addr: 0x100
    | name = "Alice" |

    | 0x100          | <---- palice

- To access members of the struct, you can do:
```
(*palice).age;
// OR
palice->age; // -> operator = dereference and then access the member
```

2. Lab 2 overview

2.1. Motivation

- In lab 2, you will implement the `ls` command.
- It will help you practice reading man pages, working with system calls, toying with the APIs, and designing as well as re-designing/refactoring your code.

2.2. File definition

- Normal files can be: "test.txt", "main.c", ...
- Normal directories can be: "foo/", "bar/", "foo/bar/", ...
- Some files and directories start with ".". If you are curious, you can inspect ".git" in your repository. This is where git stores your information and commit history. These are usually hidden from users when they invoke `ls`. To see them, you need `ls -a`.
- In addition, "." also means current directory, ".." means parent directory, and "~" means home directory. Every directory you create contains the 2 "pseudo-directories" "." and ".."; "~" is not a directory entry but a shorthand expanded by the shell.
- With that, there are also relative and absolute paths:
    + A relative path is a path from the current directory. It often starts with "." or "..". For instance: `./my/relative/path/to/file`.
    + An absolute path is the full path, starting from the root directory "/". For instance: `~/my/absolute/path`, once the shell expands "~" to the absolute path of your home directory.

Example (is a path a normal directory?):

    if (is_dir(pathandname) && (strcmp(name, ".") != 0 && strcmp(name, "..") != 0))

2.3. File permissions

- Lab 2 will have you print the permissions of a file, which have the form: -rwxr-xr--. There are 10 chars, but we are concerned with the last 9.
    - First character (-): Indicates the type of file. "-" is a regular file, "d" is a directory.
    - Next three characters (rwx): Permissions for the owner (read, write, execute).
    - Next three characters (r-x): Permissions for the group (read, no write, execute).
    - Last three characters (r--): Permissions for others (read, no write, no execute).
- Numeric representation - chmod 777? Each octal digit corresponds to 3 characters in the string representation.

Example: 700 -> 7,0,0 -> 111,000,000 -> rwx------
Likewise, the string above is 754 -> 7,5,4 -> 111,101,100 -> rwxr-xr--

2.4. Flags

    ./ls      -alR     foo/ bar/
    |         |        |
    program   flags    args

- Flags refer to -a, -l, -R. You will need to support combinations of them as well.
- To make it clearer, let's look at getopt() (learn more with `man 3 getopt`).

    int getopt(int argc, char* const argv[], const char *optstring)

    - argc: number of arguments supplied (from main)
    - argv: array of arguments supplied (from main)
    - optstring: the flag characters we want to parse (a character followed by ":" takes an argument)
    - optind: a global variable specifying the index in argv to parse the next time around, initialized to 1.

If getopt() finds another option character, from the optstring you provided, in the arguments supplied, it returns that character and updates the variable optind so the next call to getopt() can resume the scan. It returns -1 when there are no more options.
Example:

    // Example from man 3 getopt
    #include <unistd.h>
    #include <stdlib.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int flags, opt;
        int nsecs, tfnd;

        nsecs = 0;
        tfnd = 0;
        flags = 0;
        while ((opt = getopt(argc, argv, "nt:")) != -1) {
            switch (opt) {
            case 'n':
                flags = 1;
                break;
            case 't':
                nsecs = atoi(optarg);
                tfnd = 1;
                break;
            default: /* '?' */
                fprintf(stderr, "Usage: %s [-t nsecs] [-n] name\n", argv[0]);
                exit(EXIT_FAILURE);
            }
        }

        printf("flags=%d; tfnd=%d; nsecs=%d; optind=%d\n", flags, tfnd, nsecs, optind);

        if (optind >= argc) {
            fprintf(stderr, "Expected argument after options\n");
            exit(EXIT_FAILURE);
        }

        printf("name argument = %s\n", argv[optind]);

        /* Other code omitted */

        exit(EXIT_SUCCESS);
    }

- Will parse -n and -t (the ":" after "t" means -t takes an argument, available in optarg).
- Expects at least 1 argument after the flags.
- Try compiling, running and observing the output: `./a.out -n`, `./a.out -t`, `./a.out -nt`, `./a.out -n foo bar`
- getopt_long is quite similar. The difference is that getopt_long also allows us to parse flags in long format, which are prefixed with `--` instead of `-`.
- Read about getopt_long: `man 3 getopt_long`

2.5. Functions

- The main building blocks of your program are: opendir, readdir and closedir.
- Read through their man pages carefully.
- Don't forget to call closedir when you are finished with a directory. We don't want resources to be leaked.

2.6. Helper functions

- We provide lots of handy helper functions and macros such as:
    - PRINT_PERM_CHAR: print out the corresponding permission character
    - uname_for_uid: convert a user ID to a human-readable user name
    - group_for_uid: convert a group ID to a human-readable group name
    - date_string: convert to formatted date string
    - ftype_to_string: print out the file type
- Please read through them carefully. Lots of students try to reinvent their own way of doing this, which is more painful than necessary.

2.7. Using stat()

- The stat() function is a system call in Unix-like operating systems that retrieves information about a file or directory. It provides detailed metadata, such as file size, permissions, timestamps, and more.
- Important fields in struct stat:
    - st_mode: File type and mode (permissions).
    - st_size: Size of the file in bytes.
    - st_mtime: Time of last modification.
    - st_uid: User ID of the file's owner.
    - st_gid: Group ID of the file's group.

Example:

    #include <stdbool.h>
    #include <sys/stat.h>

    bool is_dir(char* pathandname) {
        struct stat res;
        if (stat(pathandname, &res) != 0) {
            return false;  // stat failed, e.g. the path does not exist
        }
        return S_ISDIR(res.st_mode);
    }

2.8. Test output

- Part of the lab is figuring out what it is testing. Don't be intimidated if your test output is really big and strange.
- We run diff between your output (./ls) and system output (ls).
- diff command: `diff your_output.txt system_output.txt`

Example:

    2,3c1,3
    <
    <
    ---
    >
    >
    >

- In the example above, everything from the 2nd line up to `---` (the lines starting with `<`) is your output. The rest (the lines starting with `>`) is the system output. In other words, everything before `---` comes from the file on the left of the diff command and everything after comes from the file on the right. It's showing the difference between your output and the system output.
- The first line, `2,3c1,3`, describes what needs to be changed so that the differences are resolved.
    - `2,3c1,3`: lines 2-3 of your file need to be changed to match lines 1-3 of the system output file.
    - c: change
    - a: add
    - d: delete
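Returning to sections 2.3 and 2.7: the 9 permission characters can be derived from the st_mode field that stat() fills in. The sketch below shows one way to do that using the standard permission-bit macros from <sys/stat.h>. `perm_string` is a hypothetical helper written for illustration only; in the lab itself, prefer the provided PRINT_PERM_CHAR macro.

```c
#include <sys/stat.h>
#include <sys/types.h>

/*
 * perm_string is a hypothetical helper for illustration (not lab code).
 * Fills out (at least 10 bytes) with the 9 permission characters
 * for mode, e.g. "rwxr-xr--", followed by a null byte.
 */
void perm_string(mode_t mode, char out[10]) {
    static const mode_t bits[9] = {
        S_IRUSR, S_IWUSR, S_IXUSR,   /* owner  */
        S_IRGRP, S_IWGRP, S_IXGRP,   /* group  */
        S_IROTH, S_IWOTH, S_IXOTH    /* others */
    };
    static const char symbols[] = "rwxrwxrwx";

    for (int i = 0; i < 9; i++) {
        /* Keep the letter if the bit is set, otherwise print '-'. */
        out[i] = (mode & bits[i]) ? symbols[i] : '-';
    }
    out[9] = '\0';
}
```

Calling `perm_string(res.st_mode, buf)` after a successful stat() would fill `buf` with the permission part of the `ls -l` line; the file-type character (the first of the 10) is handled separately.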