CS202: Lab 2: Ls

This lab is intended to:

In this lab, you will build a version of ls, a Unix utility that lists files in a directory. Yours will print one file per line (this is equivalent to ls -1 on Linux, OS X or other Unix systems) in any mode. When run in the directory containing your code, this should result in output similar to what is shown below:

$ ./ls
ls
main.c
main.c.o
Makefile
README.md

Note the difference between ls and ./ls. The first one is the “system-supplied” ls (type $ which ls to see where this ls lives). The second one is your ls. The shell knows the difference because . means the current directory, so ./ls tells the shell “I want a program named ls in the current directory.” Likewise, .. means “the directory above the current one.” So if you were in a directory underneath your lab directory, ../ls would again refer to your ls. You should have seen the concepts of . and .. in CSO.

The ls implementation for this lab also needs to support a subset of the options supported by ls. These include the following; the detailed specification is in a section below.

We recommend that you read through the whole lab now, before executing any of it, even if it looks quite long!

Getting Started

NOTE. We recommend attempting this lab without searching on the Internet. This will force you to use “man pages” and the other documentation included on the system, which often contain more detailed and more directly useful information for this lab than StackOverflow and similar sources.

Obtain and update the lab files as follows. We assume that you have set up the upstream as described in the lab setup. Then run the following, either on your local machine or within the container (if on Windows, always within WSL):

$ cd ~/cs202
$ git fetch upstream
$ git merge upstream/main
.... # output omitted
$ ls
cs202-run-docker  docker  lab1  lab2 learn-ctags  README.md

This lab’s files are located in the lab2 subdirectory. But first we need to update the Docker image. Do this:

$ cd docker
$ ./cs202-build-docker
# may take a few minutes but not as long as the initial build when setting up

Now, let’s ensure that you can build the code you just pulled. Enter the Docker environment:

$ cd ..
$ ./cs202-run-docker
cs202-user@172b6e333e91:~/cs202-labs$ cd lab2/
cs202-user@172b6e333e91:~/cs202-labs/lab2$ ls
answers.txt  main.c  Makefile  mktest.sh  test.bats
cs202-user@172b6e333e91:~/cs202-labs/lab2$ make

The rest of these instructions presume that you are in the Docker environment. We omit the cs202-user@172b6e333e91:~/cs202-labs part of the prompt; the hostname portion will differ on your machine.

You should see output similar to the following:

cc -pedantic -Wall -std=c11 -g -D_DEFAULT_SOURCE -fsanitize=address,undefined -ggdb -fno-omit-frame-pointer -c main.c -o main.c.o
main.c: In function ‘main’:
main.c:208:29: warning: variable ‘list_all’ set but not used [-Wunused-but-set-variable]
  208 |     bool list_long = false, list_all = false;
      |                             ^~~~~~~~
main.c:208:10: warning: unused variable ‘list_long’ [-Wunused-variable]
  208 |     bool list_long = false, list_all = false;
      |          ^~~~~~~~~
At top level:
main.c:91:15: warning: ‘date_string’ defined but not used [-Wunused-function]
   91 | static size_t date_string(struct timespec* ts, char* out, size_t len) {
      |               ^~~~~~~~~~~
main.c:77:12: warning: ‘group_for_gid’ defined but not used [-Wunused-function]
   77 | static int group_for_gid(gid_t gid, char* buf, size_t buflen) {
      |            ^~~~~~~~~~~~~
main.c:65:12: warning: ‘uname_for_uid’ defined but not used [-Wunused-function]
   65 | static int uname_for_uid(uid_t uid, char* buf, size_t buflen) {
      |            ^~~~~~~~~~~~~
cc ./main.c.o -o ls -fsanitize=address,undefined

You will also find that an executable named ls has been created in the directory.

Type:

$ ./ls

to see what happens.

Coding

Now that you have ensured that you can build the project, you can start coding. You need to edit only main.c. Read the rest of the lab for utilities, documentation, building blocks, requirements, and hints.

Compiler warnings. Make sure that, when your code is compiled, the compiler produces no warnings. (Example warnings are given above [“variable set but not used”, etc.]; these happen because the code isn’t fleshed out.) We will be deducting points for compile-time warnings.

Programming style. In this and subsequent labs, you will be graded for style. We will deduct up to 20% of total lab points for poor style based on our subjective evaluation. Please read this style guide.

Argument parsing

Look at the main() function. main() uses getopt_long() for argument parsing. You will need to understand this function, which requires finding and reading documentation. To start, let us use the apropos command to find where the documentation for getopt_long lives. Type:

$ apropos getopt_long

This should result in:

$ apropos getopt_long
getopt_long (3)      - Parse command-line options
getopt_long_only (3) - Parse command-line options

This output indicates that getopt_long provides a mechanism for parsing command-line options, and is documented in section 3 of the manual pages. You can now read this documentation by running:

$ man 3 getopt_long

This shows you information on getopt_long(), including example use. In this case, the man command provides access to the Linux manual pages, the 3 specifies that you are looking for a page in section 3 of the manual, and getopt_long specifies the function you are looking for.
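To see how the pieces of getopt_long(3) fit together, here is a minimal sketch of an option-parsing loop. It assumes just two flags, -a and -l (matching the list_all and list_long variables in the warnings above), plus a long form --all. parse_flags() is a hypothetical helper; the supplied main() and the full option list in the specification may differ.

```c
#include <getopt.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: sets *list_all for -a/--all and *list_long for -l.
 * Returns the index of the first non-option argument, or -1 if an
 * unrecognized option was seen (getopt_long has already printed a message). */
static int parse_flags(int argc, char *argv[], bool *list_all, bool *list_long) {
    static const struct option long_opts[] = {
        {"all", no_argument, NULL, 'a'},  /* assumed long name for -a */
        {NULL, 0, NULL, 0},               /* terminator required by getopt_long */
    };
    int opt;

    *list_all = false;
    *list_long = false;
    while ((opt = getopt_long(argc, argv, "al", long_opts, NULL)) != -1) {
        switch (opt) {
        case 'a':
            *list_all = true;
            break;
        case 'l':
            *list_long = true;
            break;
        default:                          /* '?' for unknown options */
            return -1;
        }
    }
    return optind;  /* argv[optind] .. argv[argc - 1] are the operands */
}
```

On Linux, getopt_long() permutes argv by default, so options may follow operands (for example ./ls /tmp -l); after the loop, argv[optind] through argv[argc - 1] are the non-option arguments.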

Documentation

Individual man pages are often referenced using notation like GETOPT(3), which indicates that you are looking at the GETOPT man page in section 3. Similarly, the convention in referring to functions is that function_name(num) refers to a function (or syscall) documented at section num in the man pages. For example, getopt(3) refers to “the getopt() documented in section 3 of the man pages”; it does not refer to invoking getopt with an argument of 3. We’ll be using the func(num) notation below.

As shown above, you can find out which section documents a function using the apropos command.

You can also find documentation for apropos itself by running:

$ apropos apropos
apropos (1)          - search the manual page names and descriptions
$ man apropos

Building blocks

You will need to use the following syscalls. Learn about them through man pages.

On 3 versus 3p: this is confusing. The ‘3’ refers to section 3 in the man pages of Linux (which of course is an operating system) whereas the ‘3p’ refers to section 3 in the man pages of POSIX, which is a standard for Unix-like systems that Linux mostly implements, and in some cases extends. The documentation you see in the ‘3’ man pages is authoritative for the Linux system, but the ‘3p’ pages for readdir and opendir may give you inspiration, as they contain example usage. For this lab, it’s best to consult both sets of pages.
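A minimal sketch of the opendir(3)/readdir(3)/closedir(3) pattern follows. list_dir() and its show_hidden parameter are hypothetical; hiding names that start with . mirrors the system ls default, and the error messages are placeholders, so check the specification for the exact behavior you need. Note that readdir() returns NULL both at the end of the directory and on error, so the sketch resets errno before each call to tell the two apart.

```c
#include <dirent.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: prints every entry of `path`, one per line.
 * Names starting with '.' are skipped unless show_hidden is true.
 * Returns 0 on success and -1 if the directory cannot be read. */
static int list_dir(const char *path, bool show_hidden) {
    DIR *dir = opendir(path);
    if (dir == NULL) {
        fprintf(stderr, "ls: cannot open %s: %s\n", path, strerror(errno));
        return -1;
    }

    for (;;) {
        errno = 0;                  /* readdir() returns NULL at end AND on error */
        struct dirent *ent = readdir(dir);
        if (ent == NULL) {
            break;
        }
        if (!show_hidden && ent->d_name[0] == '.') {
            continue;               /* skips ".", "..", and dotfiles */
        }
        printf("%s\n", ent->d_name);
    }
    int err = errno;                /* still from readdir(): 0 means end of dir */
    closedir(dir);
    if (err != 0) {
        fprintf(stderr, "ls: error reading %s: %s\n", path, strerror(err));
        return -1;
    }
    return 0;
}
```

readdir() returns entries in an unspecified order, so this prints them in whatever order the file system provides.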

Requirements

Here are the required pieces; later in the lab we’ll describe steps for building out your solution to address the requirements.

This applies to non-pseudo directories; see below for handling of pseudo-directories.

Note that the pseudodirectories . and .. do not have slashes appended.
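One way to implement the slash rule above is to lstat(2) each entry and check S_ISDIR, special-casing . and .. as noted. The helper format_name() below is hypothetical, and using lstat() (so that a symbolic link to a directory gets no slash) is an assumption; compare your output against the system ls -1a --file-type pipeline that the tests use.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: writes `name` into out, appending '/' when the entry
 * dir/name is a directory. The pseudo-directories "." and ".." are written
 * unchanged. lstat() does not follow symlinks, so a link to a directory is
 * not treated as a directory (an assumption). Returns -1 on failure. */
static int format_name(const char *dir, const char *name, char *out, size_t len) {
    char path[4096];                        /* assumed large enough (PATH_MAX on Linux) */
    struct stat st;

    int n = snprintf(path, sizeof path, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof path) {
        return -1;                          /* path would have been truncated */
    }
    if (lstat(path, &st) == -1) {
        return -1;
    }
    int is_pseudo = strcmp(name, ".") == 0 || strcmp(name, "..") == 0;
    if (S_ISDIR(st.st_mode) && !is_pseudo) {
        snprintf(out, len, "%s/", name);
    } else {
        snprintf(out, len, "%s", name);
    }
    return 0;
}
```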

Debugging

The supplied Makefile passes in compile flags that enable detection of memory leaks, underflow, and overflow. The leak detector in particular does not work when your program runs under GDB, so you need to disable leak detection when debugging. You can do this by setting the ASAN_OPTIONS environment variable to detect_leaks=0. Thus, to start gdb you should use a command such as:

$ ASAN_OPTIONS=detect_leaks=0 gdb ls

Recommendations and hints

Some general recommendations followed by some specific steps:

We recommend building your solution in pieces (and creating a git commit after each piece). Each piece may necessitate some refactoring and rewriting versus the prior piece; this is common in software development. Here’s one possible sequence:

Testing

To run a set of tests, invoke make test. Before submitting your work, you should make sure that you can pass all of the supplied tests. Note, however, that these tests are not exhaustive; you should write further tests for yourself.

Understanding the test framework will save you a lot of time when interpreting its output. The tests use a system called bats(1) (see man 1 bats and man 7 bats for more information). The basic idea is that the file test.bats defines functions called setup() and teardown(), along with various tests. The bats framework calls setup() before each test and teardown() after it. A test passes if every command in it succeeds (returns an exit status of 0 to the shell); otherwise it fails.

You might find some of the commands used in setup() useful when conducting your own tests. Specifically, setup() calls mktest.sh, and the commands in mktest.sh are worth reading. For example, touch -t lets you set the mtime of a file.

When you see a test failing, you will need to understand what the test itself is testing. Often the test output is enough, because the tests are written to print a diff between the output of your ls and what was expected. (If reading diffs is new to you, please see the section below on diffs.) However, sometimes – either because the test is complex or because your ls is crashing (with a SEGFAULT or other error) and not producing any output – you have to bite the bullet and inspect the relevant test in test.bats in your editor. There you can see how the test invokes your ls as well as how it invokes the system ls. Some of the commands involve pipelines; to really understand them, it can help to run the commands manually.

To manually run a test, call for example:

 $ export MY_TEST_DIR=/tmp/d$(date '+%Y%m%d%H%M%S')
 $ ./mktest.sh ${MY_TEST_DIR}

This will create a directory (/tmp/d20240...) and a set of files for which we have set mtimes, owners, and groups in a manner that will exercise different portions of your program. Then actually run a command that you are trying to understand. Here is an example (this is only an example, you should pick the command that you are trying to understand):

 $ ls -1a --color=never --file-type ${MY_TEST_DIR} | grep -v "^total" | sed 's/@$//g' | sed 's!\./!.!' | awk '{print $1}' | sort

Above, it can be helpful to strip away the pipeline phases. Start with just the ls -1a ... command, up to (but not including) the first pipe character (|). Then add the next pipeline phase, for example ls -1a --color=never --file-type ${MY_TEST_DIR} | grep -v "^total". Continue in this fashion until you understand the pipeline.

Note that you can simplify the command to ./mktest.sh /tmp/test, but mktest.sh will refuse to run if the directory given as its argument already exists; delete the directory when you are done using rm -r as usual. If you use this form, replace ${MY_TEST_DIR} in the pipeline with /tmp/test.

Reading diffs. The format of a text diff is very common on Unix-like systems. If it is unfamiliar to you, we strongly suggest gaining fluency. Here is a reference.

Extra Credit

We have a few extra credit options for this assignment:

Implement the -h flag

ls -lh prints “human readable” file sizes rather than file sizes in bytes. This means that ls -lh prints file sizes in KB, MB, or GB as appropriate; the unit chosen is the largest one in which the file size is at least 1. For example:

$ ./ls -lh
-rwxr-xr-x 1 apanda apanda  17K May 29 12:03 ls
-rw-r--r-- 1 apanda apanda 2.9K May 29 11:59 main.c
-rw-r--r-- 1 apanda apanda 3.2K May 29 12:03 main.c.o
-rw-r--r-- 1 apanda apanda  307 May 29 10:16 Makefile
-rw-r--r-- 1 apanda apanda   46 May 29 10:33 README.md

This corresponds to the ls -l output shown above. Implementing this requires writing code that formats sizes appropriately.
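Here is a sketch of one possible size formatter. It assumes powers of 1024 and the rounding the sample output appears to use: round up to one decimal place when the value is below 10, and round up to a whole number otherwise. That reproduces the sample (17128 bytes becomes 17K, 2874 becomes 2.9K, 307 stays 307), but human_size() is hypothetical and does not cover every edge case of the system ls, such as a value that rounds up to 1024K instead of becoming 1.0M.

```c
#include <stdio.h>

/* Hypothetical helper: formats `size` (bytes) like the `ls -lh` sample.
 * Assumes units are powers of 1024 (K, M, G) and that values round UP. */
static void human_size(unsigned long long size, char *out, size_t len) {
    static const char units[] = {'K', 'M', 'G'};
    unsigned long long div = 1024;
    int u = 0;

    if (size < 1024) {
        snprintf(out, len, "%llu", size);    /* plain bytes, no suffix */
        return;
    }
    /* Largest unit (up to G) in which the size is at least 1. */
    while (u < 2 && size >= div * 1024) {
        div *= 1024;
        u++;
    }
    unsigned long long tenths = (size * 10 + div - 1) / div;  /* ceil(10 * size / div) */
    if (tenths < 100) {
        /* Below 10 units: one decimal place, e.g. 2.9K */
        snprintf(out, len, "%llu.%llu%c", tenths / 10, tenths % 10, units[u]);
    } else {
        unsigned long long whole = (size + div - 1) / div;    /* ceil(size / div) */
        snprintf(out, len, "%llu%c", whole, units[u]);
    }
}
```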

Symbolic links are a Unix feature that allows you to create “pseudo-files” that point to other files. You can create one using ln -s; for example, running

$ ln -s main.c main.c.link

creates a new file main.c.link that points to main.c. The system ls handles symbolic links specially when showing the long listing format. For example, in the following output:

$ ls -l
total 36
-rwxr-xr-x 1 apanda apanda 17128 May 29 12:03 ls
-rw-r--r-- 1 apanda apanda  2874 May 29 11:59 main.c
lrwxrwxrwx 1 apanda apanda     6 May 29 14:02 main.c.link -> main.c
-rw-r--r-- 1 apanda apanda  3224 May 29 12:03 main.c.o
-rw-r--r-- 1 apanda apanda   307 May 29 10:16 Makefile
-rw-r--r-- 1 apanda apanda    46 May 29 10:33 README.md

Observe that main.c.link is displayed as main.c.link -> main.c. Implement similar handling for your version of ls. In implementing this you might find the readlink(2) syscall useful. Also observe that the permission string for the link in this case uses l to indicate links.
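The sketch below shows the two pieces this extra credit needs: building a permission string whose first character reflects the file type, and reading a link's target with readlink(2), which does not NUL-terminate its result. Both helpers are hypothetical. mode_string() ignores the setuid, setgid, and sticky bits and any file types other than regular files, directories, and links, so it simplifies what the system ls prints. Call lstat(2), not stat(2), on the entry; stat() follows the link and would report the target's type instead.

```c
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: writes an ls-style permission string such as
 * "-rw-r--r--" or "lrwxrwxrwx" into out (must hold at least 11 bytes). */
static void mode_string(mode_t m, char out[11]) {
    static const mode_t bits[9] = {S_IRUSR, S_IWUSR, S_IXUSR, S_IRGRP, S_IWGRP,
                                   S_IXGRP, S_IROTH, S_IWOTH, S_IXOTH};
    static const char letters[] = "rwxrwxrwx";

    if (S_ISDIR(m)) {
        out[0] = 'd';
    } else if (S_ISLNK(m)) {
        out[0] = 'l';                  /* requires the mode from lstat() */
    } else if (S_ISREG(m)) {
        out[0] = '-';
    } else {
        out[0] = '?';                  /* other types: simplified here */
    }
    for (int i = 0; i < 9; i++) {
        out[i + 1] = (m & bits[i]) ? letters[i] : '-';
    }
    out[10] = '\0';
}

/* Hypothetical helper: stores the target of the symlink at `path` in out.
 * readlink() does not add a terminating NUL, so we add one ourselves.
 * A target of len - 1 bytes or more is silently truncated. */
static int link_target(const char *path, char *out, size_t len) {
    if (len == 0) {
        return -1;
    }
    ssize_t n = readlink(path, out, len - 1);
    if (n == -1) {
        return -1;
    }
    out[n] = '\0';
    return 0;
}
```

With these, a long-listing line for a link would print the name followed by " -> " and the string from link_target().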

Implement the cheeky hack of calling the “real” ls

If the argument to your ./ls is --hack, invoke the system-supplied ls as a fork()ed process to solve the lab. The child process should not directly write to stdout; instead it should deliver its output to a pipe that is shared with the parent (your ./ls).

Use the system calls fork(), exec(), pipe(), and dup2() (you may need others). Do not use the system() library function. You need not follow the error handling that is specified for the rest of the assignment.

The basic idea is that you should create a pipe (a file descriptor pair) in your ls. Then you will need to fork(). In the child process, use dup2() to rearrange the output to go to the write end of the pipe, then exec() to run the system ls. In the parent process (your ./ls), read that pipe and output the results.
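The steps above can be sketched as follows. run_system_ls() is a hypothetical helper, and the path /bin/ls is an assumption about where the system ls lives (calling it by full path avoids accidentally running your own ./ls). Note that the parent must close the write end of the pipe, or read() will never see end-of-file.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper for --hack: runs /bin/ls (assumed to be the system ls)
 * with argv, relays its output through a pipe to our stdout, and returns
 * the child's exit status, or -1 if setup or waiting fails. */
static int run_system_ls(char *const argv[]) {
    int fds[2];                          /* fds[0]: read end, fds[1]: write end */
    if (pipe(fds) == -1) {
        perror("pipe");
        return -1;
    }

    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                      /* child */
        close(fds[0]);                   /* the child only writes */
        if (dup2(fds[1], STDOUT_FILENO) == -1) {
            _exit(127);
        }
        close(fds[1]);                   /* stdout now refers to the pipe */
        execv("/bin/ls", argv);
        _exit(127);                      /* reached only if execv failed */
    }

    close(fds[1]);                       /* parent only reads; needed to see EOF */
    char buf[4096];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0) {
        fwrite(buf, 1, (size_t)n, stdout);
    }
    close(fds[0]);

    int status;
    if (waitpid(pid, &status, 0) == -1) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the child's standard output is a pipe rather than a terminal, GNU ls prints one name per line by default, which matches the format this lab uses. If you parse the other options with getopt_long(), check for --hack before calling it (unless you add it to the long-options array), since getopt_long() would otherwise reject it as an unknown option.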

Handin Procedure

Handing in consists of three steps:

  1. Executing this checklist:
    • Make sure your code builds, with no compiler warnings.
    • Make sure you’ve used git add to add any files that you’ve created.
    • Fill out the top of the answers.txt file, including your name and NYU ID.
    • Make sure you’ve answered every question in answers.txt
    • Create a file called slack.txt noting how many slack days you have used for this assignment. (This is to help us agree on the number that you have used.) Include this file even if you didn’t use any slack days.
  2. Push your code to GitHub, so we have it (from outside the container or, if on Mac, this will also work from within the container):
   $ cd ~/cs202/lab2    
   $ git commit -am "hand in lab2"
   $ git push origin 

   Counting objects: ...
   ....
   To ssh://github.com/nyu-cs202/labs-24fa-<username>.git
     7337116..ceed758  main -> main

If git push origin does not work from within Docker and you are on a Mac, then make sure your ssh-agent is running. See here. If you are not on a Mac, you can either push from outside Docker using your host’s git or you can use GitHub’s Personal Access Tokens.

  3. Actually submit, by timestamping and identifying your pushed code:

    • Decide which git commit you want us to grade, and copy its id (you will paste it in the next sub-step). A commit id is a 40-character hexadecimal string. Usually the commit id that you want will be the one that you created last. The easiest way to obtain the commit id for the last commit is by running the command git log -1 --format=oneline. This prints both the commit id and the initial line of the commit message. If you want to submit a previous commit, there are multiple ways to get the commit id for an earlier commit. One way is to use the tool gitk. Another is git log -p, as explained here, or git show.
    • Create a new file commit.txt locally with only the commit id you just copied.
    • Now go to Gradescope and select the “Lab 2” assignment; then choose Upload as the submission method. Submit only commit.txt.
    • You can submit multiple times before the deadline. Your last submission will determine your grade.

ATTENTION. Please make sure the full name of your submitted file (including the file extension) is commit.txt. Submitting the file under a name such as commit or commit.txt.txt may result in a style-point deduction!

Please also make sure that the file contains only your commit id, which should be a string of 40 characters. We take the first 40 characters of commit.txt and check out that commit from your GitHub repository, so failing to follow this guideline may prevent us from grading your submission properly!

Ground truth is what and when you submitted to Gradescope. For the purposes of tracking lateness, the time of your submission is when you submit the assignment through Gradescope, not when you executed git commit.

After you submit your assignment, we will process it as follows:

We’ll check out your GitHub repository using the commit ID you provided in the commit.txt file and upload this code to your Gradescope assignment. As a result, you may notice that your latest submission on Gradescope is marked as “late,” even if you submitted on time. Don’t worry about this discrepancy; it’s an expected part of our process. What matters is that you submit your commit.txt file on time. To verify your actual submission time, always refer to the latest submission of commit.txt in the “Submission History” on Gradescope. This is the timestamp we use to determine whether your assignment was submitted by the deadline. We use this method to enable direct annotation of your code on Gradescope, allowing us to provide more detailed feedback.

Remember, as long as your commit.txt is submitted before the deadline with the correct commit ID, your assignment will be considered on time, regardless of when the code appears on Gradescope.

This completes the lab.

Acknowledgments

The scaffolding and architecture of this lab are due to Aurojit Panda, with modifications by the CS202 staff.