CS202: Lab 2: Ls

This lab is intended to:

In this lab, you will build a version of ls, a Unix utility that lists files in a directory. Yours will print one file per line (this is equivalent to ls -1 on Linux, OS X or other Unix systems) in any mode. When run in the directory containing your code, this should result in output similar to what is shown below:

$ ./ls
ls
main.c
main.c.o
Makefile
README.md

Note the difference between ls and ./ls. The first one is the “system-supplied” ls (type $ which ls to see where this ls lives). The second one is your ls. The shell knows the difference because . means the current directory, so ./ls tells the shell “I want a program named ls in the current directory.” Likewise, .. means “the directory above the current one.” So if you were in a directory underneath your lab directory, ../ls would again refer to your ls. You should have seen the concepts of . and .. in CSO.

The ls implementation for this lab also needs to support a subset of the options supported by ls. These include the following; the detailed specification is in a section below.

We recommend that you now read through the whole lab, before executing any of it.

Getting Started

NOTE. We recommend attempting to do this lab without searching on the Internet. This will force you to make use of “man pages” and the documentation included on the system. These often contain much more detailed information than Stack Overflow and similar sources.

Obtain and update the lab files as follows. We assume that you have set up the upstream as described in the lab setup. Then:

$ cd ~/cs202
$ git fetch upstream
$ git merge upstream/main

This lab’s files are located in the lab2 subdirectory.

Let’s begin by ensuring that you can build the code you just pulled:

$ cd ~/cs202/lab2
$ make

You should see output similar to the following:

$ make
cc -pedantic -Wall -std=c11 -g -D_POSIX_C_SOURCE=200112L -c main.c -o main.c.o
main.c: In function ‘main’:
main.c:196:29: warning: variable ‘list_all’ set but not used [-Wunused-but-set-variable]
     bool list_long = false, list_all = false;
                             ^~~~~~~~
main.c:196:10: warning: unused variable ‘list_long’ [-Wunused-variable]
     bool list_long = false, list_all = false;
          ^~~~~~~~~
At top level:
main.c:56:12: warning: ‘group_for_gid’ defined but not used [-Wunused-function]
 static int group_for_gid(gid_t gid, char *buf, size_t buflen) {
            ^~~~~~~~~~~~~
main.c:44:12: warning: ‘uname_for_uid’ defined but not used [-Wunused-function]
 static int uname_for_uid(uid_t uid, char *buf, size_t buflen) {
            ^~~~~~~~~~~~~
main.c:15:12: warning: ‘err_code’ defined but not used [-Wunused-variable]
 static int err_code;
            ^~~~~~~~
cc ./main.c.o -o ls 

You will also find that an executable named ls has been created in the directory.

Type:

$ ./ls

to see what happens.

Coding

Now that you have ensured that you can build the project, you can start coding. You need to edit only main.c. Read the rest of the lab for utilities, documentation, building blocks, requirements, and hints.

Compiler warnings. Make sure that, when your code is compiled, the compiler produces no warnings. (Example warnings are given above [“variable set but not used”, etc.]; these happen because the code isn’t fleshed out.) We will be deducting points for compile-time warnings.

Programming style. In this and subsequent labs, you will be graded for style. We will deduct up to 20% of total lab points for poor style based on our subjective evaluation. Please read this style guide.

Argument parsing

Look at the main() function. main() uses getopt_long() for argument parsing. You will need to understand this function, which requires finding and reading documentation. To start, let us use the apropos command to find where the documentation for getopt_long lives. Type:

$ apropos getopt_long

This should result in:

$ apropos getopt_long
getopt_long (3)      - Parse command-line options
getopt_long_only (3) - Parse command-line options

This output indicates that getopt_long provides a mechanism for parsing command-line options, and is documented in section 3 of the manual pages. You can now read this documentation by running:

$ man 3 getopt_long

This shows you information on getopt_long(), including example use. In this case, the man command provides access to the Linux manual pages, the 3 specifies that you are looking for a page in section 3 of the manual, and getopt_long specifies the function you are looking for.
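
As a rough illustration of the pattern that the getopt_long man page describes, the sketch below parses two flags into a struct. The struct ls_opts type, the parse_opts() helper, and the long option names --all and --long are hypothetical, chosen only for this example; the scaffolding in main.c may be organized differently.

```c
#include <assert.h>
#include <getopt.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical option set for illustration: -a/--all and -l/--long. */
struct ls_opts {
    bool list_all;
    bool list_long;
};

/*
 * Parse argv into opts. Returns the index of the first non-option
 * argument, or -1 if an unrecognized option was seen.
 */
static int parse_opts(int argc, char *argv[], struct ls_opts *opts) {
    static const struct option long_opts[] = {
        {"all",  no_argument, NULL, 'a'},
        {"long", no_argument, NULL, 'l'},
        {NULL, 0, NULL, 0}
    };
    int c;

    assert(opts != NULL);
    opts->list_all = false;
    opts->list_long = false;
    optind = 1; /* start scanning at the first argument */

    while ((c = getopt_long(argc, argv, "al", long_opts, NULL)) != -1) {
        switch (c) {
        case 'a': opts->list_all = true; break;
        case 'l': opts->list_long = true; break;
        default:  return -1; /* getopt_long has already printed a message */
        }
    }
    return optind;
}
```

A caller would pass argc and argv from main() straight through; any arguments from the returned index onward are the non-option arguments (for ls, the files or directories to list).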

Documentation

Individual man pages are often referenced using notation like GETOPT(3), which indicates that you are looking at the GETOPT man page in section 3. Similarly, the convention in referring to functions is that function_name(num) refers to a function (or syscall) documented at section num in the man pages. For example, getopt(3) refers to “the getopt() documented in section 3 of the man pages”; it does not refer to invoking getopt with an argument of 3. We’ll be using the func(num) notation below.

As we showed above, you can find out what section documents a function using the apropos command.

In fact, you can find more documentation for apropos itself by running

$ apropos apropos
apropos (1)          - search the manual page names and descriptions
$ man apropos

Building blocks

You will need to use the following syscalls. Learn about them through man pages.

On 3 versus 3p: this is confusing. The ‘3’ refers to section 3 in the man pages of Linux (which of course is an operating system) whereas the ‘3p’ refers to section 3 in the man pages of POSIX, which is a standard for Unix-like systems that Linux mostly implements, and in some cases extends. The documentation you see in the ‘3’ man pages is authoritative for the Linux system, but the ‘3p’ pages for readdir and opendir may give you inspiration, as they contain example usage. For this lab, it’s best to consult both sets of pages.
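
The opendir and readdir calls mentioned above are typically used together in a loop. The following is a minimal sketch along the lines of the examples in the 3p pages, not a solution to the lab: print_dir_entries() is a hypothetical helper, it prints entries (including . and ..) in whatever order the system returns them, and it does not follow the lab's error-handling rules.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>

/*
 * Print every entry of the directory at path, one name per line.
 * Returns the number of entries printed, or -1 on error.
 */
static int print_dir_entries(const char *path) {
    DIR *dir;
    struct dirent *entry;
    int count = 0;

    assert(path != NULL);
    dir = opendir(path);
    if (dir == NULL) {
        perror(path);
        return -1;
    }

    /* readdir returns NULL both at end-of-directory and on error, so
     * clear errno before each call and inspect it afterwards. */
    errno = 0;
    while ((entry = readdir(dir)) != NULL) {
        printf("%s\n", entry->d_name);
        count++;
        errno = 0;
    }
    if (errno != 0) {
        perror("readdir");
        closedir(dir);
        return -1;
    }

    closedir(dir);
    return count;
}
```

Calling print_dir_entries(".") lists the current directory. Resetting errno inside the loop matters because printf may itself change errno between two readdir calls.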

Requirements

Here are the required pieces; later in the lab we’ll describe steps for building out your solution to address the requirements.

This applies to non-pseudo directories; see below for handling of pseudo-directories.

Note that the pseudodirectories . and .. do not have slashes appended.

Debugging

The supplied Makefile passes in compile flags that enable detection of memory leaks, underflow, and overflow. These leak detection tools can make it awkward to use GDB on your program, so you should disable leak detection while debugging. You can do this by setting the ASAN_OPTIONS environment variable to detect_leaks=0. Thus, to start gdb, use a command such as:

$ ASAN_OPTIONS=detect_leaks=0 gdb ls

Recommendations and hints

Some general recommendations followed by some specific steps:

We recommend building your solution in pieces (and creating a git commit after each piece). Each piece may necessitate some refactoring and rewriting versus the prior piece; this is common in software development. Here’s one possible sequence:

Testing

You can run a set of tests by invoking make test. Note that these tests are not exhaustive; you should write further tests for yourself. You can read through the test script by opening test.bats, which is written for a test system called bats(1) (see man 1 bats and man 7 bats for more information). You might find some of the commands that mktest.sh uses to set up the tests useful when conducting your own tests. For example, touch -t allows you to set the mtime for a file.

We have written the tests so they provide a diff between the output of your ls and what was expected. However, sometimes a bug in your code might result in your version of ls crashing (with a SEGFAULT or other error) when run on our test directory. You can use the supplied mktest.sh script to debug such scenarios. To create a test directory named test simply call

$ ./mktest.sh test

This will create a directory and a set of files for which we have set mtimes, owners, and groups in a manner that will exercise different portions of your program. Note that the ./mktest.sh script will refuse to run if the directory specified as an argument already exists. You can delete the directory once you are done, using rm -r as usual. Before submitting your work, you should make sure that you can pass all of the supplied tests.

Extra Credit

We have a few extra credit options for this assignment:

Implement the -h flag

ls -lh prints “human readable” file sizes rather than file sizes in bytes. This means that ls -lh will print file sizes in KB, MB or GB as appropriate; the unit chosen is the largest one in which the file size is at least 1. For example:

$ ./ls -lh
-rwxr-xr-x 1 apanda apanda  17K May 29 12:03 ls
-rw-r--r-- 1 apanda apanda 2.9K May 29 11:59 main.c
-rw-r--r-- 1 apanda apanda 3.2K May 29 12:03 main.c.o
-rw-r--r-- 1 apanda apanda  307 May 29 10:16 Makefile
-rw-r--r-- 1 apanda apanda   46 May 29 10:33 README.md

This corresponds to the output for ls -l shown above. Implementing this will require writing functionality to print sizes as appropriate.
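
One possible way to produce the size column is sketched below. The human_size() helper is hypothetical, and its rounding rule (round up, keeping one decimal digit while the value is below 10) is an assumption inferred from the sample output above, where 2874 bytes appears as 2.9K and 17128 bytes as 17K; check your results against the system ls rather than relying on this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format size (in bytes) into buf: plain bytes below 1024, otherwise
 * the largest of K, M, G that yields a value of at least 1. Values are
 * rounded up (an assumption based on the sample output). Returns what
 * snprintf returns. Not handled: a value that rounds up to exactly
 * 1024 of a unit is printed as such instead of moving to the next unit.
 */
static int human_size(uintmax_t size, char *buf, size_t buflen) {
    static const char units[] = "KMG";
    uintmax_t divisor = 1024;
    uintmax_t tenths;
    size_t u = 0;

    assert(buf != NULL && buflen > 0);
    if (size < 1024) {
        return snprintf(buf, buflen, "%ju", size);
    }
    /* Move to a larger unit while one exists and the value still fits. */
    while (u + 1 < sizeof(units) - 1 && size / divisor >= 1024) {
        divisor *= 1024;
        u++;
    }
    /* Ceiling of size / divisor, measured in tenths of a unit. */
    tenths = size / divisor * 10
           + (size % divisor * 10 + divisor - 1) / divisor;
    if (tenths < 100) {
        return snprintf(buf, buflen, "%ju.%ju%c",
                        tenths / 10, tenths % 10, units[u]);
    }
    /* At 10 or more, drop the decimal and round up to a whole unit. */
    return snprintf(buf, buflen, "%ju%c",
                    size / divisor + (size % divisor != 0), units[u]);
}
```

With a buffer of 16 bytes, human_size(2874, buf, sizeof buf) leaves "2.9K" in buf, and sizes below 1024 are printed unchanged.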

Symbolic links are a Unix feature that lets you create “pseudo-files” that point to other files. You can create one using ln -s; for example, running

$ ln -s main.c main.c.link

creates a new file main.c.link that points to main.c. The system ls has special handling for printing symbolic links when showing long listing format. For example, consider the following output:

$ ls -l
total 36
-rwxr-xr-x 1 apanda apanda 17128 May 29 12:03 ls
-rw-r--r-- 1 apanda apanda  2874 May 29 11:59 main.c
lrwxrwxrwx 1 apanda apanda     6 May 29 14:02 main.c.link -> main.c
-rw-r--r-- 1 apanda apanda  3224 May 29 12:03 main.c.o
-rw-r--r-- 1 apanda apanda   307 May 29 10:16 Makefile
-rw-r--r-- 1 apanda apanda    46 May 29 10:33 README.md

Observe that main.c.link is displayed as main.c.link -> main.c. Implement similar handling for your version of ls. In implementing this you might find the readlink(2) syscall useful. Also observe that the permission string for the link in this case uses l to indicate links.
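
The main subtlety of readlink(2) is that it does not NUL-terminate the buffer it fills. The sketch below shows one way to account for that; link_target() and print_link() are hypothetical helpers, and the fixed 4096-byte buffer is an assumption made for brevity.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Store the target of the symbolic link at path in buf as a
 * NUL-terminated string. Returns 0 on success, -1 on error or if the
 * target might not have fit in buf.
 */
static int link_target(const char *path, char *buf, size_t buflen) {
    ssize_t len;

    assert(path != NULL && buf != NULL && buflen > 0);
    /* readlink does not append a terminating NUL, so reserve a byte. */
    len = readlink(path, buf, buflen - 1);
    if (len < 0) {
        return -1;
    }
    if ((size_t)len == buflen - 1) {
        return -1; /* the target may have been truncated */
    }
    buf[len] = '\0';
    return 0;
}

/* Print "name -> target" for a symbolic link, as in the listing above. */
static int print_link(const char *name) {
    char target[4096]; /* assumed large enough for this sketch */

    if (link_target(name, target, sizeof target) != 0) {
        perror(name);
        return -1;
    }
    printf("%s -> %s\n", name, target);
    return 0;
}
```

Calling readlink on a path that is not a symbolic link fails, so print_link() is only meaningful once you already know (for example, from the file's mode) that the entry is a link.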

Implement the cheeky hack of calling the “real” ls

If the argument to your ./ls is --hack, invoke the system-supplied ls to solve the lab, using the system calls fork(), exec(), and pipe() (you may need others). Do not use the system() library function. You need not follow the error handling that is specified for the rest of the assignment.
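
The general shape of running another program and relaying its output through a pipe is sketched below. This is an illustration of the fork/exec/pipe pattern under stated assumptions, not a prescribed solution: run_and_relay() is a hypothetical helper, the choice of execvp (which searches PATH) and dup2 is ours, and interrupted reads and writes are not retried.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv[0] (looked up on PATH) with the given arguments, copy
 * everything it writes to its stdout onto our own stdout, and return
 * its exit status, or -1 if something went wrong.
 */
static int run_and_relay(char *const argv[]) {
    int fds[2];
    pid_t pid;
    char buf[4096];
    ssize_t n;
    int status;

    assert(argv != NULL && argv[0] != NULL);
    if (pipe(fds) < 0) {
        return -1;
    }
    fflush(stdout); /* avoid duplicating buffered output in the child */
    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: send stdout into the pipe, then become the program. */
        close(fds[0]);
        if (dup2(fds[1], STDOUT_FILENO) < 0) {
            _exit(127);
        }
        close(fds[1]);
        execvp(argv[0], argv);
        _exit(127); /* reached only if execvp failed */
    }

    /* Parent: close the write end so read() sees EOF once the child
     * has exited and closed its copy. */
    close(fds[1]);
    while ((n = read(fds[0], buf, sizeof buf)) > 0) {
        if (write(STDOUT_FILENO, buf, (size_t)n) < 0) {
            break;
        }
    }
    close(fds[0]);

    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```

Closing the unused pipe ends in both processes is the step most often forgotten; if the parent keeps the write end open, its read loop never sees end-of-file.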

Handin Procedure

Handing in consists of three steps:

  1. Executing this checklist:

    • Make sure your code builds, with no compiler warnings.
    • Make sure you’ve used git add to add any files that you’ve created.
    • Fill out the top of the answers.txt file, including your name and NYU Id.
    • Make sure you’ve answered every question in answers.txt
    • Create a file called slack.txt noting how many slack days you have used for this assignment. (This is to help us agree on the number that you have used.) Include this file even if you didn’t use any slack days.
  2. Push your code to GitHub, so we have it:

    $ cd ~/cs202/lab2    
    $ git commit -am "hand in lab2"
    $ git push origin 
    
    Counting objects: ...
    ....
    To ssh://github.com/nyu-cs202/labs-21sp-<username>.git
      7337116..ceed758  main -> main
  3. Actually submit, by timestamping and identifying your pushed code:

    • Decide which git commit you want us to grade, and copy its id (you will paste it in the next sub-step). A commit id is a 40-character hexadecimal string. Usually the commit id that you want will be the one that you created last. The easiest way to obtain the commit id for the last commit is by running the command git log -1 --format=oneline. This prints both the commit id and the initial line of the commit message. If you want to submit a previous commit, there are multiple ways to get the commit id for an earlier commit. One way is to use the tool gitk. Another is git log -p, as explained here, or git show.
    • Now go to NYU Classes; there will be an entry for this lab. Paste only the commit id that you just copied.
    • You can submit as many times as you want; we will grade the last commit id submitted to NYU Classes.

NOTE: Ground truth is what and when you submitted to NYU Classes. Thus, a non-existent commit id in NYU Classes means that you have not submitted the lab, regardless of what you have pushed to GitHub. And, the time of your submission for the purposes of tracking lateness is the time when you upload the id to NYU Classes, not the time when you executed git commit.

This completes the lab.

Acknowledgments

The scaffolding and architecture of this lab are due to Aurojit Panda, with modifications by the CS202 staff.