This lab is intended to:
- reinforce the C programming language
- give you experience as a programmatic user of the file system
- give you experience using system calls, which almost always includes consulting documentation (in this case, the “man pages”)
In this lab, you will build a version of ls, a Unix utility that lists files in a directory. Yours will print one file per line (this is equivalent to ls -1 on Linux, OS X or other Unix systems) in any mode. When run in the directory containing your code, this should result in output similar to what is shown below:
$ ./ls
ls
main.c
main.c.o
Makefile
README.md
Note the difference between ls and ./ls. The first one is the “system-supplied” ls (type $ which ls to see where this ls lives). The second one is your ls. The shell knows the difference because . means the current directory, so ./ls tells the shell “I want a program named ls in the current directory.” Likewise, .. means “the directory above the current one.” So if you were in a directory underneath your lab directory, ../ls would again refer to your ls. You should have seen the concepts of . and .. in CSO.
The ls implementation for this lab also needs to support a subset of the options supported by the system-supplied ls. These include the following; the detailed specification is in a section below.
- An arbitrary number of files and directories given on the command line.
- -a: list all files, including files starting with . and the pseudo-files . and .. (see above for the meanings of these pseudofiles).
- -l: use a long listing format, which provides additional information about each file.
- -R: recursively list files in subdirectories.
- Combinations of the flags above, for example ./ls -la and ./ls -alR.
- --help: print the help message.
- Error handling, as specified below.
We recommend that you now read through the whole lab, before executing any of it.
Getting Started
NOTE. We recommend attempting to do this lab without searching on the Internet. This will force you to make use of “man pages” and the documentation included on the system. These often contain much more detailed information than Stack Overflow and similar sources.
Obtain and update the lab files as follows. We assume that you have set up the upstream as described in the lab setup. Then:
$ cd ~/cs202
$ git fetch upstream
$ git merge upstream/master
This lab’s files are located in the lab2 subdirectory.
Let’s begin by ensuring that you can build the code you just pulled:
$ cd ~/cs202/lab2
$ make
You should see output similar to the following:
$ make
cc -pedantic -Wall -std=c11 -g -D_POSIX_C_SOURCE=200112L -c main.c -o main.c.o
main.c: In function ‘main’:
main.c:196:29: warning: variable ‘list_all’ set but not used [-Wunused-but-set-variable]
bool list_long = false, list_all = false;
^~~~~~~~
main.c:196:10: warning: unused variable ‘list_long’ [-Wunused-variable]
bool list_long = false, list_all = false;
^~~~~~~~~
At top level:
main.c:56:12: warning: ‘group_for_gid’ defined but not used [-Wunused-function]
static int group_for_gid(gid_t gid, char *buf, size_t buflen) {
^~~~~~~~~~~~~
main.c:44:12: warning: ‘uname_for_uid’ defined but not used [-Wunused-function]
static int uname_for_uid(uid_t uid, char *buf, size_t buflen) {
^~~~~~~~~~~~~
main.c:15:12: warning: ‘err_code’ defined but not used [-Wunused-variable]
static int err_code;
^~~~~~~~
cc ./main.c.o -o ls
You will also find that an executable named ls has been created in the directory.
Type:
$ ./ls
to see what happens.
Coding
Now that you have ensured that you can build the project, you can start coding. You need to edit only main.c. Read the rest of the lab for utilities, documentation, building blocks, requirements, and hints.
Compiler warnings. Make sure that, when your code is compiled, the compiler produces no warnings. (Example warnings are given above [“variable set but not used”, etc.]; these happen because the code isn’t fleshed out.) We will be deducting points for compile-time warnings.
Programming style. In this and subsequent labs, you will be graded for style. We will deduct up to 20% of total lab points for poor style based on our subjective evaluation. Please read this style guide.
Argument parsing
Look at the main() function. main() uses getopt_long() for argument parsing. You will need to understand this function, which requires finding and reading documentation. To start, let us use the apropos command to find where the documentation for getopt_long lives. Type:
$ apropos getopt_long
This should result in:
$ apropos getopt_long
getopt_long (3) - Parse command-line options
getopt_long_only (3) - Parse command-line options
This output indicates that getopt_long provides a mechanism for parsing command-line options, and is documented in section 3 of the manual pages. You can now read this documentation by running:
$ man 3 getopt_long
This shows you information on getopt_long(), including example use. In this case, the man command provides access to the Linux manual pages, the 3 specifies that you are looking for a page in section 3 of the manual, and getopt_long specifies the function you are looking for.
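As a rough illustration of what the man page describes, the sketch below parses the flags this lab asks for. The struct ls_flags type, the parse_flags name, and the OPT_HELP value are hypothetical names chosen for this example; the supplied main.c organizes its flags differently (it uses separate bool variables), so treat this as one possible shape rather than the scaffold's actual code.

```c
#include <getopt.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for the flags; not part of the supplied main.c. */
struct ls_flags {
    bool list_all;   /* -a */
    bool list_long;  /* -l */
    bool recursive;  /* -R */
    bool help;       /* --help */
};

/* Value returned by getopt_long for --help; chosen outside the range of
 * single characters so it cannot collide with a short option. */
enum { OPT_HELP = 256 };

/*
 * Parse the options in argv into *flags.
 * Returns the index of the first non-option argument (optind),
 * or -1 if an unknown option was seen.
 */
int parse_flags(int argc, char *argv[], struct ls_flags *flags) {
    static const struct option long_opts[] = {
        {"help", no_argument, NULL, OPT_HELP},
        {NULL, 0, NULL, 0}
    };
    int opt;

    *flags = (struct ls_flags){false, false, false, false};
    optind = 1; /* start scanning at argv[1], even on a repeated call */

    while ((opt = getopt_long(argc, argv, "alR", long_opts, NULL)) != -1) {
        switch (opt) {
        case 'a':      flags->list_all = true;  break;
        case 'l':      flags->list_long = true; break;
        case 'R':      flags->recursive = true; break;
        case OPT_HELP: flags->help = true;      break;
        default:       return -1; /* getopt_long has already printed a message */
        }
    }
    return optind;
}
```

Combined flags such as -la need no special handling here: getopt_long reports each letter of a combined option on a separate call.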
Documentation
Individual man pages are often referenced using notation like GETOPT(3), which indicates that you are looking at the GETOPT man page in section 3. Similarly, the convention in referring to functions is that function_name(num) refers to a function (or syscall) documented at section num in the man pages. For example, getopt(3) refers to “the getopt() documented in section 3 of the man pages”; it does not refer to invoking getopt with an argument of 3. We’ll be using the func(num) notation below.
As we showed above, you can find out what section documents a function using the apropos command.
In fact, you can find more documentation for apropos itself by running
$ apropos apropos
apropos (1) - search the manual page names and descriptions
$ man apropos
Building blocks
You will need to use the following system calls and library functions. Learn about them through man pages.
- readdir(3): (type: man 3 readdir and man 3p readdir, and see below for the difference between the 3 and the 3p) will let you list all the files in a directory and figure out their type. It takes a DIR* structure, which you can get from opendir.
- opendir(3): (man 3 opendir and man 3p opendir) will let you open a directory and get a DIR* structure that you can then pass to readdir.
- closedir(3), rewinddir(3): (man 3 closedir, man 3p closedir, man 3 rewinddir) will let you close the directory or begin listing anew.
- stat(2): (man 2 stat) will allow you to check whether a file exists and also return file information necessary when running ls -l. See also inode(7).
- exit(3): (man 3 exit) is how you exit out of your program with a status code. By convention, programs exiting without failure return status code 0; you can see an example of this in the help() function. You will use exit() to return the error codes that are specified below.
On 3 versus 3p: this is confusing. The ‘3’ refers to section 3 in the man pages of Linux (which of course is an operating system) whereas the ‘3p’ refers to section 3 in the man pages of POSIX, which is a standard for Unix-like systems that Linux mostly implements, and in some cases extends. The documentation you see in the ‘3’ man pages is authoritative for the Linux system, but the ‘3p’ pages for readdir and opendir may give you inspiration, as they contain example usage. For this lab, it’s best to consult both sets of pages.
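To show how opendir, readdir, and closedir fit together, here is a minimal sketch of a directory-listing routine. The name list_dir and its return convention are assumptions made for this example; it also does not yet append / to directory names, which the requirements below ask for.

```c
#include <dirent.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Print the entries of the directory at `path`, one per line.
 * Entries whose names start with '.' are skipped unless list_all is set.
 * Returns the number of entries printed, or -1 if the directory could
 * not be opened or read (errno then describes the failure).
 */
int list_dir(const char *path, bool list_all) {
    DIR *dir = opendir(path);
    if (dir == NULL) {
        return -1;
    }

    int count = 0;
    for (;;) {
        /* readdir returns NULL both at end-of-directory and on error;
         * clearing errno before each call lets us tell the two apart. */
        errno = 0;
        struct dirent *entry = readdir(dir);
        if (entry == NULL) {
            break;
        }
        if (!list_all && entry->d_name[0] == '.') {
            continue;
        }
        printf("%s\n", entry->d_name);
        count++;
    }

    int saved = errno;   /* closedir may overwrite errno */
    closedir(dir);
    if (saved != 0) {
        errno = saved;
        return -1;
    }
    return count;
}
```

Calling list_dir(".", false) approximates a plain ./ls, and list_dir(".", true) approximates ./ls -a.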
Requirements
Here are the required pieces; later in the lab we’ll describe steps for building out your solution to address the requirements.
- Your solution may not invoke the system-supplied ls. For example, using system(), fork(), exec(), pipe() to call the “real” ls will not receive credit for the main lab. But see extra credit.
- In all output from your ls, directories should have a / (forward slash) appended to the name. For example:
$ cd ~/cs202
$ lab2/ls
lab1/ <--- NOTE: lab1/ not lab1
lab2/ <--- NOTE: lab2/ not lab2
This applies to non-pseudo directories; see below for handling of pseudo-directories.
- There is no required output order. You may, if you want, output the contents of a directory in any order (the real ls by default sorts alphabetically).
- Files and directories: You need to handle an arbitrary number of files and directories given on the command line; from ls’s perspective, these are the “actual arguments” (the -l, -a are “options”). For example:
$ mkdir test # create a directory named 'test'
$ touch test/foo # create a file named 'foo' inside 'test'
$ lab2/ls lab2 test
lab2:
ls
main.c
main.c.o
Makefile
README.md
test:
foo
When given a file, your ls should print the file name if it exists, for example:
$ lab2/ls lab2 test/foo
test/foo
lab2:
ls
main.c
main.c.o
Makefile
README.md
Note: as in the previous requirement, there is no required output order. You may, if you want, handle each argument in the order given.
- -a: list all files, including files starting with . and the pseudo-files . and .. (dot and dot-dot). When executed in the directory containing your code, this should result in output similar to:
$ ./ls -a
. <--- NOTE: no slash
.. <--- NOTE: no slash
.gitignore
ls
main.c
main.c.o
Makefile
README.md
Note that the pseudodirectories . and .. do not have slashes appended.
- -l: Use a long listing format: this provides additional information about each file, including file permissions (see below), number of links (normally, this is 1 for files; for directories it is typically 2 plus the number of subdirectories), the user who owns the file, the group that owns the file (see below for utility functions), its size (in bytes), and the last time the file's contents were modified (referred to as mtime in most OS structures). When executed in a directory containing your code, this should result in output similar to:
$ ls -l
-rwxr-xr-x 1 apanda apanda 17128 May 29 11:43 ls
-rw-r--r-- 1 apanda apanda 3091 May 29 11:37 main.c
-rw-r--r-- 1 apanda apanda 3264 May 29 11:43 main.c.o
-rw-r--r-- 1 apanda apanda 307 May 29 10:16 Makefile
-rw-r--r-- 1 apanda apanda 46 May 29 10:33 README.md
For those first 10 characters, you will need to convert the file mode (st_mode in struct stat) into a permissions string as specified below (you should have seen the underlying concepts surrounding permissions in CSO, but for better or worse, you don’t need a firm grasp on those concepts to do this part of this assignment):
  - The first character indicates the type of the file. For normal files this is a -, for directories it is a d, and for all other files it is a ?. Note this is a subset of the actual type indicators used by the system-supplied ls.
  - The next three characters (the so-called rwx or permissions bits) indicate access mode for the user who owns the file:
    - The first of these three characters is r if the user can read the file or - otherwise.
    - The second is similarly w if the user can write to the file and - otherwise.
    - The last is x if the user can execute the file or - otherwise. Similar to above, this scheme is a subset of the actual mode strings used on Unix (specifically, we do not account for the set-user-ID bit, the set-group-ID bit or the sticky bit).
  - The next three characters are the same as above, but for the group that owns the file, rather than the user.
  - Finally the last three characters indicate the permissions for “everyone else”, or “other” (non-user, non-group members). You can find information about how to interpret the file mode in inode(7), or man 7 inode.
  - You can confirm your mode string output by comparing to the output given when running the “real” ls -l.
You also need to print username and groupname. For this purpose, we have provided you with the utility functions uname_for_uid and group_for_gid, which convert the numeric user ID (st_uid) and group ID (st_gid) that stat provides into appropriate strings.
You need not correctly identify and display information for symlinks (if this last sentence doesn’t mean anything to you, then you should ignore it). See the extra credits at the end of this write-up for more information.
You do not need to precisely line up all fields as ls does. Make a reasonable effort at formatting your output.
You should use the provided date_string function to format mtime.
stat(3) (man 3 stat) may be inspiring, in that it points to man 3 fstatat, which could be helpful.
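The mode-string rules for -l can be sketched as a small helper that turns st_mode into the 10-character string. The name mode_string and the fixed 11-byte output buffer are assumptions for this example; the S_ISREG, S_ISDIR, and S_I* macros are the ones documented in inode(7).

```c
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Fill `out` (at least 11 bytes) with the 10-character mode string
 * for `mode`, followed by a terminating NUL.
 */
void mode_string(mode_t mode, char out[11]) {
    /* Permission bits in display order: user, group, other. */
    static const mode_t bits[9] = {
        S_IRUSR, S_IWUSR, S_IXUSR,
        S_IRGRP, S_IWGRP, S_IXGRP,
        S_IROTH, S_IWOTH, S_IXOTH
    };

    /* File type: '-' for regular files, 'd' for directories, '?' otherwise. */
    if (S_ISREG(mode)) {
        out[0] = '-';
    } else if (S_ISDIR(mode)) {
        out[0] = 'd';
    } else {
        out[0] = '?';
    }

    /* Each set bit shows its letter (r, w, or x); each clear bit shows '-'. */
    for (int i = 0; i < 9; i++) {
        out[i + 1] = (mode & bits[i]) ? "rwx"[i % 3] : '-';
    }
    out[10] = '\0';
}
```

Keeping the nine bits in a table avoids nine near-identical if statements; a longer but more explicit version is just as acceptable.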
- -R: ls -R recursively lists files in subdirectories. For example, consider the following directory:
$ ./lab2/ls
lab1/
lab2/
test/
Running ls -R here would yield output similar to:
$ ./lab2/ls -R
.: # printing this line is optional
lab1/
lab2/
test/
./lab1:
....
./lab2:
ls
main.c
main.c.o
Makefile
README.md
./test:
foo
Remember: a correct implementation of recursive listing must handle arbitrarily nested directories.
- Combinations of flags: ls -la (or ls -al or ls -a -l) must work correctly. For example, your output for ./ls -al should be similar to:
$ ./ls -al
drwxr-xr-x 3 apanda apanda 4096 May 29 11:43 .
drwxr-xr-x 4 apanda apanda 4096 May 29 09:52 ..
-rw-r--r-- 1 apanda apanda 7 May 29 09:34 .gitignore
-rwxr-xr-x 1 apanda apanda 17128 May 29 11:43 ls
-rw-r--r-- 1 apanda apanda 3091 May 29 11:37 main.c
-rw-r--r-- 1 apanda apanda 3264 May 29 11:43 main.c.o
-rw-r--r-- 1 apanda apanda 307 May 29 10:16 Makefile
-rw-r--r-- 1 apanda apanda 46 May 29 10:33 README.md
Similarly, a correct implementation of -R should allow combination with -l and -a.
- --help: the help function help() should appropriately document the options you implement, including any extra credit options.
- Error handling: You should ensure that your code returns error code 0 when it can run successfully. You should also ensure that your program performs all actions it can successfully perform before exiting. For example, consider the following:
$ ./ls
a/
b
$ echo $?
0
$ ./ls c a d
ls: cannot access c: No such file or directory.
# This blank line is optional
a:
test
# This blank line is optional
ls: cannot access d: No such file or directory.
$ echo $?
129
In the example above, echo $? prints the exit code returned by the last command to be executed on this shell. As you can see, the first run of ./ls is successful and hence has an exit code of 0, while the next run encounters a problem where two of the arguments (c and d) do not exist, and it returns 129 to indicate that this is the case. Also observe that despite encountering an error when attempting to list c, the program continues to execute until all arguments have been processed.
As another example, consider the following run:
$ ./ls
a/
b/
$ ./ls -R
.:
a/
b/
c/
./a:
test
ls: cannot open directory ./b: Permission denied.
./c:
test-c
$ echo $?
130
Again, note that we return error code 130 to indicate that ls -R could not successfully list all subdirectories. However, even though we could not list files under b, we still continue listing files under c.
For this assignment we will be using error codes that are different from what is used by ls in Linux, OS X or other Unixes.
You will encode errors as a bit vector, corresponding to values from 128, 129, … Specifically:
  - In the bit numbering below, note that “bit i” means “the bit whose value when set is 2^i (two raised to the power i)”.
  - If there was any error, set bit 7 (0b10000000 in binary, 0x80 in hex). In addition, set bits according to the rules below:
  - Set bit 0 (0b1 in binary, 0x1 in hex) if a file specified on the command line was not found.
  - Set bit 1 (0b10 in binary, 0x2 in hex) if your program was denied access to a file or directory.
  - Set bit 2 (0b100 in binary, 0x4 in hex) if another type of error happened (for example, could not get a group ID or user ID).
The error code you return should have the corresponding bit set for each type of error encountered. Specifically this means that:
  - If no error is encountered, you return error code 0, i.e., your program exits by calling exit(0);
  - If you only encounter errors where a specified file does not exist, return error code 129 (bit 7 and bit 0).
  - If you only encounter one or more errors where access is denied, return error code 130 (bit 7 and bit 1).
  - If you encounter an error where a specified file does not exist and you don’t have access to another file, return error code 131 (0b10000011 in binary).
  - If you only encounter an error where the uname_for_uid or group_for_gid function returns an error, you should return error code 132. Of course, if you encounter such errors and the others, you may wind up returning 133, or 134, and so on.
Also, in case either uname_for_uid or group_for_gid returns an error, you should not print an error message; instead you should just print the numeric UID or GID. For instance:
$ ./ls -l
-rw-r--r-- 1 1002 wheel 32 May 29 11:37 bad_user
$ echo $?
132
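The bit-vector scheme above can be sketched as a helper that folds each failure into a running exit code. The enum names and the record_error function are hypothetical; in particular, mapping ENOENT to "not found" and EACCES to "access denied" is an assumption about how you choose to classify errno values, and the scaffold's own err_code variable may be used differently.

```c
#include <errno.h>

/* Bit values taken from the specification above. */
enum {
    ERR_NOT_FOUND = 1 << 0, /* 0x01: a file named on the command line was not found */
    ERR_DENIED    = 1 << 1, /* 0x02: access to a file or directory was denied */
    ERR_OTHER     = 1 << 2, /* 0x04: any other kind of error */
    ERR_ANY       = 1 << 7  /* 0x80: set whenever any error occurred */
};

/*
 * Fold one failure, described by an errno value, into the running
 * exit code and return the updated code. Bits are OR-ed in, so
 * recording the same kind of error twice changes nothing.
 */
int record_error(int err_code, int errnum) {
    switch (errnum) {
    case ENOENT:
        err_code |= ERR_NOT_FOUND;
        break;
    case EACCES:
        err_code |= ERR_DENIED;
        break;
    default:
        err_code |= ERR_OTHER;
        break;
    }
    return err_code | ERR_ANY;
}
```

Starting from 0, a missing file followed by a permission failure gives 0x80 | 0x01 | 0x02 = 131, matching the table above; the final value is what you would pass to exit().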
Debugging
The supplied Makefile passes in compile flags that enable detection of memory leaks, underflow, and overflow. These leak detection tools can at times make it annoying to use GDB for debugging your program. You therefore need to disable leak detection when debugging your program. This can be done by setting the ASAN_OPTIONS environment variable to detect_leaks=0. Thus to start gdb you should use a command such as:
$ ASAN_OPTIONS=detect_leaks=0 gdb ls
Recommendations and hints
Some general recommendations followed by some specific steps:
- Use gdb to trace the flow of your code. We covered gdb in the prior lab, and a hint is immediately above.
- Use the system-supplied ls for inspiration about what your output should look like. In cases of divergence between ls and the sample output here in the lab description, you can follow the sample output here.
- In general, you will be consulting man pages a lot.
- Use git commit and git push after each piece of functionality that you build. See the setup page for advice.
- To make your life easier, use editing tools that we saw in lab 1. For example, Ctrl+] in vim jumps to function definitions.
- You will need to add #includes to the top of main.c. The man pages will guide you here.
- The PRINT_ERROR macro in main.c might be helpful in interpreting why some of the system calls you use return errors.
- Speaking of errors, be sure you are handling the return values of syscalls correctly. Part of systems programming (or, really, any programming) is handling error cases properly.
- main.c includes some example function signatures (also known as function prototypes). You can use those if you wish, but you do not have to.
- The variable errno will be useful when you need to check error types. See the man pages for errno, and individual functions’ references to errno in the man pages.
We recommend building your solution in pieces (and creating a git commit after each piece). Each piece may necessitate some refactoring and rewriting versus the prior piece; this is common in software development. Here’s one possible sequence:
- First get $ ./ls working (no arguments). This should list the contents of the current directory, meaning ., and will give you a warmup calling some of the system calls that you need (listed above: opendir, readdir).
- Then handle $ ./ls <dir1> <dir2> .... This should list the contents of all directories supplied, and will give you experience with argument-handling code. Make sure you handle newlines between directories correctly. As hints:
  - If optind == argc then it means that no arguments after the flags were given.
  - Otherwise, arguments indexed from optind to argc - 1 are all of the arguments after the flags.
- Next, $ ./ls <dir1> <fname1> <fname2> <dir2> .... Now extend your code so you handle the case that filenames are passed on the command line. For this, you will need to implement logic that determines whether an argument is a filename or directory. This will be your first use of stat. It’s probably a good idea to implement a function called is_dir(char* fname), since you’ll need that several times.
- Now move on to the flags. Implement $ ./ls -a. This should be a relatively small modification on top of what you have. In fact, you may have been in “-a” mode all along, so your modification will be to move out of “-a” mode unless your ls is called with the -a flag.
- Implement $ ./ls -l. First, extend your argument-handling code to work with the -l option. Then implement the long-printing functionality. The following may be useful here:
  - man 3 stat (not just man 2 stat). Notice this points you to man 3 fstatat.
  - printf(3) width specifiers (for example, printf("%6jd")). As usual, type man 3 printf.
  - snprintf(3) (man 3 snprintf) to concatenate strings.
- Implement $ ./ls -R. Some hints:
  - Start by extending the argument-handling code.
  - It may help to write out pseudocode, and then translate that pseudocode to real C. The pseudocode should be something like “define a directory-printing routine that: for the given directory, prints out the entries, and then, for each of those entries that is a directory, calls that same directory-printing routine.”
  - For ./ls -aR, don’t recurse infinitely! That means not recursing on .. and .
Check the requirements section above again, to make sure you’re following the specification.
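The -R pseudocode from the hints, together with the suggested is_dir helper, might translate to C roughly as follows. This is a sketch under several assumptions: list_recursive is a hypothetical name, hidden entries are always skipped (the non -a behavior), paths are assumed to fit in a 4096-byte buffer, and error reporting is reduced to a single message. Because is_dir uses stat, a symbolic link to a directory is treated as a directory here.

```c
#include <dirent.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Returns true if `path` names a directory; false otherwise,
 * including when stat fails. */
bool is_dir(const char *path) {
    struct stat sb;
    return stat(path, &sb) == 0 && S_ISDIR(sb.st_mode);
}

/*
 * Print the entries of `path`, then recurse into each subdirectory.
 * Returns 0 on success, or -1 if any directory could not be opened.
 */
int list_recursive(const char *path) {
    DIR *dir = opendir(path);
    if (dir == NULL) {
        fprintf(stderr, "ls: cannot open directory %s\n", path);
        return -1;
    }

    char child[4096]; /* assumed large enough for this sketch */
    struct dirent *entry;
    int status = 0;

    /* First pass: print this directory's entries, with '/' on directories. */
    printf("%s:\n", path);
    while ((entry = readdir(dir)) != NULL) {
        if (entry->d_name[0] == '.') {
            continue; /* skips hidden files as well as . and .. */
        }
        snprintf(child, sizeof child, "%s/%s", path, entry->d_name);
        printf("%s%s\n", entry->d_name, is_dir(child) ? "/" : "");
    }

    /* Second pass: call this same routine on each subdirectory. */
    rewinddir(dir);
    while ((entry = readdir(dir)) != NULL) {
        /* Never recurse on . or .., even if hidden entries are later shown. */
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0) {
            continue;
        }
        if (entry->d_name[0] == '.') {
            continue;
        }
        snprintf(child, sizeof child, "%s/%s", path, entry->d_name);
        if (is_dir(child) && list_recursive(child) != 0) {
            status = -1; /* remember the failure but keep going */
        }
    }

    closedir(dir);
    return status;
}
```

The two passes use rewinddir so that a directory's own entries are printed before any subdirectory listing begins. Note that each level of recursion keeps one directory open, which a production implementation might avoid by collecting names first.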
Testing
You can run a set of tests by invoking make test. Note that these tests are not exhaustive; you should write further tests for yourself. You can read through the test script, test.bats, which is written for a test system called bats(1) (see man 1 bats and man 7 bats for more information). You might find some of the commands used in setting up the test in mktest.sh useful when conducting your own tests. For example, touch -t allows you to set the mtime for a file.
We have written the tests so they provide a diff between the output of your ls and what was expected. However, sometimes a bug in your code might result in your version of ls crashing (with a SEGFAULT or other error) when run on our test directory. You can use the supplied mktest.sh script to debug such scenarios. To create a test directory named test simply call
$ ./mktest.sh test
This will create a directory and a set of files for which we have set mtimes, owners, and groups in a manner that will exercise different portions of your program. Note the ./mktest.sh script will refuse to run if the directory specified as an argument already exists. You can delete the directory once done using rm -r as usual. Before submitting your work, you should make sure that you can pass all of the supplied tests.
Extra Credit
We have a few extra credit options for this assignment:
Implement the -h flag
ls -lh prints “human readable” file sizes rather than file sizes in bytes. This means that ls -lh will print file sizes in KB, MB or GB as appropriate; the unit chosen is the largest unit for which the file size, expressed in that unit, is at least 1 (sizes below 1 KB are printed in plain bytes). For example
$ ./ls -lh
-rwxr-xr-x 1 apanda apanda 17K May 29 12:03 ls
-rw-r--r-- 1 apanda apanda 2.9K May 29 11:59 main.c
-rw-r--r-- 1 apanda apanda 3.2K May 29 12:03 main.c.o
-rw-r--r-- 1 apanda apanda 307 May 29 10:16 Makefile
-rw-r--r-- 1 apanda apanda 46 May 29 10:33 README.md
This corresponds to the output for ls -l shown above. Implementing this will require writing functionality to print sizes as appropriate.
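One way to produce sizes in this style is sketched below. The function name human_size is hypothetical, and the rounding rule (round up, with one decimal place for values below 10) is an assumption inferred from the sample output: 17128 bytes appears as 17K, and the 2.9K and 3.2K entries match the 2874-byte and 3224-byte files that carry the same timestamps in the symbolic-link listing later in this write-up.

```c
#include <stdio.h>

/*
 * Format `size` (in bytes) into `buf`: plain bytes below 1024,
 * otherwise scaled by powers of 1024 and rounded up, with one decimal
 * place when the scaled value is below 10.
 */
void human_size(long long size, char *buf, size_t buflen) {
    static const char units[] = {'K', 'M', 'G'};

    if (size < 1024) {
        snprintf(buf, buflen, "%lld", size);
        return;
    }

    /* Pick the largest unit for which the scaled value is at least 1. */
    long long unit = 1024;
    int idx = 0;
    while (idx < 2 && size >= unit * 1024) {
        unit *= 1024;
        idx++;
    }

    /* Integer ceiling of size * 10 / unit, i.e. the value in tenths. */
    long long tenths = (size * 10 + unit - 1) / unit;
    if (tenths < 100) {
        snprintf(buf, buflen, "%lld.%lld%c", tenths / 10, tenths % 10, units[idx]);
        return;
    }

    long long whole = (size + unit - 1) / unit; /* integer ceiling */
    if (whole >= 1024 && idx < 2) {
        /* Rounding up reached the next unit, e.g. 1048575 bytes. */
        snprintf(buf, buflen, "1.0%c", units[idx + 1]);
    } else {
        snprintf(buf, buflen, "%lld%c", whole, units[idx]);
    }
}
```

Integer arithmetic is used throughout so the sketch does not depend on floating-point rounding or on linking the math library.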
Correctly Handle Symbolic Links in ls -l
Symbolic links are a Unix feature that allows you to create “pseudo-files” that point to other files. You can create one using ln -s; for example, running
$ ln -s main.c main.c.link
creates a new file main.c.link that points to main.c. The system ls has special handling for printing symbolic links when showing long listing format. For example in the following output:
$ ls -l
total 36
-rwxr-xr-x 1 apanda apanda 17128 May 29 12:03 ls
-rw-r--r-- 1 apanda apanda 2874 May 29 11:59 main.c
lrwxrwxrwx 1 apanda apanda 6 May 29 14:02 main.c.link -> main.c
-rw-r--r-- 1 apanda apanda 3224 May 29 12:03 main.c.o
-rw-r--r-- 1 apanda apanda 307 May 29 10:16 Makefile
-rw-r--r-- 1 apanda apanda 46 May 29 10:33 README.md
Observe that main.c.link is displayed as main.c.link -> main.c. Implement similar handling for your version of ls. In implementing this you might find the readlink(2) syscall useful. Also observe that the permission string for the link in this case uses l to indicate links.
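A minimal sketch of reading a link's target with readlink(2) follows. The helper name link_target and its return convention are assumptions for this example. It uses lstat rather than stat because stat follows the link and would describe the file it points to.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * If `path` is a symbolic link, store its target in `buf`
 * (NUL-terminated) and return 0. Return -1 if `path` is not a
 * symlink, cannot be examined, or the target may not fit in `buf`.
 */
int link_target(const char *path, char *buf, size_t buflen) {
    struct stat sb;

    /* lstat describes the link itself rather than what it points to. */
    if (buflen < 2 || lstat(path, &sb) != 0 || !S_ISLNK(sb.st_mode)) {
        return -1;
    }

    /* readlink does not NUL-terminate, so reserve one byte for that. */
    ssize_t n = readlink(path, buf, buflen - 1);
    if (n < 0 || (size_t)n >= buflen - 1) {
        return -1; /* failure, or the target was possibly truncated */
    }
    buf[n] = '\0';
    return 0;
}
```

With this helper, a long listing could print the name followed by " -> " and the target whenever link_target returns 0.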
Implement the cheeky hack of calling the “real” ls
If the argument to your ./ls is --hack, invoke the system-supplied ls to solve the lab, using the system calls fork(), exec(), pipe() (you may need others). Do not use the system() library function. You need not follow the error handling that is specified for the rest of the assignment.
Handin Procedure
Handing in consists of three steps:
- Executing this checklist:
  - Make sure your code builds, with no compiler warnings.
  - Make sure you’ve used git add to add any files that you’ve created.
  - Fill out the top of the answers.txt file, including your name and NYU Id.
  - Make sure you’ve answered every question in answers.txt.
  - Create a file called slack.txt noting how many slack days you have used for this assignment. (This is to help us agree on the number that you have used.) Include this file even if you didn’t use any slack days.
- Push your code to GitHub, so we have it:
$ cd ~/cs202/lab2
$ git commit -am "hand in lab2"
$ git push origin
Counting objects: ...
....
To ssh://github.com/nyu-cs202/labs-<username>.git
7337116..ceed758 master -> master
- Actually submit, by timestamping and identifying your pushed code:
  - Decide which git commit you want us to grade, and copy its id (you will paste it in the next sub-step). A commit id is a 40-character hexadecimal string. Usually the commit id that you want will be the one that you created last. The easiest way to obtain the commit id for the last commit is by running the command git log -1 --format=oneline. This prints both the commit id and the initial line of the commit message. If you want to submit a previous commit, there are multiple ways to get the commit id for an earlier commit. One way is to use the tool gitk. Another is git log -p, as explained here, or git show.
  - Now go to NYU Classes; there will be an entry for this lab. Paste only the commit id that you just copied.
  - You can submit as many times as you want; we will grade the last commit id submitted to NYU Classes.
NOTE: Ground truth is what and when you submitted to NYU Classes. Thus, a non-existent commit id in NYU Classes means that you have not submitted the lab, regardless of what you have pushed to GitHub. And, the time of your submission for the purposes of tracking lateness is the time when you upload the id to NYU Classes, not the time when you executed git commit.
This completes the lab.
Acknowledgments
The scaffolding and architecture of this lab are due to Aurojit Panda, with modifications by the CS202 staff.