CS202: Lab 3: Multi-threaded programming

CS202: Lab 3: Multi-threaded programming

Introduction

In this lab, you will implement a model of an online store. Your model will support safe parallel updates to a central inventory database, which maintains stock of all items available for sale from the online store. Updates to this inventory database will come from both suppliers (who add goods to the inventory and enable or disable deals on items, such as discounts and free shipping) and customers (who, by purchasing items from the online store, remove them from the inventory). In the model, customers and suppliers will be represented by different threads, all executing in parallel.

Though your store is only a model, the concepts you will learn and use throughout this lab could be applied similarly in a real-world setting. Suppliers and customers would be on different computers, talking to a central inventory database over the Internet. Requests sent over the Internet from suppliers and customers would be handled by different threads on the central inventory computer, and any operations on the database would have to be done with proper synchronization, just as in your model.

Coding style. In this and subsequent labs, you will be graded for style. We will deduct up to 20% of total lab points for poor style based on our subjective evaluation. You will have to pay attention to things like indentation and spacing (which need to be consistent), variable names (which need to be clear and consistent), repeated code (which needs to be avoided), spaghetti code (which needs to not happen), etc. Follow this style guide.

Concurrency commandments. You will need to read Coding Standards for Programming with Threads, by Mike Dahlin. You are required to follow these standards for this project. In the real world, unclear multi-threaded code is particularly dangerous – because it is difficult or impossible to test. Moreover, if it's not clear, then the programmer who comes after you cannot debug it, maintain it, or add new features.

Because it is impossible to determine the correctness of a multithreaded programming via testing, grading on this project will primarily be based on reading your code. If your code is difficult to understand, then you will lose points (potentially lots of them), even if the program seems to work. Feel free to sit down with the TAs or instructor during office hours for code inspections before you turn in your project (note: to take advantage of this, start on time! [which is probably earlier than you think]).

Getting Started

Obtain and update the lab files as follows. We assume that you have set up the upstream as described in the lab setup. Then run the following on your local machine (Mac users can do this on their local machine or within the Docker container; Windows and CIMS users should do this from outside the container):

$ cd ~/cs202
$ git fetch upstream
$ git merge upstream/main
.... # output omitted
$ ls
cs202-run-docker  docker  lab1  lab2 lab3 learn-ctags  README.md

This lab’s files are located in the lab3 subdirectory.

If you have any “conflicts” from lab 2, resolve them before continuing further. Run git push to save your work back to your personal repository.

Let’s begin by ensuring that you can build the code you just pulled. Enter the Docker environment:

$ ./cs202-run-docker
cs202-user@172b6e333e91:~/cs202-labs$ cd lab3/
cs202-user@172b6e333e91:~/cs202-labs/lab3$ ls
EStore.cpp            RequestGenerator.cpp  TaskQueue.cpp         estoresim.cpp
EStore.h              RequestGenerator.h    TaskQueue.h           sthread.cpp
Makefile              RequestHandlers.cpp   answers.txt           sthread.h
Request.h             RequestHandlers.h     check-lab.sh
cs202-user@172b6e333e91:~/cs202-labs/lab3$ make

The rest of these instructions presume that you are in the Docker environment. We omit the cs202-user@172b6e333e91:~/cs202-labs part of the prompt.

You should see output similar to the following:

g++ -pipe -MD -I. -Wall -g -c  estoresim.cpp -o build/estoresim.o
estoresim.cpp:116:1: warning: ‘void* customer(void*)’ defined but not used [-Wunused-function]
 customer(void* arg)
 ^~~~~~~~
estoresim.cpp:95:1: warning: ‘void* supplier(void*)’ defined but not used [-Wunused-function]
 supplier(void* arg)
 ^~~~~~~~
estoresim.cpp:74:1: warning: ‘void* customerGenerator(void*)’ defined but not used [-Wunused-functio]
 customerGenerator(void* arg)
 ^~~~~~~~~~~~~~~~~
estoresim.cpp:43:1: warning: ‘void* supplierGenerator(void*)’ defined but not used [-Wunused-functio]
 supplierGenerator(void* arg)
 ^~~~~~~~~~~~~~~~~
g++ -pipe -MD -I. -Wall -g -c  TaskQueue.cpp -o build/TaskQueue.o
g++ -pipe -MD -I. -Wall -g -c  EStore.cpp -o build/EStore.o
g++ -pipe -MD -I. -Wall -g -c  RequestGenerator.cpp -o build/RequestGenerator.o
g++ -pipe -MD -I. -Wall -g -c  RequestHandlers.cpp -o build/RequestHandlers.o
g++ -pipe -MD -I. -Wall -g -c  sthread.cpp -o build/sthread.o
g++ -pipe -o build/estoresim build/estoresim.o build/TaskQueue.o build/EStore.o build/RequestGenerator.o build/RequestHandlers.o build/sthread.o -lpthread -lrt

A Quick Introduction to C++

A very helpful way to think of shared state is as shared objects. We're therefore going to use C++, C's object-oriented descendant, for this project.

If you haven't used C++ before, don't worry. For the subset of C++ relevant to this project, the learning curve will be gentle (especially assuming you already know Java). Furthermore, we will provide template code to which you will add the details; this should largely insulate you from having to learn much C++ syntax.

Read A Quick Introduction to C++, and you should be good to go. Note that this document was written over a decade ago, so a few of the comments on the state of standards and tools are a bit out of date (for example, the document warns against using C++ templates because debuggers didn't understand them well back then; this warning is much less applicable today.) Nonetheless, it provides a good, quick overview of the key ideas to use (and some issues/pitfalls to avoid.)

C++ References. See our references page for some tutorials and futher references on C++.

Working with Threads

As noted earlier, you will need to read and follow Coding Standards for Programming with Threads, by Mike Dahlin. Also note that we covered monitors in class, and assigned this chapter from OSTEP.

To simplify your task, we supply a simple thread package (written by Mike Dahlin) on top of the standard POSIX thread library (which is known as pthreads). The idea is to shield you from irrelevant detail. This way, you use the standard package but you also focus on the project at hand. However, you are not required to use the wrapper; you may instead use pthreads if you so choose. The code for the simple thread package (which we will refer to hereafter as sthreads) we provide is in sthread.cpp and sthread.h.

The package provides threads (sthread_ts), mutex locks (smutex_ts), and condition variables (scond_ts) as well as some other utility functions that you may need. We suggest that you read over these functions to see how they are used. It may be helpful to write a couple of example programs using sthreads before starting this project. For more information, see the man pages for the pthreads library functions used in the sthread.cpp code.

You should keep the following in mind as you code these labs:

Part A: Navigating code: grep

Before we get into the meat of the lab, we will cover a useful tool: grep.

grep can be used to search for a text string or regular expression in files. It is commonly called with the following pattern:

$ grep [flags] [expression-to-search-for] [files-to-search]

To see a concrete example, run the following command:

$ grep "void" sthread.h

grep scans the file sthread.h and prints out each line that contains the term void. You should get the following output:

void smutex_init(smutex_t *mutex);
void smutex_destroy(smutex_t *mutex);
void smutex_lock(smutex_t *mutex);
void smutex_unlock(smutex_t *mutex);
...

Multiple files can be passed into the command:

$ grep "void" sthread.h sthread.cpp
sthread.h:void smutex_init(smutex_t *mutex);
sthread.h:void smutex_destroy(smutex_t *mutex);
sthread.h:void smutex_lock(smutex_t *mutex);
sthread.h:void smutex_unlock(smutex_t *mutex);
sthread.h:void scond_init(scond_t *cond);
sthread.h:void scond_destroy(scond_t *cond);
sthread.h:void scond_signal(scond_t *cond, smutex_t *mutex);
sthread.h:void scond_broadcast(scond_t *cond, smutex_t *mutex);
sthread.h:void scond_wait(scond_t *cond, smutex_t *mutex);
sthread.h:void sthread_create(sthread_t *thrd,
sthread.h:                  void *(start_routine(void*)), 
sthread.h:                  void *argToStartRoutine);
sthread.h:void sthread_exit(void);
sthread.h:void sthread_join(sthread_t thrd);
sthread.h:void sthread_sleep(unsigned int seconds, unsigned int nanoseconds);
sthread.h:long sutil_random(void);
sthread.cpp:void smutex_init(smutex_t *mutex)
sthread.cpp:void smutex_destroy(smutex_t *mutex)
sthread.cpp:void smutex_lock(smutex_t *mutex)
sthread.cpp:void smutex_unlock(smutex_t *mutex)
sthread.cpp:void scond_init(scond_t *cond)
sthread.cpp:void scond_destroy(scond_t *cond)
sthread.cpp:void scond_signal(scond_t *cond, smutex_t *mutex __attribute__((unused)))
sthread.cpp:void scond_broadcast(scond_t *cond, smutex_t *mutex __attribute__((unused)))
sthread.cpp:void scond_wait(scond_t *cond, smutex_t *mutex)
sthread.cpp:void sthread_create(sthread_t *thread,
sthread.cpp:                void (*start_routine(void*)), 
sthread.cpp:                void *argToStartRoutine)
sthread.cpp:void sthread_exit(void)
sthread.cpp:void sthread_join(sthread_t thrd)
sthread.cpp:void sthread_sleep(unsigned int seconds, unsigned int nanoseconds)

$ grep "void" sthread.*
# (Same output!)

You can also search all files in a directory recursively with the -r flag:

$ grep -r "void" .

(This command searches all files in the current directory; it should return hundreds of lines.)

Exercise G1. Run the following command:

$ grep -r "smutex" .

Answer the following questions

  • Informally, what does the command above do?
  • Paste the first few lines of the output into answers.txt

Exercise G2. Answer the following questions in answers.txt:

  • What command can you use to search the file RequestGenerator.cpp for Task objects?
  • What command can you use to search all header files for the keyword return?

Grep can also match regular expressions. We will give a simple demo here. To learn more about grep and regular expressions, read the man pages!!

Exercise G3. Run the following command:

$ grep -r "smutex.*(.*);" .

What types of lines are returned by the program? What information does this command give you about your code? Place your answer in answers.txt.

Part B: Task Queue and Coarse-Grained Store Synchronization

For our model, we will simulate a fixed number of customers and suppliers. For each customer and each supplier, there will be a unique thread representing that customer or supplier. We will often refer to these threads throughout this document as worker threads. These worker threads will get jobs to work on from a task queue, which you will implement. There will be one queue for all customer threads and one queue for all supplier threads. This task queue must allow for multiple worker threads to simultaneously attempt to add or remove jobs while still maintaining the integrity of the queue's internal data structures by using locks (i.e., your task queue must be thread-safe).

It is a common pattern in multi-threaded programming to have a single (or small number of) task queue(s) for a large number of threads. A thread-safe task queue makes the job of allocating work to threads (and having threads allocate work to other threads, should the case arise) easy to do and (relatively) easy to reason about.

Exercise 1. Look through the queue interface in TaskQueue.h and the documentation for its methods in TaskQueue.cpp. Fill out the rest of the members in class TaskQueue in TaskQueue.h to finish the definition of your task queue class, then implement the rest of the queue methods in TaskQueue.cpp. You are welcome to use any standard C++ container to help build the functionality of your queue (such as a std::queue or std::deque), or you can create your own data structures (like a linked list) and add any needed helper structures to do the same.

A task in the task queue is represented by a struct Task (also found in TaskQueue.h), which consists of a pointer to a function and an argument to be passed to that function. When a worker thread removes a task from the task queue, it should call the function given in the struct Task with the argument given in the Task.

Run make to make sure your queue code compiles. For now, there's no executable to run.

A task queue is little good to worker threads if there is no work available to be put on it. We have provided some code to generate work requests to be put on your task queues in RequestGenerator.h and RequestGenerator.cpp. Become familiar with this code. Specifically, there are two subclasses of the main class RequestGenerator that you should know the purpose of: class CustomerRequestGenerator and class SupplierRequestGenerator. The former CustomerRequestGenerator, as its name would suggest, generates requests for customer threads to perform (such as buying items). The latter SupplierRequestGenerator generates requests for supplier threads to perform (such as adding or removing items from inventory, or putting items on sale). You will use these generator classes to implement task generator threads, which will produce work for customer and supplier threads.

Before you start implementing any thread functions, however, you should actually have some threads running. In the main source file for our simulator, estoresim.cpp, is a function called startSimulation. This function kicks off all threads in the simulator (generators, customers, and suppliers) and then, after starting all threads, waits for them to finish. When all threads have finished working, the simulation is complete.

Exercise 2. Read the documentation for, and then implement, the startSimulation function in estoresim.cpp. Use sthread_create to create threads. The worker thread functions supplierGenerator, supplier, customerGenerator and customer all reside in estoresim.cpp. Use the provided class Simulation to keep track of task queues and the number of customers and suppliers. When you have finished writing your code, run make run-sim to run the simulator. You may want to have the threads produce some output to make sure that you are starting them correctly.

Don't worry about the meaning of the fineMode variable in Simulation for now, just make sure it is set to the value passed in the parameter useFineMode.

You may also find it useful in this exercise, and in others throughout this lab, to call printf in some choice places (where worker threads start, for instance). The code we provide you doesn't print anything to the terminal, and so when you run the simulator with make run-sim, you won't see any output unless you put the printfs in yourself.

At the moment, all of the threads you have created start and then immediately stop, as none of their associated functions are implemented (they just return when called). However, since you have actual threads running, you can proceed to implement the generator, customer, and supplier thread functions. If you put printfs in these functions as suggested, you should see all the threads print to the terminal.

Exercise 3. Implement the customer, supplier, supplierGenerator, and customerGenerator functions in estoresim.cpp. Read the code referred to above (e.g. RequestGenerator.cpp) and the comments for these functions to get an idea of what they should be doing. The generator functions should look similar to each other, as should the supplier and customer functions to each other. Produce some output from these threads to make sure they're running and that jobs are being pushed between them properly.

For the supplierGenerator and customerGenerator functions, you will also have to implement the enqueueStops method in RequestGenerator.cpp.

Run make run-sim to run the simulator. You should see the output of your worker threads. You can stop the simulator by pressing Control-C.

Now you have many customers and suppliers working on jobs generated by the generator threads. But there is no inventory for items in the store, nor are any of the work functions produced by the request generators implemented. So, for the moment, our simulator spawns many workers in parallel to do very little. For the workers to have anything to work on, there should be an inventory of items to purchase from and add to. To this end, we have provided the skeleton of this inventory for you as class EStore, in the files EStore.h and EStore.cpp.

Exercise 4. Design and implement the EStore class, filling in all the method skeletons we provide and adding any new methods you may find necessary. Read through all the provided comments in the files related to the EStore class (but don't worry about the buyManyItems method for now). The EStore will have to keep track of many Items and ensure that all modifications to Items in the EStore are synchronized.

You should implement the EStore as a monitor; that is, there should be a single lock on the entire store, which is acquired upon entering any of the store's methods and released upon exit.

As a reminder, the buyItem method in EStore should wait for an item to become available or on discount if necessary. So, if a customer comes in looking to buy an item outside of the given budget, the customer should wait until the item is being sold at a sufficiently high discount and then buy, instead of just immediately returning.

Run make run-sim to make sure your code compiles and doesn't have any segfault-inducing bugs in the constructor for class EStore. For the moment, no threads are actually interacting with the EStore, so if you put any printfs in its methods, you won't see them output anything just yet.

There is only one piece remaining to make the simulator work now: the handlers for the jobs created by the request generators and pushed to the worker threads. These jobs consist of adding and removing items from the EStore inventory or setting discounts on items (for suppliers) and of purchasing items (for customers). We've provided skeletons for these handlers in RequestHandlers.cpp. Before you implement these handlers, you may find it helpful to read through all the different kinds of requests that exist in Request.h.

Exercise 5. Go through and implement each handler in RequestHandlers.cpp one-by-one. Print a message at the beginning of each handler which contains the name of the handler and the fields of the request struct passed to it (you don't need to print the value of the store pointer). This will produce a trace of the work done by your threads as they process jobs.

Your handler functions should largely make use of the methods you implemented previously in EStore.

After implementing all the handlers (including stop_handler), your simulator may not terminate on its own. This will happen in the case where a customer is blocked on buying an item, but the item never gets discounted enough for the customer to buy, which will happen sometimes and should be expected. In this case, you will still need to kill the simulator with Control-C.

Run make run-sim after you implement each handler and make sure that requests are being dispatched to your new handlers.

Here's some example output from the staff solution to part B of the lab to give you an idea of what running your simulator should look like. You don't have to worry about matching our output format exactly.

Handling AddItemReq: item_id - 77, quantity - 94, price - $6370.15, discount - 0.00
Handling BuyItemReq: item_id - 83, budget - $14308.86
Handling BuyItemReq: item_id - 35, budget - $33853.86
Handling AddItemReq: item_id - 92, quantity - 22, price - $5167.49, discount - 0.00
Handling BuyItemReq: item_id - 62, budget - $9900.27
Handling AddItemReq: item_id - 90, quantity - 64, price - $5201.59, discount - 0.00
Handling BuyItemReq: item_id - 26, budget - $6805.40
Handling AddItemReq: item_id - 26, quantity - 37, price - $892.72, discount - 0.00
...
Handling ChangeItemPriceReq: item_id - 96, new_price - $7007.23
Handling BuyItemReq: item_id - 46, budget - $19934.51
Handling ShippingCostReq: new shipping cost - $29.21
Handling BuyItemReq: item_id - 79, budget - $30974.88
Handling AddItemReq: item_id - 64, quantity - 42, price - $8883.28, discount - 0.00
...
Handling StopHandlerReq: Quitting.
Handling StopHandlerReq: Quitting.
Handling StopHandlerReq: Quitting.

You should now have a working store simulator. Run make run-sim a few times and watch the output of your workers as they process jobs. If you notice anything suspicious in the output, go back and check to make sure you did your synchronization correctly and that you are following the multi-threaded coding guidelines.

Checkpoint

At this point, commit and push your code, if you have not done so already.

$ git status  # see what files you changed
$ git diff    # examine your changes
$ make clean  # remove generated code
$ git commit -am "My partial solutions to lab3"
$ git push

Part C: Fine-Grained Store Synchronization

In part B, you implemented a store simulator using coarse-grained synchronization: there was one lock on the entire EStore that kept concurrent accesses from stepping on each other, which effectively serialized execution of the program (as no two threads could modify the inventory in EStore at the same time). However, you might have noticed that you don't necessarily need to lock the entire inventory if two threads are trying to modify two different items in the inventory. Thus, in part C, we will explore an approach that exposes more parallelism, by using fine-grained locking. Instead of using one large lock on the entire EStore, you will use locks over smaller units of code. In this way, multiple threads that aren't touching the same data can operate on the inventory concurrently.

Keep in mind throughout the next exercises that you should be able to run the simulator using your old coarse-grained locking approach by running make run-sim and using your yet-to-be-implemented fine-grained locking approach by running make run-sim-fine. The latter command, as might be expected, runs the simulator in "fine-grained mode" (this is related to the mysterious fineMode variable you saw before, which is set to false with make run-sim, but set to true with make run-sim-fine). You may want to abstract some of your locking with methods whose function varies depending on the value of fineMode. To reiterate, your simulator must use your coarse-grained locking approach from part B when we run make run-sim and use your new fine-grained locking approach to be done in part C when we run make run-sim-fine!

Exercise 6. Change your EStore implementation to allow multiple threads to access different items at the same time. If two threads try to access the same item at the same time, then only one should be allowed to modify the item at a time.

Don't worry about modifying the implementation of buyItem. In fine-grained mode, buyItem is not called. You will implement the fine-grained mode version of buyItem, called buyManyItems, in the next exercise.

Run make run-sim-fine to make sure your code still runs. Since you are now running in fine-grained mode, BuyItem requests are not being generated; instead, BuyManyItems requests are generated. This function is still unimplemented, so only suppliers will be making any modifications to the state of the EStore at the moment.

To flex the new fine-grained locking features of the simulator, you must implement the handler and EStore functionality for the BuyManyItems request. This request requires that many items be looked at (with synchronization!) and, if the right conditions hold, modified (i.e. purchased).

Exercise 7. Implement the EStore method buyManyItems and its corresponding handler in RequestHandlers.cpp. Make sure that, as with previous request handlers, you print out the name of the request and the fields of the request struct passed to the handler at the beginning of the buyManyItems handler.

Run make run-sim-fine. You should now see buyManyItems requests being handled by threads. You should also make sure that your old code still functions correctly with make run-sim.

Here's some sample output from the staff solution to part C. As before, your output doesn't have to look exactly like ours, but this should give you an idea of what we'd like to see when we run your simulator.

Handling AddItemReq: item_id - 62, quantity - 91, price - $4901.27, discount - 0.00
Handling BuyManyItemsReq: item_ids - 15 35 49 77 86 92 93 
Handling BuyManyItemsReq: item_ids - 26 40 63 
Handling AddItemReq: item_id - 36, quantity - 69, price - $53.11, discount - 0.00
...
Handling BuyManyItemsReq: item_ids - 21 29 35 37 58 74 93 95 
Handling AddItemReq: item_id - 43, quantity - 30, price - $3360.28, discount - 0.00
Handling BuyManyItemsReq: item_ids - 4 11 43 63 76 
Handling BuyManyItemsReq: item_ids - 18 
...
Handling StopHandlerReq: Quitting.
Handling StopHandlerReq: Quitting.
Handling StopHandlerReq: Quitting.

Extra Credit

Do these problems on a new branch called lab3-extended (i.e., your commits for these exercises should go on this new branch). The main branch should include your code only up until Exercise 7; we will grade under this assumption. To create the new branch and work inside that branch:

$ git checkout -b lab3-extended

To go back to the original branch (which you should do before submitting):

$ git checkout main

When submitting, you must push both branches:

$ git status
  # Make sure you are on main to avoid future pain!

$ git push origin main
$ git push origin lab3-extended

Exercise 8. Implement a version of the EStore::buyManyItems method that will wait until the order can be fulfilled instead of giving up. The implementation should not wake up threads unnecessarily. For instance, if an item decreases in price, only threads that are waiting to buy an order that includes that item should be signaled (though all such threads should be signaled).

Exercise 9. Ensure that the shipping cost and store discount do not change while processing an order in EStore::buyManyItems.

Submission

Handing in consists of three steps:

  1. Executing this checklist:

    • Make sure your code builds, with no compiler warnings.
    • Make sure you’ve used git add to add any files that you’ve created.
    • Fill out the top of the answers.txt file, including your name and NYU Id
    • Make sure you’ve answered every question in answers.txt
    • Make sure you have answered all code exercises in the files, and that your code is consistent with the expectations described in the introduction.
    • Make sure that, if you did the extra credit, you are back on the main branch (use git branch or git status to make sure).
    • Create a file called slack.txt noting how many slack days you have used for this assignment. (This is to help us agree on the number that you have used.) Include this file even if you didn’t use any slack days.
  2. Push your code to GitHub, so we have it (from outside the container or, if on Mac, this will also work from within the container):

    $ cd ~/cs202/lab3   
    $ git commit -am "hand in lab3"
    $ git push origin 
    
    Counting objects: ...
    ....
    To  git@github.com:nyu-cs202/labs-23sp-<YourGithubUsername>.git
      7337116..ceed758  main -> main

    (See the extra credit section for additional submission instructions.)

  3. Actually submit, by timestamping and identifying your pushed code:

    • Decide which git commit you want us to grade, and copy its id (you will paste it in the next sub-step). A commit id is a 40-character hexadecimal string. Usually the commit id that you want will be the one that you created last. The easiest way to obtain the commit id for the last commit is by running the command git log -1 --format=oneline. This prints both the commit id and the initial line of the commit message. If you want to submit a previous commit, there are multiple ways to get the commit id for an earlier commit. One way is to use the tool gitk. Another is git log -p, as explained here, or git show.
    • Now go to NYU Brightspace; there will be an entry for this lab. Paste only the commit id that you just copied.
    • You can submit as many times as you want; we will grade the last commit id submitted to Brightspace.

NOTE: Ground truth is what and when you submitted to Brightspace. Thus, a non-existent commit id in Brightspace means that you have not submitted the lab, regardless of what you have pushed to GitHub. And, the time of your submission for the purposes of tracking lateness is the time when you upload the id to Brightspace, not the time when you executed git commit.

This completes the lab.

Acknowledgements

Parts of the documentation for this lab, as well as the sthreads library, were taken from a lab about a network scheduler by Mike Dahlin. The lab itself was written by Victor Vu and Isami Romanowski, with contributions from Sebastian Angel, Parth Upadhyay, and Navid Yaghmazadeh.