Review session 5

Outline:
  1. Lab 4 overview [50 min]
      a. Introduction [5 min]
      b. Questions - Lab 4 [5 min]
      c. What are we trying to solve? [7 min]
      d. Surveying the code [33 min]
      e. Review what we've found [5 min]
  2. Questions [5 min]


1. Lab 4 overview

a) Introduction [5 min]

- This review session will be dedicated to going over Lab 4
- In order to do the labs effectively we need to create a mental model of what
  is going on.
    - Mapping out the code helps us to do this
    - (Reading all of the provided documentation also helps with this - in the
      case of the labs, reading the entire html file, not just the purple boxes)
- Today we're going to go over what lab 4 does and the code, section by
  section
- The goal of this is to understand how to map out code: figure out which tasks
  we need to accomplish and the relevant parts of the code
- We need to know the following:
    - At a high level: which portions of the code relate to each other, and
      which relate to what we want to do
    - How functions are logically grouped together
    - What parts of the code modify what
    - Where information is stored


b) Questions - Lab 4

- Some of them we'll hit while we go through the code, others will be saved
  until the end


c) What are we trying to solve? [7 min]

- What is the program? -> basic OS that runs tasks

- Two main sections of lab 4:
    - Scheduling
        - The OS already has one scheduling algorithm
        - The OS probably has most of the mechanics for scheduling (selecting a
          process, yielding, interrupts) - we need to find them
        - We need to find where scheduling policies are defined
    - Synchronization
        - From the spec: user-level operations can be preempted at any time,
          while the kernel is never preempted. 
        - To get synchronization working properly, we'll need to use either 1)
          user-space concurrency primitives, 2) atomic operations, or 3) code
          running in kernel space (since it can't be preempted)

- Portions of code we're concerned with:
    - Where is the scheduler? How is the policy decided? When are new processes
      chosen?
    - What data structures define a process? How do we get information about
      processes?
    - Synchronization - what concurrency primitives are available? How do we
      implement system calls to get code to run in kernel space?


d) Surveying the code [33 min]

[3 min]
- Multiple approaches to this: read all the header files, start with main and
  trace control flow...

- For most programs: the program starts at main()
    - From program instructions: kernel is initialized in kernel.c:start()
    - If the instructions weren't provided, how would we find the start? (docs,
      start at a random file and trace the control flow backwards)

- This program is small enough that we can feasibly read all of the code (this
  is not always true) - but we're still only interested in certain sections, so
  we don't want to have to understand every line of every file.

[10 min] -> 13
- kernel.c
    - proc_array: the container for process data structures
        - NPROCS: 5 (slot 0 is unused; slots 1-4 hold the user processes)
        - process_t: pid, registers, state (empty, runnable, blocked, zombie),
          exit status
    - *current: a pointer to the currently running process - we don't have to
      worry about setting this
    - scheduling_algorithm: this determines which scheduling algorithm is run
    - start()
        - First step is process setup... we probably don't have to worry
          about the init() functions. If we change the process data structure
          down the line, our init code probably goes here
        - 106: sets scheduling algorithm. Using grep, looks like this is the
          only place this is set 
        - Invokes first process: it doesn't look like run() is supposed to
          return.
        - run() contains assembly code that:
            - 1) reloads the relevant registers, 2) executes the iret
              (interrupt return) instruction
            - This is a kernel -> user level switch that the hardware handles
              when this specific instruction is executed
    - interrupt()
        - Handler code for each type of interrupt. Using tags: interrupts are
          defined in schedos.h
        - Yield: calls schedule()
        - Exit: sets process data, calls schedule()
        - User-specified calls: does nothing, calls run()
        - Clock: calls schedule()
        - How we would implement a new interrupt: set process data, then call
          run() or schedule() depending on whether the process should continue
          to run or give up control
    - schedule()
        - Decides a new pid, then calls run()
        - This function is only concerned with the scheduler's policy
        - The variable scheduling_algorithm determines which policy is invoked
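
    - Sketch (not the lab's actual code): to make the pieces above concrete,
      here is a minimal model of proc_array plus a round-robin policy. The
      struct fields, the P_* state names, init_procs() and
      pick_next_round_robin() are assumptions for illustration. The real
      schedule() calls run() on the chosen process instead of returning a
      pid, and these notes don't say which policy the existing algorithm uses.

```c
#include <assert.h>

#define NPROCS 5   /* from the notes: slot 0 unused, pids 1-4 are user processes */

/* Hypothetical state names; the notes list empty, runnable, blocked, zombie. */
typedef enum { P_EMPTY, P_RUNNABLE, P_BLOCKED, P_ZOMBIE } procstate_t;

/* Simplified stand-in for process_t (saved registers omitted). */
typedef struct {
    int pid;
    procstate_t state;
    int exit_status;
} process_t;

static process_t proc_array[NPROCS];

/* Hypothetical helper: leave slot 0 empty and mark pids 1..NPROCS-1 runnable. */
void init_procs(void)
{
    for (int i = 0; i < NPROCS; i++) {
        proc_array[i].pid = i;
        proc_array[i].state = (i == 0) ? P_EMPTY : P_RUNNABLE;
        proc_array[i].exit_status = 0;
    }
}

/* Round-robin policy (assumed): starting just after current_pid, return the
 * next runnable pid, wrapping around. Returns -1 if nothing is runnable. */
int pick_next_round_robin(int current_pid)
{
    assert(current_pid >= 0 && current_pid < NPROCS);
    for (int step = 1; step <= NPROCS; step++) {
        int pid = (current_pid + step) % NPROCS;
        if (proc_array[pid].state == P_RUNNABLE)
            return pid;
    }
    return -1;
}
```

    - Under this model, a new policy would be another branch on
      scheduling_algorithm that chooses the pid differently before handing
      off to run().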

[5 min] -> 18
- kernel.h
    - procstate, process: we already knew what these do, but now we know
      they're mostly handled by the kernel in kernel.c
    - We have good reason to believe that user processes don't touch these
      (based on how OS's usually work) but can also check:
        - via grep: looks like process_t is used in lower-level code and the
          kernel only. 
        - via grep: kernel.h is only included in x86.c, k-loader.c, kernel.c. 
        - grep flags: -r (recursive), -i (ignore case), -w (whole words
          only), -v (invert the match)
    - Other items: definitions that don't look relevant to our task
    - Function headers: interrupt/schedule (we know about these), functions
      having to do with registers, controllers, and the console
      - Looking through these: this is the interface to clear the console and
        read in text

[2 min] -> 20
- x86.h
    - Register struct
    - A whole bunch of assembly wrappers and flags - doubtful we'll need these,
      but this is how to find them

[3 min] -> 23
- process.h
    - System calls
    - yield and exit are defined here - it looks like they call an interrupt
      via assembly while setting a flag. (Do the INT_* items look familiar?)
    - No modifying of "process" data structures here: this part of the code
      only invokes interrupts
    - If we want to add system calls, we'll want to add them here. Without
      having to figure out additional assembly instructions, it looks like
      we get one status code and one argument (copying the interface of
      sys_exit)
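
    - Sketch of the system call path in plain C, assuming it follows the
      pattern described above: a user-side wrapper raises an interrupt
      carrying one argument, and interrupt() dispatches on the interrupt
      number. The trap is simulated with an ordinary function call so this
      compiles anywhere. The *_sim names, the interrupt numbers, and
      last_action are all hypothetical; the real wrappers use an assembly
      int instruction and the real numbers are in schedos.h.

```c
#include <assert.h>

/* Hypothetical interrupt numbers; the real INT_* values live in schedos.h. */
enum { INT_SIM_YIELD = 48, INT_SIM_EXIT = 49, INT_SIM_USER1 = 50 };

/* Records what the simulated kernel decided, so the sketch is observable. */
typedef enum { ACT_NONE, ACT_SCHEDULE, ACT_RUN } action_t;

static action_t last_action = ACT_NONE;
static int saved_exit_status = 0;   /* stands in for a field of process_t */

/* Stand-in for kernel.c:interrupt(): set process data, then either pick a
 * new process (schedule) or resume the current one (run). */
void interrupt_sim(int intno, int arg)
{
    switch (intno) {
    case INT_SIM_YIELD:              /* give up the CPU */
        last_action = ACT_SCHEDULE;
        break;
    case INT_SIM_EXIT:               /* record the status, then reschedule */
        saved_exit_status = arg;
        last_action = ACT_SCHEDULE;
        break;
    case INT_SIM_USER1:              /* user-specified call: resume caller */
        last_action = ACT_RUN;
        break;
    default:
        assert(0 && "unknown interrupt number");
    }
}

/* Stand-ins for the wrappers in process.h: each one only raises an interrupt. */
void sys_yield_sim(void)      { interrupt_sim(INT_SIM_YIELD, 0); }
void sys_exit_sim(int status) { interrupt_sim(INT_SIM_EXIT, status); }
void sys_user1_sim(void)      { interrupt_sim(INT_SIM_USER1, 0); }
```

    - Adding a call in this model means one new number, one new wrapper, and
      one new case in the dispatcher.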

[4 min] -> 27
- p-schedos-app-1.c
    --> Try to open the app code, looking up the name with :!ls. To make it
      easier to copy, enter :new and :!ls, then open p-schedos-app-1.c. :q
      closes the window
    - The character to print is defined here. The other apps define their
      own character and include the file. (When the C preprocessor comes across
      #include, it literally includes the file before passing it to the
      compiler.)
    - Apps "write" by manipulating cursorpos directly: the pointer value is set
      to a code. The screen is "advanced" by incrementing the pointer. (We saw
      function definitions for clearing the screen earlier). This is an unusual
      interface. It looks like a printf() exists (grep -> lib.h), but the apps
      don't use it.
    - The apps yield after every character, and yield forever when they are
      done.
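
    - Sketch of the "write through cursorpos" pattern, with an ordinary array
      standing in for the console memory. The 80x25 size, the PRINTCHAR
      name and value, console_sim, and write_char_sim() are assumptions for
      illustration. In x86 text mode each cell is 16 bits: the low byte is
      the character and the high byte is the color attribute.

```c
#include <assert.h>
#include <stdint.h>

#define CONSOLE_CELLS (80 * 25)   /* assumed 80x25 text console */

/* Stand-in for console memory; the real apps write to the hardware buffer. */
static uint16_t console_sim[CONSOLE_CELLS];

/* Points at the next cell to write. */
static uint16_t *cursorpos = console_sim;

/* Hypothetical per-app code: character '1' with color attribute 0x0C. */
#define PRINTCHAR ('1' | 0x0C00)

/* "Print" one character: store the code, then advance the pointer. */
void write_char_sim(void)
{
    assert(cursorpos < console_sim + CONSOLE_CELLS);   /* stay on screen */
    *cursorpos++ = PRINTCHAR;
}
```

    - Note that the store and the increment are separate steps, so a
      process preempted between them can race with another writer. That is
      the kind of problem the synchronization half of the lab is about.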

[6 min] -> 33
- Other files: elf.h, lib.h, x86sync.h
    - elf.h: low level code
    - lib.h: small version of the C standard library
        - (Looks like there's a printf and putc after all, even though the apps
          don't use them. Where are they used? Grep brings up many items:
          try piping it to grep -v asm and grep -v tags to filter out the ones
          we don't care about)
        - It looks like it's used in some way in the app assembly code, but we
          don't really care: the pattern of writing to the cursorpos array
          directly works.
    - x86sync.h:
        - atomic_swap, compare_and_swap, fetch_and_add. These could be useful
          for part 3.2 for implementing locking: atomic_swap is one way to
          implement a spinlock. (Thinking of mutexes: grep -r mutex . returns
          nothing - we'll have to implement our own)
        - "inline" header functions work by suggesting to the compiler that the
          function body should be spliced into the code. (this is not a
          guarantee)
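
    - Sketch of a spinlock built on an atomic swap. C11's atomic_exchange is
      used here as a stand-in for x86sync.h's atomic_swap (whose exact
      signature is assumed, not checked), so the example compiles outside the
      lab. spinlock_t, atomic_swap_sim() and the spin_* functions are
      hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Stand-in for atomic_swap: atomically store val and return the old value. */
static inline uint32_t atomic_swap_sim(_Atomic uint32_t *addr, uint32_t val)
{
    return atomic_exchange(addr, val);
}

typedef struct {
    _Atomic uint32_t locked;   /* 0 = free, 1 = held */
} spinlock_t;

/* Try once: we got the lock only if the old value was 0. */
int spin_trylock(spinlock_t *l)
{
    return atomic_swap_sim(&l->locked, 1) == 0;
}

/* Keep swapping in 1 until the value we swapped out was 0. */
void spin_lock(spinlock_t *l)
{
    while (atomic_swap_sim(&l->locked, 1) != 0)
        ;   /* busy-wait; a real app might yield here instead */
}

/* Release by swapping 0 back in; the lock must have been held. */
void spin_unlock(spinlock_t *l)
{
    uint32_t old = atomic_swap_sim(&l->locked, 0);
    assert(old == 1);
    (void)old;   /* silence unused warning when asserts are disabled */
}
```

    - Usage would be spin_lock(), then the critical section, then
      spin_unlock(). The swap is what makes "test and set" a single step
      that can't be split by preemption.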


e) Review what we've found [5 min]

- Consider the questions from part c:
    - Where is the scheduler? How is the policy decided? When are new processes
      chosen?
        --> In kernel.c; by the integer scheduling_algorithm, set in start();
          whenever schedule() is invoked
    - What data structures define a process? How do we get information about
      processes?
        --> process_t in kernel.h, lookups by pid via the global proc_array in
          kernel.c, info about the currently running process by *current
    - Synchronization - what concurrency primitives are available? How do we
      implement system calls to get code to run in kernel space?
        --> atomic calls in x86sync.h - no higher-level concurrency primitives,
          such as mutexes, are available
        --> system calls: interface in process.h, implementation in kernel.c.
          Interrupt numbers are defined in schedos.h. To create a system call:
          define it in the header, and add the correct behavior in
          kernel.c:interrupt()


2. Questions on Lab 4 or Homework 5

- Anything not answered or covered in enough detail earlier