Class 5
CS372H
31 January 2012

On the board
------------

1. Last time

2. Page faults
    --uses
    --costs

3. Page replacement

4. Processes

5. Process control: the shell

---------------------------------------------------------------------------

1. Last time

    --reinforced virtual memory, TLB, page faults

    --today: finish page faults

2. Page faults

    Last time: we discussed the mechanics at a high level (page fault
    handler runs)

    A. Uses

        Last time:
            --paging (virtual memory larger than RAM)
            --distributed shared memory
            --copy-on-write

        Paging in day-to-day use:
            --Demand paging
            --Growing the stack
            --BSS page allocation
            --Shared text
            --Shared libraries
            --Shared memory
            --Copy-on-write (fork, mmap, etc.)

    B. Page faults: costs

        --What does demand paging (i.e., paging from the disk) cost?

        --let's look at average memory access time (AMAT)

        --AMAT = (1-p)*memory access time + p * page fault time, where
          p is the prob. of a page fault.

            memory access time (t_M) ~ 100ns
            disk access time (t_D) ~ 10 ms = 10^7 ns

        --QUESTION: what does p need to be to ensure that paging hurts
          performance by less than 10%?

            1.1*t_M = (1-p)*t_M + p*t_D
            p = .1*t_M / (t_D - t_M) ~ 10^1 / 10^7 = 10^{-6}

            so only one access out of 1,000,000 can be a page fault!!

        --basically, page faults are super-expensive (good thing the
          machine can do other things during a page fault)

        --Thrashing is even worse

            Memory overcommitted -- pages tossed out while still needed

            Example:
                --one program touches 50 pages (each equally likely);
                  only have 40 physical page frames
                --If have enough pages, 100ns/ref
                --If have too few pages, assume every 5th reference
                  leads to a page fault
                --4 refs x 100ns and 1 page fault x 10ms for disk I/O
                --this gets us 5 refs per (10ms + 400ns) ~ 2ms/ref =
                  20,000x slowdown!!!
            --What we wanted: virtual memory the size of disk with
              access time the speed of physical memory

            --What we have here: memory with access time roughly that
              of disk (2 ms/mem_ref compared to 10 ms/disk_access)

        Concept is much larger than OSes: need to pay attention to the
        slow case if it's really slow and common enough to matter.

3. Page replacement

    A. What's the high-level point?

        --when paging to the disk, memory is a cache of the disk.

        --We'll mostly ignore the problem of page replacement, except
          for the next ten minutes. (JOS doesn't page. If it runs out
          of physical memory, it returns an error. Plus, memory is a
          lot cheaper than it used to be, so I'd wager that most of you
          don't run into physical memory limitations very often.)

        --The high-level idea is that when there's a page fault
          (because a page that 'looked' to the process like it was in
          RAM was actually on the disk), the OS has to decide which
          page to evict to make room.

        --Lots of algorithms for this. We mostly won't discuss them.

        --Most of them use some bits (for accounting inside the page
          structures)

        --The two big ones are: the Use bit and the Modified bit.

    B. Some implementation points

        Note that many machines, x86 included, maintain 4 bits per page
        table entry:

        --*use*: Set when page referenced; cleared by an algorithm like
          CLOCK (the bit is called "Accessed" on x86)

        --*modified*: Set when page modified; cleared when page written
          to disk (the bit is called "Dirty" on x86)

        --*valid*: Program can reference this page without getting a
          page fault. Set if page is in memory?

            [no. it is "only if", not "if". *valid*=1 implies page in
            physical memory. but page in physical memory does not imply
            *valid*=1; in other words, *valid*=0 does not imply page is
            not in physical memory.]

        --*read-only*: program can read page, but not modify it. Set if
          page is truly read-only?

            [no. similar case to above, but slightly confusing because
            the bit is called "writable".
            if a page's bits are such that it appears to be read-only,
            it may or may not be because it is truly "read only". but
            if a page is truly read-only, it had better have its bits
            set to be read-only.]

        Do we actually need Modified and Use bits in the page tables
        set by the hardware?

            --[again, x86 calls these the Dirty and Accessed bits]

            --answer: no.

            --how could we simulate them?

            --for the Modified [x86: Dirty] bit, just mark all pages
              read-only. Then if a write happens, the OS gets a page
              fault and can set the bit itself. Then the OS should mark
              the page writable so that this page fault doesn't happen
              again

            --for the Use [x86: Accessed] bit, just mark all pages as
              not present (even if they are present). Then if a
              reference happens, the OS gets a page fault, and can set
              the bit, after which point the OS should mark the page
              present (i.e., set the PRESENT bit).

    C. Is caching always a win?

        No. Here are some cases when it may not buy anything:

            --process doesn't reuse memory

            --process reuses memory but it doesn't fit.

            --individually, all processes fit, but together they are
              too much for the system

        what do we do?

            --well, in the first two cases, there's nothing you can do,
              other than restructuring your computation or buying
              memory (e.g., expensive hardware that keeps entire
              customer database in RAM)

            --in the third case, can and must shed load. how? two
              approaches:

                a. working set
                b. page fault frequency

            a. working set

                --only run a set of processes s.t. the union of their
                  working sets fits in memory

                --book defines working set. short version: the pages a
                  process has touched over some trailing window of time

            b. page fault frequency

                --track the metric (# page faults/instructions
                  executed)

                --if that thing rises above a threshold, and there is
                  not enough memory on the system, swap out the process

---------------------------------------------------------------------------

admin

    --project partners due Friday. we expect two emails from each team.
      the redundancy helps eliminate errors.
    --MikeD concurrency lecture on Thursday

    --Thursday is a discussion day. We'll call on people. Don't show up
      if you haven't done the reading.

---------------------------------------------------------------------------

4. Processes

    write on the board
    ------------------
    * what is a process?
    * how do they interact with the operating system?
    * how do they come into being?

    A. What is a process?

        --abstraction of a virtual machine (virtual memory, virtual
          CPU, etc.). instance of a running program.

            [draw picture]

        --here's an implementation:

            PCB
            -----------------
            | process id    |
            | state         |  (running, ready, blocked, etc.)
            | user id       |
            | IP            |
            | open files    |
            | VM structures |
            | registers     |
            | .....         |  (signal mask, terminal, priority, ...)
            -----------------

            called "proc" in Unix, "task_struct" in Linux, and
            "struct Env" in JOS

        --each one has its own cr3 and hence its own view of virtual
          memory, which contains:
            --program code (aka "text")
            --constants
            --zeroed-out area for variables
            --stack
            --heap

        --its own registers

        --state of OS resources

        --very little else is actually needed, but a modern process
          does have a lot of associated information:
            --signal state
            --UID, signal mask, controlling terminal, priority, whether
              being debugged, etc., etc.

        --typically has less privilege than operating system
            --OS can manipulate the hardware. processes cannot.
            --OS (obviously) can manipulate OS abstractions. processes
              cannot.
            --the hardware knows the difference between privileged and
              unprivileged mode (on the x86, these are called ring 0
              and ring 3. The middle rings aren't used in the classical
              setup, but they are used in some approaches to
              virtualization.)

    B. How do processes interact with the operating system?

        --syscalls: the interface to the operating system.

        --lots of these

        --on Unix, type "man 2 " to get documentation.

        --here are three relevant ones on Unix:

            int fd = open(const char* path, int flags)
            write(fd, const void *, size_t)
            read(fd, void *, size_t)

        --fd is a *file descriptor*.
          this is an abstraction, provided by the operating system,
          that represents an open file

        --every process can usually expect to begin life with three
          file descriptors already open:

            0: represents the input to the process (e.g., tied to
               terminal)
            1: represents the output
            2: represents the error output

          these are sometimes known as stdin, stdout, stderr

        --we mentioned in class 1 that Unix hides from processes the
          difference between a device and a file. This is an example.

        --we'll see in ten minutes or so how powerful this is.

        --here are some other system calls (these are included in the
          notes so that you know what the basic interface to a
          Unix-like OS looks like):

            --int open(char*, int flags [, int mode]);
            --int read(int fd, void*, int nbytes);
            --int write(int fd, void* buf, int nbytes);
            --off_t lseek(int fd, off_t pos, int whence);
            --int close(int fd);
            --int kill(int pid, int signal);
            --void exit(int status);
            --int fork(void);
            --int waitpid(int pid, int* stat, int opt);
            --int execve(char* prog, char** argv, char** envp);
            --int dup2(int oldfd, int newfd);
            --int pipe(int fds[2]);

    C. How does a process come into being?

        --answer: another system call!

        --in Unix, it is fork()

        --on JOS, it is exo_fork()

        --fork creates an exact copy (almost; the return value is
          different).

        --thus, what happens if a system had two important users, and
          one of them runs a process that executes this code:

            for (i = 0; i < 10; i++) {
                fork();
            }
            while (1) {}

            [each pass through the loop doubles the number of
            processes, so if every fork succeeds, 2^10 = 1024 processes
            end up spinning in the while loop]

            [answer: one of the users gets a LOT more of the CPU than
            the other]

        --what behavior do you want?

            [this actually corresponds to research. OSes are only just
            applying resource containers.]

5. Process control: the shell

    A. How does the shell start programs?

        --example:

            $ ls

        [see panel 1 on handout; go line-by-line]

        --calls fork(), which creates a copy of the shell. now there
          are two copies of the shell running

        --then calls exec(), which loads the new program's instructions
          into memory and begins executing them.
        --(exec invokes the loader, which we'll talk about.)

            while (1) {
                write(1, "$ ", 2);
                readcommand(command, args);   // parse input
                if ((pid = fork()) == 0)      // child?
                    exec(command, args, 0);
                else if (pid > 0)             // parent?
                    wait(0);                  // wait for child
                else
                    perror("failed to fork");
            }

        --how can shell wait for the end of a process?
            --with wait() or waitpid() system calls

        --WAIT, WHY ARE FORK() AND EXEC() SEPARATE?

            * We will come back to this.

    B. Redirection

        * What is redirection?

            $ ls > tmp1

        * How is it implemented?

            key lines: just before exec, shell does:

                close(1);
                open("tmp1", O_TRUNC | O_CREAT | O_WRONLY, 0666);

            which automatically assigns tmp1 to be fd 1, because open()
            returns the lowest-numbered file descriptor that is not in
            use, and close(1) has just freed fd 1

        * What about

            $ sh < script > tmp1

          where script contains

            echo abc
            echo def

          [draw picture]