Released Tuesday, September 15, 2009
Part A due Friday, October 9, 2009, 7:00 PM
Part B due Friday, October 16, 2009, 7:00 PM
In this lab you will implement the basic kernel facilities required to get a protected user-mode environment (i.e., "process") running. You will enhance the JOS kernel to set up the data structures to keep track of user environments, create a single user environment, load a program image into it, and start it running. You will also make the JOS kernel capable of handling any system calls the user environment makes and handling any other exceptions it causes.
Note: In this lab, the terms environment and process are interchangeable; they have roughly the same meaning. We introduce the term "environment" instead of the traditional term "process" in order to stress the point that JOS environments do not provide the same semantics as UNIX processes, even though they are roughly comparable.
Use Git to commit your Lab 2 source, fetch the latest version of the course repository, and then create a local branch called lab3 based on our lab3 branch, origin/lab3:
mig% cd ~/CS395T/lab
mig% git commit -am 'my solution to lab2'
Created commit 734fab7: my solution to lab2
 4 files changed, 42 insertions(+), 9 deletions(-)
mig% git pull
Already up-to-date.
mig% git checkout -b lab3 origin/lab3
Branch lab3 set up to track remote branch refs/remotes/origin/lab3.
Switched to a new branch "lab3"
mig% git merge lab2
Merge made by recursive.
 kern/pmap.c |   42 +++++++++++++++++++
 1 files changed, 42 insertions(+), 0 deletions(-)
mig%
Lab 3 contains a number of new source files, which you should browse:
| Directory | File | Description |
|---|---|---|
| inc/ | env.h | Public definitions for user-mode environments |
| | trap.h | Public definitions for trap handling |
| | syscall.h | Public definitions for system calls from user environments to the kernel |
| | lib.h | Public definitions for the user-mode support library |
| kern/ | env.h | Kernel-private definitions for user-mode environments |
| | env.c | Kernel code implementing user-mode environments |
| | trap.h | Kernel-private trap handling definitions |
| | trap.c | Trap handling code |
| | trapentry.S | Assembly-language trap handler entry-points |
| | syscall.h | Kernel-private definitions for system call handling |
| | syscall.c | System call implementation code |
| lib/ | Makefrag | Makefile fragment to build user-mode library, obj/lib/libuser.a |
| | entry.S | Assembly-language entry-point for user environments |
| | libmain.c | User-mode library setup code called from entry.S |
| | syscall.c | User-mode system call stub functions |
| | console.c | User-mode implementations of putchar and getchar, providing console I/O |
| | exit.c | User-mode implementation of exit |
| | panic.c | User-mode implementation of panic |
| user/ | * | Various test programs to check kernel lab 3 code |
In addition, a number of the source files we handed out for lab2 are modified in lab3. To see the differences, you can type:
$ git diff lab2 | more
As in lab 2, you will need to do all of the regular exercises described in the lab and at least one challenge problem. Additionally, you will need to write up brief answers to the questions posed in the lab and a short (e.g., one or two paragraph) description of what you did to solve your chosen challenge problem. If you implement more than one challenge problem, you only need to describe one of them in the write-up, though of course you are welcome to do more. Place the write-up in a file called answers.txt (plain text) or answers.html (HTML format) in the top level of your lab3 directory before handing in your work.
The Bochs debugger command info idt prints the current interrupt descriptor table (IDT). This is useful for checking whether you set it up correctly.
The vb command sets a breakpoint at a particular CS:EIP address. Since the kernel code segment selector is 8, vb 8:0xf0101234 sets a breakpoint at the given kernel address. Similarly, since the user code segment selector is 0x1b, vb 0x1b:0x80020 sets a breakpoint at the given user address.
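Both selectors above follow the standard x86 selector format. Bits 0-1 hold the requested privilege level (RPL), bit 2 selects the GDT or LDT, and bits 3-15 index the descriptor table. The minimal sketch below decodes and builds selectors. The helper names are hypothetical and are not part of JOS or Bochs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers (not part of JOS): split an x86 segment selector
 * into its three fields. */
static unsigned sel_rpl(uint16_t sel)   { return sel & 0x3; }        /* bits 0-1: privilege level */
static unsigned sel_ti(uint16_t sel)    { return (sel >> 2) & 0x1; } /* bit 2: 0 = GDT, 1 = LDT */
static unsigned sel_index(uint16_t sel) { return sel >> 3; }         /* bits 3-15: table index */

/* Build a selector from its parts. */
static uint16_t make_sel(unsigned idx, unsigned ti, unsigned rpl)
{
	assert(rpl <= 3 && ti <= 1);
	return (uint16_t)((idx << 3) | (ti << 2) | rpl);
}
```

Decoded this way, selector 8 is GDT index 1 at privilege level 0, and 0x1b is GDT index 3 at privilege level 3. That is why user-mode breakpoints use the 0x1b prefix.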
Passing all the make grade tests does not mean your code is perfect. It may have subtle bugs that will only be tickled by future labs.
All your kernel code is running in the same address space with no protection. If you get weird crashes that don't seem to be explainable by a bug in the crashing code, it's likely that they're due to a bug somewhere else that is modifying memory used by the crashing code.
As before, you can test your code against our test scripts by running make grade.
When you are done, run make handin to tar up and hand in your source tree. Again, if you are submitting late, please email lab3-handin.tar.gz to the TA.
As you can see in kern/env.c, the kernel maintains three main global variables pertaining to environments:
struct Env *envs = NULL;                /* All environments */
struct Env *curenv = NULL;              /* The current env */
static struct Env_list env_free_list;   /* Free list */
Once JOS gets up and running, the envs pointer points to an array of Env structures representing all the environments in the system. In our design, the JOS kernel will support a maximum of NENV simultaneously active environments, although there will typically be far fewer running environments at any given time. (NENV is a constant #define'd in inc/env.h.) Once it is allocated, the envs array will contain a single instance of the Env data structure for each of the NENV possible environments.
The JOS kernel keeps all of the inactive Env structures on the env_free_list. This design allows easy allocation and deallocation of environments, as they merely have to be added to or removed from the free list.
The kernel uses the curenv variable to keep track of the currently executing environment at any given time. During boot up, before the first environment is run, curenv is initially set to NULL.
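JOS builds env_free_list with the LIST_* macros from inc/queue.h. The sketch below shows the same idea with a hand-written singly linked list threaded through a fixed array. Allocation and deallocation are then constant-time operations on the head of the list. struct env_sketch, NENV_SKETCH, and the sk_* functions are hypothetical simplifications, not JOS's definitions.

```c
#include <assert.h>
#include <stddef.h>

#define NENV_SKETCH 4   /* hypothetical; JOS uses NENV from inc/env.h */

enum { SK_FREE, SK_RUNNABLE };

struct env_sketch {
	int status;
	struct env_sketch *free_next;   /* link used only while on the free list */
};

static struct env_sketch envs_sk[NENV_SKETCH];
static struct env_sketch *free_list;

/* Put every slot on the free list, inserting in reverse so that the
 * first allocation returns slot 0. */
static void sk_init(void)
{
	free_list = NULL;
	for (int i = NENV_SKETCH - 1; i >= 0; i--) {
		envs_sk[i].status = SK_FREE;
		envs_sk[i].free_next = free_list;
		free_list = &envs_sk[i];
	}
}

/* Allocation removes the head of the list. Returns NULL when none are free. */
static struct env_sketch *sk_alloc(void)
{
	struct env_sketch *e = free_list;
	if (e == NULL)
		return NULL;
	free_list = e->free_next;
	e->free_next = NULL;
	e->status = SK_RUNNABLE;
	return e;
}

/* Deallocation pushes the slot back onto the head of the list. */
static void sk_free(struct env_sketch *e)
{
	assert(e->status != SK_FREE);
	e->status = SK_FREE;
	e->free_next = free_list;
	free_list = e;
}
```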
struct Env {
	struct Trapframe env_tf;        // Saved registers
	LIST_ENTRY(Env) env_link;       // Free list link pointers
	envid_t env_id;                 // Unique environment identifier
	envid_t env_parent_id;          // env_id of this env's parent
	unsigned env_status;            // Status of the environment
	uint32_t env_runs;              // Number of times environment has run

	// Address space
	pde_t *env_pgdir;               // Kernel virtual address of page dir
	physaddr_t env_cr3;             // Physical address of page dir
};
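Here's what the Env fields are for:
env_tf: holds the saved register values for the environment while that environment is not running, i.e., while the kernel or a different environment is running. struct Trapframe is defined in inc/trap.h.
env_link: links the Env into the env_free_list while it is inactive. See inc/queue.h for details.
env_id: a value that uniquely identifies the environment currently using this Env structure.
env_parent_id: the env_id of the environment that created this environment.
env_status: holds one of the following values:
  ENV_FREE: the Env structure is inactive, and therefore on the env_free_list.
  ENV_RUNNABLE: the Env structure represents an active environment that is waiting to run on the processor.
  ENV_NOT_RUNNABLE: the Env structure represents an active environment that is not currently ready to run.
env_runs: the number of times the environment has run.
env_pgdir: the kernel virtual address of this environment's page directory.
env_cr3: the physical address of this environment's page directory.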
Like a Unix process, a JOS environment couples the concepts of "thread" and "address space". The thread is defined primarily by the saved registers (the env_tf field), and the address space is defined by the page directory and page tables pointed to by env_pgdir and env_cr3. To run an environment, the kernel must set up the CPU with both the saved registers and the appropriate address space.
Note that in Unix-like systems, individual environments have their own kernel stacks. In JOS, however, only one environment can be active in the kernel at once, so JOS needs only a single kernel stack.
In lab 2, you allocated memory in i386_vm_init() for the pages[] array, which is a table the kernel uses to keep track of which pages are free and which are not. You will now need to modify i386_vm_init() further to allocate a similar array of Env structures, called envs.
Exercise 1. Modify i386_vm_init() in kern/pmap.c to allocate and map the envs array. This array consists of exactly NENV instances of the Env structure, laid out consecutively in the kernel's virtual address space starting at address UENVS (defined in inc/memlayout.h). You should allocate and map this array the same way as you did the pages array.
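As with the pages array, the envs allocation is rounded up to a whole number of pages before it is mapped at UENVS. Below is a sketch of that arithmetic, assuming 4096-byte pages. round_up() plays the role of the ROUNDUP macro you likely used for pages, but it is a separate, hypothetical helper. The sample sizes in the tests are made up and are not the real sizeof(struct Env).

```c
#include <assert.h>
#include <stddef.h>

#define PGSIZE_SKETCH 4096u   /* x86 page size */

/* Round n up to the next multiple of align (align must be a power of two). */
static size_t round_up(size_t n, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (n + align - 1) & ~(align - 1);
}

/* Bytes of virtual space an array of nenv entries, env_size bytes each,
 * occupies once rounded up to whole pages. */
static size_t envs_region_bytes(size_t nenv, size_t env_size)
{
	return round_up(nenv * env_size, PGSIZE_SKETCH);
}

/* Number of pages that must be allocated and mapped for that region. */
static size_t envs_region_pages(size_t nenv, size_t env_size)
{
	return envs_region_bytes(nenv, env_size) / PGSIZE_SKETCH;
}
```

For example, 1024 entries of 101 bytes each (103424 bytes) need 26 pages, not 25.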
You will now write the code in kern/env.c necessary to run a user environment. Because we do not yet have a filesystem, we will set up the kernel to load a static binary image that is embedded within the kernel itself. JOS embeds this binary in the kernel as an ELF executable image.
The Lab 3 GNUmakefile generates a number of binary images in the obj/user/ directory. If you look at kern/Makefrag, you will notice some magic that "links" these binaries directly into the kernel executable as if they were .o files. The -b binary option on the linker command line causes these files to be linked in as "raw" uninterpreted binary files rather than as regular .o files produced by the compiler. (As far as the linker is concerned, these files do not have to be ELF images at all - they could be anything, such as text files or pictures!) If you look at obj/kern/kernel.sym after building the kernel, you will notice that the linker has "magically" produced a number of funny symbols with obscure names like _binary_obj_user_hello_start, _binary_obj_user_hello_end, and _binary_obj_user_hello_size. The linker generates these symbol names by mangling the file names of the binary files; the symbols provide the regular kernel code with a way to reference the embedded binary files.
In i386_init() in kern/init.c you'll see code to run one of these binary images in an environment. However, the critical functions to set up user environments are not complete; you will need to fill them in.
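Loading such an embedded ELF image comes down to walking its program headers. For each segment of type PT_LOAD, copy p_filesz bytes from offset p_offset in the file to virtual address p_va, then zero the remaining p_memsz - p_filesz bytes (the .bss portion). The sketch below does this against a simulated flat memory buffer instead of a real user address space, and it omits page allocation entirely. The header structs follow the standard ELF32 layout with JOS-like field names. load_segments, MEM_SIZE_SK, and the other *_SK names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ELF_MAGIC_SK 0x464C457Fu  /* "\x7FELF" read as a little-endian word */
#define PT_LOAD_SK   1u
#define MEM_SIZE_SK  0x2000u      /* size of the simulated address space */

struct elf32_hdr {                /* ELF32 file header, standard layout */
	uint32_t e_magic;
	uint8_t  e_elf[12];
	uint16_t e_type, e_machine;
	uint32_t e_version, e_entry, e_phoff, e_shoff, e_flags;
	uint16_t e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx;
};

struct elf32_phdr {               /* ELF32 program header, standard layout */
	uint32_t p_type, p_offset, p_va, p_pa, p_filesz, p_memsz, p_flags, p_align;
};

/* Copy every PT_LOAD segment of image into mem (indexed by virtual address)
 * and zero its bss tail. Returns the entry point, or -1 if the image is
 * malformed or does not fit in the simulated memory. */
static int64_t load_segments(const uint8_t *image, size_t image_size, uint8_t *mem)
{
	struct elf32_hdr eh;
	assert(image != NULL && mem != NULL);
	if (image_size < sizeof eh)
		return -1;
	memcpy(&eh, image, sizeof eh);      /* memcpy avoids misaligned access */
	if (eh.e_magic != ELF_MAGIC_SK)
		return -1;
	for (uint32_t i = 0; i < eh.e_phnum; i++) {
		struct elf32_phdr ph;
		uint64_t phoff = (uint64_t)eh.e_phoff + (uint64_t)i * sizeof ph;
		if (phoff + sizeof ph > image_size)
			return -1;
		memcpy(&ph, image + phoff, sizeof ph);
		if (ph.p_type != PT_LOAD_SK)
			continue;
		if (ph.p_filesz > ph.p_memsz ||
		    (uint64_t)ph.p_offset + ph.p_filesz > image_size ||
		    (uint64_t)ph.p_va + ph.p_memsz > MEM_SIZE_SK)
			return -1;
		memcpy(mem + ph.p_va, image + ph.p_offset, ph.p_filesz);
		memset(mem + ph.p_va + ph.p_filesz, 0, ph.p_memsz - ph.p_filesz);
	}
	return eh.e_entry;
}
```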
Exercise 2. In the file kern/env.c, finish coding the following functions:
As you write these functions, you might find the new cprintf verb %e useful: it prints a description corresponding to an error code. For example, r = -E_NO_MEM; panic("env_alloc: %e", r); will panic with the message "env_alloc: out of memory".
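Under the hood, a verb like %e needs little more than a table that maps each error code to a string. Below is a minimal sketch of that lookup. The enum values and strings are assumptions for illustration, not a copy of JOS's own error definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error codes; JOS defines its own set. */
enum {
	E_UNSPECIFIED_SK = 1,
	E_INVAL_SK,
	E_NO_MEM_SK,
	MAXERROR_SK
};

static const char *const error_string_sk[MAXERROR_SK] = {
	[E_UNSPECIFIED_SK] = "unspecified error",
	[E_INVAL_SK]       = "invalid parameter",
	[E_NO_MEM_SK]      = "out of memory",
};

/* Return the description for error code r (callers usually pass a negative
 * code such as -E_NO_MEM_SK), or NULL if there is none. A printf-style %e
 * could fall back to printing the number in that case. */
static const char *error_desc(int r)
{
	if (r < 0) {
		if (r <= -MAXERROR_SK)
			return NULL;
		r = -r;
	}
	if (r >= MAXERROR_SK || error_string_sk[r] == NULL)
		return NULL;
	return error_string_sk[r];
}
```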
Below is a call graph of the code up to the point where the user code is invoked. Make sure you understand the purpose of each step.
start (kern/entry.S)
i386_init
    cons_init
    i386_detect_memory
    i386_vm_init
    page_init
    env_init
    idt_init (still incomplete at this point)
    env_create
    env_run
        env_pop_tf
Once you are done you should compile your kernel and run it under Bochs. If all goes well, your system should enter user space and execute the hello binary until it makes a system call with the INT instruction. At that point there will be trouble, since JOS has not set up the hardware to allow any kind of transition from user space into the kernel. A real CPU would reset and reboot; Bochs prints a message about "3rd exception with no resolution" and exits.
Set a Bochs breakpoint at env_pop_tf, which should be the last function you hit before actually entering user mode. Step through this function; the processor should enter user mode after the iret instruction. You should then see the first instruction in the user environment's executable, which is the cmpl instruction at the label start in lib/entry.S. Now use vb 0x1b:0x... to set a breakpoint at the int $0x30 in sys_cputs() in hello (see obj/user/hello.asm for the user-space address). This int is the system call to display a character to the console. If you cannot execute as far as the int, then something is wrong with your address space setup or program loading code; go back and fix it before continuing.
Exercise 3. Read Chapter 9, Exceptions and Interrupts in the 80386 Programmer's Manual (or Chapter 5 of the IA-32 Developer's Manual), if you haven't already.
In this lab we generally follow Intel's terminology for interrupts, exceptions, and the like. However, terms such as exception, trap, interrupt, fault and abort have no standard meaning across architectures or operating systems, and are often used without regard to the subtle distinctions between them on a particular architecture such as the x86. When you see these terms outside of this lab, the meanings might be slightly different.
In order to ensure that these protected control transfers are actually protected, the processor's interrupt/exception mechanism is designed so that the code currently running when the interrupt or exception occurs does not get to choose arbitrarily where the kernel is entered or how. Instead, the processor ensures that the kernel can be entered only under carefully controlled conditions. On the x86, two mechanisms provide this protection:
The Interrupt Descriptor Table. The processor ensures that interrupts and exceptions can only cause the kernel to be entered at a few specific, well-defined entry-points determined by the kernel itself, and not by the code running when the interrupt or exception is taken.
The x86 allows up to 256 different interrupt or exception entry points into the kernel, each with a different interrupt vector. A vector is a number between 0 and 255. An interrupt's vector is determined by the source of the interrupt: different devices, error conditions, and application requests to the kernel generate interrupts with different vectors. The CPU uses the vector as an index into the processor's interrupt descriptor table (IDT), which the kernel sets up in kernel-private memory, much like the GDT. From the appropriate entry in this table the processor loads:
- the value to load into the instruction pointer (EIP) register, pointing to the kernel code designated to handle that type of exception;
- the value to load into the code segment (CS) register, which includes in bits 0-1 the privilege level at which the exception handler is to run.
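Each IDT entry is an 8-byte gate descriptor. It packs the handler's 32-bit offset (split into two halves), the code segment selector, the gate type, the descriptor privilege level (DPL), and a present bit. The sketch below builds one as a uint64_t rather than a C bitfield struct, because bitfield layout is compiler-dependent. It serves the same purpose as JOS's SETGATE macro but is not that macro, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define STS_IG32_SK 0xE   /* 32-bit interrupt gate: clears IF on entry */
#define STS_TG32_SK 0xF   /* 32-bit trap gate: leaves IF unchanged */

static uint64_t make_gate(uint32_t offset, uint16_t sel, unsigned type, unsigned dpl)
{
	assert(dpl <= 3 && (type == STS_IG32_SK || type == STS_TG32_SK));
	return  (uint64_t)(offset & 0xFFFF)        /* bits 0-15: offset low */
	      | (uint64_t)sel << 16                /* bits 16-31: CS selector */
	      | (uint64_t)type << 40               /* bits 40-43: gate type */
	      | (uint64_t)dpl << 45                /* bits 45-46: DPL */
	      | (uint64_t)1 << 47                  /* bit 47: present */
	      | (uint64_t)(offset >> 16) << 48;    /* bits 48-63: offset high */
}

static uint32_t gate_offset(uint64_t g) { return (uint32_t)((g & 0xFFFF) | ((g >> 48) << 16)); }
static uint16_t gate_sel(uint64_t g)    { return (uint16_t)(g >> 16); }
static unsigned gate_dpl(uint64_t g)    { return (unsigned)((g >> 45) & 0x3); }
```

The DPL field is the lowest privilege allowed to invoke the gate with an int instruction. A user-mode int to a gate with DPL 0 raises a general protection fault instead. Hardware interrupts and processor-detected exceptions are not subject to this check.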
The Task State Segment. The processor needs a place to save the old processor state before the interrupt or exception occurred, such as the original values of EIP and CS before the processor invoked the exception handler, so that the exception handler can later restore that old state and resume the interrupted code from where it left off. But this save area for the old processor state must in turn be protected from unprivileged user-mode code; otherwise buggy or malicious user code could compromise the kernel.
For this reason, when an x86 processor takes an interrupt or trap that causes a privilege level change from user to kernel mode, it also switches to a stack in the kernel's memory. A structure called the task state segment (TSS) specifies the segment selector and address where this stack lives. The processor pushes (on this new stack) SS, ESP, EFLAGS, CS, EIP, and an optional error code. Then it loads the CS and EIP from the interrupt descriptor, and sets the ESP and SS to refer to the new stack.
Although the TSS is large and can potentially serve a variety of purposes, JOS only uses it to define the kernel stack that the processor should switch to when it transfers from user to kernel mode. Since "kernel mode" in JOS is privilege level 0 on the x86, the processor uses the ESP0 and SS0 fields of the TSS to define the kernel stack when entering kernel mode. JOS doesn't use any other TSS fields.
All of the synchronous exceptions that the x86 processor can generate internally use interrupt vectors between 0 and 31, and therefore map to IDT entries 0-31. For example, a page fault always causes an exception through vector 14. Interrupt vectors greater than 31 are only used by software interrupts, which can be generated by the INT instruction, or asynchronous hardware interrupts, caused by external devices when they need attention.
In this section we will extend JOS to handle the internally generated x86 exceptions in vectors 0-31. In the next section we will make JOS handle software interrupt vector 0x30, which JOS (fairly arbitrarily) uses as its system call interrupt vector. In Lab 4 we will extend JOS to handle externally generated hardware interrupts such as the clock interrupt.
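When an exception is taken in user mode, the processor pushes the exception parameters onto the kernel stack, starting at address KSTACKTOP:
                     +--------------------+ KSTACKTOP
                     | 0x00000 | old SS   |     " - 4
                     |      old ESP       |     " - 8
                     |     old EFLAGS     |     " - 12
                     | 0x00000 | old CS   |     " - 16
                     |      old EIP       |     " - 20 <---- ESP
                     +--------------------+
The processor then reads the IDT entry for the exception's vector and sets CS:EIP to point to the handler function defined there.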
For certain types of x86 exceptions, in addition to the "standard" five words above, the processor pushes onto the stack another word containing an error code. The page fault exception, number 14, is an important example. See the 80386 manual to determine for which exception numbers the processor pushes an error code, and what the error code means in that case. When the processor pushes an error code, the stack would look as follows at the beginning of the exception handler when coming in from user mode:
                     +--------------------+ KSTACKTOP
                     | 0x00000 | old SS   |     " - 4
                     |      old ESP       |     " - 8
                     |     old EFLAGS     |     " - 12
                     | 0x00000 | old CS   |     " - 16
                     |      old EIP       |     " - 20
                     |     error code     |     " - 24 <---- ESP
                     +--------------------+
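The two user-mode stack layouts above can be written as C structs, ordered from the lowest address (where ESP points on handler entry) upward. This sketch is only for checking the offsets. It is not JOS's struct Trapframe, which also holds the registers your trap entry code saves. The struct and function names are hypothetical, and the stack address in the tests is an arbitrary sample value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What the processor pushes on entry from user mode, lowest address first. */
struct hw_frame {            /* exceptions without an error code */
	uint32_t eip;
	uint16_t cs, cs_pad;     /* CS occupies a 32-bit slot; only 16 bits matter */
	uint32_t eflags;
	uint32_t esp;
	uint16_t ss, ss_pad;
};

struct hw_frame_ec {         /* exceptions that push an error code */
	uint32_t err;            /* pushed last, so it sits at the lowest address */
	struct hw_frame f;
};

/* ESP on handler entry, given the top of the kernel stack. */
static uint32_t entry_esp(uint32_t kstacktop, int has_error_code)
{
	return kstacktop - (uint32_t)(has_error_code ? sizeof(struct hw_frame_ec)
	                                             : sizeof(struct hw_frame));
}
```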
The processor can take exceptions and interrupts both from kernel and user mode. It is only when entering the kernel from user mode, however, that the x86 processor automatically switches stacks before pushing its old register state onto the stack and invoking the appropriate exception handler through the IDT. If the processor is already in kernel mode when the interrupt or exception occurs (the low 2 bits of the CS register are already zero), then the processor just pushes more values on the same kernel stack. In this way, the kernel can gracefully handle nested exceptions caused by code within the kernel itself. This capability is an important tool in implementing protection, as we will see later in the section on system calls.
If the processor is already in kernel mode and takes a nested exception, since it does not need to switch stacks, it does not save the old SS or ESP registers. For exception types that do not push an error code, the kernel stack therefore looks like the following on entry to the exception handler:
                     +--------------------+ <---- old ESP
                     |     old EFLAGS     |     " - 4
                     | 0x00000 | old CS   |     " - 8
                     |      old EIP       |     " - 12
                     +--------------------+
For exception types that push an error code, the processor pushes the error code immediately after the old EIP, as before.
There is one important caveat to the processor's nested exception capability. If the processor takes an exception while already in kernel mode, and cannot push its old state onto the kernel stack for any reason such as lack of stack space, then there is nothing the processor can do to recover, so it simply resets itself. Needless to say, the kernel should be designed so that this can't happen.
You should now have the basic information you need in order to set up the IDT and handle exceptions in JOS. For now, you will set up the IDT to handle interrupt vectors 0-31 (the processor exceptions) and interrupts 32-47 (the device IRQs).
The header files inc/trap.h and kern/trap.h contain important definitions related to interrupts and exceptions that you will need to become familiar with. The file kern/trap.h contains definitions that are strictly private to the kernel, while inc/trap.h contains definitions that may also be useful to user-level programs and libraries.
Note: Some of the exceptions in the range 0-31 are defined by Intel to be reserved. Since they will never be generated by the processor, it doesn't really matter how you handle them. Do whatever you think is cleanest.
The overall flow of control that you should achieve is depicted below:
      IDT                   trapentry.S         trap.c
+----------------+
|   &handler1    |---------> handler1:          trap (struct Trapframe *tf)
|                |             // do stuff      {
|                |             call trap          // handle the exception/interrupt
|                |             // undo stuff    }
+----------------+
|   &handler2    |--------> handler2:
|                |            // do stuff
|                |            call trap
|                |            // undo stuff
+----------------+
       .
       .
       .
+----------------+
|   &handlerX    |--------> handlerX:
|                |             // do stuff
|                |             call trap
|                |             // undo stuff
+----------------+
Each exception or interrupt should have its own handler in trapentry.S, and idt_init() should initialize the IDT with the addresses of these handlers. Each of the handlers should build a struct Trapframe (see inc/trap.h) on the stack and call trap() (in trap.c) with a pointer to the Trapframe. trap() handles the exception/interrupt or dispatches to a specific handler function. If and when trap() returns, the code in trapentry.S restores the old CPU state saved in the Trapframe and then uses the iret instruction to return from the exception.
Exercise 4. Edit trapentry.S and trap.c and implement the features described above. The macros TRAPHANDLER and TRAPHANDLER_NOEC in trapentry.S should help you, as well as the T_* defines in inc/trap.h. You will need to add an entry point in trapentry.S (using those macros) for each trap defined in inc/trap.h, and you'll have to provide _alltraps, which the TRAPHANDLER macros refer to. You will also need to modify idt_init() to initialize the IDT to point to each of these entry points defined in trapentry.S; the SETGATE macro will be helpful here.
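Hint: your _alltraps should push the values needed to complete a struct Trapframe on the stack, push a pointer to that Trapframe (the current %esp) as the argument, and call trap(). Consider using the pushal instruction; its push order fits nicely with the layout of struct Trapframe.
Test your trap handling code using some of the test programs in the user directory that cause exceptions before making any system calls, such as user/divzero. You should be able to get make grade to succeed on the divzero, softint, and badsegment tests at this point.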
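For each vector, choosing between TRAPHANDLER and TRAPHANDLER_NOEC depends on whether the processor pushes an error code. On the 80386 it does so for double fault (8) and vectors 10-14. The 80486 added alignment check (17), which also pushes one, and much newer processors define further vectors not covered here. The helper below is hypothetical and not part of JOS. It encodes that list so you can cross-check your trapentry.S against the manual.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: does the processor push an error code for this
 * exception vector? Covers the 80386/80486 exception set only. */
static bool pushes_error_code(unsigned vec)
{
	switch (vec) {
	case 8:   /* double fault (error code is always 0) */
	case 10:  /* invalid TSS */
	case 11:  /* segment not present */
	case 12:  /* stack fault */
	case 13:  /* general protection */
	case 14:  /* page fault */
	case 17:  /* alignment check (80486 and later) */
		return true;
	default:
		return false;
	}
}
```

Software interrupts issued with int n, including int $0x30, never push an error code, even when n is one of the vectors listed above.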
Challenge! You probably have a lot of very similar code right now, between the lists of TRAPHANDLER in trapentry.S and their installations in trap.c. Clean this up. Change the macros in trapentry.S to automatically generate a table for trap.c to use. Note that you can switch between laying down code and data in the assembler by using the directives .text and .data.
Exercise 5. Modify trap_dispatch() to dispatch page fault exceptions to page_fault_handler(). You should now be able to get make grade to succeed on the faultread, faultreadkernel, faultwrite, and faultwritekernel tests. If any of them don't work, figure out why and fix them.
You will further refine the kernel's page fault handling below, as you implement system calls.
Exercise 6. Modify trap_dispatch() to make breakpoint exceptions invoke the kernel monitor. You should now be able to get make grade to succeed on the breakpoint test.
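Exercises 5 and 6 both come down to branching on the trap number saved in the Trapframe. Below is a minimal sketch with a trimmed-down frame and stub handlers. Vectors 3 (breakpoint) and 14 (page fault) are the architectural numbers. struct tf_sketch, the stub functions, and the outcome codes are hypothetical stand-ins for JOS's Trapframe, page_fault_handler(), and the kernel monitor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define T_BRKPT_SK 3
#define T_PGFLT_SK 14

struct tf_sketch { uint32_t trapno; };   /* the real Trapframe has many more fields */

enum outcome { HANDLED_PGFLT, ENTERED_MONITOR, UNHANDLED };

/* Stubs standing in for the real kernel routines. */
static enum outcome page_fault_stub(struct tf_sketch *tf) { (void)tf; return HANDLED_PGFLT; }
static enum outcome monitor_stub(struct tf_sketch *tf)    { (void)tf; return ENTERED_MONITOR; }

static enum outcome dispatch(struct tf_sketch *tf)
{
	assert(tf != NULL);
	switch (tf->trapno) {
	case T_PGFLT_SK:
		return page_fault_stub(tf);
	case T_BRKPT_SK:
		return monitor_stub(tf);
	default:
		/* A kernel would typically report the unexpected trap here and
		 * destroy the environment, or panic if the kernel itself trapped. */
		return UNHANDLED;
	}
}
```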
Challenge! Modify the JOS kernel monitor so that you can 'continue' execution from the current location (e.g., after the int3, if the kernel monitor was invoked via the breakpoint exception), and so that you can single-step one instruction at a time. You will need to understand certain bits of the EFLAGS register in order to implement single-stepping.
Optional: If you're feeling really adventurous, find some x86 disassembler source code (e.g., by ripping it out of Bochs or out of GNU binutils, or just write it yourself) and extend the JOS kernel monitor to be able to disassemble and display instructions as you are stepping through them. Combined with the symbol table loading from lab 2, this is the stuff of which real kernel debuggers are made.
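The relevant EFLAGS bit is the trap flag, TF (bit 8, mask 0x100). If TF is set in the EFLAGS image that iret restores, the processor raises a debug exception (vector 1) after executing one instruction of the resumed code. Below is a sketch of the monitor-side bookkeeping on a saved EFLAGS value. The helper names are hypothetical, and JOS's own name for the mask may differ.

```c
#include <assert.h>
#include <stdint.h>

#define FL_TF_SK 0x00000100u   /* EFLAGS bit 8: trap flag */

/* "step": resume with TF set, so the CPU traps again after one instruction. */
static uint32_t eflags_for_step(uint32_t saved_eflags)
{
	return saved_eflags | FL_TF_SK;
}

/* "continue": resume with TF clear, so execution runs freely. */
static uint32_t eflags_for_continue(uint32_t saved_eflags)
{
	return saved_eflags & ~FL_TF_SK;
}
```

While stepping, the monitor is re-entered through the debug vector (1) rather than the breakpoint vector (3), so trap_dispatch() must route vector 1 to the monitor as well.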
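The breakpoint test case will either generate a breakpoint exception or a general protection fault depending on how you initialized the breakpoint entry in the IDT (i.e., your call to SETGATE from idt_init). Why? How did you need to set it in order to get the breakpoint exception to work as specified above?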
In the JOS kernel, we will use the int instruction, which causes a processor interrupt. In particular, we will use int $0x30 as the system call interrupt. We have defined the constant T_SYSCALL to 0x30 for you. You will have to set up the interrupt descriptor to allow user processes to cause that interrupt. Note that interrupt 0x30 cannot be generated by hardware, so there is no ambiguity caused by allowing user code to generate it.
The application will pass the system call number and the system call arguments in registers. This way, the kernel won't need to grub around in the user environment's stack or instruction stream. The system call number will go in %eax, and the arguments (up to five of them) will go in %edx, %ecx, %ebx, %edi, and %esi, respectively. The kernel passes the return value back in %eax. The assembly code to invoke a system call has been written for you, in syscall() in lib/syscall.c. You should read through it and make sure you understand what is going on.
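On the kernel side, this convention means the handler reads the number and arguments from the user registers saved in the Trapframe. It then writes the result back into the saved %eax, which iret restores into the user's %eax. Below is a sketch with a trimmed-down register struct. struct regs_sketch, struct syscall_args, decode_syscall, and set_return are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct regs_sketch {                 /* subset of the saved user registers */
	uint32_t eax, edx, ecx, ebx, edi, esi;
};

struct syscall_args {
	uint32_t num, a1, a2, a3, a4, a5;
};

/* Number in %eax; arguments in %edx, %ecx, %ebx, %edi, %esi, in that order. */
static struct syscall_args decode_syscall(const struct regs_sketch *r)
{
	assert(r != NULL);
	struct syscall_args a = { r->eax, r->edx, r->ecx, r->ebx, r->edi, r->esi };
	return a;
}

/* The return value (possibly a negative error code) goes back in %eax. */
static void set_return(struct regs_sketch *r, int32_t ret)
{
	r->eax = (uint32_t)ret;
}
```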
Exercise 7. Add a handler in the kernel for interrupt vector T_SYSCALL. You will have to edit kern/trapentry.S and kern/trap.c's idt_init(). You also need to change trap_dispatch() to handle the system call interrupt by calling syscall() (defined in kern/syscall.c) with the appropriate arguments, and then arranging for the return value to be passed back to the user process in %eax. Finally, you need to implement syscall() in kern/syscall.c. Make sure syscall() returns -E_INVAL if the system call number is invalid. You should read and understand lib/syscall.c (especially the inline assembly routine) in order to confirm your understanding of the system call interface. You may also find it helpful to read inc/syscall.h. Run the user/hello program under your kernel. It should print "hello, world" on the console and then cause a page fault in user mode.
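The dispatcher in kern/syscall.c is essentially a switch on the system call number, with a default case that rejects unknown numbers. Below is a sketch under stated assumptions. The two system call numbers, the E_INVAL_SK value, the stand-in environment id, and the stub bodies are all hypothetical. The real numbers come from inc/syscall.h and the real error codes from JOS's headers.

```c
#include <assert.h>
#include <stdint.h>

#define E_INVAL_SK 3                     /* hypothetical value */
enum { SYS_CPUTS_SK = 0, SYS_GETENVID_SK, NSYSCALLS_SK };

static uint32_t cur_envid_sk = 0x1000;   /* stand-in for the current env's id */

static int32_t sys_getenvid_sk(void) { return (int32_t)cur_envid_sk; }

static int32_t sys_cputs_sk(uint32_t s, uint32_t len)
{
	(void)s; (void)len;                  /* real version: check the buffer, then print */
	return 0;
}

static int32_t syscall_sk(uint32_t num, uint32_t a1, uint32_t a2,
                          uint32_t a3, uint32_t a4, uint32_t a5)
{
	(void)a3; (void)a4; (void)a5;
	switch (num) {
	case SYS_CPUTS_SK:
		return sys_cputs_sk(a1, a2);
	case SYS_GETENVID_SK:
		return sys_getenvid_sk();
	default:
		return -E_INVAL_SK;          /* unknown system call number */
	}
}
```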
Challenge! Implement system calls using the sysenter and sysexit instructions instead of using int $0x30 and iret. The sysenter/sysexit instructions were designed by Intel to be faster than int/iret. They do this by using registers instead of the stack and by making assumptions about how the segmentation registers are used. The exact details of these instructions can be found in Volume 2B of the Intel reference manuals.
The easiest way to add support for these instructions in JOS is to add a sysenter_handler in kern/trapentry.S that creates the same trap frame that is normally created by an int $0x30 instruction (being sure to save the correct return address and stack pointer provided by the user environment). Then, instead of calling into trap, push the arguments to syscall and call syscall directly. Once syscall returns, set everything up for the sysexit instruction and execute it. You will also need to add code to kern/init.c to set up the necessary model specific registers (MSRs). Look at the enable_sep_cpu function in this diff for an example of this, and you can find an implementation of wrmsr to add to inc/x86.h here.
In addition, lib/syscall.c must be changed to support making a system call with sysenter. Here is a possible register layout for the sysenter instruction:
eax - syscall number
edx, ecx, ebx, edi - arg1, arg2, arg3, arg4
esi - return pc
ebp - return esp
esp - trashed by sysenter
GCC's inline assembler does not support directly loading values into ebp, so you will need to add code to save (push) and restore (pop) it yourself (and you may want to do the same thing for esi as well). The return address can be put into esi by using an instruction like leal after_sysenter_label, %%esi. Note that this only supports 4 arguments, so you will need to leave the old method of doing system calls around if you want to support 5-argument system calls as well. Finally, in order for Bochs to support these instructions, it must be compiled with the --enable-sep option, in addition to the other options listed on the tools page.
A user program starts running at the top of lib/entry.S. After some setup, this code calls libmain(), in lib/libmain.c. You should modify libmain() to initialize the global pointer env to point at this environment's struct Env in the envs[] array. (Note that lib/entry.S has already defined envs to point at the UENVS mapping you set up in Part A.) Hint: look in inc/env.h and use sys_getenvid.
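An env_id carries the index of its slot in envs[] in its low bits. Finding this environment's struct Env from the result of sys_getenvid() is therefore a mask operation. The sketch below assumes NENV is a power of two (1024 here, chosen for illustration). inc/env.h provides its own macro for this, which you should use instead of the hypothetical env_slot() below.

```c
#include <assert.h>
#include <stdint.h>

#define LOG2NENV_SK 10
#define NENV_SK (1u << LOG2NENV_SK)      /* assumed power of two */

typedef int32_t envid_sk_t;

/* Index of the envs[] slot holding the environment with this id:
 * keep only the low LOG2NENV_SK bits. */
static unsigned env_slot(envid_sk_t id)
{
	assert(id > 0);
	return (unsigned)id & (NENV_SK - 1);
}
```

With such a helper, the libmain() change would look like env = &envs[env_slot(sys_getenvid())];.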
libmain() then calls umain, which, in the case of the hello program, is in user/hello.c. Note that after printing "hello, world", it tries to access env->env_id. This is why it faulted earlier. Now that you've initialized env properly, it should not fault. If it still faults, you probably haven't mapped the UENVS area user-readable (back in Part A in pmap.c; this is the first time we've actually used the UENVS area).
Exercise 8. Add the required code to the user library, then boot your kernel. You should see user/hello print "hello, world" and then print "i am environment 00000800". user/hello then attempts to "exit" by calling sys_env_destroy() (see lib/libmain.c and lib/exit.c). Since the kernel currently only supports one user environment, it should report that it has destroyed the only environment and then drop into the kernel monitor.
Memory protection is a crucial feature of an operating system, ensuring that bugs in one program cannot corrupt other programs or corrupt the operating system itself.
Operating systems usually rely on hardware support to implement memory protection. The OS keeps the hardware informed about which virtual addresses are valid and which are not. When a program tries to access an invalid address or one for which it has no permissions, the processor stops the program at the instruction causing the fault and then traps into the kernel with information about the attempted operation. If the fault is fixable, the kernel can fix it and let the program continue running. If the fault is not fixable, then the program cannot continue, since it will never get past the instruction causing the fault.
As an example of a fixable fault, consider an automatically extended stack. In many systems the kernel initially allocates a single stack page, and then if a program faults accessing pages further down the stack, the kernel will allocate those pages automatically and let the program continue. By doing this, the kernel only allocates as much stack memory as the program needs, but the program can work under the illusion that it has an arbitrarily large stack.
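System calls present an interesting problem for memory protection. Most system call interfaces let user programs pass pointers to the kernel. These pointers point at user buffers to be read or written. The kernel then dereferences these pointers while carrying out the system call. There are two problems with this:
1. A page fault in the kernel is much more serious than a page fault in a user program. If the kernel faults while dereferencing a pointer supplied by a user program, the fault is really on behalf of that program and should not bring down the whole system.
2. The kernel typically has more memory permissions than the user program. A user program might pass a pointer to memory that the kernel can read or write but the program cannot, tricking the kernel into revealing private information or corrupting its own data.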
For both of these reasons the kernel must be extremely careful when handling pointers presented by user programs.
You will now solve these two problems with a single mechanism that scrutinizes all pointers passed from userspace into the kernel. When a program passes the kernel a pointer, the kernel will check that the address is in the user part of the address space, and that the page table would allow the memory operation.
Thus, the kernel will never suffer a page fault due to dereferencing a user-supplied pointer. If the kernel does page fault, it should panic and terminate.
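Both checks can be sketched as follows. The sketch walks a simulated single-level page table rather than the real two-level x86 structure. ULIM_SK, the tiny address space, the flat PTE array, and the function name are assumptions for illustration. The PTE flag values are the architectural x86 bits. The sketch does show the two points that matter: it rejects ranges that reach above the user limit (including ranges that wrap around 2^32), and it checks the permission bits of every page the range touches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PGSIZE_SK 4096u
#define PTE_P_SK  0x1u   /* x86 PTE bit 0: present */
#define PTE_W_SK  0x2u   /* bit 1: writable */
#define PTE_U_SK  0x4u   /* bit 2: user-accessible */

#define NPAGES_SK 16u                    /* tiny simulated address space */
#define ULIM_SK   (8u * PGSIZE_SK)       /* hypothetical user/kernel boundary */

static uint32_t ptes_sk[NPAGES_SK];      /* flat "page table": one PTE per page */

/* Return 0 if user code may access [va, va+len) with permissions perm
 * (e.g. PTE_W_SK for a write), else -1. */
static int mem_check_sk(uint32_t va, uint32_t len, uint32_t perm)
{
	uint32_t need = perm | PTE_P_SK | PTE_U_SK;
	if (len == 0)
		return 0;
	uint64_t end = (uint64_t)va + len;   /* 64-bit sum cannot wrap */
	if (end > ULIM_SK)
		return -1;
	for (uint32_t pg = va / PGSIZE_SK; (uint64_t)pg * PGSIZE_SK < end; pg++)
		if ((ptes_sk[pg] & need) != need)
			return -1;
	return 0;
}
```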
Exercise 9. Change kern/trap.c to panic if a page fault happens in kernel mode.
Hint: to determine whether a fault happened in user mode or in kernel mode, check the low bits of the saved CS register in the Trapframe.
Read
Change
Change
[00001000] user_mem_check assertion failure for va 00000001
[00001000] free env 00001000
Destroyed the only environment - nothing more to do!
Note that the same mechanism you just implemented also works for malicious user applications (such as user/evilhello).
Exercise 10. Change
[00000000] new env 00001000
[00001000] user_mem_check assertion failure for va f0100020
[00001000] free env 00001000
This completes the lab. Make sure you pass all the make grade tests, and hand in your work with make handin.
Last updated: Tue Sep 15 21:17:07 -0500 2009