CS372H Spring 2011 Homework 2 Solutions

Problem 1

Given the following program that uses three memory segments in an address space as described in class (code segment, data segment, and stack segment):

#include <stdlib.h>

char a[100];

int main(int argc, char **argv)
{
    int d;
    static double b;
    char *s = "boo", *p;

    p = malloc(300);
    return 0;
}

Identify the segment in which each variable resides and indicate if the variable is private to the thread or is shared among threads.
Be careful.

The array a, the static variable b, and the string constant "boo" are all in the data segment and are shared across threads.
The arguments argc and argv are in the stack segment and are private to the thread.
The automatic variables d, s, and p are also in the stack segment and are private to the thread.
Note that the variable p itself is in the stack segment (private), but the object it points to is in the data segment, which is a shared region of the address space (hence the "be careful" warning). Likewise, s itself is on the stack, but its contents are the address of the string "boo", which resides in the data segment (shared).

Problem 2

True or false. A virtual memory system that uses paging is vulnerable to external fragmentation.

False. One of the advantages to paging is that it does not result in external fragmentation because the physical pages are parceled out, and the address space grown accordingly (e.g., for the stack and heap), at the granularity of pages.

Problem 3 (based on reading; not covered in lecture)

Segmented memory schemes (as well as user-level memory allocators like malloc() and, later in class, some disk placement schemes) use various policies for fitting variable-sized allocation units into variable-sized spaces. Concoct a scenario in which Best-fit allocation outperforms First-fit, Worst-fit, and Buddy allocation (see the book for details on these algorithms). Repeat for First-fit v. the others, Worst-fit v. the others, and Buddy allocation v. the others.

This solution is after Andreas Reifert, a distinguished student in the class of Fall 1999. Each trace below is consistent with a total memory of 16K that is initially free.
(x) A yK - means: in step x we allocate yK of memory
(x) D (y) - means: in step x we deallocate the memory allocated in step y

Best fit outperforms:
(1) A 5K
(2) A 8K
(3) A 3K - buddy cannot do this
(4) D (1)
(5) D (3)
(6) A 3K - first and worst take the 5K part, best the 3K part
(7) A 5K - first and worst cannot do this, best can

Worst fit outperforms:
(1) A 3K
(2) A 8K
(3) A 5K - buddy cannot do this
(4) D (1)
(5) D (3)
(6) A 2K - first and best take the 3K part, worst the 5K part
(7) A 3K - first and best take the 5K part, worst a 3K
(8) A 3K - first and best cannot do this, worst can

First fit outperforms:
(1) A 4K
(2) A 2K
(3) A 2K
(4) A 3K
(5) A 5K - buddy cannot do this
(6) D (1)
(7) D (3)
(8) D (5)
(9) A 1K - best takes the 2K part, worst the 5K part, first the 4K part
(10) A 3K - best takes the 4K part, worst a 4K part, first the 3K part
(11) A 2K - best takes the 5K part, worst the 4K part, first the 2K part
(12) A 5K - best and worst cannot do this, first can

Buddy outperforms:
(1) A 2K
(2) A 4K
(3) A 8K
(4) D (1) - only buddy can merge the 2K with the neighbouring 2K to a 4K part
(5) A 4K - best, worst and first cannot do this, buddy can
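
The first three traces above can be replayed mechanically. The following is a minimal sketch, not part of the original solution: it assumes a 16K memory modeled as sixteen 1K units, breaks ties by taking the lowest-addressed hole, and does not model buddy allocation. All names (MEM_K, alloc_k, free_k, replay, and the trace arrays) are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MEM_K 16   /* assumed total memory: 16 units of 1K each */

enum policy { FIRST_FIT, BEST_FIT, WORST_FIT };

/* mem[i] holds the id of the allocation owning 1K unit i, or 0 if free.
   Returns the start of the chosen hole, or -1 if no hole is big enough. */
static int alloc_k(int mem[MEM_K], int size, enum policy p, int id)
{
    int best_start = -1, best_len = 0;
    int i = 0;
    assert(size > 0 && id > 0);
    while (i < MEM_K) {
        if (mem[i] != 0) { i++; continue; }
        int start = i;
        while (i < MEM_K && mem[i] == 0) i++;
        int len = i - start;              /* hole is [start, start + len) */
        if (len < size) continue;         /* too small for this request */
        if (best_start < 0 ||
            (p == BEST_FIT && len < best_len) ||
            (p == WORST_FIT && len > best_len)) {
            best_start = start;
            best_len = len;
        }
        if (p == FIRST_FIT) break;        /* first fitting hole wins */
    }
    if (best_start < 0) return -1;
    for (int k = 0; k < size; k++) mem[best_start + k] = id;
    return best_start;
}

/* Frees every unit owned by allocation id; adjacent holes merge implicitly. */
static void free_k(int mem[MEM_K], int id)
{
    for (int i = 0; i < MEM_K; i++)
        if (mem[i] == id) mem[i] = 0;
}

/* 'A': allocate arg K; 'D': free the block allocated in step arg. */
struct op { char kind; int arg; };

/* Replays a trace (steps numbered from 1, as in the text).
   Returns 1 if every allocation succeeded, 0 otherwise. */
int replay(const struct op *trace, int n, enum policy p)
{
    int mem[MEM_K];
    int ok = 1;
    memset(mem, 0, sizeof mem);
    for (int step = 1; step <= n; step++) {
        const struct op *o = &trace[step - 1];
        if (o->kind == 'A') {
            if (alloc_k(mem, o->arg, p, step) < 0) ok = 0;
        } else {
            free_k(mem, o->arg);
        }
    }
    return ok;
}

const struct op BEST_TRACE[] = {
    {'A', 5}, {'A', 8}, {'A', 3}, {'D', 1}, {'D', 3}, {'A', 3}, {'A', 5}
};
const struct op WORST_TRACE[] = {
    {'A', 3}, {'A', 8}, {'A', 5}, {'D', 1}, {'D', 3}, {'A', 2}, {'A', 3}, {'A', 3}
};
const struct op FIRST_TRACE[] = {
    {'A', 4}, {'A', 2}, {'A', 2}, {'A', 3}, {'A', 5}, {'D', 1}, {'D', 3}, {'D', 5},
    {'A', 1}, {'A', 3}, {'A', 2}, {'A', 5}
};
```

Under these assumptions, replay(BEST_TRACE, 7, p) returns 1 only for BEST_FIT, replay(WORST_TRACE, 8, p) only for WORST_FIT, and replay(FIRST_TRACE, 12, p) only for FIRST_FIT, matching the annotations in the traces.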

Problem 4 (based on reading; not covered in lecture)

Describe the data structures required to implement best-fit memory allocation.

Each partition will need the following data structure:

typedef struct
{
    unsigned base_address;         // starting address of the partition
    unsigned size;                 // size of the partition in bytes
    enum {free, in_use} status;    // status of the partition
} PartitionRecord;

Then, we need a data structure that contains the partition records such that:
* partition records are sorted by size
* partition records can be inserted into the data structure efficiently
* partition records can be removed from the data structure efficiently
* the smallest partition larger than a particular size can be located efficiently
* a partition can be checked against its adjacent neighbors so that it can be merged with whichever of them are free

There is no known data structure that can achieve all these feats simultaneously. There are, however, two reasonable alternatives:

* B-trees (log n search, log n insertion, and log n removal)
* A bucket table, where a bucket contains all partition records of partitions that are within a given size range. In the worst case, search, insertion, and deletion are all O(n), but the expected time in practice would be much less, almost O(1) (unlike B-trees).

In addition, you will need a doubly-linked list of partitions sorted by address to facilitate merging of free partitions, since merging depends on which partitions are adjacent in memory rather than on their sizes.
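
The lookup requirement (find the smallest free partition that is large enough) can be illustrated with a much simpler structure than a B-tree or bucket table. The following is a minimal sketch, assuming the free partitions are kept in an array sorted by ascending size. FreePartition and best_fit_index are hypothetical names, and the sketch covers only the search: insertion into a sorted array is O(n), so it does not meet the other requirements.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned base_address;   /* starting address of the free partition */
    unsigned size;           /* size of the free partition in bytes */
} FreePartition;

/* free_list must be sorted by ascending size.
   Returns the index of the smallest partition whose size is at least
   `request`, or -1 if no partition fits. Runs in O(log n). */
int best_fit_index(const FreePartition *free_list, size_t n, unsigned request)
{
    size_t lo = 0, hi = n;               /* the answer lies in [lo, hi] */
    assert(free_list != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (free_list[mid].size < request)
            lo = mid + 1;                /* mid is too small; look right */
        else
            hi = mid;                    /* mid fits; look for a smaller fit */
    }
    return lo < n ? (int)lo : -1;
}
```

For example, with free partitions of 1024, 3072, and 5120 bytes, a request for 2048 bytes selects the 3072-byte partition (index 1), and a request for 5121 bytes fails.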

Problem 5

This question refers to an architecture using segmentation with paging (you may need to consult the text for details). In this architecture, the 32-bit virtual address is divided into fields as follows:

4-bit segment number | 12-bit page number | 16-bit offset

Here are the relevant tables (all values in hexadecimal):

Segment Table          Page Table A          Page Table B
0   Page Table A       0   CAFE              0   F000
1   Page Table B       1   DEAD              1   D8BF
x   (rest invalid)     2   BEEF              x   (rest invalid)
                       3   BA11
                       x   (rest invalid)

Find the physical address corresponding to each of the following virtual addresses (answer "bad virtual address" if the virtual address is invalid):

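1. 00000000
2. 20022002
3. 10015555

1. 0xCAFE0000 (segment 0 selects Page Table A; page 0x000 maps to CAFE; offset 0x0000)
2. bad virtual address (segment 2 has no valid entry in the segment table)
3. 0xD8BF5555 (segment 1 selects Page Table B; page 0x001 maps to D8BF; offset 0x5555)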
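
The lookup can be written out as code. The following is a minimal sketch of the translation for this problem's tables only; the names translate, SEGMENT_TABLE, PAGE_TABLE_A, and PAGE_TABLE_B are hypothetical, and validity is modeled simply as "index below the number of listed entries".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Page tables from the problem; entries are physical page numbers. */
static const uint32_t PAGE_TABLE_A[] = { 0xCAFE, 0xDEAD, 0xBEEF, 0xBA11 };
static const uint32_t PAGE_TABLE_B[] = { 0xF000, 0xD8BF };

struct segment {
    const uint32_t *table;   /* page table for this segment */
    uint32_t n_pages;        /* number of valid entries in it */
};

static const struct segment SEGMENT_TABLE[] = {
    { PAGE_TABLE_A, 4 },     /* segment 0 */
    { PAGE_TABLE_B, 2 },     /* segment 1 */
};

/* Translates va; on success stores the physical address in *pa and
   returns 1. Returns 0 for a bad virtual address. */
int translate(uint32_t va, uint32_t *pa)
{
    uint32_t seg    = va >> 28;              /* top 4 bits */
    uint32_t page   = (va >> 16) & 0xFFFu;   /* next 12 bits */
    uint32_t offset = va & 0xFFFFu;          /* low 16 bits */

    assert(pa != NULL);
    if (seg >= sizeof SEGMENT_TABLE / sizeof SEGMENT_TABLE[0])
        return 0;                            /* invalid segment */
    if (page >= SEGMENT_TABLE[seg].n_pages)
        return 0;                            /* invalid page */
    *pa = (SEGMENT_TABLE[seg].table[page] << 16) | offset;
    return 1;
}
```

With these tables, translate(0x10015555, &pa) succeeds and stores 0xD8BF5555, while translate(0x20022002, &pa) returns 0.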

Problem 6

In a 32-bit machine we subdivide the virtual address into 4 fields as follows:

10-bit | 8-bit | 6-bit | 8-bit

We use a 3-level page table, such that the first 10 bits are for the first level and so on.

1. What is the page size in such a system?
2. What is the size of a page table for a process that has 256K of memory starting at address 0?
3. What is the size of a page table for a process that has a code segment of 48K starting at address 0x1000000, a data segment of 600K starting at address 0x80000000 and a stack segment of 64K starting at address 0xf0000000 and growing upward (like in the PA-RISC of HP)?

1. The offset field is 8 bits wide, so the page size is 2^8 = 256 bytes.
2. Using the subdivision above, the first-level page table points to 1024 second-level page tables, each pointing to 256 third-level page tables, each containing 64 entries (one per page). The program's address space consists of 256K / 256 = 1024 pages, thus we need 1024 / 64 = 16 third-level page tables. Therefore we need 16 entries in one second-level page table, and one entry in the first-level page table. The tables in use are therefore: 1024 entries for the first-level table, 256 entries for the second-level page table, and 16 third-level page tables containing 64 entries each. Assuming a 32-bit physical address, we can use 4 bytes per entry (a 24-bit physical page number with 8 bits left for control). The space required is then 1024 * 4 + 256 * 4 (one second-level page table) + 16 * 64 * 4 (16 third-level page tables) = 9216 bytes.
3. The stack, data, and code segments are at addresses that require 3 active entries in the first-level page table, and hence 3 second-level page tables. For 64K, you need 256 pages, or 4 third-level page tables. For 600K, you need 2400 pages, or 38 third-level page tables, and for 48K you need 192 pages, or 3 third-level page tables. Assuming 4 bytes per entry, the space required is 1024 * 4 + 256 * 3 * 4 (3 second-level page tables) + 64 * (38 + 4 + 3) * 4 (38 third-level page tables for the data segment, 4 for the stack, and 3 for the code segment) = 18688 bytes.
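
The arithmetic above can be checked mechanically. The following is a minimal sketch under the same assumptions as the solution (4 bytes per entry, every table allocated at full size) plus one more that holds for both parts of this problem: no two regions share a page table. The names struct region and page_table_bytes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field widths from the problem, outermost level first. */
enum { L1_BITS = 10, L2_BITS = 8, L3_BITS = 6, OFFSET_BITS = 8, PTE_BYTES = 4 };

struct region {
    uint32_t start;   /* first virtual address of the region */
    uint32_t bytes;   /* length of the region in bytes */
};

/* Bytes of page table needed to map the given regions.
   Assumes no two regions share a second- or third-level table. */
uint32_t page_table_bytes(const struct region *r, size_t n)
{
    uint32_t l2_tables = 0, l3_tables = 0;
    for (size_t i = 0; i < n; i++) {
        assert(r[i].bytes > 0);
        /* First and last virtual page numbers covered by the region. */
        uint32_t first = r[i].start >> OFFSET_BITS;
        uint32_t last  = (r[i].start + r[i].bytes - 1) >> OFFSET_BITS;
        /* Each third-level table covers 2^L3_BITS consecutive pages. */
        l3_tables += (last >> L3_BITS) - (first >> L3_BITS) + 1;
        /* Each second-level table covers 2^(L2_BITS + L3_BITS) pages. */
        l2_tables += (last >> (L3_BITS + L2_BITS))
                   - (first >> (L3_BITS + L2_BITS)) + 1;
    }
    return ((1u << L1_BITS)
            + l2_tables * (1u << L2_BITS)
            + l3_tables * (1u << L3_BITS)) * PTE_BYTES;
}
```

Passing the single 256K region at address 0 gives 9216, and passing the three segments of the last question gives 18688, in agreement with the hand calculation.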

Problem 7

A computer system has a 36-bit virtual address space with a page size of 8K, and 4 bytes per page table entry.

1. How many pages are in the virtual address space?
2. What is the maximum size of addressable physical memory in this system?
3. If the average process size is 8GB, would you use a one-level, two-level, or three-level page table? Why?
4. Compute the average size of a page table in question 3 above.

1. A 36-bit address can address 2^36 bytes in a byte-addressable machine. Since the size of a page is 8K bytes (2^13), the number of addressable pages is 2^36 / 2^13 = 2^23.
2. With 4-byte entries in the page table we can reference 2^32 pages. Since each page is 2^13 B long, the maximum addressable physical memory size is 2^32 * 2^13 = 2^45 B (assuming no protection bits are used).

3. 8 GB = 2^33 B

We need to analyze the memory and time requirements of the paging schemes in order to make a decision. The average process size is used in the calculations below.

1 Level Paging
Since we have 2^23 pages in each virtual address space, and we use 4 bytes per page table entry, the size of the page table will be 2^23 * 2^2 = 2^25 B (32 MB). This is 1/256 of the process' own memory space, so it is quite costly.

2 Level Paging
The address would be divided up as 12 | 11 | 13, since we want each second-level page table to fit into one page (2^11 entries * 4 B = 8K) and we also want to divide the bits roughly equally.

Since the process' size is 8GB = 2^33 B, I assume what this means is that the total size of all the distinct pages that the process accesses is 2^33 B. Hence, this process accesses 2^33 / 2^13 = 2^20 pages. The bottom level of the page table then holds 2^20 references. We know the size of each bottom level chunk of the page table is 2^11 entries. So we need 2^20 / 2^11 = 2^9 of those bottom level chunks.

The total size of the page table is then:

size of the outer page table + total size of the inner page tables
= 1 * 2^12 * 4 + 2^9 * 2^11 * 4 = 2^20 * (2^-6 + 4) B ~ 4MB

3 Level Paging
For 3 level paging we can divide up the address as follows:
8 | 8 | 7 | 13

Again using the same reasoning as above we need 2^20/2^7 = 2^13 level 3 page table chunks. Each level 2 page table chunk references 2^8 level 3 page table chunks. So we need 2^13/2^8 = 2^5 level-2 tables. And, of course, one level-1 table.

The total size of the page table is then:

size of the outer page table + total size of the level 2 tables + total size of the innermost tables
= 1 * 2^8 * 4 + 2^5 * 2^8 * 4 + 2^13 * 2^7 * 4 B ~ 4MB

As is easily seen, 2-level and 3-level paging require much less space than the 1-level paging scheme. And since our address space is not large enough for the extra level to save any space, 3-level paging does not perform any better than 2-level paging. Because each additional level adds a memory access to every page table lookup, choosing a 2-level paging scheme for this process is much more logical.

4. Calculations are done in answer no. 3 above.
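
The three totals in answer no. 3 can be reproduced with a short calculation. The following is a minimal sketch under the same assumptions as the text: the process touches 2^20 contiguous pages, entries are 4 bytes, the outermost table is always a single fully allocated table, and inner tables are allocated only as needed. The name table_bytes is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { PTE_BYTES = 4 };   /* bytes per page table entry, from the problem */

/* Page-table bytes for a process that touches 2^pages_log2 contiguous pages.
   level_bits[i] is the index width of level i, outermost level first. */
uint64_t table_bytes(const int *level_bits, int levels, int pages_log2)
{
    uint64_t total = 0;
    /* Number of items the current level must reference; starts as pages. */
    uint64_t tables = (uint64_t)1 << pages_log2;
    assert(levels > 0);
    for (int i = levels - 1; i >= 0; i--) {
        uint64_t entries = (uint64_t)1 << level_bits[i];
        /* Tables needed at this level; the outermost level has exactly one. */
        tables = (i == 0) ? 1 : (tables + entries - 1) / entries;
        total += tables * entries * PTE_BYTES;
    }
    return total;
}
```

For the splits used above, this gives 2^25 bytes for one level (23 index bits), 2^14 + 2^22 bytes for 12 | 11, and 2^10 + 2^15 + 2^22 bytes for 8 | 8 | 7.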

Problem 8

In a 32-bit machine we subdivide the virtual address into 4 pieces as follows:

8-bit | 4-bit | 8-bit | 12-bit

We use a 3-level page table, such that the first 8 bits are for the first level and so on. Physical addresses are 44 bits and there are 4 protection bits per page.

1. What is the page size in such a system?
2. How much memory is consumed by the page table and wasted by internal fragmentation for a process that has 64K of memory starting at address 0?
3. How much memory is consumed by the page table and wasted by internal fragmentation for a process that has a code segment of 48K starting at address 0x1000000, a data segment of 600K starting at address 0x80000000 and a stack segment of 64K starting at address 0xf0000000 and growing upward (towards higher addresses)?

1. 4K. The last 12 bits of the virtual address are the offset in a page, varying from 0 to 4095. So the page size is 4096 bytes, that is, 4K.
2. 2912 or 4224 bytes for page tables, 0 bytes for internal fragmentation.
Using the subdivision above, the 1st-level page table contains 256 entries, each entry pointing to a 2nd-level page table. A 2nd-level page table contains 16 entries, each entry pointing to a 3rd-level page table. A 3rd-level page table contains 256 entries, each entry pointing to a page. The process's address space consists of 16 pages, thus we need 1 third-level page table. Therefore we need 1 entry in a 2nd-level page table, and one entry in the first-level page table. The tables in use are therefore: one first-level table with 256 entries, one 2nd-level page table with 16 entries, and one 3rd-level page table with 256 entries.
Since physical addresses are 44 bits and page size is 4K, the page frame number occupies 32 bits. Taking the 4 protection bits into account, each entry of the level-3 page table takes (32+4) = 36 bits. Rounding up to make entries byte (word) aligned would make each entry consume 40 (64) bits or 5 (8) bytes. For a 256 entry table, we need 1280 (2048) bytes.
The top-level page table should not assume that 2nd-level page tables are page-aligned. So, we store full physical addresses there. Fortunately, we do not need control bits. So, each entry is at least 44 bits (6 bytes for byte-aligned, 8 bytes for word-aligned). The top-level page table is therefore 256 * 6 = 1536 bytes (256 * 8 = 2048 bytes).
Trying to take advantage of the 256-entry alignment to reduce entry size is probably not worth the trouble. Doing so would be complex; you would need to write a new memory allocator that guarantees such alignment. Further, we cannot quite fit a table into a 1024-byte aligned region (44-10 = 34 bits per address, which would require more than 4 bytes per entry), and rounding the size up to the next power of 2 would not save us any size over just storing pointers and using the regular allocator.
Similarly, each entry in the 2nd-level page table is a 44-bit physical pointer, 6 bytes (8 bytes) when aligned to byte (word) alignment. A 16-entry table is therefore 96 (128) bytes. So the space required is 1536 (2048) bytes for the top-level page table + 96 (128) bytes for one second-level page table + 1280 (2048) bytes for one third-level page table = 2912 (4224) bytes. Since the process can fit exactly into 16 pages, there is no memory wasted by internal fragmentation.
3. 5664 or 8576 bytes for page tables, 0 bytes for internal fragmentation.
First, the stack, data, and code segments are at addresses that require 3 active entries in the first-level page table, so we have 3 second-level page tables. For 48K, you need 12 pages, or 1 third-level page table; for 600K, you need 150 pages, or 1 third-level page table; and for 64K, you need 16 pages, or 1 third-level page table.
So the space required is 1536 (2048) bytes for the top-level page table + 3 * 96 (3 * 128) bytes for 3 second-level page tables + 3 * 1280 (3 * 2048) bytes for 3 third-level page tables = 5664 (8576) bytes.
As the code, data, and stack segments of the process fit exactly into 12, 150, and 16 pages respectively, there is no memory wasted by internal fragmentation.
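
Both totals, for byte and word alignment, follow from the entry sizes derived above. The following is a minimal sketch of that arithmetic, assuming as the solution does that "word-aligned" means 8-byte entries and that level-1 and level-2 entries hold full 44-bit physical addresses. The names entry_bytes and three_level_table_bytes are hypothetical.

```c
#include <assert.h>

/* Parameters from the problem statement. */
enum { PHYS_BITS = 44, OFFSET_BITS = 12, PROT_BITS = 4 };
enum { L1_ENTRIES = 256, L2_ENTRIES = 16, L3_ENTRIES = 256 };

/* Bytes needed to hold `bits` bits, rounded up to a multiple of align_bytes. */
static unsigned entry_bytes(unsigned bits, unsigned align_bytes)
{
    unsigned bytes = (bits + 7) / 8;
    assert(align_bytes > 0);
    return (bytes + align_bytes - 1) / align_bytes * align_bytes;
}

/* Page-table bytes for a process using l2_tables second-level tables and
   l3_tables third-level tables. align_bytes is 1 for byte-aligned entries
   or 8 for word-aligned entries (an assumption matching the solution). */
unsigned three_level_table_bytes(unsigned l2_tables, unsigned l3_tables,
                                 unsigned align_bytes)
{
    /* Levels 1 and 2 store a full physical address per entry. */
    unsigned ptr = entry_bytes(PHYS_BITS, align_bytes);
    /* Level 3 stores a page frame number plus the protection bits. */
    unsigned leaf = entry_bytes(PHYS_BITS - OFFSET_BITS + PROT_BITS,
                                align_bytes);
    return L1_ENTRIES * ptr
         + l2_tables * L2_ENTRIES * ptr
         + l3_tables * L3_ENTRIES * leaf;
}
```

With one second-level and one third-level table this gives 2912 bytes (byte-aligned) or 4224 bytes (word-aligned); with three of each it gives 5664 or 8576 bytes.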

Problem 9

In keeping with the RISC processor design philosophy of moving hardware functionality to software, you see a proposal that processor designers remove the MMU (memory management unit) from the hardware. To replace the MMU, the proposal has compilers generate what is known as position independent code (PIC). PIC can be loaded and run at any address without any relocation being performed. Assuming that PIC code runs just as fast as the non-PIC code, what would be the disadvantage of this scheme compared to the paged MMU used on modern microprocessors?

Solution TBD.

Problem 10

Describe the advantages of using a MMU that incorporates segmentation and paging over ones that are either pure paging or pure segmentation. Present your answer as separate lists of advantages over each of the pure schemes.

Solution TBD.