Basic Algorithms

================ Start Lecture #5 ================

1.6.2: Data Analysis and Visualization

Ratio test

Assume you believe the running time t(n) of an algorithm is Θ(n^d) for some specific d, and you want both to verify your assumption and to find the multiplicative constant.

Make a plot of (n, t(n)/n^d). If you are right, the points should tend toward a horizontal line, and the height of this line is the multiplicative constant.
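A minimal sketch of the ratio test in Python. The timings are synthetic (an assumption made purely for illustration: t(n) = 3n^2 + 50n), and the helper name ratio_table is hypothetical. Instead of drawing the plot, it just lists the points (n, t(n)/n^d).

```python
def ratio_table(ns, times, d):
    """Return the points (n, t(n)/n**d) used by the ratio test."""
    return [(n, t / n**d) for n, t in zip(ns, times)]

# Synthetic timings, assumed for illustration only: t(n) = 3*n^2 + 50*n.
ns = [10, 100, 1000, 10000]
times = [3 * n**2 + 50 * n for n in ns]

ratios = ratio_table(ns, times, d=2)
for n, ratio in ratios:
    print(n, ratio)   # the ratios settle toward the constant 3
```

With the correct guess d=2 the ratios level off near 3, the multiplicative constant. If the guess for d is too small the ratios keep growing, and if it is too large they drift toward 0.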

Homework: R-1.29

What if you believe it is polynomial but don't have a guess for d?
Ans: Use ...

The power test

Plot (n, t(n)) on log-log paper. If t(n) is Θ(n^d), say t(n) approaches b·n^d, then log(t(n)) approaches log(b)+d·log(n).

So when you plot (log(n), log(t(n))) (i.e., when you use log-log paper), you will see the points approach (for large n) a straight line whose slope is the exponent d and whose y-intercept is log(b), the logarithm of the multiplicative constant b.
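A rough sketch of the power test done numerically rather than on paper. It fits a least-squares line through the points (log(n), log(t(n))); the slope estimates d and, since the intercept is log(b), exponentiating it estimates b. The function name power_fit and the synthetic data (t(n) = 2.5·n^1.5) are assumptions for illustration.

```python
import math

def power_fit(ns, times):
    """Fit a line to (log n, log t); return (d, b) with t(n) ~ b * n**d."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(t) for t in times]
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(ys) / k
    # Ordinary least-squares slope and intercept.
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, math.exp(intercept)

# Synthetic timings, assumed for illustration only: t(n) = 2.5 * n^1.5.
ns = [1000, 2000, 4000, 8000, 16000]
times = [2.5 * n**1.5 for n in ns]

d, b = power_fit(ns, times)
print(d, b)
```

Real measurements contain lower-order terms and noise, so in practice one would fit only the points for large n.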

Homework: R-1.30

Chapter 2: Basic Data Structures

2.1: Stacks and Queues

2.1.1: Stacks

Stacks implement a LIFO (last in first out) policy. All the action occurs at the top of the stack, primarily with the push(e) and pop() operations.

The stack ADT supports

There is a simple implementation using an array A and an integer s (the current size). A[s-1] contains the TOS.

Objection (your honor). The ADT says we can always push. A simple array implementation would need to signal an error if the stack is full.

Sustained! What do you propose instead?

An extendable array.

Good idea.
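Here is one way the extendable-array idea might look, as a sketch only. The class name ArrayStack, the initial capacity, and the choice to double the array when it fills are all assumptions (the notes say only "extendable array"). Python lists already grow on their own, so the sketch deliberately treats self.A as a fixed-size array to make the mechanism visible.

```python
class ArrayStack:
    """Stack stored in an array A with size s; A[s-1] is the top."""

    def __init__(self, capacity=4):
        self.A = [None] * capacity   # treated as a fixed-size array
        self.s = 0                   # current number of elements

    def size(self):
        return self.s

    def is_empty(self):
        return self.s == 0

    def push(self, e):
        if self.s == len(self.A):
            # Array is full: allocate one twice as large and copy over.
            bigger = [None] * (2 * len(self.A))
            for i in range(self.s):
                bigger[i] = self.A[i]
            self.A = bigger
        self.A[self.s] = e
        self.s += 1

    def top(self):
        if self.is_empty():
            raise IndexError("stack is empty")
        return self.A[self.s - 1]

    def pop(self):
        e = self.top()               # raises if the stack is empty
        self.s -= 1
        self.A[self.s] = None        # clear the vacated slot
        return e

stack = ArrayStack(capacity=4)
for i in range(10):
    stack.push(i)                    # never signals "full"
print(stack.size(), len(stack.A), stack.top())
```

Pushing never fails for lack of room; the price is an occasional copy of the whole array.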

Homework: Assume a software system has 100 stacks and 100,000 elements that can be on any stack. You do not know how the elements are distributed on the stacks. If you used a normal array based implementation for the stacks, how much memory will you need? What if you use an extendable array based implementation? Now answer the same question, but assume you have Θ(S) stacks and Θ(E) elements.

Applications for procedure calls

Stacks work great for implementing procedure calls since procedures have stack based semantics. That is, last called is first returned and local variables allocated with a procedure are deallocated when the procedure returns.

So have a stack of "activation records" in which you keep the return address and the local variables.

Support for recursive procedures comes for free. For languages with static memory allocation (e.g., classic Fortran), one can store the local variables with the procedure itself. Classic Fortran (through Fortran 77) forbids recursion so that memory allocation can be static. Recursion adds considerable flexibility to a language at some cost in efficiency (not part of this course).
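To make the activation-record idea concrete, here is a small sketch that computes n! without using Python's own recursion, by pushing and popping records by hand. Everything here is illustrative: the function name, the dictionary layout of a record, and the "waiting" flag (a crude stand-in for a return address, since it records where the caller should resume) are assumptions, not how any particular compiler does it.

```python
def factorial_with_explicit_stack(n):
    """Compute n! by managing a stack of activation records by hand."""
    stack = [{"n": n}]        # the initial call fact(n)
    result = None             # value returned by the most recent call
    while stack:
        record = stack[-1]    # the currently executing procedure
        if record["n"] <= 1:
            result = 1        # base case: return 1
            stack.pop()       # its locals are deallocated on return
        elif "waiting" not in record:
            record["waiting"] = True             # remember where to resume
            stack.append({"n": record["n"] - 1}) # call fact(n-1)
        else:
            result = record["n"] * result        # callee has returned
            stack.pop()
    return result

print(factorial_with_explicit_stack(5))
```

The last record pushed is always the first one popped, which is exactly the "last called is first returned" behavior described above.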

2.1.2: Queues

Queues implement a FIFO (first in first out) policy. Elements are inserted at the rear and removed from the front using the enqueue and dequeue operations respectively.

The queue ADT supports

Simple circular-array implementation

Personal (irrelevant) rant:
I object to programming languages using the well known and widely used function name mod and changing its meaning. My math training prevents me from accepting that mod(-3,10) is -3. The correct value is 7. The book, following Java, defines mod(x,y) as x-floor(x/y)y. This is not mod but remainder. Think of x as dividend, y as divisor, and then floor(x/y) is the quotient. We remember from elementary school:
dividend = quotient * divisor + remainder
remainder = dividend - quotient * divisor
The last line is exactly the book's and Java's definition of mod.

My favorite high-level language, Ada, gets it right in the obvious way: Ada defines both mod and remainder (Ada extends the math definition of mod to the case where the second argument is negative).

This rant is irrelevant for the course since (I hope at least) the book will use mod(x,y) only when x≥0 and y>0, in which case mod and remainder are equal.
End of personal (irrelevant) rant
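For the curious, a quick Python check of the two conventions. The sketch assumes the quotient is truncated toward zero, which is how Java's integer division behaves; with that quotient the elementary-school identity yields -3. Python's own % operator happens to follow the mathematical definition when the second argument is positive, and math.fmod keeps the sign of the first argument.

```python
import math

x, y = -3, 10

# Quotient truncated toward zero (assumed, as in Java integer division).
quotient = math.trunc(x / y)        # -0.3 truncates to 0
remainder = x - quotient * y        # dividend - quotient * divisor

math_mod = x % y                    # Python's %: mathematical mod for y > 0
sign_of_x = math.fmod(x, y)         # result takes the sign of x

print(remainder, math_mod, sign_of_x)

# For x >= 0 and y > 0 the two conventions agree.
print(13 % 10, math.fmod(13, 10))
```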

Returning to relevant issues, we note that for queues we need front and rear "pointers" f and r. Since we are using arrays, f and r are actually indexes, not pointers. Calling the array Q, Q[f] is the front element of the queue, i.e., the element that would be returned by dequeue(). Similarly, Q[r] is the slot into which enqueue(e) would place e. There is one exception: if f=r, the queue is empty so Q[f] is not the front element.

Without writing the code, we see that f will be increased by each dequeue and r will be increased by every enqueue.

Assume Q has N slots Q[0]…Q[N-1] and the queue is initially empty with f=r=0. Now consider enqueue(1); dequeue(); enqueue(2); dequeue(); enqueue(3); dequeue(); …. There is never more than one element in the queue, but f and r keep growing, so after N enqueue(e);dequeue() pairs we cannot issue another operation.

The solution to this problem is to treat the array as circular, i.e., right after Q[N-1] we find Q[0]. The way to implement this is to arrange that when either f or r is N-1, adding 1 gives 0, not N. So the increment statements become
f←(f+1) mod N
r←(r+1) mod N
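A tiny sketch showing the wrap-around in action for the enqueue/dequeue pairs considered above. The value N=5 and the variable names are chosen only for illustration; the array contents are omitted since only the indexes matter here.

```python
N = 5
f = r = 0
r_trace = []                 # value of r after each enqueue

for _ in range(7):           # more pairs than there are slots
    r = (r + 1) % N          # enqueue(e) advances the rear index
    f = (f + 1) % N          # dequeue() advances the front index
    r_trace.append(r)

print(r_trace)               # r wraps from N-1 back to 0
print(f == r)                # the queue is empty after every pair
```

Both indexes stay in the range 0…N-1 no matter how many pairs are issued, so the queue never runs off the end of the array.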

Note: Recall that we had some grief due to our starting arrays and loops at 0. For example, the fifth slot of A is A[4] and the fifth iteration of "for i←0 to 30" occurs when i=4. The updates of f and r directly above show one of the advantages of starting at 0; they are less pretty if the array starts at 1.

The size() of the queue seems to be r-f, but this is not always correct since the array is circular. For example, let N=10 and consider an initially empty queue with f=r=0 that has
enqueue(10);enqueue(20);dequeue();enqueue(30);dequeue();enqueue(40);dequeue() applied. The queue has one element, f=3, and r=4. Now apply 6 more enqueue(e) operations
enqueue(50);enqueue(60);enqueue(70);enqueue(80);enqueue(90);enqueue(100). At this point the queue has 7 elements, f=3, and r=0. Clearly the size() of the queue is not r-f=-3. It is instead 7, the number of elements in the queue.

The problem is that r in some sense is 10, not 0, since there were 10 enqueue(e) operations. In fact, if we kept 2 values for f and 2 for r, namely the value before the mod and the value after, then size() would be rBeforeMod-fBeforeMod. Instead, we use the following inelegant formula.
size() = (r-f+N) mod N

Remark: If Java's definition of -3 mod 10 gave 7 (as it should) instead of -3, we could use the more attractive formula
size() = (r-f) mod N.
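The example can be replayed mechanically. This sketch tracks only the indexes, plus two counters that play the role of the "before mod" values; the counter names and the one-letter encoding of the operations are assumptions for illustration.

```python
N = 10
# E = enqueue, D = dequeue: the first seven operations, then six enqueues.
ops = "EEDEDED" + "E" * 6

f = r = 0
total_enqueues = total_dequeues = 0    # the "before mod" values

for op in ops:
    if op == "E":
        r = (r + 1) % N
        total_enqueues += 1
    else:
        f = (f + 1) % N
        total_dequeues += 1

size = (r - f + N) % N
print(f, r, r - f, size)
print(total_enqueues - total_dequeues)  # agrees with size
print((r - f) % N)                      # Python's % makes the shorter formula work
```

Because Python's % returns a nonnegative result for a positive second argument, the more attractive formula (r-f) mod N gives the right answer there.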

Since isEmpty() is simply an abbreviation for the test size()=0, it is just testing if r=f.

Algorithm front():
    if isEmpty() then
        signal an error // throw QueueEmptyException
    return Q[f]
Algorithm dequeue():
    if isEmpty() then
        signal an error // throw QueueEmptyException
    temp←Q[f]      // save the front element before clearing its slot
    Q[f]←NULL      // for security or debugging
    f←(f+1) mod N
    return temp
Algorithm enqueue(e):
    if size() = N-1 then
        signal an error // throw QueueFullException
    Q[r]←e         // place e in the slot at the rear
    r←(r+1) mod N
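For those who want to run it, here is a sketch of the circular-array queue in Python. The class name ArrayQueue and the snake_case method names are my choices for illustration; the exception names follow the comments in the pseudocode. Note that a queue with N slots holds at most N-1 elements, since f=r is reserved to mean empty.

```python
class QueueEmptyException(Exception):
    pass

class QueueFullException(Exception):
    pass

class ArrayQueue:
    """Circular-array queue with N slots and capacity N-1."""

    def __init__(self, N):
        self.N = N
        self.Q = [None] * N
        self.f = 0               # index of the front element
        self.r = 0               # index of the next free slot

    def size(self):
        return (self.r - self.f + self.N) % self.N

    def is_empty(self):
        return self.f == self.r

    def front(self):
        if self.is_empty():
            raise QueueEmptyException()
        return self.Q[self.f]

    def dequeue(self):
        if self.is_empty():
            raise QueueEmptyException()
        temp = self.Q[self.f]
        self.Q[self.f] = None    # for security or debugging
        self.f = (self.f + 1) % self.N
        return temp

    def enqueue(self, e):
        if self.size() == self.N - 1:
            raise QueueFullException()
        self.Q[self.r] = e
        self.r = (self.r + 1) % self.N

q = ArrayQueue(4)
for e in (1, 2, 3):
    q.enqueue(e)
first = q.dequeue()
q.enqueue(4)                     # r wraps around to slot 0
print(first, q.size(), q.front())
```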

Examples in OS

Round Robin processor scheduling is queue based, as is FIFO disk arm scheduling.

More general processor or disk arm scheduling policies often use priority queues (with various definitions of priority). We will learn how to implement priority queues later this chapter (section 2.4).

Homework: (You may refer to your 202 notes if you wish; mine are on-line based on my home page). How can you interpret Round Robin processor scheduling and FIFO disk scheduling as priority queues? That is, what is the priority? Same question for SJF (shortest job first) and SSTF (shortest seek time first).