Basic Algorithms

================ Start Lecture #14 ================

Remark: From robin simon
The last day for students to withdraw is Nov. 5th. Therefore the exam should be returned at least a week before then.

Chapter 3 Search Trees and Skip Lists

3.1 Ordered Dictionaries and Binary Search Trees

We just studied unordered dictionaries at the end of chapter 2. Now we want to extend the study to permit us to find the "next" and "previous" items. More precisely, we wish to support, in addition to findElement(k), insertItem(k,e), and removeElement(k), the following new methods:

closestKeyBefore(k): Return the key of the item with largest key less than or equal to k.
closestElemBefore(k): Return the element of the item with largest key less than or equal to k.
closestKeyAfter(k): Return the key of the item with smallest key greater than or equal to k.
closestElemAfter(k): Return the element of the item with smallest key greater than or equal to k.

We naturally signal an exception if no such item exists. For example, if the only keys present are 55, 22, 77, and 88, then closestKeyAfter(90) and closestElemBefore(2) each signal an exception.
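To make the "less than or equal" and "greater than or equal" semantics concrete, here is a minimal Python sketch. It assumes the keys are already held in a sorted Python list and uses the standard bisect module; the function names and the NoSuchKey exception are hypothetical, chosen only for this illustration.

```python
from bisect import bisect_left, bisect_right


class NoSuchKey(Exception):
    """Signals that no item satisfies the query (hypothetical name)."""


def closest_key_before(keys, k):
    """Return the largest key <= k from the sorted list keys."""
    i = bisect_right(keys, k)      # i = number of keys that are <= k
    if i == 0:
        raise NoSuchKey(k)         # every key is larger than k
    return keys[i - 1]


def closest_key_after(keys, k):
    """Return the smallest key >= k from the sorted list keys."""
    i = bisect_left(keys, k)       # i = number of keys that are < k
    if i == len(keys):
        raise NoSuchKey(k)         # every key is smaller than k
    return keys[i]


keys = sorted([55, 22, 77, 88])
print(closest_key_before(keys, 60))   # 55
print(closest_key_after(keys, 60))    # 77
print(closest_key_after(keys, 77))    # 77 (equality is allowed)
```

With these keys, closest_key_after(keys, 90) and closest_key_before(keys, 2) both raise NoSuchKey, matching the example above.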

We begin with the most natural implementation.

3.1.1 Sorted Tables

We use the sorted vector implementation from chapter 2 (we used it as a simple implementation of a priority queue). Recall that this keeps the items sorted in key order. Hence it is O(n) for inserts and removals, which is slow; however, we shall see that it is fast for finding an element and for the four new methods closestKeyBefore(k) and friends. We call this a lookup table.
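The O(n) cost of inserts and removals comes from shifting items to keep the vector sorted. The following Python sketch illustrates this under some assumptions: the class name LookupTable, the two parallel lists, and the use of KeyError for a missing key are all hypothetical choices for this example, not the chapter 2 implementation.

```python
from bisect import bisect_left


class LookupTable:
    """Sorted-vector dictionary sketch (hypothetical names throughout)."""

    def __init__(self):
        self._keys = []    # keys in increasing order
        self._elems = []   # _elems[r] is the element stored at rank r

    def insert_item(self, k, e):
        r = bisect_left(self._keys, k)   # finding the rank is fast
        self._keys.insert(r, k)          # O(n): later items shift right
        self._elems.insert(r, e)

    def remove_element(self, k):
        r = bisect_left(self._keys, k)
        if r == len(self._keys) or self._keys[r] != k:
            raise KeyError(k)            # assumption: signal a missing key
        self._keys.pop(r)                # O(n): later items shift left
        return self._elems.pop(r)


table = LookupTable()
for k in (55, 22, 77, 88):
    table.insert_item(k, "elem%d" % k)
print(table._keys)                 # [22, 55, 77, 88]
print(table.remove_element(55))    # elem55
print(table._keys)                 # [22, 77, 88]
```

Locating the rank is cheap in both methods; the shifting done by list.insert and list.pop is what makes each operation linear in the worst case.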

The space required is Θ(n) since we grow and shrink the array supporting the vector (see extendable arrays).

As indicated the key favorable property of a lookup table is that it is fast for (surprise) lookups using the binary search algorithm that we study next.

Binary Search

In this algorithm we search for the rank of the item containing a key equal to k and return the element stored at that rank. We return a special value, NO_SUCH_KEY, if no such key is found.

The algorithm maintains two variables lo and hi, which are respectively lower and upper bounds on the rank where k will be found (assuming it is present).

Initially, the key could be anywhere in the vector so we start with lo=0 and hi=n-1. We write key(r) for the key at rank r and elem(r) for the element at rank r.

We then find mid, the rank (approximately) halfway between lo and hi and see how the key there compares with our desired key.

If k = key(mid), we have found the item and return elem(mid).
If k < key(mid), then we restrict our attention to ranks less than mid.
If k > key(mid), then we restrict our attention to ranks greater than mid.

Some care is needed in writing the algorithm precisely, as it is easy to make an ``off by one error''. We must also handle the case in which the desired key is not present in the vector. This occurs when the search range has been reduced to the empty set (i.e., when lo exceeds hi).

Algorithm BinarySearch(S,k,lo,hi):
    Input:  An ordered vector S containing (key(r),elem(r)) at rank r
            A search key k
            Integers lo and hi
    Output: An element of S with key k and rank between lo and hi.
            NO_SUCH_KEY if no such element exists

if lo > hi then
    return NO_SUCH_KEY                    // Not present

mid ← ⌊(lo+hi)/2⌋
if k = key(mid) then
    return elem(mid)                     // Found it
if k < key(mid) then
    return BinarySearch(S,k,lo,mid-1)    // Try bottom ``half''
if k > key(mid) then
    return BinarySearch(S,k,mid+1,hi)    // Try top ``half''
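
The pseudocode above translates almost line for line into Python. This is a sketch under some assumptions: S is represented as a Python list of (key, element) pairs sorted by key, and NO_SUCH_KEY is modeled by a unique sentinel object; both choices are illustrative only.

```python
NO_SUCH_KEY = object()   # sentinel standing in for the special value


def binary_search(S, k, lo, hi):
    """Return the element with key k among ranks lo..hi of S.

    S is a list of (key, element) pairs sorted by key.
    Returns NO_SUCH_KEY if no item in that range has key k.
    """
    if lo > hi:
        return NO_SUCH_KEY                        # empty range: not present
    mid = (lo + hi) // 2                          # floor of (lo+hi)/2
    key, elem = S[mid]
    if k == key:
        return elem                               # found it
    if k < key:
        return binary_search(S, k, lo, mid - 1)   # try bottom "half"
    return binary_search(S, k, mid + 1, hi)       # try top "half"


S = [(22, "a"), (55, "b"), (77, "c"), (88, "d")]
print(binary_search(S, 77, 0, len(S) - 1))                 # c
print(binary_search(S, 60, 0, len(S) - 1) is NO_SUCH_KEY)  # True
```

The search for 60 examines ranks 1 and 2 and then recurses with lo=2 and hi=1, which is the empty range, so NO_SUCH_KEY is returned.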

Do some examples on the board.

Analysis of Binary Search

It is easy to see that the algorithm does just a few operations per recursive call, so the complexity of Binary Search is Θ(NumberOfRecursions). The question then is "How many recursions are possible for a lookup table with n items?"

The number of eligible ranks (i.e., the size of the range we still must consider) is hi-lo+1.

The key insight is that when we recurse, we have reduced the range to at most half of what it was before. There are two possibilities: we tried either the bottom or the top ``half''. Let's evaluate hi-lo+1 for the bottom and top half. Note that the only two possibilities for ⌊(lo+hi)/2⌋ are (lo+hi)/2 or (lo+hi)/2-(1/2)=(lo+hi-1)/2.

Bottom: (mid-1)-lo+1 = mid-lo = ⌊(lo+hi)/2⌋-lo ≤ (lo+hi)/2-lo = (hi-lo)/2<(hi-lo+1)/2

Top: hi-(mid+1)+1 = hi-mid = hi-⌊(lo+hi)/2⌋ ≤ hi-(lo+hi-1)/2 = (hi-lo+1)/2

So the range starts at size n, is at least halved on each recursive call, and remains an integer (i.e., if a recursive call has a range of size x, the range in the next recursive call has size at most ⌊x/2⌋).
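
As a sanity check on the two bounds, the short Python sketch below computes the bottom and top range sizes for every pair lo ≤ hi in a small window and asserts that neither exceeds ⌊x/2⌋, where x = hi-lo+1. The helper name subrange_sizes and the window of 50 ranks are arbitrary choices for this illustration; the check supplements the argument above and does not replace it.

```python
def subrange_sizes(lo, hi):
    """Sizes of the bottom and top ranges that binary search may recurse on."""
    mid = (lo + hi) // 2
    bottom = (mid - 1) - lo + 1    # ranks lo .. mid-1
    top = hi - (mid + 1) + 1       # ranks mid+1 .. hi
    return bottom, top


for lo in range(50):
    for hi in range(lo, 50):
        x = hi - lo + 1            # size of the current range
        bottom, top = subrange_sizes(lo, hi)
        assert bottom <= x // 2 and top <= x // 2

print(subrange_sizes(0, 9))        # (4, 5): a range of size 10 shrinks to at most 5
```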