Homework 6 Solutions

R-7.2
R-7.6
R-7.9
R-7.11
R-7.12
C-7.2
C-7.5
C-7.10
R-8.3
R-8.5
C-8.16
Adjacency Lists
R-9.3
R-9.5
Proof by Contradiction
Proof by Induction


R-7.2

Insert 30, 40, 24, 58, 48, 26, 11, 13, in that order. (The tree diagram that followed each insertion is omitted.)


R-7.6



R-7.9

The number of nodes for keys 48 and 24 (that is, the number of levels each occupies) will depend on the coin flips. I got tails the first time for 48, and one heads and then tails for 24.


R-7.11



R-7.12

Linear probing: (resulting hash table omitted). Quadratic probing: (resulting hash table omitted).


C-7.2

For any item x, let rank(x) be the number of items in S or T that are less than or equal to x. We are assuming no duplicates, so the ranks of all the items in S and T are distinct, and within each ordered array they are monotonically increasing (i.e., if i < j then rank(S[i]) < rank(S[j]), and similarly for T). Therefore, to find the item with rank k, which would be the kth smallest, we can do a binary search on the items of S, looking for one with rank k. If one is found, great; if not, do the same on the items of T. The kth smallest must be in one array or the other.

So we can find the kth smallest by doing (at most) two binary searches on the ranks. Each search visits O(log n) items. Whenever an item is visited, we must determine its rank. This can be found by doing a binary search on the other array. (Here we are searching for an actual item, not a rank.) For example, suppose we want the rank of T[i]. Let S[j] be the largest item in S that is smaller than T[i]. (S[j] can be found with a binary search.) Then the rank of T[i] is i + j, assuming 1-based indexing: exactly i items of T and j items of S are less than or equal to T[i].

So each "visit" requires a binary search, which takes O(log n) time. Thus, we have O(log n) visits and O(log n) time per visit, which gives a time complexity for the whole algorithm of O(log2n).

Here is some pseudo code for just the part where we do a binary search on the ranks of S:

min := 0;
max := n-1;
while (max >= min) do begin
    i := min + floor((max - min)/2);
    r := rank(S[i]);
    if (r = k) then
        ANSWER S[i];  // we're done
    else if (r < k) then
        min := i+1;
    else // must be that r > k
        max := i-1;
end;
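
For concreteness, here is a minimal Python sketch of the whole algorithm, including the rank computation. The function names are illustrative and 0-based lists are assumed, so the ranks are computed as described above with the off-by-one adjusted.

import bisect

def kth_smallest(S, T, k):
    """Return the k-th smallest (1-based k) item in the union of the sorted
    lists S and T, assuming no duplicate values across the two lists."""
    def rank(A, B, i):
        # 1-based rank of A[i]: the i+1 items A[0..i], plus the number of
        # items in B smaller than A[i], found by an inner binary search.
        return (i + 1) + bisect.bisect_left(B, A[i])

    def search(A, B):
        # Binary search over the (strictly increasing) ranks of A's items.
        lo, hi = 0, len(A) - 1
        while lo <= hi:
            mid = lo + (hi - lo) // 2
            r = rank(A, B, mid)
            if r == k:
                return A[mid]
            elif r < k:
                lo = mid + 1
            else:
                hi = mid - 1
        return None  # the item of rank k is not in A

    result = search(S, T)
    return result if result is not None else search(T, S)

print(kth_smallest([1, 4, 9], [2, 3, 10], 4))  # -> 4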


C-7.5

Here is the algorithm in pseudo-code. It is similar to TreeSearch, except that we cannot stop when we find a node with the right key, because there might be more nodes with that key. I have written a second subroutine with the same name but different parameter list for the recursion. Notice that when we find a node with the right key we must search both its children, since both of the subtrees may also contain nodes with that key.
findAllElements(k):
    create empty List;
    findAllElements(k, root, List);
    return List;

findAllElements(k, v, List):
    if v is an external node then
        return;

    if k = key(v) then begin
        List.addElement(v);
        findAllElements(k, T.leftChild(v), List);
        findAllElements(k, T.rightChild(v), List);
    end;
    else if k < key(v) then
        findAllElements(k, T.leftChild(v), List);
    else
        findAllElements(k, T.rightChild(v), List);
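
For concreteness, here is a runnable Python sketch of the same search. The node fields (key, left, right, with None standing in for an external node) are assumptions for illustration, not from the text.

def find_all_elements(k, v, out):
    """Append to out every node in the subtree rooted at v whose key is k."""
    if v is None:              # external node: nothing below
        return
    if k == v.key:
        out.append(v)
        find_all_elements(k, v.left, out)   # matches may be on either side
        find_all_elements(k, v.right, out)
    elif k < v.key:
        find_all_elements(k, v.left, out)
    else:
        find_all_elements(k, v.right, out)
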
To show that the time complexity is O(h + s), we must bound the number of visited nodes that do not have key k by O(h).
claim: The algorithm will trace no more than two paths, from the root to the leaves, containing nodes with keys not equal to k.

proof of claim: Notice that the path being traced splits in two only when a node with key k is encountered. If one or both of that node's children have key k, or if one or both of the children is a leaf, then we have not increased the number of paths containing non-k keys. So we need to show that the algorithm will encounter a node with key k, both of whose children have non-k keys, at most once. (In fact, this can only be the first node encountered with key k.)

Suppose the algorithm encounters a node, x, with key k, and it is not the first such node. Then it must be a descendant of another node, y, with key k. Node x is either in the left or right subtree of y. Suppose it is in the left subtree (a similar argument applies if it is in the right subtree). Since the tree is a binary search tree, all keys in the left subtree of y must be less than or equal to k, and all keys in the right subtree of x must be greater than or equal to k. Since the right child of x is in both, it must have key equal to k. Hence, x has at most one child with key not equal to k, so traversing x cannot increase the number of paths with non-k keys, and the claim is proven.

Since the algorithm traces at most two paths containing non-k nodes, and each such path can have length no more than h, the number of nodes visited, and the time complexity of the algorithm, must be O(h + s).


C-7.10

As suggested in the hint, we will store a value at each node, called "size", which will give the number of nodes in the subtree rooted at that node (including the node itself). The insertion and deletion algorithms will have to be modified to make sure the sizes are maintained properly. Basically, from the insertion or deletion point, we have to walk up parent pointers to the root of the tree, resetting the size values along the way. When a rotation is performed, care must be taken to properly set the size of each of the three nodes involved in the rotation. (You can work out the details; it's not hard. The size of a node v, v.size, is always 1 + T.leftChild(v).size + T.rightChild(v).size.) The amount of additional work for each rotation is constant, and the number of nodes that will need to have their size changed is O(log n), so insertion and deletion can still be done in O(log n) time.
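
For illustration, here is how the size field might be recomputed in Python; the field names (left, right, size) are assumed, and after a rotation this would be called on each rotated node from the bottom up (children before parent).

def update_size(v):
    # v.size = 1 + size of left subtree + size of right subtree,
    # where an absent (None) child contributes 0.
    left_size = v.left.size if v.left else 0
    right_size = v.right.size if v.right else 0
    v.size = 1 + left_size + right_size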

Since the AVL tree is also a binary search tree, we can deduce something about the range of values stored in a subtree from the values in the ancestors of the root of the subtree. For example, starting at the root, we have no information, so we can just say that the keys in the tree are in the range [-infinity, +infinity]. But if the root stores key k, then the nodes in the left subtree of the root must be in the range [-infinity, k], and those in the right subtree must be in [k, +infinity]. If the left child stores key k', then the nodes in the right subtree of the left child of the root must have keys in the range [k', k], etc.

We can do a modified depth first traversal of the tree, keeping track of the range boundaries of each subtree as we descend. Suppose we are looking for the number of keys in the range [k1, k2]. If we come to a subtree whose boundary range does not intersect [k1, k2], then we can ignore that subtree. If we come to a subtree whose boundary range is a subset of [k1, k2], then we can add to our running total the size of the root of that subtree, and that subtree also does not need to be traversed. If the boundary range of the subtree intersects, but is not a subset of, [k1, k2], then the subtree needs to be traversed to see how many nodes are in [k1, k2].

Here is some pseudo-code. There is a second subroutine with the same name but different parameters to implement the recursion. (In this pseudo-code, ranges are treated as simple variables.)

countAllInRange(k1, k2)
    return countAllInRange(T.root, [k1, k2], [-infinity, +infinity]);

countAllInRange(v, [k1,k2], [r1,r2])
    // returns the number of keys in subtree rooted at v that
    // are in [k1,k2]
    // [r1,r2] is the boundary range for the subtree rooted at v
    // (all keys in this subtree are known to be in [r1,r2])
    if v is an external node then
        return 0;
    if [r1,r2] is a subset of [k1,k2] then
        return v.size;
    else if [r1,r2] does not intersect [k1,k2] then
        return 0;
    else begin
        if v.key is in [k1,k2] then
            count := 1;
        else
            count := 0;

        count := count + countAllInRange(T.leftChild(v), [k1,k2], [r1,v.key]);
        count := count + countAllInRange(T.rightChild(v), [k1,k2], [v.key,r2]);

        return count;
    end;
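
For concreteness, here is a runnable Python sketch of the same procedure. The node fields (key, size, left, right, with None as an external node) are assumptions for illustration.

import math

def count_all_in_range(v, k1, k2, r1=-math.inf, r2=math.inf):
    """Count the keys in [k1, k2] in the subtree rooted at v, all of whose
    keys are known to lie in the boundary range [r1, r2]."""
    if v is None:                      # external node: nothing to count
        return 0
    if k1 <= r1 and r2 <= k2:          # [r1, r2] is a subset of [k1, k2]
        return v.size
    if r2 < k1 or k2 < r1:             # [r1, r2] does not intersect [k1, k2]
        return 0
    count = 1 if k1 <= v.key <= k2 else 0
    count += count_all_in_range(v.left, k1, k2, r1, v.key)
    count += count_all_in_range(v.right, k1, k2, v.key, r2)
    return count
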
To show that the algorithm has time complexity O(log n), we must show that the traversal does not branch out "too much". Notice that the traversal only branches at a node if the current boundary range, [ri, rj], intersects [k1, k2] but is not contained in it; this can happen only if [ri, rj] contains k1 or k2. But furthermore notice that, at any level of the tree, no two boundary ranges associated with nodes at that level can overlap, except possibly at their endpoints. Therefore, at any level of the tree, at most two ranges can cause a branch: the one containing k1 and the one containing k2.

Therefore, the algorithm can only traverse two complete paths from root to leaves, which represents O(log n) operations (since it is an AVL tree). At each node on these paths there may be a branch off, but that branch terminates after one node (its range is either contained in [k1, k2] or disjoint from it), so all the side branches represent only O(log n) more operations. So the algorithm has time complexity O(log n).


R-8.3

Prove by induction that mergesort is O(n log n):

If n = 1 then zero operations are needed, and 0 = 1 log 1.

Suppose n > 1, and that for all m < n, mergesort of a list of m items takes O(m log m) time. Let T(n) be the running time as a function of n. Since the merge step is O(n),
T(n) = 2T(n/2) + O(n)
     <= 2T(n/2) + C1 n
for some constant C1. But, by the induction hypothesis,
T(n/2) <= C2 (n/2) log(n/2)
       = C3 n log n - C3 n
for some constant C2, where C3 = C2/2 (using log(n/2) = log n - 1). Therefore,
T(n) <= 2[C3 n log n - C3 n] + C1 n
     = C4 n log n - C4 n + C1 n
     <= C4 n log n + C5 n
for constants C4 = 2C3 and C5 = C1. But this means that T(n) is O(n log n).
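
For reference, here is a minimal Python mergesort matching the recurrence above: two recursive calls on the halves (the 2T(n/2) term) plus a linear-time merge (the O(n) term). The code is illustrative, not from the text.

def mergesort(a):
    if len(a) <= 1:                 # base case: n = 1 needs no work
        return a
    mid = len(a) // 2
    left, right = mergesort(a[:mid]), mergesort(a[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):   # O(n) merge step
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]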


R-8.5

Big-theta(n log n).

Every sublist will be split exactly in half. But we still need to do O(n) work at each of the O(log n) levels of the recursion tree, so this is the best that can be done.


C-8.16

Here is the algorithm in pseudo-code:
Sort(A); Sort(B); // use an O(n log n) sort algorithm
i := 0;
j := n-1;
while i < n and j >= 0 do begin
    while j >= 0 and A[i] + B[j] > m do
        j := j - 1;
    if j >= 0 and A[i] + B[j] = m then
        ANSWER (A[i], B[j]); // we're done
    else
        i := i + 1;
end;
ANSWER "none";
Proof of time complexity: We do a constant amount of work every time i or j is incremented or decremented, and this happens at most 2n times, so the loop is O(n). Sorting takes O(n log n), so the whole algorithm is O(n log n).
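
Here is a minimal Python sketch of the same two-pointer scan, written as a single loop that either advances i or retreats j at each step; the names are illustrative.

def find_pair_with_sum(A, B, m):
    """Return a pair (a, b) with a in A, b in B, and a + b = m, or None."""
    A, B = sorted(A), sorted(B)     # O(n log n)
    i, j = 0, len(B) - 1
    while i < len(A) and j >= 0:    # at most 2n iterations in total
        s = A[i] + B[j]
        if s == m:
            return (A[i], B[j])
        if s > m:
            j -= 1                  # sum too big: move to a smaller B item
        else:
            i += 1                  # sum too small: move to a larger A item
    return None

print(find_pair_with_sum([1, 8, 3], [4, 2, 6], 9))  # -> (3, 6)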


Adjacency Lists

Show that the space usage of the adjacency list representation of graphs has complexity big-Theta of m+n.

For each of the m edges, there are two references to it across all the incidence containers (adjacency lists): edge (u, v) appears once in u's list and once in v's list. This requires big-Theta(m) space.

Other than that, each of the m edges and n vertices requires constant, nonzero space, so the total space usage is big-Theta(m + n).
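
To make the counting concrete, here is a tiny Python sketch (assumed representation: vertices numbered 0 to n-1). The lists hold 2m references in total, one per edge endpoint, plus n list headers, giving big-Theta(m + n) space.

def build_adjacency_lists(n, edges):
    adj = [[] for _ in range(n)]    # n list headers: Theta(n) space
    for u, v in edges:              # each of the m edges adds two entries
        adj[u].append(v)
        adj[v].append(u)
    return adj                      # 2m entries in all: Theta(m + n) total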


R-9.3


An Euler tour could visit the nodes in the following order:

1, 3, 5, 7, 1, 2, 4, 6, 8, 2, 3, 4, 5, 6, 7, 8, 1.


R-9.5

There are several possible sequences; this is one of them. See the algorithm for topological sort on page 384.

15, 22, 16, 31, 127, 141, 32, 169, 126.


Proof by Contradiction

Prove by contradiction: if (u, v) is a back-edge in the DFS traversal of an undirected graph G, then v is an ancestor of u.

Suppose that u is the current node being explored and that (u, v) is a back-edge, so v has already been visited. Suppose, for the sake of contradiction, that v is not an ancestor of u. Then v is not on the path from the root to u, which means that v is not on the current recursion stack. But then the exploration of v has already finished, which means that every edge incident on v was traversed during that exploration, including (u, v). This is a contradiction, since the edge (u, v) is only now being traversed.

Therefore, v must be an ancestor of u.


Proof by Induction

Let G be an undirected graph where each node u has an even degree. Prove by induction on the number m (of edges) that G has an Euler Tour (a path that goes through each edge of G exactly once).

(We must assume that G is connected, since that is the only way it can have an Euler tour.)

The claim is trivially true when the number of edges is zero, and vacuously true when it is one, since no graph with exactly one edge has every node of even degree. It is also clearly true when the number of edges is two, since there is only one such graph.

Now suppose m > 1, and that the claim is true for every graph with fewer than m edges.

Let G be a graph with m edges, m > 1, each of whose nodes has even degree. Pick a node x in G and, starting from x, traverse edges, never reusing an edge, until a node is reached that has no untraversed incident edges left.

claim: this node must be x.

proof of claim: Each time the path passes through a node, it uses up two of that node's incident edges (one to enter and one to leave). Since each node has even degree, whenever we enter a node by traversing an edge, there must be another untraversed edge to leave it by. But we started at x without entering it through an edge, so x is the only node we could enter by an edge and not have an untraversed edge available to leave by.

Let P be the path so traversed. Form a new graph, G', by removing all edges of P from G.

Each node in G' must still have even degree: P is a closed path, so for each node u, the removed edges incident on u pair up (each time P enters u it also leaves u), and an even number of edges was removed at u. So G' consists of one or more connected components, each of which has fewer than m edges and all nodes of even degree. By the induction hypothesis, each connected component of G' has an Euler tour.

Now we can construct an Euler tour on G by traversing the path P, except that whenever P first enters one of the connected components of G', we traverse the Euler tour of that component, and then continue traversing P.

Hence, by induction, any connected graph with even-degree nodes has an Euler tour.
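
The proof is constructive, and the construction is essentially Hierholzer's algorithm. Here is a Python sketch of it; the dict-of-sets graph format and the names are assumptions for illustration (using sets rules out parallel edges).

def euler_tour(adj, start):
    """adj maps each node to the set of its neighbors (undirected, connected,
    every node of even degree). Returns an Euler tour as a list of nodes."""
    adj = {u: set(vs) for u, vs in adj.items()}   # a copy we can consume
    stack, tour = [start], []
    while stack:
        u = stack[-1]
        if adj[u]:                  # an untraversed edge leaves u
            v = adj[u].pop()        # traverse (u, v), removing it
            adj[v].remove(u)        # from both endpoints
            stack.append(v)
        else:                       # stuck at u: splice the closed path in
            tour.append(stack.pop())
    return tour

# Two triangles sharing node 1; every node has even degree.
adj = {1: {2, 3, 4, 5}, 2: {1, 3}, 3: {1, 2}, 4: {1, 5}, 5: {1, 4}}
print(euler_tour(adj, 1))   # one valid tour, e.g. [1, 3, 2, 1, 5, 4, 1]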