Basic Algorithms: Lecture 18

================ Start Lecture #18 ================

Here are the three pictures for the remaining three possibilities. That is, the other double rotation and both single rotations. The original configuration is shown on top and the result after the rotation is shown immediately below.

Homework: R-3.3, R-3.4, R-3.5

What is the complexity of insertion?

  1. Let n be the number of nodes in the tree before the insertion.

  2. Finding the insertion point is Θ(log n).

  3. Expanding the leaf and inserting the item is Θ(1).

  4. Walking up the tree looking for an imbalance is Θ(1) per level, which is O(log n) since the tree has height Θ(log n).

  5. Performing the one needed rotation is Θ(1).

Hence we have the following

Theorem: The complexity of insertion in an AVL tree is Θ(log n).
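
To make this cost accounting concrete, here is a minimal sketch in Java of the walk up the tree. The Node class, the height helper, and the rotate placeholder (standing for whichever of the four rotations applies) are assumptions of mine for illustration, not code from the text.

    // A sketch of the post-insertion walk; not the book's code.
    class AVLInsertSketch {
        static class Node {
            int key, height;            // a leaf (external node) has height 0
            Node left, right, parent;
        }

        static int height(Node v) { return v == null ? 0 : v.height; }

        // Placeholder: would perform whichever single or double rotation
        // applies (see the pictures above) and return the new subtree root.
        // The body is omitted here.
        static Node rotate(Node v) { return v; }

        // After expanding a leaf and inserting the item, walk toward the
        // root: Theta(1) work per level, O(log n) levels, at most one rotation.
        static void rebalanceAfterInsert(Node v) {
            while (v != null) {
                v.height = 1 + Math.max(height(v.left), height(v.right));
                int balance = height(v.left) - height(v.right);
                if (balance < -1 || balance > 1) {  // children's heights differ by 2
                    rotate(v);   // the one needed rotation: Theta(1)
                    return;      // an insertion never needs a second rotation
                }
                v = v.parent;
            }
        }
    }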

Problem Set 3, problem 3. Please read the entire problem before beginning.

  A. Draw an AVL tree containing items with integer keys. Draw the internal nodes as circles and write the key inside the circle. Draw the leaves as squares; leaves do not contain items. You will want to read the remaining parts before drawing this tree.
  B. Choose a key not present in the tree you drew in part A whose insertion will require the "other" double rotation in order to restore balance (i.e., the double rotation in the diagram above that shows one double and two single rotations). Draw the tree after the insertion, but prior to the rotation. Then draw the tree after the rotation.
  C. Choose a key not present in the tree you drew in part A whose insertion will require a single rotation in order to restore balance. Draw the tree after the insertion, but prior to the rotation. Then draw the tree after the rotation.
  D. Choose a key not present in the tree you drew in part A whose insertion will require the "other" single rotation in order to restore balance. Draw the tree after the insertion, but prior to the rotation. Then draw the tree after the rotation.

Removing an Item from an AVL Tree

In order to remove an item with key k, we begin just as we did for an ordinary binary search tree: we search for the item, remove it, and repair the tree structure. Then we must restore the height-balanced property as we did when inserting into an AVL tree. Once again rotations are the key, but there is an extra twist this time. The details follow.

  1. Search the AVL tree for the key to be removed.

  2. Presumably the search succeeds in finding the key at an internal node. If not, the key is not present and we signal an error.

  3. Call this internal node w.

  4. Returning the element with the desired key is simple; it is the element at w.

  5. We need to actually remove w, but we cannot leave a hole. We described the procedure previously when discussing removal in a general binary search tree. I repeat the discussion here and enhance the pictures since we must also ensure that the resulting tree is balanced (i.e., AVL). There are, you recall, three cases.

    1. The trivial case: If we are lucky both of w's children are leaves. Then we can simply replace w with a leaf. (Recall that leaves do not contain items.) Note that this procedure is the reverse of how we insert an item.

    2. The easy case: Assume one child of w is a leaf and the other, call it z, is an internal node. In this case we can simply replace w by z; that is, have the parent of w now point to z. This removes w as desired and also removes the leaf child of w, which is OK since leaves do not contain items.

      Note that the above two cases can be considered the same. We notice that one child of w is a leaf and replace w by the other child (and its descendants, if any).

    3. The difficult case: Both children of w are internal nodes. What we will do is replace the item in w with the item that has the next highest key.

      • First we must find the item y with the next highest key. We already solved this when we implemented insertions: The node y we seek is the next internal node after w in an inorder traversal. Concretely, since both children of w are internal, we go to w's right child and then follow left children until the next step would reach a leaf.

      • Store the item in y in the node w. This removes the old item of w, which we wanted to do.
        • Does replacing the item in w by the item formerly in y still result in a binary search tree? That is, are parents still bigger than (or equal to if we permit duplicate keys) all of the left subtree and smaller than all of the right subtree?
        • Yes. The only new parent is the item from y, which has now moved to node w. But this item is the one immediately after the old item in w. Since it came from the right subtree of w, it is bigger than everything in the left subtree, and since it was the smallest item in the right subtree, it is smaller than all remaining items in that subtree.

      • But what about the old node y? Its left child is a leaf, so it is the easy or trivial case and we just replace y by the other child and its descendants.

  6. We have now successfully removed the item in w and repaired the structure so that we again have a binary search tree. However, we may need to restore balance, i.e., re-establish the AVL property. The possible trouble is that, in each of the three cases (trivial, easy, or difficult) the light green node on the left has been replaced by the light blue node on the right, which is of height one less. This might cause a problem.

  7. Since the tree was balanced before the removal, the sibling of the light green (shown in purple) had height equal to, one less than, or one greater than, the height of the light green.

    1. If the purple height was equal to the light green, it is now one greater than the light blue; this is still in balance. Since the parent (red) has not changed height, all is well. This is the good case; we are done.

    2. If the purple height was one less than the light green, it is now equal to the light blue; this again remains in balance. But now red's height has dropped by one and might be out of balance. This is the unknown case; we turn our attention to the parent (red) and redo the balance check.

    3. If the purple height was one greater than the light green, it is now two greater than the light blue so we are out of balance. This is the bad case; we must re-balance using a rotation.

  8. In the good case we are done; in the unknown case we proceed up the tree; and in the bad case we rotate. This sounds the same as for insertion and indeed it is similar, but there is an important difference.

  9. One similarity is that if we proceed up the tree and reach the root, we are done (there is then no sibling; so we can't be out of balance).

  10. The more important similarity is that the rotations needed are the same four we used for insertions, two double-rotations and two single-rotations. The reason we can use the same rotations for deletions as we used for insertions is that we are solving the same problem (one sibling two higher than the other) and, to paraphrase the famous physicist and lecturer Richard Feynman, “The same problem has the same solution”.

  11. The important difference is the following: after an insertion, the one needed rotation restores the rotated subtree to the height it had before the insertion, so no ancestor can become unbalanced and a single rotation always suffices. After a removal, however, the rotation re-balances the subtree but can leave it one shorter than it was before the removal, so an ancestor may now be out of balance and need a rotation of its own (a code sketch follows this list).
  12. This second rotation occurs at a point further up the tree from the original rotation. Hence even though this second rotation can cause a third rotation, each subsequent rotation is closer to the root and at the root the process must stop.
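
Here is the promised sketch, a removal version of the walk added to the sketch class above (same assumed Node class and rotate placeholder; again my illustration, not the book's code). The only change from the insertion walk is that the loop does not stop after a rotation, because the rotated subtree may now be one shorter than before.

    // A rotation can shorten the rotated subtree by one, possibly
    // unbalancing an ancestor, so we continue all the way to the root:
    // at most one Theta(1) rotation per level, O(log n) levels in all.
    static void rebalanceAfterRemove(Node v) {
        while (v != null) {
            v.height = 1 + Math.max(height(v.left), height(v.right));
            int balance = height(v.left) - height(v.right);
            if (balance < -1 || balance > 1)
                v = rotate(v);   // re-balance here, then keep walking up
            v = v.parent;
        }
    }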

What is the complexity of a removal? Remember that the height of an AVL tree is Θ(log(N)), where N is the number of nodes.

  1. We must find a node with the key, which has complexity Θ(height) = Θ(log(N)).
  2. We must remove the item: Θ(1).
  3. We must re-balance the tree, which might require a rotation at each level above the removal point, i.e., O(height) rotations. Since each rotation is Θ(1), the re-balancing costs O(height)*Θ(1) = O(log(N)).

Theorem: The complexity of removal for an AVL tree is logarithmic in the size of the tree.

Homework: R-3.6

Problem Set 3 problem 4 (end of problem set 3). Please read the entire problem before beginning.

  A. Draw an AVL tree containing items with integer keys. Draw the internal nodes as circles and write the key inside the circle. Draw the leaves as squares; leaves do not contain items. You will want to read the remaining parts before drawing this tree.
  B. Choose a key present in the tree you drew in part A whose removal will require a double rotation and a single rotation in order to restore balance. Draw the tree after the removal, but prior to the rotations. Then draw the tree after the double rotation, but prior to the single rotation. Finally, draw the tree after both rotations.

3.2.2 Performance

The news is good. Searching, inserting, and removing all have logarithmic complexity.

The three operations all involve a sweep down the tree searching for a key, and possibly an upward phase in which heights are adjusted and rotations are performed. Since only a constant amount of work is performed per level and the height is logarithmic, the complexity is logarithmic.

3.3 Bounded-Depth Search Trees (skipped)

3.4 Splay Trees (skipped)

3.5 Skip Lists (skipped)

3.6 Java Example: AVL and Red-Black Trees (skipped)

Chapter 4 Sorting, Sets, and Selection

We already saw a sorting technique in chapter 2: we inserted the items into a priority queue and then repeatedly removed the minimum. When we use a heap to implement the priority queue, the resulting sort is called heap-sort and is asymptotically optimal. That is, its complexity of O(N log(N)) is as fast as possible if we only use comparisons (proved in 4.2 below).
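
As a quick illustration of priority-queue sorting, here is a small Java example. It uses java.util.PriorityQueue, which happens to be heap-based, rather than the book's own priority queue interface; the class name is mine.

    import java.util.Arrays;
    import java.util.PriorityQueue;

    class PQSortDemo {
        public static void main(String[] args) {
            int[] a = {22, 55, 33, 44, 11};
            PriorityQueue<Integer> pq = new PriorityQueue<>(); // a binary heap
            for (int x : a) pq.add(x);          // N insertions, O(log N) each
            for (int i = 0; i < a.length; i++)
                a[i] = pq.remove();             // N remove-mins, O(log N) each
            System.out.println(Arrays.toString(a)); // [11, 22, 33, 44, 55]
        }
    }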

4.1 Merge-Sort

4.1.1 Divide-and-Conquer

The idea is that if you divide an enemy into small pieces, each piece, and hence the enemy, can be conquered. When applied to computer problems, divide-and-conquer involves three steps.

  1. Divide the problem into smaller subproblems.
  2. Solve each of the subproblems, normally via a recursive call to the original procedure.
  3. Combine the subproblem solutions into a solution for the original problem.

In order to prevent an infinite sequence of recursions, we need to define a stopping condition, i.e., a predicate that informs us when to stop dividing (because the problem is small enough to solve directly).
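
As a tiny concrete instance of the three steps and the stopping condition, here is a divide-and-conquer sum of an array in Java; the example and names are mine, not the text's.

    // Divide-and-conquer sum of a[lo..hi).
    class DCSum {
        static int sum(int[] a, int lo, int hi) {
            if (hi - lo <= 1)                  // stopping condition:
                return hi == lo ? 0 : a[lo];   // 0 or 1 elements, solve directly
            int mid = (lo + hi) / 2;           // 1. divide into two halves
            return sum(a, lo, mid)             // 2. solve each recursively
                 + sum(a, mid, hi);            // 3. combine with +
        }

        public static void main(String[] args) {
            System.out.println(sum(new int[]{22, 55, 33, 44, 11}, 0, 5)); // 165
        }
    }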

Using Divide-and-Conquer for Sorting

This turns out to be so easy that it is perhaps surprising that it is asymptotically optimal. The key observation is that merging two sorted lists is fast (the time is linear in the size of the lists).

The steps are

  1. Divide (with stopping condition): If S has zero or one element, simply return S since it is already sorted. Otherwise S has n≥2 elements: Move the first ⌈n/2⌉ elements of S into S1 and the remaining ⌊n/2⌋ elements into S2.
  2. Solve recursively: Recursively sort each of the two subsequences.
  3. Combine: Merge the two (now sorted) subsequences back into S.
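
Here is a minimal Java sketch of these three steps, working on arrays of ints; the method names are my own, and the book's version works on general sequences, but the logic is the same.

    import java.util.Arrays;

    class MergeSortSketch {
        // Divide (with stopping condition), sort recursively, then merge.
        static int[] mergeSort(int[] s) {
            if (s.length <= 1) return s;        // already sorted
            int mid = (s.length + 1) / 2;       // first ceil(n/2) elements
            int[] s1 = mergeSort(Arrays.copyOfRange(s, 0, mid));
            int[] s2 = mergeSort(Arrays.copyOfRange(s, mid, s.length));
            return merge(s1, s2);
        }

        // Merge two sorted arrays; time is linear in their total length.
        static int[] merge(int[] a, int[] b) {
            int[] out = new int[a.length + b.length];
            int i = 0, j = 0;
            for (int k = 0; k < out.length; k++) {
                if (j >= b.length || (i < a.length && a[i] <= b[j]))
                    out[k] = a[i++];            // take from a
                else
                    out[k] = b[j++];            // take from b
            }
            return out;
        }

        public static void main(String[] args) {
            int[] s = {22, 55, 33, 44, 11};     // the example below
            System.out.println(Arrays.toString(mergeSort(s)));
            // prints [11, 22, 33, 44, 55]
        }
    }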

Example: Sort {22, 55, 33, 44, 11}.

  1. Divide {22, 55, 33, 44, 11} into {22, 55, 33} and {44, 11}
  2. Recursively sort {22, 55, 33} and {44, 11} getting {22, 33, 55} and {11, 44}
  3. Merge {22, 33, 55} and {11, 44} getting {11, 22, 33, 44, 55}

Expanding the recursion one level gives:

  1. Divide {22, 55, 33, 44, 11} into {22, 55, 33} and {44, 11}
  2. Recursively sort {22, 55, 33} and {44, 11} getting {22, 33, 55} and {11, 44}
    1. Divide {22, 55, 33} into {22, 55} and {33}
    2. Recursively sort {22, 55} and {33} getting {22, 55} and {33}
    3. Merge {22, 55} and {33} getting {22, 33, 55}

    1. Divide {44, 11} into {44} and {11}
    2. Recursively sort {44} and {11} getting {44} and {11}
    3. Merge {44} and {11} getting {11, 44}
  3. Merge {22, 33, 55} and {11, 44} getting {11, 22, 33, 44, 55}

Expanding again gives:

  1. Divide {22, 55, 33, 44, 11} into {22, 55, 33} and {44, 11}
  2. Recursively sort {22, 55, 33} and {44, 11} getting {22, 33, 55} and {11, 44}
    1. Divide {22, 55, 33} into {22, 55} and {33}
    2. Recursively sort {22, 55} and {33} getting {22, 55} and {33}
      1. Divide {22, 55} into {22} and {55}
      2. Recursively sort {22} and {55} getting {22} and {55}
      3. Merge {22} and {55} getting {22, 55}

      1. Do NOT divide {33} since it has only one element and hence is already sorted
    3. Merge {22, 55} and {33} getting {22, 33, 55}

    1. Divide {44, 11} into {44} and {11}
    2. Recursively sort {44} and {11} getting {44} and {11}
      1. Do NOT divide {44} since it has only one element and hence is already sorted

      1. Do NOT divide {11} since it has only one element and hence is already sorted
    3. Merge {44} and {11} getting {11, 44}
  3. Merge {22, 33, 55} and {11, 44} getting {11, 22, 33, 44, 55}

Finally, there is still one recursion to do, so we get:

  1. Divide {22, 55, 33, 44, 11} into {22, 55, 33} and {44, 11}
  2. Recursively sort {22, 55, 33} and {44, 11} getting {22, 33, 55} and {11, 44}
    1. Divide {22, 55, 33} into {22, 55} and {33}
    2. Recursively sort {22, 55} and {33} getting {22, 55} and {33}
      1. Divide {22, 55} into {22} and {55}
      2. Recursively sort {22} and {55} getting {22} and {55}
        1. Do NOT divide {22} since it has only one element and hence is already sorted.

        1. Do NOT divide {55} since it has only one element and hence is already sorted.
      3. Merge {22} and {55} getting {22, 55}

      1. Do NOT divide {33} since it has only one element and hence is already sorted
    3. Merge {22, 55} and {33} getting {22, 33, 55}

    1. Divide {44, 11} into {44} and {11}
    2. Recursively sort {44} and {11} getting {44} and {11}
      1. Do NOT divide {44} since it has only one element and hence is already sorted

      1. Do NOT divide {11} since it has only one element and hence is already sorted
    3. Merge {44} and {11} getting {11, 44}
  3. Merge {22, 33, 55} and {11, 44} getting {11, 22, 33, 44, 55}

Hopefully there is a better way to describe this action. How about the following picture? The left tree shows the dividing; the right tree shows the result of the merging.