================ Start Lecture #17 ================
Notes on midterm:
  1. Midterm handed back at the end of class
  2. An answer sheet was also handed out.
  3. The breakdown was 90-100: 14, 80-89: 15, 70-79: 9, 60-69: 3, 0-59: 1.
  4. We will review the answers in this week's recitations.

3.2.1 Update Operations

Insertion

Begin with a standard binary search tree insertion. In the diagrams on the right, black shows the situation before the insertion; red after. The numbers are the heights. Ignore the blue markings; they are explained in the text as needed.

  1. Do a find for the key to be inserted.

  2. Presumably wind up at a leaf (meaning the key is not already in the tree).

  3. Call this leaf w.

  4. Insert the item by converting w to an internal node at height one (i.e., the new internal node has two leaves as children).

  5. Call the new internal node w. We view the previous operation as expanding the leaf w into an internal node (with leaves as children).

  6. In the diagram on the right, w is converted from a black leaf into a red height 1 node.
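
The steps above can be sketched in Python as follows. This is only an illustration under assumptions of our own: the names Node, find, expand_leaf, and bst_insert are hypothetical, and leaves are represented as placeholder nodes with no key and height 0, matching the convention in these notes. The sketch does not yet update the heights of w's ancestors; that is the subject of the discussion that follows.

```python
class Node:
    """A tree node. A leaf is a placeholder: no key, no children, height 0."""
    def __init__(self, parent=None):
        self.key = None
        self.left = None
        self.right = None
        self.parent = parent
        self.height = 0

    def is_leaf(self):
        return self.left is None


def find(root, key):
    """Return the internal node holding key, or the leaf where the search ends."""
    node = root
    while not node.is_leaf() and node.key != key:
        node = node.left if key < node.key else node.right
    return node


def expand_leaf(w, key):
    """Convert leaf w into an internal node at height 1 with two leaf children."""
    w.key = key
    w.left = Node(parent=w)
    w.right = Node(parent=w)
    w.height = 1


def bst_insert(root, key):
    """Find the key, then expand the leaf reached. Returns w, or None if key is present."""
    w = find(root, key)
    if not w.is_leaf():
        return None          # key already in the tree
    expand_leaf(w, key)
    return w


root = Node()                # an empty tree is a single leaf
w = bst_insert(root, 10)     # the leaf (here the root) becomes a height 1 node
```

Note that w is the same object before and after the insertion, which mirrors the view of the operation as expanding the leaf w into an internal node.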

Why aren't we finished?
Ans: The tree may no longer be in balance, i.e., it may no longer be an AVL tree.

We look for an imbalanced node, i.e., a node whose children have heights differing by more than 1. If we find such a node, the tree is no longer AVL and we must perform a re-balancing operation.
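
As a small sketch of this definition (the names Node, height, is_balanced, and is_avl are our own, and for brevity leaves are represented by None, which still gives a leaf height 0 and a node with two leaf children height 1):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def height(node):
    """A leaf (None) has height 0; an internal node is 1 + its taller child."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


def is_balanced(node):
    """True if the heights of the node's children differ by at most 1."""
    return abs(height(node.left) - height(node.right)) <= 1


def is_avl(node):
    """True if every node in the subtree is balanced."""
    if node is None:
        return True
    return is_balanced(node) and is_avl(node.left) and is_avl(node.right)


chain = Node(1, None, Node(2, None, Node(3)))   # children of the root have heights 0 and 2
bushy = Node(2, Node(1), Node(3))               # children of the root have heights 1 and 1
```

Recomputing heights recursively like this is slow; it is meant only to state the balance condition precisely, not as the way an AVL tree would check it in practice.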

  1. We start our search for an unbalanced node at w since that is where the tree has changed. Indeed the height of w has increased from 0 to 1. This height increase of w may cause a height increase of w's parent, which in turn may cause a height increase of w's grandparent, etc. That is, we may need to search the entire path from w up to the root, i.e., the ancestors of w. But we know the key fact that no other node has had its height changed.

  2. As we traverse the path from w to the root we will be looking at the ancestors of w. For each ancestor A we consider, we will also check the sibling of A. Let h be the height of A before the insert. Since the tree was AVL before the insert, we know that the height of A's sibling was and is h-1, h, or h+1. We shall see that these three cases lead us to three different actions.

    1. h-1 will be the "bad" case and will require a re-balancing operation.

    2. h will be the "unknown" case: we will need to proceed up the tree to see if a re-balance is needed.

    3. h+1 will be the "good" case: we will be able to stop and no re-balancing will be needed.

  3. We begin at w which was at height 0 and is now at height 1. Its children are leaves so w itself is in balance.

  4. Since w was at height 0, the sibling must have had height -1, 0, or +1. The height of the sibling has not changed (it is not on the path from w to the root). Since a height of -1 is impossible, we have just two possibilities, the "unknown" and "good" cases.

  5. If the sibling was at height 1, their parent was, and still is, at height 2. Hence no heights other than w's changed and the tree remains in balance. So in this case, which is illustrated in the figure above, we are indeed done.

  6. If the sibling of w was (and hence is) at height 0, we have the "unknown" case. The siblings' heights now differ by 1, which is OK, but their parent's height has changed from 1 to 2 (since w is at height 1). This situation is illustrated at the right. Note that not all of the tree is shown. Also note that the black heights are before the insert and the reds are after (ignore the blue for now).

  7. Since w's parent P has had its height changed (from 1 to 2) we need to check P's sibling. The old height is 1 so the sibling must be at height 0, 1, or 2. (It is 1 in the diagram.) If we had the "good" case (height=2) the new height of P and the height of P's sibling would be equal and the height of their parent would not have changed so we would have been done (without re-balancing).

  8. In the diagram we again have the "unknown" case and proceed up to P's parent. There we find another "unknown" case (the node's old height and its sibling's height are equal, namely 2) and we keep going up.

  9. We get a final unknown case (both heights 3) and finally reach the root, which was at height 4. Since it has no sibling there is nothing that can be imbalanced so we are done, again with no re-balancing required.

  10. What is the problem? We just proceed up as in the figure on the right and eventually we hit the root and are done.

  11. WRONG.

  12. In the figure, before the insertion, siblings had the same height, the "unknown" case. This is certainly possible but not required.

  13. If an ancestor of w had height one less than its sibling, then the insertion has made them equal. That is the "good" case: The height of the parent doesn't change and we are done as illustrated in the previous (small) figure.

  14. The "bad" case occurs when an ancestor of w had height one greater than its sibling. In this case the insertion has made it two greater, which is not permitted for an AVL tree.

  15. For example, the light blue node could have originally been at height 2 instead of 3. Then we no longer have an AVL tree: the root is imbalanced, since its children have heights 2 (the blue) and 4 (the red). We must re-balance this tree.
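
The walk up the tree described in this list can be sketched as follows. This is a minimal illustration, assuming each node stores its height and a parent pointer; the names Node, leaf, and first_unbalanced are hypothetical. At each step the node's new height is compared with its sibling's (unchanged) height, which distinguishes the "bad", "good", and "unknown" cases.

```python
class Node:
    def __init__(self, height=0, left=None, right=None):
        self.height = height
        self.left, self.right, self.parent = left, right, None
        for child in (left, right):
            if child is not None:
                child.parent = self


def leaf():
    return Node(0)


def first_unbalanced(w):
    """Walk from w toward the root after w's height has been raised.

    Returns the first unbalanced ancestor (the "bad" case), or None if the
    walk stops early (the "good" case) or reaches the root.
    """
    node = w
    while node.parent is not None:
        parent = node.parent
        sibling = parent.right if parent.left is node else parent.left
        if abs(node.height - sibling.height) > 1:
            return parent                    # "bad": parent must be re-balanced
        new_height = 1 + max(node.height, sibling.height)
        if new_height == parent.height:
            return None                      # "good": parent's height unchanged
        parent.height = new_height           # "unknown": record and keep going up
        node = parent
    return None                              # reached the root, nothing imbalanced


# "Good" case: w's sibling was already at height 1, so the parent stays at 2.
w_good = Node(1, leaf(), leaf())
p = Node(2, w_good, Node(1, leaf(), leaf()))

# "Bad" case: y still stores its pre-insert height 1; its sibling is a leaf.
w_bad = Node(1, leaf(), leaf())
y = Node(1, w_bad, leaf())
z = Node(2, y, leaf())
```

Calling first_unbalanced(w_good) stops at once with no imbalance, while first_unbalanced(w_bad) raises y to height 2 and then reports z as the node to re-balance.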


The top left diagram illustrates the problem case from the previous figure (k was 3 above).
Definition: The operation we will perform when x, y, and z lie in a straight line is called a single rotation. When x, y, and z form an angle, the operation is called a double rotation.

Let us consider the upper left double rotation, which is the rotation that we need to apply to the example above. It is redrawn to the right with the subtrees drawn and their heights labeled. The colors are there so that you can see where the trees go when we perform the rotation.

Recall that x is an ancestor of w and has had its height raised from k-1 to k. The sibling of x is at height k-1 and so is the sibling of y. The reason x had its height raised is that one of its children (say the right one) has been raised from k-2 to k-1.
How do I know the other one is k-2?
Ans: It must have been the "unknown" case or we would not have proceeded further up the tree.

The double rotation transforms the picture on top to the one on the bottom. We now actually are done with the insertion. Let's check the bottom picture and make sure.

  1. The order relation remains intact. That is, every node is greater than its entire left subtree and less than its entire right subtree.

  2. The tree (now rooted at x) is balanced.

  3. Nodes y and z are each at height k. Hence x, the root of this tree, is at height k+1.

  4. The tree above, rooted at z, has height k+2.

  5. But remember that before the insert z was at height k+1.

  6. So the rotated tree (which is after the insert) has the same height as the original tree before the insert.

  7. Hence every node above z in the original tree keeps its original height so the entire tree is now balanced.

Thus, if an insertion causes an imbalance, just one rotation re-balances the tree globally. We will see that for removals it is not this simple.

Here are the three pictures for the remaining three possibilities. That is, the other double rotation and both single rotations. The original configuration is shown on top and the result after the rotation is shown immediately below.
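
The four cases can be sketched in code as follows. This is an illustration under our own assumptions, not the notes' official formulation: the names Node, h, update, rotate_left, rotate_right, and restructure are hypothetical, leaves are represented by None, and a double rotation is carried out as two single rotations (a standard equivalent way of producing the same final picture). Here z is the unbalanced node, y is its taller child, and x is y's taller child.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.height = 1 + max(h(left), h(right))


def h(node):
    """Height, with leaves (None) at height 0."""
    return 0 if node is None else node.height


def update(node):
    node.height = 1 + max(h(node.left), h(node.right))


def rotate_left(z):
    """Single rotation: z's right child y becomes the root of this subtree."""
    y = z.right
    z.right, y.left = y.left, z
    update(z)
    update(y)
    return y


def rotate_right(z):
    """Single rotation: z's left child y becomes the root of this subtree."""
    y = z.left
    z.left, y.right = y.right, z
    update(z)
    update(y)
    return y


def restructure(z):
    """Re-balance the subtree rooted at the unbalanced node z; return its new root."""
    if h(z.left) > h(z.right):
        y = z.left
        if h(y.left) < h(y.right):        # x, y, z form an angle: double rotation
            z.left = rotate_left(y)
        return rotate_right(z)            # x, y, z in a straight line
    y = z.right
    if h(y.right) < h(y.left):            # angle: double rotation
        z.right = rotate_right(y)
    return rotate_left(z)                 # straight line


# Angle case: z=30, y=10, x=20. Before 20 was inserted, z had height 2.
t = restructure(Node(30, Node(10, None, Node(20)), None))
```

In this small example the subtree ends up rooted at 20 with children 10 and 30, at height 2, the same height z had before the insert.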

Homework: R-3.3, R-3.4, R-3.5

What is the complexity of insertion?

  1. Let n be the number of nodes in the tree before the insertion.

  2. Finding the insertion point is Θ(log n).

  3. Expanding the leaf and inserting the item is Θ(1).

  4. Walking up the tree looking for an imbalance is Θ(1) per level, which is O(log n) since the tree has height Θ(log n).

  5. Performing the one needed rotation is Θ(1).

Hence we have the following

Theorem: The complexity of insertion in an AVL tree is Θ(log n).
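
As an informal sanity check of the ingredients of this theorem, the sketch below builds an AVL tree and records how many re-balancing operations any single insertion needed. It is a compact recursive variant written for illustration, with hypothetical names (Node, insert, build, stats) and with leaves represented by None; it is not the parent-pointer formulation used in the discussion above. A double rotation is counted as one re-balancing operation.

```python
import math


class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.height = key, None, None, 1


def h(node):
    return 0 if node is None else node.height


def update(node):
    node.height = 1 + max(h(node.left), h(node.right))


def rot_left(z):
    y = z.right
    z.right, y.left = y.left, z
    update(z)
    update(y)
    return y


def rot_right(z):
    y = z.left
    z.left, y.right = y.right, z
    update(z)
    update(y)
    return y


stats = {"rebalances": 0}


def insert(node, key):
    """Insert key below node, re-balance on the way back up, return the subtree root."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    else:
        return node                              # key already present
    update(node)
    diff = h(node.left) - h(node.right)
    if diff > 1:
        stats["rebalances"] += 1
        if h(node.left.left) < h(node.left.right):
            node.left = rot_left(node.left)      # angle: double rotation
        return rot_right(node)
    if diff < -1:
        stats["rebalances"] += 1
        if h(node.right.right) < h(node.right.left):
            node.right = rot_right(node.right)   # angle: double rotation
        return rot_left(node)
    return node


def build(keys):
    """Insert keys in order; return the root and the most re-balances in one insert."""
    root, worst = None, 0
    for key in keys:
        before = stats["rebalances"]
        root = insert(root, key)
        worst = max(worst, stats["rebalances"] - before)
    return root, worst


n = 1000
root, worst = build(range(1, n + 1))             # sorted input, bad for a plain BST
print(h(root), worst, math.log2(n))
```

Sorted input would give a plain binary search tree of height n, so it is a convenient input for watching the height stay logarithmic and the number of re-balancing operations per insertion stay at most one.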