Basic Algorithms: Lecture 17

================ Start Lecture #17 ================

3.2 AVL Trees

Named after its inventors Adel'son-Vel'skii and Landis.

Definition: An AVL tree is a binary search tree that satisfies the height-balance property, which means that for every internal node, the heights of its two children differ by at most 1.

Homework: Draw an AVL tree of height 3 where each left child has height one greater than its sibling.

Since the algorithms for an AVL tree require the height to be computed and this is an expensive operation, each node of an AVL tree contains its height (say as v.height()).
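To make the stored height concrete, here is a minimal sketch in Python. It assumes, as these notes do, that leaves are external nodes of height 0. The names Node, internal, and is_height_balanced are hypothetical, and the height is kept as a plain attribute rather than the v.height() method mentioned above.

```python
class Node:
    """Tree node; a leaf (external node) has key None and height 0."""

    def __init__(self, key=None, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right
        # Cached height: 0 for a leaf, else 1 + max of the children's heights.
        self.height = 0 if key is None else 1 + max(left.height, right.height)

    def is_leaf(self):
        return self.key is None


def internal(key, left=None, right=None):
    """Build an internal node, filling any missing child with a leaf."""
    return Node(key, left or Node(), right or Node())


def is_height_balanced(v):
    """Check the height-balance property at every internal node below v."""
    if v.is_leaf():
        return True
    if abs(v.left.height - v.right.height) > 1:
        return False
    return is_height_balanced(v.left) and is_height_balanced(v.right)


balanced = internal(2, internal(1), internal(3))
skinny = internal(1, None, internal(2, None, internal(3)))
print(balanced.height, is_height_balanced(balanced))  # 2 True
print(skinny.height, is_height_balanced(skinny))      # 3 False
```

Because each node caches its height, the balance test at a single node costs constant time; only the check of the whole tree is linear.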

Remark: You can actually be fancier and store just two bits to tell whether the node has the same height as its sibling, one greater, or one smaller.

We see that the height-balance property prevents the tall skinny trees you developed in problem set 2. But is it really true that the height must be O(log(n))?

Yes it is and we shall now prove it. Actually we will prove instead that an AVL tree of height h has at least 2^{(h/2)-1} internal nodes, from which the desired result follows easily.

Lemma: An AVL tree of height h has at least 2^{(h/2)-1} internal nodes.

Proof: Let n(h) be the minimum number of internal nodes in an AVL tree of height h.

n(1)=1 and n(2)=2. So the lemma holds for h=1 and h=2.

Here comes the key point.

Consider an AVL tree of height h≥3 and the minimum number of nodes. This tree is composed of a root, and two subtrees. Since the whole tree has the minimum number of nodes for its height so do the subtrees. For the big tree to be of height h, one of the subtrees must be of height h-1. To get the minimum number of nodes the other subtree is of height h-2.
Why can't the other subtree be of height h-3 or h-4?
The height of siblings can differ by at most 1!

What the last paragraph says in symbols is that for h≥3,
n(h) = 1+n(h-1)+n(h-2)
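The recurrence is easy to tabulate. The following Python sketch (the function name min_internal_nodes is hypothetical) simply iterates it from the base cases n(1)=1 and n(2)=2.

```python
def min_internal_nodes(h):
    """n(h): minimum number of internal nodes in an AVL tree of height h >= 1."""
    a, b = 1, 2  # n(1), n(2)
    for _ in range(h - 1):
        # Slide the window from (n(k), n(k+1)) to (n(k+1), n(k+2)).
        a, b = b, 1 + a + b
    return a


values = [min_internal_nodes(h) for h in range(1, 9)]
print(values)  # [1, 2, 4, 7, 12, 20, 33, 54]
```

The values more than double every two steps, which is the growth the algebra below makes precise.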

The rest is just algebra, i.e. has nothing to do with trees, heights, searches, siblings, etc.

n(h) > n(h-1) so n(h-1) > n(h-2). Hence

n(h) > n(h-1)+n(h-2) > 2n(h-2)

Really we could stop here. We have shown that n(h) at least doubles when h goes up by 2. This says that n(h) is exponential in h and hence h is logarithmic in n. But we will proceed more slowly. Applying the last formula i times we get

For any i>0,   n(h) > 2^i n(h-2i)    (*)

Let's find an i so that h-2i is guaranteed to be 1 or 2.
I claim i = ⌈h/2⌉-1 works.

If h is even h-2i = h-(h-2) = 2

If h is odd h-2i = h - (2⌈h/2⌉-2) = h - ((h+1)-2) = 1

Now we plug this value of i into equation (*) and get for h≥3

n(h) > 2^i n(h-2i)
   = 2^{⌈h/2⌉-1} n(h-2i)
   ≥ 2^{⌈h/2⌉-1} (1)
   ≥ 2^{(h/2)-1}

Theorem: the height of an AVL tree storing n items is O(log(n)).

Proof: From the lemma we have n(h) > 2^{(h/2)-1}.

Taking logs gives log(n(h)) > (h/2)-1 or

h < 2log(n(h))+2

Since n(h) is the smallest number of nodes possible for an AVL tree of height h, we see that h < 2 log(n) + 2 for any AVL tree of height h with n internal nodes. Hence h is O(log(n)).
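As a sanity check on the algebra (not a substitute for the proof), the Python sketch below tests three things for small h: that i = ⌈h/2⌉-1 lands on a base case, that the lemma's bound holds, and that h < 2 log(n(h)) + 2. Logs are taken base 2, and the helper names are hypothetical.

```python
import math


def min_internal_nodes(h):
    """n(h) from n(h) = 1 + n(h-1) + n(h-2), with n(1) = 1 and n(2) = 2."""
    a, b = 1, 2
    for _ in range(h - 1):
        a, b = b, 1 + a + b
    return a


def bounds_hold(h):
    n = min_internal_nodes(h)
    i = math.ceil(h / 2) - 1
    return (
        h - 2 * i in (1, 2)             # the chosen i lands on a base case
        and n > 2 ** (h / 2 - 1)        # the lemma
        and h < 2 * math.log2(n) + 2    # the inequality used in the theorem
    )


all_ok = all(bounds_hold(h) for h in range(1, 41))
print(all_ok)  # True
```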

3.2.1 Update Operations

Insertion

Begin with a standard binary search tree insertion. In the diagrams on the right, black shows the situation before the insertion; red after. The numbers are the heights. Ignore the blue markings; they are explained in the text as needed.

  1. Do a find for the key to be inserted.

  2. Presumably wind up at a leaf (meaning the key is not already in the tree).

  3. Call this leaf w.

  4. Insert the item by converting w to an internal node at height one (i.e., the new internal node has two leaves as children).

  5. Call the new internal node w. We view the previous operation as expanding the leaf w into an internal node (with leaves as children).

  6. In the diagram on the right, w is converted from a black leaf into a red height 1 node.
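The six steps above can be sketched in Python as follows. This is only the plain binary search tree part of the insertion, with no re-balancing and no height updates for the ancestors. It assumes leaves are stored as explicit external nodes with key None; the names Node, find, expand_leaf, and bst_insert are hypothetical.

```python
class Node:
    def __init__(self, key=None, parent=None):
        self.key = key        # None marks a leaf (external node)
        self.parent = parent
        self.left = None
        self.right = None
        self.height = 0

    def is_leaf(self):
        return self.key is None


def find(root, key):
    """Standard BST search; returns the node holding key, or the leaf reached."""
    v = root
    while not v.is_leaf() and v.key != key:
        v = v.left if key < v.key else v.right
    return v


def expand_leaf(w, key):
    """Turn leaf w into an internal node of height 1 with two leaf children."""
    w.key = key
    w.left = Node(parent=w)
    w.right = Node(parent=w)
    w.height = 1


def bst_insert(root, key):
    """Insert key without re-balancing; returns w, or None if key is present."""
    w = find(root, key)
    if not w.is_leaf():
        return None  # the key is already in the tree
    expand_leaf(w, key)
    return w


root = Node()  # an empty tree is a single leaf
w = bst_insert(root, 10)
print(w is root, root.height, root.left.is_leaf(), root.right.is_leaf())
# True 1 True True
```

Returning w is a design choice made here so that the search for an imbalanced node can start from it.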

Why aren't we finished?
Ans: The tree may no longer be in balance, i.e. it may no longer be an AVL tree.

We look for an imbalanced node, i.e., a node whose children have heights differing by more than 1. If we find such a node, the tree is no longer AVL and we must perform a re-balancing operation.

  1. We start our search for an unbalanced node at w since that is where the tree has changed. Indeed the height of w has increased from 0 to 1. This height increase of w may cause a height increase of w's parent, which in turn may cause a height increase of w's grandparent, etc. That is, we may need to search the entire path from w up to the root, i.e. the ancestors of w. But we know the key fact that no other node has had its height changed.

  2. As we traverse the path from w to the root we will be looking at the ancestors of w. For each ancestor A we consider, we will also check the sibling of A. Let h be the height of A before the insert. Since the tree was AVL before the insert, we know that the height of A's sibling was and hence is h-1, h, or h+1. We shall see that these three cases lead us to three different actions.

    1. h-1 will be the "bad" case and will require a re-balancing operation.

    2. h will be the "unknown" case; we will need to proceed up the tree to see if a re-balance is needed.

    3. h+1 will be the "good" case; we will be able to stop and no re-balancing will be needed.

  3. We begin at w which was at height 0 and is now at height 1. Its children are leaves so w itself is in balance.

  4. Since w was at height 0 and the tree was AVL, its sibling must have had height -1, 0, or +1. The height of the sibling has not changed (it is not on the path from w to the root). Since a height of -1 is impossible, we have just two possibilities, the “unknown” and “good” cases.

  5. If the sibling was at height 1, their parent was, and still is, at height 2. Hence no heights other than w's changed and the tree remains in balance. So in this case, which is illustrated in the figure above, we are indeed done.

  6. If the sibling of w was (and hence is) at height 0, we have the "unknown" case. The siblings' heights now differ by 1, which is OK, but their parent's height has changed from 1 to 2 (since w is at height 1). This situation is illustrated at the right. Note that not all of the tree is shown. Also note that the black heights are before the insert and the reds are after (ignore the blue for now).

  7. Since w's parent P has had its height changed (from 1 to 2) we need to check P's sibling. The old height is 1 so the sibling must be at height 0, 1, or 2. (It is 1 in the diagram.) If we had the "good" case (height=2) the new height of P and the height of P's sibling would be equal and the height of their parent would not have changed so we would have been done (without re-balancing).

  8. In the diagram we again have the "unknown" case and proceed up to P's parent. Another "unknown" case (the node's old height and its sibling's height are equal, namely 2) and we keep going up.

  9. We get a final unknown case (both heights 3) and finally reach the root, which was at height 4. Since it has no sibling there is nothing that can be imbalanced so we are done, again with no re-balancing required.

  10. What is the problem? We just proceed up as in the figure on the right and eventually we hit the root and are done.

  11. WRONG.

  12. In the figure, before the insertion, siblings had the same height, the "unknown" case. This is certainly possible but not required.

  13. If an ancestor of w had height one less than its sibling, then the insertion has made them equal. That is the "good" case: The height of the parent doesn't change and we are done as illustrated in the previous (small) figure.

  14. The "bad" case occurs when an ancestor of w had height one greater than its sibling. In this case the insertion has made it two greater, which is not permitted for an AVL tree.

  15. For example the light blue node could have originally been at height 2 instead of 3. We no longer have an AVL tree, the root is imbalanced: Its children have height 2 (the blue) and 4 (the red). We must re-balance this tree.
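The walk up the tree described in the list above can be sketched in Python as follows. Instead of inspecting the sibling of each ancestor A directly, this version recomputes the height of A's parent from both children, which distinguishes the same three cases. It assumes explicit leaf nodes and parent pointers; all names are hypothetical.

```python
class Node:
    def __init__(self, key=None):
        self.key = key  # None marks a leaf (external node)
        self.parent = None
        self.left = None
        self.right = None
        self.height = 0


def internal(key, left=None, right=None):
    """Build an internal node, filling any missing child with a leaf."""
    v = Node(key)
    v.left = left or Node()
    v.right = right or Node()
    v.left.parent = v.right.parent = v
    v.height = 1 + max(v.left.height, v.right.height)
    return v


def expand(w, key):
    """Turn leaf w into a height-1 internal node (the insertion step)."""
    w.key, w.height = key, 1
    w.left, w.right = Node(), Node()
    w.left.parent = w.right.parent = w
    return w


def first_imbalanced_ancestor(w):
    """Walk from w toward the root, refreshing the cached heights.

    Returns the first imbalanced node, or None if the walk stops early
    or reaches the root without finding one.
    """
    v = w.parent
    while v is not None:
        old_height = v.height
        v.height = 1 + max(v.left.height, v.right.height)
        if abs(v.left.height - v.right.height) > 1:
            return v      # "bad" case: re-balancing is required here
        if v.height == old_height:
            return None   # "good" case: nothing above v changes
        v = v.parent      # "unknown" case: keep climbing
    return None


# "Bad" case: 10 goes below 20, whose sibling is a leaf.
z = internal(30, left=internal(20))
w = expand(z.left.left, 10)
bad = first_imbalanced_ancestor(w)
print(bad.key)  # 30

# "Good" case: 40 fills the shorter side of the root.
z2 = internal(30, left=internal(20))
w2 = expand(z2.right, 40)
good = first_imbalanced_ancestor(w2)
print(good)  # None
```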


The top left diagram illustrates the problem case from the previous figure (k was 3 above).
Definition: The operation we will perform when x, y, and z lie in a straight line is called a single rotation. When x, y, and z form an angle, the operation is called a double rotation.

Let us consider the upper left double rotation, which is the rotation that we need to apply to the example above. It is redrawn to the right with the subtrees drawn and their heights after the insertion listed. The colors are there so that you can see where the trees go when we perform the rotation.

Recall that x is an ancestor of w and has had its height raised from k-1 to k. The sibling of x is at height k-1 and so is the sibling of y. The reason x had its height raised is that one of its children (say the right one) has been raised from k-2 to k-1.
How do I know the other one is k-2?
Ans: It must have been the "unknown" case or we would not have proceeded further up the tree.

The double rotation transforms the picture on top to the one on the bottom. We now actually are done with the insertion. Let's check the bottom picture and make sure.

  1. The order relation remains intact. That is, every node is greater than its entire left subtree and less than its entire right subtree.

  2. The tree (now rooted at x) is balanced.

  3. Nodes y and z are each at height k. Hence x, the root of this tree, is at height k+1.

  4. The tree above, rooted at z, has height k+2.

  5. But remember that before the insert z was at height k+1.

  6. So the rotated tree (which is after the insert) has the same height as the original tree before the insert.

  7. Hence every node above z in the original tree keeps its original height so the entire tree is now balanced.

Thus, if an insertion causes an imbalance, just one rotation re-balances the tree globally. We will see that for removals it is not this simple.
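Here is a minimal Python sketch of the single and double rotations, where z is the imbalanced node, y its taller child, and x the taller child of y. To keep it short, None stands for a leaf of height 0 and there are no parent pointers, so each function returns the new root of the subtree for the caller to re-attach. The names are hypothetical, and the double rotation is written as two single rotations, which is one common way to implement it.

```python
def height(v):
    return 0 if v is None else v.height


class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left    # None stands for a leaf (height 0)
        self.right = right
        self.height = 1 + max(height(left), height(right))


def update(v):
    v.height = 1 + max(height(v.left), height(v.right))


def rotate_right(z):
    """Single rotation: z's left child y becomes the root of the subtree."""
    y = z.left
    z.left, y.right = y.right, z
    update(z)
    update(y)
    return y


def rotate_left(z):
    """Mirror image of rotate_right."""
    y = z.right
    z.right, y.left = y.left, z
    update(z)
    update(y)
    return y


def rebalance(z):
    """Re-balance at imbalanced node z; returns the new subtree root."""
    if height(z.left) > height(z.right):
        y = z.left
        if height(y.right) > height(y.left):  # x, y, z form an angle
            z.left = rotate_left(y)           # first half of the double rotation
        return rotate_right(z)
    y = z.right
    if height(y.left) > height(y.right):      # mirrored angle
        z.right = rotate_right(y)
    return rotate_left(z)


# Double rotation: z=30, y=10, x=20 form an angle after inserting 20.
root = rebalance(Node(30, left=Node(10, right=Node(20))))
print(root.key, root.left.key, root.right.key, root.height)  # 20 10 30 2

# Single rotation: z=30, y=20, x=10 lie in a straight line.
line = rebalance(Node(30, left=Node(20, left=Node(10))))
print(line.key, line.left.key, line.right.key, line.height)  # 20 10 30 2
```

In both examples the subtree ends at height 2, the height it had before the insertion, which matches the argument above that nothing higher in the tree needs to change.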

Allan Gottlieb