Basic Algorithms

================ Start Lecture #15 ================

Problem Set 2 is (actually, should have been) assigned. It is due Tuesday, 4 Nov.

3.2 AVL Trees

Named after its inventors Adel'son-Vel'skii and Landis.

Definition: An AVL tree is a binary search tree that satisfies the height-balance property, which means that for every internal node, the heights of its two children differ by at most 1.

Homework: Draw an AVL tree of height 3 where each left child that is not a leaf has height one greater than its sibling.

The algorithms for an AVL tree need the heights of nodes, and computing a height from scratch is expensive (it requires visiting the whole subtree), so each node of an AVL tree stores its height (say as v.height()).
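Here is a minimal sketch in Python of a node that stores its height, together with a check of the height-balance property. The names Node, height, and is_height_balanced are made up for illustration, and None is assumed to stand for an external (leaf) node of height 0. The check looks only at heights; it does not verify the search-tree ordering of the keys.

```python
def height(v):
    """Height of the subtree rooted at v; an external node (None) has height 0."""
    return 0 if v is None else v.height


class Node:
    """Internal node that caches its height when it is built (illustrative only)."""

    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left      # None stands for an external (leaf) node
        self.right = right
        # Stored once, so later lookups cost O(1) instead of a subtree walk.
        self.height = 1 + max(height(left), height(right))


def is_height_balanced(v):
    """True if the children of every internal node differ in height by at most 1."""
    if v is None:
        return True
    if abs(height(v.left) - height(v.right)) > 1:
        return False
    return is_height_balanced(v.left) and is_height_balanced(v.right)


balanced = Node(2, Node(1), Node(3))
skinny = Node(1, None, Node(2, None, Node(3)))   # a tall skinny tree
print(balanced.height, is_height_balanced(balanced))
print(skinny.height, is_height_balanced(skinny))
```

A real AVL implementation would also have to update the stored heights after each insertion or removal; that is left out of this sketch.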

Remark: You can actually be fancier and store just two bits to tell whether the node has the same height as its sibling, one greater, or one smaller.

We see that the height-balance property prevents the tall skinny trees you developed in problem set 2. But is it really true that the height must be O(log(n))?

Yes it is, and we shall now prove it. Actually we will prove instead that an AVL tree of height h has at least 2^((h/2)-1) internal nodes, from which the desired result follows easily.

Lemma: An AVL tree of height h has at least 2^((h/2)-1) internal nodes.

Proof: Let n(h) be the minimum number of internal nodes in an AVL tree of height h.

n(1)=1 and n(2)=2. Since 2^((1/2)-1) < 1 and 2^((2/2)-1) = 1 < 2, the lemma holds for h=1 and h=2.

Here comes the key point.

Consider an AVL tree of height h≥3 with the minimum number of nodes. This tree is composed of a root and two subtrees. Since the whole tree has the minimum number of nodes for its height, so do the subtrees. For the big tree to be of height h, one of the subtrees must be of height h-1. To get the minimum number of nodes, the other subtree is of height h-2.
Why can't the other subtree be of height h-3 or h-4?
The height of siblings can differ by at most 1!

What the last paragraph says in symbols is that for h≥3,
n(h) = 1+n(h-1)+n(h-2)
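The recurrence, together with the base cases n(1)=1 and n(2)=2, can be evaluated directly. The following Python sketch does so iteratively; the function name min_internal_nodes is made up for illustration.

```python
def min_internal_nodes(h):
    """n(h): the fewest internal nodes in an AVL tree of height h (h >= 1)."""
    if h < 1:
        raise ValueError("h must be at least 1")
    prev, curr = 1, 2              # n(1), n(2)
    if h == 1:
        return prev
    for _ in range(h - 2):         # apply n(h) = 1 + n(h-1) + n(h-2)
        prev, curr = curr, 1 + curr + prev
    return curr


print([min_internal_nodes(h) for h in range(1, 9)])
```

For h = 1, ..., 8 this prints [1, 2, 4, 7, 12, 20, 33, 54].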

The rest is just algebra, i.e. has nothing to do with trees, heights, searches, siblings, etc.

The recurrence gives n(h) > n(h-1) for h≥3, and n(2) > n(1), so n is strictly increasing; in particular n(h-1) > n(h-2). Hence

n(h) > n(h-1)+n(h-2) > 2n(h-2)

Really we could stop here. We have shown that n(h) at least doubles when h goes up by 2. This says that n(h) is exponential in h and hence h is logarithmic in n. But we will proceed more slowly. Applying the last formula i times we get

For any i>0 with h-2i ≥ 1,   n(h) > 2^i · n(h-2i)    (*)

Let's find an i so that h-2i is guaranteed to be 1 or 2. This would guarantee that n(h-2i) ≥ 1.
I claim i = ⌈h/2⌉-1 works.

If h is even, then i = (h/2)-1 and h-2i = h-(h-2) = 2.

If h is odd, then ⌈h/2⌉ = (h+1)/2 and h-2i = h - (2⌈h/2⌉-2) = h - ((h+1)-2) = 1.
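The two cases can be checked mechanically. In this small Python sketch the function name reduction_steps is made up, and the ceiling ⌈h/2⌉ is computed with integer arithmetic as (h+1)//2.

```python
def reduction_steps(h):
    """The claimed i = ceil(h/2) - 1, using (h + 1) // 2 for the ceiling."""
    return (h + 1) // 2 - 1


for h in range(3, 11):
    i = reduction_steps(h)
    # The last column should be 2 when h is even and 1 when h is odd.
    print(h, i, h - 2 * i)
```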

Now we plug this value of i into equation (*) and get for h≥3

n(h) > 2^i · n(h-2i)
   = 2^(⌈h/2⌉-1) · n(h-2i)
   ≥ 2^(⌈h/2⌉-1) · 1
   ≥ 2^((h/2)-1)
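As a sanity check on the lemma (not a substitute for the proof), the bound can be tested numerically for small h. This Python sketch uses made-up names and avoids floating point: since both sides are positive, n(h) > 2^((h/2)-1) is equivalent, after squaring, to 4·n(h)² > 2^h.

```python
def min_internal_nodes(h):
    """n(h) from n(1)=1, n(2)=2, n(h) = 1 + n(h-1) + n(h-2)."""
    prev, curr = 1, 2
    if h == 1:
        return prev
    for _ in range(h - 2):
        prev, curr = curr, 1 + curr + prev
    return curr


def lemma_holds(h):
    """Exact integer test of n(h) > 2^((h/2)-1), i.e. 4*n(h)^2 > 2^h."""
    n = min_internal_nodes(h)
    return 4 * n * n > 2 ** h


print(all(lemma_holds(h) for h in range(1, 41)))
```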

Theorem: the height of an AVL tree storing n items is O(log(n)).

Proof: From the lemma we have n(h) > 2^((h/2)-1).

Taking logs (base 2) gives log(n(h)) > (h/2)-1 or

h < 2log(n(h))+2

Since n(h) is the smallest number of internal nodes possible for an AVL tree of height h, any AVL tree of height h storing n items has n ≥ n(h). Hence h < 2log(n)+2, which is O(log(n)).
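To see the height bound in numbers, the following Python sketch tabulates h, the minimum node count n(h), and the quantity 2log(n(h))+2 with logs taken base 2. The function names are made up for illustration. Minimum-size trees are the worst case, since a tree of the same height with more nodes only makes the right-hand side larger.

```python
import math


def min_internal_nodes(h):
    """n(h) from n(1)=1, n(2)=2, n(h) = 1 + n(h-1) + n(h-2)."""
    prev, curr = 1, 2
    if h == 1:
        return prev
    for _ in range(h - 2):
        prev, curr = curr, 1 + curr + prev
    return curr


def height_bound(n):
    """The quantity 2*log2(n) + 2 that the height is compared against."""
    return 2 * math.log2(n) + 2


for h in range(1, 31):
    n = min_internal_nodes(h)
    print(h, n, round(height_bound(n), 2), h < height_bound(n))
```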