Basic Algorithms: Lecture 17
3.2 AVL Trees
Named after its inventors Adel'son-Vel'skii and Landis.
Definition: An AVL tree is a
binary search tree that satisfies the height-balance
property, which means that for every internal node, the
heights of its two children differ by at most 1.
Homework: Draw an AVL tree of height 3 where each
left child has height one greater than its sibling.
Since the algorithms for an AVL tree require the height to be
computed and this is an expensive operation, each node of an AVL tree
contains its height (say as v.height()).
Remark: You can actually be fancier and store just
two bits to tell whether the node has the same height as its sibling,
one greater, or one smaller.
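As a minimal sketch of the idea of storing the height in each node, the following Python fragment keeps a height field and checks the height-balance property. The names Node, height, and is_height_balanced are hypothetical (they do not come from the notes), and representing a leaf by None with height 0 is an assumption made to match the convention used later in this lecture.

```python
class Node:
    """Internal node; a leaf (external node) is represented by None."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right
        # Store the height once, so it never has to be recomputed by a traversal.
        self.height = 1 + max(height(left), height(right))

def height(v):
    """Stored height of v; a leaf (None) is assumed to have height 0."""
    return v.height if v is not None else 0

def is_height_balanced(v):
    """True if every internal node below v has children whose heights differ by at most 1."""
    if v is None:
        return True
    return (abs(height(v.left) - height(v.right)) <= 1
            and is_height_balanced(v.left)
            and is_height_balanced(v.right))

balanced = Node(2, Node(1), Node(3))
skinny = Node(1, None, Node(2, None, Node(3)))
print(is_height_balanced(balanced), is_height_balanced(skinny))  # prints True False
```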
We see that the height-balance property prevents the tall skinny
trees you developed in problem set 2. But is it really true that the
height must be O(log(n))?
Yes it is and we shall now prove it. Actually we will prove
instead that an AVL tree of height h has at least 2^{(h/2)-1}
internal nodes from which the desired result follows easily.
Lemma: An AVL tree of height h has at least
2^{(h/2)-1} internal nodes.
Proof:
Let n(h) be the minimum number of internal nodes in an AVL tree of height h.
n(1)=1 and n(2)=2. So the lemma holds for h=1 and h=2.
Here comes the key point.
Consider an AVL tree of height h≥3
that has the minimum number of nodes for that height. This tree is composed of a root,
and two subtrees. Since the whole tree has the minimum number of
nodes for its height so do the subtrees. For the big tree to be of
height h, one of the subtrees must be of height h-1. To get the
minimum number of nodes the other subtree is of height h-2.
Why can't the other subtree be of height h-3 or h-4?
The height of siblings can differ by at most 1!
What the last paragraph says in symbols is that for h≥3,
n(h) = 1+n(h-1)+n(h-2)
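To get a feel for the recurrence, here is a small sketch that tabulates n(h) from the base cases n(1)=1 and n(2)=2. The function name min_internal_nodes is hypothetical.

```python
def min_internal_nodes(h):
    """n(h): minimum number of internal nodes in an AVL tree of height h >= 1."""
    a, b = 1, 2                      # n(1), n(2)
    for _ in range(h - 1):
        a, b = b, 1 + a + b          # n(h) = 1 + n(h-1) + n(h-2)
    return a

values = [min_internal_nodes(h) for h in range(1, 9)]
print(values)  # prints [1, 2, 4, 7, 12, 20, 33, 54]
```

Note how each value is more than double the value two positions earlier, which is the fact the algebra that follows relies on.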
The rest is just algebra, i.e. has nothing to do with trees,
heights, searches, siblings, etc.
n(h) > n(h-1) so n(h-1) > n(h-2). Hence
n(h) > n(h-1)+n(h-2) > 2n(h-2)
Really we could stop here. We have shown that n(h) at least
doubles when h goes up by 2. This says that n(h) is exponential in h
and hence h is logarithmic in n. But we will proceed more slowly.
Applying the last formula i times we get
For any i>0 with h-2i ≥ 1, n(h) > 2^i n(h-2i) (*)
Let's find an i so that h-2i is guaranteed to be 1 or 2.
I claim i = ⌈h/2⌉-1 works.
If h is even h-2i = h-(h-2) = 2
If h is odd h-2i = h - (2⌈h/2⌉-2) = h - ((h+1)-2) = 1
Now we plug this value of i into equation (*) and get for h≥3
n(h) > 2^i n(h-2i)
     = 2^{⌈h/2⌉-1} n(h-2i)
     ≥ 2^{⌈h/2⌉-1} (1)
     ≥ 2^{(h/2)-1}
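The chain of inequalities can be checked numerically for small h. This is only a sanity check under the stated recurrence, not a substitute for the proof; the names min_internal_nodes and check_lemma are hypothetical.

```python
from math import ceil

def min_internal_nodes(h):
    """n(h) from n(1)=1, n(2)=2, n(h) = 1 + n(h-1) + n(h-2)."""
    a, b = 1, 2
    for _ in range(h - 1):
        a, b = b, 1 + a + b
    return a

def check_lemma(max_h):
    """Check each step of the chain for 3 <= h < max_h; return True if all hold."""
    for h in range(3, max_h):
        i = ceil(h / 2) - 1
        if h - 2 * i not in (1, 2):                      # the claimed choice of i
            return False
        nh = min_internal_nodes(h)
        if not nh > 2**i * min_internal_nodes(h - 2 * i):  # inequality (*)
            return False
        if not 2**i >= 2 ** (h / 2 - 1):                 # ceiling only helps
            return False
        if not nh > 2 ** (h / 2 - 1):                    # the lemma itself
            return False
    return True

print(check_lemma(40))  # prints True
```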
Theorem: the height of an AVL tree storing n items
is O(log(n)).
Proof:
From the lemma we have n(h) > 2^{(h/2)-1}.
Taking logs (base 2) gives log(n(h)) > (h/2)-1 or
h < 2log(n(h))+2
Since n(h) is the smallest number of nodes possible for an AVL tree of
height h, we see that h < 2 log(n) + 2 for any AVL tree of height h
with n internal nodes, and hence h is O(log(n)).
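One way to see the bound h < 2log(n(h))+2 in action is to build the sparsest possible AVL trees explicitly and measure them. The sketch below follows the construction from the lemma (a root with one minimal subtree of height h-1 and one of height h-2). The tuple representation and the names minimal_avl, height, and size are assumptions made for illustration.

```python
from math import log2

def minimal_avl(h):
    """A minimum-size AVL tree of height h as nested (left, right) pairs; a leaf is None."""
    if h == 0:
        return None
    if h == 1:
        return (None, None)
    return (minimal_avl(h - 1), minimal_avl(h - 2))

def height(t):
    """Height of t, with leaves (None) at height 0."""
    return 0 if t is None else 1 + max(height(t[0]), height(t[1]))

def size(t):
    """Number of internal nodes in t."""
    return 0 if t is None else 1 + size(t[0]) + size(t[1])

for h in range(1, 8):
    t = minimal_avl(h)
    print(h, size(t), round(2 * log2(size(t)) + 2, 2))
```

In every row the first column (the height) stays below the last column (the bound), even though these trees have as few nodes as an AVL tree of that height can have.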
3.2.1 Update Operations
Insertion
Begin with a standard binary search tree insertion. In the diagrams
on the right, black shows the situation before the insertion; red
after.
The numbers are the heights.
Ignore the blue markings, they are explained in the text as needed.
- Do a find for the key to be inserted.
- Presumably wind up at a leaf (meaning the key is not already in the
  tree).
- Call this leaf w.
- Insert the item by converting w to an internal node at
  height one (i.e., the new internal node has two leaves as
  children).
- Call the new internal node w. We view the previous operation as
  expanding the leaf w into an internal node (with leaves as
  children).
- In the diagram on the right, w is converted from a black leaf into a
  red height 1 node.
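The steps above can be sketched as follows. This is an illustration under assumptions, not code from the course: the names Node, find, expand, and bst_insert are hypothetical, and a leaf is modelled as a Node whose key is None. The heights of w's ancestors are deliberately left untouched here, since this sketch covers only the plain binary search tree insertion.

```python
class Node:
    """A key of None marks a leaf (external node); leaves have height 0."""
    def __init__(self, key=None, parent=None):
        self.key = key
        self.parent = parent
        self.left = self.right = None
        self.height = 0

    def is_leaf(self):
        return self.key is None

def find(root, key):
    """Return the node holding key, or the leaf where the search ends."""
    v = root
    while not v.is_leaf() and v.key != key:
        v = v.left if key < v.key else v.right
    return v

def expand(w, key):
    """Convert leaf w into an internal node of height 1 with two leaf children."""
    w.key = key
    w.left, w.right = Node(parent=w), Node(parent=w)
    w.height = 1

def bst_insert(root, key):
    """Standard BST insertion; returns w (unchanged if the key was already present)."""
    w = find(root, key)
    if w.is_leaf():
        expand(w, key)
    return w

root = Node()                 # an empty tree is a single leaf
w = bst_insert(root, 5)
print(w.key, w.height, w.left.is_leaf(), w.right.is_leaf())  # prints 5 1 True True
```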
Why aren't we finished?
Ans: The tree may no longer be in balance, i.e. it may no longer be an
AVL tree.
We look for an imbalanced node, i.e., a node whose children have
heights differing by more than 1. If we find such a node, the tree is
no longer AVL and we must perform a re-balancing
operation.
- We start our search for an unbalanced node at w since that is
  where the tree has changed. Indeed the height of w has increased
  from 0 to 1. This height increase of w may cause
  a height increase of w's parent, which in turn may cause a
  height increase of w's grandparent, etc. That is, we may need to
  search the entire path from w up to the root, i.e. the ancestors
  of w. But we know the key fact that no other node has had its
  height changed.
- As we traverse the path from w to the root we will be looking at
  the ancestors of w. For each ancestor A we consider, we will also
  check the sibling of A. Let h be the height of A before
  the insert. Since the tree was AVL before the insert, we know
  that the height of A's sibling was and hence is h-1, h, or
  h+1. We shall see that these three cases lead us to three
  different actions.
- h-1 will be the "bad" case and will require a re-balancing
  operation.
- h will be the "unknown" case; we will need to proceed up the
  tree to see if a re-balance is needed.
- h+1 will be the "good" case; we will be able to stop and no
  re-balancing will be needed.
- We begin at w which was at height 0 and is now at height 1. Its
  children are leaves so w itself is in balance.
- Since w was at height 0 and the tree was
  AVL, its sibling must have had height -1, 0, or +1.
  The height of the sibling has not changed (it is not on
  the path from w to the root).
  Since a height of -1
  is impossible, we have just two possibilities, the
  “unknown” and “good” cases.
- If the sibling was at height 1, their parent was, and still
  is, at height 2. Hence no heights other than w's changed and
  the tree remains in balance. So in this case, which is illustrated
  in the figure above, we are indeed done.
- If the sibling of w was (and hence is) at height 0, we have the
  "unknown" case. The siblings' heights now differ by 1, which is
  OK, but their parent's height has changed from 1 to 2 (since w is
  at height 1). This situation is illustrated at the right.
  Note that not all
  of the tree is shown. Also note that the black heights are before
  the insert and the reds are after (ignore the blue for now).
- Since w's parent P has had its height changed (from 1 to 2) we need
  to check P's sibling. The old height is 1 so the sibling must be
  at height 0, 1, or 2. (It is 1 in the diagram.) If we had the
  "good" case (height=2) the new height of P and the height of P's
  sibling would be equal and the height of their parent would not
  have changed so we would have been done (without
  re-balancing).
- In the diagram we again have the "unknown" case and proceed up to
  P's parent. Another "unknown" case (the node's old height and its
  sibling's height are equal, namely 2) and we keep going up.
- We get a final unknown case (both heights 3) and finally reach the
  root, which was at height 4. Since it has no sibling there is
  nothing that can be imbalanced so we are done, again with no
  re-balancing required.
- What is the problem? We just proceed up as in the figure on the
  right and eventually we hit the root and are done.
- WRONG.
- In the figure, before the insertion, siblings had the same height,
  the "unknown" case.
  This is certainly possible but not required.
- If an ancestor of w had height one less than its sibling, then
  the insertion has made them equal. That is the "good" case:
  The height of the parent doesn't change and we are done as
  illustrated in the previous (small) figure.
- The "bad" case occurs when an ancestor of w had height one
  greater than its sibling. In this case the insertion has made it two
  greater, which is not permitted for an AVL tree.
- For example the light blue node could have originally been at
  height 2 instead of 3. We no longer have an AVL tree, the root is
  imbalanced: Its children have height 2 (the blue) and 4 (the
  red). We must re-balance this tree.
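The walk from w toward the root, with its "good", "unknown", and "bad" cases, can be sketched as below. All names (Node, bst_insert, classify, walk_up) are hypothetical, leaves are modelled as nodes with key None and height 0, and the sketch assumes the tree was AVL before the insertion and that the key is new.

```python
class Node:
    """A key of None marks a leaf (external node); leaves have height 0."""
    def __init__(self, key=None, parent=None):
        self.key, self.parent = key, parent
        self.left = self.right = None
        self.height = 0

def bst_insert(root, key):
    """Plain BST insertion of a new key; returns the expanded node w."""
    w = root
    while w.key is not None:
        w = w.left if key < w.key else w.right
    w.key, w.height = key, 1
    w.left, w.right = Node(parent=w), Node(parent=w)
    return w

def classify(old_height, sibling_height):
    """Compare a node's height before the insert with its sibling's height."""
    return {-1: "bad", 0: "unknown", 1: "good"}[sibling_height - old_height]

def walk_up(w):
    """Return the first imbalanced ancestor of w, or None if there is none."""
    a, old = w, 0                      # w has just grown from height 0 to 1
    while a.parent is not None:
        p = a.parent
        sibling = p.right if p.left is a else p.left
        case = classify(old, sibling.height)
        if case == "good":             # heights now equal; p keeps its height
            return None
        if case == "bad":              # a is now 2 taller than its sibling
            return p
        old = p.height                 # "unknown": p grows by one, keep going
        p.height += 1
        a = p
    return None                        # reached the root with no imbalance

root = Node()
for key in (1, 2):
    walk_up(bst_insert(root, key))     # no imbalance yet
z = walk_up(bst_insert(root, 3))       # 1, 2, 3 form a right-leaning chain
print(z.key)                           # prints 1
```

In the "bad" case the sketch returns the parent without touching its stored height, leaving that to whatever re-balancing step follows.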
The top left diagram illustrates the problem case from the previous
figure (k was 3 above).
- Node z is the imbalanced ancestor of w found starting at w and
  working up. The tree is out of balance, it is no longer
  AVL.
- Node y is the child of z with larger height (it is an ancestor of
  w). The old height of y was 1 greater than its sibling's (the
  “bad” case). Now it is 2 greater.
- Node x is the child of y with larger height (it is an ancestor of
  w and may in fact equal w). The old height of x was equal to its
  sibling's (the “unknown” case). Now it is one
  greater.
- In this diagram y is the right child of z and x is the left child
  of y. There are three other possibilities, which are also shown
  (but with less detail).
- We will focus on the three nodes x, y, and z and their subtrees
  and see how to rearrange the diagram to restore balance.
Definition: The operation we will perform when x,
y, and z lie in a straight line is called a single
rotation. When x, y, and z form an angle, the operation is
called a double rotation.
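Since x, y, and z all lie on one search path, the shape they form can be read off from their keys alone. The following sketch (the function name rotation_kind is hypothetical, and distinct keys are assumed) decides which of the two operations applies.

```python
def rotation_kind(x_key, y_key, z_key):
    """Return "single" if x, y, z lie in a straight line, "double" if they form an angle.

    Assumes y is a child of z and x is a child of y in a binary search tree
    with distinct keys.
    """
    y_is_left_child = y_key < z_key
    x_is_left_child = x_key < y_key
    return "single" if y_is_left_child == x_is_left_child else "double"

# y is the right child of z and x is the left child of y: an angle.
print(rotation_kind(x_key=2, y_key=3, z_key=1))  # prints double
# x, y, z all go to the right: a straight line.
print(rotation_kind(x_key=3, y_key=2, z_key=1))  # prints single
```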
Let us consider the upper left double rotation, which is the rotation
that we need to apply to the example above.
It is redrawn to the right with the subtrees drawn and their heights
after the insertion listed. The colors are there so that
you can see where the trees go when we perform the rotation.
Recall that x is
an ancestor of w and has had its height raised from k-1 to k. The
sibling of x is at height k-1 and so is the sibling of y.
The reason x had its height raised is that one of its children (say
the right one) has been raised from k-2 to k-1.
How do I know the other one is k-2?
Ans: It must have been the "unknown" case or we would not have
proceeded further up the tree.
The double rotation transforms the picture on top to the one on the
bottom. We now actually are done with the insertion. Let's check the
bottom picture and make sure.
- The order relation remains intact. That is, every node is greater
  than its entire left subtree and less than its entire right
  subtree.
- The tree (now rooted at x) is balanced.
- Nodes y and z are each at height k. Hence x, the root of this
  tree is at height k+1.
- The tree above, rooted at z, has height k+2.
- But remember that before the insert z was at height k+1.
- So the rotated tree (which is after the insert) has the same
  height as the original tree before the insert.
- Hence every node above z in the original tree keeps its original
  height so the entire tree is now balanced.
Thus, if an insertion causes an imbalance, just
one rotation re-balances the tree globally.
We will see that for removals it is not this simple.
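To tie the pieces together, here is a sketch of a complete insertion. It is not the procedure from the notes verbatim: instead of walking up from w and stopping early, it uses a recursive insertion that rechecks every ancestor on the way back, and it rebuilds the three nodes x, y, z around the middle key rather than moving pointers one rotation at a time. All names (Node, height, restructure, insert, HEIGHT_LOG) are hypothetical, leaves are None with height 0, and keys are assumed distinct. HEIGHT_LOG records, for each re-balancing, the height of z before the insert and the height of the rebuilt subtree, so the claim that these agree can be observed.

```python
HEIGHT_LOG = []   # (height of z before the insert, height after restructuring)

class Node:
    """Internal node; a leaf (external node) is None and has height 0."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.height = 1 + max(height(left), height(right))

def height(v):
    return v.height if v is not None else 0

def restructure(z):
    """Rebuild imbalanced z, its taller child y, and y's taller child x."""
    y = z.left if height(z.left) > height(z.right) else z.right
    x = y.left if height(y.left) > height(y.right) else y.right
    # a, b, c are x, y, z in key order; t0..t3 are their subtrees in key order.
    if y is z.left and x is y.left:        # straight line: single rotation
        a, b, c = x, y, z
        t0, t1, t2, t3 = x.left, x.right, y.right, z.right
    elif y is z.left:                      # angle: double rotation
        a, b, c = y, x, z
        t0, t1, t2, t3 = y.left, x.left, x.right, z.right
    elif x is y.right:                     # straight line: single rotation
        a, b, c = z, y, x
        t0, t1, t2, t3 = z.left, y.left, x.left, x.right
    else:                                  # angle: double rotation
        a, b, c = z, x, y
        t0, t1, t2, t3 = z.left, x.left, x.right, y.right
    return Node(b.key, Node(a.key, t0, t1), Node(c.key, t2, t3))

def insert(v, key):
    """Insert key below v and return the root of the resulting subtree."""
    if v is None:
        return Node(key)
    old = v.height
    if key < v.key:
        v.left = insert(v.left, key)
    else:
        v.right = insert(v.right, key)
    v.height = 1 + max(height(v.left), height(v.right))
    if abs(height(v.left) - height(v.right)) > 1:
        r = restructure(v)
        HEIGHT_LOG.append((old, r.height))
        return r
    return v

root = None
for key in (1, 3, 2):                      # y is a right child, x a left child
    root = insert(root, key)
print(root.key, root.left.key, root.right.key)  # prints 2 1 3
print(HEIGHT_LOG)                               # prints [(2, 2)]
```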
Allan Gottlieb