================ Start Lecture #17 ================
Notes on midterm:
-
Midterm handed back at the end of class
-
An answer sheet was also handed out.
-
The breakdown was 90-100: 14, 80-89: 15, 70-79: 9, 60-69: 3, 0-59: 1.
-
We will review the answers in this week's recitations.
3.2.1 Update Operations
Insertion
Begin with a standard binary search tree insertion. In the diagrams
on the right, black shows the situation before the insertion; red
after.
The numbers are the heights.
Ignore the blue markings; they are explained in the text as needed.
-
Do a find for the key to be inserted.
-
Presumably wind up at a leaf (meaning the key is not already in the
tree).
-
Call this leaf w.
-
Insert the item by converting w to an internal node at
height one (i.e., the new internal node has two leaves as
children).
-
Call the new internal node w. We view the previous operation as
expanding the leaf w into an internal node (with leaves as
children).
-
In the diagram on the right, w is converted from a black leaf into a
red height 1 node.
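The steps so far can be sketched in Python. This is only a minimal sketch under the notes' convention that leaves are empty placeholders of height 0; the names Node, find, expand_leaf, and bst_insert are assumptions made for illustration and do not come from the text. Note that it updates only w's height, so the heights of w's ancestors may be stale afterwards; that is exactly the issue taken up next.

```python
class Node:
    """A tree node; a leaf is an empty placeholder with height 0."""

    def __init__(self, parent=None):
        self.key = None
        self.left = None
        self.right = None
        self.parent = parent
        self.height = 0

    def is_leaf(self):
        return self.left is None


def find(root, key):
    """Standard BST search: return the node holding key, or the leaf where the search ends."""
    node = root
    while not node.is_leaf() and node.key != key:
        node = node.left if key < node.key else node.right
    return node


def expand_leaf(w, key):
    """Convert leaf w into an internal node of height 1 with two leaves as children."""
    w.key = key
    w.left = Node(parent=w)
    w.right = Node(parent=w)
    w.height = 1


def bst_insert(root, key):
    """Insert key with no re-balancing; return w, or None if the key is already present."""
    w = find(root, key)
    if not w.is_leaf():
        return None  # key already in the tree
    expand_leaf(w, key)
    return w
```

An empty tree is a single leaf, so `root = Node()` followed by `bst_insert(root, 10)` turns the root itself into the height 1 node w.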
Why aren't we finished?
Ans: The tree may no longer be in balance, i.e., it may no longer be an
AVL tree.
We look for an imbalanced node, i.e., a node whose children have
heights differing by more than 1. If we find such a node, the tree is
no longer AVL and we must perform a re-balancing
operation.
-
We start our search for an unbalanced node at w
since that is where the tree has changed. Indeed the height of w
has increased from 0 to 1. This height increase of w
may cause a height increase of w's parent, which
in turn may cause a height increase of w's grandparent, etc. That
is, we may need to search the entire path from w up to the root,
i.e. the ancestors of w.
But we know the key fact that no other node has had its height
changed.
-
As we traverse the path from w to the root we will be looking at
the ancestors of w. For each ancestor A we consider, we will also
check the sibling of A. Let h be the height of A before
the insert. Since the
tree was AVL before the insert, we know that the height of A's
sibling was and is h-1, h, or h+1. We shall see that these
three cases lead us to three different actions.
-
h-1 will be the "bad" case and will require a re-balancing
operation.
-
h will be the "unknown" case; we will need to proceed up the
tree to see if a re-balance is needed.
-
h+1 will be the "good" case; we will be able to stop and no
re-balancing will be needed.
-
We begin at w which was at height 0 and is now at height 1. Its
children are leaves so w itself is in balance.
-
Since w was at height 0, the sibling must have
had height -1, 0, or +1.
The height of the sibling has not changed (it is not on
the path from w to the root).
Since a height of -1
is impossible, we have just two possibilities, the "unknown" and
"good" cases.
-
If the sibling was at height 1, their parent was, and still
is, at height 2. Hence no heights other than w's changed and
the tree remains in balance. So in this case, which is illustrated
in the figure above, we are indeed done.
-
If the sibling of w was (and hence is) at height 0, we have the
"unknown" case. The siblings' heights now differ by 1, which is
OK, but their parent's height has changed from 1 to 2 (since w is
at height 1). This situation is illustrated at the right.
Note that not all
of the tree is shown. Also note that the black heights are before
the insert and the reds are after (ignore the blue for now).
-
Since w's parent P has had its height changed (from 1 to 2) we need
to check P's sibling. The old height is 1 so the sibling must be
at height 0, 1, or 2. (It is 1 in the diagram.) If we had the
"good" case (height=2) the new height of P and the height of P's
sibling would be equal and the height of their parent would not
have changed so we would have been done (without
re-balancing).
-
In the diagram we again have the "unknown" case and proceed up to
P's parent. Another "unknown" case (the node's old height and its
sibling's height are equal, namely 2) and we keep going up.
-
We get a final unknown case (both heights 3) and finally reach the
root, which was at height 4. Since it has no sibling there is
nothing that can be imbalanced so we are done, again with no
re-balancing required.
-
What is the problem? We just proceed up as in the figure on the
right and eventually we hit the root and are done.
-
WRONG.
-
In the figure, before the insertion, siblings had the same height,
the "unknown" case.
This is certainly possible but not required.
-
If an ancestor of w had height one less than its sibling, then
the insertion has made them equal. That is the "good" case:
The height of the parent doesn't change and we are done as
illustrated in the previous (small) figure.
-
The "bad" case occurs when an ancestor of w had height one
greater than its sibling. In this case the insertion has made it two
greater, which is not permitted for an AVL tree.
-
For example, the light blue node could have originally been at
height 2 instead of 3. We would then no longer have an AVL tree: the
root is imbalanced, since its children have heights 2 (the blue) and 4
(the red). We must re-balance this tree.
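The walk up from w can be sketched as follows. This is a sketch under assumptions, not the text's own code: Node, internal, expand, sibling, and walk_up are hypothetical names, each node carries a parent pointer, and w is assumed to have just grown from height 0 to 1. For each node A on the path, the old height h is compared with the height of A's sibling, giving the "good", "unknown", and "bad" cases described above.

```python
class Node:
    """Tree node; a node with no children is a leaf of height 0."""

    def __init__(self, key=None, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.parent = None
        for child in (left, right):
            if child is not None:
                child.parent = self
        self.height = 0 if left is None else 1 + max(left.height, right.height)


def internal(key, left=None, right=None):
    """Hypothetical helper: build an internal node, filling missing children with leaves."""
    return Node(key, left or Node(), right or Node())


def expand(w, key):
    """Turn leaf w into a height 1 internal node with two leaf children."""
    w.key, w.left, w.right, w.height = key, Node(), Node(), 1
    w.left.parent = w.right.parent = w


def sibling(node):
    p = node.parent
    return p.right if p.left is node else p.left


def walk_up(w):
    """Return (z, trace): z is the first imbalanced ancestor of w, or None.

    trace records which case ("good", "unknown", "bad") was met at each step.
    """
    trace = []
    a = w
    while a.parent is not None:
        old_h = a.height - 1          # a's height before the insert
        sib_h = sibling(a).height     # unchanged by the insert
        if sib_h == old_h + 1:        # "good": parent's height is unchanged, stop
            trace.append("good")
            return None, trace
        if sib_h == old_h - 1:        # "bad": the children of a.parent now differ by 2
            trace.append("bad")
            return a.parent, trace    # z's height is left stale; a rotation follows
        trace.append("unknown")       # equal old heights: the parent grows by 1
        a.parent.height += 1
        a = a.parent
    return None, trace                # reached the root with no imbalance
```

For instance, with `root = internal(10, internal(5))`, expanding the left leaf of 5 and calling `walk_up` on it meets an "unknown" step at 5 and then a "bad" step, returning the root as z.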
The top left diagram illustrates the problem case from the previous
figure (k was 3 above).
-
Node z is the imbalanced ancestor of w found starting at w and
working up. The tree is out of balance; it is no longer
AVL.
-
Node y is the child of z with larger height (it is an ancestor of
w). The old height of y was 1 greater than its sibling's (the "bad"
case). Now it is 2 greater.
-
Node x is the child of y with larger height (it is an ancestor of w
and may in fact equal w). The old height of x was equal to its
sibling's (the "unknown" case). Now it is one greater.
-
In this diagram y is the right child of z and x is the left child
of y. There are three other possibilities, which are also shown
(but with less detail).
-
We will focus on the three nodes x, y, and z and their subtrees
and see how to rearrange the diagram to restore balance.
Definition: The operation we will perform when x,
y, and z lie in a straight line is called a single
rotation. When x, y, and z form an angle, the operation is
called a double rotation.
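Choosing x, y, and z and deciding which kind of rotation applies can be sketched as below. The names Node, internal, taller_child, and choose_xyz are assumptions for illustration only. The sketch relies on the fact stated above that, after an insertion, y is strictly taller than its sibling and so is x, so no tie-breaking rule is needed.

```python
class Node:
    """Tree node; a node with no children is a leaf of height 0."""

    def __init__(self, key=None, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.height = 0 if left is None else 1 + max(left.height, right.height)


def internal(key, left=None, right=None):
    """Hypothetical helper: build an internal node, filling missing children with leaves."""
    return Node(key, left or Node(), right or Node())


def taller_child(node):
    return node.left if node.left.height > node.right.height else node.right


def choose_xyz(z):
    """Given the imbalanced node z, return (x, y, kind) with kind 'single' or 'double'."""
    y = taller_child(z)
    x = taller_child(y)
    y_is_left = z.left is y
    x_is_left = y.left is x
    # Same direction twice: x, y, z lie in a straight line. Otherwise they form an angle.
    kind = "single" if y_is_left == x_is_left else "double"
    return x, y, kind
```

With `z = internal(10, None, internal(30, internal(20)))`, y is the right child of z and x is the left child of y, so the result is a double rotation, matching the configuration discussed in the text.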
Let us consider the upper left double rotation, which is the rotation
that we need to apply to the example above.
It is redrawn to the right with the subtrees drawn and their heights
labeled. The colors are there so that you can see where the trees go when
we perform the rotation.
Recall that x is
an ancestor of w and has had its height raised from k-1 to k. The
sibling of x is at height k-1 and so is the sibling of y.
The reason x had its height raised is that one of its children (say
the right one) has been raised from k-2 to k-1.
How do I know the other one is k-2?
Ans: The raised child must have been in the "unknown" case, i.e., its
old height equaled its sibling's; otherwise we would not have
proceeded further up the tree.
The double rotation transforms the picture on top to the one on the
bottom. We now actually are done with the insertion. Let's check the
bottom picture and make sure.
-
The order relation remains intact. That is, every node is greater
than its entire left subtree and less than its entire right
subtree.
-
The tree (now rooted at x) is balanced.
-
Nodes y and z are each at height k. Hence x, the root of this
tree, is at height k+1.
-
The tree before the rotation (the top picture), rooted at z, has height k+2.
-
But remember that before the insert z was at height k+1.
-
So the rotated tree (which is after the insert) has the same
height as the original tree before the insert.
-
Hence every node above z in the original tree keeps its original
height so the entire tree is now balanced.
Thus, if an insertion causes an imbalance, just
one rotation re-balances the tree globally.
We will see that for removals it is not this simple.
Here are the three pictures for the remaining three possibilities.
That is, the other double rotation and both single rotations. The
original configuration is shown on top and the result after the
rotation is shown immediately below.
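All four configurations can be handled by one routine. The sketch below is one possible way to do it, not necessarily how the pictures were produced: it names the three nodes a, b, c in left-to-right (inorder) order and the four subtrees t0 through t3 likewise, then makes b the new subtree root with children a and c. The names Node, internal, taller_child, and restructure are assumptions for illustration, and the caller is assumed to re-attach the returned node where z used to hang.

```python
class Node:
    """Tree node; a node with no children is a leaf of height 0."""

    def __init__(self, key=None, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.update()

    def update(self):
        """Recompute this node's height from its children."""
        if self.left is None:
            self.height = 0
        else:
            self.height = 1 + max(self.left.height, self.right.height)


def internal(key, left=None, right=None):
    """Hypothetical helper: build an internal node, filling missing children with leaves."""
    return Node(key, left or Node(), right or Node())


def taller_child(node):
    return node.left if node.left.height > node.right.height else node.right


def restructure(z):
    """Re-balance at z with a single or double rotation; return the new subtree root."""
    y = taller_child(z)
    x = taller_child(y)
    if z.left is y and y.left is x:      # straight line to the left: single rotation
        a, b, c = x, y, z
        t0, t1, t2, t3 = x.left, x.right, y.right, z.right
    elif z.left is y:                    # angle, left then right: double rotation
        a, b, c = y, x, z
        t0, t1, t2, t3 = y.left, x.left, x.right, z.right
    elif y.right is x:                   # straight line to the right: single rotation
        a, b, c = z, y, x
        t0, t1, t2, t3 = z.left, y.left, x.left, x.right
    else:                                # angle, right then left: double rotation
        a, b, c = z, x, y
        t0, t1, t2, t3 = z.left, x.left, x.right, y.right
    a.left, a.right = t0, t1
    c.left, c.right = t2, t3
    b.left, b.right = a, c
    a.update()
    c.update()
    b.update()
    return b
```

In the smallest case (k = 1), `restructure(internal(10, None, internal(30, internal(20))))` takes a subtree of height 3 and returns one rooted at 20 with height 2, the height the subtree had before the insert.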
Homework: R-3.3, R-3.4, R-3.5
What is the complexity of insertion?
-
Let n be the number of nodes in the tree before the
insertion.
-
Finding the insertion point is Θ(log n).
-
Expanding the leaf and inserting the item is Θ(1).
-
Walking up the tree looking for an imbalance is Θ(1) per
level, which is O(log n) since the tree has height Θ(log
n).
-
Performing the one needed rotation is
Θ(1).
Hence we have the following
Theorem: The complexity of insertion in an AVL
tree is Θ(log n).
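The cost analysis can be checked empirically with a compact insertion routine. This is a sketch under different assumptions from the diagrams: it is recursive, uses None in place of a leaf (height 0), and always retraces the whole search path instead of stopping at the "good" case, which is still O(log n) work. The names Node, insert, stats, and height_bound are hypothetical; the bound 2 log2 n + 2 is a commonly quoted loose upper bound on AVL height and is used here only as a sanity check.

```python
import math


class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.height = key, None, None, 1


def height(node):
    return node.height if node else 0   # None plays the role of a leaf


def update(node):
    node.height = 1 + max(height(node.left), height(node.right))


def rotate_right(z):
    y = z.left
    z.left, y.right = y.right, z
    update(z)
    update(y)
    return y


def rotate_left(z):
    y = z.right
    z.right, y.left = y.left, z
    update(z)
    update(y)
    return y


def insert(node, key, stats):
    """Insert key below node; return the new subtree root. stats counts the work done."""
    if node is None:
        return Node(key)
    if key == node.key:
        return node
    if key < node.key:
        node.left = insert(node.left, key, stats)
    else:
        node.right = insert(node.right, key, stats)
    update(node)
    stats["visited"] += 1                # Θ(1) work per level on the way back up
    diff = height(node.left) - height(node.right)
    if diff > 1:                         # left side too tall
        if height(node.left.left) < height(node.left.right):
            node.left = rotate_left(node.left)    # first half of a double rotation
        node = rotate_right(node)
        stats["rebalances"] += 1
    elif diff < -1:                      # right side too tall
        if height(node.right.right) < height(node.right.left):
            node.right = rotate_right(node.right)
        node = rotate_left(node)
        stats["rebalances"] += 1
    return node


def height_bound(n):
    """Loose upper bound on the height of an AVL tree with n >= 1 items (an assumption)."""
    return 2 * math.log2(n) + 2
```

Inserting keys one at a time with a fresh `stats = {"visited": 0, "rebalances": 0}` per call lets you observe that the rebalances count never exceeds 1 per insertion and that the number of nodes visited stays within the logarithmic bound.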