RUN-TIMES FOR SOME TREE ALGORITHMS


(C) Samuel L. Marateck, 1999

The run-time for a tree procedure depends on the number of nodes visited. In a procedure like Inorder that traverses the entire tree, the procedure visits each node a constant number of times, so the run-time is O(n) for a tree with n nodes.

Run-times for procedures that operate on already constructed binary search trees are in general less than O(n) because only one path is followed at each node. For instance, in the following tree, it takes four comparisons to locate the value stored in node 9 (the path followed is described by the nodes 1, 2, 4, and 9). To locate a value stored in any other node of the tree requires that number of comparisons or less.

				1
			       / \
			      2   3
			    /  \ /  \
			   4   5 6   7
			 /   \
			8     9

To get the run-time in general, it helps to consider a full tree. Because each node has two descendants, the number of nodes on each level form a geometric sequence: 1, 2, 4, 8,... The sum of the number of nodes from level 0 to level L is given as the sum of the geometric series, namely, (1) n = 2(L+1) - 1

where it is assumed that the root is at level 0 and its two children are at level 1, and so forth. Thus, if the tree under consideration were full, it would have 15 nodes. The number of nodes on a path from the root to the last generation is L + 1, in our case 4. This is also the maximum number of comparisons to find a value in a node of the tree.

To find the number of comparisons required as a function of the number of nodes, solve equation (1) for L + 1. Thus 2(L + 1) = n + 1, or

	(2)	L + 1 = log(n + 1)

where the log is taken to the base 2. Since O( log(n + 1)) is the same as O(log(n)), the run-time for finding a value in a full BST is O(log(n)). If the data used to form the BST is arranged in a random order, even if the tree is not full the average run-time for finding an element is still O(log n).

If the original data is sorted, however, the BST degenerates into a tree that looks like a linear linked list (we'll call it a linear tree) and the run-time is no longer O(log(n)).

				1
			       /
			     2
			   /
			 3
		       /
		     4
As was shown in class, the run-time required to find a value in the list is O(n).

The Time Analysis for Constructing a BST

Another interesting calculation is the determination of the run-time needed to construct a BST. First, we'll assume that the data to be inserted in the tree are randomly ordered. If there are n nodes already in the BST, it requires O(log n) steps to place the new node in the tree. Since there are n nodes to be inserted, the run-time to construct the tree is O(nlogn).

If the numbers are sorted, resulting in a linear tree, it takes i comparisons to insert the ith number. The sum of these comparisons from 1 to n is the sum of the arithmetic series, namely, ½n(n+1). So the run-time to construct the tree is O(n² ).

The Time Analysis for Determining the Height of a Tree

What is the run-time for determining the height of a tree? The function is :

function height( T : tree ) : integer:
begin
      if T = nil then
         height := -1
      else 
         height := 1 + max( height( T^,left ),  height( T^.right ) )
end;

Since the program must visit all the nodes of the tree, the run-time is O( n )

The Time Analysis for an Algorithm that Determines the Height at each Node

How do we do the time analysis for an algorithm that calculates the height of each node? There are two algorithms to do this. The first visits each node but once and thus its asymptotic behavior is clearly O(n). The second one, uses the height function just discussed. In this section, we will analyse the second algorithm. At each node V visited, the height is calculated. This means visiting each node in V's subtree.

First let's look at a complete tree of 15 nodes and manually count the number of nodes visited. The first subtree with 1 as its root has 15 nodes. The two subtrees in the first generation have 7 nodes each.

                                      1
				    /   \
                                  2       3
                                 / \     / \
                               4    5   6    7
                             /  \  / \ / \  / \
                            8        11        15
Thus aggregate number of nodes visited is the sum of 15 + 2*7 + 4*3 + 8*1, or 15 + 14 + 12 + 8, or approximately, 4*15. But log(16) is 4, so our sum is 15log(16). For large N the asymptotic time analysis yields Nlog(N) .

In effect this is what is happening: for a full tree with many nodes N, in the first generation there are two subtrees with N/2 nodes each. The sum of the nodes on this level is N/2 + N/2 = N. In the next generation there are four subtrees with N/4 nodes each. The sum of the nodes on this generation is again N. For N very large -- this is called the asymptotic limit-- at each level of the tree the sum of the nodes in the subtrees is N. Since there are log(N) levels, the asymptotic behavior is O(NlogN).

The following is a more detailed analysis. Assume that the tree is full and has 127 nodes. The sum is now 127 + 2*63 + 4*31 + 8*15 + 16*7 + 32*3 + 64*1, or 127 + 126 + 124 + 120 + 112 + 96 + 64. This is the same as

N + (N-1) + (N-3) + (N-7) + (N-15) +(N-31) + (N-63)
which is the same as
7*N -(1 + 3 + 7 + 15 + 31 + 63)
where 7 is approximately log(N). For large N, this becomes Nlog(N) - summation(2N - 1) where the summation index N goes from 1 to log(N) - 1. Let's approximate this upper limit by log(N). Summation(2N) for N going from 0 to 4 is 1 + 2 + 4 + 8 + 16 which is 31. This is 25 - 1. In general, Summation(2N) for N going from 0 to m is 2M + 1 - 1. So summation(2N) between 0 and log(N) is (2log(N) + 1) - 1, and summation( - 1) becomes - log(N). But 2log(N) is N, so our asymptotic analysis yields
NlogN - 2N + logN
or
 (N + 1)logN - 2N 
But this is O(NlogN).