The run-time for a tree procedure depends on the number of nodes visited. In a procedure like Inorder that traverses the entire tree, the procedure visits each node a constant number of times, so the run-time is O(n) for a tree with n nodes.
Run-times for procedures that operate on already constructed binary search trees are in general less than O(n) because only one path is followed at each node. For instance, in the following tree, it takes four comparisons to locate the value stored in node 9 (the path followed is described by the nodes 1, 2, 4, and 9). To locate a value stored in any other node of the tree requires that number of comparisons or less.
1 / \ 2 3 / \ / \ 4 5 6 7 / \ 8 9
To get the run-time in general, it helps to consider a full tree. Because each node has two descendants, the number of nodes on each level form a geometric sequence: 1, 2, 4, 8,... The sum of the number of nodes from level 0 to level L is given as the sum of the geometric series, namely, (1) n = 2(L+1) - 1
where it is assumed that the root is at level 0 and its two children are at level 1, and so forth. Thus, if the tree under consideration were full, it would have 15 nodes. The number of nodes on a path from the root to the last generation is L + 1, in our case 4. This is also the maximum number of comparisons to find a value in a node of the tree.
To find the number of comparisons required as a function of the number of nodes, solve equation (1) for L + 1. Thus 2(L + 1) = n + 1, or
(2) L + 1 = log(n + 1)
where the log is taken to the base 2. Since O( log(n + 1)) is the same as O(log(n)), the run-time for finding a value in a full BST is O(log(n)). If the data used to form the BST is arranged in a random order, even if the tree is not full the average run-time for finding an element is still O(log n).
If the original data is sorted, however, the BST degenerates into a tree that looks like a linear linked list (we'll call it a linear tree) and the run-time is no longer O(log(n)).
1 / 2 / 3 / 4As was shown in class, the run-time required to find a value in the list is O(n).
The Time Analysis for Constructing a BST
Another interesting calculation is the determination of the run-time needed to construct a BST. First, we'll assume that the data to be inserted in the tree are randomly ordered. If there are n nodes already in the BST, it requires O(log n) steps to place the new node in the tree. Since there are n nodes to be inserted, the run-time to construct the tree is O(nlogn).
If the numbers are sorted, resulting in a linear tree, it takes i comparisons to insert the ith number. The sum of these comparisons from 1 to n is the sum of the arithmetic series, namely, ½n(n+1). So the run-time to construct the tree is O(n² ).
The Time Analysis for Determining the Height of a Tree
What is the run-time for determining the height of a tree? The function is :
function height( T : tree ) : integer: begin if T = nil then height := -1 else height := 1 + max( height( T^,left ), height( T^.right ) ) end;Since the program must visit all the nodes of the tree, the run-time is O( n )
The Time Analysis for an Algorithm that Determinines the Height at each Node
How do we do the time analysis for an algorithm that calculates the height of each node? First let's look at a complete tree of 15 nodes and manually count the number of nodes visited.
1 / \ 2 3 / \ / \ 4 5 6 7 / \ / \ / \ / \ 8 11 15Its the sum of 15 + 2*7 + 4*3 + 8*1, or 15 + 14 + 12 + 8, or approximately, 4*15. But log(16) is 4, so our sum is 15log(16). For large N the assymptotic time analysis yields Nlog(N) .
The following is a more detailed analysis. Assume that the tree is full and has 127 nodes. The sum is now 127 + 2*63 + 4*31 + 8*15 + 16*7 + 32*3 + 64*1, or 127 + 126 + 124 + 120 + 112 + 96 + 64. This is the same as
N + (N-1) + (N-3) + (N-7) + (N-15) +(N-31) + (N-63)which is the same as
7*N -(1 + 3 + 7 + 15 + 31 + 63)where 7 is approximately log(N). For large N, this becomes Nlog(N) - summation(2(N - 1)) where the summation index N goes from 1 to log(N) - 1. Summation(2N) between the indicated limits becomes (2log(N)) - 1, and summation( - 1) becomes - (log(N) - 1). But 2log(N) is N, so our assymptotic analysis yields
NlogN - N + logN - 2or
(N + 1)logN + logN + 2But this is O(NlogN).