Basic Algorithms: Lecture 10

================ Start Lecture #10 ================

Remark: I will not hold you responsible for the proofs of the theorems.

Theorem: Let T be a binary tree having height h and n nodes. Then

1. The number of leaves in T is at least h+1 and at most 2^h.
2. The number of internal nodes in T is at least h and at most 2^h - 1.
3. The number of nodes in T is at least 2h+1 and at most 2^(h+1) - 1.
4. log(n+1) - 1 ≤ h ≤ (n-1)/2

Proof:

1. Induction on n, the number of nodes in the tree.
   Base case n=1: Clearly true for all trees having only one node.
   Induction hypothesis: Assume true for all trees having at most k nodes.
   Main inductive step: Prove the assertion for all trees having k+1 nodes. Let T be a tree with k+1 nodes and let h be the height of T.
   Remove the root of T. The two subtrees produced each have no more than k nodes, so they satisfy the assertion. Since each has height at most h-1, each has at most 2^(h-1) leaves. At least one of the subtrees has height exactly h-1 and hence has at least h leaves. Put the original tree back together.
   One subtree has at least h leaves and the other has at least 1, so the original tree has at least h+1. Each subtree has at most 2^(h-1) leaves and the original root is not a leaf, so the original has at most 2*2^(h-1) = 2^h leaves.
2. Same idea. Induction on the number of nodes. Clearly true if T has one node. Remove the root. At least one of the subtrees has height h-1 and hence has at least h-1 internal nodes. Each of the subtrees has height at most h-1 and so has at most 2^(h-1) - 1 internal nodes. Put the original tree back together. One subtree has at least h-1 internal nodes, the other has at least 0, and the original root is an internal node, so the original tree has at least h. Each subtree has at most 2^(h-1) - 1 internal nodes and the original root is an internal node, so the original has at most 2(2^(h-1) - 1) + 1 = 2^h - 1 internal nodes.
3. Add parts 1 and 2.
4. Apply algebra (including a log) to part 3.
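
The algebra for the last part can be spelled out as follows. This is only a sketch, and it assumes that log means the base-2 logarithm; the notes do not state the base, but the powers of 2 in the bound on the number of nodes suggest it.

```latex
\begin{align*}
2h+1 \le n &\implies h \le \frac{n-1}{2},\\
n \le 2^{h+1}-1 &\implies n+1 \le 2^{h+1}
  \implies \log_2(n+1) \le h+1
  \implies \log_2(n+1)-1 \le h.
\end{align*}
```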

Theorem: In a binary tree T, the number of leaves is 1 more than the number of internal nodes.

Proof: Again induction on the number of nodes. Clearly true for one node. Assume true for trees with up to n nodes and let T be a tree with n+1 nodes. For example T is the top tree on the right.

Choose a leaf and its parent (which of course is internal). For example, the leaf t and parent s in red.
Remove the leaf and its parent (middle diagram).
Splice the tree back together without the two nodes (bottom diagram); call the resulting tree S.
Since S has n-1 nodes, S satisfies the assertion.
Note that T is just S + one leaf + one internal node, so T also satisfies the assertion.

Alternate Proof (does not use the pictures):

Place two tokens on each internal node.
Push these tokens to the two children, one token to each child.
Notice that now all nodes but the root have one token; the root has none.
Hence 2*(number internal) = (number internal) + (number leaves) - 1
Done (slick!)

Corollary: A binary tree has an odd number of nodes.

Proof: #nodes = #leaves + #internal = 2(#internal)+1.
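
Here is a small sanity check of the two theorems and the corollary on randomly generated trees. It is a sketch, not part of the proofs. It assumes the trees are proper (every internal node has exactly two children), and the names Node, random_proper_tree, stats and check are made up for this illustration.

```python
import random

class Node:
    """A node of a proper binary tree: a leaf, or an internal node with two children."""
    def __init__(self, left=None, right=None):
        self.left = left
        self.right = right

    def is_internal(self):
        return self.left is not None

def random_proper_tree(internal_count, rng):
    """Build a proper binary tree with the given number of internal nodes."""
    if internal_count == 0:
        return Node()
    left_share = rng.randint(0, internal_count - 1)
    return Node(random_proper_tree(left_share, rng),
                random_proper_tree(internal_count - 1 - left_share, rng))

def stats(v):
    """Return (leaves, internal nodes, height) of the subtree rooted at v."""
    if not v.is_internal():
        return 1, 0, 0
    l_leaves, l_internal, l_height = stats(v.left)
    r_leaves, r_internal, r_height = stats(v.right)
    return (l_leaves + r_leaves,
            l_internal + r_internal + 1,
            1 + max(l_height, r_height))

def check(tree):
    """Assert the bounds of the first theorem, leaves = internal + 1, and odd n."""
    leaves, internal, h = stats(tree)
    n = leaves + internal
    assert h + 1 <= leaves <= 2 ** h
    assert h <= internal <= 2 ** h - 1
    assert 2 * h + 1 <= n <= 2 ** (h + 1) - 1
    assert leaves == internal + 1
    assert n % 2 == 1
    return leaves, internal, h

rng = random.Random(10)  # fixed seed so the run is repeatable
for _ in range(200):
    check(random_proper_tree(rng.randint(0, 30), rng))
```

Passing these checks on a few hundred trees proves nothing, of course, but a failure would point to a misstatement of one of the bounds.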

Preorder traversal of a binary tree

Algorithm binaryPreorder(T,v)
   Visit node v
   if T.isInternal(v) then
      binaryPreorder(T,T.leftChild(v))
      binaryPreorder(T,T.rightChild(v))

Algorithm binaryPreorder(T)
   binaryPreorder(T,T.root())

Postorder traversal of a binary tree

Algorithm binaryPostorder(T,v)
   if T.isInternal(v) then
      binaryPostorder(T,T.leftChild(v))
      binaryPostorder(T,T.rightChild(v))
   Visit node v

Algorithm binaryPostorder(T)
   binaryPostorder(T,T.root())

Inorder traversal of a binary tree

Algorithm binaryInorder(T,v)
   if T.isInternal(v) then
      binaryInorder(T,T.leftChild(v))
   Visit node v
   if T.isInternal(v) then
      binaryInorder(T,T.rightChild(v))

Algorithm binaryInorder(T)
   binaryInorder(T,T.root())
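
The three traversals above can be sketched in Python as follows. This is a minimal illustration, assuming a proper binary tree. The Node class, its is_internal method, and the elements helper are hypothetical stand-ins for the tree ADT methods (T.isInternal, T.leftChild, T.rightChild) used in the pseudocode.

```python
class Node:
    """A node of a proper binary tree holding an element."""
    def __init__(self, element, left=None, right=None):
        self.element = element
        self.left = left
        self.right = right

    def is_internal(self):
        return self.left is not None

def preorder(v, visit):
    visit(v)                      # visit before the children
    if v.is_internal():
        preorder(v.left, visit)
        preorder(v.right, visit)

def postorder(v, visit):
    if v.is_internal():
        postorder(v.left, visit)
        postorder(v.right, visit)
    visit(v)                      # visit after the children

def inorder(v, visit):
    if v.is_internal():
        inorder(v.left, visit)
    visit(v)                      # visit between the children
    if v.is_internal():
        inorder(v.right, visit)

def elements(traversal, root):
    """Collect the elements in the order the traversal visits them."""
    out = []
    traversal(root, lambda v: out.append(v.element))
    return out

# Hypothetical example tree:   A
#                             / \
#                            B   C
#                           / \
#                          D   E
root = Node("A", Node("B", Node("D"), Node("E")), Node("C"))
print(elements(preorder, root))   # ['A', 'B', 'D', 'E', 'C']
print(elements(postorder, root))  # ['D', 'E', 'B', 'C', 'A']
print(elements(inorder, root))    # ['D', 'B', 'E', 'A', 'C']
```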

Definition: A binary tree is fully complete if all the leaves are at the same (maximum) depth. In particular, the sibling of a leaf is a leaf.

Euler tour traversal

Generalizes the above. Visit each node three times: first when "going left", then when "going right", then when "going up". Perhaps the words should be "going to go left", "going to go right", and "going to go up". These words work for internal nodes. For a leaf you just visit it three times in a row (or you could put in code to visit a leaf only once; I don't do this). It is called an Euler tour traversal because an Euler tour of a graph is a way of drawing each edge exactly once without taking your pen off the paper. The Euler tour traversal would draw each edge twice, but if you add in the parent pointers, each edge is drawn once.

The book uses "on the left", "from below", "on the right". I prefer my names, but you may use either.

Algorithm eulerTour(T,v):
   visit v going left
   if T.isInternal(v) then
      eulerTour(T,T.leftChild(v))
   visit v going right
   if T.isInternal(v) then
      eulerTour(T,T.rightChild(v))
   visit v going up

Algorithm eulerTour(T):
   eulerTour(T,T.root())

Pre-, post-, and in-order traversals are special cases where two of the three visits are dropped.
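
To see the special cases concretely, here is a sketch of the Euler tour in Python with the three visits passed in as functions. Keeping only the "going left" visit gives preorder, only "going right" gives inorder, and only "going up" gives postorder. The Node class, skip, and tour_elements are hypothetical helpers for this illustration, and the tree is assumed to be proper.

```python
class Node:
    """A node of a proper binary tree holding an element."""
    def __init__(self, element, left=None, right=None):
        self.element = element
        self.left = left
        self.right = right

    def is_internal(self):
        return self.left is not None

def euler_tour(v, going_left, going_right, going_up):
    going_left(v)
    if v.is_internal():
        euler_tour(v.left, going_left, going_right, going_up)
    going_right(v)
    if v.is_internal():
        euler_tour(v.right, going_left, going_right, going_up)
    going_up(v)

def skip(v):
    """A dropped visit: does nothing."""

def tour_elements(root, keep):
    """Run the tour keeping only one visit; keep is 'left', 'right' or 'up'."""
    out = []

    def record(v):
        out.append(v.element)

    euler_tour(root,
               record if keep == "left" else skip,
               record if keep == "right" else skip,
               record if keep == "up" else skip)
    return out

# Hypothetical example tree: A has children B and C; B has children D and E.
root = Node("A", Node("B", Node("D"), Node("E")), Node("C"))
print(tour_elements(root, "left"))   # preorder:  ['A', 'B', 'D', 'E', 'C']
print(tour_elements(root, "right"))  # inorder:   ['D', 'B', 'E', 'A', 'C']
print(tour_elements(root, "up"))     # postorder: ['D', 'E', 'B', 'C', 'A']
```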

It is quite useful to have these three visits. For example, here is a nifty algorithm to print an expression tree with parentheses to indicate the order of the operations. We just give the three visits.

Algorithm visitGoingLeft(v):
   if T.isInternal(v) then
      print "("

Algorithm visitGoingRight(v)
   print v.element()

Algorithm visitGoingUp(v)
   if T.isInternal(v) then
      print ")"

Homework: Plug these in to the Euler Tour and show that what you get is the same as

Algorithm printExpression(T,v):
   input: T an expression tree, v a node in T.
   if T.isLeaf(v) then
      print v.element()  // for a leaf the element is a value
   else
      print "("
      printExpression(T,T.leftChild(v))
      print v.element()  // for an internal node the element is an operator
      printExpression(T,T.rightChild(v))
      print ")"

Algorithm printExpression(T):
   printExpression(T,T.root())
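
As a quick experiment (not a substitute for the homework argument), the two versions can be run side by side on one small tree. This sketch assumes a proper binary tree whose internal nodes hold operators and whose leaves hold values. Node, print_expression, and print_with_tour are hypothetical names, and the output is collected into a string rather than printed piece by piece.

```python
class Node:
    """A node of a proper binary expression tree."""
    def __init__(self, element, left=None, right=None):
        self.element = element
        self.left = left
        self.right = right

    def is_internal(self):
        return self.left is not None

def euler_tour(v, going_left, going_right, going_up):
    going_left(v)
    if v.is_internal():
        euler_tour(v.left, going_left, going_right, going_up)
    going_right(v)
    if v.is_internal():
        euler_tour(v.right, going_left, going_right, going_up)
    going_up(v)

def print_expression(v, out):
    """Direct recursive version: append the pieces of the expression to out."""
    if not v.is_internal():
        out.append(str(v.element))      # for a leaf the element is a value
    else:
        out.append("(")
        print_expression(v.left, out)
        out.append(str(v.element))      # for an internal node, an operator
        print_expression(v.right, out)
        out.append(")")

def print_with_tour(root):
    """Euler tour version using the three visits given above."""
    out = []

    def going_left(v):
        if v.is_internal():
            out.append("(")

    def going_right(v):
        out.append(str(v.element))

    def going_up(v):
        if v.is_internal():
            out.append(")")

    euler_tour(root, going_left, going_right, going_up)
    return "".join(out)

# Hypothetical expression tree for (3 + 4) * 5
tree = Node("*", Node("+", Node(3), Node(4)), Node(5))
direct = []
print_expression(tree, direct)
print("".join(direct))        # ((3+4)*5)
print(print_with_tour(tree))  # ((3+4)*5)
```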

Problem Set 2 problem 2. We have seen that traversals have complexity Θ(N), where N is the number of nodes in the tree. But we didn't count the costs of the visit()s themselves since the user writes that code. We know that visit() will be called N times, once per node, for post-, pre-, and in-order traversals and will be called 3N times for Euler tour traversal. So if each visit costs Θ(1), the total visit cost will be Θ(N) and thus does not increase the complexity of a traversal. If each visit costs Θ(N), the total visit cost will be Θ(N²) and hence the total traversal cost will be Θ(N²). The same analysis works for any visit cost provided all the visits cost the same. For this problem we will be considering variable-cost visits. In particular, assume that the cost of visiting a node v is the height of v (so roots can be expensive to visit, but leaves are free).

Part A. How many nodes N are in a fully complete binary tree of height h?

Part B. How many nodes are at height i in a fully complete binary tree of height h? What is the total cost of visiting all the nodes at height i?

Part C. Write a formula using ∑ (sum) for the total cost of visiting all the nodes. This is very easy given B.

One point extra credit. Show that the sum you wrote in part C is Θ(N).

Part D. Continue to assume the cost of visiting a node equals its height. Describe a class of binary trees for which the total cost of visiting the nodes is Θ(N²). Naturally these will not be fully complete binary trees. Hint: do problem 3.

2.3.4 Data Structures for representing trees

A vector-based implementation for Binary Trees

We store each node as an element of a vector. The root is stored in element 1 of the vector, and the key idea is that the two children of the node at rank r are stored in the elements at rank 2r and 2r+1.

Draw a fully complete binary tree of height 3 and show where each element is stored.

Draw an incomplete binary tree of height 3 and show where each element is stored and that there are gaps.

There must be a way to tell leaves from internal nodes. The book doesn't make this explicit. Here is an explicit example. Let the vector S be given. With a vector we have the current size. S[0] is not used. S[1] has a pointer to the root node (or contains the root node if you prefer). For each i, S[i] is null (a special value) if the corresponding node doesn't exist. Then to see if the node v at rank i is a leaf, look at 2i. If 2i exceeds S.size() then v is a leaf since it has no children. Similarly if S[2i] is null, v is a leaf. Otherwise v is internal.
How do you know that if S[2i] is null, then S[2i+1] will be null?
Ans: Our binary trees are proper.
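
The scheme just described can be sketched in Python with a list playing the role of the vector S. This is one possible rendering, not the book's implementation. The class name VectorBinaryTree and its methods are made up for the illustration, None plays the role of null (so None cannot be stored as an element), and the tree is assumed to be proper. Because a Python list of length len(S) has last index len(S)-1, the leaf test compares 2i against len(S) with >=.

```python
class VectorBinaryTree:
    """Proper binary tree stored in a list: root at rank 1, children of r at 2r and 2r+1."""

    def __init__(self, root_element):
        self.S = [None, root_element]   # S[0] is unused, S[1] holds the root

    def left_child(self, rank):
        return 2 * rank

    def right_child(self, rank):
        return 2 * rank + 1

    def parent(self, rank):
        return rank // 2

    def is_leaf(self, rank):
        child = 2 * rank
        # No room for children, or the left child slot is empty (null).
        return child >= len(self.S) or self.S[child] is None

    def is_internal(self, rank):
        return not self.is_leaf(rank)

    def expand(self, rank, left_element, right_element):
        """Turn the leaf at rank into an internal node with two new leaf children."""
        while len(self.S) <= 2 * rank + 1:
            self.S.append(None)         # grow the vector, leaving gaps as null
        self.S[2 * rank] = left_element
        self.S[2 * rank + 1] = right_element

tree = VectorBinaryTree("A")
tree.expand(1, "B", "C")    # children of the root go to ranks 2 and 3
tree.expand(3, "F", "G")    # children of C go to ranks 6 and 7
print(tree.S)               # [None, 'A', 'B', 'C', None, None, 'F', 'G']
print(tree.is_leaf(2))      # True: S[4] is null
print(tree.is_leaf(3))      # False: S[6] holds 'F'
print(tree.is_leaf(6))      # True: rank 12 is beyond the end of the vector
```

The entries at ranks 4 and 5 are the gaps: B is a leaf, so the slots reserved for its children stay null.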

This implementation is very fast. Indeed all tree operations are O(1) except for positions() and elements(), which produce n results and take time Θ(n).

Homework: R-2.7

However, this implementation can waste a lot of space since many of the entries in S might be unused. That is, there may be many i for which S[i] is null.

Problem Set 2 problem 3. Give a tree with fewer than 20 nodes for which S.size() exceeds 100. Give a tree with fewer than 25 nodes for which S.size() exceeds 1000. Give a tree with fewer than 100 nodes for which S.size() exceeds a million.

Allan Gottlieb