Basic Algorithms

================ Start Lecture #9 ================

Postorder Traversal

First recursively traverse each child then visit the root. More formerly

Algorithm postorder(T,v):
   for each child c of v
      postorder(T,c)
   visit node v

Algorithm postorder(T):
   postorder(T,T.root())

Theorem: Preorder traversal of a tree with n nodes has complexity Θ(n).

Proof: The same as for preorder.

Remarks:

Postorder is how you evaluate an arithmetic expression tree.
Evaluate some arithmetic expression trees.
When you write out the nodes in the order visited your get what is called either "reverse polish notation" or "polish notation"; I don't remember which.
If you do preorder traversal you get the other one.

Problem Set 2, Problem 1. Note that the height of a tree is the depth of a deepest node. Extend the height algorithm so that it returns in addition to the height the v.element() for some v that is of maximal depth.

2.3.3 Binary Trees

Recall that a binary tree is an ordered tree in which no node has more than two children. The left child is ordered before the right child.

The book adopts the convention that, unless otherwise mentioned, the term "binary tree" will mean "proper binary tree", i.e., all internal nodes have two children. This is a little convenient, but not a big deal. If you instead permitted non-proper binary trees, you would test if a left child existed before traversing it (similarly for right child.)

Will do binary preorder (first visit the node, then the left subtree, then the right subtree, binary postorder (left subtree, right subtree, node) and then inorder (left subtree, node, right subtree).

The Binary Tree ADT

We have three (accessor) methods in addition to the general tree methods.

leftChild(v): Return the (position of the) left child; signal an error if v is a leaf.
rightChild(v): Similar
sibling(v): Return the (unique) sibling of v; signal an error if v is the root (and hence has no sibling).

Remark: I will not hold you responsible for the proofs of the theorems.

Theorem: Let T be a binary tree having height h and n nodes. Then

The number of leaves in T is at least h+1 and at most 2^h.
The number of internal nodes in T is at least h and at most 2^h-1.
The number of nodes in T is at least 2h+1 and at most 2^h+1-1.
log(n+1)-1≤h≤(n-1)/2

Proof:

Induction on n the number of nodes in the tree.
Base case n=1: Clearly true for all trees having only one node.
Induction hypothesis: Assume true for all trees having at most k nodes.
Main inductive step: prove the assertion for all trees having k+1 nodes. Let T be a tree with k nodes and let h be the height of T.
Remove the root of T. The two subtrees produced each have no more than k nodes so satisfy the assertion. Since each has height at most h-1, each has at most 2^h-1 leaves. At least one of the subtrees has height exactly h-1 and hence has at least h leaves. Put the original tree back together.
One subtree has at least h leaves, the other has at least 1, so the original tree has at least h+1. Each subtree has at most 2^h-1 leaves and the original root is not a leaf, so the original has at most 2^h leaves.
Same idea. Induction on the number of nodes. Clearly true if T has one node. Remove the root. At least one of the subtrees is height h-1 and hence has at least h-1 internal nodes. Each of the subtrees has height at most h-1 so has at most 2^h-1-1 internal nodes. Put the original tree back together. One subtree has at least h-1 internal nodes, the other has at least 1, so the original tree has at least h. Each subtree has at most 2^h-1-1 internal nodes and the original root is an internal node, so the original has at most 2(2^h-1)+1 = 2^h-1-1 internal nodes.
Add parts 1 and 2.
Apply algebra (including a log) to part 3.

Theorem:In a binary tree T, the number of leaves is 1 more than the number of internal nodes.

Proof: Again induction on the number of nodes. Clearly true for one node. Assume true for trees with up to n nodes and let T be a tree with n+1 nodes. For example T is the top tree on the right.

Choose a leaf and its parent (which of course is internal). For example, the leaf t and parent s in red.
Remove the leaf and its parent (middle diagram)
Splice the tree back without the two nodes (bottom diagram).
Since S has n-1 nodes, S satisfies the assertion.
Note that T is just S + one leaf + one internal so also satisfies the assertion.

Alternate Proof (does not use the pictures):

Place two tokens on each internal node.
Push these tokens to the two children.
Notice that now all nodes but the root have one token; the root has none.
Hence 2*(number internal) = (number internal) + (number leaves) + 1
Done (slick!)

Corollary: A binary tree has an odd number of nodes.

Proof: #nodes = #leaves + #internal = 2(#internal)+1.

Preorder traversal of a binary tree

Algorithm binaryPreorder(T,v)
   Visit node v
   if T.isInternal(v) then
      binaryPreorder(T,T.leftChild(v))
      binaryPreorder(T,T.rightChild(v))

Algorithm binaryPretorder(T)
   binaryPreorder(T,T.root())

Postorder traversal of a binary tree

Algorithm binaryPostorder(T,v)
   if T.isInternal(v) then
      binaryPostorder(T,T.leftChild(v))
      binaryPostorder(T,T.rightChild(v))
   Visit node v

Algorithm binaryPosttorder(T)
   binaryPostorder(T,T.root())

Inorder traversal of a binary tree

Algorithm binaryInorder(T,v)
   if T.isInternal(v) then
      binaryInorder(T,T.leftChild(v))
   Visit node v
   if T.isInternal(v) then
      binaryInorder(T,T.rightChild(v))

Algorithm binaryIntorder(T)
   binaryPostorder(T,T.root())

Definition: A binary tree is fully complete if all the leaves are at the same (maximum) depth. This is the same as saying that the sibling of a leaf is a leaf.

Euler tour traversal

Generalizes the above. Visit the node three times, first when ``going left'', then ``going right'', then ``going up''. Perhaps the words should be ``going to go left'', ``going to go right'' and ``going to go up''. These words work for internal nodes. For a leaf you just visit it three times in a row (or you could put in code to only visit a leaf once; I don't do this). It is called an Euler Tour traversal because an Euler tour of a graph is a way of drawing each edge exactly once without taking your pen off the paper. The Euler tour traversal would draw each edge twice but if you add in the parent pointers, each edge is drawn once.

The book uses ``on the left'', ``from below'', ``on the right''. I prefer my names, but you may use either.

Algorithm eulerTour(T,v):
   visit v going left
   if T.isInternal(v) then
      eulerTour(T,T.leftChild(v))
   visit v going right
   if T.isInternal(v) then
      eulerTour(T,T.rightChild(v))
   visit v going up

Algorithm eulerTour(T):
   eulerTour(T,T.root))

Pre- post- and in-order traversals are special cases where two of the three visits are dropped.

It is quite useful to have this three visits. For example here is a nifty algorithm to print and expression tree with parentheses to indicate the order of the operations. We just give the three visits.

Algorithm visitGoingLeft(v):
   if T.isInternal(v) then
      print "("

Algorithm visitGoingRight(v)
   print v.element()

Algorithm visitGoingUp(v)
   if T.isInternal(v) then
      print ")"

Homework: Plug these in to the Euler Tour and show that what you get is the same as

Algorithm printExpression(T,v):
   input: T an expression tree v a node in T.
   if T.isLeaf(v) then
      print v.element()  // for a leaf the element is a value
   else
      print "("
      printExpression(T,T.leftChild(v))
      print v.element()  // for an internal node the element is an operator
      printExpression(T,T.rightChild(v))
      print ")"

Algorithm printExpression(T):
   printExpression(T,T.root())

Problem Set 2 problem 2. We have seen that traversals have complexity Θ(N), where N is the number of nodes in the tree. But we didn't count the costs of the visit()s themselves since the user writes that code. We know that visit() will be called N times, once per node, for post-, pre-, and in-order traversals and will be called 3N times for Euler tour traversal. So if each visit costs Θ(1), the total visit cost will be Θ(N) and thus does not increase the complexity of a traversal. If each visit costs Θ(N), the total visit cost will be Θ(N²) and hence the total traversal cost will be Θ(N²). The same analysis works for any visit cost providing all the visits cost the same. For this problem we will be considering a variable cost visits. In particular, assume that the cost of visiting a node v is the height of v (so roots can be expensive to visit, but leaves are free).

Part A. How many nodes N are in a fully complete binary tree of height h?

Part B. How many nodes are at height i in a fully complete binary tree of height h? What is the total cost of visiting all the nodes at height i?

Part C. Write a formula using ∑ (sum) for the total cost of visiting all the nodes. This is very easy given B.

One point extra credit. Show that the sum you wrote in part C is Θ(N).

Part D. Continue to assume the cost of visiting a node equals its height. Describe a class of binary trees for which the total cost of visiting the nodes is θ(N²). Naturally these will not be fully complete binary trees. Hint do problem 3.

2.3.4 Data Structures for representing trees

A vector-based implementation for Binary Trees

We store each node as the element of a vector. Store the root in element 1 of the vector and the key idea is that we store the two children of the element at rank r in the elements at rank 2r and 2r+1.

Draw a fully complete binary tree of height 3 and show where each element is stored.

Draw an incomplete binary tree of height 3 and show where each element is stored and that there are gaps.

There must be a way to tell leaves from internal nodes. The book doesn't make this explicit. Here is an explicit example. Let the vector S be given. With a vector we have the current size. S[0] is not used. S[1] has a pointer to the root node (or contains the root node if you prefer). For each S[i], S[i] is null (a special value) if the corresponding node doesn't exist). Then to see if the node v at rank i is a leaf, look at 2i. If 2i exceeds S.size() then v is a leaf since it has no children. Similarly if S[2i] is null, v is a leaf. Otherwise v is external.
How do you know that if S[2i] is null, then s[2i+1] will be null?
Ans: Our binary trees are proper.

This implementation is very fast. Indeed all tree operations are O(1) except for positions() and elements(), which produce n results and take time Θ(n).

Homework: R-2.7