Basic Algorithms: Lecture 9

================ Start Lecture #9 ================

2.3.2 Tree Traversal

Traversing a tree is a systematic method for accessing or "visiting" each node. We will see and analyze three tree traversal algorithms, inorder, preorder, and postorder. They differ in when we visit an internal node relative to its children. In preorder we visit the node first, in postorder we visit it last, and in inorder, which is only defined for binary trees, we visit the node between visiting the left and right children.

Recursion will be a very big deal in traversing trees!!

Motivating the Recursive Routines

On the right are three trees. The left one just has a root, the right has a root with one leaf as a child, and the middle one has six nodes. For each node, the element in that node is shown inside the box. All three roots are labeled and 2 other nodes are also labeled. That is, we give a name to the position, e.g. the left most root is position v. We write the name of the position under the box. We call the left tree T0 to remind us it has height zero. Similarly the other two are labeled T2 and T1 respectively.

Our goal in this motivation is to calculate the sum the elements in all the nodes of each tree. The answers are, from left to right, 8, 28, and 9.

For a start, lets write an algorithm called treeSum0 that calculates the sum for trees of height zero. In fact the algorithm, will contain two parameters, the tree and a node (position) in that tree, and our algorithm will calculate the sum in the subtree rooted at the given position assuming the position is at height 0. Note this is trivial: since the node has height zero, it has no children and the sum desired is simply the element in this node. So legal invocations would include treeSum0(T0,s) and treeSum0(T2,t). Illegal invocations would include treeSum0(T0,t) and treeSum0(T1,r).

Algorithm treeSum0(T,v)
  Inputs: T a tree; v a height 0 node of T
  Output: The sum of the elements of the subtree routed at v

  Sum←v.element()
  return Sum

Now lets write treeSum1(T,v), which calculates the sum for a node at height 1. It will use treeSum0 to calculate the sum for each child.

Algorithm treeSum1(T,v)
   Inputs: T a tree; v a height 1 node of T
   Output: the sum of the elements of the subtree routed at v

   Sum←v.element()
   for each child c of v
      Sum←Sum+treeSum0(T,c)
   return Sum

OK. How about height 2?

Algorithm treeSum2(T,v)
   Inputs: T a tree; v a height 2 node of T
   Output: the sum of the elements of the subtree routed at v

   Sum←v.element()
   for each child c of v
      Sum←Sum+treeSum1(T,c)
   return Sum

So all we have to do is to write treeSum3, treSum4, ... , where treSum3 invokes treeSum2, treeSum4 invokes treeSum3, ... .

That would be, literally, an infinite amount of work.

Do a diff of treeSum1 and treeSum2.
What do you find are the differences?
In the Algorithm line and in the first comment a 1 becomes a 2.
In the subroutine call a 0 becomes a 1.

Why can't we write treeSumI and let I vary?
Because it is illegal to have a varying name for an algorithm.

The solution is to make the I a parameter and write

Algorithm treeSum(i,T,v)
   Inputs: i≥0; T a tree; v a height i node of T
   Output: the sum of the elements of the subtree routed at v

   Sum←v.element()
   for each child c of v
      Sum←Sum+treeSum(i-1,T,c)
   return Sum

This is wrong, why?
Because treeSum(0,T,v) invokes treeSum(-1,c,v), which doesn't exist because i<0

But treeSum(0,T,v) doesn't have to call anything since v can't have any children (the height of v is 0). So we get

Algorithm treeSum(i,T,v)
   Inputs: i≥0; T a tree; v a height i node of T
   Output: the sum of the elements of the subtree routed at v

   Sum←v.element()
   if i>0 then
      for each child c of v
         Sum←Sum+treeSum(i-1,T,c)
   return Sum

The last two algorithms are recursive; they call themselves. Note that when treeSum(3,T,v) calls treeSum(2,T,c), the new treeSum has new variables Sum and c.

We are pretty happy with our treeSum routine, but ...

The algorithm is wrong! Why?
The children of a height i node need not all be of height i-1. For example s is hight 2, but its left child w is height 0. (A corresponding error also existed in treeSum2(T,v)

But the only real use we are making of i is to prevent us from recursing when we are at a leaf (the i>0 test). But we can use isInternal instead, giving our final algorithm

Algorithm treeSum(T,v)
   Inputs: T a tree; v a node of T
   Output: the sum of the elements of the subtree routed at v

   Sum←v.element()
   if T.isInternal(v) then
      for each child c of v
         Sum←Sum+treeSum(T,c)
   return Sum

Our medium term goal is to learn about tree traversals (how to "visit" each node of a tree once) and to analyze their complexity.

Complexity of Primitive Operations

Our complexity analysis will proceed in a somewhat unusual order. Instead of starting with the bottom or lowest level routines (the tree methods in 2.3.1, e.g., is Internal(v)) or the top level routines (the traversals themselves), we will begin by analyzing some middle level procedures assuming the complexities of the low level are as we assert them to be. Then we will analyze the traversals using the middle level routines and finally we will give data structures for trees that achieve our assumed complexity for the low level.

Let's begin!

Complexity Assumptions for the Tree ADT

These assumptions will be verified later.

root(), parent(v), isInternal(v), isLeaf(v), isRoot(v), swapElements(v,w), replaceElement(v,e) each take Θ(1) time.
The methods returning iterators, namely children(v), elements(), and positions(), each take time Θ(k), where k is the number of items being iterated over. k=#children for the first method and #nodes for the other two.
For each iterator, the methods hasNext() and nextObject() take Θ(1) time. nextObject() sometimes has other names like nextPosition() or nextNode() or nextChild().

Middle level routines depth and height

Definitions of depth and height.

The depth of the root is 0.
The height of a leaf is 0.
The depth of a non-root v is 1 plus the depth of parent(v).
The height of an internal node v is 1 plus the maximum height of the children of v.
The height of a tree is the height of its root.

Remark: Even our definitions are recursive!

From the recursive definition of depth, the recursive algorithm for its computation essentially writes itself.

Algorithm depth(T,v)
   if T.isRoot(v) then
      return 0
   else
      return 1 + depth(T,T.parent(v))

The complexity is Θ(the answer), i.e. Θ(d_v), where d_v is the depth of v in the tree T.

The following algorithm computes the height of a position in a tree.

Algorithm height(T,v):
   if T.isLeaf(v) then
      return 0
   else
      h←0
      for each w in T.children(v) do
         h←max(h,height(T,w))
      return h+1

Remarks on the above algorithm

The loop could (perhaps should) be written in pure iterator style. Note that T.children(v) is an iterator.
This algorithm is not so easy to convert to non-recursive form
Why?
It is not tail-recursive, i.e. the recursive invocation is not just at the end.
To get the height of the tree, execute height(T,T.root())

Algorithm height(T)
    height(T,T.root())

Let's use the "official" iterator style.

Algorithm height(T,v):
    if T.isLeaf then
       return 0
    else
       h←0
       childrenOfV←T.children(v)    // "official" iterator style
       while childrenOfV.hasNext()
          h&lar;max(h,height(T,childrenOfV.nextObject())
       return h+1

But the children iterator is defined to return the empty set for a leaf so we don't need the special case

Algorithm height(T,v):
     h←0
     childrenOfV←T.children(v)    // "official" iterator style
     while childrenOfV.hasNext()
        h&lar;max(h,height(T,childrenOfV.nextObject())
     return h+1

Theorem: Let T be a tree with n nodes and let c_v be the number of children of node v. The sum of c_v over all nodes of the tree is n-1.

Proof: This is trivial! ... once you figure out what it is saying. The sum gives the total number of children in a tree. But this almost all nodes. Indeed, there is just one exception.
What is the exception?
The root.

Corollary: Computing the height of an n-node tree has time complexity Θ(n).

Proof: Look at the code of the first version.

Since each node calls height on each of its children, height is called recursively for every node that is a child, i.e. all but the root. The root is called directly from the top level.
Hence height(T,v) is called once for each node v.
"Most" of the code in height is clearly Θ(1) per call, or Θ(n) in total. The exception is the loop.
Each iteration of the loop is Θ(1).
The number of iterations in the invocation height(T,v) is the number of children in v.
Hence the total number of iterations is the total number of children, which by the theorem is n-1.
Hence the total cost of all the iterations is Θ(n).
Hence the total cost of height(T) is the cost of "most" of the code plus the cost of the loops, which is Θ(N)+Θ(N), i.e. Θ(N).

To be more formal, we should look at the "official" iterator version. The only real difference is that in the official version, we are charged for creating the iterator. But the charge is the number of elements in the iterator, i.e., the number of children this node has. So the sum of all the charges for creating iterators will be the sum of the number of children each node has, which is the total number of children, which is n-1, which is (another) $Theta;(n) and hence doesn't change the final answer.

Do a few on the board. As mentioned above, becoming facile with recursion is vital for tree analyses.

Definition: A traversal is a systematic way of "visiting" every node in a tree.

Preorder Traversal

Visit the root and then recursively traverse each child. More formally we first give the procedure for a preorder traversal starting at any node and then define a preorder traversal of the entire tree as a preorder traversal of the root.

Algorithm preorder(T,v):
   visit node v
   for each child c of v
      preorder(T,c)

Algorithm preorder(T):
   preorder(T,T.root())

Remarks:

In a preorder traversal, parents come before children (which is as it should be :-)).
If you describe a book as an ordered tree, with nodes for each chapter, section, etc., then the pre-order traversal visits the nodes in the order they would appear in a table of contents.

Do a few on the board. As mentioned above, becoming facile with recursion is vital for tree analyses.

Theorem: Preorder traversal of a tree with n nodes has complexity Θ(n).

Proof: Just like height.
The nonrecursive part of each invocation takes O(1+c_v)
There are n invocations and the sum of the c's is n-1.

Homework: R-2.3

Postorder Traversal

First recursively traverse each child then visit the root. More formerly

Algorithm postorder(T,v):
   for each child c of v
      postorder(T,c)
   visit node v

Algorithm postorder(T):
   postorder(T,T.root())

Theorem: Preorder traversal of a tree with n nodes has complexity Θ(n).

Proof: The same as for preorder.

Remarks:

Postorder is how you evaluate an arithmetic expression tree.
Evaluate some arithmetic expression trees.
When you write out the nodes in the order visited your get what is called either "reverse polish notation" or "polish notation"; I don't remember which.
If you do preorder traversal you get the other one.

Problem Set 2, Problem 1. Note that the height of a tree is the depth of a deepest node. Extend the height algorithm so that it returns in addition to the height the v.element() for some v that is of maximal depth. Note that the height algorithm is for an arbitrary (not necessarily binary) tree; your extension should also work for arbitrary trees (this is *not* harder).

2.3.3 Binary Trees

Recall that a binary tree is an ordered tree in which no node has more than two children. The left child is ordered before the right child.

The book adopts the convention that, unless otherwise mentioned, the term "binary tree" will mean "proper binary tree", i.e., all internal nodes have two children. This is a little convenient, but not a big deal. If you instead permitted non-proper binary trees, you would test if a left child existed before traversing it (similarly for right child.)

Will do binary preorder (first visit the node, then the left subtree, then the right subtree, binary postorder (left subtree, right subtree, node) and then inorder (left subtree, node, right subtree).

The Binary Tree ADT

We have three (accessor) methods in addition to the general tree methods.

leftChild(v): Return the (position of the) left child; signal an error if v is a leaf.
rightChild(v): Similar
sibling(v): Return the (unique) sibling of v; signal an error if v is the root (and hence has no sibling).

Allan Gottlieb