## Lecture 10: Trees. 3/4

A tree is a structure that starts at a single point called the root and then branches out from there. Going out from the root, different branches never join back up together.

In computer science, trees are usually drawn, and described, with the root on top, branching downward. Sometimes (for convenience of drawing, or when branching represents movement in time) they are drawn with the root on the left, branching toward the right. Rarely are they drawn like a forest tree, with the root at the bottom.

• There is a collection of nodes, one of which is the root.
• Each node except the root has exactly one parent; the root has none. Every node has some number of children (possibly zero). If P is the parent of C, then C is a child of P, and vice versa.
• There is an arc connecting a child to its parent.
• If you start at any node and follow parent arcs upward, you eventually reach the root.
• Therefore there is one and only one path from the root to any given node in the tree.
• A leaf is a node with no children.
• An internal node is a node that has children.
• A node has an associated value field, often called a label.
Node A is an ancestor of node D, and D is a descendant of A if you can get from D to A following parent arcs. That is, A is D's parent; or D's parent's parent; or D's parent's parent's parent; etc.
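
The definitions above can be sketched in Java roughly as follows. This is only an illustrative sketch: the class name `TreeNode` and its methods are made up for these notes, and it assumes every node stores a pointer to its parent as well as a list of its children.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class for illustration; all names here are assumptions.
class TreeNode {
    final String label;                                 // the node's value field
    TreeNode parent;                                    // null only for the root
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String label) { this.label = label; }

    // Create a child of this node and record the arc in both directions.
    TreeNode addChild(String childLabel) {
        TreeNode child = new TreeNode(childLabel);
        child.parent = this;
        children.add(child);
        return child;
    }

    // A leaf is a node with no children.
    boolean isLeaf() { return children.isEmpty(); }

    // True if this node can be reached from d by following one or more parent arcs.
    boolean isAncestorOf(TreeNode d) {
        for (TreeNode p = d.parent; p != null; p = p.parent) {
            if (p == this) return true;
        }
        return false;
    }

    // Follow parent arcs upward until a node with no parent is reached: the root.
    TreeNode root() {
        TreeNode n = this;
        while (n.parent != null) n = n.parent;
        return n;
    }

    public static void main(String[] args) {
        TreeNode a = new TreeNode("A");
        TreeNode b = a.addChild("B");
        TreeNode d = b.addChild("D");
        System.out.println(a.isAncestorOf(d)); // true
        System.out.println(d.isAncestorOf(a)); // false
        System.out.println(d.root().label);    // A
        System.out.println(d.isLeaf());        // true
    }
}
```

Note that `isAncestorOf` only ever walks upward, which is why it terminates: every chain of parent arcs ends at the root.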

#### Trees in computer science

• The structure of a program. A file contains class definitions. A class definition contains methods (and other stuff). A method contains compound statements. A statement contains symbols. A symbol contains characters.

Here a node is a meaningful piece of a program, and node P is the parent of C if P directly contains C.

• A particularly important case of this is the structure of expressions. E.g. the expression "(2+f(3,x,g(2)))/(x-5)" corresponds to the following tree, shown in a handy typewriter format:

```
/ ---> + ---> 2
|      |
|      |-> f ---> 3
|          |
|          |-> x
|          |
|          |-> g ---> 2
|
|-> - ---> x
    |
    |-> 5
```
• The structure of function calls in the execution of a program. Here a node is one particular call to a function, and P is the parent of C if C is a call made directly by P while P is executing.
• The class structure in Java. Each class other than Object has a single superclass. The root is Object.
• The directory structure is a tree (disregarding linking, shortcuts, etc.). P is the parent of C if C is a subdirectory of P or a file in P.
• Implementation of files in the Unix system (inodes).

#### Trees in the outside world

• Trees and other branching plants. (The "root" is the trunk. In trees, it occasionally happens that one branch grows into another, violating tree structure.)
• Evolutionary history of species. (Species C is a child of P if C evolved out of P). Strictly a tree for animals; not for plants, in which species can cross-breed.
• Taxonomic structure of the animal kingdom. Nodes are taxonomic categories (e.g. vertebrates, fish, carnivores, etc.). Leaves are species or individuals.
• Documents. Chapters, sections, subsections, paragraphs, sentences, words, characters. (Each occurrence of a word/character is a separate node.) Renaissance reference works often include elaborate tree-structured tables of contents.
• Traditional library cataloguing systems (Dewey decimal, Library of Congress etc.) Categories and subcategories.
• Syntactic structure of sentences (sort of).
```
S ---> NP ---> Adj ---> All
|       |
|       |-> Noun ---> men
|
|-> VP ---> VG ---> Aux --> are
    |       |
    |       |-> Verb ---> created
    |
    |-> Adj ---> equal
```
Nodes are phrases, labelled with syntactic categories. Root is "S". Leaves are words.
• Political units (countries, states/provinces, cities).
• Hierarchical structure of rigid social organizations, in which each person has a unique superior they report to.

### Terminology

The depth of node N is the number of steps from the root to N. The root has depth 0, its children have depth 1, etc.

The height of the tree is the greatest depth of any node in the tree.

The subtree under node N is the tree that has root N and that contains all the descendants of N.

The branching factor at node N is the number of children of N. The maximal branching factor for the tree is the maximum branching factor at any node.
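
As a rough illustration of these terms, here is a Java sketch that computes depth, height, and maximal branching factor. The class `MeasuredNode` and its method names are hypothetical, and the sketch assumes each node keeps both a parent pointer and a list of children.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node class for illustration only.
class MeasuredNode {
    MeasuredNode parent;                                   // null for the root
    final List<MeasuredNode> children = new ArrayList<>();

    MeasuredNode addChild() {
        MeasuredNode c = new MeasuredNode();
        c.parent = this;
        children.add(c);
        return c;
    }

    // Depth: the number of steps (parent arcs) between this node and the root.
    int depth() {
        int d = 0;
        for (MeasuredNode p = parent; p != null; p = p.parent) d++;
        return d;
    }

    // Height of the subtree under this node: the greatest depth of any
    // node in that subtree, measured from this node.
    int height() {
        int h = 0;
        for (MeasuredNode c : children) h = Math.max(h, 1 + c.height());
        return h;
    }

    // Maximal branching factor over all nodes in the subtree under this node.
    int maxBranchingFactor() {
        int best = children.size();
        for (MeasuredNode c : children) best = Math.max(best, c.maxBranchingFactor());
        return best;
    }

    public static void main(String[] args) {
        MeasuredNode root = new MeasuredNode();
        MeasuredNode a = root.addChild();   // root has two children
        root.addChild();
        MeasuredNode a1 = a.addChild();     // a has three children
        a.addChild();
        a.addChild();
        System.out.println(a1.depth());                // 2
        System.out.println(root.height());             // 2
        System.out.println(root.maxBranchingFactor()); // 3
    }
}
```

Calling `height()` on the root gives the height of the whole tree; calling it on any other node gives the height of the subtree under that node.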

### Tree variations

#### Structural variants

• Fixed number of children (particularly 2 --- binary tree) vs. variable number of children.
• Constant depth tree (all leaves at same depth) vs. variable depth.
• Internal nodes have the same kind of labels as leaf nodes vs. different kinds of labels vs. unlabelled.
• Ordered tree: Labels on leaves increase from left to right. Labels on internal nodes increase according to some search sequence (to be discussed).

#### Implementational variants

• Tree arcs: Pointers from parents to children, pointers from children to parents, both.
• Various implementations for list of children.
• Leaves are a different class from internal nodes.

### First implementation

Each node has a fixed maximum number of children. The children of node N are kept in an array list.
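
A minimal sketch of what such a node might look like, assuming a plain fixed-size Java array stands in for the child list. The class name `ArrayTreeNode`, the bound `MAX_CHILDREN = 4`, and the method names are all assumptions made for illustration; they are not taken from the course code.

```java
// Hypothetical sketch; MAX_CHILDREN and every name here are assumptions.
class ArrayTreeNode {
    static final int MAX_CHILDREN = 4;   // assumed bound on the branching factor
    final String label;
    final ArrayTreeNode[] children = new ArrayTreeNode[MAX_CHILDREN];
    int numChildren = 0;                 // slots 0..numChildren-1 are in use

    ArrayTreeNode(String label) { this.label = label; }

    // Put a new child in the next free slot; fail if the node is already full.
    ArrayTreeNode addChild(String childLabel) {
        if (numChildren == MAX_CHILDREN) {
            throw new IllegalStateException("node " + label + " is full");
        }
        ArrayTreeNode c = new ArrayTreeNode(childLabel);
        children[numChildren++] = c;
        return c;
    }

    // Return the i-th child (0-based), checking that it exists.
    ArrayTreeNode child(int i) {
        if (i < 0 || i >= numChildren) {
            throw new IndexOutOfBoundsException("no child " + i);
        }
        return children[i];
    }

    boolean isLeaf() { return numChildren == 0; }

    public static void main(String[] args) {
        ArrayTreeNode root = new ArrayTreeNode("root");
        root.addChild("a");
        root.addChild("b");
        System.out.println(root.child(1).label);    // b
        System.out.println(root.child(1).isLeaf()); // true
    }
}
```

The array gives constant-time access to the i-th child, at the cost of reserving `MAX_CHILDREN` slots in every node, including leaves.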

### Second implementation

Arbitrarily many children, kept in a singly linked list. Pointers run both up and down the tree. No distinction between leaves and internal nodes.
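
One common way to realize this is the "first child / next sibling" layout, sketched below. This is an assumption about how the linked list might be threaded through the nodes, not necessarily the layout used in the course code; the class name `LinkedTreeNode` and its fields are hypothetical.

```java
// Hypothetical sketch of a node with a singly linked child list
// and pointers both up (parent) and down (firstChild).
class LinkedTreeNode {
    final String label;
    LinkedTreeNode parent;       // upward pointer; null for the root
    LinkedTreeNode firstChild;   // head of the singly linked list of children
    LinkedTreeNode nextSibling;  // next node in the parent's child list

    LinkedTreeNode(String label) { this.label = label; }

    // Append a new child at the end of the child list.
    // This walks the whole list, so it takes time proportional to the number of children.
    LinkedTreeNode addChild(String childLabel) {
        LinkedTreeNode c = new LinkedTreeNode(childLabel);
        c.parent = this;
        if (firstChild == null) {
            firstChild = c;
        } else {
            LinkedTreeNode last = firstChild;
            while (last.nextSibling != null) last = last.nextSibling;
            last.nextSibling = c;
        }
        return c;
    }

    // Count children by walking the sibling chain.
    int numChildren() {
        int n = 0;
        for (LinkedTreeNode c = firstChild; c != null; c = c.nextSibling) n++;
        return n;
    }

    // The same class serves for leaves and internal nodes;
    // a leaf is simply a node whose child list is empty.
    boolean isLeaf() { return firstChild == null; }

    public static void main(String[] args) {
        LinkedTreeNode root = new LinkedTreeNode("root");
        root.addChild("a");
        LinkedTreeNode b = root.addChild("b");
        root.addChild("c");
        System.out.println(root.numChildren());  // 3
        System.out.println(b.nextSibling.label); // c
        System.out.println(b.parent.label);      // root
    }
}
```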

### Tree traversals

Preorder: List the label at the node, then recursively preorder all the subtrees.
Example: Mammal, Primate, Human, Gorilla, Rodent, Squirrel, Rat, Carnivore, Skunk, Dog, Cat.

Postorder: Recursively postorder all the subtrees, then list the label at the node.
Example: Human, Gorilla, Primate, Squirrel, Rat, Rodent, Skunk, Dog, Cat, Carnivore, Mammal.
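
The two traversals can be sketched in Java as below. The tree built in `mammals()` is reconstructed from the two listings above (Mammal at the root, with children Primate, Rodent, and Carnivore); the class name `TraversalNode` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical node class for illustrating preorder and postorder.
class TraversalNode {
    final String label;
    final List<TraversalNode> children;

    TraversalNode(String label, TraversalNode... children) {
        this.label = label;
        this.children = Arrays.asList(children);
    }

    // Preorder: the node's own label first, then each subtree in turn.
    void preorder(List<String> out) {
        out.add(label);
        for (TraversalNode c : children) c.preorder(out);
    }

    // Postorder: each subtree in turn, then the node's own label.
    void postorder(List<String> out) {
        for (TraversalNode c : children) c.postorder(out);
        out.add(label);
    }

    String preorderString() {
        List<String> out = new ArrayList<>();
        preorder(out);
        return String.join(", ", out);
    }

    String postorderString() {
        List<String> out = new ArrayList<>();
        postorder(out);
        return String.join(", ", out);
    }

    // The example tree, reconstructed from the traversal listings in the notes.
    static TraversalNode mammals() {
        return new TraversalNode("Mammal",
            new TraversalNode("Primate",
                new TraversalNode("Human"), new TraversalNode("Gorilla")),
            new TraversalNode("Rodent",
                new TraversalNode("Squirrel"), new TraversalNode("Rat")),
            new TraversalNode("Carnivore",
                new TraversalNode("Skunk"), new TraversalNode("Dog"), new TraversalNode("Cat")));
    }

    public static void main(String[] args) {
        System.out.println(mammals().preorderString());
        System.out.println(mammals().postorderString());
    }
}
```

The only difference between the two methods is whether `out.add(label)` comes before or after the loop over the children.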

### Binary tree

An internal node has at most two children: the left child and the right child. In this implementation, there are only downward pointers, no upward pointers.

Inorder: Recursively inorder the left subtree, then list the label at the node, then recursively inorder the right subtree.
Example, with each node labelled by its path from the root #: #LLL, #LL, #LLR, #L, #LRL, #LR, #LRR, #, #RLL, #RL, #RLR, #R, #RRL, #RR, #RRR
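
A small Java sketch of a binary tree node with downward pointers only, together with an inorder traversal. To reproduce the listing above it assumes a complete binary tree of depth 3 in which each label spells out the path of L and R moves from the root "#"; the class name `BinaryNode` and the helper `complete` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical binary tree node: downward pointers only, no parent pointer.
class BinaryNode {
    final String label;
    BinaryNode left;    // null if there is no left child
    BinaryNode right;   // null if there is no right child

    BinaryNode(String label) { this.label = label; }

    // Build a complete binary tree with the given number of levels below this node.
    // Each node's label is its parent's label plus "L" or "R".
    static BinaryNode complete(String label, int depth) {
        BinaryNode n = new BinaryNode(label);
        if (depth > 0) {
            n.left = complete(label + "L", depth - 1);
            n.right = complete(label + "R", depth - 1);
        }
        return n;
    }

    // Inorder: left subtree, then this node's label, then right subtree.
    static void inorder(BinaryNode n, List<String> out) {
        if (n == null) return;   // an absent child contributes nothing
        inorder(n.left, out);
        out.add(n.label);
        inorder(n.right, out);
    }

    static String inorderString(BinaryNode root) {
        List<String> out = new ArrayList<>();
        inorder(root, out);
        return String.join(", ", out);
    }

    public static void main(String[] args) {
        System.out.println(inorderString(complete("#", 3)));
    }
}
```

Treating a missing child as `null` lets the same traversal handle nodes with zero, one, or two children.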

### Expression trees

Expression trees correspond to arithmetic expressions. The label in an internal node is an operator, which operates on the expressions in the left and right subtrees. The label on a leaf is a number. To simplify programming, we put that number in a separate field called value.

Evaluate a tree N:

• If N is a leaf, return its value.
• If N is an internal node, then recursively evaluate the left and right subtrees and apply the operator at N to the two values.
ExpressionTree.java
TestExpressionTree.java
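
The evaluation rule above can be sketched as follows. This is not the contents of ExpressionTree.java; it is an independent sketch with a hypothetical class `ExprNode`, and it assumes the operators are the four binary operators +, -, *, / on double values.

```java
// Hypothetical expression tree node; not the course's ExpressionTree class.
class ExprNode {
    final char operator;    // meaningful only in internal nodes
    final double value;     // meaningful only in leaves
    final ExprNode left;
    final ExprNode right;

    // Leaf holding a number.
    ExprNode(double value) {
        this.operator = ' ';
        this.value = value;
        this.left = null;
        this.right = null;
    }

    // Internal node applying an operator to two subexpressions.
    ExprNode(char operator, ExprNode left, ExprNode right) {
        this.operator = operator;
        this.value = 0.0;
        this.left = left;
        this.right = right;
    }

    boolean isLeaf() { return left == null && right == null; }

    // Leaf: return its value. Internal node: evaluate both subtrees,
    // then apply this node's operator to the two results.
    double evaluate() {
        if (isLeaf()) return value;
        double a = left.evaluate();
        double b = right.evaluate();
        switch (operator) {
            case '+': return a + b;
            case '-': return a - b;
            case '*': return a * b;
            case '/': return a / b;
            default:
                throw new IllegalStateException("unknown operator: " + operator);
        }
    }

    public static void main(String[] args) {
        // (2 + 3) * (7 - 5)
        ExprNode e = new ExprNode('*',
            new ExprNode('+', new ExprNode(2), new ExprNode(3)),
            new ExprNode('-', new ExprNode(7), new ExprNode(5)));
        System.out.println(e.evaluate()); // 10.0
    }
}
```

Evaluation is a postorder traversal: both subtrees are fully evaluated before the operator at the node is applied.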