# Computer Science 102

Spring 2009

Due: MON night 11:59pm, MAR 30.

Introduction

In this assignment we will study an efficient scheme for for compressing a text file. It is called Huffman coding. A simple method to encode text as a string of 0s and 1s is to represent each character as a unique string. A text message is thus translated into a string of bits. By exploiting the fact that not all characters appear with the same frequency in the text, we can encode rarely used characters with long codes and frequently used ones with short codes.

Given a set of characters and their corresponding frequencies, an optimal coding scheme is produced by Huffman coding. It uses a binary tree for encoding.

A binary tree is constructed so that each leaf in the tree and only the leaves contain a character. We consider all left edges of the tree (edges from a node to its left child) to be labelled 0, and all right edges to be labelled 1. The code for any character is the string of bits produced by traversing the edges of the tree from the root to the leaf that contains that character. To produce the best coding, characters with high frequencies should be high in the tree so that they will have short codes. For example consider the following tree:

```                                 *
0 /   \ 1
/     \
E     0/ \  1
H   /\
0 /  \  1
L     B

```

The codes is the above tree are: E = 0, H = 10, L = 110, B = 111.

A Huffman tree is built in the following way: we create a tree node for each character. The node stores the character and the frequency. Initially the data is stored in a linked list of single tree nodes sorted by frequency with the left and right children being null. Then we repeatedly combine the trees with the two lowest frequencies and make them the children of a new node that has the combined frequencies as its frequency and has an arbitrary character for the letter. The subtree with the smaller frequency becomes the left subtree of this new node, and the other one becomes the right subtree. The new node is then placed in its sorted position in the linked list.

We will use a linked list implementation to write the program. Here is the skeleton of the class (it's on the WEB) you will be using:

```class HuffADTa
//solves draft #1 for project #3
{   ListNode list;
final int MAX = 11;//there are 11 pairs of letter-frequency entries

private class HuffRecord//defines the data frequency record
{  public String letter;
public int freq;
}

private class TreeNode//defines the treelets comprising the list
{  public HuffRecord data;
public TreeNode left, right;
}

private class ListNode//defines the linked list. Each node is a treelet
{ public TreeNode info;
public ListNode next;
}
}
```
where HuffRecord, TreeNode, and ListNode are all embedded classes. The keyword public is optional because it is the default. Use the program we did in class displayed at:

http://cs.nyu.edu/courses/spring09/V22.0102-002/mar23/Prog1.java

since it contains these three embedded classes. Note that it uses the Scanner class to read the file. Add the following procedures to the program:

1. private ListNode delFirst() removes the first element of the linked list pointed to by list, and returns the pointer to that element in let's say smallfreqNode.

2. private void combineTree(ListNode p1, ListNode p2) creates a new node temp of type TreeNode whose frequency is the sum of the frequencies in the nodes pointed to by p1 and p2, and inserts that new node in its proper place in the linked list list by calling insertList. It also sets the left field of the node created (its type is TreeNode) to p1 properly dereferenced, and its right field to p2. Note that you must also instantiate an instance of Huffman so that temp.data.freq does not throw a null pointer exception.

3. private void reduce()
• calls DelFirst() twice to obtain the two nodes within the linked list of minimum frequency, the first call should return a pointer p1, and the second call should return a pointer p2.
• calls CombineTree with the above two pointers as parameters.

4. public void reduceIt() calls reduce() and prints the resulting linked list until there is only one node remaining on the list. The frequency field of this node will automatically contain the sum of all the frequencies on the original linked list. Use a while loop for this and remember that reduce is decreasing the number of nodes in the list.

5. private void printCode(TreeNode tree, String s) a recursive routine that prints the Huffman code for each letter in the tree. Do the concatenation of '0' and '1' to s in the actual parameter. printCode is called by the method printCode() that has no parameters.

6. private void decode() which reads the text file code that consists of compressed binary code (the text file for this on the WEB at huffcode.txt ) and decodes it using the Huffman tree just built. This method should not be recursive nor should it call a method that is recursive.
Each procedure should contain preconditions and postconditions (see StackADT2 on FEB 2, for instance). The data for this program is posted on the web at huff.txt ; Or if for some reason you cannot download the data from the web, you can make your own file. The data is: A 40 E 45 G 6 H 8 I 34 L 12 N 9 O 20 S 15 V 2 W 3

In printCode() and decode(), you must define the root of the tree.

Each method should contain preconditions and postconditions. The text file with the compressed binary file will be on the WEB.