# Computer Science 102

Spring 1997

Second draft due: Sun, April 27, 9:00 pm.

Introduction

In this draft you will finish the implementation of Huffman coding.

Given a set of characters and their corresponding frequencies, an optimal coding scheme is produced by Huffman coding. A binary tree is constructed so that each leaf in the tree and only the leaves contain a character. We consider all left edges of the tree (edges from a node to its left child) to be labelled 0, and all right edges to be labelled 1. The code for any character is the string of bits produced by traversing the edges of the tree from the root to the leaf that contains that character. To produce the best coding, characters with high frequencies should be high in the tree so that they will have short codes. For example consider the following tree:

.
0  /   \ 1
/       \
E     0  /  \  1
/      \
H   0  /   \  1
/       \
L          B

The codes is the above tree are: E = 0, H = 10, L = 110, B = 111.

A Huffman tree is built in the following way: we create a tree node for each character. The node stores the character and the frequency. Initially the data is stored in a sorted list as single tree nodes with the left and right children being nil, just as it appeared in draft #1. Then we repeatedly combine the trees with the two lowest frequencies but now we make them the children of a new node that has the combined frequencies as its frequency and has an arbitrary character for the name. The subtree with the smaller frequency becomes the left subtree of this new node, and the other one becomes the right subtree.

Your ADT for this draft must be changed to incorporate two new methods, Decode_string and print_the_code. Here is the new object definition.

huffman_type = object
PROCEDURE Print;
PROCEDURE Reduce_it;
PROCEDURE Decode_string;
PROCEDURE print_the_code;
private
List : pointer;
end; {object}

2. Print prints the characters and corresponding frequencies on the sorted linked list before any nodes are combined. Already done in draft #1.

3. CombineTree( var list : pointer; p1, p2 : pointer); This is almost identical to Combine for draft #1 except that in addition to giving a value to the combined frequency, it sets the left field of the node created (its type is point_tnode) to p1^.info, and its right field to p2^.info.

4. Reduce(var list : pointer) from draft 1, which now calls CombineTree instead of Combine.

5. Reduce_it Already done in draft #1.

6. PrintCode(tree : point_tnode; stream : string) a recursive routine that prints the Huffman code for each letter in the tree (use the delete procedure [page 482 in the Turbo text] when you back up to the previous node to erase the last character). Better yet, do the concatenation of '0' and '1' to stream in the actual parameter. This way, you don't call delete. PrintCode is called by the method print_the_code.

7. Decode( tree : point_tnode; var code : text) which takes the text file code that consists of compressed binary code (the text file for this on the WEB) and recursively decodes it using the Huffman tree just built. This is called by the method decode_string that reads the text file.

Each procedure should contain preconditions and postconditions. The data for this program to build the tree is the same as in draft #1. The text file with the compressed binary file will be on the WEB.

Your client should call an instance of the object huffman_type and should

• Read the data from the file and form the Huffman tree.
• Print the Huffman code for each letter using PrintCode.
• Decode the compressed binary file.