Spring 2005
Second draft due: WED night, APR 6
Introduction
In this draft you will finish the implementation of Huffman coding.
Given a set of characters and their corresponding frequencies, an optimal coding scheme is produced by Huffman coding. A binary tree is constructed so that each leaf in the tree and only the leaves contain a character. We consider all left edges of the tree (edges from a node to its left child) to be labelled 0, and all right edges to be labelled 1. The code for any character is the string of bits produced by traversing the edges of the tree from the root to the leaf that contains that character. To produce the best coding, characters with high frequencies should be high in the tree so that they will have short codes. For example consider the following tree:
0 / \ 1 / \ E 0/ \ 1 H /\ 0 / \ 1 L B
The codes is the above tree are: E = 0, H = 10, L = 110, B = 111.
A Huffman tree is built in the following way: we create a tree node for each character. The node stores the character and the frequency. Initially the data is stored in a heap as single tree nodes with the left and right children being null, just as it appeared in draft #2. Then we repeatedly combine the trees with the two lowest frequencies but now we make them the children of a new node that has the combined frequencies as its frequency and has an arbitrary character for the letter. The subtree with the smaller frequency becomes the left subtree of this new node, and the other one becomes the right subtree.
Your program for the last draft must be changed to incorporate two new methods, decode and printCode.
Here are the procedures to be added to your program of draft #1:
Each procedure should contain preconditions and postconditions. The data for this program to build the tree is the same as in draft #2. The text file with the compressed binary file will be on the WEB.
Your main method should