Programming Assignment #4
Computer Science 102 - Fall 2007
In this assignment we will study an efficient scheme for for compressing a
text file. It is called Huffman coding. A simple method to encode text
as a string of 0s and 1s is to represent each character as a unique string of 8
binary digits (bits). A text message is thus translated into a string of bits.
By exploiting the fact that not all characters appear with the same frequency
in the text, we can encode rarely used characters with long codes and
frequently used ones with short codes.
Given a set of characters and their corresponding frequencies, Huffman
coding will produce an optimal coding scheme. It uses a binary tree for
We will use a binary heap implementation to generate Huffman codings from an
a) Set up a project/package with the following classes:
public class HuffmanTree
// - add a constructor to init the Tree from a HuffmanNode
// - the main method will go here, as well as code to take
// a command-line argument for the input file name
public class HuffmanNode implements Comparable
public String letter;
public Double frequency;
public HuffmanNode left, right;
// - add constructors to initialize letter and frequency, etc.
// - add int compareTo(Object o) method so we can compare
// two HuffmanNodes based on their frequency attributes.
// - we're going to build a .heap. of these HuffmanNodes...
public class BinaryHeap // the heap class as posted
// - use code provided in class except:
// - add an int getSize() method that returns
// the # of elements in the heap...
You will also need the Overflow class in your project/package
b) Add the following methods:
- public HuffmanNode(String letter, Double frequency). This
constructor creates a new HuffmanNode where letter is set to l,
frequency is set to f, and left and right are set to
- public HuffmanNode(HuffmanNode left, HuffmanNode right).
This constructor creates a new HuffmanNode from its two children (i.e. the two nodes passed as
parameters should become children of the new node), setting
the letter variable to the concatenation of left.letter &
right.letter, and the frequency variable to the sum of left.frequency
- public String toString(). Place this method in the
HuffmanNode class. It returns a string of form "<"+letter+",
"+frequency+">". There's no need to recursively
iterate left/right pointers in this method. It may help you to better
debug your assignment if you also add toString() methods in the BinaryHeap
and HuffmanTree classes. I'll let you figure out on your own how these
methods should behave...
- public int compareTo(Object o). Place this method in
the HuffmanNode class. This method casts Object o into a
HuffmanNode object called huff. We then return this.frequency.compareTo(huff.frequency).
This allows us to make a heap of HuffmanNodes where the frequency
determines which node is larger than which... (This is the primary reason
we make frequency of type Double rather than double...)
- public HuffmanTree(HuffmanNode huff)
public void printLegend()is located in the HuffmanTree
class, and it calls printLegend(root,
""), which calls private
void printLegend(HuffmanNode t, String s) which is a recursive
method that works as follows:
- sets this.root to huff
- Before calling this constructor,
we make a BinaryHeap of HuffmanNodes where each node has its left
& right pointers set appropriately via the constructor in 2.
above. Once the final HuffmanNode (containing all the others) is removed
from the heap, we make that into a HuffmanTree object by calling this constructor.
public static BinaryHeap fileToHeap(String filename)
is located in the HuffmanTree class and takes a String for the file name
containing our input (letter & frequency data). The letters and
frequencies are all in the first line of the file, with spaces as
separators. (You may assume each separator is a single space).
We create a single HuffmanNode
for each letter and its frequency, and insert each of these into a new
BinaryHeap. (Note: We don't start the Huffman algorithm just yet. We
merely call the Binary Heap's constructor that takes an array of
Comparables. Don't forget that BinaryHeaps ignore the zeroth entry!!!) .
public static HuffmanTree createFromHeap(BinaryHeap b)
is located in the HuffmanTree class. We run the Huffman algorithm here.
When we have only one element left in the heap, we remove it, and create a
new HuffmanTree object with root set to our removed object...
public static void main(String args) calls fileToHeap() on args, (which means we specify the
name of the file to read at the command-line) and returns a BinaryHeap (bheap,
for example). We then call bheap.printHeap() on
the heap. Next, we call createFromHeap(bheap) on the heap to run our Huffman algorithm which
returns a HuffmanTree, called, here, htree. Finally, we call htree.printLegend() on this HuffmanTree object to print the binary
encodings for each of the letters in our input file...
- If (t.letter.length() >
1) i.e., t contains multiple
characters, then t is NOT a leaf node, so we recursively call printLegend() on
it's left child using string s + "0", and recurse on t's
right child using string s + "1"
- If t.letter is a single character, then
t is a leaf node, and we print out (t.letter+"="+s);
C. The data for this program is posted on the web at huff.txt; or if for
some reason you cannot download the data from the web, you can make your own
file. (The more files you can test on, the better!) The legend for huff.txt is:
A 20 E 24 G 3 H 4 I 17 L 6 N 5 O 10 S 8 V 1 W 2
Be sure to end all contents of the files with a newline!!!
The HuffmanTree class should be runnable from the command-line with huff.txt
as input using something like:
java <package>.HuffmanTree huff.txt
// pkg is optional
If you are using JCreator or NetBeans, be sure to select the option to
execute with command-line arguments, and specify the name of the input file
that way. For Eclipse, add the file-name to the run configuration in the
Run/Run. menu in the .Program arguments. field.