Programming Assignment 3 Computer Science 102

Spring 2009

Due: WED 11:59pm APR 15

The 2nd midterm will be MON APR 20.


As discussed in Draft #1, we are studying Huffman coding, an efficient scheme for compressing text files. We again exploit the fact that not all characters appear with the same frequency in the text, encoding rarely used characters with long codes and frequently used ones with short codes.

Given a set of characters and their corresponding frequencies, Huffman coding produces an optimal coding scheme. It uses a binary tree for encoding. In Draft #2, however, we use a min-heap as a priority queue instead of a linked list. Reaching the smallest value (the root) of the heap is O(1), that is, constant time, and reheaping after it is removed is O(log(n)) since the work depends only on the height of the heap. By contrast, removing the minimum values from a linked list and repositioning the combined treelet is O(n). Doing this for n nodes is therefore O(n*log(n)) for the heap and O(n^2) for the linked list. Note that this draft should be done with non-static methods.
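To make the cost argument concrete, here is a minimal sketch of the repeated "remove two minima, reinsert their sum" loop. It uses the library class java.util.PriorityQueue (a min-heap by natural ordering) as a stand-in for the hand-written heap this assignment requires. The class name HuffmanCostSketch, the method reduceAll, and the sample frequencies are illustrative assumptions, not part of the assignment.

```java
import java.util.PriorityQueue;

class HuffmanCostSketch {
    // Repeatedly removes the two smallest weights and reinserts their sum.
    // Returns {number of combine steps, weight of the final root}.
    // Each poll/add costs O(log n), and there are n - 1 combine steps.
    int[] reduceAll(int[] freqs) {
        if (freqs.length == 0) {
            throw new IllegalArgumentException("need at least one frequency");
        }
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        for (int f : freqs) {
            heap.add(f);
        }
        int steps = 0;
        while (heap.size() > 1) {
            int a = heap.poll();   // smallest remaining weight
            int b = heap.poll();   // second smallest
            heap.add(a + b);       // the combined treelet goes back into the heap
            steps++;
        }
        return new int[] { steps, heap.peek() };
    }

    public static void main(String[] args) {
        // Hypothetical frequencies, chosen only for illustration.
        int[] result = new HuffmanCostSketch().reduceAll(new int[] { 5, 9, 12, 13, 16, 45 });
        System.out.println("combine steps = " + result[0] + ", root weight = " + result[1]);
    }
}
```

With n entries the loop body runs n - 1 times, which is where the O(n*log(n)) total for the heap comes from.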

Draft #2

We will use a heap implementation of a priority queue in this program. Note that this project uses a min-heap instead of the max-heap we used in class. Here is the skeleton of the class you will be using:

public class Hw3b
{  //solves draft #2 for project #3
   int last;
   //the # of nodes in heap. This decreases as successive pairs of
   //minimum nodes are extracted
   final int MAX = 11; //there are 11 pairs of letter-frequency entries

   TreeNode[] x = new TreeNode[MAX + 1]; //one more element for shift up

   class Huffman //defines the data frequency record
   {  public String letter;
      public int freq;
   }

   class TreeNode implements Comparable<TreeNode>
   //defines the treelets comprising the list.
   {  public Huffman data;
      public TreeNode left, right;

      //you must write the compareTo method
   }
}


where Huffman and TreeNode are, as before, embedded classes. The keyword public on the fields is optional here: Java's default is package access rather than public, but the enclosing class can still reach the fields. The class TreeNode must of course contain a compareTo method and should also contain a toString() method. Here are the methods to be altered or added to your class of draft #1:

  1. public void initialize(String record) takes the record consisting of letters and frequencies and assigns them to the corresponding fields of an instance of Huffman. It then assigns this instance of Huffman to the data part of an instance of TreeNode, say aux, and assigns aux to an element of x[], an array of Comparable. This array is an instance (global) variable which represents the heap. Method initialize(String record) is on the web page.

  2. Method shift() shifts the elements of x up one position, so that the root sits at x[1] and parent = child/2 has meaning.

  3. Method build creates a heap by calling shiftUp(j) in a loop. To facilitate your understanding of the heap process, let's call shiftUp(j) by the name insert instead.

  4. Method reduceIt() calls reduce in a while loop as long as the instance variable last is greater than one. The initial value of last is MAX. last holds the highest subscript in use as the heap shrinks.

  5. private void reduce() is similar to the method of the same name in draft #1. It
  6. private void combine(TreeNode p1, TreeNode p2) creates a new instance aux of TreeNode as in draft #1 whose frequency field is the sum of the frequencies in the objects pointed to by p1 and p2, and assigns aux to x[last]. Thus the inserted treelet will be at the end of the heap array. Then it places the element in its proper place in the heap by calling insert(last).

  7. private Comparable deleteMin(int n) saves the top of the heap (the first element in x[]) in temp, and replaces it with the last element in the array. It then restores the heap property by calling shiftDown(n-1); shiftDown is the method defined in the heap sort done in class. Finally it returns temp.
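The heap mechanics behind steps 2, 3, and 7 can be sketched on plain int values. This is a simplified illustration under stated assumptions, not the required solution: the class name MinHeapSketch is hypothetical, the array holds int rather than TreeNode, insert appends the value itself before shifting it up, and deleteMin takes no parameter because the sketch tracks last internally.

```java
class MinHeapSketch {
    private final int[] x;   // x[0] is unused so that parent = child / 2 works
    private int last = 0;    // highest subscript currently in use

    MinHeapSketch(int capacity) {
        x = new int[capacity + 1];   // one extra slot for the unused x[0]
    }

    // Places the value at the end of the heap, then shifts it up
    // while it is smaller than its parent.
    void insert(int value) {
        x[++last] = value;
        int child = last;
        while (child > 1 && x[child] < x[child / 2]) {
            swap(child, child / 2);
            child = child / 2;
        }
    }

    // Saves the root, moves the last element to the root, then shifts it
    // down past its smaller child until the heap property holds again.
    int deleteMin() {
        if (last < 1) {
            throw new IllegalStateException("heap is empty");
        }
        int temp = x[1];
        x[1] = x[last--];
        int parent = 1;
        while (2 * parent <= last) {
            int child = 2 * parent;
            if (child < last && x[child + 1] < x[child]) {
                child++;                       // pick the smaller of the two children
            }
            if (x[parent] <= x[child]) {
                break;                         // heap property restored
            }
            swap(parent, child);
            parent = child;
        }
        return temp;
    }

    int size() {
        return last;
    }

    private void swap(int i, int j) {
        int t = x[i];
        x[i] = x[j];
        x[j] = t;
    }
}
```

Both loops follow a single path between the root and a leaf, so each call does work proportional to the height of the heap. In the assignment itself the comparisons would go through compareTo on TreeNode rather than the < operator on int.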

The data for this program is the same as for draft #1.

Your main method should:

The output for forming the heap is on the web page. In the output, X is the character placed in the letter field for elements produced by combining two other elements.
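As a rough illustration of how a combined element ends up with X in its letter field, the sketch below pairs a possible compareTo and toString with a simplified combine. The outer class name TreeletSketch, the helper leaf, the "letter:freq" output format, and the choice to return the new node (instead of storing it in x[last] as the assignment describes) are all assumptions made for this example.

```java
class TreeletSketch {
    class Huffman {
        String letter;
        int freq;
    }

    class TreeNode implements Comparable<TreeNode> {
        Huffman data;
        TreeNode left, right;

        // Orders treelets by frequency, smallest first, as a min-heap needs.
        @Override
        public int compareTo(TreeNode other) {
            return Integer.compare(data.freq, other.data.freq);
        }

        // Assumed display format; the assignment's real output is on the web page.
        @Override
        public String toString() {
            return data.letter + ":" + data.freq;
        }
    }

    // Hypothetical helper that builds a childless node.
    TreeNode leaf(String letter, int freq) {
        TreeNode node = new TreeNode();
        node.data = new Huffman();
        node.data.letter = letter;
        node.data.freq = freq;
        return node;
    }

    // Builds a parent whose frequency is the sum of its children's
    // frequencies and whose letter is the placeholder X.
    TreeNode combine(TreeNode p1, TreeNode p2) {
        TreeNode aux = leaf("X", p1.data.freq + p2.data.freq);
        aux.left = p1;
        aux.right = p2;
        return aux;
    }
}
```

For example, combining hypothetical leaves a:3 and b:7 gives a node that prints as X:10 under this format and compares greater than either child.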