Due: TUES night 11:59pm, DEC 11.
In this assignment we will study an efficient scheme for for compressing a text file. It is called Huffman coding. A simple method to encode text as a string of 0s and 1s is to represent each character as a unique string. A text message is thus translated into a string of bits. By exploiting the fact that not all characters appear with the same frequency in the text, we can encode rarely used characters with long codes and frequently used ones with short codes.
Given a set of characters and their corresponding frequencies, an optimal coding scheme is produced by Huffman coding. It uses a binary tree for encoding. We will not construct a tree; but will build an array that will mimic the tree.
A binary tree is constructed so that each leaf in the tree and only the leaves contain a character. We consider all left edges of the tree (edges from a node to its left child) to be labelled 0, and all right edges to be labelled 1. The code for any character is the string of bits produced by traversing the edges of the tree from the root to the leaf that contains that character. To produce the best coding, characters with high frequencies should be high in the tree so that they will have short codes. For example consider the following tree:
* 0 / \ 1 / \ E 0/ \ 1 H /\ 0 / \ 1 L B
The codes in the above tree are: E = 0, H = 10, L = 110, B = 111.
Although we will not be building a tree, it's interesting to know that a Huffman tree is built in the following way: we create a tree node for each character. The node stores the character and the frequency. Initially the data is stored in an instance of the ArrayList. Just as in program Huff3 that we wrote on DEC 4, we will construct an array of Huffman in method init(). We will then instantiate an instance of ArrayList and using add() store the elements of the Huffman array there. We will then sort the elements of the ArrayList on the frequency using Collections.sort(). Note that the Huffman class in Huff3 has a compareTo() method that will be used to sort on the frequency.
The program will contain the following methods.
<40 A><45 E><6 G><8 H><34 I><12 L><9 N><20 O><15 S><2 V><3 W> Sorted list on frequency is: <2 V><3 W><9 N><15 S><6 G><40 A><12 L><45 E><20 O><34 I><8 H> As n decreases n = 10: <5 X><6 G><9 N><15 S><8 H><40 A><12 L><45 E><20 O><34 I> n = 9: <8 H><11 X><9 N><15 S><34 I><40 A><12 L><45 E><20 O> n = 8: <11 X><15 S><12 L><17 X><34 I><40 A><20 O><45 E> n = 7: <15 S><17 X><20 O><45 E><34 I><40 A><23 X> n = 6: <20 O><23 X><32 X><45 E><34 I><40 A> n = 5: <32 X><34 I><40 A><45 E><43 X> n = 4: <40 A><43 X><45 E><66 X> n = 3: <45 E><66 X><83 X> n = 2: <83 X><111 X> n = 1: <194 X>
The ArrayList methods we will use are: