# Computer Science 101

Fall 2012

Due: TUES night 11:59pm, DEC 11.

Introduction

In this assignment we will study an efficient scheme for for compressing a text file. It is called Huffman coding. A simple method to encode text as a string of 0s and 1s is to represent each character as a unique string. A text message is thus translated into a string of bits. By exploiting the fact that not all characters appear with the same frequency in the text, we can encode rarely used characters with long codes and frequently used ones with short codes.

Given a set of characters and their corresponding frequencies, an optimal coding scheme is produced by Huffman coding. It uses a binary tree for encoding. We will not construct a tree; but will build an array that will mimic the tree.

A binary tree is constructed so that each leaf in the tree and only the leaves contain a character. We consider all left edges of the tree (edges from a node to its left child) to be labelled 0, and all right edges to be labelled 1. The code for any character is the string of bits produced by traversing the edges of the tree from the root to the leaf that contains that character. To produce the best coding, characters with high frequencies should be high in the tree so that they will have short codes. For example consider the following tree:

```                                 *
0 /   \ 1
/     \
E     0/ \  1
H   /\
0 /  \  1
L     B

```

The codes in the above tree are: E = 0, H = 10, L = 110, B = 111.

Although we will not be building a tree, it's interesting to know that a Huffman tree is built in the following way: we create a tree node for each character. The node stores the character and the frequency. Initially the data is stored in an instance of the ArrayList. Just as in program Huff3 that we wrote on DEC 4, we will construct an array of Huffman in method init(). We will then instantiate an instance of ArrayList and using add() store the elements of the Huffman array there. We will then sort the elements of the ArrayList on the frequency using Collections.sort(). Note that the Huffman class in Huff3 has a compareTo() method that will be used to sort on the frequency.

The program will contain the following methods.

• private reduce() removes the first two element of the array using the remove(0) method twice. The zero indicates that the first element of the array will be removed. Note that remove() not only removes an element from the list but also returns an instance of the Object class, so it must be cast to an instance of Huffman.

• private void combine(Huffman h1, Huffman h2) creates a new instance temp of type Huffman whose frequency is the sum of the frequencies in the nodes just removed. temp is then sent to insert().
• insert() inserts that new node in its proper place in the sorted array using what you will know as an insertion sort. It uses the get(index) function of ArrayList.
• reduceIt() will call reduce until the size of the array is one.
A sample output, where we have relabelled the letter on the combined nodes X, is:
```<40 A><45 E><6 G><8 H><34 I><12 L><9 N><20 O><15 S><2 V><3 W>
Sorted list on frequency is:
<2 V><3 W><9 N><15 S><6 G><40 A><12 L><45 E><20 O><34 I><8 H>
As n decreases
n = 10: <5 X><6 G><9 N><15 S><8 H><40 A><12 L><45 E><20 O><34 I>
n = 9: <8 H><11 X><9 N><15 S><34 I><40 A><12 L><45 E><20 O>
n = 8: <11 X><15 S><12 L><17 X><34 I><40 A><20 O><45 E>
n = 7: <15 S><17 X><20 O><45 E><34 I><40 A><23 X>
n = 6: <20 O><23 X><32 X><45 E><34 I><40 A>
n = 5: <32 X><34 I><40 A><45 E><43 X>
n = 4: <40 A><43 X><45 E><66 X>
n = 3: <45 E><66 X><83 X>
n = 2: <83 X><111 X>
n = 1: <194 X>
```

The ArrayList methods we will use are: