Programming Assignment 3 Computer Science 102

Spring 2000

First draft due: WED, APRIL, 5, 11:59 pm.

Introduction

In this assignment we will study an efficient scheme for for compressing a text file. It is called Huffman coding. A simple method to encode text as a string of 0s and 1s is to represent each character as a unique string of 8 binary digits (bits). A text message is thus translated into a string of bits. By exploting the fact that not all characters appear with the same frequency in the text, we can encode rarely used characters with long codes and frequently used ones with short codes.

Given a set of characters and their corresponding frequencies, an optimal coding scheme is produced by Huffman coding. It uses a binary tree for encoding. In Draft #1, however, we simply use a linked list.

Draft #1

We will use a linked list implementation to write and test the initial versions of some of the procedures to be used in draft #2. here is the data structure (it's on the WEB) you will using:

unit huff3ADT;

INTERFACE

type HuffmanRecord = record
        name : char;
        freq : integer
     end;

     point_tnode = ^tnode
     tnode = record
        Data : HuffmanRecord
        left, right : point_tnode    
     end;

     pointer = ^  ListNode;
     ListNode = record         
        info : point_tnode;
        next : pointer
     end;
     RecordFile = File of  HuffmanRecord;
     huffman_type = object
        PROCEDURE print;
        PROCEDURE ReadOrdList( var DataFile : RecordFile);
        PROCEDURE Reduce_it;
        private
            list : pointer;
     end {object};

IMPLEMENTATION
end.

Here are the procedures to be written in your ADT:

  1. sort(var list : pointer; newPtr : point_tnode); which creates a record of type ListNode with the info field set to set to newPtr and inserts it into the linked list pointed to by list, in its proper place. This linked list is sorted by frequencies, i.e., newPtr^.data.freq. It uses push( list, newPtr) if the frequency is the smallest on the list to insert a node at the beginning of the list, or insert( follow) to insert a node in any point in the rest of the sorted list, where follow is the pointer to the node that has the highest frequency smaller that that of the new node, i.e., the pointer to the preceding node in the linked list. The node representing the record is either created in Push or in Insert.

  2. ReadOrdList( var DataFile : RecordFile); which repeatedly until there is no more data to be read. List is the private field and ReadOrdListis a method of the object.

  3. DelFirst(var list, smallfreqNode : pointer) which removes the first element of the linked list pointed to by list, and returns the pointer to that element in smallfreqNode.

  4. Combine( var list : pointer; p1, p2 : pointer) which creates a new node whose frequency is the sum of the frequencies in the nodes pointed to by p1 and p2, and inserts that new node in its proper place in the linked list list by calling Sort.

  5. reduce( var list : pointer) which gets as input list

  6. Reduce_it calls reduce and prints the resulting linked list until there is only one node remaining on the list. The frequency field of this node will automatically contain the sum of all the frequencies on the original linked list.

Each procedure should contain preconditions and postconditions. The data for this program will contain the frequencies and will be mailed to you via majordomo. Or you can make your own binary file (see Section 15.3, page 665 in the Turbo Pascal text). The data is: A 40 E 45 G 6 H 8 I 34 L 12 N 9 O 20 S 15 V 2 W 3 and the binary file is on the WEB. Note that you cannot read a binary file simply by opening the link and viewing it using a browser. You must download it and then write a Pascal program to read it.

Your client should call an instance of the object huffman_type and should

The client and the output are on the WEB. In the output, $ is the character placed in the name field for nodes produced by combining two other nodes.