Programming Assignment 2

Computer Science 102

Spring 1996

First draft due Sat, FEB 28, 12:00pm (midnight). Each successive draft is due one week after the due date for the previous draft .

Your project is to design a program to implement string operations. Recall that in Turbo Pascal there is already a string type. A string is an array of 255 characters, plus a length field that indicates how many characters are actually being used. There are two problems with this built-in implementation. First, you are prevented from having a string with more than 255 characters. Second, even if your string is very short, depending on your system, the system may still allocate space for 255 characters. Turbo Pascal allocates space for only the amount of characters in the string. You are to implement an alternative string type, called string_type, that avoids these two problems.

Your string type will use a linked list instead of an array. For example, to represent the string 'comp', you can use the following linked list.

        _______     ______     ______     ______
list--> |_C|___|--> |_O|__|--> |_M|__|--> |_P|__|-->NIL

Your project divides into three parts:

The StrngADT unit

In the unit StrngADT, you will implement an abstract data type, called string_type, that will allow several useful string operations. The interface for this unit is given below.

unit StrngADT;


  pointer_type = ^node_type;
  node_type = 
      info : char;
      next : pointer_type

  string_type = object
procedure Concat(string1, string2 : string_type; var string3 : string_type);
{Precondition: None.
 Postcondition: string3 is the concatenation of string1 and string2;
                the nodes of string3 are separate from the nodes 
                of string1 and string2.}

procedure Copy(string1 : string_type; var string2 : string_type);
{Precondition: None.
 Postcondition: string2 is a copy of string1.  The nodes of string2
                are separate from the nodes of string1.}

procedure Repeats(string1 : string_type; n : integer; 
                 var string2 : string_type);
{Precondition: n >= 0.
 Postcondition: string2 is the concatenation of n copies of string1.  
 If n = 0, then string2 is the empty string represented by a linked list
 consisting of only the NIL pointer.}

procedure Assign(string1 : string; var string2 : string_type);
{Precondition: None.  (Notice that string1 is a Pascal string.)
 Postcondition: string2 represents the same string as string1.
 Use ord(string1[0] ) to obtain the string length since you will be
 using the function length as a method in this object.}

procedure Substring(string1 : string_type; x, y : integer; 
                    var string2 : string_type);
{Precondition: 1 <= x <= y <= length(string1).
 Postcondition: string2 is the substring of string1 that goes from 
                the xth character to the yth character.}

procedure delete(var string1 : string_type);
{Precondition: string1 is defined.
 Postcondition: All the nodes of string1 are returned to the heap
                and string1.head is assigned the nil pointer.

 Do not use delete in cases where x appears on both sides of the
 ``:='', like, x := x + y, only for cases like x := 'nyu' and then x
 is reassigned in for instance, x := 'hello}

function Length(string1 : string_type) : integer;
{Precondition: None.
 Postcondition: Returns the number of characters of string1.}

procedure Print(string1 : string_type);
{Precondition: None.
 Postcondition: Displays the characters of string1 on the terminal. 
                The display will have up to 75 characters per line.} 
        head : pointer_type;

Note that the private field cannot be used as a parameter for the methods of the object; however as we see here, the object (String_type) can.

Draft 1

For draft 1, you are required to first write an object called string_type which contains as methods the usual string operations: assignment (x := 'nyu'), concatenation (z := x + y), multiplication (w := 12 * z), substring (w := x [2..5]), copying (u := x), printing (x), deleting and quiting (quit). These operations are to be implemented with linked lists using dynamic pointers, so the object also contains a delete method to return the nodes of a reassigned string to the heap. The private field of the object is a pointer.

An example of a method heading is PROCEDURE ASSIGNMENT( ST1 :STRING; ST2: STRING_TYPE). So if ST1 is 'nyu', ST2 is the pointer to the first node of the three node linked list containing 'n', 'y', and 'u'. The object is defined and implemented in a unit strnADT. There is a program test1.pas (available on the homepage) that checks your strnADT.

Draft 2

The Main Program
Your main program will be a calculator of strings. In other words, the user will use a file with a series of string commands. After each command, your program will parse the command and then print out the result of the command. Each command will have one of the following forms:

  1. x := 'Hello'
  2. x := y + z
  3. x := y
  4. x := 17 * y
  5. x := y [10..20]
  6. length x
  7. x
  8. quit

Here x, y, and z represent string variables (possibly the same variables). The only variable names allowed will be a single character, either a lowercase letter ('a' to 'z') or an uppercase letter ('A' to 'Z'). As in Pascal, an uppercase letter refers to the same variable as its lowercase counterpart.

Each of the tokens in the input line will be separated by one or more blanks. Your lexical analyser should use this fact to separate the tokens,

If the input line is not in one of the formats above, your program should print an error message, and then skip that input line. Similarly, if one of the variables on the right side of an assignment has not been initialized, your program should complain and skip that line. Use a symbol table to keep track of those variables assigned values. Also, the actual string_type parameters should be elements of an array of string_type.

Discussion and explanation
Draft 2 requires you to write a string interpreter that will execute an input stream of string operations. For instance, if the input is:

        x := 'nyu'
        y:= 'hello'
        z := y + x
where z is the instruction to print the string stored in z, then the result would be "hellonyu". A variable in the input can only be a single letter. The interpreter is not case sensitive, so that "z" and "Z" represent the same variable.

The front end of the interpreter breaks the input into constituent parts called tokens. Then in the parser, it determines which statement (assignment, print, etc.) a line of token represents. I have written the scanner for you, and the initial stage of the parser (again, these are available on the homepage in analyse.pas). The scanner is elementary because the tokens in the input must be separated by at least one blank. The parser indicates which procedure further parses a statement. The parsing up to this point is done simply by determining how many tokens appear in the input. Thus x := 'hello' and x := y are parsed such that the same procedure is executed. It is up to you to further parse them by distinguishing between an assignment and a copy statement.

Use a symbol table to check whether a variable that must have a value in any operation, has in fact been assigned a value. This in x := y, you would check whether y has been assigned a value. Do this by using a type symbol = array[1..max] of boolean, where max has the value 26. If the input contains an undefined variable, the interpreter must print an error message.

How do you associate a variable with the pointer to the linked list? The natural way is to have an array of the object, i.e., table_type = array [1..max] of string_type. Assume that Table : table_type and if x := 'nyu' and y := 'hello'; and Px points to the linked list for x; and Py, for y, then Table[24] contains Px and Table[25] contains Py. This also facilitates deleting linked lists for redefined variables (see below).

How is this implemented? For instance in x := 'nyu', stream[1] is 'x', stream[2] is ':=' and stream[3] is the string 'nyu'. First you must strip the quotes from 'nyu'. The quotes are the first and last elements of stream[3]. These are referred to as stream[3][1] and stream[3][5] respectively. In general, the second subscript for the term referring to the second quote should be the length L of the string. So it should be written as stream[3][L]

The subscript of Table is an integer, so you must convert the letter variables to an integer. 'A' is converted to 1, 'B' to 2, etc. The element of Table is the second parameter of method ASSIGN. Remember that the header for the method is: PROCEDURE ASSIGN( ST1 :STRING; ST2: STRING_TYPE). This will result in the pointer for the linked list that stores 'nyu' being the contents of the element 24 of Table. The data for draft 2 is also on the home page.

Deleting linked lists

When a variable is assigned a new string in a simple assignment statement, e.g.,
   x := 'nyu'
   x := 'jay'
if you assign a new pointer to x, the pointer for the linked list representing nyu will no longer be accessible. That means that the locations for n, y and u in the heap will consume heap memory and will no longer be addressable. We say that leakage has occurred. The portion of the heap that is inaccessible (here,that which stores nyu) is called garbage. A language like List eliminates garbage by performing garbage collection. Pascal does not do this. In this project you will do this yourself using Delete.

Draft #3
Change draft #2 so that it accepts variables that consist of one letter or more, for instance first in:

  first := 'NYU'
This will require you to hash the variable name to an integer. One way of doing this is to: The data for this draft is on the homepage.