Computer Systems Organization I - Prof. Grishman

Assignment #6

Write a compiler in C which will accept the statements of a simple programming language, SIMPL, and translate these into a complete, executable LC-3 assembly language program.  SIMPL has four statements, an assignment statement, while and endwhile statements, and a print statement.  The resulting program should print (write to the LC-3 screen) whatever is specified by the print statements.  For example, the program

    num = 1
    while num < 4
       square = num * num
       print num
       print square
       num = num + 1
    endwhile

should output

    num = 1
    square = 1
    num = 2
    square = 4
    num = 3
    square = 9

SIMPL Specifications (minimum requirements)

Tokens

There are five types of SIMPL tokens:  variables, reserved words, constants, operators, and punctuation.
A variable consists of 1 to 9 letters (upper or lower case).
Reserved words:  the words "while", "endwhile", and "print" are reserved and may not be used as variable names.
A (decimal) constant consists of 1 to 3 decimal digits.
There are six operators:  +, -, *, /, < and >
There is only one punctuation token, =

Statement format

Each SIMPL statement is on a separate line.
Operators and = may be preceded or followed by zero or more whitespace characters;  'while' and 'print' are followed by one or more whitespace characters.
Statements may begin and end with zero or more whitespace characters.

Expressions

An expression element is a variable or constant.
An expression is either an expression element or two expression elements separated by an operator.
The operators < and > return a negative value if the relation is true, and a positive value or zero if the relation is false.
Variables are initialized to 0.

Statements

The assignment statement has the form 'variable = expression' and assigns to variable the value of expression.
The while statement has the form 'while expression' and executes the statements up to the following 'endwhile' statement as long as expression is negative (true).  A SIMPL program can have at most one while statement.
The print statement has the form 'print variable' and prints the name and value of variable to the terminal
After the last statement in the SIMPL program has been executed, the program terminates (HALT).

Optional features

This assignment offers substantial opportunity for adding optional features for extra credit.  Some possibilities are listed here:

Suggestions for organizing the program

You may write the program any way you like, so long as the final program meets the specifications just described.  However, we offer the following suggestions to help you get started.

Compilation can be done in a single pass, where the basic loop reads a line, breaks it into tokens, determines the type of statement, and generates the code accordingly.  A well-organized program should consist of a set of functions reflecting this structure (tokenize, classifyStatement, and one function to generate the code for each statement).

A general guideline in writing compilers and interpreters is that the structure of the program should parallel the structure of the language description.  One example is having a function for each statement type.  Also, because the assignment and while statements both involve arithmetic expressions;  they should both reference a common function, evaluateExpression, to generate the code to compute this value.Continuting this one step further, the expression involves one or two expression elements, so you should have a function evaluateExpressionElement.

The code will include references to variables, integer constants, and string constants (for print statements).  You should have three tables in your program which record the referenced variables and constants, and which output a series of .FILLs and .STRINGZs at the end of the program (after the HALT, before the END) for these variables and constants.  Variables should be initialized to zero (.FILL 0)

Because the code for multiplying, for dividing, and (particularly) for printing an integer is a bit long, you probably do not want to generate all the  required instructions each time.  It is more efficient to code a small run-time library with at least these three routines, and copy these routines on to the end of your generated code.   The multiply routine was done in class and is in the book;  the divide routine was done for homework.  P&P give an adequate binary to ASCII routine for signed numbers on p. 277;  you are free to copy that.  It only prints numbers in the range -999 to +999, but that is sufficient for this assignment.

As part of designing your compiler, you need to have conventions for register usage.  For example, you can decide that

Stages of implementation

To avoid leaving everything for the last minute, we have divided the project into three parts, due in (roughly) one week increments.

Stage 1:  due Nov. 22 -- assignment statements with + and -

This already requires you to assemble the basic framework of the compiler:  reading lines, breaking them into tokens, interpreting expressions, generating assembly code.  You already will have written the basic tokenizer;  you will probably need to add some very simple functions to test the type of a token -- isVariable and isInteger.  A sample input/output would be

Line 1:  A = 33
    LD    R1, C33
    ST    R1, A
Line 2:  A = B + 33
    LD    R2, B
    LD    R3, C33
    ADD   R1, R2, R3
    ST    R1, A
Line 3:  A = B - C
    LD    R2, B
    LD    R3, C
    NOT   R3, R3
    ADD   R3, R3, 1
    ADD   R1, R2, R3
    ST    R1, A

At this stage we are not yet generating a complete program, so you (and we) have to inspect the output by hand.

Stage 2:  due Dec. 2 --complete programs with FILL;  additional operators

The second stage requires you first to create complete LC-3 programs  by generating the .FILL statements and other boilerplate (.ORIG, HALT, .END).  For example, from

A = 33
B = A + 44

you would generate

        .ORIG  x3000
; Line 1: A = 33            generating such comments is not required
        LD     R1, C33
        ST     R1, A
; Line 2: B = A + 44        but can make debugging much easier
        LD     R2, A
        LD     R3, C44
        ADD    R1, R2, R3
        ST     R1, B
        HALT
A       .FILL  0
B       .FILL  0
C33     .FILL  33
C44     .FILL  44
        .END

If you have done things right you should be able to run these programs with the LC-3 simulator.  Then you have to add the remaining four operators (*, /, <, and >).  This will require you to introduce library routines for * and /.

Stage 3 (final version):  due Dec. 8 --  while and print statements

Here you will need one more library routine ... for printing the name and value of an integer.  This routine will have to take two arguments, a pointer to a string containing the name of the integer, and the value of the integer:

PRINT A

might become

       LEA      R1, S1
       LD       R2, A
       JSR      PRINT
...
S1     .STRINGZ "A"
A      .FILL    0

Grading

This project will be worth 15 points.  Successful completion of stage 1 on time gets 4 points;  successful completion of stage 2 on time gets 5 points;  and successful completion of the final system gets 7 points.  Credit will be based primarily on correct execution of correct programs, but some credit will also be given for comments (we expect at least a line before each function explaining what it does) and error checks (if we can get your program to crash with an invalid input, it won't get full credit;  make sure you don't overrun any arrays).

Extra credit will depend on features added, starting with 0.5 points for 'easy' additions;  we anticipate awarding up to 5 points of extra credit for interesting combinations of features, but stand ready to be wow-ed.

Label your program files for the three stages lastname6a.c, lastname6b.c, and lastname6.c.  Submit your program (.c file) by email, as an attachment, to me <grishman@cs.nyu.edu> and to the e-tutor, Andrew Montalenti <am1221@nyu.edu>, by one minute before midnight on the specified date. (Late assignments will be penalized 1/2 point for each weekday late.) Label your email "CSO Asgn 6a", "CSO Asgn6b", and "CSO Asgn 6".  For the final version  if you are doing any extra credit, describe in your mail and in comments in the program what extra features you are including, and provide at least one test file demonstrating these features. This is a complicated project, but if you complete it you will have gained a solid grasp of a lot of material.  It is important to keep up with the stages of the project.  If you are having trouble, or have a question about the project, we are here to help you;  contact me or Andrew by email.