Computer Systems Organization I - Prof. Grishman

Assignment #5

The next step in building a compiler, now that we have a function to read in one line at a time, is a function to divide that line into tokens. We will assume a language consisting of three types of tokens: names, consisting of one or more letters (upper or lower case); numbers, consisting of one or more decimal digits; and special characters, each of which is any single character other than a letter, digit, blank, tab, return, or newline (the last four are collectively called 'white space'). White space separates tokens but is never part of a token, and a name or number takes in as many consecutive letters or digits as possible (so quack is one token, not five).
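The token rules above come down to classifying single characters. A minimal sketch of one way to encode them in C follows. The names classify and char_kind are hypothetical, not part of the assignment, and isalpha/isdigit are assumed to run in the default "C" locale.

```c
#include <ctype.h>

/* The four kinds of character the token rules distinguish. */
enum char_kind { WHITE, LETTER, DIGIT, SPECIAL };

/* Classify one character.  White space is exactly blank, tab, return and
   newline; isspace() is deliberately not used, because it would also accept
   '\v' and '\f', which count as special characters here.
   The caller should stop at the string terminator '\0', which this
   function would otherwise report as SPECIAL. */
enum char_kind classify(char c) {
    if (c == ' ' || c == '\t' || c == '\r' || c == '\n')
        return WHITE;
    if (isalpha((unsigned char)c))   /* cast: ctype functions need unsigned char values */
        return LETTER;
    if (isdigit((unsigned char)c))
        return DIGIT;
    return SPECIAL;
}
```

A token then starts at the first non-WHITE character. A LETTER or DIGIT token continues while the following characters have the same kind, and a SPECIAL token is always exactly one character long.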

Write a C function

char* gettoken (char line[], int* offset)

which returns (a copy of) the next token in line, searching from line[*offset] and skipping any white space before it, or NULL if there are no more tokens. The function should also advance *offset to the index of the character following the token just returned, so that the next call to gettoken with the same arguments reads the next token. The returned pointer should point to a separate copy of the token, allocated on the heap with malloc, not into line itself.
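One possible sketch of gettoken is shown below. It assumes line is a null-terminated string and that isalpha/isdigit run in the default "C" locale. The helper is_white is a hypothetical name, and exiting on malloc failure is one design choice among several (returning NULL would be ambiguous with "no more tokens").

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* White space as defined by the assignment: blank, tab, return, newline. */
static int is_white(char c) {
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

/* Return a heap copy of the next token in line, starting the search at
   line[*offset], and advance *offset past it.  Returns NULL at end of line. */
char *gettoken(char line[], int *offset) {
    int i = *offset;

    /* Skip leading white space. */
    while (line[i] != '\0' && is_white(line[i]))
        i++;
    if (line[i] == '\0') {
        *offset = i;
        return NULL;                    /* no more tokens */
    }

    int start = i;
    if (isalpha((unsigned char)line[i])) {
        while (isalpha((unsigned char)line[i]))
            i++;                        /* name: longest run of letters */
    } else if (isdigit((unsigned char)line[i])) {
        while (isdigit((unsigned char)line[i]))
            i++;                        /* number: longest run of digits */
    } else {
        i++;                            /* special character: exactly one */
    }

    /* Copy the token into separately allocated storage. */
    int len = i - start;
    char *tok = malloc(len + 1);
    if (tok == NULL) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }
    memcpy(tok, line + start, len);
    tok[len] = '\0';

    *offset = i;                        /* index of the character after the token */
    return tok;
}
```

Because every token is a fresh malloc'd copy, the caller should free each one once it is no longer needed.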

Then write a main function which reads in a series of lines (using getline, the line-reading function from the previous step), divides each line into tokens, and writes the tokens to standard output, one token per line. For example, if the input file consists of

    Hello there
    quack=moo+5

the program should write:

    Hello
    there
    quack
    =
    moo
    +
    5
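The reading loop can be sketched as follows, under some assumptions. The exact signature of the earlier getline is not given here, so the sketch uses the standard fgets as a stand-in with an assumed MAXLINE of 1000. The hypothetical print_tokens takes its input and output streams as parameters so it can be tried on any file. A compact gettoken following the assignment's token rules is repeated so the block compiles on its own.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXLINE 1000   /* assumed maximum line length, including '\n' and '\0' */

/* Compact gettoken following the assignment's token rules. */
char *gettoken(char line[], int *offset) {
    int i = *offset;
    /* Skip white space: blank, tab, return, newline. */
    while (line[i] == ' ' || line[i] == '\t' || line[i] == '\r' || line[i] == '\n')
        i++;
    *offset = i;
    if (line[i] == '\0')
        return NULL;                    /* no more tokens */
    int start = i;
    if (isalpha((unsigned char)line[i])) {
        while (isalpha((unsigned char)line[i]))
            i++;                        /* name */
    } else if (isdigit((unsigned char)line[i])) {
        while (isdigit((unsigned char)line[i]))
            i++;                        /* number */
    } else {
        i++;                            /* single special character */
    }
    int len = i - start;
    char *tok = malloc(len + 1);
    if (tok == NULL) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }
    memcpy(tok, line + start, len);
    tok[len] = '\0';
    *offset = i;
    return tok;
}

/* Read lines from in and write their tokens to out, one per line. */
void print_tokens(FILE *in, FILE *out) {
    char line[MAXLINE];
    while (fgets(line, sizeof line, in) != NULL) {
        int offset = 0;                 /* restart at the beginning of each line */
        char *tok;
        while ((tok = gettoken(line, &offset)) != NULL) {
            fprintf(out, "%s\n", tok);
            free(tok);                  /* each token is a separate heap copy */
        }
    }
}
```

With this split, main can simply call print_tokens(stdin, stdout). One limitation of the sketch: a line longer than MAXLINE - 1 characters is read in pieces, so a name or number straddling the boundary would come out as two tokens. Note also that on systems whose <stdio.h> declares the POSIX getline (glibc, for example), a hand-written getline with a different signature will not compile under that name and has to be renamed.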

For 0.5 points of extra credit, store all the tokens returned by gettoken in an array (of type char*[]) and sort them into ascending alphabetical order before printing. Any simple sort routine (bubble sort, insertion sort, etc.) will be fine. You may use the library function strcmp to compare strings (see page 613 of the text for the spec of this function).
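For the extra credit, a minimal insertion sort over an array of char* might look like the following. The name sort_tokens is hypothetical; any of the simple sorts mentioned would serve equally well.

```c
#include <string.h>

/* Sort n strings into ascending strcmp order using insertion sort.
   Only the pointers are moved; the strings themselves are not copied. */
void sort_tokens(char *tokens[], int n) {
    for (int i = 1; i < n; i++) {
        char *key = tokens[i];
        int j = i - 1;
        /* Shift larger strings one slot to the right. */
        while (j >= 0 && strcmp(tokens[j], key) > 0) {
            tokens[j + 1] = tokens[j];
            j--;
        }
        tokens[j + 1] = key;            /* insert key into its sorted position */
    }
}
```

Since strcmp compares character codes, on an ASCII system every capitalized name sorts before every lowercase one, and digits and special characters are ordered by their codes. For the example input above, the sorted output would be +, 5, =, Hello, moo, quack, there.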

Submit your program (.c file) by email, as an attachment, to me <grishman@cs.nyu.edu> and to the e-tutor, Andrew Montalenti <am1221@nyu.edu>, by one minute before midnight on Thursday, November 10th. (Late assignments will be penalized 1/2 point for each day late, out of a total of 4 points.) Label your email "CSO Asgn 5".