Computer Systems Organization I - Prof. Grishman
The next step in building a compiler, now that we have a function to read
in one line at a time, is a function to divide that line into tokens. We
will assume a language consisting of three types of tokens: names,
consisting of one or more letters (upper or lower case); numbers, consisting
of one or more decimal digits; and special characters -- any single
character other than a letter, digit, blank, tab, return, or newline (the
last four collectively called 'white space').
Write a C function
char* gettoken (char line[], int* offset)
which returns (a copy of) the next token in line, starting at line[*offset],
or NULL if there are no more tokens. This function should
also advance *offset to the index of the character following the
token just returned (so that the next call to gettoken with the
same arguments will read the next token). The token returned should be a
pointer to a copy of the token (separately allocated on the heap by malloc),
not a pointer into line.
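
One possible shape for such a function is sketched below. It is an illustration under stated assumptions, not a reference solution: it assumes line is a NUL-terminated string passed as char line[], that *offset never points past the terminating NUL, and that returning NULL is also acceptable when malloc fails. The helper names is_white and count_tokens are hypothetical and are not part of the assignment.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* White space as the assignment defines it: blank, tab, return, newline. */
static int is_white(char c)
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

/* Return a heap-allocated copy of the next token in line, starting at
   line[*offset], or NULL if only white space remains (or malloc fails).
   On success, *offset is left at the character following the token.
   Assumes line is NUL-terminated. */
char *gettoken(char line[], int *offset)
{
    int start, len;
    char *token;

    /* Skip any white space in front of the token. */
    while (is_white(line[*offset]))
        (*offset)++;

    if (line[*offset] == '\0')
        return NULL;                    /* no more tokens on this line */

    start = *offset;
    if (isalpha((unsigned char)line[start])) {
        /* Name: one or more letters. */
        while (isalpha((unsigned char)line[*offset]))
            (*offset)++;
    } else if (isdigit((unsigned char)line[start])) {
        /* Number: one or more decimal digits. */
        while (isdigit((unsigned char)line[*offset]))
            (*offset)++;
    } else {
        /* Special character: exactly one character. */
        (*offset)++;
    }

    len = *offset - start;
    token = malloc(len + 1);
    if (token == NULL)
        return NULL;
    memcpy(token, line + start, len);   /* copy, do not point into line */
    token[len] = '\0';
    return token;
}

/* Hypothetical helper showing the calling pattern: keep calling gettoken
   with the same arguments until it returns NULL, freeing each copy. */
int count_tokens(char line[])
{
    int offset = 0, count = 0;
    char *token;

    while ((token = gettoken(line, &offset)) != NULL) {
        count++;
        free(token);
    }
    return count;
}
```

Because each token is a separate malloc'ed copy, the caller owns it and is responsible for calling free once the token is no longer needed.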
Then write a main function which reads in a series of lines (using getline),
divides each line into tokens, and writes the tokens to standard output,
one token per line. For example, if the file consists of
the program should write.
For 0.5 points of extra credit, store all the tokens returned by gettoken
into an array (of type char*) and sort the tokens (into ascending
alphabetical order) before printing them. Any simple sort routine (bubble
sort, insertion sort, etc.) will be fine. You may use the library function
strcmp to compare strings (see
page 613 of the text for the spec of this function).
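
As an illustration of the kind of simple sort meant here, the following is a minimal insertion sort over an array of token pointers, compared with strcmp. The function name sort_tokens and its parameters are hypothetical choices for this sketch; it assumes the array already holds n valid, NUL-terminated strings.

```c
#include <string.h>

/* Insertion sort: rearrange the n pointers in tokens so that the strings
   they point to are in ascending strcmp order. Only the pointers are
   moved; the strings themselves are not copied. */
void sort_tokens(char *tokens[], int n)
{
    int i, j;

    for (i = 1; i < n; i++) {
        char *key = tokens[i];

        /* Shift larger entries one slot to the right. */
        for (j = i - 1; j >= 0 && strcmp(tokens[j], key) > 0; j--)
            tokens[j + 1] = tokens[j];

        tokens[j + 1] = key;
    }
}
```

Note that strcmp compares character codes, so on an ASCII system digits sort before upper-case letters, which in turn sort before lower-case letters.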
Submit your program (.c file)
by email, as an attachment, to me <email@example.com> and to the
e-tutor, Andrew Montalenti <firstname.lastname@example.org>,
by one minute before midnight on Thursday,
November 10th. (Late assignments will be penalized 1/2 point for each
day late, out of a total of 4 points.) Label your email "CSO Asgn 5".