Programming Languages

G22.2110 Spring 1998

 

Class #3

Conventional Imperative Programming Languages

 

Homework #3: Sethi Exercises 3.5, 3.10, 3.13, 4.2, 4.5

Read Sethi’s Chapters 3, 4, 5, 15.1, 15.2, handouts and references

 

Contents

 

C3.0. Syntax and Grammars (continued)

C3.1. Statement-Level Control Structures

C3.2. Data Types

C3.3. Procedures and their implementation

C3.4. Comments on Sethi's Chapters 3, 4, 5, 15.1, 15.2

 

C3.0 Syntax and Grammars

 

Summary

 

Regular grammars:

productions limited to at most one LHS non terminal, and RHS consisting of non terminals

(finite automata, regular expressions)

 

Context-free grammars:

productions with LHS that are non terminals

(pushdown automata, BNF)

 

Context-sensitive grammars:

LHS longer than RHS

(linear-bounded automata)

e.g., Ada (context free productions + restrictions)

<block> ::= <block identifier> :

begin <statement> {<statement>}

end <block identifier>;

and restriction that the second block identifier must be equal to the first

in simple case, attribute grammars can be used to avoid resorting to context-sensitive grammars

 

Unrestricted grammars:

(Turing machines, Post systems)

 

Postsystems

(see C2.3.5)

 

Abstract syntax

(see C2.3.6)

 

C3.1. Statement-Level Control Structures

 

Introduction

 

Flow of control within expressions (operator associativity, precedence rules)

Flow of control among statements

Flow of control among units

 

Imperative programming language computations:

Expressions evaluation and assignment of resulting values to variables

Control Statements

 

Control Statements provide:

Some means of selecting among alternative control flow paths of statement execution

Some means of causing repeated execution of certain collections of statements (logically controlled iterations)

 

Too many control statements affects readability but enhances writability (e.g., for can be used to build loops that are naturally controlled by a counter instead of while).

 

Too few control statements requires the use of lower-level statements such as goto which affects readability.

 

How much should a language be expanded to increase its writability, at the expense of its simplicity and size.

 

Control Structure:

Control Statement + collection of statements whose execution it controls.

Control Structures should have single entries and single exits .

e.g., Multiple entries to iterative structures make program more difficult to read and understand.

 

Compound Statements

 

Auxiliary language feature which facilitates the design of control statements:

method of forming statement collections.

 

Feature was missing in early versions of FORTRAN.

 

ALGOL 60 introduced the first statement collection structure:

begin

statement_1;

…

statement_n;

end

 

Using compound statements, a collection of statements are abstracted to a single statement.

 

Data declarations can be added to the beginning of a compound statement in several languages to make it a "block".

 

Pascal does not allow blocks.

 

C uses braces to delimit both compound statements and blocks.

 

Several recently designed languages have eliminated the need for specially delimited compound statements by integrating compound statements into their control structures.

 

Control structures of selection and iteration control statements may or may not have multiple entries (the execution of code segments within a control structure may or may not always begin with the first statement in the segment)

 

Multiple entries add little flexibility and decrease readability.

 

Selection Statements

 

Means of choosing between two or more execution paths in a program.

 

Two-Way Selection Statements:

 

Degenerate form is "single-way" selectors.

 

Design issues:

What is the form and type of the expression that controls the selection ?

Can a single statement, a sequence of statements, or a compound statement be selected ?

(single statements lead to heavy dependence on gotos, statement sequences require a syntactic entity to terminate such sequences)

How should the meaning of selectors nested in then clauses of other selectors be specified ?

 

Examples:

All imperative languages (biut BASIC and FORTRAN) include a "single-way selector" as a subform of a two-way selector.

 

e.g., FORTRAN, logical IF:

 

IF (Boolean expression) statement

Selector control expression is boolean type

Only a single statement is selectable

Nesting of logical IF statements is not allowed

 

Single statement selection promotes the use of gotos

e.g, conditional execution of a group of statements by branching aroung the group

 

IF (.NOT. condition) GO TO 20

I=1

J=2

20 CONTINUE

 

Note that this structure can have multiple entries by labeling any of the segment statements.

 

e.g., ALGOL 60, compound statement

allows either a single statement to be selected or a compound one

 

if (Boolean expression) then

begin

statement_1;

…

statement_n;

end

 

Most languages that followed ALGOL 60 (FORTRAN 77, 90, etc.) provide single-way selectors that can select a compound statement or a sequence of statements.

 

e.g., ALGOL 60, first two-way selector

if (Boolean expression)

then statement

else statement

either or both selectable statements can be compound

under no circumstances are both clauses executed

 

The two-way selectors of ALGOL 60 and C can have more than one entry, but those of most other contemporary languages can have only one.

 

Nesting Selectors:

 

e.g., Pascal

if (sum = 0) then

if (count = 0)

then result := 0

else result := 1

 

can be interpreted different ways depending on whether the else clause is matched with the first then clause or the second (indentation is ignored by compilers !)

 

In Pascal, the static semantics of the language specify that the else clause is always paired with the most recent unpaired then clause.

Using a rule instead of a syntactic entity introduces clashes between intended and actual semantics.

To force the alternative semantics in Pascal, the inner if-then should be placed in a compound.

 

In ALGOL 60, syntax rather than a rule is used to connect else and then clauses. An if statement is not allowed to be nested directly in a then clause without being placed in a compound statement.

 

e.g., else paired with second then

if sum = 0 then

begin

if count = 0

then result := 0

else result := 1

end

e.g., else paired with first then

if sum = 0 then

begin

if count = 0 then result := 0

end

else result:= 1

 

C has the same problem as Pascal with selection statement nesting.

 

Special Words and Selection Closure:

 

Idea is to use a special word to mark the end of a whole selection construct.

This resolves the issue of semantics of nested operators and adds to the readability of the construct.

 

e.g., ALGOL 68, FORTRAN 77 and 90 (Modula-2, Ada) - Ada

if A > B then

SUM := SUM + A;

else

SUM := SUM + B;

end if;

 

or

 

if A > B then

SUM := SUM + A;

ACOUNT := ACOUNT + 1;

else

SUM := SUM + B;

BCOUNT := BCOUNT + 1;

end if;

 

The form is the same regardless of the number of statements in the then/else clauses.

Clauses consist of statement sequences rather than compound statements.

Constructs are more regular than that of Pascal and ALGOL 60 selection constructs.

 

e.g., Ada, ambiguity resolution

 

if SUM = 0 then

if COUNT = 0 then

RESULT := 0;

else

RESULT := 1;

end if;

end if;

 

v.s.,

 

if SUM = 0 then

if COUNT = 0 then

RESULT := 0;

end if;

else

RESULT := 1;

end if;

 

Modula-2 uses END only, and produces less readable programs (END only carries part of the information that END IF connotes).

 

Multiple Selection (n-way) Constructs

 

Allow the selection of one of any number of statements or statement groups.

 

Design Issues:

What is the form and type of the expression that controls the selection ?

May a single statement, a sequence of statements, or a compound statement be selected ?

Is the entire construct encapsulated in a syntactic structure ?

Is execution flow through the structure restricted to include just a single selectable segment ?

How should unrepresented selector expression values be handled at all ?

 

Early Multiple Selectors:

 

e.g., FORTRAN's three-way selector, the arithmetic IF, is a primitive degenerate multiple selector.

It is not, per say, a multiple selection statement, as only three or fewer statement collections can be selected.

 

IF (arithmetic expression) N1, N2, N3

 

N1 …

GO TO N4 …

 

No syntactic encapsulation of the GOTO and its selectable sequences.

Overall lack of encapsulation and possibly multiple entries.

Control flow not restricted to a single selectable segment.

 

e.g., FORTRAN's computed GO TO

 

GO TO (label1, label2, …, labeln), expression

 

e.g., FORTRAN's assigned GO TO

 

Modern Multiple Selectors:

 

Better form suggested by C.A.R. Hoare, and included in ALGOL-W.

Structure is encapsulated and has a single entry.

Control flow is restricted through the structure to a single selectable segment (via implicit branches to a single point at the end of the whole construct provided for each selectable statement or compound statement).

 

e.g., ALGOL-W

case integer_expression of

begin

statement_1;

…

statement_n

end;

 

e.g., Pascal

case expression of

constant_list_1: statement_1;

…

constant_list_n: statement_n;

end

Expression is an ordinal type (integer, boolean, character, or enumeration type)

Not all values in the range of the expression type need be present in the lists.

Constant_lists are not the legal targets of branch statements

Unrepresented selector expression values in Pascal caused undefined results ! (later to be detected and reported during execution by the code generated by Pascal compilers). Now handled via optional clause

 

e.g.,

case index of

1, 3: begin

odd := odd + 1;

sumodd := sumodd + index

end;

2, 4: begin

even := even + 1;

sumeven := sumeven + index

end;

else writeln ('Error in case, index =', index)

end

 

operational semantics description is

if index = 1 goto one_three

if index = 3 goto one_three

if index = 2 goto two_four

if index = 4 goto two_four

writeln('Error in case, index=',index)

goto out

one_three:

odd := odd + 1;

sumodd := sumodd + index

goto out

two_four:

even := even + 1;

sumeven := sumeven + index

out: …

 

e.g., C multiple selector (derived from ALGOL 68)

switch (expression) {

case constant_expression_1: statement_1;

…

case constant_expression_n: statement_n;

[default: statement_n+1]

}

 

control expression and constant expressions are integer type.

selectable statements can be statement sequences, compound statements, or blocks.

switch encapsulates the selectable code segments as in the Pascal case, but does not disallow multiple entries, and does not provide implicit branches at the end of those code segments.

 

e.g.,

switch (index) {

case 1:

case 3: odd += 1;

sumodd += index;

(break)

case 2:

case 4: even += 1;

sumeven += index;

(break);

default: printf("Error in switch, index = %d\n", index);

}

 

break acts as a restricted goto.

forgetting "break" …: decrease in reliability v.s. increase in flexibility.

 

e.g., Ada case allows subrabges such as 10 .. 15, and OR operators specified by | in the constant lists. An "other" clause is available for unrepresented values. Ada adds the estriction that constant lists should be exhaustive. Only integer and enumerated types are allowed.

 

e.g., FORTRAN 90 case is similar to Ada.

 

If selections must be made on the basis of a boolean expression, rather than some ordinal type, nested two-way selectors can be used. To alleviate the poor readability of deeply nested two-way selectors, some languages have been extended for that purpose (FORTRAN 90, Ada). Extension allows some of the special words to be left out (else-if sequences replaced with a single special word), and the closing special word on the nested if is dropped.

 

e.g., Ada

if COUNT < 10 then BAG1 := TRUE;

elseif COUNT < 100 then BAG2 := TRUE;

elseif COUNT < 1000 then BAG3 := TRUE;

end if;

 

v.s.,

if COUNT < 10

then BAG1 := TRUE;

else if COUNT < 100

then BAG2 := TRUE;

else if COUNT < 1000

then BAG3 := TRUE;

end if;

end if;

end if;

 

Note that else-if constructs are not as restrictive as multiple selection constructs, as they allow selector comparisons with more than one expression and some other values.

 

Iterative Statements

 

Causes a statement or collection of statements to be executed zero, one, or more times.

Alternative is recursion.

 

Design issues:

How is the iteration controlled ? (logical, counting, combination ?)

Where should the control mechanism appear in the loop ? (top/pretest, bottom/posttest, …)

 

Iteration statement + associated loop body form an iteration construct.

 

Counter-controlled loops:

Loop variable keeps track of the count value

Some means to specify the initial and terminal values of the loop variable and the stepsize (loop parameters)

 

Design issues for counter-controlled loops:

What is the type and scope of the loop variable ?

What value does the loop variable have at loop termination ?

Should it be legal for the loop variable or loop parameters to be changed in the loop, and if so, does the change affect the loop control ?

Should the test for completion be at the top or the bottom of the loop ?

Should the loop parameters be evaluated only once, or once for every iteration ?

 

e.g., FORTRAN IV DO

DO label variable = initial, terminal [, stepsize]

label is that of the last statement in the loop body.

stepsize defaults to 1 when absent.

initial, terminal, and stepsize parameters are restricted to unsigned integer constants or simple integer variables with positive values (no descending loop variables).

loop variable type is restricted to integer, scope is that of program unit in which it appears

value of loop variable undefined upon normal loop termination, and set to the most recently assigned value in case of abnormal termination.

loop variable and parameters cannot be changed in the loop body.

FORTRAN IV DO is a posttest loop construct

A DO construct can be exited and reentered from points outside the loop (only when GO TO is used to jump to an extended loop body).

 

e.g., FORTRAN 77 & 90 DO Statement

loop variable can be integer, real, or double-precision type.

loop parameters can be expressions, and can have positive or negative values.

loop parameters are evaluated at the beginning of the execution or the DO statement, and the value is used to compute an iteration count.

the loop is controlled by the iteration count, not the loop parameters.

iteration count is an internal variable inaccessible to the user code.

DO constructs can only be entered through the DO statement (single-entry structure)

When DO terminates, the loop variable has its most recently assigned value.

 

FORTRAN 90 adds a new form

[name:] DO variable=initial, terminal [,stepsize]

…

END DO [name]

 

loop variable in that case can only be integer.

 

e.g., ALGOL 60 for statement

<for_stmt> -> for var:=<list_element>{,<list_element>} do <statement>

<list_element> -> <expression>

| <expression> step <expression> until <expression>

| <expression> while <Boolean_expr>

 

can combine a counter and a boolean expression for loop control

 

for count := 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 do

list[count] := 0

 

for count := 1 step 1 until 10 do

list[count] := 0

 

for count := 1, count + 1 while (count <= 10) do

list[count] := 0

 

for index := 1, 4, 13, 41

step 2 until 47,

3 * index while index < 1000,

34, 2, -24 do

sum := sum + index

 

all the expressions in the for lists are evaluated for every iteration, or execution of the loop statements.

 

I : =1;

for count := 1 step count until 3 * I do

I := I + 1

 

count is increasing faster than the until clause expression, therefore the loop is not infinite, but it is not obvious …

 

loop variables can be integer or real type, and it is declared like any other variable (scope is that of declaration).

loop variable has its most recently assigned value after loop termination.

Loop parameters can be changed in the loop body (not loop variable).

Illegal to branch into the loop body.

Pretest loop.

Loop parameters are evaluated for every iteration.

 

e.g., Pascal for statement

 

for variable := initial_value (to | downto) final_value do

statement

 

loop variable grows or shrinks in steps of 1.

loop variable must be an ordinal type, and it has the scope of its declaration.

loop variable undefined at normal loop termination (last value if loop terminates prematurely).

loop variable may not be changed in the loop body.

initial and final values can be changed in the loop (evaluated once, so cannot affect loop control).

Pretest loop.

 

e.g., Ada for statement

for variable in [reverse] discrete_range loop

…

end loop

 

Similar to Pascal's

Pretest loop

Discrete range is a subrange of an integer or enumeration type (e.g., 1..10)

The scope of the loop variable is that of the range of the loop:

 

COUNT : FLOAT := 1.35;

for COUNT in 1..10 loop

SUM := SUM + COUNT;

end loop

 

loop variable cannot be assigned a value in the loop body

variables used to specify the discrete range can be changed in the loop (evaluated only once, so does not affect loop control).

Illegal to branch into the Ada for loop body.

 

e.g., C/C++ for statement

for (expression_1; expression_2; expression_3)

statement

 

controlled statement can be single, compound, or null statement.

first expression if for initialization (evaluated only once)

second expression is the loop control (evaluated before each execution of loop body)

last expression executed after each execution of the loop body (often used to increment the loop counter)

all of the expressions of C's are optional (absent second expression is considered true)

no explicit loop variable or loop parameters

all involved variables can be changed in the loop body

legal to branch into a C for loop body

 

for (index = 0; index <= 10; index++)

sum = sum + list[index];

 

for (count1 = 0, count2 = 0.0;

count1 <= 10 && count2 <= 100.0;

sum = ++count1 + count2, count2 *= 2.5);

 

Note above that all desired actions are part of the for statement (no loop body).

Can use first expression to define the loop counter variable

 

for (int count = 0; count < len; count++) {…}

 

for expressions are not part of the compound statement body of the for.

variable definitions in C functions must appear before any executable statements (i.e., above is illegal in C)

Note that in C++, the scope of a variable defined in the for statement is from its definition to the end of the function in which it is defined.

C's for need not count (can easily model counting and logical loop structures).

 

Logically Controlled Loops:

 

Repetition control based on a Boolean expression rather than a counter.

Every counting loop can be built with a logical loop, but the reverse is not true.

 

Design issues:

Should the control be pretest or posttest ?

Should the logically controlled loop be a special form of a counting loop or a separate statement ?

 

Examples:

 

e.g., Pascal, Modula-2, C

include both pretest and posttest logically controlled loops, which are not special forms of their counter-controlled iterative statements.

 

e.g., C

while (expression)

statement

 

do

statement

while (expression)

 

legal in C to branch into both a while and do loop bodies.

FORTRAN 77 has neither a pretest or posttest logical loop (as for FORTRAN 90)

Ada has a pretest logical loop but no posttest.

Pascal has posttest logical loop "repeat-until", but its body can be either a compound or a statement sequence (only structure in Pascal with this flexibility).

 

User-Located Loop Control Mechanisms:

 

Let the user choose a location for loop control other than the top or bottom of the loop.

 

Design issues:

Should the conditional mechanism be an integral part of the exit ?

Should the mechanism be allowed to appear in a controlled loop or only in one without any other control ?

Should only one loop body be exited, or can enclosing loops also be exited ?

 

e.g., Ada infinite loop

 

loop

…

end loop

 

exit [loop_label] [when condition]

 

e.g.,

loop

…

if SUM >= 10000 then exit;

…

end loop;

 

loop

…

exit when SUM >= 10000;

…

end loop;

 

OUTER_LOOP:

for ROW in 1..MAX_ROWS loop

INNER_LOOP:

for COL in 1..MAX_COLS loop

SUM := SUM + MAT(ROW, COL);

exit OUTER_LOOP when SUM > 1000.0;

end loop INNER_LOOP;

end loop OUTER_LOOP;

 

above exits to first statement after the outer loop

 

FORTRAN 90 has Ada like exit statement (unconditional however)

Both C and Modula-2 have unconditional loop exit statements (C's break, Modula-2's EXIT)

 

C and FORTRAN 90 include a control mechanism that transfers control to the control mechanism of the smallest enclosing loop (C's continue, FORTRAN 90's CYCLE which supports named loops).

 

e.g., C

while (sum < 1000) {

getnext (value);

if (value < 0) continue;

sum +=value;

}

 

User-Defined Iteration Control:

 

Rather than have a counter or boolean expression control the iterations, use the number of elements in a user-defined data structure.

 

e.g., assume the nodes of a binary tree are to be processed

assume tree root pointed to by a variable named root

assume traverse is a function that sets its parameter to point to the next element of a tree in the desired order

 

for (ptr = root; ptr; traverse(ptr)) {

…

}

 

traverse is the iterator

the second expression in the for is a boolean expression that compares ptr with zero (true if ptr is not zero)

most useful in OO programming (interators for user-defined types or classes, etc.)

 

Unconditional Branching

 

Transfers execution control to a specified place in the program

 

Problems:

Affects readability and makes programs unreliable and difficult to maintain.

Could restrict gotos so they can transfer control only downward in a program.

A few languages have been designed without a goto (e.g., Modula-2, Bliss, CLU, Euclid, Gypsy)

but most of these provide additional control statements (loop and subprogram exits) to replace typical applications of the goto.

Most currently popular languages include a goto statement.

 

Label Forms:

 

ALGOL 60 and C: use identifier forms for labels.

FORTRAN and Pascal: use unsigned integer constants for lables.

Ada: uses identifier form as the target part of its goto statement, but as the label appears on a statement it must be delimited by the symbols << and >>.

 

e.g.,

goto FINISHED;

…

<<FINISHED>> SUM := SUM + NEXT;

 

e.g.,

PL/I allows labels to be variables and be assigned values and used as subprogram parameters.

Branches have targets depending on values assigned at runtime !

 

Restrictions on Branches:

 

Pascal labels must be declared as if they were variables, but they cannot be passed as parameters, stored, or modified.

 

Assuming that a statement group be either a compound statement or the collection of statements in a repeat loop, the target of a Pascal goto must be one of the following:

The control statement that includes the goto or another statement in the statement group of which the goto is a part.

A statement in some statement group that contains the statement sequence that contains the goto.

Another statement in any enclosing subprogram scope, as long as the statement is not in a statement group.

 

A goto can never have as its target a statement in a compound statement of a control structure, unless execution of that compound statement has already begun and has not yet terminated.

Therefore the target can never be in a compound statement or subprogram scope that is nested more deeply than that of the one of the goto (prevents control structures from having more than one entry point).

 

e.g.,

while … do

begin

…

while … do

begin

…

100: …

…

end;

while … do

begin

…

goto 100;

…

goto 200

while … do

begin

…

200: …

…

end;

…

end;

…

end

 

both label/goto pairs are illegal (compound statement that contains 100 completes execution before the goto that references it can be executed, the goto that references lable 200 is executed before the compound statement that contains the label has begun execution).

 

e.g., legal goto

while … do

begin

…

100: …

…

while … do

begin

…

goto 100;

…

end;

end;

 

Problem with Pascal is that it is still legal to branch into a different procedure.

 

e.g.,

procedure sub1;

label 100;

…

procedure sub2;

…

goto 100;

…

end; { of sub2 }

…

100: …

…

end; { of sub1 }

 

goto above transfers control to the parent subprogram sub1, terminating sub2.

Therefore, a goto can terminate one or more procedure activations, or executions, but cannot start one by branching into it.

 

Guarded Commands

 

Used for concurrent programming in CSP and Ada.

 

Motivation is to provide control statements that support a program design methodology that ensures correctness during development, rather than relying on verification or testing of completed programs (Dijkstra 1976)

 

e.g., Dijstra's selection construct

if <Boolean expression> -> statement

[] <Boolean expression> -> statement

[]…

[] <Boolean expression> -> statement

fi

 

Small blocks are called fatbars, and are used to separate the guarded clauses and allow the clauses to be statement sequences.

All boolean expressions are evaluated each time the construct is reached during execution. If more than one expression is true, one of the corresponding statements is nondeterministically chosen for execution. If none is true, a run-time error occurs that causes program termination.

Programmer is forced to consider all possibilities.

 

e.g., Ada

if I = 0 -> sum := sum + I

[] I > j -> sum := sum + j

[] J > I ->sum := sum +I

 

 

if x >= y -> max := x

[] y >= x -> max := y

fi

 

if x and y are equal, it does not matter which we assign to max (form of abstraction provided by the nondeterministic semantics of the statement).

 

Selection construct is valuable to choose among program interrupts in some random way.

 

Program verification is virtually impossible when goto statements are used, but it is greatly simplified if either only logical loops and selections (Pascal-like) are used, or only guarded commands are used.

 

Dijkstra's loop structure:

 

do <Boolean expression> -> <statement>

[] <Boolean expression> -> <statement>

[] …

[] <Boolean expression> -> <statement>

od

 

All boolean expressions are evaluated on each iteration. If more than one is true, one of the associated statements is nondeterministically chosen for execution, after which the expressions are again evaluated. When all expressions are simultaneously false, the loop terminates. Programmer is forces to consider all possibilities.

 

Conclusions

 

Only sequence, selection, and pretest logical loops are absolutely required to express computations.

No single set of contril structures has been widely accepted. There is an agreement that the set should be rather small, and that there must be sufficient variety that the goto should rarely be necessary.

The control structures of functional and logic programming languages and of Smalltalk are all quite different from those discussed above.

 

C3.2. Data Types

 

(Complete During Next Class - February 23)

 

Introduction

 

Primitive Data Types

 

Numeric Types

Integer

Floating-Point

Decimal

Boolean Types

Character Types

 

Character String Types

 

User-Defined Ordinal Types

 

Array Types

 

Record Types

 

Union Types

 

Set Types

 

Pointer Types

 

Conclusions

 

C3.3. Procedures and their implementation

 

(Next Class - February 23)

 

C3.4. Comments on Sethi's Chapters 3, 4, 5, 15.1, 15.2