Programming Languages

Start Lecture #4

Chapter 7: Data Types

Will be done later.

Chapter 8: Subroutines and Control Abstraction

Subroutines and functions are perhaps the most basic and important abstraction mechanism. They are found in all general purpose languages and many special purpose ones as well.

Procedures vs. Functions: Applicative vs. Functional

Procedures are always applicative; they must be called for their side effects, since they do not return a value.

In a purely functional model (e.g., Haskell) functions are called solely for their return value; they have no side effects. These functions are like their mathematical namesakes: they are simply a mapping from inputs to outputs.

Much more common is a hybrid model in which functions can have side effects.

8.1: Review of Stack Layout

I do not cover this in detail here, since we cover it in some depth in the compilers class. I just make a few points.

8.2: Calling Sequence

The caller and the callee cooperate to create a new AR when a subroutine is called. The exact protocol used is system dependent.

There are two parts to the calling sequence: the prologue is executed at the time of the call, and the epilogue is executed at the time of the return.

Some authors use calling sequence just for the part done by the caller and use prologue and epilogue just for the part done by the callee. I don't do this.

Saving and Restoring Registers

Some systems use caller-save; others use callee-save; still others have some registers saved by the caller, other registers saved by the callee.

Maintaining the Static Chain

In addition to the dynamic link, another inter-AR pointer is kept: the static link (called the access link in the Dragon book). The static link points to the AR of the most recent activation of the lexically enclosing block.

The static link is needed so that non-local objects can be referenced (local objects are in the current AR).

It is not hard to calculate the static link during (the prologue of) the calling sequence unless the language supports passing subroutines as arguments. We will not cover it; the simple case is again in the compiler notes.
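To make the use of the static link concrete, here is a minimal sketch of a non-local reference. The Frame struct and the function load_nonlocal are hypothetical names invented for this illustration; a real AR contains much more, and a compiler would emit the pointer chasing inline rather than call a helper.

```cpp
#include <cstddef>

// Hypothetical activation record: only the fields needed for this sketch.
struct Frame {
    Frame* static_link;  // AR of the most recent activation of the lexically enclosing block
    int    local;        // stand-in for this frame's local variables
};

// Follow `hops` static links, where `hops` is the difference in nesting depth
// between the referencing code and the declaration of the variable.
// hops == 0 means the object is local (it is in the current AR).
int load_nonlocal(const Frame* current, std::size_t hops) {
    const Frame* f = current;
    for (std::size_t k = 0; k < hops; ++k) {
        f = f->static_link;
    }
    return f->local;
}
```

Note that the number of hops is known at compile time, since it depends only on the lexical nesting of the program text.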

A Typical Calling Sequence

As mentioned previously, the exact details are system dependent; what follows is a reasonable approximation.

The calling sequence begins with the caller:

  1. Pushes some registers on to the stack (modifying sp).
  2. Pushes arguments.
  3. Computes and pushes the static link.
  4. Executes a call machine instruction and pushes the return address.

Next the callee:

  1. Pushes the old frame ptr and calculates the new one.
  2. Pushes some registers.
  3. Allocates space for temporaries and local variables.
  4. Begins execution of the subroutine.

When the subroutine completes, the callee:

  1. Stores the return value (perhaps on the stack).
  2. Restores some registers.
  3. Restores the sp.
  4. Restores the fp.
  5. Jumps to the return address, resuming the caller.

Finally, the caller:

  1. Restores some registers.
  2. Moves the return value to its destination.
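The steps above can be imitated on a toy machine. The following is only a sketch under several assumptions: the stack is a vector that grows upward, sp is implicitly the size of the vector, register saving and the static link are omitted, and the names Machine, callee_add, and call_add are made up for the illustration.

```cpp
#include <vector>

// Toy machine: the stack is a vector and sp is implicitly stack.size().
struct Machine {
    std::vector<int> stack;
    int fp = 0;  // index of the saved old fp in the current frame
};

// Callee: adds its two arguments, using one local temporary.
// Frame layout relative to fp: fp-3 = arg a, fp-2 = arg b,
// fp-1 = return address, fp = saved old fp, fp+1 = local.
int callee_add(Machine& m) {
    // Prologue: push the old fp, calculate the new one, allocate a local.
    m.stack.push_back(m.fp);
    m.fp = static_cast<int>(m.stack.size()) - 1;
    m.stack.push_back(0);

    // Body of the subroutine.
    m.stack[m.fp + 1] = m.stack[m.fp - 3] + m.stack[m.fp - 2];
    int result = m.stack[m.fp + 1];  // plays the role of a return-value register

    // Epilogue: restore the sp (dropping locals), restore the fp,
    // and pop the return address (a real machine would jump to it).
    m.stack.resize(m.fp + 1);
    m.fp = m.stack.back();
    m.stack.pop_back();
    m.stack.pop_back();
    return result;
}

// Caller: push the arguments and a fake return address, call, then clean up.
int call_add(Machine& m, int a, int b) {
    m.stack.push_back(a);
    m.stack.push_back(b);
    m.stack.push_back(1234);  // stand-in for the return address
    int result = callee_add(m);
    m.stack.pop_back();  // pop b
    m.stack.pop_back();  // pop a
    return result;       // "moves the return value to its destination"
}
```

After call_add returns, the stack and fp are exactly as they were before the call, which is the invariant the calling sequence exists to maintain.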

Homework: CYU 5

Special Case Optimizations

8.2.1: Displays

8.2.2: Case Studies: C on the MIPS; Pascal on the x86

8.2.3: Register Windows

8.2.4: In-Line Expansion

8.3: Parameter Passing

Much of the preceding part of this chapter concerned compilers and the implementation of programming languages. When we study parameters, in particular parameter modes, we are discussing the actual semantics of programs, i.e., what answers are produced rather than how it is implemented. However, many of the semantic decisions are heavily influenced by implementation issues.

Most languages use a prefix notation for subroutine invocation. That is, the subroutine name comes first, followed by the arguments. For applicative languages the arguments are normally parenthesized. As we have seen, in Lisp (e.g., Scheme) the function name is simply the first member of a list whose remaining elements are the arguments.

Another syntactic difference between most applicative languages and Lisp is that, in the former, built-in language constructs look quite different from function application, whereas in Lisp they look quite similar. Compare the following C and Scheme constructs for setting z to the min of x and y (one C statement followed by two Scheme alternatives).

    if (x < y) z = x; else z = y;
    (if (< x y) (set! z x) (set! z y))
    (set! z (if (< x y) x y))
  

Definitions:
Formal parameters (often called simply parameters) are the names that appear in the declaration of the subroutine.
Actual parameters (often called simply arguments) refer to the expressions passed to a subroutine at a particular call site.

8.3.1: Parameter Modes

The mode of a parameter determines the relation between the actual and corresponding formal parameter. For example, do changes to the latter affect the former?

There are a number of modes, including those described below. The following C fragment serves as a running example.

  int c = 1;
  f(c);
  printf("%d\n", c);
  ...
  void f(int x) { x = 5; }

Call-by-Value

When using call-by-value semantics, the argument is evaluated (it might be an expression) and the value is assigned to the parameter. Changes to the parameter do not affect the argument, even when the argument is a variable. This is the mode used by C. Thus the C program above prints 1.

As most C programmers know, g(&x); can change the value of x in the caller. This does not contradict the above. The value of &x cannot be changed by g(), but it can be used by g(). Since that value points to x, g() can use it to change x.

As most C programmers know, if A is an array, h(A) can change the value of elements of A in the caller. This does not contradict the above. The extra knowledge needed is that in C writing an array without subscripts (as in h(A) above) is defined to mean the address of the first element. That is, h(A) is simply shorthand for h(&A[0]). Thus, the h(A) example is essentially the same as the g(&x) example preceding it.
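The three cases just discussed can be checked with a small sketch. The bodies of g and h and the helper value_demo are assumptions made for the illustration; only f comes from the running example.

```cpp
// Call-by-value: x is a copy, so the caller's variable is untouched.
void f(int x) { x = 5; }

// The pointer is passed by value, but the callee can dereference it
// to reach the caller's int.
void g(int* p) { *p = 5; }

// An array argument is really a pointer to the first element.
void h(int a[]) { a[0] = 5; }

// Returns true if the three calls behave as the text describes.
bool value_demo() {
    int c = 1;
    f(c);                  // c is still 1
    int x = 1;
    g(&x);                 // x is now 5
    int A[3] = {1, 1, 1};
    h(A);                  // same as h(&A[0]); A[0] is now 5
    return c == 1 && x == 5 && A[0] == 5 && A[1] == 1;
}
```

In every case the callee receives a value; it is only because some of those values are addresses that the caller's data can change.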

In Java, primitive types (e.g., int) are passed by value. Although the parameter can be assigned to, the argument is not affected.

Call-by-Reference

The location of the argument (the argument must be an l-value, but see the Fortran exception below) is bound to the parameter. Thus changes to the parameter are reflected in the argument. Were C a call-by-reference language (it is not), the C example above would print 5.

By default Pascal uses call-by-value, but if a parameter is declared var, then call-by-reference is used.

Fortran is a little weird in this regard. Parameters are normally call-by-reference. However, if the argument is not an l-value, a new variable is created, assigned the value of the argument, and is itself passed by reference. Since this new variable is not visible to the programmer, changes by the callee to the corresponding parameter are not visible to the program.
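The Fortran rule can be imitated in C++ by writing out the hidden temporary by hand. This is a sketch, not what any particular Fortran compiler emits; the names set_to_five, hidden_temp, and fortran_style_demo are invented for the illustration.

```cpp
// A reference parameter: assignments to p are assignments to the argument.
void set_to_five(int& p) { p = 5; }

int fortran_style_demo() {
    int c = 1;
    set_to_five(c);            // l-value argument: c becomes 5

    // set_to_five(c + 1) would not compile in C++, since c + 1 is not an l-value.
    // A Fortran compiler instead does the equivalent of the next two lines.
    int hidden_temp = c + 1;   // compiler-generated variable holding 6
    set_to_five(hidden_temp);  // the callee changes only the temporary

    return c;  // still 5: the change to hidden_temp is invisible to the program
}
```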

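In Java, objects are manipulated through references, so a callee can change the state of an object passed to it. Strictly speaking, the reference itself is passed by value (this is the call-by-sharing mentioned below): assigning a new object to the parameter does not affect the argument. If the parameter is declared final, it cannot be reassigned in the callee, although the object it refers to can still be modified.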

  with Ada.Integer_Text_IO; use Ada.Integer_Text_IO;
  procedure ScalarParam is
     A : Integer := 10;
     B : Integer;
     Procedure F (X : in out Integer; Ans : out Integer) is
     begin
        X := X + A;
        Ans := X * A;
     end F;
  begin
     F(A,B);
     Put (B);
  end ScalarParam;

  with Ada.Integer_Text_IO; use Ada.Integer_Text_IO;
  procedure Ttada is
     type IntArray is array (0..2) of Integer;
     A : IntArray := (10, 10, 10);
     B : Integer;
     Procedure F (X : in out IntArray; Ans : out Integer) is
     begin
        X(0) := X(0) + A(0);
        Ans  := X(0) * A(0);
     end F;
  begin
     F(A,B);
     Put (B);
  end Ttada;

Call-by-Value/Result

With call-by-value/result the value in the argument is copied into the parameter during the call, and, at the return, the value in the parameter (the result) is copied back into the argument.

Certainly this copying is different from call-by-reference, but the effect seems to be the same: changes to the parameter during the subroutine's execution are reflected in the argument after the return.

However, in the face of aliasing, call-by-value/result can differ from call-by-reference, as illustrated in the first Ada program above (ScalarParam). Clearly the first assignment statement sets X to 20, which will eventually be transmitted back to A. The question is what value of A is used in the next assignment statement: call-by-reference says the value is 20 (so 400 is printed), whereas call-by-value/result says the value is 10 (so 200 is printed).

In Ada, scalar arguments and parameters are call-by-value/result, so the answer printed is 200. However, the second program above (Ttada) is basically the same, but uses arrays. In such cases the language permits either call-by-reference or call-by-value/result. The language reference manual warns users that programs like this one, whose result depends on the choice between call-by-reference and call-by-value/result, have undefined semantics. The Ada implementation on my laptop (GNAT) uses call-by-reference and therefore prints 400.
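The difference between the two modes in the presence of aliasing can be reproduced in C++ by simulating value/result with an explicit copy in and copy out. This is a sketch of the scalar Ada program, not a translation a compiler would produce; the function names are invented, and the global A plays the role of the Ada variable A.

```cpp
int A = 10;  // plays the role of the variable A in the Ada program

// Call-by-reference: X is an alias for the argument (here, A itself).
void f_reference(int& X, int& Ans) {
    X = X + A;     // A changes too, since X and A are the same object
    Ans = X * A;
}

// Call-by-value/result, simulated by hand.
void f_value_result(int& arg, int& Ans) {
    int X = arg;   // copy in
    X = X + A;
    Ans = X * A;   // A has not changed yet
    arg = X;       // copy out at the return
}

int run_reference() {
    A = 10;
    int B = 0;
    f_reference(A, B);
    return B;      // 20 * 20
}

int run_value_result() {
    A = 10;
    int B = 0;
    f_value_result(A, B);
    return B;      // 20 * 10
}
```

In both versions A ends up as 20; only the value seen for A in the middle of F differs.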

The Euclid language outlaws the creation of such aliases.

Call-by-Name

Call-by-name, made famous 40 years ago in Algol 60, will perhaps seem a little weird when compared to the currently more common modes above. In fact it is quite normal today, just not for subroutine invocation. One should compare it to macro expansion, for example #define in the C programming language. One should remember that in 1960, the most widely used programming languages were assembly languages and macro expansion was very common.

If the parameter is not encountered while executing the subroutine, the argument is not evaluated.

As in macro expansion, the argument is re-evaluated every time it is used. More significantly, it is evaluated in the context of the caller not the callee.

This last point causes significant difficulty for the implementation. Remember that the caller and callee are compiled separately. Thus the mechanism used is that the caller passes to the callee not only the argument, which can be an expression, but also the referencing environment of the caller, so that the expression can be evaluated correctly every time it is used. Traditionally, this package (code to evaluate the argument together with its referencing environment) is called a thunk.

As C programmers know, it is wise to put extra parentheses around the uses of macro parameters; with call-by-name these were automatically supplied. Also the names of local variables in the callee were automatically made not to clash with the names of the parameters.
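C++ has no call-by-name, but a thunk can be imitated with a lambda that captures the caller's variables by reference. The sketch below uses the classic idiom known as Jensen's device; the function names are invented for the illustration, and std::function is merely one convenient way to pass the thunk.

```cpp
#include <functional>

// The "by name" parameter term is a thunk: a closure that re-evaluates the
// argument expression in the caller's environment each time it is used.
int sum_by_name(int& i, int lo, int hi, const std::function<int()>& term) {
    int total = 0;
    for (i = lo; i <= hi; ++i) {
        total += term();  // re-evaluated on every use, so it sees the current i
    }
    return total;
}

// If the parameter is never used, the argument is never evaluated.
int ignore_by_name(const std::function<int()>& /*unused*/) { return 0; }

int jensen_demo() {
    int i = 0;
    // The argument expression is i * i, evaluated in the caller's context.
    return sum_by_name(i, 1, 3, [&i] { return i * i; });  // 1 + 4 + 9
}

int evaluation_count_demo() {
    int evaluations = 0;
    ignore_by_name([&evaluations] { ++evaluations; return 42; });
    return evaluations;  // the thunk was never run
}
```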

Call-by-Need

This can be viewed as a lazy approximation of call-by-name. The first time the parameter is encountered in the subroutine execution, the value is calculated and saved. This value is then reused if the parameter is encountered again.
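Call-by-need can be sketched as a thunk that remembers its result. The class Lazy and the function need_demo are hypothetical names for this illustration; languages with call-by-need (e.g., Haskell) build this into the implementation rather than exposing such a class.

```cpp
#include <functional>
#include <utility>

// A memoizing thunk: the argument expression is evaluated at most once.
class Lazy {
public:
    explicit Lazy(std::function<int()> compute) : compute_(std::move(compute)) {}

    int value() {
        if (!evaluated_) {
            cached_ = compute_();  // first use: calculate and save
            evaluated_ = true;
        }
        return cached_;            // later uses: reuse the saved value
    }

private:
    std::function<int()> compute_;
    bool evaluated_ = false;
    int cached_ = 0;
};

// Uses the parameter twice; returns how many times the argument was evaluated.
int need_demo() {
    int evaluations = 0;
    Lazy arg([&evaluations] { ++evaluations; return 21; });
    int twice = arg.value() + arg.value();
    return (twice == 42) ? evaluations : -1;
}
```

With call-by-name the counter would reach 2; here it stops at 1. The two modes give different answers only if evaluating the argument has side effects or depends on state that changes between uses.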

Variations on Value and Reference Parameters

We discussed this above (call-by-value/result).

Call-by-Sharing

We do not discuss the subtle difference between call-by-sharing and call-by-reference.

The Purpose of Call-by-Reference

There are two reasons to use call-by-reference instead of call-by-value (assuming both are available).

  1. To enable the called routine to change the value of the argument.
  2. To save copying a large argument to the corresponding parameter. Be careful: it is often unwise to make semantic choices for performance reasons.

Read-Only Parameters

Some languages permit parameters to be declared readonly. This means that the value of the parameter cannot be altered in the callee, which naturally implies that the corresponding argument will not have its value changed.

Modula-3 actually uses the word READONLY; C and C++ use const.
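A common C++ idiom combines the last two points: pass by reference to avoid copying a large argument, and add const to promise that the argument will not be changed. The function total is a made-up example.

```cpp
#include <vector>

// v is passed by reference to avoid copying, but const makes it read-only:
// a statement such as v[0] = 0; inside this function would not compile.
int total(const std::vector<int>& v) {
    int sum = 0;
    for (int x : v) {
        sum += x;
    }
    return sum;
}
```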

Parameter Modes in Ada

An Ada programmer may declare each procedure parameter to have mode in, out, or in out. The default is in and all function parameters are required to have mode in. These modes do not describe the implementation (e.g., value vs. reference), which we have discussed above. Instead they describe the flow of information between caller and callee. That is, they concern semantic intent, not implementation.

As the names suggest, an in parameter can be read by the callee, but not written, whereas an in out parameter can be both read and written. Both in and in out parameters initially have the value of the corresponding argument. In addition, the final value of an in out parameter becomes the value of the argument when the procedure returns. Hence this argument must be an l-value.

An out parameter is similar to an in out parameter. The difference is that the initial value in the callee is undefined.

  int f (int x);
  int g (int *x);
  int h (int &x);

References in C++

Like C, C++ is a call-by-value language, but with a twist. Specific parameters can be declared to be call-by-reference simply by preceding their names with an &. Compare the three C++ function declarations above.

We have already seen examples like the first two before; they are both legal in C. The first one includes a standard call-by-value integer parameter x. Changes to x in f are not visible to the caller.

The function g contains a call-by-value pointer parameter x (recall that int *x means that *x is an integer, so x is a pointer to an integer). The caller issues a statement such as g(&a), with a declared as an integer, thereby passing by value the pointer &a. Function g cannot change &a, but can use it to modify a. Again, this is completely standard call-by-value semantics.

The declaration of h is a C++ extension to C. Unlike int *x (and many other C declarations), int &x does not mean that &x is an integer (that wouldn't make sense; what would x be?). Instead, it means that x is an integer that has been passed by reference. Here the caller issues a statement such as h(a), with a declared as an integer.

To a beginner, it is, at the least, surprising that & and *, which in some ways have opposite semantics, are both used here to indicate that a can be changed by the callee.
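For comparison, here are possible definitions of the three functions together with their call sites. The bodies (each adds 1 to its parameter) and the helper reference_demo are assumptions made for the illustration; the declarations in the notes do not specify what the functions do.

```cpp
int f(int x)  { x = x + 1; return x; }     // changes only the local copy
int g(int* x) { *x = *x + 1; return *x; }  // changes the int that x points to
int h(int& x) { x = x + 1; return x; }     // changes the caller's variable directly

bool reference_demo() {
    int a = 0;
    f(a);    // a is still 0
    bool after_f = (a == 0);
    g(&a);   // the caller writes &a; a becomes 1
    bool after_g = (a == 1);
    h(a);    // the caller writes plain a; a becomes 2
    bool after_h = (a == 2);
    return after_f && after_g && after_h;
}
```

Note that the calls f(a) and h(a) look identical; one must consult the declaration to learn whether a can be changed.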

  procedure main()
    procedure outer(i:int; procedure P)
      procedure inner
      begin
        print i
      end inner
    begin
      if i=1 then outer(2, inner)
      else P
    end outer
  begin main
    outer(1,main)
  end main

Closures as Parameters (Passing Nested Subroutines)

Consider the code above, using lexical (static) scoping. Procedure outer has two parameters, the second of which, P, is itself a procedure. Procedure outer has a declaration of a nested procedure inner. The first time outer is invoked it calls itself, passing the procedure inner. The second time outer is invoked, it calls its parameter P, which is bound to inner.

When inner is called it must be able to reference not only its own declarations, but also the declarations in outer. Since outer is called twice, we must arrange for the correct outer environment to be visible to inner. Since, in this example the inner that is called was declared in the first invocation of outer, the value 1 is printed.

Note that the outer that actually called inner had i=2, but that is not relevant since we are assuming static scoping.

Thus the first outer must pass to the second one the referencing environment of inner (which, as we said, includes the declarations in outer). In other words, outer must pass to its recursive invocation the closure of inner.

To repeat, nested procedures used as parameters complicate the picture considerably and necessitate the use of closures. A parameter that is a (pointer to a) non-nested procedure does not cause this problem. The nesting is required.
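The outer/inner example can be imitated in C++, where a lambda plays the role of the nested procedure and its capture list plays the role of the referencing environment. This is a sketch, not a faithful translation: outer returns the value instead of printing it, and a dummy lambda stands in for the procedure passed on the initial call.

```cpp
#include <functional>

// Returns what the pseudocode would print.
int outer(int i, const std::function<int()>& P) {
    // inner captures the i of *this* invocation of outer.
    std::function<int()> inner = [i] { return i; };
    if (i == 1) {
        return outer(2, inner);  // pass the closure of inner
    }
    return P();                  // P is bound to the first invocation's inner
}

int closure_demo() {
    // The procedure passed on the initial call is never invoked.
    return outer(1, [] { return -1; });
}
```

The result is 1, not 2, because the closure carries the environment in which inner was created.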

Functions that take other functions as arguments (and/or return functions as results) are called higher-order functions. Programming languages that support functions taking function pointers as arguments (e.g., C) can emulate higher-order functions.

Higher-order functions complicate the implementation, but we have not studied this. In particular, a parameter that is a (pointer to a) non-nested procedure does not cause the problem seen in the above example. Nesting is required for this problem to occur.

Homework: CYU 13, 14, 17.

Homework: 4, 6, 12.

8.3.2: Call-by-Name

Done previously.

8.3.3: Special-Purpose Parameters

Conformant Arrays

Default (optional) Parameters

Ada and C++ permit parameters to be specified with default values to be used if there is no corresponding argument.

    function f (x : integer; y : integer := 1) return integer;
    int f (int x, int y = 1);
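A sketch of the C++ declaration in use follows. The body of f is an assumption chosen only so that the two calls give visibly different results.

```cpp
// y defaults to 1 when the caller supplies only one argument.
int f(int x, int y = 1) { return x * 10 + y; }

int one_argument()  { return f(3); }     // same as f(3, 1)
int two_arguments() { return f(3, 4); }
```

The default is filled in by the compiler at the call site, so the callee always receives two arguments.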
  

Named Parameters

Ada permits the caller to refer to parameters by name (rather than position). Thus given the first declaration below, the two following invocations are equivalent.

    procedure F (I : Integer; J : Integer; K : Integer);
    F(1,2,3);
    F(1, K=>3, J=>2);
  

Variable Numbers of Arguments

This is the famous varargs facility in C and friends.

    printf("X=%d, Y=%d, Z=%d, and W=%d\n", X, Y, Z, W);
  
The number of subsequent arguments is determined by the first argument.

The caller knows how many arguments there are, but the callee code must be prepared for an arbitrary number. The solution is to have the frame pointer fp point to a location a fixed distance from the first argument. Then the prologue code in the callee can find this argument, determine the number of additional arguments, and proceed from there.
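On the callee side, the standard macros from <cstdarg> hide the fp arithmetic just described. In this minimal sketch the function sum_ints is invented; as with printf, its first argument determines how many further arguments the callee reads.

```cpp
#include <cstdarg>

// The first argument tells the callee how many int arguments follow.
int sum_ints(int count, ...) {
    va_list args;
    va_start(args, count);           // locate the arguments after count
    int total = 0;
    for (int k = 0; k < count; ++k) {
        total += va_arg(args, int);  // fetch the next argument as an int
    }
    va_end(args);
    return total;
}
```

A call such as sum_ints(3, 10, 20, 30) works, but nothing checks that the count matches the arguments actually supplied; that responsibility is left to the caller.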

8.3.4: Function Returns

8.3.A: Parameter Passing in Certain Languages

There is no new material here. Instead, some previous material is reorganized by language rather than by concept.

Parameter Passing in C

Parameters are passed by value; the argument cannot be changed by the callee. Call by reference can be simulated by using pointers.

Readonly parameters can be declared using const.

Parameter Passing in C++

The default is call-by-value, as in C. Call-by-reference can be simulated with pointers, as in C, but can also be explicitly stated using &.

Readonly parameters can be declared using const.

Parameter Passing in Java

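Primitive types (e.g., int) are passed by value. A parameter of primitive type can be assigned to, but the argument is not affected. Objects are accessed through references, and the reference is what is passed, so the callee can modify the object but cannot make the argument refer to a different object.

An object parameter can be declared final, which prevents the callee from reassigning the parameter (the object it refers to can still be modified).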

Parameter Passing in Ada

Semantic intent is separated from implementation. The specific modes available to the programmer (in, out, and in out) determine the former, not the latter.

Functions (as opposed to procedures) can have only in parameters.

8.4: Generic Subroutines and Modules

8.5: Exception Handling

8.6: Coroutines

8.7: Events

8.A: First-Class Functions

Another complication is first-class functions, that is, the ability to construct functions dynamically and treat them much as other data types are treated.

One difficulty is that activation records are no longer stack-like. If procedure P builds function F dynamically and then returns it, the activation record for P must be maintained, since F may be called later and needs the referencing environment of P where it was created (assuming lexical scoping).
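C++ lambdas give a feel for the problem. In this sketch make_adder plays the role of P and the returned lambda plays the role of F; both names are invented. Note that C++ does not keep the activation record alive. Instead, capturing n by value copies it into the closure object, which is how the sketch avoids referring to a popped frame; capturing by reference here would leave a dangling reference.

```cpp
#include <functional>

// P builds F dynamically and returns it.
std::function<int(int)> make_adder(int n) {
    // Capturing n by value copies it into the closure object, so it
    // survives after make_adder's activation record has been popped.
    return [n](int x) { return x + n; };
}

int adder_demo() {
    std::function<int(int)> add5 = make_adder(5);  // make_adder has returned
    return add5(10);                               // the closure still sees n == 5
}
```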

We shall see first-class functions in Scheme.

8.B: Recursion

As we have stated previously, it is the possibility of recursion that forbids static allocation of activation records and forces a stack-based mechanism.