Programming Languages

Start Lecture #6

Remark: Review the meaning of free in M.

Remark: Lab 2 was assigned last Thursday; it is due 22 October 2009.

Remark: The midterm will be during recitation #8, 26 Oct. A practice final is available.

Numbers in Scheme

Scheme has a wide variety of numbers, including rational numbers (i.e., fractions). Most implementations include arbitrary precision rationals. We will stick to simple integers.

Vectors in Scheme

A vector is similar to an array but the elements may be of heterogeneous type, similar to a record in Ada or a struct in C. We will not study vectors.

10.3.3: Equality Testing and Searching

Boolean Values

Scheme has two Boolean constants, #t and #f. However, when performing a test (e.g., in a control flow structure) any value not #f is treated as true.
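The rule can be checked directly. The following is a minimal sketch, assuming an R5RS-or-later Scheme; the helper name truthy? is made up for this illustration and is not a standard procedure.

```scheme
;; truthy? is a hypothetical helper: it reports how a value behaves in a test.
(define truthy?
  (lambda (v) (if v #t #f)))

(truthy? #f)    ; => #f, the only value treated as false
(truthy? #t)    ; => #t
(truthy? 0)     ; => #t, zero is not false in Scheme
(truthy? '())   ; => #t, the empty list is not false either
(truthy? "")    ; => #t
```

Programmers coming from C (where 0 is false) or from older Lisps (where the empty list is false) should note the third and fourth lines.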

Eq?, Eqv? and Equal?

Scheme has three general comparison operators, the first two of which are similar. We will use only eq? and equal?.

  (eq? 'a 'a)
  (let ((x 3) (y 3)) (eq? x y))
  (let ((x '()) (y '())) (eq? x y))
  (eq? 'x 'y) ⇒ #f
  (eq? (cons 2 3) (cons 2 3)) ⇒ #f
  (eq? "" "")  (eq? "xy" "xy")
  (eq? 2 2)
  (eq? '(2 . 3) '(2 . 3))  ⇒ unspecified
  (eq? (list 2 3) (list 2 3)) ⇒ #f
  (equal? (lambda (x) x) (lambda (x) x))

The first, eq?, is cheap: it essentially just compares memory addresses.
So (eq? 'a 'a) ⇒ #t because Scheme keeps only one symbol a. (If two locations had the symbol a, how could a be evaluated?) Similarly, implementations normally keep only one constant 3, so the second example on the right yields #t in practice (strictly speaking, eq? on numbers is unspecified, as noted below). There is also only one empty list so the next example also gives #t. Naturally the two symbols x and y are stored separately, explaining the next example. Each cons invocation produces a new cons cell (a new dotted pair), explaining the example after that.

Implementations are given freedom on how to store strings so there might be two copies of the same string. Hence the next two examples (on one line) have undefined results. They will, however, be either #t or #f. The same is true for numbers (next example, eqv? would give #t). Even though the dotted pair example that is next looks just like the cons example, this time the result is unspecified. The principle here is that the implementation is free to store all uses of the same literals in the same location. The next example is not a literal, but a cons (actually 2 cons) so the result is #f.

The story for equal? is simpler. A good rule of thumb is that if the two arguments print the same, equal? evaluates to #t; otherwise #f. The only unspecified case is for functions, as in the last example (both mzscheme and scheme48 give #f).
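To make the contrast concrete, here is a small sketch that applies both predicates to values built at run time (so none of the literal-sharing caveats above apply). The names p1, p2, l1, and l2 are chosen only for this illustration.

```scheme
;; Structurally identical but separately allocated values.
(define p1 (cons 2 3))
(define p2 (cons 2 3))
(define l1 (list 1 (list 2 3) "xy"))
(define l2 (list 1 (list 2 3) "xy"))

(eq? p1 p1)            ; => #t, the very same cons cell
(eq? p1 p2)            ; => #f, two distinct cons cells
(equal? p1 p2)         ; => #t, both print as (2 . 3)
(equal? l1 l2)         ; => #t, equal? recurs into sublists and strings
(equal? l1 (cdr l2))   ; => #f, the two values print differently
```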

We will show searching below, after introducing cond.

  (cond
     (pred1 expr1)
     (pred2 expr2)
     ...
     (predn exprn)
     (else def-expr)
  )

10.3.4: Control Flow and Assignment

Control Flow: Cond and If

Scheme actually has a number of special forms for control flow. However, you can get by with just one, cond, a case/switch-like construct, which is shown on the right. It is a special form since not all arguments to cond are necessarily evaluated and those that are evaluated have a special rule.

First pred1 is evaluated; if the value is not #f, it is considered true and expr1 is evaluated. This completes the evaluation of the cond; the value of the cond is the value of expr1.

If pred1 evaluates to #f, the same procedure is applied to pred2, etc.

If all n predi's evaluate to #f, def-expr is evaluated and that becomes the value of the cond.

For simple tests, if is convenient.

    (if condition exprt exprf)
  
Again we have a special form. First condition is evaluated. If the value is not #f, exprt is evaluated and that becomes the value of the if. If condition evaluates to #f, then exprf is evaluated and that becomes the value of the if.
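As a concrete instance of both forms, the sketch below defines two small functions. The names sign and abs-val are invented for this example.

```scheme
;; sign uses cond: the clauses are tried in order and the first true one wins.
(define sign
  (lambda (n)
    (cond
      ((< n 0) 'negative)
      ((= n 0) 'zero)
      (else 'positive))))

;; abs-val uses if: a single two-way test.
(define abs-val
  (lambda (n)
    (if (< n 0) (- n) n)))

(sign -5)      ; => negative
(sign 0)       ; => zero
(sign 7)       ; => positive
(abs-val -5)   ; => 5
(abs-val 7)    ; => 7
```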

Control Flow: Sequencing and Do Loops

  (begin
    expr1
    ...
    exprn
  )

What do you do when the then part of your if is more than one expression? Or if this happens to one of the expri's in a cond? You need a grouping similar to {} in C. You could use one of the let's, but that is overkill when you don't need a new scope. Instead you use the begin special form shown on the right. The expri's in the begin form are all executed in order and the value of the last one is the value of the begin.

The basic mechanism for iteration is recursion, but various looping constructs are also available.

I guess the keyword do was chosen for the looping form because Lisp was invented around the time of Fortran, but I do not know. You can always use recursion instead of a do loop; but the do loop is in the book if you want to use it.
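For readers who want to see the shape of a do loop, here is a minimal sketch next to the equivalent recursion. The function names sum-to and sum-to-rec are made up for this example. Each loop variable is given as (name init step); the final clause gives the termination test and the value of the loop.

```scheme
;; sum-to adds the integers 1..n with a do loop.
(define sum-to
  (lambda (n)
    (do ((i 1 (+ i 1))          ; loop variable: starts at 1, steps by 1
         (sum 0 (+ sum i)))     ; accumulator: starts at 0, adds the old i
        ((> i n) sum))))        ; stop when i > n; the loop's value is sum

;; The same computation written with plain recursion.
(define sum-to-rec
  (lambda (n)
    (if (= n 0)
        0
        (+ n (sum-to-rec (- n 1))))))

(sum-to 5)       ; => 15
(sum-to-rec 5)   ; => 15
(sum-to 0)       ; => 0, the body never runs
```

Note that all step expressions are computed from the old values of the loop variables, so sum is updated with the value i had before it was incremented.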

Assignment: The Bangs

The special forms involving assignment end in !, which is pronounced bang (at least set! is pronounced set-bang; I am not so sure about set-car! and set-cdr!).

The (side-) effect of set! is to change the value of its first argument to the value of its second argument. Again we have a special-form since the first argument is not evaluated. It is an error to set bang an undefined identifier; for that you should use define or one of the let's.
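A small sketch of set! together with begin follows. The names counter and bump! are invented for this illustration; the trailing ! follows the naming convention just described.

```scheme
(define counter 0)

;; bump! is a hypothetical helper: it increments counter and returns the new value.
(define bump!
  (lambda ()
    (begin
      (set! counter (+ counter 1))   ; side effect: change the value of counter
      counter)))                     ; the value of the begin is its last expression

(bump!)    ; => 1
(bump!)    ; => 2
counter    ; => 2
```

The begin is written out here only to show the form; a lambda body may already contain a sequence of expressions, so it could be dropped in this particular case.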

The special functions set-car! and set-cdr! change the car (resp. cdr) field of their first argument to the value of the second argument. I advise against their use, as the results can be surprising. For some reason, they don't appear to be available in mzscheme. They are in scheme48 and also definitely appear in the Scheme manual, so I am surprised. But their absence is no great loss for us. Ang found the missing mzscheme bangs. They are part of a group of mutable functions, including mcons, which makes a cons cell you can mutate with the mutable bangs.

Recursion on Lists

Lists are the basic Scheme data structure and recursion is the basic iteration construct, so it is important to see how to use recursion to step through the elements of a list.

  (define member
    (lambda (elem lat)
      (cond
        ((null? lat) #f)
        ((eq? elem (car lat)) lat)
        (else (member elem (cdr lat))))))

(define count-members
  (lambda (a lat) (member1 a lat 0)))
(define member1
  (lambda (a lat count)
    (cond
     ((null? lat) count)
     ((eq? a (car lat))
      (member1 a (cdr lat) (+ 1 count)))
     (else (member1 a (cdr lat) count)))))

As an example the code on the right implements member, a function with two parameters: elem, an element, and lat, a list of atoms (i.e., lat is a list and each element is an atom, no sublists). If the element does not appear in the list, member returns #f. If it does appear, member returns the suffix of the list starting with the first occurrence. We could return #t, but the above is more common. Recall that everything except #f is viewed as true when testing, so returning either the suffix or #t has the same effect when just testing. Sometimes it is helpful to have the rest of the list in hand, perhaps for further searching.

The code sequence shown is fundamental for list operations; be sure you understand it completely. The second example counts the number of times a occurs in lat.

This version of the program uses eq? for the testing. We might want instead to use equal? or even eqv?. Thus we could write three versions of member and count-members just changing eq? to equal? and then to eqv?. A better alternative is to use higher-order functions, as shown below.

(define count-members-sexp
  (lambda (a s) (member2 a s 0)))
(define member2
  (lambda (a s count)
    (cond
     ((null? s) count)
     ((atom? (car s))
      (cond
       ((eq? a (car s))
        (member2 a (cdr s) (+ 1 count)))
       (else (member2 a (cdr s) count))))
     (else ;; the car is a sublist
      (+ (member2 a (car s) 0)
         (member2 a (cdr s) count))))))

Nested Lists and S-Expressions

An element of a list can itself be a list, for example (1 (2 3)). More generally, a parenthesized sequence of symbols with balanced parentheses is called an s-expression. How do we write a program that can deal with a list containing sublists? The code on the right does this, again counting occurrences. It assumes the sexp is a list, possibly with sublists. But it doesn't handle the case where the sexp is just an atom.

Homework: First, enhance the last example to handle atoms as well. Second, change the example code and your enhancement to use if instead of cond where a simple if-then-else is appropriate.

10.3.5: Programs as Lists

As mentioned previously, Scheme (and Lisp in general) is homoiconic, or self-representing. A parenthesized sequence of symbols with balanced parentheses is called an s-expression whether the sequence represents a list or a program. Indeed, an unevaluated program is exactly a list.

We have seen that Scheme has a special form quote that prevents evaluation. In addition there is a function eval that forces evaluation.
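The sketch below treats a quoted program fragment as an ordinary list and then evaluates it with a tiny hand-written evaluator. The name tiny-eval is hypothetical and it handles only numbers, + and * with two operands; the built-in eval does this for all of Scheme, but its calling convention differs between implementations (some require an environment argument), so it is not used here.

```scheme
;; A program fragment held as plain data: quote prevents its evaluation.
(define prog '(+ 1 (* 2 3)))

(car prog)      ; => +, a symbol (the operator)
(caddr prog)    ; => (* 2 3), a sublist (the second operand)
(length prog)   ; => 3

;; tiny-eval is a hypothetical evaluator for numbers and binary + and *.
(define tiny-eval
  (lambda (e)
    (cond
      ((number? e) e)
      ((eq? (car e) '+) (+ (tiny-eval (cadr e)) (tiny-eval (caddr e))))
      ((eq? (car e) '*) (* (tiny-eval (cadr e)) (tiny-eval (caddr e))))
      (else 'unknown-operator))))

(tiny-eval prog)               ; => 7
(tiny-eval (list '* 4 prog))   ; => 28, a program built with list operations
```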

  (define fact-iter
    (lambda (prod lo hi)
      (if (> lo hi)
          prod
          (fact-iter (* prod lo)
                     (+ lo 1)
                     hi))))
  (define fact-tail (lambda (n) (fact-iter 1 1 n)))

10.3.A: Tail-Recursion Revisited

We have already noted that if the last action performed by a function is a recursive call of itself (and there are no other direct or indirect recursive calls of this function), then a new AR is not needed and the recursion can be transformed into iteration by a compiler.

The only new point to be made here is that sometimes a clever programmer can turn a seemingly fully recursive program into a tail-recursive one, often by defining an auxiliary (a.k.a. helper) function. We begin with the procedure fact shown above when discussing letrec. That fact executes a multiply after evaluating its recursive call and thus is not tail recursive; however, the transformed version on the right is.
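The same transformation applies to other recursions. As a further sketch, here is a list-summing function in both styles; the names sum-list, sum-list-iter, and sum-list-tail are invented for this example.

```scheme
;; Not tail recursive: the addition happens after the recursive call returns.
(define sum-list
  (lambda (lis)
    (if (null? lis)
        0
        (+ (car lis) (sum-list (cdr lis))))))

;; Tail recursive helper: acc carries the partial sum, so the recursive
;; call is the last action performed.
(define sum-list-iter
  (lambda (lis acc)
    (if (null? lis)
        acc
        (sum-list-iter (cdr lis) (+ acc (car lis))))))

(define sum-list-tail
  (lambda (lis) (sum-list-iter lis 0)))

(sum-list '(1 2 3 4))        ; => 10
(sum-list-tail '(1 2 3 4))   ; => 10
```

The pattern is the same as in fact-iter: the work that used to be done after the recursive call returns is moved into an extra accumulator argument.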

Homework: CYU 10.

Homework: 6, 8.

10.3.6: Extended Example: DFA Simulation

10.4: Evaluation Order Revisited

10.4.1: Strictness and Lazy Evaluation

10.4.2: I/O Streams and Monads

10.5: Higher-Order Functions (i.e., Functions as Arguments and Return Values)

Assume you need three functions for a physics research project. All of them take as input the state of a system. The first function returns the heaviest object, the second returns the densest object, and the third returns the object having the highest kinetic energy. You could write three separate programs, but you notice that they are all the same: they are determining a max but have different definitions of less than. Hmmm.

  (define make-member
    (lambda (test?)
      (letrec ((mem (lambda (elem lis)
                      (cond
                        ((null? lis) #f)
                        ((test? elem (car lis)) lis)
                        ;; recur with the same test?, not the eq?-based member
                        (else (mem elem (cdr lis)))))))
        mem)))

  (define member-eq    (make-member eq?))
  (define member-eqv   (make-member eqv?))
  (define member-equal (make-member equal?))

Returning to the member example above, we want variants with different comparison functions. So let's pass in the desired comparison function and use that. In more detail, write make-member as a function with one input, the comparison function. Make-member returns as result a function equivalent to member above but using the given comparison function instead of having eq? hard-wired in.

The result is shown on the right. Again, this is a fundamental use of first-class functions; be sure you understand it.
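The physics question that opened this section yields to the same technique. The sketch below passes the definition of less than as an argument. The name max-by, the sample data, and the representation of an object as a list (name mass volume) are all assumptions made for this illustration.

```scheme
;; max-by is a hypothetical helper: less? is the comparison supplied by the caller.
;; It assumes lis is non-empty.
(define max-by
  (lambda (less? lis)
    (cond
      ((null? (cdr lis)) (car lis))
      (else
        (let ((best (max-by less? (cdr lis))))
          (if (less? best (car lis)) (car lis) best))))))

;; Assumed layout of an object: (name mass volume).
(define objects '((rock 10 2) (balloon 1 50) (lead 8 1)))
(define mass    (lambda (o) (cadr o)))
(define volume  (lambda (o) (caddr o)))
(define density (lambda (o) (/ (mass o) (volume o))))

(max-by (lambda (a b) (< (mass a) (mass b))) objects)         ; => (rock 10 2)
(max-by (lambda (a b) (< (density a) (density b))) objects)   ; => (lead 8 1)
```

One max function now serves every ordering; only the small comparison lambda changes.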

10.7: Functional Programming in Perspective

There is some evidence that functional programs are less buggy.

There is even greater evidence that imperative programming dominates in the real world. The question is why. Current belief is that the reason is social, not technical: more courses, textbooks, libraries, etc.

Chapter 7: Data Types

We can think of a type as a set of values (the members of the type). A different question is how should the type be represented on the computer (Intel Floating Point, IBM Floating Point, vs. IEEE Floating Point; two's complement vs. one's complement for negative integers; binary vs. hexadecimal vs. octal). We will not discuss this second question.

Types can give implicit meaning to operations. For example in Java + means concatenation if the operands are strings and means addition if the operands are integers.

A type clash, i.e., using a type where it is not permitted, often indicates a bug and, if checked automatically by the run-time system or (better yet) by the compiler, can be a big aid in debugging.

7.1: Type Systems

A type system consists of:

The synthesis/inference terminology is not standardized. Some texts, e.g., 3e, use type inference both for determining the type of the whole from the type of its parts, and for determining the type of the parts from the type of the whole. Other texts, e.g., the Dragon book, use type synthesis for the former and type inference for the latter.

Some languages are untyped (e.g., B the predecessor of C); we will have little to say about those beyond saying that B actually had one datatype, the computer word.

Types must be assigned to those constructs that can have values or that can refer to objects that have values. These include.

7.1.1: Type Checking

Definition: Type checking is the process of ensuring that a program obeys the type system's type compatibility rules.

Definition: A violation of the type checking rules is called a type clash.

Definition: A language is called strongly typed if it prevents operations on inappropriate types.

Definition: A strongly typed language is called statically typed if the necessary checks can be performed at compile time.

Note that static typing is a property of the language not the compiler: A statically typed language could have a poor compiler that does not perform all the necessary checks.

int main (void) {
    int *x;
    int y;
    int *z;
    z = x + y;
    return 0;
}

(define test
  (lambda ()
    (let ((x 5) (y '(4)))
      (+ x y))))

procedure Ttada is
   X : Integer := 1;
   type T is access Integer;
   Y : T;
begin
   X := X + Y;
end Ttada;

Strong vs Weak Typing

The key attribute of strongly typed languages is that variables, constants, etc can be used only in manners consistent with their types. In contrast weakly typed languages offer many ways to bypass the type system.

A good comparison would be original C vs (Ada or Scheme). C has unions, varargs, and a deliberate confusion between pointers and arrays. Original C permitted many other type mismatches. A motto for C is "Trust the programmer!". Both Ada and Scheme are much tighter in this regard: both are strongly typed, Ada is (mostly) statically typed.

Compare the three programs on the right. The C program compiles and runs without errors! The Scheme define is accepted, but (test) gives an error report. The Ada program doesn't even compile.

Static vs Dynamic (Strong) Type Systems

Static and Dynamic strongly typed systems both prevent type clashes, but the prevention is done at different times and by different methods.

In a static type system

In a dynamic type system

Ada, Pascal, and ML have static type systems.

Scheme (Lisp), Smalltalk, and scripting languages (if strongly typed) have dynamic type systems. These systems typically have late binding as well.
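In Scheme the type travels with the value, not with the variable, and can be examined at run time with the standard type predicates. The sketch below shows this; the names type-of and safe-add, and the symbols they return, are made up for this example.

```scheme
;; type-of is a hypothetical helper built from the standard type predicates.
(define type-of
  (lambda (v)
    (cond
      ((number? v)    'number)
      ((string? v)    'string)
      ((symbol? v)    'symbol)
      ((boolean? v)   'boolean)
      ((null? v)      'empty-list)
      ((pair? v)      'pair)
      ((procedure? v) 'procedure)
      (else           'other))))

;; safe-add performs the run-time check itself instead of letting + signal an error.
(define safe-add
  (lambda (x y)
    (if (and (number? x) (number? y))
        (+ x y)
        'type-clash)))

(type-of 5)         ; => number
(type-of '(4))      ; => pair
(type-of car)       ; => procedure
(safe-add 5 4)      ; => 9
(safe-add 5 '(4))   ; => type-clash
```

The last call is the same clash as in the Scheme test program shown earlier; there the check is made inside +, which reports the error when (test) is run.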

A mixture is possible as well. Ada has a very few run-time checks; Java a few more.

Static type systems have the following advantages.

Dynamic type systems have the following advantages.

7.1.2: Polymorphism

Definition: Polymorphism enables a single piece of code to work with objects of multiple types.

Definition: In parametric polymorphism the type acts as though it is an extra unspecified parameter to every operation.

Consider dynamic typing as used in Scheme. Depending on the type of the operands, the addition operator + can indicate integer arithmetic, real arithmetic, or infinite precision arithmetic. If the operands are of inappropriate type, + signals an error. Since the type was never specified by the programmer, this is often called implicit parametric polymorphism.

fun len xs =
  if null xs
  then 0
  else 1 + len (tl xs)

The example on the right is written in ML, which is statically typed, yet still manages to support implicit parametric polymorphism. The (very slick) idea is that ML supports type inference so is able to deduce individual types from the type of an expression. In this case the interpreter determines the type of the len function to be 'a list→int, i.e., a function with parameter a list of type 'a (unknown) and result integer.
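For comparison, here is a sketch of the same function in Scheme. It is polymorphic for a different reason: nothing is checked until run time and the code never looks at the elements. The definition mirrors the ML version and is not taken from the text.

```scheme
;; len never inspects the elements, so it works for lists of any element type.
(define len
  (lambda (lis)
    (if (null? lis)
        0
        (+ 1 (len (cdr lis))))))

(len '(1 2 3))             ; => 3
(len '("a" "b"))           ; => 2
(len '(1 "two" (3 4) x))   ; => 4, the elements need not even share a type
```

The last call has no ML counterpart: an ML list must be homogeneous, which is exactly what the type 'a list expresses.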

Consider instead generics in Ada and Java and templates in C++. In this case the programmer writes code for each type and the system chooses which one to invoke depending on the type. This is called explicit parametric polymorphism.

We will soon learn that the positive integers can be considered a subtype of the integers and that a value in a subtype can be considered a value of the supertype. This is called subtype polymorphism.

Similarly, the ability to consider a class as one of its superclasses is called class polymorphism.

7.1.3: The Meaning of Type

Types can be thought of in at least three ways, which we will briefly describe. They are the denotational, constructive, and abstraction-based viewpoints.

With denotational semantics:

  1. A type is simply a set T of values.
  2. A value has type T if it belongs to the set.
  3. An object has type T if it is guaranteed to be bound to a value in T.

An advantage of denotational semantics is that composite types, e.g., arrays and records, can be described using standard mathematical operations on sets.

With constructive semantics:

  1. A type is either built-in or
  2. constructed from built-in or other constructed types using a type constructor (record, array, etc.).

With abstraction-based semantics, a type is an interface consisting of a set of operations with well defined, consistent semantics. It is characteristic of object-oriented languages.

In practice, we normally think of a type using all three viewpoints.

7.1.4: Classification of Types

We will first discuss several scalar types and then composite types that consist of aggregates of both scalar and other composite types.

Scalar Types

Discrete Types

The term discrete comes from mathematics, where it is contrasted with continuous.

A type is considered discrete if there is a clear notion of successor and predecessor for values in the type. Mathematically, this gives an isomorphism between the elements of the type and a consecutive subset of the integers. Indeed, the basic examples of discrete types are the integers themselves and enumeration types.

Integers. Of course with only a finite number of bits to use for the representation (most commonly 16, 32, or 64), the integer type is really just a finite subset of the mathematical integers.

  type Suit  is (Club, Diamond, Heart, Spade);
  type Organ is (Lung, Heart, Brain, Skin, Liver);
  Card : Suit  := Heart;  -- Legal
  Sick : Organ := Heart;  -- Legal
  ...
  Sick := Card;           -- Compile time error

Enumeration Types. These types have an obvious and compact implementation: Values in the type are mapped to consecutive integers.

  type Score is new Integer range 0..100;
  type Day   is (Mon, Tue, Wed, Thu, Fri, Sat, Sun);
  subtype Weekday is Day range Mon..Fri;
  X  : Integer := 3;
  Y  : Score   := 3;
  D1 : Day     := Sun;
  D2 : Weekday := Mon;
  ...
  if D1 < D2 then   -- legal
    X := Y;            -- illegal
  end if;

Subrange Types. Ada and Pascal support types that are subsets of others; Ada has two quite useful variations. The type Score is another type. It happens to have a subset of the values and the same operations as does the type integer, but a Score is definitely not an integer: The assignment of Y to X on the right is a compile-time error.

In contrast, Weekday is not a new type but instead a subtype of Day. Hence values of the two types can be combined and assignment from one to the other is legal. However, an (often run-time) check is needed if a Day is assigned to a Weekday to ensure that the constraint is satisfied.

Other Numeric (Scalar) Types

Nearly all languages have several other numeric types.

Non-numeric Scalar Types

We consider here Boolean, character and string, and void.

Boolean. The type was named after George Boole and is very common. C came late to the party: Boolean was added only in C99.

Character and Strings. Very common (exception: JavaScript has no character type). An important modern consideration is support for non-ASCII characters. Most modern languages support at least a 16-bit-wide character. As an example of the growing importance of enhanced character types, I note that Ada83 had only (8-bit) character, Ada95 added 16-bit wide_characters, and Ada05 added 32-bit wide_wide_characters as well. In each instance there is a string type holding the corresponding characters.

Another question is whether strings can be changed during execution or are instead only constants. Java chose the latter approach, most other languages the former.

Finally we come to void, which is used as the return type of a procedure in C and Java to indicate that no value is returned. ML has a similar type, unit, which has only one value, written (). Haskell has the same type, but there () names both the type and its only value.

Composite Types

Non-scalar types are normally called composite and are generally created by applying a type constructor to one or more simpler types. We will study several composite types shortly. Here we just list a bunch of them with very brief commentary.

  type Univ is record
    Name : string (1..5);  -- fancier strings exist
    Zip  : integer;
  end record;
  NYU : Univ;
  A : array (1..5) of integer;
  ...
  NYU := ("NYU  ", 10021);            -- positional
  NYU := (Zip=>10021, Name=>"NYU  "); -- named
  A := (5, 4, 3, 2, 1);
  A := (1..3=>5, 5=>2, 4=>3);

7.1.5: Orthogonality

Languages try to make their features orthogonal, i.e., the rules for one feature apply to all possibilities of another. This is as opposed to having special rules for each situation. Original Fortran required array bounds to be integers; Pascal, Ada, et al. permit any discrete type. Early C required array bounds to be known at compile time, but modern C permits the bound to be a parameter; Ada requires the bound to be known at the time the array declaration is elaborated.

An important example of orthogonality is whether the language permits literals to be written for composite types. An Ada example is on the right.

7.1.A: Assigning Types

How does the programmer and/or system specify the type of a program construct? At least three methods exist.

  1. Explicit type declarations in the program. This is by far the most common.
  2. No compile-time bindings. This is for dynamically-typed languages like Scheme.
  3. The syntax of the construct determines its type. In Fortran variables beginning with I-N were by default Integer; others were by default Real. I don't believe any new languages do this.

Homework: CYU 1, 2, 3, 4, 10.

7.2: Type Checking

The 3e terminology is not standard and many do not use it. The 3e uses type inference to include both the case where the type of a composite is determined by the types of the constituents, and the opposite case where the type of a constituent is determined from its context and the type of the composite.

I believe normally type inference is just used for the second case of determining the type of a constituent from its context. The first case (constituent to composite) is then called type synthesis.

For type synthesis the programmer declares types of variables and the compiler deduces the types of expressions and determines if the usage is permitted by the type system.

For type inference (e.g., ML and Haskell) the programmer does not declare the type of variables. Instead the compiler deduces the types of variables from how they are used. For a trivial example an occurrence of X+1 implies that X is an integer.

  type T1 is new integer;
  type T2 is new integer;
  type T3 is record x:integer; y:integer; end record;
  type T4 is record x:integer; y:integer; end record;
  subtype S1 is integer;

7.2.1: Type Equivalence

When are two types equivalent? There are two schools: name equivalence and structural equivalence. In (strict) name equivalence two type definitions are always distinct; thus the four types on the right, T1,...,T4, are all distinct. In structural equivalence, types are equivalent if they have the same structure, so types T3 and T4 are equivalent and aggregates of those two types could be assigned to each other. Similarly, T1, T2, and integer are equivalent under structural equivalence.

Ada uses name equivalence so the types are distinct. However, Ada offers subtypes, which are compatible (see 7.2.2) with the parent type (but can, and often do, have range constraints). So S1 is equivalent to integer.

  type student = {
    name:    string,
    address: string }
  type school = {
    name:    string,
    address: string }

Most new languages, but not ML from which the example on the right is taken, adopt name equivalence to avoid having student and school considered equivalent types. Assigning a student to a school is normally a sign of a bug that should be caught as early as possible.

Many languages have a mixture of name and structural equivalence. For example in C structs use name equivalence; whereas structural equivalence is used for everything else.

Variants of Name Equivalence

In addition to strict name equivalence as used above, there is also a concept of loose name equivalence where one type can be considered an alias of another. For example in Modula-2, which has loose name equivalence,

    TYPE T5 = INTEGER;
  
would be considered an alias of INTEGER and variables of type T5 could be assigned to variables of type INTEGER.

Type Conversions and Casts (and Nonconverting Type Casts)

What happens if type B is needed and we have a value of type A? For example, suppose we have an assignment statement X:=Y with Y of type A and X of type B. For another example, suppose we invoke F(Y) with Y of type A and the parameter of F of type B. In these cases we need to convert the value of type A to a value of type B. In many languages, the programmer will need to indicate that the conversion is desired. For example in Ada, assuming X is of type T1 above and Y of type T2, the programmer would write

    X := T1(Y);
  
Consider four cases.
  1. Types A and B are structurally equivalent, but the compiler uses name equivalence. The programmer must state that the conversion is desired, but no code is generated since the types use the same implementation.
  2. The types are different, but for some values share the same representation. In this case only a check is needed. If the check passes (or can be deduced at compile time), the assignment is done without conversion.
  3. If the representation is different, both checking and conversion code may be needed. For example assigning an integer to a floating point variable requires conversion and the reverse assignment requires a range check as well as conversion (also the conversion may involve loss of precision).
  4. A nonconverting type cast is an assertion by the programmer that the system should just treat the value of type A as a value of type B. This is quite dangerous, but is sometimes needed in low-level code. For example, you read some bits from the network and then subsequent bits tell how to interpret the previous bits. This is called an unchecked_conversion in Ada and a reinterpret_cast in C++.
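Case 3 can be tried out in Scheme, where every conversion is an explicit function call. The sketch below uses the standard R5RS procedures exact->inexact, inexact->exact, truncate, number->string, and string->number; the helper name float->int is made up for this example.

```scheme
;; float->int is a hypothetical helper: drop the fraction, then convert
;; the inexact (floating point) result to an exact integer.
(define float->int
  (lambda (x) (inexact->exact (truncate x))))

(exact->inexact 3)       ; integer to floating point, a change of representation
(float->int 3.7)         ; => 3, the fractional part is lost
(float->int -3.7)        ; => -3, truncate rounds toward zero
(number->string 42)      ; => "42"
(string->number "42")    ; => 42
(string->number "abc")   ; => #f, the check failed so no number is produced
```

The last line shows the checking half of case 3: string->number reports failure by returning #f rather than producing a meaningless value.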

Homework: CYU 13, 14, 15.

Homework: 1, 2, 3, 6.