Programming Languages

Start Lecture #9

Modules

This topic includes the material in 3.3.4, 3.3.5, and 3.7 as well as additional material not in the text.

Chapter 3: Names, Scopes, and Bindings (continued)

3.3: Scope Rules (continued)

3.3.4: Modules

The Challenge of Monster Codes

A very serious problem in real-life computing is the super-linear rise in complexity of computer software. That is, if monolithic program 2 is twice as large as monolithic program 1, it is more than twice as complicated. As a very rough approximation, if program 1 has N things that can affect each other, that gives N² possible interactions. For program 2, the numbers are 2N and 4N².

Thus, the following techniques prove to be very useful.

1. Break a size 2N program into 2 size N pieces that interact very weakly.
   - Problem Decomposition: The primary goal is to minimize the amount of complexity that has to be dealt with at any one time.
   - Information Hiding: Encapsulate the complexity of a given piece so that other pieces do not see the complexity.
2. Reduce the complexity of each piece.

In this section, we mostly study technique 1. Languages without (or with limited) side effects are believed to support technique 2, as is good programming style in any language.
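As a rough worked version of the estimate behind technique 1, using the same crude model as above and assuming a hypothetical small number k of interactions that cross the boundary between the two pieces:

```latex
\begin{align*}
\text{monolithic program of size } 2N &: \quad (2N)^2 = 4N^2 \\
\text{two weakly interacting pieces of size } N &: \quad N^2 + N^2 + k = 2N^2 + k, \qquad k \ll N^2
\end{align*}
```

For N = 100 and, say, k = 50, that is 40,000 possible interactions versus 20,050.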

There have been many cute quotes concerning the deleterious effects of program complexity; my favorite is from Tony Hoare (Sir Charles Antony Richard Hoare, quicksort, Hoare logic, CSP).

"There are two ways of constructing a software design: one way is to make it so simple that there are obviously no deficiencies, and the other is to make it so complicated that there are no obvious deficiencies."

There are additional benefits of information hiding, which include

The risk of name conflicts is reduced since fewer names defined in one piece of code are visible in other pieces.
The data abstractions introduced in one piece are less likely to be violated when other pieces cannot see the implementation.
Run time errors are easier to localize. Only those pieces able to update an object need be checked if the object has an incorrect value.
If only the interface of an abstraction is made visible, then the implementation can be changed without affecting other pieces of code.

Encapsulating Data and Subroutines

We have already seen an important method to split a program into pieces and to hide (some of) the details of one piece from the other pieces: namely the subroutine.

Certainly if you are given subroutine sort(A,B,n) that sorts the array A of size n, putting the result into array B, you do not need to worry if the algorithm used is bubble sort, heap sort, (Hoare's) quicksort, etc. Moreover, any local variables in sort are safe from your misuse.

The difficulty with subroutines is that they only solve a part of the problem. For example, how about a queue package that has two subroutines as well as some private data?

For this reason modules have been introduced in many new languages and retrofitted in some old ones.

Modules as Abstractions

Definition: A module is a programming language construct that enables problem decomposition and information hiding. Specifically a module

Defines a set of logically related entities. If the module is well designed, these entities depend on each other. In other words the module should have strong internal coupling.
Has a public interface that defines the entities exported by this component. Equally significant is the fact that the entities exported by this component are limited to those appearing in the public interface.
May include other entities that are not exported. By limiting the exported items to just those in the public interface, the module supports information hiding.
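A minimal C++ sketch of these three properties, assuming a hypothetical module named counter. The functions next and reset form the public interface, while the variable current sits in an unnamed namespace and so cannot be named from other source files. In a real program the exported declarations would live in a header and the rest in a .cpp file; everything is in one block here only to keep the sketch self-contained.

```cpp
#include <cassert>

// Public interface: the only names a client is meant to use.
namespace counter {
    int next();     // returns 1, 2, 3, ... on successive calls
    void reset();   // restarts the sequence
}

// Private implementation. An unnamed namespace gives internal linkage,
// so `current` cannot be named from any other translation unit.
namespace {
    int current = 0;
}

namespace counter {
    int next()   { return ++current; }
    void reset() { current = 0; }
}

// Client code: it can only go through the exported names.
int demo_counter() {
    counter::reset();
    counter::next();
    counter::next();
    return counter::next();   // third call after a reset
}
```

Because the client never sees current, the representation could be changed (say, to a wider integer type) without touching any client code.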

Modules were introduced by Barbara Liskov (recent Turing award winner) in her language Clu; she called them clusters (hence the name of the language). As an aside a Liskov design was one of four finalists in the competition for the design of Ada. Another early language using modules was Modula (Wirth), which begot Modula 2 and Modula 3. Ada 83 was one of the next languages to support modules. Now many languages do.

Although the concept of modules is now widely accepted, the name module is not. As mentioned, they were called clusters in Clu. All versions of Ada call them packages, as do Java and Perl. C++, C#, and PHP call them namespaces. C permits a weak form of modules through the separate compilation facility.

A module is somewhat like a record, but with a crucial distinction.

A record consists of a set of names called fields, which refer to values in the record.
A module consists of a set of names, which can refer to values, types, routines, other language-specific entities, and possibly other modules.

Imports and Exports

As mentioned above, an important feature of modules is the selective exporting of names: Only those names in the public interface can be accessed outside the module; those in the private implementation cannot.

Finer control is also possible. For example a name may be exported read-only or its internals may not be exposed.

A question arises as to whether the public interface is automatically imported into all other components. For example, if a queue module exports insert as part of its public interface, is the name insert automatically part of the environment of every other module or must these other modules specifically import the name? Modules for which the importation is automatic are said to have open scopes; those for which a specific import is needed are said to have closed scopes.

Ada, Java, C#, and Python take a middle ground. If module stack exports the name push, then all other modules have a compound name something like stack.push in their environment. If, in addition, module B specifically imports names from stack, then the simple name push is available in B. In Ada, it is actually a little more complicated: module A must issue a with of module B in order to access the items in B with dotted notation. If, in addition, A issues a use of B, then A can use the short form.

Modules as Managers

One can imagine an application that needed exactly one queue which is encapsulated together with insert and remove in a module you might call TheQueue. More common, however, is to have a module called Queues, which defines a type Queue. Then a user of that module writes Q1:Queue and Q2:Queue to obtain two queues.

In this common case one says that the module manages the type Queue. Note that insert and remove now need to be given the specific Queue as an argument. So you might see something like the left hand procedure call

    Queues.Insert(element,Q1);           Q1.Insert(element);

In the next section, we will see that an object oriented viewpoint might lead to the right hand invocation.
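A minimal C++ sketch of the manager style (the left hand call), assuming a namespace Queues that exports a type Queue together with insert and remove. The element type double and the std::deque storage are assumptions of this sketch, not part of the notes, and the sketch makes no attempt to hide the representation.

```cpp
#include <cassert>
#include <deque>

// Manager style: the namespace Queues exports a type and the
// operations on it; each operation is told which queue to act on.
namespace Queues {
    struct Queue {
        std::deque<double> items;   // storage choice is an assumption of this sketch
    };
    void insert(double element, Queue& q) { q.items.push_back(element); }
    double remove(Queue& q) {       // precondition: q is not empty
        double front = q.items.front();
        q.items.pop_front();
        return front;
    }
}

double demo_manager() {
    Queues::Queue q1, q2;           // two independent queues, one manager
    Queues::insert(15.0, q1);
    Queues::insert(9.5, q1);
    Queues::insert(7.0, q2);
    return Queues::remove(q1);      // oldest element of q1
}
```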

Language Choices

Different languages use different terminology for modules and support somewhat different functionality. For example:

Ada: Extensive support for modules is provided. A module is called a package and comes in two parts.
- The package specification, which gives the public interface.
- The package body, which gives the private implementation.
We will see a full implementation of a simple queue package.
C: There is support for modularization with judicious use of header files and #include directives. However, C does not have a full module interface.
C++: In addition to the facilities it carries forward from C, recent definitions of the C++ language include namespaces, using declarations/directives, and namespace alias definitions. We will see examples of these.
Java: Like Ada, Java calls modules packages. Also provided is an import statement.
ML: Module support is provided in ML via signature, structure, and functor definitions.

-- Package specification (public interface)
package QPack is
   procedure Insert (X:Float);
   function Remove Return Float;
end QPack;
-- Package body (private implementation)
package body QPack is
   Size : constant Natural := 50;
   type QIndex is new Integer range 0..Size;
   type Queue is array (QIndex) of Float;
   TheQueue : Queue;
   Front, Rear : QIndex :=0;
   procedure Insert (X:Float) is
   begin
      Rear:=(Rear+1) mod Qindex(Size);
      TheQueue(Rear):=X;
   end Insert;
   function Remove return Float is
   begin
      Front:=(Front+1) mod Qindex'Last;
      return TheQueue(Front);
   end Remove;
end QPack;
-- Client procedure
with QPack; with Text_IO;
procedure Main1 is
   X: Float;
begin
   QPack.Insert(15.0);
   QPack.Insert(9.5);
   X:=QPack.Remove;
   if X=15.0 then
      Text_IO.Put("OK");
   else
      Text_IO.Put("NG");
   end if;
end Main1;

A Simple One-Queue Ada Package

Above is a full implementation and use of a very simple module, namely an Ada package for a single (size 50) queue (of floats). Note the three parts: the package specification, or public interface; the package body, or private implementation; and the use of the package in a client procedure. The package exports two subroutines, the procedure Insert and the function Remove.

The package specification includes only declarations (not definitions, those come in the body) for the subroutines Insert and Remove. The queue itself is not visible outside the package.
The package body gives the implementation of the two routines as well as the necessary declarations. In particular, TheQueue is defined here. My code is a poor implementation of a queue: At the very least it should check for full and empty queues and raise an exception if they occur. Moreover, most queue implementations allow the user to specify the size of the queue rather than hard-wiring the size as I have done.
Note the conversion Qindex(Size). Since Size is a Natural and Rear+1 is a Qindex, one must be converted to the other so that the mod can be applied. For the second mod, I used a different method: Given any discrete type, the attribute 'Last gives the last element.
Note that TheQueue is indexed only by the variables Rear and Front, which are of type Qindex. Hence a successful compilation guarantees that I am not accessing TheQueue outside its definition. Ada is indeed statically typed! To be fair a run-time check might be needed for statements like
```
        Rear  := (Rear+1)  mod Qindex(Size);
        Front := (Front+1) mod Qindex'Last;
```
to ensure that the range constraints are met for the assignment. I don't know if the gnat Ada compiler is smart enough to realize the 2nd version can't be out of bounds, but it is computable at compile time. The 1st version can't be out of bounds either, but that takes an additional step of logic to determine.
The procedure Main1 uses Insert and Remove in a trivial manner. There are two points to note. First, the with statements are required to access the external modules: QPack, which I wrote, and Text_IO, which is part of the standard Ada library. Even with the with statements, I needed to use dotted names like QPack.Insert. Second, by adding use statements, the dotted notation can be avoided.

-- Package specification (public part, then private part)
package QQpack is
   type Queue is private;
   procedure Insert (X : Float; Q : in out Queue);
   procedure Remove (X : out Float; Q : in out Queue);
   QueueFull, QueueEmpty : exception;
private
   Size : constant Natural := 50;
   type QIndex is new Integer range 0..Size;
   type Contents is array (Qindex) of Float;
   type Queue is record
      Front : Qindex := 0;
      Rear  : Qindex := 0;
      NItems: Natural range 0..Size := 0;
      Item  : Contents;
   end record;
end QQPack;
-- Package body
package body QQPack is
   procedure Insert (X : Float; Q : in out Queue) is
   begin
      if Q.NItems=Size then raise QueueFull; end if;
      Q.NItems := Q.NItems+1;
      Q.Rear := (Q.Rear+1) mod Qindex(Size);
      Q.Item(Q.Rear) := X;
   end Insert;
   procedure Remove (X : out Float; Q : in out Queue) is
   begin
      if Q.NItems=0 then raise QueueEmpty; end if;
      Q.NItems := Q.NItems-1;
      Q.Front := (Q.Front+1) mod Qindex'Last;
      X := Q.Item(Q.Front);
   end Remove;
end QQPack;
-- Client procedure
with QQPack; use QQPack;
with Text_IO; use Text_IO;
procedure Main2 is
   X: Float;
   TheQueue : Queue;
begin
   Insert(15.0, TheQueue);
   Insert(9.5,  TheQueue);
   Remove(X,TheQueue);
   if X=15.0 then
      Put_Line("OK");
   else
      Put_Line("NG");
   end if;
exception
   when QueueFull  => Put_Line ("Queue Full");
   when QueueEmpty => Put_Line ("Queue Empty");
end Main2;

A Better Ada Queue Package

Above, we now see a better queue package. A major change between the previous package and this one is that this package supports many queues, not just one. Instead of implementing a single queue, we now make public the type of the queue and have the user declare however many queues they want.

We again see three parts to the implementation: The package specification, the package body, and the procedure using the package. However, this time the package specification has two parts, the normal public section and a new section labeled private.

The package specification again includes declarations (not definitions) of the Insert and Remove procedures. There are two differences between the versions here and in the previous section. First, since the user may be declaring multiple queues, the procedures now have the specific queue as an additional (in out) argument. Second, Ada functions can have only in parameters (a restriction that was lifted in Ada 2012), so Remove is now a procedure with the removed value X as an out parameter.
We see that the type Queue is declared here in the public section, but is declared to be private. What does this mean? It means that the user can declare objects of type Queue but cannot see inside those objects, i.e., cannot see the implementation. This is sometimes called an opaque declaration. In the private part of the specification, we see the details of the queue definition, including definitions of the types used within queue. Note that Front and Rear are now per-queue quantities. We added a count of the items present to detect full and empty queues.
Finally, back in the public part, we see declarations of two exceptions whose lack we criticized in the previous version.
The package body contains only the definitions of Insert and Remove. It includes the natural code to count items and detect full and empty queues. It raises the QueueFull and QueueEmpty exceptions when appropriate.
The procedure Main2 declares a queue and uses it as expected. It could just as easily declare more queues. Note the exception section at the end of the procedure. If any of the statements between begin and exception raise either the QueueFull or QueueEmpty exception, the corresponding when clause will be triggered and the appropriate message will be printed. The exceptions are available because they appeared in the specification. However, to determine what events can trigger them, we need to look at the package body.
Note that we included use statements for QQPack and Text_IO. Had we not, Insert, Remove, QueueFull, and QueueEmpty would require a QQPack. prefix and Put_Line would require a Text_IO. prefix.

   -- Additions to the public part of the package specification
   function Empty (Q : Queue) return Boolean;
   function Full  (Q : Queue) return Boolean;
   function "=" (Q1, Q2 : Queue) return Boolean;
   -- Corresponding additions to the package body
   function Empty (Q : Queue) return Boolean is
   begin
      return Q.NItems=0;
   end Empty;
   function Full (Q : Queue) return Boolean is
   begin
      return Q.NItems=Size;
   end Full;
   function "=" (Q1, Q2 : Queue) return Boolean is
   begin
      if Q1.NItems /= Q2.NItems then
         return False;
      end if;
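      -- The items occupy slots Front+1 .. Front+NItems (mod Size),
      -- since Remove advances Front before reading.
      for J in 1..Q1.NItems loop
         if Q1.Item((Q1.Front+QIndex(J)) mod QIndex(Size)) /=
            Q2.Item((Q2.Front+QIndex(J)) mod QIndex(Size)) then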
               return False;
         end if;
      end loop;
      return True;
   end "=";
   -- Inserts to the client code (with a use QQPack statement)
   AnotherQ : Queue;
   if TheQueue = AnotherQ then
      Put_Line("eq");
   else
      Put_Line("ne"); end if;
   -- The same test when no use QQPack statement is given
   if QQPack."="(TheQueue,AnotherQ) then

Some Extra Goodies

Above, in the first frame (the three function declarations), we see three possible additions to the public part of the package specification (they could be placed right after Remove). The corresponding additions to the package body are the three function bodies that follow.

The first two functions, Empty and Full, are trivial but, and this is important, the client cannot write these functions since they use the private component NItems. With these two functions added, users can either test for full and empty themselves or can ignore the problem and process any QueueFull and QueueEmpty exceptions that occur.

The third function "=" is the equality comparison function. It simply checks that the two given queues have the same number of items and that corresponding items are equal. In Ada, when you define an equality predicate, as we did, the system automatically defines the corresponding inequality predicate.

If the user of QQPack includes a use QQPack statement, then the equality and inequality operators on Queues can be written simply as = and /=. For example, the third frame above (beginning with the declaration of AnotherQ) contains inserts to the client code that use the queue equality predicate. First we define AnotherQ and then compare it to TheQueue. If no use statement is given, the if statement must be written as shown in the last frame.

Why Are They All Equal Size Queues of Floats?

Because we haven't learned about Ada generics ... yet.

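// stack.h: the specification (public interface)
namespace stack {
    void push (char);
    char pop();
}

// stack.cpp: the implementation
#include <stdio.h>
#include "stack.h"
namespace stack {
    const unsigned int MaxSize = 50;
    char item[MaxSize];
    unsigned int NItems = 0;
    void push (char c) {
        if (NItems >= MaxSize) {
            printf("Stack overflow!\n");
            return;                 // do not write past the end of item
        }
        item[NItems++] = c;
    }
    char pop () {
        if (NItems == 0) {
            printf("Stack underflow!\n");
            return '\0';            // nothing to pop; do not index below item[0]
        }
        return item[--NItems];
    }
}

// main1.cpp: the client
#include "stack.h"
int main(int argc, char *argv[]) {
    stack::push('c');
    if (stack::pop() != 'c')
        return 1;
    return 0;
}

// Fourth frame: an optional using directive for the client
using namespace stack;

// Bottom frame: an optional using declaration for the client
using stack::push;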

A Simple One-Stack C++ Namespace

Namespaces are a late addition to C++ that serve some of the same functions as Ada packages. They encapsulate related entities and privatize names. This privatization permits the developers of a namespace to define "namespace global" names such as error_count without fear of clashing with other namespaces.

Above we see three files, stack.h, stack.cpp, and main1.cpp, that correspond to the familiar specification, implementation, client usage pattern we used with Ada. Indeed, the pieces do correspond roughly to their Ada counterparts.

One difference between modules in the two languages is that C++ is more filename oriented. Whereas in Ada the client asks for the package by name, in C++ the client includes the file by filename. Although no one would do it, I tried replacing all occurrences of stack.h with x.y and everything worked.

Another effect of the filename versus package name difference is that the C++ module implementation (stack.cpp) must #include the module specification (stack.h), whereas in Ada this is automatic (e.g., in the better queue implementation, Size was defined in the specification and used without declaration in the body). In the very simple C++ code above there is no real need for the #include, since everything in the .h file appears in the .cpp file as well. In principle the #include lets the compiler check that the declarations are consistent. In practice g++ does not check very well: I changed the .h file so that push took an integer argument and the .cpp file still compiled, presumably because C++ overloading makes push(int) and push(char) two distinct functions. But keep this quiet.

using Declarations and Directives

The C++ using directive is similar to the Ada use statement. For example, the statement in the fourth frame above, placed just after the #include in the client code, would permit the programmer to elide both uses of stack:: in main. Unlike Ada, C++ also has a using declaration, which imports just a single name. The bottom frame shows a statement that would permit eliding the stack:: prefixing push, but not the stack:: prefixing pop.
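The difference between the two forms can be sketched in a single file. This is only an illustration: the namespace below is a cut-down stand-in for the stack module, the function names with_directive and with_declaration are hypothetical, and the using statements are placed inside functions (rather than at file scope, as in the client frames) so that both forms can be shown side by side.

```cpp
#include <cassert>

// A cut-down stand-in for the stack module of this section.
namespace stack {
    const unsigned int MaxSize = 50;
    char item[MaxSize];
    unsigned int NItems = 0;
    void push(char c) { if (NItems < MaxSize) item[NItems++] = c; }
    char pop()        { return NItems > 0 ? item[--NItems] : '\0'; }
}

// using directive: every name in stack may be written unqualified.
char with_directive() {
    using namespace stack;
    push('a');
    return pop();
}

// using declaration: only push is imported; pop still needs stack::.
char with_declaration() {
    using stack::push;
    push('b');
    return stack::pop();
}
```

Writing pop() without the prefix in with_declaration would be rejected by the compiler, since only push was imported there.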

What about C++ Generics?

They are called templates and will be discussed later.

Argument Dependent Lookup (ADL) in C++

C++ has a feature in which an unqualified function name is searched automatically in several namespaces, depending on the types of the arguments at the call site. This rather technical feature is also called argument dependent name lookup and Koenig lookup, named after Andrew Koenig. We will not be using it.
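Although we will not use it, a tiny sketch may make the rule concrete. The namespace geometry, the type Point, and the function manhattan are all hypothetical names chosen for this illustration.

```cpp
#include <cassert>

namespace geometry {
    struct Point { int x; int y; };
    // An ordinary function that lives in the same namespace as Point.
    int manhattan(const Point& p) {
        return (p.x < 0 ? -p.x : p.x) + (p.y < 0 ? -p.y : p.y);
    }
}

int demo_adl() {
    geometry::Point p{3, -4};
    // No geometry:: prefix and no using: because the argument has type
    // geometry::Point, the compiler also searches namespace geometry.
    return manhattan(p);
}
```

Had manhattan taken a plain int instead of a Point, the unqualified call would not have found it, since built-in types have no associated namespace.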

Non-Holistic Aspects of the C++ Module System

The C++ compiler does not check that every entity declared in the visible interface (.h file) is actually defined in the implementation (.cpp files), nor does it detect multiple different definitions given by the various .cpp files. This is because compilation is done just one file at a time. Consequently, errors of this kind are found by the linker. In contrast, the package system is fully understood and checked by the Ada compiler.

Abbreviating Module Names in Ada and C++

Both Ada and C++ have simple facilities for abbreviating long module names. This becomes especially important when you have submodules. So if entity x is in module B which is a submodule of module A we would write A.B.x in Ada and A::B::x in C++.

package P1 renames A.Very_Long.Nested.Package_Name;
namespace n1 = a::very_long::nested::namespace_name;

The abbreviation facility is called renames in Ada and is written simply as = in C++. Both are illustrated above.
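A small self-contained C++ sketch of the alias facility, reusing the namespace names from the illustration. The variable x and its value are hypothetical additions so that there is something to refer to.

```cpp
#include <cassert>

namespace a { namespace very_long { namespace nested { namespace namespace_name {
    int x = 42;
} } } }

// The alias n1 now names the innermost namespace.
namespace n1 = a::very_long::nested::namespace_name;

bool demo_alias() {
    // Both spellings refer to the same object.
    return &n1::x == &a::very_long::nested::namespace_name::x;
}
```

The alias introduces no new namespace; it is only a shorter name for the existing one.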

Packages in Java

Modules in Java are called packages and again aid in privatizing names thereby preventing unintended name clashes. Packages are closely linked to directories in the file system. Normally one package is stored in one directory and the name of the package is the name of the directory. There are other possibilities and what follows is a simplification.

Recall that a .java file can contain at most one public class. For simplicity, we ignore non-public classes and all subclasses. Hence each file contains exactly one class, which is public. In this case the file name must be the class name with the suffix .java appended.

package package-name;

Ignoring whitespace and comments, the beginning of the file determines the package into which the contained public class will be placed. If the file begins with a package statement as shown above, the class becomes part of package package-name. If the file does not begin with a package statement, the class becomes part of the default anonymous package.

Although the language definition actually permits more flexibility, it is recommended and we will assume that a package equals a directory, that is, we assume that all the .java files in directory .../pack1 are members of the pack1 package and no file outside .../pack1 is a member of pack1.

All the classes in a given package (i.e., all the .java files in a given directory) are directly visible to each other; no directive is required.

import package-name.class-name;
import package-name.*;

By default, different packages are independent: the classes of one cannot be referred to by their simple names in the other (a fully qualified name such as package-name.class-name is needed). This behavior can be modified by either of the import directives above. All import directives must occur immediately after the package statement. The first directive states that the class in the current file has access to class class-name from package package-name. The second directive states that the class in the current file has access to all the classes in package package-name.

signature STACKS =
sig
   type stack
   exception Underflow
   val empty : stack
   val push : char * stack -> stack
   val pop : stack -> char * stack
   val isEmpty : stack -> bool
end;
structure Stacks : STACKS =
struct
   type stack = char list
   exception Underflow
   val empty = [ ]
   val push = op::
   fun pop (c::cs) = (c, cs)
     | pop [] = raise Underflow
   fun isEmpty [] = true
     | isEmpty _ = false
end;
- val s1 = Stacks.empty;
val s1 = [] : Stacks.stack
- val s2 = Stacks.push (#"x",s1);
val s2 = [#"x"] : Stacks.stack
- val s3 = Stacks.push(#"y",s2);
val s3 = [#"y",#"x"] : Stacks.stack
- val (x,s4) = Stacks.pop s3;
val x = #"y" : char
val s4 = [#"x"] : Stacks.stack
- Stacks.isEmpty (#2 (Stacks.pop s4)); 
val it = true : bool
- Stacks.isEmpty s4;
val it = false : bool

Modules in ML

ML has a fairly rich module structure that we are not covering, but above is an illustration of the most basic aspects.

The first frame shows a signature, which is the module's visible interface. As we shall see, it is the constant empty (not to be confused with the predicate isEmpty) that serves as the constructor for the STACKS signature.

The next frame shows one structure for the STACKS signature. This structure is named Stacks.

Whereas Ada has exactly one implementation for each specification, ML is more liberal. The user can have several structures active at one time, each implementing the same signature (in different manners). As we shall see, the long form name of an entity includes the structure name, not the signature name. In this case the structure Stacks gives a list-based implementation of the signature STACKS. ML also permits a single structure to implement multiple signatures, but we will not discuss that feature.

The remaining frames show an ML session after the first two frames have been entered. We first create an empty stack, and push two chars on to it. Note that the following pop produces a 2-tuple of output, the value popped and the stack after popping.

We then pop s4, which has only one element and test the 2nd component of the output (the new stack) to verify that it is empty. Finally we confirm that s4 has indeed not been changed by popping; it is still not empty.

What if I Want Different Types of ML Stacks?

Read about functors, which are parameterized structures and correspond to Ada generics.

var A, B : queue  { queue is a module name }
var x, y : element
...
A.insert(x)

Module Types and Classes (and Object Orientation)

In Ada, C++, and Java a module can be viewed as an encapsulation or manager of data structures and operations on them. Consider our Ada treatment. We first defined a package that supported one queue. We could not instantiate this package several times to get several queues; instead we wrote a more elaborate package that permitted the user to declare multiple queues.

An alternative approach to modules has been adopted by Euclid (and to some extent ML, but we haven't exploited that). In Euclid a module is a type and hence a queue module would be a queue type (not just supply a queue type). The user then instantiates the module multiple times to get multiple queues and (logically) multiple insert and remove operators.

Note that the sample Euclid code above instantiates two queues, each of which has an associated insert procedure. In Ada this would have been written Insert(X,A).

Object Orientation

Viewing a module as a type (rather than as a manager) should remind you of object oriented programming. Indeed, a module type is quite similar to a class in an object oriented language, with the big exception that module types do not support inheritance. Note that, with both module types and classes, each instantiated object has (logically) its own instance of all the operations associated with the module type or class.
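A minimal C++ sketch of the module-as-type view, written as an inheritance-free class. The element type double and the std::deque storage are assumptions of the sketch; the names A and B follow the Euclid fragment.

```cpp
#include <cassert>
#include <deque>

// A "module as type": the class is the queue, and each object carries
// (logically) its own insert and remove.
class Queue {
public:
    void insert(double element) { items.push_back(element); }
    double remove() {               // precondition: the queue is not empty
        double front = items.front();
        items.pop_front();
        return front;
    }
private:
    std::deque<double> items;       // hidden from clients
};

double demo_module_type() {
    Queue A, B;                     // two instantiations, as in the Euclid fragment
    A.insert(1.5);
    B.insert(2.5);
    A.insert(3.5);
    return A.remove() + B.remove(); // 1.5 + 2.5
}
```

Here the queue being operated on appears before the dot rather than as an extra argument, which is exactly the difference between A.insert(x) and Insert(X,A).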

Modules Containing Classes

Since module types are so similar to inheritance-free classes, one might ask why an object oriented language like Java, which has a class structure, would include modules as well.

The answer is that they serve somewhat different purposes. Object orientation works well for multi-instance abstractions. Early on in your Java career, you learn about classes called rectangle, circle, triangle, etc., each of which inherits from class shape. Then you instantiate many rectangles, circles, and triangles.

A prototypical module usage would be for a large problem that has a large functional subdivision. For example an aircraft control system has large subsystems concerned with guidance, aerodynamics, and life support. While you might have multiple aerodynamics codes using different algorithms for reliability, this is not the same as having multiple rectangles on a graphics canvas.

Homework: CYU 21, 22, 24, 26.

Homework: I provided two queue packages in Ada; consider the second (better) one. Produce a similar package for dequeues (double ended queues). These support four operations: inserting and removing from either end. Name them InsertL, InsertR, RemoveL, RemoveR. Your solution should have three parts: dequeue.ads, the specification (interface); dequeue.adb, the body (implementation); and client.adb, the client (user) code. Include the goodies. If you want to compile and run your code, Ada is available on the NYU machines.

    gnatmake client.adb

will compile everything.