Programming Languages

Start Lecture #10

Chapter 9. Data Abstraction and Object Orientation

We have jumped from the middle of chapter 3 to the beginning of chapter 9, which suggests that there will be a great change of focus. However, this is not true and the beginning of chapter 9 flows perfectly after the part of chapter 3 we just finished.

As we mentioned at the end of last lecture, modules have much in common with the classes of object-oriented programming. Indeed, in one (admittedly less common) formulation of modules, namely modules as types, modules are quite close to classes without inheritance.

However, do not think that classes are a new and improved version of modules. The primary ideas of object-oriented programming actually predate modules.

9.1: Object-Oriented Programming

With modules we have already gained these three benefits.

  1. The conceptual load has been reduced by information hiding.
  2. Fault containment is increased by denying client access to the implementation.
  3. Program components are more loosely coupled since only changes to the module interface can require changes to client code.

We are hoping for more. In particular we want to increase the reuse of code when the requirements change very little. Object-oriented programming can be seen as an attempt to enhance opportunities for code reuse by making it easy to define new abstractions as extensions or refinements of existing abstractions.

What is Object-Oriented Programming (OOP)

Generally OOP is considered to encompass three concepts.

  1. An abstract data type (ADT).
  2. Inheritance.
  3. Dynamic binding.

Let's describe these in turn.

Abstract Data Type: Recall the module-as-type formulation, in which we instantiate the module multiple times to obtain multiple copies of the data and operations. In OOP terminology, each such instance is an object.

An object is an entity containing data (fields) together with the methods that operate on that data.

A class is a construct defining the data and methods contained in all its instances (which are objects).

Inheritance: With inheritance, classes can be extended, for example by adding new fields and methods or by redefining existing methods.

If class B extends class A, B is called a subclass (or a derived or child class) of A, and A is called a superclass (or a base or parent class) of B.

Dynamic Binding: Whenever an instance (i.e., object) of a class A is required, we can use instead an instance of any class derived (either directly or indirectly) from A.
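A minimal sketch of this substitution in C++ follows. The names shape, circle, and describe are hypothetical and chosen only for illustration; they are not from the CD. Note that in C++ the method must be declared virtual for the call to be bound dynamically.

```cpp
#include <string>

// Hypothetical base class (not from the CD).
class shape {
public:
  virtual std::string name() const { return "shape"; }
  virtual ~shape() {}
};

// circle is derived from shape and overrides name().
class circle : public shape {
public:
  std::string name() const override { return "circle"; }
};

// describe() requires a shape, but an instance of any class derived from
// shape may be passed instead. Because name() is virtual, the version that
// runs is chosen at run time according to the actual class of the object.
std::string describe(const shape& s) {
  return "this is a " + s.name();
}
```

Passing a circle to describe yields the circle version of name, even though describe itself mentions only shape.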

Examples from the CD

#include <iostream>
using std::cout;
using std::flush;

class list_err {
public:
  const char *description;
  list_err(const char *s) {description = s;}
};

class list_node {
  list_node* prev;
  list_node* next;
  list_node* head_node;
public:
  int val;
  list_node() {
    prev = next = head_node = this;
    val = 0;
  }
  list_node* predecessor() {
    if (prev == this || prev == head_node) return 0;
    return prev;
  }
  list_node* successor() {
    if (next == this || next == head_node) return 0;
    return next;
  }
  int singleton() {
    return (prev == this);
  }
  void insert_before(list_node* new_node) {
    if (!new_node->singleton())
      throw new list_err ("attempt to insert listed node");
    prev->next = new_node;
    new_node->prev = prev;
    new_node->next = this;
    prev = new_node;
    new_node->head_node = head_node;
  }
  void remove() {
    if (singleton())
      throw new list_err ("attempt to remove unlisted node");
    prev->next = next;
    next->prev = prev;
    prev = next = head_node = this;
  }
  ~list_node() {
    if (!singleton())
      throw new list_err ("attempt to delete listed node");
  }
};

class list {
  list_node header;
public:
  int empty() {
    return header.singleton();
  }
  list_node* head() {
    return header.successor();
  }
  void append(list_node *new_node) {
    header.insert_before(new_node);
  }
  ~list() {
    if (!header.singleton())
      throw new list_err ("attempt to delete non-empty list");
  }
};

class queue : public list {
public:
  void enqueue(list_node* new_node) {
    append(new_node);
  }
  list_node* dequeue() {
    if (empty())
      throw new list_err ("attempt to dequeue empty queue");
    list_node* p = head();
    p->remove();
    return p;
  }
};

void test() {
  list my_list;
  list_node *p;
  for (int i = 0; i < 4; i++) {
    p = new list_node();
    p->val = i;
    my_list.append(p);
  }
  p = my_list.head();
  for (int i = 0; i < 4; i++) {
    cout << p << ' ' << p->val << ' ' << p->successor() << '\n';
    p = p->successor();
  }
  p = my_list.head();
  while (p) {
    cout << p->val << '\n';
    list_node *q = p->successor();
    p->remove();
    delete p;
    p = q;
  }
  queue Q;
  for (int i = 0; i < 4; i++) {
    p = new list_node();
    p->val = i;
    Q.enqueue(p);
  }
  cout << "queue:\n";
  while (1) {
    p = Q.dequeue();
    cout << p->val << '\n' << flush;
    delete p;
  }
}

int main() {
  try {
    test();
  } catch(list_err* e) {
    cout << e->description << '\n';
  }
}

0x603010 0 0x603040
0x603040 1 0x603070
0x603070 2 0x6030a0
0x6030a0 3 0
0
1
2
3
queue:
0
1
2
3
attempt to dequeue from empty queue

The code above is from the CD. It gives a C++ implementation of a (circular) list of integers. The entire file can be compiled at once with g++.

The basic idea is that a list is composed of list nodes, one of which is the list header. The list is doubly linked and circular with each node having an additional pointer (called head_node) to the list header. When a node is created, its prev, next, and head_node pointers are set to point to itself. A node in this state is called a singleton. Creating a list just consists of creating one node, which becomes the header of the list.

The code in the first frame just includes a standard library and enables referencing two I/O members without using the full :: notation.

The next frame captures error messages. It is actually a little fancier (see below).

Public and Private Members

The third frame is the list_node class. Note that classes, like modules, restrict the names accessible by clients. However, unlike modules, classes must take inheritance into account. In C++ there are three levels of accessibility: public, protected, and private.

Private is the default for a class, so the prev, next, and head_node pointer declarations are available only in this class.

In addition a class C1 can specify that specific external subroutines and external classes are friends. Such friends have access to all of C1's data and methods.
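A minimal sketch of a friend declaration follows. The field secret and the function peek are hypothetical names used only for illustration.

```cpp
// Hypothetical class C1 with one private field and one friend function.
class C1 {
  int secret;                     // private, since private is the default
  friend int peek(const C1& c);   // peek is granted access to C1's private members
public:
  C1(int s) : secret(s) {}
};

// An external (non-member) subroutine. Reading c.secret here is legal
// only because C1 declares peek to be a friend.
int peek(const C1& c) {
  return c.secret;
}
```

Without the friend line, the body of peek would be rejected by the compiler because secret is private.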

In this particular class, all the rest is public. The val field holds the (integer) value contained in this node. The list_node() method is important and special. Since it has the same name as the class, it is the constructor for the class and is called automatically whenever an object is instantiated. In this case it sets the three (private) pointer fields to refer to the node itself thereby creating a singleton. Note that the name this is a reserved word in C++ and always refers to the current object (technically, it is a pointer to that object).

When first created, a node is a singleton. However, we shall soon see that nodes can be inserted before other nodes thereby making non-trivial lists.

The predecessor and successor methods return the appropriate node of such a list. In case the corresponding node doesn't exist (e.g., the node right before the head—remember these will be circular lists—has no successor), the method returns 0. Does the 0 signify the integer before 1, the null list_node* pointer, or the Boolean false? Next is the singleton predicate, which is straightforward.

The next two methods, which insert and remove an element, seem at first glance to be misplaced. Shouldn't they be in the list class? Looking more closely, we see that the insert procedure really isn't inserting the new node onto a certain list; it is inserting the node before another node (and hence onto the same list as the old node). Similarly the remove method just plays with the predecessor and successor nodes and thereby removes the current node from whatever list it was on. There is error checking to be sure you don't insert a node onto a second list or remove a node not on any list.

Finally, we see the destructor, i.e., the method whose name is the class name prefixed with a ~. This method is automatically invoked when the node is deleted with delete (similar to free in C) or goes out of scope. The system would reclaim the storage in any case; the destructor is used for error checking and for managing other storage. In this case we check to be sure the node is not on a list (since reclaiming storage for such a node would leave dangling references in the predecessor and successor).

After plowing through the rather fancy node class, the list itself, shown in the next frame, is rather simple. The only data member is the header, which is a list_node and hence automatically initialized to be a singleton. The first method, empty, reports whether the list is empty by testing whether the header is a singleton. The second method, head, returns a pointer to the first element on the list (the successor of the header). For an empty list the header has no successor, so head returns 0. The third method, append, adds the new node to the end of the list by making it the predecessor of the header. The destructor makes sure the list is empty by ensuring that the header is a singleton.

Tiny Subroutines

It is quite noticeable how small each method is. Although other, more complicated, examples will have larger methods, it is nonetheless true that OOP does encourage small methods since methods are often used in situations where a more monolithic programming style would use direct access to the data items. (This has negative effects from a computer architecture viewpoint, but we will not discuss that point.) For example, it would be more in the OOP style to make the val field private and have trivial public methods get_val and set_val that access and update val. Indeed, C# has extra syntax to make this especially convenient.
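A minimal sketch of the accessor style just described follows. The class name int_cell is hypothetical; the point is only that val becomes private and is reached through two tiny methods.

```cpp
// Sketch: a value holder whose val field is private and is accessed
// only through trivial public methods.
class int_cell {          // hypothetical name, not from the CD
  int val;                // private by default
public:
  int_cell() : val(0) {}
  int get_val() const { return val; }   // read access
  void set_val(int v) { val = v; }      // write access
};
```

Client code then writes c.set_val(7) and c.get_val() where it would previously have used c.val directly.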

Derived Classes

The next frame shows how to derive one class from another. Specifically, we derive a queue class from our list class. This is indicated in the class statement by writing public list. The keyword public makes all the public methods and data members of list accessible to clients of queue. Since the append function in the list class adds the new node to the end (tail) of the list, it works fine as enqueue. We need dequeue to remove the first entry and, guess what, our list class has a head() method to give us the head entry, which our node class can then remove.

In this example we just added two methods to those inherited from the base class. It is also possible to redefine existing methods and to decrease the visibility of the components of the base class.

Testing the Class

The next frame is the client code using these classes; it mostly plays around with them. Note that some objects are declared and hence have stack-like lifetime, whereas others are heap allocated. The last frame is the output generated by running the client.

By declaring my_list we obtain the header. Using this list (i.e., the header), we heap allocate some nodes and append them to the list giving them val's of 0, 1, 2, 3. We then print out, for each node, its address, its value, and the address of its successor. The final zero in the successor column is the same zero we commented on earlier (the node after the last node is the header, so successor returns 0). Then we delete the nodes, again printing out the val's.

Much the same is done for queues. A difference is the fancy exception handling to catch the event of trying to dequeue an empty queue.


#include <iostream>
using std::cout;
using std::flush;
class list_err {
public:
  const char *description;
  list_err(const char *s) {description = s;}
};
class gp_list_node {
  gp_list_node* prev;
  gp_list_node* next;
  gp_list_node* head_node;
public:
  gp_list_node() : prev(this), next(this),
                   head_node(this) {
  }
  gp_list_node* predecessor() {
    if (prev == this || prev == head_node)
      return 0;
    return prev;
  }
  gp_list_node* successor() {
    if (next == this || next == head_node)
      return 0;
    return next;
  }
  bool singleton() {
    return (prev == this);
  }
  void insert_before(gp_list_node* new_node) {
    if (!new_node->singleton()) {
      throw new list_err
        ("attempt to insert listed node");
    }
    prev->next = new_node;
    new_node->prev = prev;
    new_node->next = this;
    prev = new_node;
    new_node->head_node = head_node;
  }
  void remove() {
    if (singleton()) throw new list_err
        ("attempt to remove unlisted node");
    prev->next = next;
    next->prev = prev;
    prev = next = head_node = this;
  }
  ~gp_list_node() {
    if (!singleton()) throw new list_err
        ("attempt to delete listed node");
  }
};
class list {
  gp_list_node head_node;
public:
  int empty() {
    return head_node.singleton();
  }
  gp_list_node* head() {
    return head_node.successor();
  }
  void append(gp_list_node *new_node) {
    head_node.insert_before(new_node);
  }
  ~list() {
    if (!head_node.singleton())
      throw new list_err
        ("attempt to delete non-empty list");
  }
};
class queue : private list {
public:
  using list::empty;
  using list::head;
  void enqueue(gp_list_node* new_node) {
    append(new_node);
  }
  gp_list_node* dequeue() {
    if (empty()) throw new list_err
        ("attempt to dequeue empty queue");
    gp_list_node* p = head();
    p->remove();
    return p;
  }
};

Another Example from the CD

This example generalizes the previous one. We study it not only for the improved generality, but also for the additional C++ features that are used.

General-Purpose Base Classes

What if we wanted to have lists/queues of floats? of chars?

One could, as shown above, start with a gp_list_node (general-purpose list node) class that includes only the list-like aspects of list_node (basically everything except val).

Note that the constructor for this class looks weird (at least to me, a common occurrence for C++ code). This notation is a fast way of initializing various fields and, for this example, has the same effect as

    gp_list_node() {
      prev = this;
      next = this;
      head_node = this;
    }
  
Below we show an example where the two initialization methods differ.
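One situation in which the two forms behave differently is a field that is itself an object. The sketch below is an illustration with hypothetical names (counter, with_initializer, with_assignment), not code from the CD. With a field initializer the field is constructed directly from the argument; with assignment in the body the field is first default constructed and then assigned. (Fields that are const or references can only be set with the initializer form.)

```cpp
// Hypothetical helper that counts how its instances get initialized.
struct counter {
  static int defaults;   // number of default constructions so far
  static int assigns;    // number of assignments so far
  int n;
  counter() : n(0) { ++defaults; }
  counter(int v) : n(v) {}
  counter& operator=(const counter& other) {
    n = other.n;
    ++assigns;
    return *this;
  }
};
int counter::defaults = 0;
int counter::assigns = 0;

// Field initializer: the field c is constructed directly from v.
struct with_initializer {
  counter c;
  with_initializer(int v) : c(v) {}
};

// Assignment in the body: c is default constructed before the body runs
// and is then overwritten by the assignment.
struct with_assignment {
  counter c;
  with_assignment(int v) { c = counter(v); }
};
```

Both classes end up with the same value in c, but only with_assignment pays for a default construction followed by an assignment.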

Classes list and queue are built on the generalized node class (list contains a gp_list_node as its header and queue is derived from list) and again do not mention val.

Then we derive classes such as int_list_node (shown in the next frame), float_list_node, etc. from gp_list_node by adding an appropriately typed val.

Alternatively, and perhaps better, after learning about C++ generics (called templates), we would define a list_node<T> class and then instantiate it for whatever type T we need.
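A rough sketch of what such a template might look like follows. It is a simplification written for these notes, not the CD's code: it keeps only the prev and next pointers (no head_node and no error checking), and the method name next_in_ring is hypothetical.

```cpp
// Sketch of a generic node for a circular, doubly linked list.
// T is the type of the value stored in each node.
template <typename T>
class list_node {
  list_node* prev;
  list_node* next;
public:
  T val;   // the actual data in a node, of whatever type T is
  explicit list_node(const T& v) : prev(this), next(this), val(v) {}
  bool singleton() const { return prev == this; }
  // Link new_node into the ring immediately before this node.
  void insert_before(list_node* new_node) {
    prev->next = new_node;
    new_node->prev = prev;
    new_node->next = this;
    prev = new_node;
  }
  // Hypothetical helper: the raw next pointer, with no end-of-list test.
  list_node* next_in_ring() const { return next; }
};
```

A client would write list_node<int>, list_node<double>, and so on, and the compiler generates a separate class for each type used.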


class int_list_node : public gp_list_node {
public:
  int val;   // the actual data in a node
  int_list_node() {
    val = 0;
  }
  int_list_node(int v) {
    val = v;
  }
//  // complete rewrite:
//  void remove() {
//    if (!singleton()) {
//      prev->next = next;
//      next->prev = prev;
//      prev = next = head_node = this;
//    }
//  }
  // use existing but catch error:
  void remove() {
    try {
      gp_list_node::remove();
    } catch(list_err*) {
      ;   // do nothing
    }
  }
  int_list_node* predecessor() {
    return (int_list_node*)
           gp_list_node::predecessor();
  }
  int_list_node* successor() {
    return (int_list_node*)
           gp_list_node::successor();
  }
};
void test() {
  list L;
  int_list_node *p;
  for (int i = 0; i < 4; i++) {
    p = new int_list_node(i);
    L.append(p);
  }
  p = (int_list_node*) (L.head());
  for (int i = 0; i < 4; i++) {
    cout << p << ' ' << p->val
         << ' ' << p->successor() << '\n';
    p = (int_list_node*) (p->successor());
  }
  p = (int_list_node*) L.head();
  while (p) {
    cout << p->val << '\n';
    int_list_node *q = (int_list_node*)
                       p->successor();
    p->remove();
    delete p;
    p = q;
  }
  queue Q;
  for (int i = 0; i < 4; i++) {
    p = new int_list_node(i);
    Q.enqueue(p);
  }
  cout << "queue:\n";
  while (1) {
    p = (int_list_node*) Q.dequeue();
    cout << p->val << ' ' << Q.empty()
                   << '\n' << flush;
    delete p;
  }
}
int main() {
  try {
    test();
  } catch(list_err* e) {
    cout << e->description << '\n';
  }
}

Overloaded Constructors

If we look closely at int_list_node (shown above) we see that there are two constructors given: one with no parameters and one with a single int parameter. This feature of C++ permits the client code to construct an int_list_node with either zero or one argument. In this case the code treats the zero argument case as specifying a default value of zero for the one argument case. As you would expect, you can have other numbers of parameters and can have different constructors with the same number of parameters, but of different types. That is, C++ chooses the constructor whose signature matches the arguments given in the object instantiation.

Modifying Base Class Methods

Up to now the derived class has inherited the base class in its entirety and simply added additional functionality. However, the derived class can override base functionality as we now show.

Recall that the node class (list_node in the previous example, gp_list_node here), when asked to remove a node that was not on a list (an unlisted node), would throw an error. Suppose this time we prefer to do nothing, that is, removing an unlisted node should be a no-op. The remove method of int_list_node shows two implementations. The first solution, which is commented out in the code, is to simply do the removal only for the common case (a listed node). This solution can be criticized for relying on implementation details of gp_list_node; indeed, as written it would not compile, since prev, next, and head_node are private to gp_list_node.

The second solution is to let gp_list_node signal an error when removing an unlisted node but have int_list_node catch the error and do nothing.

Private Inheritance

In the first example queue inherited all of list (note the keyword public in the class statement). In the second example the inheritance is private and hence the public interface of list is not available to clients of queue. However, queue then explicitly adds empty and head to its public interface with using declarations, so the only name removed is append.

Containers/Collections

These examples illustrate a common occurrence in OOP, the need for one class to be a container (or collection) of another. In the examples we have seen, queues and lists are containers of nodes.

Another implementation strategy, which would support heterogeneous lists, would be for lists to have their own nodes that contain a field that is a pointer (or reference) to an object. This approach is used in several standard libraries.
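A minimal sketch of this alternative strategy follows. All names here (item, int_item, text_item, ptr_list) are hypothetical, and the list is singly linked to keep the example short. The list owns its own nodes, and each node merely points at an element, so elements of different classes can sit on the same list.

```cpp
#include <string>

// Hypothetical element classes sharing a common base class.
class item {
public:
  virtual std::string describe() const = 0;
  virtual ~item() {}
};
class int_item : public item {
  int v;
public:
  int_item(int x) : v(x) {}
  std::string describe() const override { return "int " + std::to_string(v); }
};
class text_item : public item {
  std::string s;
public:
  text_item(const std::string& t) : s(t) {}
  std::string describe() const override { return "text " + s; }
};

// The list allocates and frees its own nodes; it does not own the elements.
class ptr_list {
  struct node {
    item* elem;   // pointer to the contained object
    node* next;
  };
  node* first;
  node* last;
public:
  ptr_list() : first(nullptr), last(nullptr) {}
  ptr_list(const ptr_list&) = delete;             // copying is not supported
  ptr_list& operator=(const ptr_list&) = delete;
  void append(item* e) {
    node* n = new node{e, nullptr};
    if (last) last->next = n; else first = n;
    last = n;
  }
  // Concatenate the descriptions of all elements, in list order.
  std::string describe_all() const {
    std::string out;
    for (node* p = first; p; p = p->next) out += p->elem->describe() + ";";
    return out;
  }
  ~ptr_list() {
    while (first) { node* n = first->next; delete first; first = n; }
  }
};
```

Unlike the examples from the CD, the element classes here need not be derived from a node class, and one object could appear on several lists at once.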

Homework: CYU: 1, 3, 4, 7, 10.

9.2: Encapsulation and Inheritance

Most of the chapter and essentially all of our coverage treats OOP as an extension of the module-as-type framework. This subsection recasts OOP from the module-as-manager framework. We will not cover this material in any detail at all.

9.2.1: Modules

The this Parameter

Making Do without Module Headers

9.2.2: Classes

9.2.3: Nesting (Inner Classes)

Classes can be nested in both Java and C++, which brings up a technicality when several objects of the nested class are created and the outer class has non-static members.

9.2.4: Type Extensions

9.2.5: Extending without Inheritance

9.3: Initialization and Finalization

As we have seen, C++ and most other OO (object-oriented) languages provide mechanisms, namely constructors and destructors, to initialize and finalize objects. Formally:

Definition: A constructor is a special class method that is called automatically to initialize an object at the beginning of its lifetime.

Definition: A destructor is a special class method that is called automatically to finalize an object at the end of its lifetime.
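The two definitions can be observed directly in C++. The sketch below uses hypothetical names (tracer, event_log, demo): each constructor call appends + and the object's id to a log and each destructor call appends - and the id, so the log records when lifetimes begin and end.

```cpp
#include <string>

std::string event_log;   // records constructor and destructor calls in order

class tracer {           // hypothetical class used only for tracing
  char id;
public:
  tracer(char c) : id(c) { event_log += '+'; event_log += id; }
  ~tracer() { event_log += '-'; event_log += id; }
};

void demo() {
  event_log.clear();
  tracer a('a');                  // stack object; lives until demo returns
  {
    tracer b('b');                // stack object in an inner scope
  }                               // b is finalized here
  tracer* c = new tracer('c');    // heap object
  delete c;                       // c is finalized here
}                                 // a is finalized here
```

After demo() returns, event_log holds +a+b-b+c-c-a: no constructor or destructor is ever called explicitly by the client code.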

We shall address the following four issues related to constructors and destructors.

  1. Choosing a constructor.
  2. References and values.
  3. Execution order.
  4. Garbage collection.

9.3.1: Choosing a Constructor

Most OO languages permit a class to have multiple constructors. There are two principal methods used to select a specific constructor for an object.

9.3.2: References and Values

Recall that Java uses the reference model for objects. That means that a variable can contain a reference to an object but cannot contain an object as a value. (Java uses the value model for built-in types so a variable can contain an int for a value.) Thus, every object must be created explicitly. (This is not true for ints: if x contains the integer 5, y=x; creates another integer without a create operation.) Since every object is created explicitly, a call to the constructor occurs automatically.

C++, like C, uses the value model so a variable can contain an object. (C++ also has references, but the point is that sometimes a variable contains an object, not just a reference to it.)
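The difference between the two models can be sketched in C++ itself, since it supports both. The names cell, value_copy, and reference_alias are hypothetical.

```cpp
// Hypothetical class used only to contrast the two models.
struct cell {
  int n;
};

// Value model: b is a second object, so changing b leaves a alone.
int value_copy() {
  cell a{1};
  cell b = a;    // copies the object itself
  b.n = 2;
  return a.n;    // a is unchanged
}

// Reference model (as with Java objects): r names the same object as a.
int reference_alias() {
  cell a{1};
  cell& r = a;   // no new object is created
  r.n = 2;
  return a.n;    // a was changed through r
}
```

Here value_copy() returns 1 and reference_alias() returns 2.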

class Point {  // C++
  double x, y;
public:
  Point() : x(0), y(0) {}                         // #1
  Point(const Point& p) : x(p.x), y(p.y) {}       // #2
  Point (double xp, double yp) : x(xp), y(yp) {}  // #3
  void move (double dx, double dy) {
    x += dx;  y += dy;
  }
  virtual void display () { ... }
};

Point p;           // Calls #1, the default constructor
Point p2(1.,2.);   // Calls #3
Point p3(p2);      // Calls #2, the copy constructor
Point p4 = p2;     // Same as above

Above we show a simple C++ class with three different constructors. In the second frame are four object declarations.

  1. The first and simplest declaration includes no arguments. This matches the signature of the first constructor and hence p will have zero x and y coordinates. Again note the syntax in the constructor: the items between the : and the { are field initializers. The constructor with no parameters is called the default constructor.
  2. The second declaration has two double arguments and hence matches the signature of the third constructor. This point, p2, will have coordinates (1.,2.).
  3. The third declaration occurs quite frequently. A new object is to be created based on an old object of the same class. The constructor invoked is called the copy constructor and we see that the constructor does indeed make p3 a copy of p2. (I believe a constructor with one parameter of type a reference to the class, i.e., classname&, is called the copy constructor even if the constructor does not make a copy.)
  4. Finally, the fourth declaration is just a syntactic variation of the third.
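The choice of constructor can be checked with a small variant of the Point class. The made_by field and the get_x and get_y methods are additions for illustration only; made_by records which of the three constructors ran.

```cpp
// Variant of the Point class; made_by is a hypothetical field that records
// which constructor (#1, #2, or #3) initialized the object.
class Point {
  double x, y;
public:
  int made_by;
  Point() : x(0), y(0), made_by(1) {}                         // #1, default
  Point(const Point& p) : x(p.x), y(p.y), made_by(2) {}       // #2, copy
  Point(double xp, double yp) : x(xp), y(yp), made_by(3) {}   // #3
  double get_x() const { return x; }
  double get_y() const { return y; }
};
```

Declaring Point p; Point p2(1.,2.); Point p3(p2); Point p4 = p2; and then inspecting made_by gives 1, 3, 2, and 2 respectively, confirming that the last form is initialization by the copy constructor rather than an assignment.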