Programming Languages

Start Lecture #11

Point *q1, *q2, *q3; // no constructor calls
q1 = new Point(); // calls default constructor
q2 = new Point(1.,2.); // calls #3
q3 = new Point(*p1); // calls copy constructor

The above examples were all stack allocations. To the right we see examples with heap allocation. The first line declares three pointers. Since the qi are pointers, not Points, no constructors are invoked. The next three lines are the heap analogues of the corresponding lines above, and hence the constructor calls are the same as above.

9.3.3: Execution Order

Next we must deal with inheritance. Assume we have a derived class ColoredPoint based on Point as shown below on the right. We need to execute the constructors of both the base and the derived class, and must do so in that order, so that when the derived-class constructor executes it is dealing with a fully initialized object of the base class. The client code specifies the appropriate argument(s) to the derived-class constructor but not the arguments to the base class (the client is dealing with ColoredPoints and should not be thinking about constructing a Point).

enum Color {red, blue, green, yellow} ;
class ColoredPoint : public Point {
  Color color;
public:
  ColoredPoint (Color c) : Point(), color(c) {
  }
  ColoredPoint (double x, double y, Color c) :
    Point(x,y), color(c) {
  }
  Color getColor () { return color; }
  void display () { ...}  // now in color
};

Consider the first constructor on the right. Although it has exactly one parameter, it is not a copy constructor since that parameter is not of type ColoredPoint&. Let's examine carefully the funny stuff between the colon and the braces.

The first item is Point(). Point is a class, so we are invoking the default constructor for that class, which, we recall, sets the coordinates to zero. The compiler guarantees that the base-class constructor (Point in this case) is executed before the body of the derived-class constructor (ColoredPoint) begins. The second item, color(c), names a field of the current class (ColoredPoint), so we are simply setting that field to c, the parameter of the constructor we are writing. Since this is all that is needed, there is no code between the braces.

The second constructor has three parameters. The first two are passed to the Point constructor, where they become the coordinates of the constructed point. The third again is used to set the color. As before, nothing further is needed between the braces.

We mentioned previously that the material between the colon and the braces (the member initializer list) was, in simple cases, equivalent to placing the obvious analogues between the braces, where you would have expected them. We can now see some differences between putting items between the colon and the braces vs inside the braces. First, the items between the colon and the braces are executed before the constructor body. Moreover, there is a subtle point that applies even without derived classes. To date all the fields of our classes have been simple types like float, enum, or a pointer. Assume instead that a field f is of another class, say C1, and that obj1 is an object of class C1. Then placing f(obj1) between the colon and the braces results in calling the copy constructor of C1; whereas placing f = obj1; between the braces results in the default constructor of C1 being called first and then an assignment of obj1 overriding the default value.
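The difference can be observed directly. Below is a minimal sketch in which a hypothetical class C1 counts how often its default constructor, copy constructor, and copy assignment operator run; the holder classes ViaInitList and ViaBody and the helper resetCounts are also our own illustrative names.

```cpp
#include <cassert>

// Hypothetical class that counts which special member functions run.
struct C1 {
  static int defaults, copies, assigns;
  C1() { ++defaults; }
  C1(const C1&) { ++copies; }
  C1& operator=(const C1&) { ++assigns; return *this; }
};
int C1::defaults = 0;
int C1::copies = 0;
int C1::assigns = 0;

void resetCounts() { C1::defaults = C1::copies = C1::assigns = 0; }

// Field initialized between the colon and the braces: one copy construction.
class ViaInitList {
  C1 field;
public:
  ViaInitList(const C1& obj1) : field(obj1) {}
};

// Field set inside the braces: default construction first, then assignment.
class ViaBody {
  C1 field;
public:
  ViaBody(const C1& obj1) { field = obj1; }
};
```

Constructing a ViaInitList from obj1 runs only C1's copy constructor, while constructing a ViaBody runs C1's default constructor and then its assignment operator.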

9.3.4: Garbage Collection

When a C++ object is destroyed the destructors for all the classes involved are called in the reverse order the constructors were called, i.e., from most derived all the way back to base.
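A minimal sketch of both orders, using hypothetical classes Base and Derived that record each constructor and destructor call in a global vector we call trace (our own name, used only for illustration):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Global log of constructor/destructor calls (illustrative only).
std::vector<std::string> trace;

struct Base {
  Base()  { trace.push_back("Base()"); }
  ~Base() { trace.push_back("~Base()"); }
};

struct Derived : Base {
  Derived()  { trace.push_back("Derived()"); }
  ~Derived() { trace.push_back("~Derived()"); }
};

void makeAndDestroy() {
  Derived d;  // constructors run base first, then derived
}             // destructors run here: derived first, then base
```

After a call to makeAndDestroy, trace holds Base(), Derived(), ~Derived(), ~Base(), in that order.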

The primary use of destructors in C++ is for manual storage reclamation. Thus OOP languages like Java and C# that include automatic garbage collection have little use for destructors. Indeed, the programmer cannot tell when garbage collection will occur, so a destructor would be a poor idea. Instead, Java offers finalize (C# has the analogous finalizer, written with destructor syntax), which is called just prior to the object being garbage collected, whenever that happens to be, and possibly never. However, finalization is not widely used, and Java has deprecated finalize since Java 9.

Java Differences

Here we show some differences between C++ and Java.

The Meaning of Protected

In Java, protected is extended to mean accessible within the class, within its derived classes, and anywhere within the package in which the class is declared.

No Private or Protected Derivation

Recall the line class queue : private list { from our early C++ code. Java permits only public extensions. Thus whereas a C++ derivation can decrease, but not increase, the visibility of the base class, a Java derivation can neither decrease nor increase the visibility of the base class.
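For contrast, here is a minimal C++ sketch of what private derivation buys. List and Queue are hypothetical stand-ins for the list and queue classes of the earlier code; in this sketch List simply wraps std::list<int>.

```cpp
#include <cassert>
#include <list>

// Stand-in for the earlier "list" class (an assumption: it wraps std::list<int>).
class List {
  std::list<int> items;
public:
  void pushBack(int x)  { items.push_back(x); }
  void pushFront(int x) { items.push_front(x); }
  int  popFront() { int x = items.front(); items.pop_front(); return x; }
  bool empty() const { return items.empty(); }
};

// Private derivation: List's public members become private in Queue,
// so clients see only the queue operations declared here.
class Queue : private List {
public:
  void enqueue(int x) { pushBack(x); }
  int  dequeue() { assert(!empty()); return popFront(); }
  using List::empty;   // selectively re-expose one base member
};
```

Client code can call enqueue, dequeue, and empty on a Queue, but a call such as q.pushFront(0) would not compile. This narrowing of the base class's visibility is exactly what a Java extends clause cannot express.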

  class Point {
    private double x,y;
    public Point () { this.x = 0;  this.y = 0; }
    public Point (double x, double y) { this.x = x;  this.y = y; }
    public void move (double dx, double dy) { x += dx;  y +=dy;  }
    public void display () { ... }
  }

  class ColoredPoint extends Point {
    private Color color;
    public ColoredPoint (double x, double y, Color c) {
      super (x,y);
      color = c;
    }
    public ColoredPoint (Color c) {
      super (0.0, 0.0);
      color = c;
    }
    public Color getColor() { return color; }
    public void display () { ... }  // now in color
  }

  Point p1 = new Point();
  Point p2 = new Point(1.5, 7.8);
  Point p3 = p2;  // no constructor called

The Point and Colored Point Classes

On the right we see the Java equivalent of the Point and ColoredPoint classes. The syntax is similar, but as we see there are differences. Most of these differences are small: the base-class constructor is invoked as super(...), each member is tagged public or private, extends is used instead of a colon, etc.

However, the last line reveals a real difference. In C++ the corresponding line would invoke the copy constructor, and p3 would be a new point (independent of p2) that happens to be initialized to the current value of p2. In Java we do NOT have a new point. This is a consequence of Java having reference, not value, semantics for objects. The result is that p3 is just another reference to the point that p2 refers to. Changing that point (not just the reference) through either p2 or p3 is visible through the other.
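For comparison, a C++ sketch of both behaviors, using our own simplified Point with public fields: copying the object (what C++ does for Point p3 = p2) versus copying a pointer (which mimics what Java does). The function names valueCopy and referenceCopy are hypothetical.

```cpp
#include <cassert>

// Simplified Point with public fields (an assumption for this example).
struct Point {
  double x, y;
  Point(double x0 = 0, double y0 = 0) : x(x0), y(y0) {}
};

// C++ value semantics: p3 is a new, independent Point (copy constructor).
double valueCopy() {
  Point p2(1.5, 7.8);
  Point p3 = p2;      // copies the object
  p3.x = 100;         // does not affect p2
  return p2.x;        // still 1.5
}

// Java-like reference semantics: copy the pointer, not the object.
double referenceCopy() {
  Point* p2 = new Point(1.5, 7.8);
  Point* p3 = p2;     // no constructor called; both point to one object
  p3->x = 100;        // visible through p2 as well
  double result = p2->x;
  delete p2;          // only one object exists, so only one delete
  return result;
}
```

valueCopy returns 1.5, while referenceCopy returns 100.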

I believe this last difference is important enough to explain with another example of C++ and Java code for a Point class.


public class Point {
    public int x, y;
    public Point() {this.x=0; this.y=0;}
    public static void main (String[] args) {
	Point pt1 = new Point();
	Point pt2 = new Point();
	Point p = pt1;
	pt2 = pt1;
	System.out.println ("pt1.x is " + pt1.x);
	System.out.println ("pt2.x is " + pt2.x);
	System.out.println ("p.x is " + p.x);
	pt1.x = 1;
	System.out.println ("pt1.x is " + pt1.x);
	System.out.println ("pt2.x is " + pt2.x);
	System.out.println ("p.x is " + p.x);
    }
}

pt1.x is 0
pt2.x is 0
p.x is 0
pt1.x is 1
pt2.x is 1
p.x is 1

Value vs Reference Semantics; Shallow vs Deep Copies

Remark: We did this section last time. But review it since it was tricky and surprising.

Java: The Java code on the right declares a very simple Point class. It has just 2 data fields, x and y, both integers. The default constructor simply sets the two fields to zero.

The only method is main, which begins execution. It instantiates two points, pt1 and pt2, and declares a third variable p, which is set equal to pt1. As mentioned above, Java has reference semantics for objects, so all three variables are references to points.

Two points have been created and are referred to by pt1 and pt2; p refers to the first point as well. Both points have zero x components. Then pt1 is assigned to pt2. This has consequences! Due to reference semantics, pt2 and pt1 now refer to the same point. There are no remaining references to the point previously referred to by pt2, and hence it can be garbage collected. Thus we have only one actual point with three references to it. Sure enough, changing the x component of pt1 changes the x component seen through all three references.

(Figure: value vs. reference semantics and shallow vs. deep copies, Java version.)
#include <stdio.h>
#include <stdlib.h>
class Point {
public:
  int x, y, w[2], *z;
  Point() : x(0), y(0) {
    w[0] = w[1] = 3;
    z = new int[2];
    z[1] = 9;
  }
};
int main(int argc, char *argv[]) {
  Point pt1, pt2;
  pt2 = pt1;
  printf("pt1: %d %d %d\n",pt1.x,pt1.w[1],pt1.z[1]);
  printf("pt2: %d %d %d\n",pt2.x,pt2.w[1],pt2.z[1]);
  pt1.x = 1;  pt1.w[1] = 5;  pt1.z[1] = 6;
  printf("pt1: %d %d %d\n",pt1.x,pt1.w[1],pt1.z[1]);
  printf("pt2: %d %d %d\n",pt2.x,pt2.w[1],pt2.z[1]);
  return 0;
}
  
pt1: 0 3 9
pt2: 0 3 9
pt1: 1 5 6
pt2: 0 3 6

C++: The C++ code illustrates both value semantics and shallow copies. To see the value semantics, just concentrate on the x field in the Point class. As in the Java code x is initialized to zero by the default constructor. The two points are simply declared, but with value semantics, this declaration creates points (via the default constructor). Assigning one to the other copies the point not the reference. Hence we still have two points each referenced by one variable and, therefore, changing the x component of one point does not affect the other.

Note the addition of w and z to the class Point. Each is in a sense an array of two integers (e.g., I print each using array notation). However, w is stored inside the object, so the assignment copies its two elements, and changing pt1.w[1] leaves pt2.w[1] at 3. By contrast, z is a pointer to heap-allocated storage, and the assignment of pt1 to pt2 results in only a shallow copy. That is, only the pointer is copied, not the corresponding integers. Thus changing pt1.z[1] changes the corresponding field of pt2 as well. Also, the heap space allocated in creating pt2.z is now unreferenced and hence inaccessible. Thus, we have leaked memory.
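One way to get a deep copy without writing any copying code is to let a member that already copies deeply own the heap data. A sketch, under the assumption that we replace the raw pointer z with a std::vector<int> (our substitution, not part of the original code; copyThenModify is a hypothetical helper mirroring main above):

```cpp
#include <cassert>
#include <vector>

// Variant of the Point class above: z is now a std::vector<int>.
class Point {
public:
  int x, y, w[2];
  std::vector<int> z;
  Point() : x(0), y(0), z(2) {   // z owns two ints, both initially zero
    w[0] = w[1] = 3;
    z[1] = 9;
  }
};

// Mirrors main() above: assign, then modify pt1 only.
void copyThenModify(Point& pt1, Point& pt2) {
  pt2 = pt1;                       // vector's operator= copies the elements
  pt1.x = 1;  pt1.w[1] = 5;  pt1.z[1] = 6;
}
```

The compiler-generated assignment now calls the vector's assignment operator, which copies the elements into pt2's own storage, so pt2.z[1] stays 9 and no memory is leaked.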

(Figure: value vs. reference semantics and shallow vs. deep copies, C++ version.)

Homework: CYU 23, 26, 30 (substitute Java for Eiffel).

9.4: Dynamic Method Binding

Referring back to the Point/ColoredPoint pair of classes, we see that a ColoredPoint has all the properties of a Point (plus more). So any place where a Point would be allowed, we should be allowed to use a ColoredPoint.

Recalling Ada type terminology, the derived class acts like a subtype of the base class.

Definition: (Presumably) for this reason, the ability of a derived-class object to be used where a base-class object is expected is called subtype polymorphism.

ColoredPoint *cp1 =
  new ColoredPoint (2.,3.,red);
Point *p1 = cp1;
p1->display();

Consider the code on the right. We create a ColoredPoint pointed to by cp1 and declare a Point pointer (sorry for the name) p1. We initialize the second pointer using the first one. This looks like a type mismatch, but is OK by subtype polymorphism. The question is: Which display() method is invoked?

If the answer is Point.display() we have static method binding. This seems to be the right answer since after all the type of p1 is pointer to Point.

If the answer is ColoredPoint.display() we have dynamic method binding. This seems to be the right answer since after all the object pointed to by p1 is a ColoredPoint.
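C++ lets us see both answers side by side. In this sketch (simplified classes; display() returns a string rather than printing, and the extra non-virtual method show() is our own addition), show() is bound statically and display() dynamically:

```cpp
#include <cassert>
#include <string>

class Point {
public:
  virtual ~Point() {}
  std::string show() { return "Point"; }              // static binding
  virtual std::string display() { return "Point"; }   // dynamic binding
};

class ColoredPoint : public Point {
public:
  std::string show() { return "ColoredPoint"; }
  std::string display() override { return "ColoredPoint"; }
};
```

Through a Point* that actually points to a ColoredPoint, p1->show() yields "Point" (chosen from the pointer's type) while p1->display() yields "ColoredPoint" (chosen from the object's class).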

Semantics and Performance

Dynamic method binding does seem to have the preferred semantics. The principal argument in its favor is that if the base-class method is invoked on a derived object, it might not keep the derived object consistent.

Performance considerations favor static method binding. We shall see that dynamic binding requires each object to contain an extra pointer and also requires an additional pointer dereference when calling a method.

Language Choices

Smalltalk, Objective-C, Modula-3, Python, and Ruby use dynamic binding for all methods. Java and Eiffel use dynamic method binding by default. This can be overridden with final in Java or frozen in Eiffel.

Simula, C++, C# and Ada 95 use static method binding by default, but permit dynamic method binding to be specified (see below).

9.4.1: Virtual and Nonvirtual Methods

In Simula, C++, and C# a base class declares as virtual any method for which dynamic dispatch is to be used. For example the move method of the Point class would be written

    virtual void move (double dx, double dy) { x += dx;  y += dy; }
  

We shall see below how a vtable is used to implement virtual functions with only a small performance penalty.

Ada 95 uses a different mechanism for virtual functions that we shall not study at this time.

class DrawableObject {
public:
  virtual void draw() = 0;
};

abstract class DrawableObject { public abstract void draw(); }

9.4.2: Abstract Classes

In most OO languages, it is possible to omit the body of a virtual method in the base class, requiring that it be overridden in derived classes. A method having no body is normally called an abstract method (C++ terms it pure virtual) and a class with one or more abstract methods is called an abstract class.

As shown on the right, C++ indicates an abstract method by setting it equal to zero; Java (in my view more reasonably) labels both the method and class abstract.

Naturally, no objects can be declared to be of an abstract class since at least one of its methods is missing.

The purpose of an abstract class is to form a base for other concrete classes. Abstract classes are useful for defining the API when the implementation is unknown or must be hidden completely.
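A minimal sketch of an abstract base with two concrete derived classes. Here draw() returns a string instead of drawing (an assumption so the result can be checked), and Circle, Square, and drawTwice are hypothetical names:

```cpp
#include <cassert>
#include <string>

class DrawableObject {
public:
  virtual ~DrawableObject() {}
  virtual std::string draw() const = 0;    // pure virtual: no body
  // A concrete method written in terms of the abstract one.
  std::string drawTwice() const { return draw() + draw(); }
};

class Circle : public DrawableObject {
public:
  std::string draw() const override { return "O"; }
};

class Square : public DrawableObject {
public:
  std::string draw() const override { return "[]"; }
};

// DrawableObject d;   // would not compile: DrawableObject is abstract
```

Code written against DrawableObject (such as drawTwice) works for any concrete subclass, even though the base class itself supplies no drawing code.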

Classes all of whose members are abstract methods are called interfaces in Java, C#, and Ada 2005. Note that interfaces contain no instance data fields.

9.4.3: Member Lookup

How can we implement dynamic method dispatch? That is, how can we arrange that the method invoked depends on the class of the object and not on the type of the variable?

The method to call at each point cannot be determined by the compiler, since it is easy to construct an example in which the class of the object referred to at a point in the code depends on the control flow up to that point. Thus some run-time calculation is needed, and the method of choice is as follows. A virtual method table (vtable) is established for each class with one or more virtual methods; the table contains a pointer to the code of each of the class's virtual methods.

class B { // B for base
  int a;
public:
  virtual void f() {...}
  int          g() {...}
  virtual void h() {...}
  virtual void j() {...}
} b;
class D : public B {
  int w;
public:
          void f() {...}
  virtual void h() {...}
  virtual void z() {...}
} d;
  
(Figure: vtable layouts for classes B and D.)

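When a class is derived from this base class, the derived vtable starts with the base vtable; then the entries for methods the derived class overrides (f and h for D) are replaced by pointers to the derived versions, and entries for any new virtual methods the derived class declares (z for D) are appended. The non-virtual method g gets no entry.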

When an object is created it contains a pointer to the vtable of its class. When a virtual method invocation is called for, we follow the object's pointer to the class vtable and then follow the appropriate pointer in the vtable to the correct method.
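To make the mechanism concrete, here is a hand-built imitation of vtable dispatch for the classes B and D above, using plain structs and function pointers. This is only a sketch: the names Object, VTable, Method, callF, and callJ and the integer return values are our own assumptions, D's field w is omitted, and real compilers generate the hidden pointer and lay out tables in their own way.

```cpp
#include <cassert>

struct Object;                             // forward declaration
typedef int (*Method)(Object*);            // every "virtual method" receives self

struct VTable { Method f, h, j, z; };      // slot z is used only by D

struct Object {
  const VTable* vptr;                      // pointer to the class's vtable
  int a;
};

int B_f(Object*) { return 1; }
int B_h(Object*) { return 2; }
int B_j(Object*) { return 3; }
int D_f(Object*) { return 10; }
int D_h(Object*) { return 20; }
int D_z(Object*) { return 40; }

// D's table is B's table with the overridden slots (f, h) replaced and the
// new slot (z) filled in. The non-virtual g needs no slot.
const VTable B_vtable = { B_f, B_h, B_j, nullptr };
const VTable D_vtable = { D_f, D_h, B_j, D_z };

// A "virtual call": follow the object's vptr, then the table entry.
int callF(Object* obj) { return obj->vptr->f(obj); }
int callJ(Object* obj) { return obj->vptr->j(obj); }
```

The cost of a virtual call is visible here: one load to reach the table and a second to fetch the method pointer, plus the extra vptr word in every object.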

Converting Between Base and Derived Classes

#include <stdio.h>
class B {
public:
    int a;
    virtual void f() {printf("f()\n");}
} bg, *pbg;

class D : public B { public: int y,w; } dg, *pdg;
int main (int argc, char *argv[]) {
  B bl, *pbl;
  D dl, *pdl;
  printf ("main\n");

  bg = dg;
  // bg = dynamic_cast<B>(dg);   not for objects
  bg = (B)dg;

  // dg = bg;                    type error
  // dg = dynamic_cast<D>(bg);   not for objects
  // dg = (D)bg;                 type error

  pbg = &bg;  pdg = &dg;
  pbg = pdg;                     pbg->a = 1;
  pbg = dynamic_cast<B*>(pdg);   pbg->a = 1;
  pbg = (B*)pdg;                 pbg->a = 1;

  pbg = &bg;  pdg = &dg;
  // pdg = pbg;                  compile-time type error
  pdg = dynamic_cast<D*>(pbg);   // yields null: pbg points to a B
  // pdg->y = 1;                 seg fault
  pdg = (D*)pbg;
  pdg->y = 1;                    // "works" (undefined behavior)
  pdg->w = 1;                    // "works", presumably lucky
  return 0;
}

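C++: On the right we see some C++ code that uses different methods to convert between a base class B and a derived class D. The variable names follow a convention that keeps their properties straight: a leading p marks a pointer, b or d tells whether the (pointed-to) type is the base or the derived class, and a trailing g or l marks a global or a local variable.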

Some of the lines are commented out. These generated compile or run time errors as stated in the comment. The version as shown compiles and runs.

The first set of attempted conversions tries to assign dg to bg. Recall that the first is a global variable containing an object of the derived class and the second is a global variable containing an object of the base class. This is supposed to work (an object of the derived class can be used where a base object is expected), and indeed it does, either naked, as in the first line, or with an explicit cast, as in the third. Only the B part of dg is copied; the extra fields y and w are sliced off. The second line is an erroneous attempt to use for objects a feature designed for pointers (see below).

Next we try the reverse assignment, which fails. As expected, we cannot use a base object where a derived object is expected. Recall that the derived class typically extends the base so contains more information.

Now we move from objects to pointers. The first two lines initialize the pointers to appropriate objects for their types. Setting the base pointer to the derived pointer is legal: since the derived object can be considered a base object, the base pointer can point to it. Now the dynamic_cast is being used correctly. It checks at run time whether casting the pointer is type correct. If so, it performs the cast; if not, it converts the pointer to null (which is type correct, but often leads to segmentation faults, see below). We also do the sometimes risky nonconverting type cast, which here is type correct.

Finally, we reinitialize the base and derived pointers to point to base and derived objects and then try the invalid assignment of the base pointer to the derived pointer, which would result in the derived pointer pointing to a base object. A naked assignment is rejected by the compiler. The dynamic cast does its thing and returns the null pointer, which results in a segmentation fault when the pointer is used. The nonconverting type cast is accepted, but the result is that the derived pointer points to a base object, which has only one data component, a. Assigning to y and w therefore writes past the end of bg; this is undefined behavior, and that it happens not to crash here is luck.
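The safe pattern is to test the result of dynamic_cast before using it. A sketch with classes shaped like B and D above (B needs at least one virtual function for dynamic_cast to apply; here a virtual destructor serves); safeSetY is a hypothetical helper:

```cpp
#include <cassert>

class B {
public:
  int a = 0;
  virtual ~B() {}               // makes B polymorphic
};

class D : public B {
public:
  int y = 0, w = 0;
};

// Returns true when the downcast succeeds; never dereferences a null result.
bool safeSetY(B* pb, int value) {
  D* pd = dynamic_cast<D*>(pb);     // run-time check of the object's class
  if (pd == nullptr) return false;  // pb does not point to a D
  pd->y = value;
  return true;
}
```

Passing the address of a B object returns false without touching memory, while passing the address of a D object sets its y field.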

public class B {
    public int a;
    public void f() {System.out.println("f()");}
}

public class D extends B {
    public int y,w;
    public void g() {System.out.println("g()");}
}

public class M {
    public static void main (String[] args) {
        B bg = new B();
        D dg = new D();
        System.out.println("main");
        bg = dg;
        bg = (B)dg;

        bg = new B();
        dg = new D();
        // dg = bg;      type error
        // dg = (D)bg;   compiles, but throws ClassCastException at run time

        bg = new D();
        // dg = bg;      type error
        dg = (D)bg;
    }
}

Java: On the right we see a Java example. Although shown here as one listing, each public class must actually be in its own .java file. Java does not have C++ (really C) pointers, so we don't use the variables beginning with p as we did in the C++ example above. But remember that, for objects, Java uses reference, not value, semantics, so all the variables are in a sense pointers; in particular, an assignment statement changes the pointer, not the object.

The first group of statements shows that, as with C++, we can use a derived object where a base is expected; the explicit cast is optional.

In the second group we try to use a base object where a derived object is expected. Note that in this group the base variable refers to a base object and the derived variable refers to a derived object (unlike the next group). The first line is a naked assignment and results in a compile-time type error. In the second we employ a cast, which does compile (see the next group), but then generates a run-time error (a ClassCastException) since a base object cannot be viewed as a derived object.

The last group is perhaps the most interesting. As before, the derived variable refers to a derived object; however, the base variable refers to a derived, not a base, object. This is perfectly legal; for example, this is the state that occurred after the first group of statements was executed. It is still true that we cannot assign the base variable to a derived variable with a naked assignment (type error). However, the cast works because the error found in the second group came from a dynamic, i.e., run-time, check. This time bg does refer (i.e., point) to a derived object, and hence the type can be converted and the assignment made.

More C++ vs Java: Terminology


Java                        C++
--------------------------  ---------------------------------------
Method                      Virtual member function
Public/protected/private    Similar
Static members              Same
Abstract methods            Pure virtual member functions
Interface                   Pure virtual class with no data members
Interface implementation    Inheritance from an abstract class
fun mkAdder addend = (fn arg => arg+addend);

val add10 = mkAdder 10;
add10 25
#include <stdio.h>
class Adder {
  int addend;
public:
  Adder (int i) : addend(i) {}
  int operator() (int arg) { return arg+addend; }
};

int main (int argc, char *argv[]) {
  Adder f = Adder(10);
  printf("f(25)=%d\n",f(25));
  return 0;
}

Objects vs First-Class Functions

Using an Object to Produce a First-Class Function

Here is a clunky implementation of a simple first-class function via an object. The ML function mkAdder returns a function that adds the given addend. In the second frame we use mkAdder to produce add10, which we use in the third frame (it gives an answer of 35).

In the next frame is a C++ class that very cleverly does the same thing. It is tested in the next frame and the answer is again 35.
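Since C++11 the language can write mkAdder almost as directly as ML does. In this sketch the lambda captures addend by value; conceptually the compiler generates a class much like Adder, with addend as a field and an operator(), although the exact closure type is up to the compiler. Wrapping it in std::function is our choice, made so the function can be returned with a nameable type.

```cpp
#include <cassert>
#include <functional>

// mkAdder in C++11 style: returns a closure that remembers addend.
std::function<int(int)> mkAdder(int addend) {
  return [addend](int arg) { return arg + addend; };
}
```

As in ML, mkAdder(10) produces add10, and add10(25) gives 35.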


class Account { // Java
  private float theBalance;
  private float theRate;
  Account (float b, float r) {theBalance=b; theRate=r;}
  public void deposit (float x) {
    theBalance = theBalance + x;
  }
  public void compound () {
    theBalance = theBalance * (1.0f + theRate);  // the field is theRate; 1.0f keeps the result a float
  }
  public float balance () { return theBalance; }
}

(define Account
  (lambda (b r)
    (let ((theBalance b) (theRate r))
      (lambda (method)
        (cond ((eq? method 'deposit)
               (lambda (x) (set! theBalance (+ theBalance x))))
              ((eq? method 'compound)
               (set! theBalance (* theBalance (+ 1.0 theRate))))
              ((eq? method 'balance) theBalance))))))

Using a First-Class Function to Produce an Object

This time we are given a Java class that maintains a bank balance and has methods for making a deposit, applying interest, and retrieving the current balance. When you create an instance of the class you give it an initial balance and an interest rate to be applied whenever you invoke the compound method (the name is chosen to suggest compound interest).

The frame below gives a Scheme implementation using a function Account that produces another function as a result. It also uses two bangs (set! appears twice) to update the balance.

Rather than give two more frames with the usage and answers, I copied the file over to access.cims.nyu.edu so we can hopefully see it work here.
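The same idea can be sketched in C++ with lambdas. This is our own approximation, not a translation of the lecture files: the state lives in a heap-allocated AccountState shared by the closures (playing the role of the variables bound by let), and instead of dispatching on a symbol as the Scheme version does, makeAccount returns a record of three closures, since static typing makes a single dispatcher with differing return types awkward. All names are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <memory>

struct AccountState { double theBalance, theRate; };

// The "object": a bundle of closures that share one AccountState.
struct Account {
  std::function<void(double)> deposit;
  std::function<void()> compound;
  std::function<double()> balance;
};

Account makeAccount(double b, double r) {
  auto s = std::make_shared<AccountState>(AccountState{b, r});
  Account acct;
  acct.deposit  = [s](double x) { s->theBalance = s->theBalance + x; };
  acct.compound = [s]() { s->theBalance = s->theBalance * (1.0 + s->theRate); };
  acct.balance  = [s]() { return s->theBalance; };
  return acct;
}
```

For example, an account created with balance 100 and rate 0.5 reports a balance of 300 after a deposit of 100 followed by compound; each call to makeAccount gets its own independent state.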


9.4.4: Polymorphism

We will learn generics soon and have seen hints already. The idea is that you give a type variable as a parameter of a generic and then instantiate the generic for various specific types. This is sometimes called explicit parametric polymorphism, as opposed to the subtype polymorphism offered by inheritance.

Thus generics are useful for abstracting over unrelated types, which is something inheritance does not support.
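A small preview in C++, where generics are called templates. The function largest and class Pair are hypothetical examples; the point is that int, double, and std::string share no common base class, yet one definition serves them all, requiring only that T supports operator<.

```cpp
#include <cassert>
#include <string>

// A generic function: T is a type parameter; the compiler instantiates
// largest<int>, largest<std::string>, ... as needed.
template <typename T>
const T& largest(const T& a, const T& b) {
  return (a < b) ? b : a;
}

// A generic class holding two values of the same type.
template <typename T>
class Pair {
  T first, second;
public:
  Pair(const T& f, const T& s) : first(f), second(s) {}
  const T& max() const { return largest(first, second); }
};
```

largest(3, 7) yields 7, largest on the strings "pear" and "apple" yields "pear", and Pair<double>(2.5, 1.0).max() yields 2.5.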

The Circle and the Ellipse: Subtype Polymorphism Gone Wrong

Every circle is an ellipse so it makes sense to derive a Circle class from an Ellipse class and, by subtype polymorphism, to permit a circle to be supplied when an ellipse is expected.

But this doesn't always work. A reasonable method to have in the Ellipse base class is one that enlarges the ellipse by stretching it in two directions, parallel and perpendicular to its major axis. But if the two expansion coefficients are different, a circle would not remain a circle.
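A sketch of the problem, with hypothetical Ellipse and Circle classes (semi-axes a and b; the method names stretch, widen, and isCircle are our own). The compiler happily accepts a Circle where an Ellipse is expected, and the Ellipse method then breaks the Circle's invariant:

```cpp
#include <cassert>

class Ellipse {
protected:
  double a, b;   // semi-axes: a along the major axis, b perpendicular to it
public:
  Ellipse(double a0, double b0) : a(a0), b(b0) {}
  virtual ~Ellipse() {}
  // Stretch parallel (fa) and perpendicular (fb) to the major axis.
  void stretch(double fa, double fb) { a *= fa;  b *= fb; }
  double axisA() const { return a; }
  double axisB() const { return b; }
};

class Circle : public Ellipse {
public:
  explicit Circle(double r) : Ellipse(r, r) {}
  bool isCircle() const { return a == b; }   // the invariant a Circle should keep
};

// Code written for ellipses, legally handed a Circle by subtype polymorphism.
void widen(Ellipse& e) { e.stretch(2.0, 1.0); }
```

After widen is applied to a Circle of radius 1, its axes are 2 and 1, so the object is no longer a circle even though its class still says it is.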

Homework: CYU 31, 32, 36, 37, 38.

9.5: Multiple Inheritance

9.6: Object-Oriented Programming Revisited

An interesting read; I recommend you do so.

9.A: Effective C++ (Scott Meyers)

#include <string.h>  // strlen, strcpy

class String {
private:
  char *data;
public:
  String(const char *value) {
    if (value) {
      data = new char[strlen(value) + 1];
      strcpy(data, value);
    }
    else {
      data = new char[1];
      *data = '\0';
    }
  }
  ~String() { delete [] data; }
  ... // no copy constructor or operator=
};

String a("Hello");
{                      // introduces a local scope
  String b("World");
  b = a;
}
String c = a;          // Same as String c(a);

Meyers's books Effective C++ and More Effective C++ offer suggestions on good C++ style. I show just a few, using material based on Prof. Barrett's notes.

Constructors, Destructors, and Assignment Operators

Item 11

Meyers's item 11 states

Declare a copy constructor and an assignment operator for classes with dynamically allocated memory.

We saw an example previously where the shallow copy performed by the compiler-generated assignment operator failed to copy the dynamic memory, copying instead the pointer to that memory. The compiler-generated copy constructor behaves the same way.

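Look at the code on the right and notice that the assignment b = a copies only the pointer, so the memory holding "World" is leaked and a and b now share the array holding "Hello"; that when b goes out of scope at the closing brace, its destructor deletes the shared array, leaving a.data dangling; and that String c = a invokes the compiler-generated copy constructor, which copies the dangling pointer, so the same array is deleted again when a and c are destroyed.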

What is needed is a copy constructor (and an assignment operator) that performs a deep copy: allocate new memory and copy the characters, rather than just copying the pointer. This is emphasized in lab 3.
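A sketch of the String class above with the two members Item 11 asks for. The private helper copyOf and the accessor c_str are our own additions; the constructor keeps the original behavior of turning a null argument into an empty string.

```cpp
#include <cassert>
#include <cstring>

class String {
private:
  char *data;
  // Helper: allocate a private copy of value ("" if value is null).
  static char* copyOf(const char *value) {
    if (!value) value = "";
    char *p = new char[std::strlen(value) + 1];
    std::strcpy(p, value);
    return p;
  }
public:
  String(const char *value) : data(copyOf(value)) {}
  String(const String& rhs) : data(copyOf(rhs.data)) {}   // deep copy
  String& operator=(const String& rhs) {
    if (this != &rhs) {                 // Item 17: check for self-assignment
      char *fresh = copyOf(rhs.data);   // allocate and copy first,
      delete [] data;                   // then release the old buffer
      data = fresh;
    }
    return *this;                       // Item 15
  }
  ~String() { delete [] data; }
  const char* c_str() const { return data; }
};
```

With these members, the client code above behaves: b = a gives b its own copy of "Hello" (and frees "World"), b's destructor frees only b's buffer, and c = a creates a third independent copy.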

#include <stdlib.h>
class Vector {
private:
    int size;
    int *A;
public:
    Vector(int s) : size(s) {A = new int[size];}
};

class Array {
private:
  Vector data;    // another class
  int size;
  int lBd, uBd;
public:
  Array(int low, int high)
    : lBd(low), uBd(high), size(high-low+1), data(size) {}
};

Item 13

Meyers's item 13 states

List members in an initialization list in the order in which they are declared.

The code on the right is flawed. The constructor for data will be passed an undefined value because size has not yet been initialized, even though it looks as though it has.

The reason is that members are initialized in the order they are declared, not in the order they are listed in the constructor.

So, to avoid confusion, when using an initialization list, you should always list members in the order in which they are declared.
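The rule is easy to observe. In this sketch each member appends its name to a global string (order, Tracer, and Holder are hypothetical names) while the initializer list deliberately names the members in the wrong order. Compilers such as g++ and clang++ can warn about this (-Wreorder), but initialization still follows the declaration order.

```cpp
#include <cassert>
#include <string>

std::string order;   // records construction order

struct Tracer {
  explicit Tracer(const char* name) { order += name; }
};

class Holder {
  Tracer first;    // declared first
  Tracer second;   // declared second
public:
  Holder() : second("2"), first("1") {}   // listed in the opposite order
};
```

Constructing a Holder leaves order equal to "12". For the Array class above, declaring size before data would make data(size) receive the intended value.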


int w, x, y, z;
w = x = y = z = 0;

C& operator=(const C& rhs) { ... return *this; }

Item 15

Meyers's item 15 states

Have operator= return a reference to *this.

The chained assignment in the first frame is quite common in C and C++, so you want to permit it for your classes, which is easy to do: just make sure that your operator= returns a reference to *this, as in the second frame.
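A minimal sketch with a hypothetical class C wrapping an int. Because assignment is right associative, w = x = y = z evaluates y = z first and uses its result (a reference to y) as the right-hand side of the next assignment, which is why operator= must return C&.

```cpp
#include <cassert>

class C {
  int value;
public:
  C(int v = 0) : value(v) {}
  C& operator=(const C& rhs) {
    value = rhs.value;
    return *this;        // lets w = x = y = z chain, as with built-in ints
  }
  int get() const { return value; }
};
```

With z holding 7, after w = x = y = z all four objects hold 7.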

Item 16

Meyers's item 16 states

Assign to all data members in operator=.

If you don't write an operator=, C++ generates one that does a memberwise (shallow) assignment, which is fine for simple data like integers but not for pointers to heap-allocated data. That is the reason you write operator=. But once you write operator=, C++ does nothing, not even the shallow assignment, for members you don't mention. So be sure to assign them all.

If this is a derived class, be sure to explicitly call the base class operator= (this applies to the copy constructor as well).
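A sketch with hypothetical classes Base and Derived. Because Derived declares its own copy operations, it must forward to the base-class versions explicitly; otherwise the copy constructor would default-construct the Base part and operator= would leave it unchanged.

```cpp
#include <cassert>
#include <string>

class Base {
  int id;
public:
  Base(int i = 0) : id(i) {}
  Base(const Base& rhs) : id(rhs.id) {}
  Base& operator=(const Base& rhs) { id = rhs.id; return *this; }
  int getId() const { return id; }
};

class Derived : public Base {
  std::string name;
public:
  Derived(int i, const std::string& n) : Base(i), name(n) {}
  // Without Base(rhs) here, Base's default constructor would run instead.
  Derived(const Derived& rhs) : Base(rhs), name(rhs.name) {}
  Derived& operator=(const Derived& rhs) {
    if (this != &rhs) {
      Base::operator=(rhs);   // assign the base-class part
      name = rhs.name;        // assign every member of our own (Item 16)
    }
    return *this;
  }
  const std::string& getName() const { return name; }
};
```

Both the copy and the assignment carry over the base-class id as well as the derived-class name.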

#include <algorithm>  // std::copy

class My_Array {
  int * array;
  int count;
public:
  My_Array & operator = (const My_Array & other) {
    if (this != &other) { // protect against invalid self-assignment
      // 1: allocate new memory and copy the elements
      int *new_array = new int[other.count];
      std::copy(other.array, other.array + other.count, new_array);

      // 2: deallocate old memory
      delete [] array;

      // 3: assign the new memory to the object
      array = new_array;
      count = other.count;
    }
    // by convention, always return *this
    return *this;
  }
    ...
};

Item 17

Meyers's item 17 states

Check for assignment to self in operator=.

Since operator= will delete the old contents of the left-hand side, you must check for the case where the client code is simply x=x;.

Wikipedia gives a good, fairly generic code example, which I copied on the right. Note that it makes sure to return *this.

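The principle followed is to proceed in this order: first allocate the new memory and copy the elements, then deallocate the old memory, and only then assign the new memory to the object. If the allocation throws an exception, the object is left unchanged.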