Object-Oriented Programming

CSCI-UA.0470-001

NYU, Undergraduate Division, Computer Science Course - Fall 2013

OOP Final Exam Study Guide (Fall 2013)

Contributors

Felicity Lu-Hill: "Definition of Object Oriented Programming" through "Version Control Systems (VCS) and Build System"

Jason Forsyth: "Inheritance" through "Implementation of Inheritance and Virtual Methods By Hand"

Andrew Gaffney: "Object Oriented Design As Seen Through Star Trek" through "Traversing Graphs and the Visitor Design Pattern"

Jacob Gofman: "Difference Between Method Overloading and Method Overriding" through "Operator Overloading"

Chen Tian: "What is a Smart Pointer?" through "C++ Casting"

Definition of Object Oriented Programming

  • Structures data and encapsulates it together with the procedures that operate on it
  • Models real-world objects
    • Greater flexibility
    • Good for large-scale software engineering

Introduction to Java

  • Java code progression: .java (goes into) >> javac (creates a) >> .class (goes into) >> java (results in) >> program execution
    • .java files are the Java source files
    • .class files contain Java byte code, which processors cannot execute directly
    • Byte code does not use registers; it is stack based, with a new stack frame per method invocation
    • Byte code is compiled in a “just in time” manner into native machine code by the Java VM
  • Java Virtual Machine (JVM)
    • JVM does not have registers, it is a stack-based architecture
    • The Java VM figures out referenced code in a .class file.
    • It links/loads libraries dynamically, meaning it only loads class files that are referenced from classes that are running.
    • It finds the class files through the class path (directories that contain the class that you’re referencing).
    • A JAR (Java Archive) file is functionally a zip file.
  • Just-in-time compiler
    • Java compiler compiles source code into byte code.
    • Byte code is an instruction set the Java language designers invented for their virtual machine (JVM).
    • It’s called “JIT” because it translates byte code into machine code just before it is needed.
  • Scope
    • A scope is the area of visibility for variables.
    • Static scopes: all the variables and methods of a class are visible within the class. Variables declared in a method are visible only within that method.
  • Java exception inheritance hierarchy, e.g.:
    • (everything that can be thrown is a Throwable) Object <= Throwable <= Exception <= RuntimeException <= IndexOutOfBoundsException <= ArrayIndexOutOfBoundsException
  • Java Exceptions:
    • Exceptions signal coding errors or things malfunctioning at runtime; they are handled with ‘try { }’ and ‘catch { }’ blocks.
    • ‘try { }’ and ‘catch { }’ blocks also hide implementation details.
  • Programming Notes:
    • Get a small scale version of the program working first before trying to add things.
    • Use descriptive names, important for others to be able to use.
    • Interface:
      • Hide any and all implementation details
      • Try to make it cohesive: all methods related to a single abstraction
      • Should be complete: supports all operations that are part of the abstraction that the class represents
      • Should also be convenient: not just complete but easy to use
      • Should have clarity: clear to whoever is using it
      • Should be consistent: operations in a class should be consistent with each other with respect to names, parameters, return values, and behavior
    • Java handles variables declared inside loops in such a way that there is no performance cost.
    • Don’t use hardcoded values, such as a literal number of iterations in a for loop; define a “public static final” constant to store the number.
    • Don’t concatenate strings in loops: each concatenation produces a new String, which is very inefficient (a StringBuilder avoids this). Concatenating outside loops is fine because the compiler can optimize a single concatenation expression.

C++ Program Compilation and Linking

  • C++ implementation files are named .cc or .cpp
  • Header files
    • Contains declarations of classes, methods, functions, globals, and constants
    • Named .h
  • Compilation process and static linking
    • .h/.cc => cpp (the preprocessor) => g++ => a.out
    • The preprocessor runs through the implementation files and combines them with the header files they include
    • The resulting file is compiled into a .o file
    • The .o file may be linked with other precompiled libraries
  • C++ Style, Structure, and Syntax
    • C++ coders tend to be terse
      • Names are usually lowercase, separated by underscores
    • Code Organization
      • Classes are organized into namespaces
      • Namespaces are delimited with ::, e.g. std::string references the string class from the standard library
    • C++ differs from Java
      • Folder structure doesn’t matter. In Java, the class org.example.Foo would be located in org/example/Foo.class
      • Namespaces, use :: instead of .
      • Visibility control differs from Java’s
        • We can’t specify that classes are public or private
        • We can make class members public or private.
        • If no specifier is given, members of a class are private
        • Specifiers are “private:” and “public:” and apply to all members following them until the next specifier
      • Method dispatch is non-virtual by default
        • Use the keyword “virtual” to make it dynamic
          • But this adds a vptr (8 bytes on a typical 64-bit platform) to each object
          • Starting with the highest superclass that has a virtual method
    • Immutability
      • Use “const” on a parameter for arguments that will not be modified in the function
      • When a method does not modify the object it is called on, the method should be marked “const” after the parameter list
      • If the function tries to change the referenced object, “const” prevents the program from compiling.
      • In C++, the distinction between a class’s contract and its implementation is made explicit.
      • The contract says that certain methods will not modify the instance. “const” says that in the implementation you must not modify the internal state of this instance.
    • Arrays
      • Arrays and their exact sizes can be specified in a class definition
      • Arrays of a constant size are inlined in the object, and are not objects in and of themselves, unlike arrays in Java.
    • Pointers
      • Pointers are denoted with “ * ”
      • These behave like C pointers
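    • In C++, “inline” asks the compiler to eliminate function call overhead by expanding the function body at the call site
    • Whether a plain char is signed or unsigned depends on the platform, so be specific (“signed char” or “unsigned char”) when it matters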
    • The field declaration order dictates the order in which an object is initialized
    • Public inheritance in C++ is like Java inheritance
  • References (the “&” symbol)
    • In implementation, they are like pointers
    • However, they cannot refer to NULL (pointers can be NULL)
    • They also do not support pointer arithmetic
    • References allow passing objects “Java-style,” instead of by value
    • Calling a method m() on a variable v:
      • T v: v.m() (on a value)
      • T& v: v.m() (on a reference)
      • T* v: v->m() (on a pointer)
  • The “this” keyword gives a pointer to the containing object, not a reference, so members must be accessed with “->”
  • If you do not add an “&”, C++ will pass the object by value (copy it), which is dangerous because objects can be very large.
  • Classes
    • Classes and their members are declared in the header
    • The implementation of the members goes in the .cc (or .cpp) file, and each name is declared with the class name as a prefix, e.g. Classname::methodName
    • As in Java, classes have a constructor that shares their name
    • If an instance is declared without being explicitly initialized, the constructor that takes no arguments is called automatically.
    • If an instance of a class is declared outside of any function (at global scope), its constructor is called before the “main” function starts
    • Must end class declarations in semicolons.
  • The Preprocessor and Directives
    • The preprocessor modifies the text of a program before it’s compiled
    • Instructions to the preprocessor, or directives, begin each line with a ‘#’ symbol.
    • Preprocessor directive lines do not end in semicolons
    • The “#define” directive is used to declare named constants
    • The “#include” directive includes the text of another file in the file that is being processed. This causes that file to be processed in turn
      • #include <foo> looks in the library header file path for the C++ header foo
      • #include <foo.h> looks in the library header path for the C header foo.h
      • #include "foo" looks in the user's path
      • Circular #includes may lead to infinite recursion; in this case, the compiler will stop at some point and print an error
      • To prevent these loops we have conditional directives
        • “#ifndef <name>” will include everything following it until ‘#endif’ if no constant “<name>” is defined
      • “#pragma” is used to denote compiler specific directives
  • C++ does not implement bounds checking for arrays. The programmer must implement bounds checking.
  • Errors for bounds checking can be thrown with the keyword “throw”.
  • C++ does not check that a value was actually returned, i.e. a function declared to return “int” will compile even if you don’t return a value. The behavior is then undefined; on x86, the caller typically sees whatever happens to be in EAX. (“int main()”, the entry point, is a special case: falling off its end is equivalent to returning 0.)
  • “if (variable = 0)” will compile, but it assigns 0 to the variable and is therefore always false, even though it was likely meant to be “if (variable == 0),” which compares instead of assigning. To avoid these errors, put the value on the left: “if (0 == variable)”. These are called ‘defensive ifs’.
  • C++ supports operator overloading, so classes can define how symbols in the language work on them.
    • One result of this is the “<<” and “>>” stream operators, which are used with “cin”, “cout” and other stream objects like “std::ostringstream” (which is used in Point.cc as a string builder) to control the input and output of a stream
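
The notes above on "const", references, pass-by-value, and manual bounds checking can be tied together in one small sketch. The class Polyline and the helper functions are hypothetical names invented for this illustration; they are not taken from the course code.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical class, used only to illustrate const methods and bounds checks.
class Polyline {
public:
    Polyline() : xs_(), count_(0) {}   // xs_() zero-initializes the array

    // Non-const method: modifies the object it is called on.
    void add(double x) {
        if (count_ >= kCapacity) throw std::out_of_range("Polyline is full");
        xs_[count_++] = x;
    }

    // Const method: promises not to modify the object.
    // C++ arrays are not bounds checked, so the check is written by hand.
    double at(std::size_t i) const {
        if (i >= count_) throw std::out_of_range("index out of bounds");
        return xs_[i];
    }

    std::size_t size() const { return count_; }

private:
    static const std::size_t kCapacity = 4;
    double xs_[kCapacity];   // fixed-size array, stored inline in the object
    std::size_t count_;
};

// Pass by const reference: no copy is made and the callee cannot modify p.
double sum(const Polyline& p) {
    double total = 0;
    for (std::size_t i = 0; i < p.size(); ++i) total += p.at(i);
    return total;
}

// Pass by value: the callee works on a copy; the caller's object is unchanged.
std::size_t add_to_copy(Polyline p) { p.add(1.0); return p.size(); }

// Pass by (non-const) reference: the callee modifies the caller's object.
std::size_t add_in_place(Polyline& p) { p.add(1.0); return p.size(); }
```

Calling p.add(...) inside sum would not compile, because p is a const reference and add is not a const method.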

Object Oriented Programming Design

  • Objects are entities in a computer program that have three characteristic properties:
    • State
    • Behavior
    • Identity
  • The collection of all information held by an object is the object’s state.
  • The behavior of an object is defined by the operations that the object supports.
  • Each object has its own identity, i.e. two different objects may have the same contents, yet the program can tell them apart.
  • Programming By Contract:
    • Precondition: the condition that must be true before the service provider promises to do its part of the bargain
    • Postcondition: the promise that every operation does the ‘right thing’ provided the preconditions are fulfilled
    • Invariants: conditions that are true before and after each method call (but can be violated within the method call)
  • Users should be unaware of implementation details
  • CRC cards and UML diagrams
    • Training wheels for OOP design
    • Sub-note: We do not have to draw a UML diagram for the final unless Professor Grimm is in an ‘evil mood’
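
As a rough illustration of programming by contract, the sketch below checks preconditions, postconditions, and a class invariant with assert. BoundedStack and all of its members are hypothetical names chosen for this example, and using assert is only one possible way to express a contract.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bounded stack; the contract is written out as assertions.
class BoundedStack {
public:
    explicit BoundedStack(std::size_t capacity) : capacity_(capacity) {
        assert(invariant());
    }

    bool empty() const { return items_.empty(); }
    bool full() const { return items_.size() == capacity_; }
    std::size_t size() const { return items_.size(); }

    // Precondition: the stack is not full.
    // Postcondition: the size grew by one and the new top is value.
    void push(int value) {
        assert(!full());                                   // precondition
        const std::size_t old_size = items_.size();
        items_.push_back(value);
        assert(items_.size() == old_size + 1);             // postcondition
        assert(items_.back() == value);                    // postcondition
        assert(invariant());                               // invariant restored
    }

    // Precondition: the stack is not empty.
    int pop() {
        assert(!empty());                                  // precondition
        const int top = items_.back();
        items_.pop_back();
        assert(invariant());
        return top;
    }

private:
    // Invariant: the stack never holds more than capacity_ items.
    bool invariant() const { return items_.size() <= capacity_; }

    std::size_t capacity_;
    std::vector<int> items_;
};
```

The caller is responsible for the preconditions (for example, checking full() before push); the class is responsible for the postconditions and the invariant.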

Version Control Systems (VCS) and Build System

  • Version Control Systems (VCS)
    • VCS is a repository that keeps files and history of files
      • This lets you efficiently track down the revision history of your source code.
    • VCS can also be used to keep backups of your source code and data.
    • However, it may not work perfectly:
      • Some problems can arise from multiple people working simultaneously: determining the authoritative version can be problematic.
    • There are two types of VCS to choose from:
      • Client-server (CVS and SVN)
      • Distributed (git)
  • Build System aka "make"
    • We use "make" to simplify many standard procedures.
    • "make" recompiles any source files that have been updated.

Inheritance

  • Common problem: Adding new features to a pre-existing class
    • Class example: creating ColorPoint after writing Point
  • There are naive approaches:
    • Add the new features to the existing class
      • Even when you don't need the new features, they are still there, adding overhead and making the code more complex
    • Copy/paste the existing code to a new class
      • May copy bugs from one class to another.
      • Any updates/improvements/bugfixes made to the existing class at a later date must also be copied. More difficult to maintain.
  • Inheritance is a better approach
    • Feature of object-oriented languages that lets you specify that one class is a subclass of another.
      • e.g. ColorPoint is a subclass of Point
    • The subclass inherits all members from the superclass. It may add new members or redefine existing ones.
    • There is an "is a" relationship between the subclass and the superclass (in contrast to a "has-a" relationship).
      • e.g. A ColorPoint "is a" Point. But a Point is not a ColorPoint.
    • Is a square a rectangle? Depends on who you ask.
      • In geometry, all squares are rectangles. This would imply that the answer is "yes".
      • But a rectangle is likely implemented with both a width and a length field. If a Square class inherits from Rectangle, it gets both. Do we really want squares to have both fields?
      • This highlights an important distinction: "super" and "sub" refer to the sets of all objects that are of a given type. That is, the set of all objects that "are a" Point is a superset of the set of all objects that "are a" ColorPoint. So we call Point the superclass. But the set of members (fields and methods) of Point is a subset of the set of members of ColorPoint. The reason that there is controversy with the square/rectangle problem is that the set of all squares is a subset of the set of all rectangles, but we would like to implement Square with a subset of the members of Rectangle.
    • The subclass has the same full public interface as its superclass, and may optionally add more public methods/fields. Despite maintaining this external interface, it might behave differently. This concept is called polymorphism.

Java Inheritance

  • All classes inherit from java.lang.Object. It is the root of the class hierarchy tree.
  • Inheritance syntax: class Subclass extends Superclass { ... }
    • It is unnecessary to explicitly extend java.lang.Object.
  • Since every subclass "is a" superclass, you can use an instance of a subclass wherever an instance of the superclass is expected. By definition, the subclass has all the public fields and methods of the superclass, so it is known at compile-time that there will be no problem.
    • Java will implicitly cast a subclass to a superclass for assignments, when passing the subclass as a method argument, when returning a subclass as a method return value, etc.

C++ Inheritance

  • There is no equivalent to java.lang.Object in C++. Any inheritance must be explicitly declared.
  • Inheritance syntax: class Subclass : public Superclass { ... };
  • Initialization syntax, from the ColorPoint example in class:
       ColorPoint::ColorPoint(Color color, double c1, double c2, double c3, double c4)
         : Point(c1, c2, c3, c4), color(color) {
       }
    • Note how the fields of the superclass, Point, are initialized.
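
For reference, here is a self-contained sketch of the same idea. It is a simplified, hypothetical version of the class example (two coordinates instead of four, and a made-up Color enum), not the code used in class.

```cpp
// Hypothetical color type for this sketch.
enum Color { RED, GREEN, BLUE };

class Point {
public:
    Point(double x, double y) : x_(x), y_(y) {}
    double x() const { return x_; }
    double y() const { return y_; }

private:
    double x_;
    double y_;
};

// Public inheritance: a ColorPoint "is a" Point.
class ColorPoint : public Point {
public:
    // The superclass part is initialized first, via Point(x, y),
    // followed by the subclass's own field.
    ColorPoint(Color color, double x, double y)
        : Point(x, y), color_(color) {}

    Color color() const { return color_; }

private:
    Color color_;
};

// Accepts any Point by reference, including a ColorPoint.
double distance_squared_from_origin(const Point& p) {
    return p.x() * p.x() + p.y() * p.y();
}
```

Point's fields are private, so ColorPoint cannot assign them directly; the only way to set them is to call the Point constructor in the initializer list.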

Static vs Dynamic Types

  • Java and C++ are both statically-typed languages. Programmers declare types of variables, method parameters, method return values, etc. At compile-time, the compiler determines the static types of all expressions.
  • Due to inheritance, the actual type of an object when the program is running may differ from its static type. The actual type of the object is called the dynamic type or the runtime type.
  • Example:
       Object o;          // Variable declared with the static type of Object.
       o = new Object();  // "o" is a reference to an instance of Object. Static type matches dynamic type.
       o = new String();  // Now, "o" is a reference to an instance of String. Static type does not match dynamic type.
  • The static type is fixed due to language type rules (you cannot redeclare variables in Java or C++ with different types), but the dynamic type may change during runtime.

Virtual Method Dispatch

  • Understanding the difference between static and dynamic types is necessary to understand the reason for virtual method dispatch
  • In languages with inheritance, such as Java and C++, a variable with a static type X may, at runtime, refer to an object with the dynamic type X or any subclass of X
  • If a subclass overrides a method of X, it is often desirable to call the newly defined method in all scenarios where you are dealing with the subclass. This includes all scenarios when its static type is different from its dynamic type
  • It's not possible to determine the correct method at compile-time, because the dynamic type is not known and may change during the execution of the program. So there needs to be some way, at runtime, to choose the correct overridden method. This is called virtual method dispatch, and a method that can be overridden is called a virtual method.
  • Java virtual methods:
    • All methods that are not static or private are virtual. There is no way (or reason to) explicitly declare a method as virtual.
    • Virtual method dispatch is a feature of the JVM. Virtual method calls made in Java are compiled into the bytecode instruction, "invokevirtual."
  • C++ virtual methods:
    • Unlike in Java, methods are not virtual by default. By default, static types are used to choose the method to call.
    • Methods must be declared as virtual explicitly.
    • Virtual method dispatch is implemented by the compiler by adding a pointer to each class's data layout (called the "vptr"), which points at a table of virtual methods (called the "vtable").
    • Adding a virtual method to a class therefore adds some memory overhead to each instance of it and its subclasses: there needs to be space to store the vptr, along with whatever padding the compiler wants to add.
    • Virtual method dispatch only works with pointers or references to objects. Since subclasses may have more fields, an instance of a subclass that is passed by value where an instance of the superclass is expected is copied as a superclass instance and loses its subclass part (this is called slicing). Therefore, virtual method dispatch makes no sense for values.
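
The difference between non-virtual and virtual dispatch in C++, and the effect of passing by value, can be seen in a small sketch. Base, Derived, and the call_* helpers are hypothetical names made up for this example.

```cpp
#include <string>

struct Base {
    virtual ~Base() {}
    // Non-virtual: the method is chosen from the static type.
    std::string plain() const { return "Base"; }
    // Virtual: the method is chosen from the dynamic type.
    virtual std::string dynamic() const { return "Base"; }
};

struct Derived : Base {
    std::string plain() const { return "Derived"; }             // hides Base::plain
    virtual std::string dynamic() const { return "Derived"; }   // overrides Base::dynamic
};

// Static type is Base, so Base::plain is called whatever the dynamic type is.
std::string call_plain(const Base& b) { return b.plain(); }

// Virtual call through a reference: the dynamic type decides.
std::string call_dynamic(const Base& b) { return b.dynamic(); }

// Pass by value: the argument is copied into a plain Base object,
// so the dynamic type inside this function is always Base.
std::string call_by_value(Base b) { return b.dynamic(); }
```

Passing a Derived to each helper gives "Base" for call_plain, "Derived" for call_dynamic, and "Base" for call_by_value.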

Implementation of Inheritance and Virtual Methods By Hand

  • Subclasses should have all the fields of their superclass, and may define additional fields. So where should those additional fields be stored in memory?
    • Directly after the memory block already allocated for the superclass's fields.
  • Having a consistent data layout scheme is critical to implementing inheritance.
    • Referencing a member of a struct is equivalent to accessing a specific memory offset.
    • If a field inherited by a subclass does not exist at the same offset in its data layout as in the data layout of the superclass, then the subclass cannot be used in place of the superclass. Both the superclass and subclass need to store the data at the same offset.
  • We cannot pass object instances by value, because the size of its data layout is unknown. Instead, we can store them on the heap and pass them by reference.
    • In Java, all objects are allocated on the heap. There is no way to create them on the stack or pass them by value.
    • In C++, passing by value is possible. However, for the sake of the translation, we will create them on the heap and pass them as pointers
      • new ClassName() : This creates a new instance of ClassName on the heap and returns a pointer to the newly-allocated memory
  • To implement virtual methods, every class has a vptr as the first member of its data layout that points to a virtual method table (vtable)
  • Virtual method table implementation:
    • There only needs to be one vtable for each class. All vptrs of a given class point to the same table
    • The vtable contains pointers to methods
    • Just as with the data layout for each class structure, the vtable layout for a subclass must be consistent with the layout of the vtable for its superclass.
    • When a method is overridden in a subclass, the pointer in the subclass's vtable at the method's offset is changed to point to the new definition
    • When a new virtual method is declared in a subclass, a new vtable entry is appended to the subclass's vtable
  • Making virtual method calls
    • Instead of calling methods directly, we must use the vptr to access the vtable and dereference the correct method.
    • Additionally, there is no concept of an instance method at the assembly/machine code level. We cannot use instance methods in our translation.
      • The ability to access instance members from within the body of an instance method is implemented by passing the instance as an argument to that method.
      • In our translation scheme, we implement instance methods in Java as static methods that accept an "implicit this" as their first parameter
    • So, in implementing virtual methods manually, we must a) indirectly call methods through the vptr, and b) add the implicit this as the first argument to every instance method call
    • Example translation of a virtual method call from Java to C++:
      Java
         Object o = new Object();
         String s = o.toString();
      C++
         Object o = new __Object();
         String s = o->__vptr->toString(o);
  • Purpose and implementation of the Class class
    • In Java, every class has a corresponding instance of the Class class, that is accessible using the getClass() method.
    • Class inherits from Object, as usual.
    • It contains information about a class, such as its superclass, its name, and whether it's an array or a primitive (we support these in our implementation of Class). In Java, Class objects also offer reflection information about the class they represent. For example, you can retrieve a list of fields or methods declared by that class. (We do not support this in our translator).
    • The fact that every Class instance has a reference to its superclass's Class instance allows the class hierarchy tree to be determined at runtime. This is necessary to determine whether an object is an instance of a given class, which is required to accurately translate Java's instanceof operator, and to throw ClassCastExceptions and ArrayStoreExceptions appropriately.
    • It would be problematic to implement a class's Class instance as a static field, because the constructor for Class requires the Class instance of its superclass. In C++, the order in which static fields in different source files are initialized is undefined (this is referred to as the "static initialization order fiasco"), so this kind of dependency would not work.
    • Instead, each class is given a static __class() method that creates its corresponding Class instance, and declares it as having a static storage duration (this is possible in C++, but not Java). In doing so, the static initialization order fiasco is avoided, but it's still ensured that only a single Class instance is created for each class.
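
The layout rules above can be sketched by hand in a few dozen lines. Everything in this sketch is hypothetical: the names (ObjectData, ObjectVTable, PointData, and so on) are invented and deliberately avoid the double-underscore names used in the course translator. Each vtable lives in a function-local static, the same trick described above for __class().

```cpp
#include <string>

struct ObjectData;

// Vtable layout: one function pointer per virtual method,
// each taking the "implicit this" as an explicit first argument.
struct ObjectVTable {
    std::string (*toString)(ObjectData*);
    int (*hashCode)(ObjectData*);
};

// Data layout of the superclass: the vptr is the first member.
struct ObjectData {
    const ObjectVTable* vptr;
};

std::string Object_toString(ObjectData*) { return "Object"; }
int Object_hashCode(ObjectData*) { return 1; }

// One vtable per class, created on first use (static storage duration).
const ObjectVTable* Object_vtable() {
    static const ObjectVTable table = { &Object_toString, &Object_hashCode };
    return &table;
}

ObjectData* new_Object() {
    ObjectData* o = new ObjectData;
    o->vptr = Object_vtable();
    return o;
}

// Subclass data layout: superclass fields first, new fields after them.
struct PointData {
    ObjectData base;
    int x;
};

// Subclass vtable layout: inherited slots first, new slots appended.
struct PointVTable {
    ObjectVTable base;
    int (*getX)(PointData*);
};

// Overrides toString: same slot, new definition.
std::string Point_toString(ObjectData* self) {
    // Safe here because an ObjectData is the first member of every PointData.
    PointData* p = reinterpret_cast<PointData*>(self);
    return "Point(" + std::to_string(p->x) + ")";
}
int Point_getX(PointData* self) { return self->x; }

const PointVTable* Point_vtable() {
    // hashCode is inherited, so its slot still points at Object's definition.
    static const PointVTable table = { { &Point_toString, &Object_hashCode }, &Point_getX };
    return &table;
}

PointData* new_Point(int x) {
    PointData* p = new PointData;
    p->base.vptr = &Point_vtable()->base;
    p->x = x;
    return p;
}

// Upcast: a pointer to the embedded superclass part.
ObjectData* as_object(PointData* p) { return &p->base; }

// Virtual call through the static type "Object": go through the vptr
// and pass the instance itself as the first argument.
std::string describe(ObjectData* o) { return o->vptr->toString(o); }

// Calling a method that only the subclass vtable has.
int call_getX(PointData* p) {
    const PointVTable* vt = reinterpret_cast<const PointVTable*>(p->base.vptr);
    return vt->getX(p);
}
```

describe never tests the dynamic type; the right toString is found purely by following the vptr, which is what makes the consistent slot order essential.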

Object Oriented Design As Seen Through Star Trek

  • If “Human is a Person, Klingon is a Person and Ferengi is a Person” is true, how does the “is a” relationship work in Java?
  • Each of Human/Klingon/Ferengi is a class
    • Human/Klingon/Ferengi are subclasses of Person
  • Next, if all Humans are bland, all Klingons are ferocious and all Ferengi are mercenary, how do we implement this?
  • Through a String getDescription() method in Person; it would be abstract and overridden in each subclass. Since we should not be able to create an instance of Person, the class should be abstract as well.
  • What was wrong about the initial observations regarding the various races?
  • Race is a “has a” relationship, not an “is a” relationship; each person can be of multiple races.
  • How would one track the race of each Person?
  • Create an object holding the race and cache it; this avoids imposing an arbitrary algorithm and is computationally cheap.
  • How would one compute race?
  • Simply ask the Person what race they are; this allows for flexibility and avoids sensitive issues.
  • How can one allow for generality with mixed races without large overhead?
  • Create a class cluster: implement an abstract class Person and then implement HumanPerson, KlingonPerson, FerengiPerson, ..., MixedPerson. If the race of a person doesn't fit cleanly into any of the other classes, store it in MixedPerson.

Arrays in Java and C++

  • Java does not have multidimensional arrays; it only has arrays of elements, and those elements may themselves be arrays.
  • Arrays are objects: they extend Object and inherit all of its methods, but they do not override them. Arrays also have .length.
  • All Java arrays are reference types, all accesses to elements are index checked, and arrays support covariant subtyping, which requires additional dynamic type checks (e.g. for ArrayStoreException).
  • C++ arrays are not bounds checked; they are statically type checked, and their elements have no initialized values by default.
  • memset initializes the memory of a C++ array byte by byte, but calling a constructor is a more elegant way of doing it.

Generics in Java and C++ Templates

  • If one wanted to implement two singly linked lists, one of doubles and one of Strings, what would be the best way to do so?
  • Best approach would be with Java Generics.
  • Add type parameters to classes, e.g. public class List<T>
  • Now lists are statically type safe: the compiler will detect errors when an element does not match the declared element type of the list.
  • How does the Java compiler implement generics?
  • With type erasure: the compiler removes generic type information from the source code, adds casts where needed and produces byte code. Why does it work this way? Because it doesn't necessitate a change to the JVM and still allows the compiler to catch static type safety errors.
  • Issues: no generic arrays, no instantiations of a type parameter using new, and no primitive types as type arguments (they require wrapper classes).
  • Templates are the C++ version of Java generics, except even more flexible.
  • Templates can be made for classes and functions and can be instantiated with many different types.
  • One can have new keyword instantiations, generic arrays and static fields that are generic.
  • Class templates: a class can be parameterized over types to make it more flexible. Through this, one can use data structures that can handle various types without declaring separate classes for each.
  • Class templates must be both declared and defined in the header file.
  • Template specialization: specialized versions of class templates can be created for specific types; for those types the compiler will use the specialized template. Specializations can be defined in the .cc file since they are not instantiated by the compiler.
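
As a rough C++ counterpart to the linked-list question above, the sketch below defines one class template and uses it for both doubles and strings, followed by a small class template specialization. List, TypeName, and their members are hypothetical names for this illustration only.

```cpp
#include <cstddef>
#include <string>

// A minimal singly linked list, parameterized over the element type T.
template <typename T>
class List {
public:
    List() : head_(nullptr), size_(0) {}
    ~List() {
        while (head_ != nullptr) {
            Node* next = head_->next;
            delete head_;
            head_ = next;
        }
    }

    // Copying is disabled to keep the sketch short.
    List(const List&) = delete;
    List& operator=(const List&) = delete;

    void push_front(const T& value) {
        head_ = new Node(value, head_);
        ++size_;
    }

    // Precondition: the list is not empty.
    const T& front() const { return head_->value; }
    std::size_t size() const { return size_; }

private:
    struct Node {
        T value;
        Node* next;
        Node(const T& v, Node* n) : value(v), next(n) {}
    };

    Node* head_;
    std::size_t size_;
};

// Primary class template: used for every type without a specialization.
template <typename T>
struct TypeName {
    static std::string get() { return "unknown"; }
};

// Specialization: the compiler picks this version when T is double.
template <>
struct TypeName<double> {
    static std::string get() { return "double"; }
};
```

List<double> and List<std::string> are two separate classes generated by the compiler from the same template; pushing a std::string onto a List<double> is rejected at compile time.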

Traversing Graphs and the Visitor Design Pattern

  • Given a graph of nodes, each node has a Type and Operations that can be done on them.
  • How are these implemented?
  • One can only really have easy extensibility along one dimension: kinds of nodes or operations on nodes (this is called the Expression Problem).
  • If you want extensibility of types, use the standard OOP implementation, i.e. an abstract superclass with abstract methods for each operation. For a new type, make a new class that extends the root class. However, to add a new operation one now has to edit every node class and add it.
  • Use Visitor Design Pattern instead!
  • Now one has visitors and node classes.
  • Visitors handle operations: make a Visitor interface with a visit method for every type, e.g. visit(Addition), visit(Multiplication), etc.
  • To make a new visitor, implement the aforementioned interface and all of its overloaded visit methods.
  • Each method in the interface returns some generic type R and can throw some generic exception E that extends Throwable.
  • Node classes handle types, all extend some abstract class with a method public abstract <R, E extends Throwable> R accept(Visitor<R,E> v) throws E;
  • This is double dispatching: the leaf class has an accept method which essentially calls the visitor's visit method on itself, so the visit method that corresponds to the leaf's type is the one that runs.
  • Visitor design pattern works because we don't ask the object what it is, we let it answer for itself, we let the object pass itself as a reference.
  • Add a new operation by making a new Visitor.
  • To add a new type, one has to edit every Visitor class to add a new visit(Type).
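
A minimal C++ sketch of the pattern follows. It departs from the Java signature quoted above in one respect, which is an assumption of this sketch: C++ does not allow virtual member function templates, so each visitor stores its result in a field instead of returning a generic R. All class names here are hypothetical.

```cpp
#include <sstream>
#include <string>

struct Number;
struct Addition;

// Visitor interface: one overloaded visit method per node type.
struct Visitor {
    virtual ~Visitor() {}
    virtual void visit(const Number& n) = 0;
    virtual void visit(const Addition& a) = 0;
};

// Root of the node classes.
struct Node {
    virtual ~Node() {}
    virtual void accept(Visitor& v) const = 0;
};

struct Number : Node {
    int value;
    explicit Number(int v) : value(v) {}
    // Second dispatch: *this has static type Number, so visit(const Number&) is chosen.
    void accept(Visitor& v) const { v.visit(*this); }
};

struct Addition : Node {
    const Node& left;
    const Node& right;
    Addition(const Node& l, const Node& r) : left(l), right(r) {}
    void accept(Visitor& v) const { v.visit(*this); }
};

// One operation = one visitor. This one evaluates the expression.
struct Evaluator : Visitor {
    int result;
    Evaluator() : result(0) {}
    void visit(const Number& n) { result = n.value; }
    void visit(const Addition& a) {
        a.left.accept(*this);
        const int left_value = result;
        a.right.accept(*this);
        result = left_value + result;
    }
};

// A second operation, added without touching any node class.
struct Printer : Visitor {
    std::ostringstream out;
    void visit(const Number& n) { out << n.value; }
    void visit(const Addition& a) {
        out << "(";
        a.left.accept(*this);
        out << " + ";
        a.right.accept(*this);
        out << ")";
    }
};

int evaluate(const Node& n) { Evaluator e; n.accept(e); return e.result; }
std::string print(const Node& n) { Printer p; n.accept(p); return p.out.str(); }
```

Adding a Multiplication node would require a new visit overload in Visitor and in every existing visitor, which is the trade-off described above.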

Difference Between Method Overloading and Method Overriding

  • Do not confuse the two
  • Overloading
    • Occurs within the same class
    • Methods have identical names
    • Either number or types of parameters are different
    • Resolved at compile-time – based on statically declared types
  • Overriding
    • Occurs when a subclass’s method behaves differently from its superclass’s method
    • Methods also have identical names
    • Number and types of parameters are identical as well
    • Subclass’s redefined method is invoked instead of the inherited superclass’s method
    • Resolved at run-time – don’t know which instance we have
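
The contrast can be made concrete with a short C++ sketch (C++ rather than Java so that it matches the other examples; the same distinction holds in Java). Animal, Dog, and describe are hypothetical names.

```cpp
#include <string>

struct Animal {
    virtual ~Animal() {}
    virtual std::string speak() const { return "..."; }
};

struct Dog : Animal {
    // Overriding: same name and parameters as Animal::speak.
    virtual std::string speak() const { return "Woof"; }
};

// Overloading: same name, different parameter types.
std::string describe(const Animal&) { return "describe(Animal)"; }
std::string describe(const Dog&)    { return "describe(Dog)"; }

// Overload resolution uses the static type (Animal) of the argument.
std::string overload_via_static_type(const Animal& a) { return describe(a); }

// Overriding is resolved with the dynamic type of the object.
std::string override_via_dynamic_type(const Animal& a) { return a.speak(); }
```

For a Dog passed through a reference to Animal, the overloaded call picks describe(Animal) at compile time, while the overridden speak() still runs Dog's version at run time.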

Method Overloading

  • Allows there to be several methods with the same name that differ in their parameters
    • Pros
      • Methods do the same thing conceptually, and Method Overload Resolution picks the right method to call
      • One method name can perform a variety of tasks
      • Keeps your code simple: same function name for a specific task that handles various types differently
    • Cons
      • The same name may be used for different methods that semantically don't do the same thing
  • Each overloaded method must be different in either the parameter types or the number of parameters
    • Cannot have two methods with the same signature in one class
    • We DON'T distinguish by return type because the correct overloaded method is selected by analyzing only the number and types of the arguments
  • Typically found in statically-typed, object-oriented programming languages like Java and C++, but not Objective-C
  • Method Overload Resolution in Java
    • Arithmetic on chars and bytes is not performed directly; Java promotes them to ints
      • byte + byte = int
    • If there is no method for the exact argument type, Java looks for a method that accepts one of its superclasses
      • If passing the type Exception to method m(), Java looks for m(Exception e), then if it doesn’t exist it looks for m(Throwable t), and then for m(Object o)
        • Since m(Object o) exists, it will execute that method
        • If m(String s) was not defined, passing a String to m() would be executed as m(Object o)
        public void m(Object o) {
            System.out.println("m(Object)  : " + o);
        }
      • Must also make sure the chosen method is appropriate: calling an instance method from a static context, for example, will not compile
    • Avoid ambiguity at all costs
      • The choice of which method runs cannot be arbitrary
      • Example: input type m(B, B)
        • Since B inherits from A and there is no m(B, B), the compiler won’t know whether to execute m(A, B) or m(B, A): both are applicable (as is the less specific m(A, A)), and neither is more specific than the other
        • Program will produce an error at compile time
        • Solution: Upcast one B explicitly to be a statically typed A
          • Since upcasts are always safe → no runtime checking required
        • The compiler will choose the most specific method with respect to the number of arguments passed and the types of the arguments passed
          • If there does not exist a method for the specific type passed, the compiler will look for the closest related superclass in the inheritance hierarchy
    • Translator Implementation
      • Number and types have to match
        • See Java Language Specification
      • Naming is an issue as the methods implemented in the V-Table are named after the Java methods and C++ structs cannot have two fields with the same name
        • Solution is to mangle names and be consistent
        • Must know the static types of expressions and identifiers tracked by the Symbol Table
        • Overloaded methods have separate slots in the V-Table because the compiler treats them as totally different
    • Overloaded.java is provided below:
        public class Overloaded {
            public static class A           { public String toString() { return "A"; } }
            public static class B extends A { public String toString() { return "B"; } }
            public static class C extends B { public String toString() { return "C"; } }

            public void m()           { System.out.println("m()        : ---"); }
            public void m(byte b)     { System.out.println("m(byte)    : " + b); }
            public void m(short s)    { System.out.println("m(short)   : " + s); }
            public void m(int i)      { System.out.println("m(int)     : " + i); }
            public void m(long l)     { System.out.println("m(long)    : " + l); }
            public void m(Integer i)  { System.out.println("m(Integer) : " + i); }
            public void m(Object o)   { System.out.println("m(Object)  : " + o); }
            public void m(String s)   { System.out.println("m(String)  : " + s); }
            public void m(A a)        { System.out.println("m(A)       : " + a); }
            public void m(B b)        { System.out.println("m(B)       : " + b); }
            public void m(A a1, A a2) { System.out.println("m(A,A)     : "+ a1 +", "+ a2);}
            public void m(A a1, B b2) { System.out.println("m(A,B)     : "+ a1 +", "+ b2);}
            public void m(B b1, A a2) { System.out.println("m(B,A)     : "+ b1 +", "+ a2);}
            public void m(C c1, C c2) { System.out.println("m(C,C)     : "+ c1 +", "+ c2);}

            public static void main(String[] args) {
                Overloaded o = new Overloaded();
                byte n1 = 1, n2 = 2;
                A a = new A();
                B b = new B();
                C c = new C();

                o.m();
                o.m(n1);
                o.m(n1 + n2);
                o.m(new Object());
                o.m(new Exception());
                o.m("String");
                o.m(a);
                o.m(b);
                o.m(c);
                o.m(a, a);
                o.m((A)b, b);
                o.m(c, c);
            }
        }

Operator Overloading

  • Not supported in Java (with exception of the built-in plus operator for strings)
  • Supported in C++ (used in Translator)
    • Operators are treated as methods; x + y can be thought of as x.operator+(y)
  • Advantages:
    • Flexibility with style and function
    • Convenience (e.g. matrix additions, subtractions, multiplications and inversions)
    • Simplicity of complex processes
    • Concision
  • Disadvantages:
    • May be confusing if an operator is overloaded for multiple classes → same symbol used for different functions → makes code harder to read and debug
    • Operators could be overloaded to do something unintuitive (switching + and -) → can cause more confusion
  • Implementation in Our Translator
    • The [] operator
      • We have to check to see if the index is out of bounds of the Array before accessing the referenced element
      • Java checks this while accessing the element, C++ does not
      • We can add this functionality by overloading the [] operator in C++
        T& operator[](int32_t index) {
          if (0 > index || index >= length) throw ArrayIndexOutOfBoundsException();  // check index range
          return __data[index];
        }
      • We can then access the element by using this code: (*a)[2];
      • The [] must return a reference to the element in the array if it is to be modified in any way
      • We also have to define a const overload of operator[] that returns a const reference, so that the elements of a const array cannot be modified
      • Java has no multidimensional arrays, only array of arrays
        • When translating, we only need to translate array of arrays
      • Representing Java arrays of arrays in C++ using our translation scheme
        int[] a;                                  // Java
        __rt::Array<int32_t>* a;                  // C++

        int[][] a;                                // Java
        __rt::Array<__rt::Array<int32_t>*>* a;    // C++
      • With the overloaded operator, accessing an element of an array of arrays looks like this:
        a[4][5];           // Java
        (*(*a)[4])[5];     // C++
      • Overloading this operator is an example of an operator that is a method of a class; the receiver is an instance of that class
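The bounds check and the const/non-const pair described above can be sketched in standalone form. This is only an illustration: `CheckedArray`, `throws_on` and `example` are hypothetical names, and `std::out_of_range` stands in for the translator's ArrayIndexOutOfBoundsException.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for the translator's __rt::Array.
template <typename T>
class CheckedArray {
public:
  explicit CheckedArray(int32_t length)
      : length_(length), data_(new T[length]()) {}  // elements start at zero
  ~CheckedArray() { delete[] data_; }

  // Copying is disabled to keep the sketch short.
  CheckedArray(const CheckedArray&) = delete;
  CheckedArray& operator=(const CheckedArray&) = delete;

  // Non-const access returns a reference so the element can be modified.
  T& operator[](int32_t index) {
    check(index);
    return data_[index];
  }

  // Const access returns a const reference, so a const array stays read-only.
  const T& operator[](int32_t index) const {
    check(index);
    return data_[index];
  }

private:
  void check(int32_t index) const {
    if (index < 0 || index >= length_) throw std::out_of_range("index out of bounds");
  }

  int32_t length_;
  T* data_;
};

// Returns true if reading a[index] throws, mirroring Java's runtime check.
bool throws_on(const CheckedArray<int>& a, int32_t index) {
  try {
    (void)a[index];
    return false;
  } catch (const std::out_of_range&) {
    return true;
  }
}

void example() {
  CheckedArray<int> a(3);
  a[2] = 7;                  // write through the returned reference
  assert(a[2] == 7);
  assert(throws_on(a, 3));   // one past the end is rejected
  assert(!throws_on(a, 0));
}
```

When the array is held through a pointer, as in the translation scheme, the same operator is reached with `(*a)[2]`.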
    • The << operator
      • To print a string we need to use this code: cout << k->__vptr->getName(k)->data // k.getName()
      • To avoid typing ->data every time we want to print an element, we overload the << (left shift operator)
        std::ostream& operator<<(std::ostream& out, String s) {
          out << s->data;
          return out;
        }
      • This overloads the left shift operator for the case where the left operand is an ostream and the right operand is a String
      • Returning out enables chaining; if we had a void return type, then chaining would be impossible
      • This overloads the insertion operator without modifying the standard libraries
        • We can’t and shouldn’t modify the standard libraries
        • This prints the string with less notation in the translation
      • Overloading this operator is an example of overloading an operator where a receiver is not in the class
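As a small illustration of why the stream must be returned, here is a sketch with a hypothetical `Name` struct in place of the translator's String data layout; `join` and `example` are also made-up names.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical data layout standing in for the translator's string object.
struct Name {
  std::string data;
};

// Free function: the left operand (the receiver) is std::ostream, a class we
// do not own, so the overload cannot be written as a member of Name.
std::ostream& operator<<(std::ostream& out, const Name& n) {
  out << n.data;
  return out;  // returning the stream is what makes chaining possible
}

// Formats two names with chained << calls and returns the resulting text.
std::string join(const Name& a, const Name& b) {
  std::ostringstream out;
  out << a << ", " << b;  // each << returns out, which feeds the next <<
  return out.str();
}

void example() {
  Name kirk{"Kirk"};
  Name spock{"Spock"};
  assert(join(kirk, spock) == "Kirk, Spock");
}
```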
    • Ptr Class
      • This is a wrapper class for a built-in pointer in C++
        • Has same functionality as a regular C pointer but later on it will also handle automatic memory management (smart pointers)
      • Made a new class, Ptr
        • Ptr<int> syntax more intuitive than int*
        • Contains an address to a T since we want it to behave like a real pointer
      • Constructor: Stores the address
          Ptr(T* addr) : addr(addr) {
            TRACE("PTR: constructor");
          }
        • Invoked when the Ptr is created with a T*
      • Copy Constructor: Makes a copy of an existing instance
        • Called automatically by C++ when initializing a value
          • When passing an argument to a function by value
          • When returning a value from a function
          • When initializing a new variable from an existing instance
          Ptr(const Ptr& other) : addr(other.addr) {
            TRACE("Ptr: copy constructor");
          }
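      • Destructor: Gets called when the lifetime of the value ends
        • It is called both for values on the stack (when they go out of scope) and for values on the heap (when they are deleted)
        • If you do not define it, the compiler-generated destructor does nothing for a raw pointer member
          ~Ptr() {
            TRACE("Ptr: destructor");
          }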
    • The = operator
      • We want to make sure the address that is stored in the instance is assigned
          Ptr& operator=(const Ptr& right) {
            TRACE("Ptr: assignment operator");
            if (addr != right.addr) {
              addr = right.addr;
            }
            return *this;
          }
      • When overloading the assignment operator, you MUST protect against self assignment
    • The dereference (*) operator
      • We want to make sure we can dereference our pointer
          T& operator*() const {
            TRACE("Ptr: dereference");
            return *addr;
          }
    • The -> operator
          T* operator->() const {
            TRACE("Ptr: arrow");
            return addr;
          }
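To see when C++ invokes the copy constructor, the assignment operator and the destructor, a probe type can count the calls instead of tracing them. This is a sketch; `Probe`, `by_value` and `example` are hypothetical names and not part of the course's Ptr class.

```cpp
#include <cassert>

// Hypothetical probe type that counts which special member functions run.
struct Probe {
  static int copies, assigns, destroys;
  Probe() {}
  Probe(const Probe&) { ++copies; }
  Probe& operator=(const Probe& other) {
    if (this != &other) ++assigns;  // self assignment is ignored
    return *this;
  }
  ~Probe() { ++destroys; }
};
int Probe::copies = 0;
int Probe::assigns = 0;
int Probe::destroys = 0;

void by_value(Probe p) { (void)p; }  // the parameter is copy-constructed

void example() {
  Probe::copies = Probe::assigns = Probe::destroys = 0;
  {
    Probe a;
    Probe b = a;       // initialization: copy constructor, not operator=
    by_value(a);       // copy constructor, then the parameter's destructor
    b = a;             // b already exists: assignment operator
    Probe& alias = b;
    b = alias;         // self assignment: detected and skipped
  }                    // a and b go out of scope: two more destructor calls
  assert(Probe::copies == 2);
  assert(Probe::assigns == 1);
  assert(Probe::destroys == 3);
}
```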

What is a Smart Pointer?

  • Smart pointers are C++ objects that simulate simple pointers by implementing operator-> and the unary operator*
    • Performs tasks such as memory management and locking
  • A smart pointer is a C++ class that mimics a regular pointer in syntax and some semantics, but does more
  • Since smart pointers to different types of objects tend to have a lot of code in common, almost all smart pointers are templated by the pointee type (generalizing SmartPtr to smart pointer of any type)
     template <class T>
     class SmartPtr {
     public:
       explicit SmartPtr(T* pointee) : pointee_(pointee) {}
       SmartPtr& operator=(const SmartPtr& other);  // assignment from another SmartPtr
       ~SmartPtr();  // destructor: the constructor's complement, called for any cleanup when an object of the class goes out of scope
       T& operator*() const {
         ...
         return *pointee_;
       }
       T* operator->() const {
         ...
         return pointee_;
       }
     private:
       T* pointee_;
       ...
     };
  • SmartPtr<T> stores a pointer to T in its member variable pointee_
     class Widget {
     public:
       void Fun();
     };

     SmartPtr<Widget> sp(new Widget);
     sp->Fun();
     (*sp).Fun();
  • Aside from the definition of sp, nothing reveals it as not being a pointer (to a Widget object)
    • You can replace pointer definitions with smart pointer definitions without incurring major changes to application code

Why Replace Simple Pointers with Smart Pointers?

  • Smart pointers have value semantics, just like simple pointers, but also execute application-specific code, whereas simple pointers do not
    • An object with value semantics is an object that you can copy and assign to (able to create, copy, and change object freely)
    • A pointer with value semantics is a pointer that you use to iterate in a buffer
      • Initialize pointer to point to the beginning of the buffer, and you bump it until you reach the end
      • Along the way, you can copy its value to other variables to hold temporary values
  • Smart Pointers offer ownership management in addition to pointer-like behavior
    • Pointers that hold values allocated with new
       Widget* p = new Widget;                         
      • The variable p not only points to, but also owns, the memory allocated for the Widget object
        • You must issue delete p to destroy the Widget object and release its memory
      • If you assign something else to p without deleting first, you lose ownership of the object previously pointed to by p and have no chance to get hold of it again
         p = 0; // assign something else to p
        • Resource leak
    • Copy p into another variable
      • Since ownership of memory is not managed automatically by the compiler, the result is two pointers pointing to the same object
        • Risk of a double delete or of no delete at all
        • Must be careful tracking pointers
  • std::auto_ptr: after copying a smart pointer to an object, the source pointer becomes null and the destination points to (and holds ownership of) the object
  • Reference counting: track the total count of smart pointers that point to the same object; when the count reaches 0, delete the pointed-to object
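Both strategies also exist in the C++11 standard library, which can serve as a reference point. The sketch below swaps in `std::unique_ptr` (the replacement for `std::auto_ptr`, where the transfer of ownership is spelled out with `std::move`) and `std::shared_ptr` (reference counting); it is not the course's Ptr class, and `Widget` and `example` are illustrative names.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Widget {
  int id;
  explicit Widget(int id) : id(id) {}
};

void example() {
  // Transfer of ownership: after the move, the source holds null.
  std::unique_ptr<Widget> p(new Widget(1));
  std::unique_ptr<Widget> q = std::move(p);
  assert(p.get() == nullptr);
  assert(q->id == 1);

  // Reference counting: copies share one Widget and one count.
  std::shared_ptr<Widget> s1(new Widget(2));
  assert(s1.use_count() == 1);
  {
    std::shared_ptr<Widget> s2 = s1;
    assert(s1.use_count() == 2);
  }  // s2 is destroyed, so the count drops back to 1
  assert(s1.use_count() == 1);
}  // q and s1 go out of scope; each Widget is deleted exactly once
```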

Ownership Handling Strategies

  • Smart pointers own the objects to which they point and take care of deleting the pointed-to object under the covers
  • There are various ownership management strategies; the list below describes how a smart pointer implements each of them
    • Deep Copy
      • Copy the pointee object whenever you copy the smart pointer
      • Only one smart pointer for each pointee object, thus the smart pointer’s destructor can safely delete the pointee object
      • Smart pointers are vehicles for transporting polymorphic objects safely
      • I.e. you hold a smart pointer to a base class, which might actually point to a derived class – when you copy the smart pointer you want to copy its polymorphic behavior too
         class AbstractBase {
            ...
            virtual AbstractBase* Clone() = 0;
         };

         class Concrete : public AbstractBase {
            ...
            virtual AbstractBase* Clone()
            {
               return new Concrete(*this);
            }
         };
      • Clone implementation must follow the same pattern in all derived classes – prevents slicing (only parent class of object gets copied and not the subclass’s portion)
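A deep-copy smart pointer built on Clone might look like the following sketch. `DeepPtr`, `Shape`, `Circle` and `example` are hypothetical names chosen for the illustration; the point is that the copy constructor calls Clone, so the copy has the same dynamic type as the original.

```cpp
#include <cassert>
#include <string>

class Shape {
public:
  virtual ~Shape() {}
  virtual Shape* Clone() const = 0;  // every subclass copies itself
  virtual std::string Name() const = 0;
};

class Circle : public Shape {
public:
  Shape* Clone() const override { return new Circle(*this); }
  std::string Name() const override { return "Circle"; }
};

// Hypothetical deep-copy smart pointer: every copy owns its own pointee.
class DeepPtr {
public:
  explicit DeepPtr(Shape* pointee) : pointee_(pointee) {}
  DeepPtr(const DeepPtr& other)
      : pointee_(other.pointee_ ? other.pointee_->Clone() : nullptr) {}
  DeepPtr& operator=(const DeepPtr& other) {
    if (this != &other) {
      Shape* copy = other.pointee_ ? other.pointee_->Clone() : nullptr;
      delete pointee_;
      pointee_ = copy;
    }
    return *this;
  }
  ~DeepPtr() { delete pointee_; }  // safe: no other DeepPtr shares the pointee

  Shape* operator->() const { return pointee_; }
  Shape* get() const { return pointee_; }

private:
  Shape* pointee_;
};

void example() {
  DeepPtr a(new Circle);
  DeepPtr b = a;                  // clones the Circle
  assert(a.get() != b.get());     // two distinct objects
  assert(b->Name() == "Circle");  // the dynamic type survived the copy
}
```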
    • Copy on Write
      • Clone the pointee object at the first attempt of modification and until then, several pointers can share the same object (optimization to avoid unnecessary object copying)
      • COW is effective mostly as an implementation optimization for full-featured classes
        • Smart pointers sit at too low a level to implement COW semantics effectively
    • Reference Counting
      • Reference counting tracks the number of smart pointers that point to the same object; when that number goes to 0, the pointee object is deleted
        • Do NOT keep smart pointers and dumb pointers to the same object
        • Actual counter must be shared among smart pointer objects – each smart pointer holds a pointer to the reference counter in addition to the pointer to the object itself
      • Most effective solution is to hold the reference counter in the pointee object itself
      • Faster than reference linking – only an indirection and an increment needed
    • Reference Linking
      • Relies on the observation that you don’t really need the actual count of smart pointer objects pointing to one pointee object – only need to detect when that count goes down to 0
      • Ownership list – all smart pointer objects that point to a given pointee form a doubly linked list, when list becomes empty, the pointee object is deleted
    • Reference management (counting or linking) is vulnerable to a resource leak known as a cyclic reference
      • If object A holds a smart pointer to object B and B holds a smart pointer to A, the two objects form a cyclic reference and keep each other alive even though nothing else uses them anymore
      • The reference management strategy cannot detect such cyclic references, and the 2 objects remain allocated forever
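The cycle problem can be reproduced with the standard library's reference-counted pointer. In this sketch `std::shared_ptr` and `std::weak_ptr` stand in for the course's own smart pointer, and `Node`, `live_nodes` and `example` are hypothetical names. A weak (non-owning) link is one common way to break the cycle.

```cpp
#include <cassert>
#include <memory>

static int live_nodes = 0;  // number of Node objects currently alive

struct Node {
  std::shared_ptr<Node> strong;  // owning link, counted
  std::weak_ptr<Node> weak;      // non-owning link, not counted
  Node() { ++live_nodes; }
  ~Node() { --live_nodes; }
};

void example() {
  std::weak_ptr<Node> handle;  // observes the first node without owning it
  {
    std::shared_ptr<Node> a = std::make_shared<Node>();
    std::shared_ptr<Node> b = std::make_shared<Node>();
    a->strong = b;
    b->strong = a;  // cycle: each node keeps the other's count above 0
    handle = a;
  }
  assert(live_nodes == 2);  // a and b are gone, yet both nodes are still alive

  // Break the cycle by hand so that this demo does not actually leak.
  if (std::shared_ptr<Node> a = handle.lock()) a->strong.reset();
  assert(live_nodes == 0);

  {
    std::shared_ptr<Node> a = std::make_shared<Node>();
    std::shared_ptr<Node> b = std::make_shared<Node>();
    a->strong = b;
    b->weak = a;  // the back link is weak, so there is no cycle of owners
  }
  assert(live_nodes == 0);  // both nodes were released automatically
}
```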
    • Destructive Copy
      • During copying, destructive copy destroys the object being copied – destroys the source smart pointer by taking its pointee object and passing it to the destination smart pointer
      • During the copying or assignment of one smart pointer to another, the “living” pointer is passed to the destination of the copy and the source’s pointee_ becomes 0
      • Because destructive copy does not provide value semantics (copying changes the source), smart pointers with destructive copy cannot be stored in containers and in general must be handled with care
       template <class T>
       class SmartPtr {
       public:
          SmartPtr(SmartPtr& src)
          {
             pointee_ = src.pointee_;
             src.pointee_ = 0;
          }
          SmartPtr& operator=(SmartPtr& src)
          {
             if (this != &src)
             {
                delete pointee_;
                pointee_ = src.pointee_;
                src.pointee_ = 0;
             }
             return *this;
          }
          ...
       };

       void Display(SmartPtr<Something> sp);
       ...
       SmartPtr<Something> sp(new Something);
       Display(sp);  // sinks sp: after Display(sp) is called, sp holds the null pointer
    • Address-of Operator (operator&)
       template <class T>
       class SmartPtr {
       public:
          T** operator&()
          {
             return &pointee_;
          }
          ...
       };
      • Overloading unary operator& is harmful
        • Exposing the address of the pointed-to object implies giving up any automatic ownership management (i.e. reference counts become invalid)
          • Overloading unary operator& for a type makes generic programming nearly impossible for that type, because & no longer returns the address of the smart pointer object itself
          • Most generic code assumes that applying & to an object of type T returns an object of type T*
    • Implicit Conversion to Raw Pointer Types
      • Example code
         void Fun(Something* p);
         ...
         SmartPtr<Something> sp(new Something);
         Fun(sp);  // OK or error?

         template <class T>
         class SmartPtr {
         public:
            operator T*()  // user-defined conversion to T*
            {
               return pointee_;
            }
            ...
         };
      • User-defined conversions are dangerous: they give the user unattended access to the raw pointer that the smart pointer wraps, and the compiler can apply them in places where they are not expected

Explicit Memory Management - Law of the Big Three

  • If your class requires a copy constructor, destructor, or assignment operator, it is likely to require all three!

Value Semantics vs. Reference Semantics

  • The stringval class encapsulates a string with value semantics (the ability to manipulate a variable without worrying about other variables, so effects stay localized)
    • Strings naturally have reference semantics: since the size of the string is unknown, it cannot be allocated on the stack
    • Strings are stored on the heap, so we are forced to have references
    • Copy constructor: needed to pass stringval instances as parameters to a function
      • Take the length of the other string, set data to be a new array of char of that length, then copy the contents of other.data into data
       stringval::stringval(const stringval& other)
         : len(other.len), data(new char[len]) {
         TRACE();
         std::memcpy(data, other.data, len);
       }
    • Destructor: free up heap memory (delete[] data)
       stringval::~stringval() {
         TRACE();
         delete[] data;
       }
    • Assignment operator: replace the default C++ assignment so that the contents are copied, not the pointer
      • When no operator is defined, the default assignment operator copies the members of the instance (it copies the pointer instead of creating a new copy of the characters)
        • Causes a memory leak since it overwrites the old pointer, and leaves two instances sharing the same array
      • Like destructor combined with copy constructor
        • Guard against self-assignment: check to make sure data and other.data are not equal
        • Clean up the current instance before taking new information
        • After deleting data, allocate a new char[] for data before copying
       stringval& stringval::operator=(const stringval& other) {
         TRACE();
         if (data != other.data) {
           delete[] data;
           len = other.len;
           data = new char[len];
           std::memcpy(data, other.data, len);
         }
         return *this;
       }

Smart Pointers

  • Smart pointer – template class that behaves like a pointer but does more work (i.e. counts references)
  • Takes away necessity of explicit upcasts
    • Example: given Ptr<T> and Ptr<U>, Ptr<T> has a templated constructor that takes a Ptr<U> and casts its address to T*
    • Advantage: no more explicit upcasts
    • Disadvantage: the cast in that constructor also performs implicit downcasts and even cross-casts between unrelated types
  • Instead of raw pointers, use pointer templates
    • Advantage: == and != can be declared between Ptr<T> and Ptr<U>, so pointers of different types can be compared with no upcast needed
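A templated converting constructor can be sketched as follows. Note the assumption: unlike the version described above, this sketch initializes `T*` directly from `U*` without a cast, so only conversions the compiler already considers safe (upcasts) are accepted. `Ptr` here is a simplified, non-owning stand-in, and `Object`, `String`, `raw` and `example` are illustrative names.

```cpp
#include <cassert>

struct Object { virtual ~Object() {} };
struct String : Object {};

// Hypothetical non-owning pointer wrapper; memory management is omitted.
template <typename T>
class Ptr {
public:
  Ptr(T* addr = nullptr) : addr_(addr) {}

  // Converting constructor: builds a Ptr<T> from a Ptr<U>.
  // Initializing T* from U* compiles only when U* converts implicitly to T*,
  // so upcasts are accepted while downcasts and cross-casts are rejected.
  template <typename U>
  Ptr(const Ptr<U>& other) : addr_(other.raw()) {}

  T* raw() const { return addr_; }

  // Comparisons between pointers with different element types.
  template <typename U>
  bool operator==(const Ptr<U>& other) const { return addr_ == other.raw(); }
  template <typename U>
  bool operator!=(const Ptr<U>& other) const { return addr_ != other.raw(); }

private:
  T* addr_;
};

void example() {
  String s;
  Ptr<String> ps(&s);
  Ptr<Object> po = ps;  // implicit upcast, no cast syntax needed
  assert(po == ps);     // compares a Ptr<Object> with a Ptr<String>
  // Ptr<String> bad = po;  // would not compile: Object* does not convert to String*
}
```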
  • Reference Counting
    • Delete data layouts when there is no reference to them (number of references pointing to object == 0)
      • If there are no pointers, that data layout cannot be accessed anymore, so we can delete it
      • Reference counting counts number of pointers to a data layout
      • Gives us a rule for when we can safely deallocate data on the heap
      • Smart pointer template allows us to observe when references are created, copied, and deleted
    • Counter of references to a particular object should be stored in a separate data structure on the heap or as part of the object’s data layout
      • All pointers MUST share the same count variable
      • Count has to be shared between all references
      • Separate counter structure on heap – two pointers in every smart pointer object but no change to reference-counted object (the object that the reference counter is counting the references of)
      • Memory overhead for each smart pointer object
      • Counter as part of object data: one pointer in smart pointer but also requires data structure be prepared in reference-counted object to store references
      • Have to modify data structure that is storing references
      • ptr.h implementation
        • Constructor
           Ptr(T* addr = 0) : addr(addr), counter(new size_t(1)) {
             TRACE(addr);
           }
        • Copy Constructor
           Ptr(const Ptr& other) : addr(other.addr), counter(other.counter) {
             TRACE(addr);
             ++(*counter);  // increments the shared count; ++counter would increment the address, which is wrong
           }
        • Destructor
           ~Ptr() {
             TRACE(addr);
             if (0 == --(*counter)) {
               if (0 != addr) addr->__vptr->__delete(addr);
               delete counter;
             }
           }
        • Assignment Operator
           Ptr& operator=(const Ptr& right) {
             TRACE(addr);
             if (addr != right.addr) {
               if (0 == --(*counter)) {
                 if (0 != addr) addr->__vptr->__delete(addr);
                 delete counter;
               }
               addr = right.addr;
               counter = right.counter;
               ++(*counter);
             }
             return *this;
           }
    • Implementation
      • Need to know when a reference to a data layout goes away and when a reference to a data layout is added
        • Copy constructor: always invoked when a new reference is formed to an existing data layout
          • That is, on initialization of a newly created reference from an already existing reference of a compatible type
        • Destructor: invoked when a reference goes away, i.e. when a variable on the stack goes out of scope or an object on the heap is deleted
        • Assignment operator: performs the work of both the destructor and the copy constructor, or nothing on self-assignment
      • The count cannot be part of the smart pointer itself, because one smart pointer can't tell all the other smart pointers to update their count
      • Put reference count on the heap, invasively or separately
        • Invasive: change each object to have a reference count field
        • Separately: doubles the size of each smart pointer, because it holds an extra pointer to the counter
      • When initializing a smart pointer, the count is one
      • When copying, we have to increase the count by one
      • When a smart pointer goes away, we need to decrease the reference count and check whether it has reached 0
        • If it is zero, we have to deallocate the data layout and the counter
    • There is a performance overhead every time a reference is created or removed: we need to look at the value of the count, update it, and determine whether the object needs to be deleted
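The counting rules above (start at one, increment on copy, decrement on destruction, free at zero) can be exercised with a compact standalone sketch. It makes two assumptions that differ from ptr.h: plain `delete` replaces the translator's `__vptr->__delete` call, and assignment compares counters rather than addresses. `CountedPtr`, `Tracked` and `example` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical payload that records how many instances are alive.
struct Tracked {
  static int alive;
  Tracked() { ++alive; }
  ~Tracked() { --alive; }
};
int Tracked::alive = 0;

// Non-invasive reference counting: the count lives in a separate heap cell
// that is shared by every copy of the pointer.
template <typename T>
class CountedPtr {
public:
  explicit CountedPtr(T* addr = nullptr)
      : addr_(addr), counter_(new std::size_t(1)) {}

  CountedPtr(const CountedPtr& other)
      : addr_(other.addr_), counter_(other.counter_) {
    ++(*counter_);
  }

  CountedPtr& operator=(const CountedPtr& right) {
    if (counter_ != right.counter_) {  // also covers self assignment
      release();
      addr_ = right.addr_;
      counter_ = right.counter_;
      ++(*counter_);
    }
    return *this;
  }

  ~CountedPtr() { release(); }

  T* operator->() const { return addr_; }
  std::size_t count() const { return *counter_; }

private:
  void release() {
    if (--(*counter_) == 0) {
      delete addr_;     // deleting a null pointer is a no-op
      delete counter_;
    }
  }

  T* addr_;
  std::size_t* counter_;
};

void example() {
  {
    CountedPtr<Tracked> p(new Tracked);
    assert(p.count() == 1 && Tracked::alive == 1);
    {
      CountedPtr<Tracked> q = p;  // copy: the count goes to 2
      assert(p.count() == 2);
      CountedPtr<Tracked> r;      // empty pointer with its own counter
      r = q;                      // old counter freed, shared count goes to 3
      assert(p.count() == 3);
    }                             // q and r are destroyed: back to 1
    assert(p.count() == 1 && Tracked::alive == 1);
  }                               // last owner gone: the Tracked is deleted
  assert(Tracked::alive == 0);
}
```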
  • How does C++ know the right size of types/object to delete with subtyping?
    • Virtual methods/method overriding
    • Virtual destructor for our translated objects
      • New vtable slot after __isa for __delete
      • __delete in initialization of VTable
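In ordinary C++ (outside the translator's hand-built vtables) the same effect comes from declaring the destructor virtual. A minimal sketch, with `Base`, `Derived`, `destruction_log` and `example` as illustrative names:

```cpp
#include <cassert>
#include <string>

static std::string destruction_log;  // records the order of destructor calls

struct Base {
  virtual ~Base() { destruction_log += "~Base "; }
};

struct Derived : Base {
  ~Derived() override { destruction_log += "~Derived "; }
};

void example() {
  destruction_log.clear();
  Base* b = new Derived;
  delete b;  // dispatches through the vtable to ~Derived, which then runs ~Base
  assert(destruction_log == "~Derived ~Base ");
}
```

If `~Base` were not virtual, deleting a Derived through a `Base*` would be undefined behavior.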
  • Arrays
    • Special destructor for arrays because we separately allocated __data on the heap – only after we delete data can we delete addr

C++ Casting

  • C-style casting is allowed in C++, but it is discouraged since C++ has its own ways of casting
  • Implicit Cast
    • Always safe
    • BUT problematic, since the wrong method will be called for non-virtual methods
    • C++ can have non-virtual methods while Java cannot
     class Shape {
     public:
       void type() {
         cout << "I'm Shape";
       }
     };

     class Circle : public Shape {
     public:
       void type() {
         cout << "I'm Circle";
       }
     };

     Shape* s = new Circle();
     s->type();  // prints "I'm Shape"
  • Why is Shape* s = new Circle() a cast?
    • Static and dynamic types are different
    • Don't need an explicit cast because it's an upcast, which is always safe
    • As soon as you use subtyping in C++, you should use virtual methods
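The difference between a non-virtual and a virtual method after an implicit upcast can be checked directly. A sketch with hypothetical method names `static_name` and `dynamic_name`:

```cpp
#include <cassert>
#include <string>

struct Shape {
  std::string static_name() const { return "Shape"; }            // non-virtual
  virtual std::string dynamic_name() const { return "Shape"; }   // virtual
  virtual ~Shape() {}
};

struct Circle : Shape {
  std::string static_name() const { return "Circle"; }           // hides Shape's version
  std::string dynamic_name() const override { return "Circle"; } // overrides it
};

void example() {
  Circle circle;
  Shape* s = &circle;                     // implicit upcast
  assert(s->static_name() == "Shape");    // chosen from the static type Shape*
  assert(s->dynamic_name() == "Circle");  // chosen from the dynamic type Circle
}
```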
  • Static Downcast
    • Asserts the cast is correct
    • Unsafe, but compiler still checks if the classes are related (i.e., the cast is either up or down the inheritance hierarchy)
       Circle* c1 = static_cast<Circle*>(s);
    • No runtime overhead
    • Assume the cast is valid – you as the programmer know better than the compiler
      • Not safe from human error
    • What happens at compile time?
      • Determines the static type of the argument expression, in this case s
      • Determines the type of the type parameter, <Circle*>
      • Makes sure the classes are related
      • Does nothing else – no code at runtime executed
    • Treat something as const
      • Safe
      • BUT object may change by accessing original pointer
         const Circle* c2 = c1; //object can still be changed through c1 
    • Const Cast
      • Casting away const-ness (semantics of const) or to treat something as const
        • The compiler checks that types are same modulo const
        • Compiler makes sure the types are related – so still stay within confines of C++ type system
        • Used for caches (e.g. updating a cached value from inside a const method)
      • Dangerous
      • Going from unconst to const is safe – taking rights (to modify object) away so always works, implicitly
        • No need to use const_cast
         Circle* c3 = const_cast<Circle*>(c2);
      • No runtime overhead
      • Assume the cast is valid – you as the programmer know better than the compiler
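A small sketch of both directions, adding const implicitly and removing it with const_cast. `Circle` is a bare struct here, and `radius`, `view`, `writable` and `example` are illustrative names.

```cpp
#include <cassert>

struct Circle {
  int radius;
};

void example() {
  Circle circle = {1};
  const Circle* view = &circle;                  // adding const is implicit and safe
  Circle* writable = const_cast<Circle*>(view);  // removing const needs const_cast
  writable->radius = 2;       // fine only because circle itself is not declared const
  assert(view->radius == 2);  // the const view still observes the change
}
```

Writing through the result of a const_cast is undefined behavior when the underlying object was itself declared const, which is why this cast is considered dangerous.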
    • Reinterpret Cast
      • Treats bits completely differently
      • Insanely bad – able to cast into unrelated classes (At compile time allows us to go between completely unrelated types)
      • Disregards type system
         intptr_t i = reinterpret_cast<intptr_t>(c3);  // treats the bit pattern of the pointer as a number
         Window* w1 = reinterpret_cast<Window*>(c3);   // compiles and returns a Window pointer: BAD
         // Forged pointer: the object is reinterpreted as a Window* and treated as a Window object
         // (able to call Window methods and access Window members)
      • No runtime overhead
      • Assume the cast is valid – you as the programmer know better than the compiler
    • Dynamic Cast
      • Guaranteed to be safe
      • Requires some overhead
      • Safe, but you need to check whether the cast succeeded: for pointers a failed cast returns a null pointer, for references it throws std::bad_cast
    • Only two casts exist in Java
      • Explicit cast (can go up or down)
        • Handled by the java_cast template function
        • Invokes a reinterpret cast (written as a C cast) in the initialization list of the copy constructor for the smart pointer
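Checking the result of a dynamic cast can be sketched like this. The class names reuse the Shape example, and `reference_cast_fails` and `example` are hypothetical helpers.

```cpp
#include <cassert>
#include <typeinfo>

struct Shape { virtual ~Shape() {} };  // dynamic_cast needs a polymorphic type
struct Circle : Shape {};
struct Rectangle : Shape {};

// Returns true if casting the reference to Rectangle& fails with std::bad_cast.
bool reference_cast_fails(Shape& s) {
  try {
    Rectangle& r = dynamic_cast<Rectangle&>(s);
    (void)r;
    return false;
  } catch (const std::bad_cast&) {
    return true;
  }
}

void example() {
  Circle circle;
  Shape* s = &circle;
  assert(dynamic_cast<Circle*>(s) == &circle);     // succeeds at runtime
  assert(dynamic_cast<Rectangle*>(s) == nullptr);  // fails: null pointer
  assert(reference_cast_fails(circle));            // fails: exception
}
```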
      • Implicit upcast
        • Handled with smart pointer copy constructor