Object-Oriented Programming

CSCI-UA.0470-001

NYU, Undergraduate Division, Computer Science Course - Fall 2013

OOP Class Notes 09/26/13

    Implementing inheritance and virtual method dispatch by hand

    We talked about inheritance on Tuesday 9/24. The challenge in implementing it efficiently is that we have to worry about the layout of data in memory. We will first go over inheritance and virtual methods in Java in general terms by drawing the data layout of classes, instances, and vtables. Then we will write C++ code that implements java.lang.Object, java.lang.String and java.lang.Class, so that you can see what these structures look like in C++ without using C++ inheritance or virtual methods.

    Note that you CANNOT use C++ inheritance or virtual methods in your translator

    This is why we are going to write these Java object structures by hand in C++.

    Overview: data layout of class inheritance

    • If we want to add fields in a subclass, where do we store that data in memory?
      • Below the memory reserved for the superclass.
    • Code from the subclass can access the memory above using the same offsets as code from the superclass.
    • If we have polymorphic data structures of variable sizes, how should we pass the data?
      • By reference. Hence in Java, all objects are accessed through references, and it is the reference that gets passed around
    • For virtual methods we have a per-class vtable (shared by all instances of the class) containing pointers to the method implementations
    • Every instance has a vptr pointing to the vtable
    • Each virtual method has a fixed offset in the vtable; for example, toString occupies the same slot in the vtable of every class
    • New methods added by a subclass extend the vtable
    • Overridden methods go into an existing slot
    • Object and String have their own vtables; String’s vtable is a clone of Object’s vtable in which the slots of overridden methods point to String’s implementations
    • In Java when you declare a subclass that extends a superclass, you clone the superclass's vtable and add the addresses of any new methods that are not private or static to the vtable
    • (Figure not reproduced here: data layouts of object instances and their vtables.)
    • Please note that the data layouts of object instances and their respective vtables are not necessarily adjacent in memory
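
    The layout rule from the overview (a subclass's new fields are stored after the superclass's memory, so the superclass's offsets stay valid) can be checked with a small sketch. The names A_Layout, B_Layout, A_VT and B_VT and the fields x and y are hypothetical and not part of the course code; the vtable types are only forward-declared placeholders.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical placeholders for the per-class vtables.
struct A_VT;
struct B_VT;

// Hand-written layout of a class A with one field.
struct A_Layout {
  A_VT* vptr;   // pointer to the class's vtable comes first
  int32_t x;    // field declared by A
};

// Hand-written layout of a subclass B: A's layout, then B's new field.
struct B_Layout {
  B_VT* vptr;   // same position as in A_Layout
  int32_t x;    // inherited field, same offset as in A_Layout
  int32_t y;    // new field, appended after the superclass's data
};

// True when every field inherited from A sits at the same offset in B.
bool inherited_offsets_match() {
  return offsetof(A_Layout, vptr) == offsetof(B_Layout, vptr) &&
         offsetof(A_Layout, x) == offsetof(B_Layout, x);
}

// True when B's new field lies beyond everything A declares.
bool new_field_is_appended() {
  return offsetof(B_Layout, y) >= offsetof(A_Layout, x) + sizeof(int32_t);
}
```

    On mainstream compilers both functions return true, which is why code written against A's layout can read the x field of a B instance without knowing about B.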

    C++ layout of java.lang.Object, java.lang.String and java.lang.Class in the header file

    • Everything is inside the lang namespace which is inside the java namespace
    • We use a double-underscore prefix for names that are internal to the translator.

      // Forward declarations of data layout and vtables.
      struct __Object;
      struct __Object_VT;
      
      struct __String;
      struct __String_VT;
      
      struct __Class;
      struct __Class_VT;
      
      // Definition of type names, which are equivalent to Java semantics,
      // i.e., an instance is the address of the object's data layout.
      typedef __Object* Object;
      typedef __Class* Class;
      typedef __String* String;
      
    • We then use typedef to define the Java type names as pointers to the internal structs that are prefixed with the double underscores.
    • Data layout of java.lang.Object in C++

      struct __Object {
        __Object_VT* __vptr;
        // The constructor.
        __Object();
      
        // The methods implemented by java.lang.Object.
        static int32_t hashCode(Object);
        static bool equals(Object, Object);
        static Class getClass(Object);
        static String toString(Object);
      
        // The function returning the class object representing
        // java.lang.Object.
        static Class __class();
      
        // The vtable for java.lang.Object.
        static __Object_VT __vtable;
      };
      
    • __Object has a __vptr field which points to its virtual method table, a static __vtable field shared by all instances, the built-in methods hashCode, equals, getClass, and toString, and a __class method which returns the class object representing java.lang.Object.

      • Remember that every instance of __Object has a pointer to the vtable, otherwise the whole premise of virtual methods would not work.
    • We use static Class __class() as opposed to declaring __class as a static variable to avoid the possibility that it is initialized after other static variables that depend on it. This problem is known as the “static initialization order fiasco”.
    • __Object’s vtable layout:

      struct __Object_VT {
        Class __isa;
        int32_t (*hashCode)(Object);
        bool (*equals)(Object, Object);
        Class (*getClass)(Object);
        String (*toString)(Object);
      
        __Object_VT()
          : __isa(__Object::__class()),
            hashCode(&__Object::hashCode),
            equals(&__Object::equals),
            getClass(&__Object::getClass),
            toString(&__Object::toString) {}
      };
      
    • __Object_VT has an __isa property which points to its class, and then pointers to the methods of java.lang.Object. It has a “no-argument” constructor denoted by __Object_VT() which stores the addresses of __Object’s hashCode, equals, getClass and toString methods in the appropriate fields of the vtable.
      • For example, int32_t (*hashCode)(Object) is a pointer to a function that takes an Object as a parameter and returns an int32_t type.
      • Notice how the no-argument constructor uses & to take the address of each method
    • Remember that in Java, this is an implicit argument of every instance method, so each function in the vtable takes an Object as its first parameter.
    • Java's int always has 32 bits, whereas the width of int in C++ depends on the platform, so we specify int32_t for hashCode's return type.
    • The data layout of java.lang.String in C++ mirrors that of __Object: it starts with a __vptr, then adds a data field of the C++ std::string type, and it also adds the length and charAt methods.

        struct __String {
            __String_VT* __vptr;
            std::string data;
      
            // The constructor.
            __String(std::string data);
      
            // The methods implemented by java.lang.String.
            static int32_t hashCode(String);
            static bool equals(String, Object);
            static String toString(String);
            static int32_t length(String);
            static char charAt(String, int32_t);
      
            // The function returning the class object representing
            // java.lang.String.
            static Class __class();
      
            // The vtable for java.lang.String.
            static __String_VT __vtable;
        };
      
    • The vtable for __String

       // The vtable layout for java.lang.String.
       struct __String_VT {
           Class __isa;
           int32_t (*hashCode)(String);
           bool (*equals)(String, Object);
           Class (*getClass)(String);
           String (*toString)(String);
           int32_t (*length)(String);
           char (*charAt)(String, int32_t);
      
           __String_VT()
           : __isa(__String::__class()),
             hashCode(&__String::hashCode),
             equals(&__String::equals),
             getClass((Class(*)(String))&__Object::getClass),
             toString(&__String::toString),
             length(&__String::length),
             charAt(&__String::charAt) {
           }
       };
      
    • The type of the first parameter (the implicit this) differs between __Object’s getClass, which takes an Object, and the getClass slot in __String’s vtable, which takes a String, so we need a cast: getClass((Class(*)(String)) …)
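
    To see how a call such as s.length() travels through a hand-written vtable, here is a reduced sketch of the String layout. MiniString, MiniString_VT, MiniStr, call_length and call_charAt are hypothetical names chosen for illustration; the sketch keeps only the two new methods and leaves out __isa and the methods inherited from Object. Bounds checking via std::string::at is an assumption of this sketch, not something the course code is stated to do.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical miniature of the String layout.
struct MiniString;
struct MiniString_VT;
typedef MiniString* MiniStr;  // an instance is the address of its data layout

// The vtable: one function pointer per virtual method.
struct MiniString_VT {
  int32_t (*length)(MiniStr);
  char (*charAt)(MiniStr, int32_t);
};

struct MiniString {
  MiniString_VT* vptr;
  std::string data;

  // The constructor hooks the instance up to the class's vtable.
  explicit MiniString(const std::string& d) : vptr(&vtable), data(d) {}

  // Methods are static and receive the instance explicitly.
  static int32_t length(MiniStr self) {
    return static_cast<int32_t>(self->data.size());
  }
  static char charAt(MiniStr self, int32_t i) {
    // at() throws std::out_of_range for an invalid index.
    return self->data.at(static_cast<std::size_t>(i));
  }

  static MiniString_VT vtable;
};

// One vtable per class, filled with the addresses of the implementations.
MiniString_VT MiniString::vtable = {&MiniString::length, &MiniString::charAt};

// What a translated call s.length() looks like: look up the slot through
// the vptr and pass the instance as the explicit first argument.
int32_t call_length(MiniStr s) { return s->vptr->length(s); }
char call_charAt(MiniStr s, int32_t i) { return s->vptr->charAt(s, i); }
```

    The caller never names MiniString::length directly; it only follows the vptr, which is what lets a subclass substitute a different implementation in the same slot.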

    • We also need to define java.lang.Class because every object needs a class object, which is static and shared by all instances of the class.
    • The Class objects are used to keep track of the dynamic type of objects
    • The data layout of java.lang.Class in C++

      struct __Class {
        __Class_VT* __vptr;
        String name;
        Class parent;
      
        // The constructor.
        __Class(String name, Class parent);
      
        // The instance methods of java.lang.Class.
        static String toString(Class);
        static String getName(Class);
        static Class getSuperclass(Class);
        static bool isInstance(Class, Object);
      
        // The function returning the class object representing
        // java.lang.Class.
        static Class __class();
      
        // The vtable for java.lang.Class.
        static __Class_VT __vtable;
      };
      
    • __Class has a name field holding the class's name and a parent field referencing the parent class. The latter is used to implement the getSuperclass method, which returns a reference to the class's superclass.

      // The vtable layout for java.lang.Class.
      struct __Class_VT {
        Class __isa;
        int32_t (*hashCode)(Class);
        bool (*equals)(Class, Object);
        Class (*getClass)(Class);
        String (*toString)(Class);
        String (*getName)(Class);
        Class (*getSuperclass)(Class);
        bool (*isInstance)(Class, Object);
      
        __Class_VT()
          : __isa(__Class::__class()),
            hashCode((int32_t(*)(Class))&__Object::hashCode),
            equals((bool(*)(Class,Object))&__Object::equals),
            getClass((Class(*)(Class))&__Object::getClass),
            toString(&__Class::toString),
            getName(&__Class::getName),
            getSuperclass(&__Class::getSuperclass),
            isInstance(&__Class::isInstance) {}
      };
      
    • The __rt namespace holds two runtime helpers: null(), which returns the canonical null value, and literal(), which converts a C string literal to a java.lang.String object. Inside literal(), C++ implicitly converts the C string to the std::string expected by the __String constructor.

      namespace __rt {
        // The function returning the canonical null value.
        java::lang::Object null();
      
        // Function for converting a C string literal to a translated
        // Java string.
        inline java::lang::String literal(const char * s) {
          // C++ implicitly converts the C string to a std::string.
          return new java::lang::__String(s);
        }
      }
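
    The reason the header declares __class() as a function rather than a static variable can be illustrated with the construct-on-first-use idiom. This is a minimal sketch under stated assumptions: ClassInfo, object_class and root_name are hypothetical stand-ins, not the course's __Class or __Object::__class.

```cpp
#include <string>

// Hypothetical stand-in for a class object.
struct ClassInfo {
  std::string name;
  const ClassInfo* parent;
};

// Construct-on-first-use: the local static is initialized the first time
// control reaches its declaration, so callers never see it uninitialized.
const ClassInfo* object_class() {
  static const ClassInfo* k = new ClassInfo{"java.lang.Object", nullptr};
  return k;
}

// A global whose initializer depends on the class object. Because it goes
// through the accessor, it does not matter which global is initialized first.
const std::string root_name = object_class()->name;
```

    Had object_class been a plain global defined in another source file, root_name could have been initialized before it, since C++ does not fix the initialization order of globals across translation units.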
      

    Notes on C++ implementation of java.lang.Object, java.lang.String and java.lang.Class methods

    • java.lang.Object

      // java.lang.Object()
      __Object::__Object() : __vptr(&__vtable) {}
      
      // java.lang.Object.hashCode()
      int32_t __Object::hashCode(Object __this) {
        return (int32_t)(intptr_t)__this;
      }
      
      // java.lang.Object.equals(Object)
      bool __Object::equals(Object __this, Object other) {
        return __this == other;
      }
      
      // java.lang.Object.getClass()
      Class __Object::getClass(Object __this) {
        return __this->__vptr->__isa;
      }
      
      // java.lang.Object.toString()
      String __Object::toString(Object __this) {
        // Class k = this.getClass();
        Class k = __this->__vptr->getClass(__this);
      
        std::ostringstream sout;
        sout << k->__vptr->getName(k)->data
             << '@' << std::hex << (uintptr_t)__this;
        return new __String(sout.str());
      }
      
      // Internal accessor for java.lang.Object's class.
      Class __Object::__class() {
        static Class k =
          new __Class(__rt::literal("java.lang.Object"), (Class)__rt::null());
        return k;
      }
      
      // The vtable for java.lang.Object.  Note that this definition
      // invokes the default no-arg constructor for __Object_VT.
      __Object_VT __Object::__vtable;
      
    • For hashCode we cannot cast __this directly to int32_t, because on 64-bit architectures a pointer is wider than 32 bits and the direct cast is rejected. So we cast first to intptr_t and then to int32_t.
    • hashCode and other methods take this as an implicit parameter. Since this is a reserved keyword in C++, we use Object __this as a reference to the instance receiving the method call.
    • See the rest of the code in java_lang.cc and main.cc
    • The implementation of __Class is important because without it we would not be able to track the dynamic type of objects. Class is what links objects in the inheritance hierarchy.
    • isInstance traverses the inheritance hierarchy upwards (until it hits null) to determine whether an object is an instance of a given class

      // java.lang.Class.isInstance(Object)
      bool __Class::isInstance(Class __this, Object o) {
        Class k = o->__vptr->getClass(o);
      
        do {
          if (__this->__vptr->equals(__this, (Object) k)) return true;
          k = k->__vptr->getSuperclass(k);
        } while ((Class)__rt::null() != k);
      
        return false;
      }
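
    The upward walk performed by isInstance can be tried in isolation with a stripped-down class object. ClassNode, is_instance_of and the four node variables are hypothetical; the sketch compares class objects by address and uses a for loop, so a null dynamic type simply yields false.

```cpp
#include <string>

// Hypothetical simplified class object: just a name and a parent link.
struct ClassNode {
  std::string name;
  const ClassNode* parent;  // nullptr marks the top of the hierarchy
};

// Returns true when dynamic_type is target or a (transitive) subclass of it.
bool is_instance_of(const ClassNode* target, const ClassNode* dynamic_type) {
  for (const ClassNode* k = dynamic_type; k != nullptr; k = k->parent) {
    if (k == target) return true;  // identity comparison of class objects
  }
  return false;
}

// A chain Object <- A <- B, plus a sibling C that also extends A.
const ClassNode object_node = {"Object", nullptr};
const ClassNode a_node = {"A", &object_node};
const ClassNode b_node = {"B", &a_node};
const ClassNode c_node = {"C", &a_node};
```

    With these nodes, an object whose dynamic type is B is an instance of B, A and Object, but not of its sibling C.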
      

    Summary

    • We have no notion of classes, inheritance, or virtual methods in the target language of the translator. That is, we need to translate these concepts by hand because using C++ inheritance and virtual methods in our translator IS NOT ALLOWED.
    • In a statically typed language we can build a per-class vtable that represents the behavior of each class.
      • We only need the contract once for all instances, and we can hook the behavior up to the vtable with a pointer.
    • Then the question becomes: how do we fill things in?
      • So, say we have a class B, which inherits from A, which inherits from Object, and we want to do the data layout for B.
      • B, by definition of inheritance, has all the fields of A.
      • So the data layout for B consists of the data layout for A, and then the new data of B appended to that.
    • Similarly, the data layout for A consists of the data layout for Object, and the new data of A appended.
    • We know that the data layout for Object only consists of a vptr, because we programmed it today. If we have a class C that is also a subclass of A, its data layout will also consist of the data layout of A and Object, but because it is a sibling of B, it will not have access to the data that is unique to B.
    • The vtable for Object also has the __isa pointer and the pointers to the four methods we need – that’s the contract of __Object.
    • If we override a method with a new implementation, we already know what slot it has in the vtable.
    • That is, overriding of virtual methods is implemented by replacing a pointer in the vtable.
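
    The last two points can be sketched in a few lines: a subclass's vtable starts as a copy of the superclass's vtable, and overriding replaces one pointer in it. All names here (Instance, VT, A_name, A_answer, B_name, make_B_vtable, describe) are hypothetical. For brevity B adds no fields, so A and B share one instance layout in this sketch.

```cpp
#include <string>

struct Instance;  // hypothetical object layout shared by A and B here

// Hypothetical vtable with two slots; every class uses the same slot order.
struct VT {
  std::string (*name)(const Instance*);
  int (*answer)(const Instance*);
};

struct Instance {
  const VT* vptr;
};

std::string A_name(const Instance*) { return "A"; }
int A_answer(const Instance*) { return 42; }
std::string B_name(const Instance*) { return "B"; }  // B overrides name()

// A's vtable lists A's implementations.
const VT A_vtable = {&A_name, &A_answer};

// B's vtable starts as a clone of A's; the overridden slot is then replaced.
VT make_B_vtable() {
  VT vt = A_vtable;
  vt.name = &B_name;
  return vt;
}
const VT B_vtable = make_B_vtable();

// Callers dispatch through the vptr without knowing the dynamic type.
std::string describe(const Instance* o) {
  return o->vptr->name(o) + ":" + std::to_string(o->vptr->answer(o));
}
```

    An instance whose vptr points at B_vtable picks up B's name but still uses A's answer, because that slot was copied unchanged.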

    C++ Virtual Method Layout

    • The vtable pointer that a C++ object needs in order to use virtual methods adds 8 bytes to the object (on a 64-bit platform). Adding more virtual methods does not increase the object size any further, i.e., once a class has a single virtual method, the size increase is fixed.
      Method Layout

                           Without Virtual Methods                  With Virtual Methods
      sizeof(Point)        32 bytes = 4 doubles                     40 bytes = 1 Point + pointer for vtable
      sizeof(ColorPoint)   40 bytes = 1 Point + 1 Color + padding   48 bytes = 1 ColorPoint + pointer for vtable
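
    The size effect described above can be measured with sizeof. The three structs below are hypothetical four-double point types, not the Point and ColorPoint classes from the lecture, and the exact numbers are implementation-dependent; the sketch only assumes a typical compiler that stores one vtable pointer per object.

```cpp
#include <cstddef>

// Hypothetical point type with four doubles and no virtual methods.
struct PlainPoint {
  double x, y, z, w;
};

// Same data, one virtual method (the destructor).
struct OneVirtual {
  double x, y, z, w;
  virtual ~OneVirtual() {}
};

// Same data, two virtual methods.
struct TwoVirtual {
  double x, y, z, w;
  virtual ~TwoVirtual() {}
  virtual double sum() const { return x + y + z + w; }
};

// Extra bytes per instance once the class has at least one virtual method.
std::size_t vptr_overhead() {
  return sizeof(OneVirtual) - sizeof(PlainPoint);
}

// True when a second virtual method adds nothing to the instance size.
bool second_virtual_is_free() {
  return sizeof(TwoVirtual) == sizeof(OneVirtual);
}
```

    On a typical 64-bit platform vptr_overhead() is 8, matching the table's step from 32 to 40 bytes, and second_virtual_is_free() is true because additional methods only grow the per-class vtable, not each object.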