OOP Class Notes 09/26/13
- If we want to add fields in a subclass, where do we store that data in memory?
- Below the memory reserved for the superclass.
- Code from the subclass can access the memory above using the same offsets as code from the superclass.
- If we have polymorphic data structures of variable sizes, how should we pass the data?
- By reference. Hence in Java, objects are always handled through references (strictly speaking, the reference itself is passed by value).
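A minimal sketch of the two answers above, assuming hypothetical layouts `PointLayout` and `ColorPointLayout` (neither name comes from the lecture code). Since C++ inheritance is off limits in the translator, the "subclass" struct repeats the superclass fields first and appends its own. `offsetof` then confirms that the shared fields sit at the same offsets, and the pointer sizes suggest why passing by reference works for layouts of any size.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical "superclass" layout: a vptr slot followed by its fields.
struct PointLayout {
  const void* vptr;
  int32_t x;
  int32_t y;
};

// Hypothetical "subclass" layout: the same fields in the same order,
// with the new field appended at the end.
struct ColorPointLayout {
  const void* vptr;
  int32_t x;
  int32_t y;
  int32_t color;
};

// True if every field shared with the superclass has the same offset.
constexpr bool sharedFieldsLineUp() {
  return offsetof(PointLayout, vptr) == offsetof(ColorPointLayout, vptr)
      && offsetof(PointLayout, x) == offsetof(ColorPointLayout, x)
      && offsetof(PointLayout, y) == offsetof(ColorPointLayout, y);
}

// True if the new field starts after all of the superclass fields.
constexpr bool newFieldComesLast() {
  return offsetof(ColorPointLayout, color) > offsetof(ColorPointLayout, y);
}

// A pointer has one fixed size regardless of how large the layout is.
constexpr bool pointersHaveSameSize() {
  return sizeof(PointLayout*) == sizeof(ColorPointLayout*);
}
```

Code written against `PointLayout` can therefore read `x` and `y` at the same offsets in either layout, which is the property the notes rely on.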
- For virtual methods we have a per class (for all instances) vtable containing pointers to the different implementations
- Every instance has a vptr pointing to the vtable
- For example, at offset zero in the vtables we have the `toString` method
- New methods added by a subclass extend the vtable
- Overridden methods go into an existing slot
- `Object` and `String` have their own vtables, and `String`'s vtable is a clone of `Object`'s vtable with some slots overridden
- In Java, when you declare a subclass that extends a superclass, you clone the superclass's vtable and add the addresses of any new methods that are not private or static to the vtable
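The cloning, overriding, and extending of vtables described above can be sketched in a few lines. This is an illustration only: the classes `Animal` and `Dog`, their methods `name` and `tricks`, and the slot order are hypothetical and are not part of the lecture code.

```cpp
#include <cassert>
#include <string>

struct AnimalLayout;
struct DogLayout;

// Vtable for the hypothetical superclass: one virtual method.
struct AnimalVT {
  std::string (*name)(const AnimalLayout*);
};

// Vtable for the hypothetical subclass: the inherited slot comes first
// (same position as in AnimalVT), and the new method is appended.
struct DogVT {
  std::string (*name)(const DogLayout*);  // overridden: existing slot
  int (*tricks)(const DogLayout*);        // new: extends the vtable
};

// Every instance starts with a vptr to its class's vtable.
struct AnimalLayout { const AnimalVT* vptr; };
struct DogLayout    { const DogVT* vptr; int tricksKnown; };

std::string animalName(const AnimalLayout*) { return "animal"; }
std::string dogName(const DogLayout*) { return "dog"; }
int dogTricks(const DogLayout* self) { return self->tricksKnown; }

// One vtable per class, shared by all instances of that class.
const AnimalVT animalVT = { &animalName };
const DogVT dogVT = { &dogName, &dogTricks };

// A virtual call: follow the vptr, then call through the slot,
// passing the receiver explicitly.
std::string describe(const DogLayout* d) {
  assert(d != nullptr && d->vptr != nullptr);
  return d->vptr->name(d) + " knows " +
         std::to_string(d->vptr->tricks(d)) + " tricks";
}
```

A caller would create an instance with `DogLayout d = { &dogVT, 3 };` and dispatch with `d.vptr->name(&d)`. The instance stores only the vptr and its own fields, never the function pointers themselves.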
- Here is a visual representation:
- Please note that the data layouts of object instances and their respective vtables are not necessarily adjacent in memory
- Everything is inside the `lang` namespace, which is inside the `java` namespace. We are using the double underscore convention to prefix names which are internal to the translator.

```cpp
// Forward declarations of data layout and vtables.
struct __Object;
struct __Object_VT;

struct __String;
struct __String_VT;

struct __Class;
struct __Class_VT;

// Definition of type names, which are equivalent to Java semantics,
// i.e., an instance is the address of the object's data layout.
typedef __Object* Object;
typedef __Class* Class;
typedef __String* String;
```
- We then use `typedef` to define the Java type names as pointers to the internal structs that are prefixed with the double underscores.

Data layout of `java.lang.Object` in C++:

```cpp
struct __Object {
  __Object_VT* __vptr;

  // The constructor.
  __Object();

  // The methods implemented by java.lang.Object.
  static int32_t hashCode(Object);
  static bool equals(Object, Object);
  static Class getClass(Object);
  static String toString(Object);

  // The function returning the class object representing
  // java.lang.Object.
  static Class __class();

  // The vtable for java.lang.Object.
  static __Object_VT __vtable;
};
```
`__Object` has a `__vptr` field which points to its virtual method table, a static `__vtable` field shared by all instances, the built-in methods `hashCode`, `equals`, `getClass`, and `toString`, and a `__class` method which returns the class object representing itself.

- Remember that every instance of `__Object` has a pointer to the vtable, otherwise the whole premise of virtual methods would not work.
- We use `static Class __class()` as opposed to declaring `__class` as a static variable to avoid the possibility that it is initialized after other static variables that depend on it. This problem is known as the "static initialization order fiasco".

`__Object`'s vtable layout:

```cpp
struct __Object_VT {
  Class __isa;
  int32_t (*hashCode)(Object);
  bool (*equals)(Object, Object);
  Class (*getClass)(Object);
  String (*toString)(Object);

  __Object_VT()
  : __isa(__Object::__class()),
    hashCode(&__Object::hashCode),
    equals(&__Object::equals),
    getClass(&__Object::getClass),
    toString(&__Object::toString) {
  }
};
```
`__Object_VT` has an `__isa` field which points to its class, followed by pointers to the methods of `java.lang.Object`. It has a no-argument constructor, `__Object_VT()`, which stores the addresses of `__Object`'s `hashCode`, `equals`, `getClass`, and `toString` methods in the appropriate fields of the vtable.

- For example, `int32_t (*hashCode)(Object)` is a pointer to a function that takes an `Object` as a parameter and returns an `int32_t`.
- Notice how we use `&` to store an address in the no-argument constructor.
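To make the function pointer syntax concrete, here is a small standalone sketch. The functions `twice` and `negated` and the struct `IntOps` are made up for this illustration. Only the declaration pattern `ret (*name)(params)` and the use of `&` in the constructor mirror the vtable code.

```cpp
#include <cstdint>

// Two ordinary functions with the same signature.
int32_t twice(int32_t v) { return 2 * v; }
int32_t negated(int32_t v) { return -v; }

// A struct holding a function pointer, initialized in its no-argument
// constructor in the same way the vtable constructor fills its slots.
struct IntOps {
  int32_t (*op)(int32_t);   // pointer to a function taking and returning int32_t

  IntOps() : op(&twice) {}  // & takes the address of the function
};

// Calls whatever function the slot currently points to.
int32_t apply(const IntOps& ops, int32_t v) {
  return ops.op(v);
}
```

Assigning `ops.op = &negated;` changes which function `apply` ends up calling. Overriding a virtual method uses the same mechanism, applied to a vtable slot.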
- Remember that in Java `this` is an implicit argument of every instance method, so each method in the vtable takes an `Object` (a pointer to `__Object`) as its first parameter.
- `int` in Java has 32 bits, but in C++ its size depends on the platform, so we specify `int32_t` for `hashCode`'s return type.

The data layout of `java.lang.String` in C++ is a clone of `__Object`, except that it adds a `data` field of the C++ `std::string` type as well as the `length` and `charAt` methods.

```cpp
struct __String {
  __String_VT* __vptr;
  std::string data;

  // The constructor.
  __String(std::string data);

  // The methods implemented by java.lang.String.
  static int32_t hashCode(String);
  static bool equals(String, Object);
  static String toString(String);
  static int32_t length(String);
  static char charAt(String, int32_t);

  // The function returning the class object representing
  // java.lang.String.
  static Class __class();

  // The vtable for java.lang.String.
  static __String_VT __vtable;
};
```
The vtable for `__String`:

```cpp
// The vtable layout for java.lang.String.
struct __String_VT {
  Class __isa;
  int32_t (*hashCode)(String);
  bool (*equals)(String, Object);
  Class (*getClass)(String);
  String (*toString)(String);
  int32_t (*length)(String);
  char (*charAt)(String, int32_t);

  __String_VT()
  : __isa(__String::__class()),
    hashCode(&__String::hashCode),
    equals(&__String::equals),
    getClass((Class(*)(String))&__Object::getClass),
    toString(&__String::toString),
    length(&__String::length),
    charAt(&__String::charAt) {
  }
};
```
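`__String` does not override `getClass`, so its vtable reuses `__Object::getClass`. The type of the first parameter (the implicit `this`) differs between `__Object`'s `getClass` and the `getClass` slot in `__String`'s vtable, so we need a cast: `getClass((Class(*)(String)) …)`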
- We also need to define `java.lang.Class` because every object needs a class object, which is static and shared by all instances of the class.
- The `Class` objects are used to keep track of the dynamic type of objects.

The data layout of `java.lang.Class` in C++:

```cpp
struct __Class {
  __Class_VT* __vptr;
  String name;
  Class parent;

  // The constructor.
  __Class(String name, Class parent);

  // The instance methods of java.lang.Class.
  static String toString(Class);
  static String getName(Class);
  static Class getSuperclass(Class);
  static bool isInstance(Class, Object);

  // The function returning the class object representing
  // java.lang.Class.
  static Class __class();

  // The vtable for java.lang.Class.
  static __Class_VT __vtable;
};
```
`__Class` has a `name` field to denote the class's name as well as a `parent` field to reference the parent class. The latter is used to implement the `getSuperclass` method, which returns the class object of the superclass.

```cpp
// The vtable layout for java.lang.Class.
struct __Class_VT {
  Class __isa;
  int32_t (*hashCode)(Class);
  bool (*equals)(Class, Object);
  Class (*getClass)(Class);
  String (*toString)(Class);
  String (*getName)(Class);
  Class (*getSuperclass)(Class);
  bool (*isInstance)(Class, Object);

  __Class_VT()
  : __isa(__Class::__class()),
    hashCode((int32_t(*)(Class))&__Object::hashCode),
    equals((bool(*)(Class,Object))&__Object::equals),
    getClass((Class(*)(Class))&__Object::getClass),
    toString(&__Class::toString),
    getName(&__Class::getName),
    getSuperclass(&__Class::getSuperclass),
    isInstance(&__Class::isInstance) {
  }
};
```
We have a `__rt` namespace with a function returning the `null` value and a `literal` function that converts a C string literal to a `java.lang.String` object, rather than only letting C++ implicitly convert the C string to a `std::string`.

```cpp
namespace __rt {

  // The function returning the canonical null value.
  java::lang::Object null();

  // Function for converting a C string literal to a translated
  // Java string.
  inline java::lang::String literal(const char * s) {
    // C++ implicitly converts the C string to a std::string.
    return new java::lang::__String(s);
  }

}
```
The implementation of `java.lang.Object`:

```cpp
// java.lang.Object()
__Object::__Object() : __vptr(&__vtable) {}

// java.lang.Object.hashCode()
int32_t __Object::hashCode(Object __this) {
  return (int32_t)(intptr_t)__this;
}

// java.lang.Object.equals(Object)
bool __Object::equals(Object __this, Object other) {
  return __this == other;
}

// java.lang.Object.getClass()
Class __Object::getClass(Object __this) {
  return __this->__vptr->__isa;
}

// java.lang.Object.toString()
String __Object::toString(Object __this) {
  // Class k = this.getClass();
  Class k = __this->__vptr->getClass(__this);

  std::ostringstream sout;
  sout << k->__vptr->getName(k)->data
       << '@' << std::hex << (uintptr_t)__this;
  return new __String(sout.str());
}

// Internal accessor for java.lang.Object's class.
Class __Object::__class() {
  static Class k =
    new __Class(__rt::literal("java.lang.Object"), (Class)__rt::null());
  return k;
}

// The vtable for java.lang.Object. Note that this definition
// invokes the default no-arg constructor for __Object_VT.
__Object_VT __Object::__vtable;
```
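The `__class()` accessor above relies on a function-local static, which C++ initializes the first time control reaches its declaration. The following is a minimal sketch of that idiom on its own, using a hypothetical `Registry` type and `registry()` accessor that are not part of the lecture code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical type standing in for some shared static data.
struct Registry {
  std::vector<std::string> names;
};

// Construct on first use: the static is created during the first call,
// so callers can never observe it before it has been initialized.
Registry& registry() {
  static Registry instance;
  return instance;
}

// Records a name and returns how many names are registered so far.
// Safe to call from the initializer of any other static object.
std::size_t registerName(const std::string& name) {
  registry().names.push_back(name);
  return registry().names.size();
}

// A namespace-scope static whose initializer depends on the registry.
// If the registry were a plain global defined in another source file,
// this initializer could run before the registry was constructed.
static const std::size_t firstSlot = registerName("java.lang.Object");
```

Going through the function means the order in which the compiler and linker initialize namespace-scope statics across files no longer matters for this object.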
- For `hashCode` we cannot cast `__this` directly to `int32_t` because that doesn't work on 64-bit architectures, where a pointer is wider than 32 bits. So we cast first to `intptr_t` and then to `int32_t`.
- `hashCode` and the other methods take `this` as an implicit parameter. Since `this` is a reserved keyword in C++, we use `Object __this` as a reference to the instance receiving the method call.
- See the rest of the code in java_lang.cc and main.cc.
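The two-step cast can be tried in isolation. The sketch below uses a hypothetical free function `addressHash` (not part of the lecture code) and C++-style casts in place of the C-style casts in `__Object::hashCode`. On 64-bit targets, common compilers reject a direct cast from a pointer to the narrower `int32_t`, which is why the intermediate `intptr_t` is there.

```cpp
#include <cstdint>

// Converts an object address to a 32-bit hash in two steps:
// pointer -> integer wide enough to hold a pointer -> 32-bit integer.
int32_t addressHash(const void* p) {
  intptr_t bits = reinterpret_cast<intptr_t>(p);  // keeps every address bit
  return static_cast<int32_t>(bits);              // keeps only the low 32 bits
}
```

Two calls with the same address always produce the same value. On a 64-bit platform, distinct addresses can collide because the upper bits are dropped, which is acceptable for a hash code.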
- The implementation of `__Class` is important because without it we would not be able to track the dynamic type of objects. `Class` is what links objects in the inheritance hierarchy.
- `isInstance` traverses the inheritance hierarchy upwards (until it hits `null`) to determine whether an object is an instance of a given class.

```cpp
// java.lang.Class.isInstance(Object)
bool __Class::isInstance(Class __this, Object o) {
  Class k = o->__vptr->getClass(o);

  do {
    if (__this->__vptr->equals(__this, (Object) k)) return true;

    k = k->__vptr->getSuperclass(k);
  } while ((Class)__rt::null() != k);

  return false;
}
```
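The upward walk can be seen more easily with the vtable indirection stripped away. The sketch below assumes a hypothetical `ClassInfo` struct in place of `__Class`, compares class objects by address, and uses a `for` loop where the lecture code uses `do`/`while`. It illustrates the traversal and is not a replacement for `isInstance`.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a class object: a name and a parent link.
struct ClassInfo {
  std::string name;
  const ClassInfo* parent;  // nullptr for the root of the hierarchy
};

// Returns true if dynamicType is target or one of target's subclasses.
bool isInstanceOf(const ClassInfo* target, const ClassInfo* dynamicType) {
  assert(target != nullptr);
  // Walk from the dynamic type up through its ancestors.
  for (const ClassInfo* k = dynamicType; k != nullptr; k = k->parent) {
    if (k == target) return true;  // identity comparison of class objects
  }
  return false;  // reached the top without finding target
}

// A small hierarchy: B extends A, C extends A, A extends Object.
const ClassInfo objectClass = { "Object", nullptr };
const ClassInfo aClass = { "A", &objectClass };
const ClassInfo bClass = { "B", &aClass };
const ClassInfo cClass = { "C", &aClass };
```

With this hierarchy, an object whose dynamic type is `B` is an instance of `B`, `A`, and `Object`, but not of its sibling `C`.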
- We have no notion of classes, inheritance, or virtual methods in the target language of the translator. That is, we need to translate these concepts by hand because using C++ inheritance and virtual methods in our translator IS NOT ALLOWED
- In a statically typed language we can build a per class vtable that represents the behavior of each class.
- We only need the contract once for all instances, and we can hook the behavior up to the vtable with a pointer.
- Then the question becomes, how do we fill things in.
- So, say we have a class B, which inherits from A, which inherits from `Object`, and we want to do the data layout for B.
- B, by definition of inheritance, has all the same fields and data as A.
- So the data layout for B consists of the data layout for A, and then the new data of B appended to that.
- Similarly, the data layout for A consists of the data layout for `Object`, and the new data of A appended.
- We know that the data layout for `Object` only consists of a vptr, because we programmed it today. If we have a class C that is also a subclass of A, its data layout will also consist of the data layout of A and `Object`, but because it is a sibling of B, it will not have access to the data that is unique to B.
- The vtable for `Object` also has the `__isa` pointer and the pointers to the four methods we need; that's the contract of `__Object`.
- If we override a method with a new implementation, we already know what slot it has in the vtable.
- That is, overriding of virtual methods is implemented by replacing a pointer in the vtable.
- The vptr that a C++ object needs in order to use virtual methods adds 8 bytes to the object (on a 64-bit architecture). Additional virtual methods do not increase the size of the object any further, i.e., once a class has a single virtual method, the increase in size is fixed.
| | Without Virtual Methods | With Virtual Methods |
|---|---|---|
| `sizeof(Point)` | 32 bytes = 4 doubles | 40 bytes = 1 `Point` + pointer for vtable |
| `sizeof(ColorPoint)` | 40 bytes = 1 `Point` + 1 `Color` + padding | 48 bytes = 1 `ColorPoint` + pointer for vtable |
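The sizes in the table can be checked with `sizeof`. The sketch below uses hypothetical structs (`PlainPoint`, `VirtualPoint`, `VeryVirtualPoint`) with four `double` fields, matching the 32-byte `Point` in the table. It uses C++ virtual methods only to measure their cost, which is separate from the translator, where they are not allowed. Exact sizes depend on the compiler and platform, so the code exposes the numbers and makes no claim about specific values.

```cpp
#include <cstddef>

// Four doubles, no virtual methods.
struct PlainPoint {
  double x, y, z, w;
};

// Same fields plus virtual methods: the compiler adds a hidden vptr.
struct VirtualPoint {
  double x, y, z, w;
  virtual double sum() const { return x + y + z + w; }
  virtual ~VirtualPoint() {}
};

// Same fields plus more virtual methods: still only one vptr,
// because the extra methods only add slots to the per-class vtable.
struct VeryVirtualPoint {
  double x, y, z, w;
  virtual double sum() const { return x + y + z + w; }
  virtual double first() const { return x; }
  virtual double last() const { return w; }
  virtual ~VeryVirtualPoint() {}
};

constexpr std::size_t plainSize = sizeof(PlainPoint);
constexpr std::size_t virtualSize = sizeof(VirtualPoint);
constexpr std::size_t veryVirtualSize = sizeof(VeryVirtualPoint);
```

On a typical 64-bit platform one would expect `plainSize` to be 32 and both virtual variants to be 40, in line with the table.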
Implementing inheritance and virtual method dispatch by hand
We talked about inheritance on Tuesday 9/24, but the challenge of
implementing it efficiently is that we have to worry about the layout
in memory. We will first generally go over inheritance and virtual
methods in Java by drawing the data layout of classes, instances, and
vtables. Then we will write C++ code that
implements `java.lang.Object`, `java.lang.String`, and `java.lang.Class` so that you can see what these
structures look like in C++ code without using inheritance and virtual methods.
Note that you CANNOT use C++ inheritance or virtual methods in your translator.
This is why we are going to write these Java object structures by hand in C++.