JVM:  Verification

Honors Computer Systems Organization (Prof. Grishman)

Refs:  Meyer & Downing, chapter 5
          Engel, chapter 6
          The Java Virtual Machine Specification, Second ed., section 4.9

When a Java class file is loaded from an untrusted site, it is first verified.  To ensure system security (to keep incorrect or malicious programs from harming the system), it is important that Java programs be thoroughly checked.  This checking could be done at execution time, but that would slow things down significantly.  The verifier tries to do as much checking as possible at load time, thus minimizing the impact on run time.

The verifier first checks that the structure of the Java class file is correct.  Once that is checked, it applies the bytecode verifier to the code of each method.
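As a small illustration of a structural check, every class file begins with the four-byte magic number 0xCAFEBABE, followed by a two-byte minor version and a two-byte major version.  The sketch below tests only that header; the class and method names are made up for this example, and a real verifier checks far more of the file's structure.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

class ClassFileHeaderCheck {
    static final int MAGIC = 0xCAFEBABE;

    /** Returns true if the bytes start with the class-file magic number and a version pair. */
    static boolean hasValidHeader(byte[] classBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            int magic = in.readInt();      // u4 magic (class files are big-endian)
            in.readUnsignedShort();        // u2 minor_version (read only to confirm it is present)
            in.readUnsignedShort();        // u2 major_version
            return magic == MAGIC;
        } catch (IOException e) {          // covers EOFException for a truncated file
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] good = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 46};
        byte[] bad = {0x7F, 0x45, 0x4C, 0x46, 0, 0, 0, 0};
        System.out.println(hasValidHeader(good) ? "header ok" : "header rejected");
        System.out.println(hasValidHeader(bad) ? "header ok" : "header rejected");
    }
}
```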

The bytecode verifier makes some straightforward checks which can be applied separately to each instruction;  for example,

In addition, the verifier checks that each instruction will find a correctly formed stack and local variable array (Meyer & Downing, p. 103;  Engel, p. 144;  JVM Spec., section 4.9.2).  Specifically, it checks that the items on top of the stack, and in the local variables, are of the correct type for the instruction.  For example, an iload (which pushes an int) must not be followed directly by an astore (which expects a reference on top of the stack).

For straight-line code, the process is straightforward.  The verifier begins with the stack empty;  the local variables that correspond to method parameters are initialized to the types of those parameters, while the other local variables are marked uninitialized.  The verifier then steps through the instructions, first checking that the current state is valid for the instruction, and then updating the type information for the stack and local variables appropriately.  (For example, for an iadd the top two items on the stack must be ints;  the iadd replaces them with a single int.)
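The straight-line case can be sketched in a few lines of Java.  This is a toy under stated assumptions, not the real verifier: it knows only five instructions written as strings (iload, aload, istore, astore, iadd), tracks just three types, and ignores the operand stack size limit.  The names StraightLineVerifier and verify are invented for the example.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

class StraightLineVerifier {
    enum Type { INT, REF, UNINIT }

    /** Returns null if the branch-free code verifies, otherwise an error message. */
    static String verify(Type[] paramTypes, int maxLocals, String[] code) {
        Type[] locals = new Type[maxLocals];
        Arrays.fill(locals, Type.UNINIT);                    // non-parameter locals start uninitialized
        System.arraycopy(paramTypes, 0, locals, 0, paramTypes.length);
        Deque<Type> stack = new ArrayDeque<>();              // the stack starts empty
        for (int pc = 0; pc < code.length; pc++) {
            String[] parts = code[pc].split(" ");
            String op = parts[0];
            int index = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
            if (index >= maxLocals) return "pc " + pc + ": local index out of range";
            switch (op) {
                case "iload": case "aload": {
                    Type want = op.equals("iload") ? Type.INT : Type.REF;
                    if (locals[index] != want) return "pc " + pc + ": local " + index + " is not " + want;
                    stack.push(want);
                    break;
                }
                case "istore": case "astore": {
                    Type want = op.equals("istore") ? Type.INT : Type.REF;
                    if (stack.isEmpty() || stack.pop() != want) return "pc " + pc + ": top of stack is not " + want;
                    locals[index] = want;                    // the local now holds this type
                    break;
                }
                case "iadd":
                    if (stack.size() < 2 || stack.pop() != Type.INT || stack.pop() != Type.INT)
                        return "pc " + pc + ": iadd needs two ints";
                    stack.push(Type.INT);                    // two ints are replaced by one int
                    break;
                default:
                    return "pc " + pc + ": unknown instruction " + op;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Type[] params = {Type.INT, Type.INT};
        System.out.println(verify(params, 3, new String[] {"iload 0", "iload 1", "iadd", "istore 2"}));
        System.out.println(verify(params, 3, new String[] {"iload 0", "astore 2"}));
    }
}
```

The first sequence in main is accepted (verify returns null), while the second is rejected at the astore because the top of the stack holds an int.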

For code with branches, the bookkeeping is more complicated:

mark the first instruction;  annotate it with the initial stack and variable state
while some instruction is marked
    pick a marked instruction and unmark it
    check that stack and variables are valid for that instruction
    update stack and variables
    identify instructions which can follow this instruction;  for each such instruction,
        if it has not yet been annotated, mark it and annotate it with stack and variable state
        if it has been annotated, check that new stack and variable state match earlier annotation

Here, matching generally means identity:  if an instruction can be reached in two or more ways, the state of the stack must be the same regardless of the path.  This excludes some potentially safe JVM code sequences, but this restriction is imposed to make verification feasible.  [Two states can also match if they contain object types which are different but can be unified -- if they are subclasses of a common class.]
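The marking procedure above can be sketched as a worklist loop in Java.  To keep it short, this sketch makes several assumptions that are not in the notes: the state is only the operand stack (local variables are omitted), the only type is int (written as the character I), matching is strict identity with no unification of object types, and the instruction set is a made-up subset (iconst, iadd, pop, ifeq, goto, return) with branch targets given as instruction indices.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class BranchVerifier {
    /** Returns null if the code verifies, otherwise an error message. */
    static String verify(String[] code) {
        String[] annotation = new String[code.length];   // null means "not yet annotated"
        Deque<Integer> marked = new ArrayDeque<>();       // instructions still to be processed
        annotation[0] = "";                               // the first instruction sees an empty stack
        marked.add(0);
        while (!marked.isEmpty()) {
            int pc = marked.remove();                     // pick a marked instruction and unmark it
            String[] parts = code[pc].split(" ");
            String in = annotation[pc];
            String out;
            int[] successors;
            switch (parts[0]) {
                case "iconst":
                    out = in + "I"; successors = new int[] {pc + 1}; break;
                case "iadd":
                    if (!in.endsWith("II")) return "pc " + pc + ": iadd needs two ints";
                    out = in.substring(0, in.length() - 1); successors = new int[] {pc + 1}; break;
                case "pop":
                    if (in.isEmpty()) return "pc " + pc + ": pop on empty stack";
                    out = in.substring(0, in.length() - 1); successors = new int[] {pc + 1}; break;
                case "ifeq":                              // pops an int, then branches or falls through
                    if (!in.endsWith("I")) return "pc " + pc + ": ifeq needs an int";
                    out = in.substring(0, in.length() - 1);
                    successors = new int[] {pc + 1, Integer.parseInt(parts[1])}; break;
                case "goto":
                    out = in; successors = new int[] {Integer.parseInt(parts[1])}; break;
                case "return":
                    out = in; successors = new int[0]; break;
                default:
                    return "pc " + pc + ": unknown instruction " + parts[0];
            }
            for (int next : successors) {
                if (next < 0 || next >= code.length) return "pc " + pc + ": control leaves the method";
                if (annotation[next] == null) {           // first arrival: annotate and mark
                    annotation[next] = out;
                    marked.add(next);
                } else if (!annotation[next].equals(out)) {
                    return "pc " + next + ": stack state differs between paths";
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Both branches push one int before joining at instruction 5: accepted.
        System.out.println(verify(new String[] {
            "iconst", "ifeq 4", "iconst", "goto 5", "iconst", "pop", "return"}));
        // Instruction 3 is reached with an empty stack on one path and one int on the other: rejected.
        System.out.println(verify(new String[] {"iconst", "ifeq 3", "iconst", "return"}));
    }
}
```

Because each instruction is annotated at most once and then only compared against later arrivals, the loop processes every reachable instruction exactly once in this simplified form.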

In addition, the verifier checks that each newly created object is initialized exactly once, and is not used until it has been initialized.
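A rough sketch of this initialization check, for straight-line code only, is shown below.  The instruction names are simplified stand-ins chosen for this example: "new" pushes an uninitialized reference (U), "init" stands for the invokespecial call to the constructor, and "use" stands for any instruction that consumes an initialized reference (R).  The sketch also assumes at most one uninitialized object is pending at a time, whereas the real verifier distinguishes uninitialized objects by the new instruction that created them.

```java
class InitTrackingSketch {
    /** Returns null if every object is initialized exactly once before use, else an error message. */
    static String verify(String[] code) {
        StringBuilder stack = new StringBuilder();        // 'U' = uninitialized ref, 'R' = initialized ref
        for (int pc = 0; pc < code.length; pc++) {
            int top = stack.length() - 1;
            switch (code[pc]) {
                case "new":
                    stack.append('U');
                    break;
                case "dup":
                    if (top < 0) return "pc " + pc + ": dup on empty stack";
                    stack.append(stack.charAt(top));
                    break;
                case "init":                              // constructor call consumes one copy
                    if (top < 0 || stack.charAt(top) != 'U')
                        return "pc " + pc + ": constructor call needs an uninitialized object";
                    stack.deleteCharAt(top);
                    for (int i = 0; i < stack.length(); i++)   // remaining copies become initialized
                        if (stack.charAt(i) == 'U') stack.setCharAt(i, 'R');
                    break;
                case "use":
                    if (top < 0 || stack.charAt(top) != 'R')
                        return "pc " + pc + ": object used before initialization";
                    stack.deleteCharAt(top);
                    break;
                default:
                    return "pc " + pc + ": unknown instruction " + code[pc];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(verify(new String[] {"new", "dup", "init", "use"}));   // the usual pattern
        System.out.println(verify(new String[] {"new", "use"}));                  // used too early
        System.out.println(verify(new String[] {"new", "dup", "init", "dup", "init"})); // initialized twice
    }
}
```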

Note that, because references to other classes (method invocations and field references) must carry their own type information, it is possible to verify each class separately, without loading other classes to which it refers.  As a result, the loading of these other classes can be postponed until the methods are invoked.

Implementing a JVM:  Memory Allocation

The JVM interpreter has to allocate memory to hold various types of information during execution:  local variables and stacks for the active methods, and arrays and objects which are created during execution.

The Java virtual machine stack is used to hold the local variables and stacks for active methods.  When a method is invoked, a stack frame is created which has space for the local variables of that method (as determined by .limit locals) and the operand stack for that method (whose size is determined by .limit stack).  This stack frame is placed on top of the Java virtual machine stack;  when we return from the method, this stack frame is popped from the JVM stack.
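One way an interpreter might represent this is sketched below.  It is an illustration only: the class names are invented, every slot is an int (a real frame must also hold references and two-slot longs and doubles), and the two sizes passed to invoke play the role of the .limit locals and .limit stack values.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class JvmStackSketch {
    /** One stack frame: local variables plus an operand stack of fixed maximum size. */
    static class Frame {
        final int[] locals;
        final int[] operandStack;
        int top = 0;                                   // number of items currently on the operand stack

        Frame(int maxLocals, int maxStack) {
            locals = new int[maxLocals];
            operandStack = new int[maxStack];
        }
        void push(int value) { operandStack[top++] = value; }
        int pop() { return operandStack[--top]; }
    }

    final Deque<Frame> frames = new ArrayDeque<>();    // the JVM stack for one thread

    /** Creates a frame for the invoked method, copies the arguments into its first locals, and pushes it. */
    Frame invoke(int maxLocals, int maxStack, int... args) {
        Frame frame = new Frame(maxLocals, maxStack);
        System.arraycopy(args, 0, frame.locals, 0, args.length);
        frames.push(frame);
        return frame;
    }

    /** Pops the frame of the method that is returning. */
    void returnFromMethod() { frames.pop(); }

    public static void main(String[] args) {
        JvmStackSketch jvm = new JvmStackSketch();
        Frame f = jvm.invoke(3, 2, 7, 8);              // 3 locals, operand stack of 2, arguments 7 and 8
        f.push(f.locals[0]);                           // iload 0
        f.push(f.locals[1]);                           // iload 1
        f.push(f.pop() + f.pop());                     // iadd
        System.out.println(f.pop());                   // prints 15
        jvm.returnFromMethod();
    }
}
```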

Arrays and objects cannot be allocated on the stack because they may persist after the method which created them has exited.  They are allocated on the heap.  Unlike in some languages (such as C and C++), the user does not explicitly allocate and deallocate heap space.  However, if no provision is made for deallocating objects, the program can quickly run out of space, so most JVM implementations incorporate an automatic storage management system -- a garbage collector.  A garbage collector identifies objects which are no longer reachable and deallocates them.  This may be done by maintaining reference counts or by tracing paths in memory to see what objects are reachable.
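The tracing approach can be sketched as a simple mark-and-sweep pass over a toy heap.  Everything here is a simplification for illustration: the heap is a Java list, the roots (which in a JVM would include references held in stack frames) are an explicit list, and the names TracingCollectorSketch, Obj, allocate and collect are invented.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class TracingCollectorSketch {
    /** A toy heap object whose fields are references to other heap objects. */
    static class Obj {
        final String name;
        final List<Obj> fields = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    final List<Obj> heap = new ArrayList<>();
    final List<Obj> roots = new ArrayList<>();

    Obj allocate(String name) {
        Obj o = new Obj(name);
        heap.add(o);
        return o;
    }

    /** Marks everything reachable from the roots, sweeps the rest, and returns the number freed. */
    int collect() {
        Set<Obj> marked = new HashSet<>();
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (marked.add(o)) pending.addAll(o.fields);   // first visit: follow its references
        }
        int before = heap.size();
        heap.retainAll(marked);                            // sweep: drop unmarked objects
        return before - heap.size();
    }

    public static void main(String[] args) {
        TracingCollectorSketch gc = new TracingCollectorSketch();
        Obj root = gc.allocate("root"), a = gc.allocate("a"), b = gc.allocate("b");
        Obj c = gc.allocate("c"), d = gc.allocate("d");
        gc.roots.add(root);
        root.fields.add(a);
        a.fields.add(b);
        c.fields.add(d);                                   // c and d refer to each other
        d.fields.add(c);                                   // but nothing reachable refers to them
        System.out.println(gc.collect());                  // prints 2
    }
}
```

In this example c and d form a cycle, so a pure reference-counting scheme would never see their counts drop to zero; tracing from the roots reclaims them because neither is reachable.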

If there are multiple threads, there will be one JVM stack for each thread.

For animated examples of JVM programs, see the following page prepared for Inside the Java 2 Virtual Machine by Bill Venners: