New York University

Computer Science Department

Courant Institute of Mathematical Sciences

Session 13: Persistence in EJB Frameworks

Course Title: Extreme Java Course Number: g22.3033-007

Instructor: Jean-Claude Franchitti Session: 13

Persistence in Enterprise JavaBeans is encapsulated in the notion of EntityBeans. This handout describes bean- and container-managed EntityBean persistence and the relative merit of these techniques with respect to portability, productivity and performance.

Sun Microsystems produced the Enterprise JavaBeans specification (version 1.0) in March 1998. According to its cover page, the goal of the specification was to provide a "component architecture for the development and deployment of object-oriented distributed, enterprise-level applications." Two important conceptual components were described by this architecture: "EnterpriseBeans" and their "container." EnterpriseBeans are of two varieties: SessionBeans and EntityBeans. SessionBeans don't have persistent state; EntityBeans do. In other words, you can't expect the state of SessionBeans to outlive their process. EntityBeans on the other hand have some or all of their state persist between incarnations.

EJB Architecture Overview

To discuss EntityBeans and their persistence, we first have to outline the components of the EJB architecture that will be important to our discussion. This section won't attempt to describe all the components, only the relevant ones. From a bean implementor's point of view, the first decision is whether the bean is going to be a SessionBean or an Entity-Bean. Both can have business logic but, generally speaking, SessionBeans are unshared servants to a client and have no persistent state. EntityBeans are shared among clients and do have persistent state.

For every bean there is a set of other interfaces that must be in place for the bean to be accessible to clients. Each bean must have a mandatory remote object interface as well as a Home interface (think factory). The EJBObject-derived interface is a remote interface that typically delegates calls to the bean; the EJBHome-derived interface contains bean lifecycle and management functionality. The home interface is also remote.

You're probably asking yourself, Where does this so-called "container" fit in? Well, from a bean-provider perspective, everything that facilitates the life cycle of the bean and the dispatch of a remote client method invocation to your bean is "the container." For the purposes of this handout, this includes the EJBObject- and EJBHome-derived remote interfaces (which are application- dependent) as well as services like the implementation of JTS and JNDI (which are application-independent) that your bean can depend on. Some containers may also support container-managed persistence. This is an API that an EntityBean developer can use to delegate the persis-tence of the bean to instead of writing all of the persistence himself (by using JDBC or JSQL).

Figure 1 shows the relationship of an EntityBean (with container-managed persistence), its container and the client. In this figure the EmployeeBean is the implementation of all the logic for the bean. Employee is just a remote object interface declaration and the same is true for EmployeeHome. These three elements along with a DeploymentDescriptor (described later) are supplied by the bean provider and are application-dependent. Tools supplied by the container provider will typically generate the implementations of the remote interfaces.

These application-dependent components wrap the bean for the container. We say they "wrap the bean for the container" because they're mandatory and you can access the bean only through them.

Application-independent components are also part of the container, of course. Figure 1 shows that developers can take advantage of the application-independent APIs, that is, JTS, JNDI and the container-managed persistence (CMP) API. Notice that in this figure, which depicts container- managed persistence, the container can add connection pooling and transactional caching as well as dynamic schema management. These capabilities provide a lot of transparent functionality to the bean with container-managed persistence.

Figure 2 is very similar except that the bean provides its own persistence. Most of the rest of the container is intact, but the ability to provide caching, dynamic schema management and connection pooling transparently is lost. So in this figure the bean is fatter because to write these capabilities on top of JDBC would be a tremendous burden on productivity. Plus your bean's code is just plain fatter.

Since this handout is mainly about Entity-Beans and how their persistence is managed, we're going to skip the discussion of SessionBeans except to reiterate that they're unshared and typically don't have persistent state. They may, however, have transactional, conversational state. SessionBeans will typically contain most of your session-based (unshared between clients) business logic. The good news is that they will typically make use of other EnterpriseBeans through remote-object interfaces. This leads to a highly scalable federated archictecture.

You're probably asking yourself, Where does this so-called "container" fit in? Well, from a bean-provider perspective, everything that facilitates the life cycle of the bean and the dispatch of a remote client method invocation to your bean is "the container." For the purposes of this handout, this includes the EJBObject- and EJBHome-derived remote interfaces (which are application- dependent) as well as services like the implementation of JTS and JNDI (which are application-independent) that your bean can depend on. Some containers may also support container-managed persistence. This is an API that an EntityBean developer can use to delegate the persistence of the bean to instead of writing all of the persistence himself (by using JDBC or JSQL).

The Development/Deployment Process

As you can see, there is a substantial amount of code needed to make an EJB application server of any complexity. You may wonder whether you need to write all that code. Well, the development process that you will follow will be highly dependent on the tools supplied by your EJB application server and development tool vendor (which may be the same). A large part of the code required is the glue that plugs your bean into the container of your EJB app server vendor. The fine folks who wrote the EJB specification provided a nifty item called the DeploymentDescriptor. This serialized class is provided for each bean and contains enough meta data for tools to generate the framework to plug the bean into an EJB container. This would include the EJBHome, derived class, the remote object interface and other hooks to manage the bean.

Contained in a bean's DeploymentDescriptor is a ControlDescriptor for each method. The descriptors contain meta information about the bean and its methods with respect to their behavior in several dimensions (e.g., transactional, security).

With the state of EJB implementations today, there are basically three approaches to EJB development: (1) hand-code everything, (2) hand-code the bean and the deployment descriptor and (3) generate most of the bean and the deployment descriptor from a graphical environment. Let's look at these in a little more detail.

Hand-Code Everything

This means that the EJB app server doesn't support any autogeneration tools but does provide the API for its container. In this case you'll be expected to write implementations for the beans, their remote object interfaces, their home interfaces and any other hooks required by the vendor's API.

Hand-Code the Bean and Descriptor

EJB was written with this in mind. By producing a EJB-jar file containing compiled code for beans, their remote object and home interface declarations, and their associated serialized DeploymentDescriptors, a tool could be used to generate the rest of the infrastructure. In this manner the container's API doesn't necessarily need to be exposed to the programmer. Rather, the code generator takes care of this mapping. Figure 3 illustrates this approach.

Graphical Bean and Descriptor Generator

For ease of use, an EJB-jar file generation from a graphical environment can be employed. In this development environment you simply create an object model of the beans and their relationships. The graphical environment takes care of generating the .jar file containing the beans and their descriptors. Tools like PowerTier for EJB from Persistence Software, Inc., take this graphical approach and not only generate the container framework but also an RDBMS-independent object-relational mapping for EntityBeans. Figure 4 illustrates this approach.

The bottom line with respect to generation is that once you have the EJB-jar file container-specific tools can then generate the framework for adapting beans to their container.

EntityBeans: The Persistence in EJB

Given an understanding of how Entity-Beans fit into the EJB architecture, the choice that a bean developer has to make is between bean-managed and container-managed persistence. How do you choose between them? Well, let's look at the relative merits of the approaches with respect to productivity, performance and portability. In a sense, the choice for developers is a choice between fat beans, where the bean is smart about data management, and fat servers, where the EJB container is smart about data management.

Container-Managed Persistence

For container-managed persistence the EJB container automatically implements the object-relational mapping services for the bean. The EJB container uses additional meta information in the deployment descriptor to determine how to implement the object-relational mapping for the bean. This makes the bean itself "thin" because the bean only needs to contain the custom business logic added by the developer. Figure 1 shows this notion of "thin" beans and a "fat" container.

The main advantages of container-managed persistence are performance, runtime/ development-time flexibility and productivity.

EJB containers with container-managed persistence can provide data management services that are unavailable with bean-managed persistence, including:

· Object-relational mapping: automates the task of mapping complex beans - including inheritance, aggregation and association - to relational tables.

· Shared transactional caching: many clients can share access to beans in the EJB container with the same transactional integrity provided by a database.

· Object state management: container-managed persistence can keep track of which objects have been changed within a transaction. In bean-managed persistence it's up to the developer to manage this, usually by setting a dirty flag in each object.

· Object management optimizations: the container can support joint database queries in the database that return networks of heterogeneous beans and enable navigation between bean classes in the container cache.

· Data management optimizations: the container can defer write operations in a transaction until commit, then batch multiple database operations into one call.

· Connection management: the container can transparently manage database connections and implement native database operations for optimal performance.

· Container-based agents: the container can monitor events within the server and notify clients when specific events occur.

To develop a class with container-managed persistence, the developer performs the following steps (this example describes the Persistence Software, Inc., PowerTier for EJB container):

1. Define the object model and object-relational mapping using a CASE tool.

2. Use the EJB container tools to load the modeling information from the CASE tool and generate beans whose deployment descriptors and implementations allow them to use container-managed persistence.

3. Add custom code to the bean class (code insertion points protect the custom code when the bean is regenerated).

The generated beans implement all required EJB methods but add convenient query methods and methods for complex relationship management. The container provides transparent connection management, exception handling, integrity and transactions.

In addition to enabling rapid application development, container-managed persistence also makes enterprise bean classes independent from the database schema. Thus beans that use container-managed persistence are more flexible and are portable across data schemas. Changes to the object model or database schema are handled by the container's object-relational mapping, providing a high degree of flexibility and support for iterative development. Finally, containers whose implementation of Container Managed Persistence includes shared and transactional caching can provide extreme performance and scalability advantages because they can manage concurrent transactional access to the bean from multiple clients.

The main disadvantage is that they aren't as portable as bean-managed persistence beans with respect to the current EJB specification. This is mainly because EJB is underspecified in the area of a portable container API for container-managed persistence.

Bean-Managed Persistence

For bean-managed persistence the developer writes database access calls using JDBC and SQL directly in the methods of the bean. This makes the bean itself "fat," requiring hundreds of lines of developer code to implement each bean. On the other hand, this makes the EJB container relatively "thin," giving it a small footprint and high portability. Figure 2 showed this notion of the "fat" bean with a "thin" container.

The main advantage of bean-managed persistence is its portability. A bean managing its own persistence using JDBC or JSQL is very portable, especially with respect to the current state of the EJB specification.

The main disadvantages of the bean-managed persistence approach is that writing the object-relational mapping by hand can be time consuming and requires extensive knowledge of SQL. For example, the developer must hand-code database access calls that implement the object-relational mapping in the enterprise bean callback methods (ejbCreate(), ejbLoad(), ejb-Store(), etc.).

Another severe limitation of this approach is that it hardwires the mapping to a particular database schema into the guts of the bean. This means that any change to the object model or the underlying database schema can require extensive changes to the code for a particular bean, limiting the flexibility and reusability of beans built using bean-managed persistence.

To develop a class with bean-managed persistence, the developer must perform the following steps:

1. Define the object-relational mapping for the bean.

2. Implement required EJB class methods. Using the JDBC API and SQL, write methods to perform ejbCreate(), ejbRemove(), ejbLoad(),ejbFind<>() and ejbStore(). This requires a detailed knowledge of SQL and the underlying database schema.

3. Implement accessor and mutator methods. For each attribute or relationship write methods to manipulate the bean value. Relationship access code to manage many-to-one relations and delete constraints can be very complex.

4. Manage database connections. Each method must manage its own database connections by interacting with the database connection pool.

5. Handle exceptions. Each method must handle both Java and database exceptions appropriately.

6. Manage bean integrity. Any method that changes the state of the class must set a flag to mark the bean as modified so the changes will be sent to the database.

7. Ensure transaction integrity. Collectively, all the objects in any transaction must be able to "roll back" their state if a transaction fails.

8. Add custom code to the bean class.

Even if the SQL mapping were generated for you, the gains in performance and flexibility would not be as great as container-managed persistence unless you wrote all of the outlined features yourself…and then you've thrown out productivity.

Finally, because you typically won't find shared and transactional cache management in Bean Managed Persistence, it will have extremely poor performance and scalability since multiple clients will block each other for the entire scope of their transactions. This can be costly even in the case of implicit transactions.

Conclusions

While bean-managed persistence provides the most portability, it will generally fall short with respect to productivity.

It should be clear at this point that container- managed persistence in EntityBeans provides the most productivity. This method will also typically provide the highest degree of performance since, together, the code generation tools and the bean programmer can take advantage of container- specific mapping and caching technology. However, container-managed persistence currently falls short in the area of portability.

Below are some guidelines for making decisions with respect to which kind of persistence management to use.

When to Use Container-Managed Persistence

Developers who will benefit most from container-managed persistence are those requiring rapid application development, high reliability and scalable performance in their deployed application.

· Rapid application development: productivity is the number one reason for choosing container-managed persistence. For simple beans, bean-managed persistence can require 25 lines of hand-written code for every line of hand-written code required by container-managed persistence. For more complex beans the ratio is even worse.

· Reliability: the markedly smaller amount of code the developer has to write using container-managed persistence guarantees fewer defects and higher reliability.

· Performance: container-managed persistence allows use of shared transactional caching, deferred database operations and native database implementation to optimize performance.

The kinds of applications that benefit most from container-managed persistence include:

· Data intensive applications: these applications tend to have larger and more complex object-relational mappings.

· OLTP applications: these often require good database performance, robust scalability and rock-solid data integrity.

· Application integration: which must map information from several different data sources.

When to Use Bean-Managed Persistence

The primary reason for using bean-managed persistence today would be to improve portability for beans across servers from multiple vendors. One known omission in the Enterprise JavaBean 1.0 specification is that it did not define a standard API for EJB containers to ensure portability among containers from different vendors.

This omission was corrected in the Enterprise JavaBean 2.0 specification, as JavaSoft made clear that full support for entity beans is mandatory in EJB 2.0. Even with bean-managed persistence, however, the number of differences between EJB 1.x servers in areas such as security and transaction management are likely to make full portability difficult in the next 12 to 18 months.

Future Directions in Portability and Scalability

Container-managed persistence will be made portable through standard deployment descriptor formats that can help define the schema as well as standard meta-data- driven internal container APIs for transparent, portable, container-managed persistence. This can be done in a fashion similar to the OMG's Persistent State Service specification.

Speaking of the OMG, CORBA integration is essential for any EJB app server to be scalable to the enterprise. IIOP is the perfect unifying protocol to support EJB applications and is necessary to overcome some of the limitations of JRMP, the most serious of which are bugs in distributed garbage collection (IIOP would do away with DGC) and service context propagation (without this, it is difficult to implement JTS and other services).