Lecture 19

Database programming in Java

The goal of today's lecture is to look at java.sql package, which offers relational database
access through Java. Throughout the lecture we will develop a small application, which
will allow users to execute SQL queries.

We start with the quick refresher on relational databases. Relational database is essentially persistent collection of tables. We can roughly think of a table as class and
a row in a table as an object. 


Usually databases allow users to create new tables, populates them with data and then
query them in different ways. The databases that we will discuss today are called 
RDBMS (Relational Database Management Systems). The idea is data is distributed
among tables in a canonical way, to minimize duplication of data. The keys between tables are specified which allow users to select data from multiple tables by joining them together. For example, if we want to select Authors and Books from the above tables
we could write the following select

select Author.FirstName, Author.LastName, Book.Title 
  from Author, Book
 where Author.AuthorID = Book.AuthorID

The result will be 

Standard relational databases (Sybase, Oracle, Microsoft SQL server, Access, etc. ) support SQL (Standard Querying Language). Or at least suppose to support :). In reality, you will 
find out some peculiarities & quirks in everyone of them. The standard SQL language 
allows users to retrieve rows from multiple tables, according to given criteria.
So generally you would see statements like this:

 select  [COLUMNS] 
 from   [TABLES]
where  [CONDITIONS]

This is the general theme. So now the question is how can we issue database queries
from Java application? First of all we have an issue of connection. How do we connect
to relational database from Java? We need to know the url of the database and we need
to have a driver which is capable of talking to this database. As you've probably realized
by now, different database require different interface protocols. Fortunately sometime 
in the mid nineties people came up with the idea of ODBC (Open Database Connectivity). Essentially, ODBC offers a standard interface to most commercial relational databases. This means that when you need to issue a query it goes through the bridge which adapts it for particular database, then on the way back, it converts it to standard ODBC format
(there is, of course, an overhead of using the bridge).
Now the advantages are tremendous. You no longer need to worry about customizing 
your code to work with several databases. As long as it uses ODBC, and there is an ODBC driver for this particular database, you don't have to change a thing.
Java offers interface JDBC - Java Database Connectivity. The interface essentially 
works uses Driver to connect to a database. It is up to you to instantiate the driver
that you want to use. The simplest thing to do is to use the driver provided by Sun, called
jdbc.odbc.Driver. This is the driver which utilizes ODBC to connect to the database.
While this driver is generic its performance is inferior to the one of a native driver.
For example Oracle provides free Java driver which by passes ODBC & is much, much faster. 

So what are the actual classes & interfaces offered by Java sql package? First of all 
there is a Driver interface. You load specific driver by simply calling Class.forName 
with the class String of the driver you wish to use. Once the driver is loaded
it is registered with DriverManager which allows us to request connection to a given url.

where getConnection relies on presence of particular driver and getDriver tries to find a driver which knows how to handle given url. What kind of url we are talking about?
It can be an ODBC source url like 

  jdbc:odbc:books

Which simply means ODBC database source local to this particular host. Or we could have

jdbc:oracle:thin:@spunky.cs.nyu.edu:660:book

Which is using oracle thin driver instead. The general form looks like this

source[@host:port:catalog]
 

Now once you find appropriate driver, you may request connection, by calling 
getConnection with desired url, user and password. This essentially logs you into 
the database. The Connection object that you get, looks like this


First of all close method does exactly what you would expect - terminates the connection
(it gracefully disposes all the pending results & acquired resources).
Commit and rollback go hand in hand together. Most contemporary databases 
offer transactions - a way to make a sequence of operations atomic. User may 
start transaction, then perform several database operations and then either commit 
or rollback, based on how things went. Only when user says commit, then data gets stored permanently. Should the error occur and user will decide not to save, rollback 
will guaranty that all the changes will be reversed. If a connection is in auto commit 
mode, then all its SQL statements will be executed and committed as individual
transactions. Otherwise, its SQL statements are grouped into transactions that are 
terminated by either commit() or rollback(). By default, new connections are in 
auto commit mode. setCatalog is simply allows user to choose which catalog in the
database is current. This is necessary because most databases consist of multiple catalogs. 
createStatement returns you a reference to a Statement object which is ready to 
execute a query. The statement looks like this

The simplest thing that we can do is to execute a statement like this
 execute( "select Author.FirstName, Author.LastName, Book.Title " +
    " from Author, Book " +
   "  where Author.AuthorID = Book.AuthorID " );
This means that we are executing a statement which can potentially return multiple 
result sets. On the other hand, executeQuery will always return one result set. 
executeUpdate should be used for insert, update and delete statements (statements 
which may change data in tables). The iteration over the results is done like this:

while ( statement.getMoreResults() ) {
   rs = statement.getResultSet(); 
   // Do whatever
}

This means that as long as there are different result sets we will be in this loop 
fetching them. This brings us to result set, which is essentially a table of results that
we just got for our query. The ResultSet interface contains the following

So the result set allows users to fetch the rows coming from the query. Notice that it
doesn't contain the information about columns. This information is stored in 
ResultSetMetaData, which can be obtained by calling getMetaData()

Notice that we can obtain all the information that we need about columns through 
this interface. Here is an example:

ResultSetMetaData md = rs.getResultSetMetaData();
for ( int i = 0; i < md.getColumnCount(); i++ ) {
  System.out.println( "Col[" + i + "] : name = " + 
                   md.getColumnName( i ) +
                   " type = " + md.getColumnType( i ) );
}

The column type is an integer which corresponds to one of the types defined in
utility class java.sql.Types

User can loop through the result set by calling next() method. What this does is
sets the row pointer to the next row and returns true if there are more rows. 
Thus, we can traverse all results like this:

while ( rs.next() ) {
  // do whatever
}

Then, we can access data in any of the columns by calling one of the get methods found
in ResultSet with specific column index. The most general method is getObject. 
It does its best in matching supplied column type with Java type. The following 
code loops though the result set and builds a buffer out of it

StringBuffer sb = new StringBuffer( 1024 );
// Loop through the rows
ResultSetMetaData md = rs.getResultSetMetaData();
while ( rs.next() ) {
  // Loop though columns
  for ( int i = 0; i < md.getColumnCount(); i++ ) {
    sb.append( rs.getObject( i ) + ";" );
  }
  sb.append( "\n" );
}

So you see how this works. Okay, now with this in mind, lets develop small 
application which allows user to connect to a database and execute queries. 
For this, we will design generic JDBCDatabase class. We start by noting that
establishing database connection can be quite time consuming. If possible we 
would like to obtain and reuse certain number of connections. How do we do 
this? We utilize GuranteedObjectPool. First time when we need connection, we
reserve N connections instead of one and place them into GuranteedObjectPool. 
Then we just acquire and release them as necessary. 
(The number of connections should not be too large. Many databases are 
configured to have limited number of sessions, like 15-30, and we don't want
to acquire them all). 
Another issue that we need to address is the ResultSet. The problem is that we 
can't really keep the ResultSet around because once we loop though we can't 
get the data back.
One way of solving the problem is to transfer the data into another data structure
(some kind of table). Another way is not to deal with it in the database all together,
instead, the database can have a listener, which decided what to do with the data. 
Here is the simple IQueryListener interface


You see that listener gets notified when meta data is available and whenever next 
value is fetched. What listener does with the data, is not one of the database concerns.

Lets look at the basic things that we want our database to have

When do we open connections? We can do this whenever the first query arrives


Okay, now comes the code for executing queries

This is pretty much it. One more very important thing is to provide finalize in database 
class. The reason for this is that we need to close all connections:

We can now test our application. For this we will create simple Query Listener which
just prints to standard output.

And here is the simple driver that prompts user for a query


 

Two more very useful interfaces offered by java.sql package are PreparedStatement
and CallableStatement. PreparedStatement is useful in the situations when you want to 
execute the same query many times in a row. For example, if you are doing bulk inserts.
When you create PreparedStatement, JDBC preprocess it and saves it, thus making its 
execution faster. In addition, if the database supports such feature, the statement may get 
compiled first time around and then just reused. You can have only one PreparedStatement 
per Connection. CallableStatement is representation of stored procedure. Stored 
procedures in many databases are just compiled sql code, which is stored in the database.
The executions of stored procedure are better because database doesn't need to
recompile it every time. So in java.sql, CallableStatment represents a way to bind 
arguments to a stored procedure and execute it. 

Finally, the package has a class called DatabaseMetaData, which contains a lot of
information about the database. Here is just a beginning of a long java doc


 
 

We conclude with partial UML diagram of java.sql package


Please read

JDBC tutorial @ sun