FinTime --- a benchmark for financial databases

Kaippallimalil J. Jacob
Morgan Stanley Dean Witter & Co., New York (kjacob@acm.org)

Dennis Shasha
Professor, Department of Computer Science
Courant Institute of Mathematical Sciences
New York University
Dennis Shasha's home page
(shasha@cs.nyu.edu)

Why is Finance Special?

How many times have you seen projects start out using a relational engine, only to move more and more of their functionality out of the engine for performance reasons? The result does the job well, but it is a nightmare to maintain.

One way we in the financial industry can do better is to demand that database vendors give us products with relevant functionality. The Transaction Processing Performance Council (www.tpc.org) benchmarks are helpful, but they essentially ignore time series and other forms of ordered data, a critical component of financial database systems.

The two of us, an academic interested in time series databases and a practitioner who has built many database systems for finance, have created a financial time series benchmark to challenge vendors to produce products that we won't have to implement around.

FinTime (http://cs.nyu.edu/cs/faculty/shasha/fintime.html) is a set of data and queries that reflects the needs of financial analysts who are studying patterns in stock market data, but it should apply to any time-dependent financial instruments.

Unlike some other benchmarks, this one makes no requirement that all queries be expressed in a given language (e.g. SQL 2000). If a vendor has a query language, that's good enough. It's up to the vendor's customers to decide on syntactic and semantic elegance.

FinTime has evolved from a tutorial on time series databases given by Shasha during VLDB 98 (see his web page) and reflects Jacob and Shasha's best understanding of typical data analysis queries issued by users. Since many vendors have products that handle ordered data, such a benchmark can help would-be customers to evaluate them.

Benchmark Description and Generation

The models suggested in FinTime reflect two frequently occurring cases in the financial industry, namely, a historical market data system (decision support) and a real-time price tick database (on-line transaction processing). FinTime also defines metrics that capture three useful dimensions of any time-series system, namely, performance in single-user mode, performance in multi-user mode, and the price-to-performance ratio.

 

Models for a Time-series Benchmark

Before deciding on a model, we have to examine the parameters that determine a model for a time-series system. The most important parameters that influence a time-series database system are:

  1. Periodicity of data (Regular/irregular)
  2. Density of data (Dense/Sparse)
  3. Schedule of updates (Periodic/Continuous)
  4. Types of queries (Simple/Complex)
  5. Time interval between queries (Ad hoc/Batch)
  6. Number of concurrent users (Few/Many)

Combinations of these six two-valued factors give rise to 64 possible models, but for simplicity we focus on the following two cases, which occur commonly in the financial industry:

Model 1: Historical Market Information

Attribute                 Specification
------------------------  ----------------------------------------------------
Periodicity of data       Periodic
Density of data           Dense
Schedule of updates       Periodic updates (e.g. at the end of a business day)
Complexity of queries     Complex (e.g. decision support queries)
Nature of query arrival   Batch
Concurrency of users      Low (e.g. few concurrent users)

 

Model 2: Tick Databases for Financial Instruments

Attribute                 Specification
------------------------  ----------------------------------------------------
Periodicity of data       Non-periodic
Density of data           Sparse to moderately dense
Schedule of updates       Continuous
Complexity of queries     Simple
Nature of query arrival   Ad hoc
Concurrency of users      High (e.g. many concurrent users)
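As a rough illustration of the model space described above, the six two-valued parameters can be enumerated in a few lines of Python. This is only a sketch and not part of the benchmark itself: the dictionary keys, the lowercase value labels, and the function name all_models are our own hypothetical choices, and we assume that "Periodic"/"Non-periodic" data correspond to the "Regular"/"Irregular" settings and that "Sparse to moderately dense" can be filed under "Sparse".

```python
from itertools import product

# The six parameters and their two settings, taken from the list above.
# Key names and lowercase labels are illustrative assumptions.
PARAMETERS = {
    "periodicity": ("regular", "irregular"),
    "density": ("dense", "sparse"),
    "update_schedule": ("periodic", "continuous"),
    "query_type": ("simple", "complex"),
    "query_arrival": ("ad hoc", "batch"),
    "concurrent_users": ("few", "many"),
}


def all_models():
    """Return every combination of parameter settings as a dict."""
    names = list(PARAMETERS)
    return [dict(zip(names, combo)) for combo in product(*PARAMETERS.values())]


# Model 1: historical market information (assumed mapping of the table).
MODEL_1 = {
    "periodicity": "regular",
    "density": "dense",
    "update_schedule": "periodic",
    "query_type": "complex",
    "query_arrival": "batch",
    "concurrent_users": "few",
}

# Model 2: tick databases ("sparse to moderately dense" filed as "sparse").
MODEL_2 = {
    "periodicity": "irregular",
    "density": "sparse",
    "update_schedule": "continuous",
    "query_type": "simple",
    "query_arrival": "ad hoc",
    "concurrent_users": "many",
}

if __name__ == "__main__":
    models = all_models()
    print(len(models))  # 2 ** 6 combinations
    print(MODEL_1 in models, MODEL_2 in models)
```

Under this encoding the two chosen models take opposite settings on every one of the six parameters, which is one way to see why they make a reasonable pair of representatives for the whole space.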

 

Let us now discuss the characteristics of the two models in some detail. (The datatypes used in the models are explained in greater detail in the glossary.)

Model 1: Historical Market Information

Historical market data systems are closely related to the decision support systems of the traditional relational database world. The various elements of this model are: