Advanced Database Systems
Tuesdays 5:00 - 7, Warren Weaver 101.
Prof Dennis Shasha
Instructor: Dennis Shasha (firstname.lastname@example.org)
Office hours: Tuesdays after class
Listserv for email announcements:
Rasika Sanjay Jangle and Sonali Malik
To study the internals of database systems
as an introduction to research and as a basis
for rational performance tuning.
The study of internals will concern topics at the intersection
of database system, operating system, and distributed computing
research and development.
Specific to databases is the support of the notion of transaction:
a multi-step atomic unit of work that must appear to execute
in isolation and in an all-or-nothing manner.
The theory and practice of transaction processing is the problem
of making this happen efficiently and reliably.
Tuning is the activity of making your database system run faster.
The capable tuner must understand the internals and externals
of a database system well enough to understand what could be
affecting the performance of a database application.
We will see that interactions between different levels of the system, e.g.,
index design and concurrency control, are extremely important,
so will require a new optic on database management design
as well as introduce new research issues.
Our discussion of tuning will range from the hardware to conceptual
design, touching on operating systems, transactional subcomponents,
index selection, query reformulation, normalization decisions,
and the comparative advantage of object-oriented database systems.
This portion of the course will be heavily sprinkled with case
studies from database tuning in biotech, telecommunications,
Also, since the book that Philippe Bonnet and I have written
has many tests associated with it,
you will get the benefit of those tests.
Because I do a lot of work on ordered data (such as financial time
series), we will explore
databases that support ordered queries such as those found in finance
and science and we will do compare two freely downloadable
systems kdb and mysql.
Here is the syllabus in
Please find homework 1 here.
Please find homework 2 here.
This one includes a tuning practicum so give yourself
You may want to use q for that tuning practicum
download the free version of q for that purpose.
some basic notes about q.
an introduction to the arrable model in q
(everything you need to know for the homework).
Homework 2 also includes a required second part, called the supplement.
Please find that
Because reproducibility is an important discipline to acquire,
find notes on Reprozip.
are notes on using bit bucket.
frequently asked question write-up
about the replicated concurrency control project.
Also, here is a set of
for that project.
There are two principal sets of class notes:
transaction processing in pdf
However there are ancillary lecture notes.
Here is a lecture on
concurrent search structure algorithms.
find a brief lecture traitorous failures
from the paper
"Easy Impossibility Proofs for Distributed Consensus Problems"
by Fischer, M. and Lynch, N. and Merritt, M.
is a relevant picture.
Network partitions alone can lead to problems
in these notes that are derived from Lynch, Fisher, and Patterson
A related topic is the so-called
which states that achieving consistency (up-to-date information is always
available), availability (if some site is up, it will have the most up-to-date
information), and partition tolerance (this all works even if the network
is broken) cannot all be done simultaneously.
Jake Loveless's notes on
high frequency trading system
One lecture will be on
array query languages.
The system that aquery is most similar to
a nice way of scaling data.
In fact, here is an
argument for why it should be used for Big Data .
Here are Alberto Lerner's lecture notes
partitioning in large data systems
accompanying prose notes
Here is a nice
talk about Not Only SQL databases (nosql).
along with my
notes on that talk.
Here is a link to
a course on NoSQL by Shahram Ghandeharizadeh.
Here is a
taxonomy of NoSQL systems.
Here is one of many
Here are Alberto Lerner's notes on
parallelizing locking, logging, accessing and buffer management
Here is an article about some
successes and failures of big data analysis.
Here is an article about a very
fast transactional system called VoltDB
Here is a
discussion about the merits of eventual consistency vs. transactions.
Here is a
tuning case study from a travel application.
Here is a description of
Here is a lecture about a
database-inspired language called pig.
Here is a
on PC, Linux, and Mac.
Eric Hielscher's notes on running mysql
Another lecture will be on
an approach to allow database outsourcing
Here is an approximation to an n-server
We will discuss information gain, decision trees and clustering,
perhaps among other topics.
Here is a web site of tutorials on
data mining by Andrew Moore
Here is a lecture on
by Andrew Moore and colleagues.
Here is a lecture
on time series analysis
by my colleagues and me.
is a cartoon concerning a related puzzle about sketches.
Here are some notes about
We will also discuss, if we have time
several related topics found at the end of these lecture notes
(i) high performance replicated
database systems without the use of two phase commit, (ii) case studies from
Wall Street, (iii) a self-tuning technique that improves on LRU,
and (iv) a data structure for data warehouses.
Here is a nice
poster showing the genealogy of relational databases.
Here are Sonali Malik's and Rasika Jangle's excellent notes on
git and bitbucket.
Concurrency Control and Recovery in Database Systems
by Bernstein, Hadzilacos, and Goodman, Addison-Wesley, 1987.
Here is a local copy in zip format.
Your reading in this book should focus on the topics we hit in lecture.
Note that the available copies algorithm I present is a slightly different
version from tha presented in the book. Please use mine for the project.
Database Tuning: principles, experiments, and troubleshooting
by Dennis Shasha and Philippe Bonnet 2002
Morgan Kaufmann Publishers; ISBN: 1558607536
We will go through essentially this whole book.
Additional books can be found in the syllabus.
Here are some
experiments having to do with database tuning
Here is how to call C from K:
Don Orth's description of how
to call C from K.
Here are Alan Fekete's slides on snapshot isolation
Snapshot Isolation and Fixes to It
and the even better fixes (but in a paper)
due to Michael Cahill, Uwe Roehm, and Alan Fekete.
Here is Joe Conron's nice paper on indexes
(from when he was a master's student).
Some results from database tuning projects.
Presented as rules of thumb.
Here is one very nice tuning project by
Here is another by
Here is a third by
Here is a fourth by
Here are Alberto Lerner's excellent notes on performance monitoring.
Here you can find
Here are notes about
materialized views in Oracle.
Here is a call to a new organization of
databases by the Turing Award winner Jim Gray
Here are two very nice and very practical papers about
data warehousing by one of the foremost practitioners in the field:
Ralph Kimball on constructing an
enterprise data warehouse
Ralph Kimball on why to use star schemas in dimensional modeling.
Finally, here are the rules about