Advanced Database Systems

CSCI-GA.2434-001
Fall 2020
Tuesdays 5:00 - 7:00, Warren Weaver 109

Prof Dennis Shasha

Instructor: Dennis Shasha (shasha@cs.nyu.edu)
Zoom: Join at class time at the following URL: https://nyu.zoom.us/j/96365781363

Except for the first class, which had pre-recorded lectures, all lectures are live and recorded here. The classes will be live but also on Zoom.

Office hours are by Zoom on Tuesdays from 2 PM to 3 PM, starting September 8, 2020; no appointment necessary. https://nyu.zoom.us/j/95764190139
Recorded lectures will be on the classes page, under the link labeled Panopto on the left side.



Graders:
Nandhitha (nr2229@nyu.edu, point of contact for questions regarding MySQL and ReproZip, but first look at the documents below) and
Nishchitha (nhv215@nyu.edu, point of contact for questions regarding AQuery, but first look at the documents below)


GOALS

To study the internals of database systems as an introduction to research and as a basis for rational performance tuning.

The study of internals will concern topics at the intersection of database system, operating system, and distributed computing research and development. Specific to databases is the support of the notion of transaction: a multi-step atomic unit of work that must appear to execute in isolation and in an all-or-nothing manner. The theory and practice of transaction processing is the problem of making this happen efficiently and reliably.
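As a small illustration of the all-or-nothing property described above, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the names, and the transfer function are hypothetical and are not taken from the course materials; the sketch shows only atomicity, not isolation between concurrent transactions.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing unit of work.

    Hypothetical example: returns True if the transfer committed and
    False if it was rolled back.
    """
    try:
        # Used as a context manager, the connection commits if the block
        # finishes normally and rolls back if an exception escapes it.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                # Abort: the debit above must not survive on its own.
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False

def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))

# Set up a tiny in-memory database with two assumed accounts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()
```

A transfer that would overdraw the source account leaves both balances exactly as they were, because the partial debit is rolled back along with everything else in the transaction.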

Tuning is the activity of making your database system run faster. The capable tuner must understand the internals and externals of a database system well enough to understand what could be affecting the performance of a database application. We will see that interactions between different levels of the system, e.g., index design and concurrency control, are extremely important, and so require a new perspective on database management design and raise new research issues. Our discussion of tuning will range from the hardware to conceptual design, touching on operating systems, transactional subcomponents, index selection, query reformulation, normalization decisions, and the comparative advantage of object-oriented database systems. This portion of the course will be heavily sprinkled with case studies from database tuning in biotech, telecommunications, and finance. Also, since the book that Philippe Bonnet and I have written has many tests associated with it, you will get the benefit of those tests.

Because I do a lot of work on ordered data (such as financial time series), we will explore databases that support ordered queries such as those found in finance and science, and we will compare two freely downloadable systems, kdb and MySQL.
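To give a flavor of what an order-dependent query computes, here is a minimal sketch in plain Python of a moving average over a price series. The function name and the sample data are hypothetical; this is not how kdb or MySQL implement such queries, only an illustration of a result that depends on the order of the rows.

```python
from collections import deque

def moving_average(prices, window):
    """Return the trailing moving average of `prices`, in order.

    Each output value averages the current price and up to `window - 1`
    prices before it, so the result depends on the order of the input.
    """
    buf = deque(maxlen=window)  # holds the most recent `window` prices
    total = 0.0
    out = []
    for price in prices:
        if len(buf) == window:
            # The oldest price is about to be evicted from the window.
            total -= buf[0]
        buf.append(price)
        total += price
        out.append(total / len(buf))
    return out
```

For example, moving_average([10, 20, 30, 40], 2) averages each price with its predecessor; shuffling the input would change the answer, which is what distinguishes ordered queries from ordinary set-based ones.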

Class materials

Course Videos

To see the videos of previous lectures: registered students should be able to go to newclasses.nyu.edu, click on "Advanced Database Systems", and then click on a Mediasite link on the left-hand side of the page. If you find the videos choppy, then (from the 2016 Teaching Assistant Nicholas Souris): "you can use a plugin for Firefox called flashgot to download the videos and I'm pretty sure there's similar plugins for other browsers as well. Though keep in mind that the downloads will take some time to finish and the files are pretty large."

Books

Here are some experiments having to do with database tuning.

Here is Don Orth's description of how to call C from K.

Here are Alan Fekete's slides on snapshot isolation, "Snapshot Isolation and Fixes to It", and the even better fixes (but in a paper) due to Michael Cahill, Uwe Roehm, and Alan Fekete.

Here is Joe Conron's nice paper on indexes (from when he was a master's student).

Some results from database tuning projects, presented as rules of thumb. Here is one very nice tuning project by Yuhong Chen. Here is another by Ilya Finkelshteyn. Here is a third by Marina Balina. Here is a fourth by Pratik Daga.

Here are Alberto Lerner's excellent notes on performance monitoring. Here you can find his thesis.

Here are notes about materialized views in Oracle.

Here is a call for a new organization of databases by the Turing Award winner Jim Gray.

Here are excellent notes on problems, systems, and algorithms having to do with social media queries by Sara Cohen. Typical questions: What does it mean to be a central personality in graph databases? What is the best way to find a group of people with the right interests who are likely to be compatible? Here are notes on random graphs.

Finally, here are the rules about academic honesty.