Dennis Shasha's Research Summary


Goals

I work on quite a few different projects -- most having to do with large data and pattern matching or machine learning. Area interests include computational biology and biomedicine (data analysis, visualization, experimental design), time series (fast algorithms for fundamental problems such as correlation and burst detection as well as applications like query by humming), and pattern matching in trees and graphs. A general pattern? I like puzzles. The other general pattern is that I program a lot in a fast, extremely exressive language called K. The third pattern is that I work with excellent people -- undergraduates, master's students, doctoral students, post-docs, other profs, and non-academics. If you have skill, energy, and initiative and if you like what you see below, then drop me a line. I might have a project for you.


Biological Computing

I've worked on several projects over the last few years. The first five projects below are new:

Pattern Recognition

Tamper-resistant file systems

With David Mazieres, I have been working on a network file system that supports the following scenario (among others): a group of people work together in a distributed fashion but trust neither one another nor their system administrator. For example, they might outsource their system administration to an organization that they don't necessarily trust. Historically, that organization could do subtle changes to their data, read their data, and so on. The system we have designed makes makes tampering changes quickly detectable. (Secrecy can be achieved with straightforward cryptographic techniques.) The only assumption we make is that each client has a secret signature key. Please see our PODC paper here . Some promising performance results due to the great efforts of David, Jinyuan Li, and Maxwell Krohn appear in OSDI.

AQuery: a database system for querying ordered data

An order-dependent query is one whose result (interpreted as a multi-set) changes if the order of the input records is changed. In a stock-quotes database, for instance, retrieving all quotes concerning a given stock for a given day does not depend on order, because the collection of quotes does not depend on order. By contrast, finding the five price moving average in a trade table gives a result that depends on the order of the table. Query languages based on the relational data model can handle order-dependent queries only through add-ons. SQL:1999, for example, permits the use of a data ordering mechanism called a ``window'' in limited parts of a query. As a result, order-dependent queries become difficult to write in those languages and optimization techniques for these features, applied as pre- or post-enumerating phases, are generally crude. The goal of our work is to show that when order is a property of the underlying data model and algebra, writing order-dependent queries in a language can be natural as is their optimization. We introduce AQuery, an SQL-like query language and algebra t has from-the-ground-up support for order. We also present a framework for optimization of the order-dependent queries categories it expresses. The framework is able to take advantage of the large body of query transformations on relational systems while incorporating new ones described here. We show by experiment that the resulting system is orders of magnitude faster than current SQL:1999 systems on many natural order-dependen You can see our paper here. You can see a power point presentation here. Joint work with Alberto Lerner. We are looking for Beta users.

AJAX: a data cleaning system

Le Subscribe: a publish-subscribe system

Fault Tolerant Parallel Programming

Database Internals Work

Database Tuning and Wall Street

Real-Time Scheduling

Thinksheet

Dave Tanzer's thesis is on efficient backwards reasoning in the thinksheet context. The work has been completed.

Recently, we (Stacey Kuznetsov and I) have designed a new system called Stratpal. Stratpal is a simplified thinksheet that can be used to model laws and strategies. Its key features is that given a linear document, it is easy to create a StratPal application that can be improved incrementally over time.

For Kids

Please see the earthtime project not yet functional

To learn arithmetic and elementary algebra in a fun way. Look at superply

Fun Stuff


Last modified February 2006.