Title: DNA Hash Pooling and its Applications

(NYU-CS-TR901)

Authors: Dennis Shasha and Martyn Amos

Abstract:
In this paper we describe a new technique for the characterisation of
populations of
DNA strands. Such tools are vital to the study of ecological systems, at both
the micro (e.g., individual humans) and macro (e.g., lakes) scales. Existing
methods make extensive use of DNA sequencing and cloning, which can prove
costly and time consuming. The overall objective is to address questions such
as:
(i) (Genome detection) Is a known genome sequence present at
least in part in an environmental sample?
(ii) (Sequence query) Is a specific fragment sequence
present in a sample?
(iii) (Similarity discovery) How similar in terms of
sequence content are two unsequenced samples?

We propose a method involving
multiple filtering criteria that result in "pools" of DNA of high or
very high purity.
Because our method is similar in spirit to hashing in computer
science, we call the method {\it DNA hash pooling}.
To illustrate this method, we describe examples using
pairs of restriction enzymes.
The {\it in silico} empirical results we present
reflect a sensitivity to
experimental error.
The method requires minimal DNA sequencing and, when sequencing
is required, little or no cloning.
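
The hashing analogy can be made concrete with a small in silico sketch. The abstract does not spell out the protocol, so the pooling rule here is an assumption made purely for illustration: each fragment lying between two consecutive recognition sites is assigned to a pool keyed by the pair of enzymes whose sites bound it. The recognition sequences for EcoRI (GAATTC) and BamHI (GGATCC) are the standard ones; the function names `find_sites` and `hash_pools` are hypothetical, and real cut positions and overhangs are ignored.

```python
# Hypothetical illustration only: the pooling rule is an assumption,
# not the authors' protocol. Recognition sequences are the standard ones.
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC"}


def find_sites(seq, sites=SITES):
    """Return sorted (position, enzyme) pairs for every recognition site in seq."""
    hits = []
    for enzyme, motif in sites.items():
        start = seq.find(motif)
        while start != -1:
            hits.append((start, enzyme))
            start = seq.find(motif, start + 1)
    return sorted(hits)


def hash_pools(seq, sites=SITES):
    """Group fragments between consecutive sites by the enzyme pair at their ends.

    Simplification: each fragment keeps both full recognition sites,
    rather than being trimmed at the enzymes' actual cut positions.
    """
    hits = find_sites(seq, sites)
    pools = {}
    for (p1, e1), (p2, e2) in zip(hits, hits[1:]):
        fragment = seq[p1:p2 + len(sites[e2])]
        pools.setdefault((e1, e2), []).append(fragment)
    return pools


if __name__ == "__main__":
    sample = "AAGAATTCTTTTGGATCCAAAAGAATTCCC"
    for pair, fragments in hash_pools(sample).items():
        print(pair, fragments)
```

Under this assumed rule, the enzyme pair plays the role of a hash key: two samples can be compared pool by pool, so only the fragments that land in a pool of interest would need to be sequenced.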