William Casey, Thomas Anantharaman and Bud Mishra
Abstract
Polymorphisms due to restriction fragment length variations in the genomes of a human population have been studied intensively in the past. In parallel, the development of novel single-molecule approaches has made it possible to construct high-resolution, multi-enzyme, ordered, genome-wide restriction maps. In particular, a powerful method called "optical mapping" makes it possible to produce accurate, high-coverage, genome-wide maps of a population relatively quickly and inexpensively. Furthermore, because each polymorphic site (e.g., modeled by statistical variations in the location of a restriction site) is "covered" by a large number of molecules from the two copies of the chromosome, and because neighboring sites are likely to be covered by a large fraction of these molecules, it is possible to detect each restriction fragment length polymorphism (RFLP) marker accurately using a Maximum Likelihood Estimator (MLE) algorithm or an Expectation Maximization (EM) algorithm, and then to "phase" these RFLP markers using sophisticated statistical algorithms. While it remains speculative how densely these RFLP markers are distributed and how long a typical "haplotype block" would be, the algorithms we develop have applications to other areas of genomics (e.g., placing probes along the genomes to study copy number fluctuations in cancer genomes, RH mapping, phasing single nucleotide polymorphisms (SNPs), etc.).
Our algorithm works in two phases: