Title: Fast and Cheap Genome wide Haplotype Construction via Optical Mapping


(NYU-CS-TR852)

Authors: Thomas Anantharaman, Venkatesh Mysore, and Bud Mishra 



Abstract:

We describe an efficient algorithm to construct genome wide haplotype restriction maps of an individual 
by aligning single molecule DNA fragments collected with Optical Mapping technology. Using this algorithm 
and small amount of genomic material, we can construct the parental haplotypes for each diploid chromosome 
for any individual, one from the father and the other from the mother. Since such haplotype maps reveal 
the polymorphisms due to single nucleotide differences (SNPs) and small insertions and deletions (RFLPs), 
they are useful in association studies, studies involving genomic instabilities in cancer, and genetics. 
For instance, such haplotype restriction maps of individuals in a population can be used in association
studies to locate genes responsible for genetics diseases with relatively low cost and high throughput. 
If the underlying problem is formulated as a combinatorial optimization problem, it can be shown to be 
NP-complete (a special case of K-population problem). But by effectively exploiting the structure of the 
underlying error processes and using a novel analog of the Baum-Welch algorithm for HMM models, we devise a 
probabilistic algorithm with a time complexity that is linear in the number of markers. The algorithms were 
tested by constructing the first genome wide haplotype restriction map of the microbe T. Pseudoana, as well 
as constructing a haplotype restriction map of a 120 Megabase region of Human chromosome 4. The frequency of 
false positives and false negatives was estimated using simulated data. The empirical results were found very 
promising.