Algorithmic Approaches to Biological Sequence Analysis

Eran Halperin
University of California, Berkeley

Monday, May 5, 2003
11:00 a.m.
Room 1302 WWH
251 Mercer Street
New York, NY 10012-1185

The availability of the human genome provides unprecedented insight into biology and medicine. Because of the sheer volume of genomic data, computer science methods for analyzing these data are becoming increasingly important. Computational methods in biology are especially needed for handling large amounts of data, overcoming the limitations of current technologies through mathematical models, and automating the underlying processes.

In this talk I will briefly present two applications that address these goals. The first concerns human variation and DNA sequencing techniques. Standard contemporary DNA sequencing methods provide only partial information about the sequence at hand, called the genotype. I will present an algorithm that infers the full sequence information (the haplotypes) from the genotypes of a population. The algorithm is available for public use on a web server and is used in biological research. The second application is a new algorithm for finding similar proteins in protein databases, using powerful techniques from the theory of approximation algorithms. Both algorithms are shown to be biologically meaningful and efficient in practice.
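
To make the genotype/haplotype distinction concrete, here is a minimal Python sketch of why a genotype alone is only partial information. The 0/1/2 site encoding and the function name compatible_haplotype_pairs are assumptions chosen for illustration. The brute-force enumeration is not the algorithm presented in the talk; it only shows the ambiguity that a haplotype inference method has to resolve.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """Enumerate the unordered haplotype pairs consistent with one genotype.

    Assumed encoding (illustrative only, not taken from the talk), per site:
      0 = both chromosome copies carry allele 0
      1 = both chromosome copies carry allele 1
      2 = heterozygous: one copy of each allele, phase unknown
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 2]
    pairs = set()
    # Try every assignment of alleles to the first copy at heterozygous sites;
    # the second copy must carry the complementary allele at each such site.
    for bits in product((0, 1), repeat=len(het_sites)):
        first = list(genotype)
        second = list(genotype)
        for site, bit in zip(het_sites, bits):
            first[site] = bit
            second[site] = 1 - bit
        # A frozenset makes the pair unordered, so mirrored pairs collapse.
        pairs.add(frozenset((tuple(first), tuple(second))))
    return pairs


if __name__ == "__main__":
    for pair in compatible_haplotype_pairs((2, 0, 2)):
        print(sorted(pair))
```

A genotype with k heterozygous sites (k at least 1) is consistent with 2^(k-1) distinct haplotype pairs, so the number of candidates grows exponentially. This is why inference has to draw on the genotypes of a whole population and cannot work from a single individual in isolation.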