Computational Approaches for the DNA Sequencing Data Deluge
Speaker: Ben Langmead, Johns Hopkins Bloomberg School of Public Health
Location: Warren Weaver Hall 1302
Date: April 13, 2012, 11:30 a.m.
Host: Denis Zorin
Second-generation DNA sequencers are improving rapidly and can now sequence hundreds of billions of nucleotides in about a week for a few thousand dollars. Consequently, sequencing has become a common tool in many fields of life science. But with these developments comes a problem: growth in per-sequencer throughput is drastically outpacing growth in computer speed. As the throughput gap widens over time, the crucial research bottlenecks are increasingly computational: computing, storage, labor, power.
Along these lines, I will discuss a new set of methods for very time- and space-efficient alignment of sequencing reads: Bowtie and Bowtie 2. These tools build on the insight that the Burrows-Wheeler Transform and the FM Index, previously used for data compression and exact string matching, can be extended to facilitate fast and memory-efficient alignment of DNA sequences to long reference genomes such as the human genome. I will also discuss two tools, Crossbow and Myrna, that take Bowtie, scale it up to run on large clusters (potentially hosted by a cloud vendor), and apply it to specific problems in life science.
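For readers unfamiliar with the FM Index, the exact-matching idea mentioned above can be sketched in a few lines of Python. This is only an illustration and not how Bowtie itself is implemented: the function names are hypothetical, the suffix array is built by naive sorting, the occurrence table is stored in full, and only exact matches are handled (no mismatches, gaps, or memory-saving sampling).

```python
def bwt_via_suffix_array(text):
    """Return (suffix array, BWT) of text + '$', built by naive suffix sorting."""
    text += "$"  # sentinel; sorts before A, C, G, T in ASCII
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # The BWT character for each sorted suffix is the character preceding it
    # (text[-1] is the sentinel, which handles the suffix starting at 0).
    bwt = "".join(text[i - 1] for i in sa)
    return sa, bwt


def build_fm_index(bwt):
    """Return (first_col_start, occ) tables for backward search."""
    counts = {}
    for ch in bwt:
        counts[ch] = counts.get(ch, 0) + 1
    # first_col_start[c]: number of characters in the text smaller than c,
    # i.e. the row where the block of c's begins in the sorted first column.
    first_col_start, total = {}, 0
    for ch in sorted(counts):
        first_col_start[ch] = total
        total += counts[ch]
    # occ[c][i]: number of occurrences of c in bwt[:i] (uncompressed here).
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return first_col_start, occ


def find_exact(pattern, sa, first_col_start, occ):
    """Return sorted 0-based offsets where pattern occurs exactly."""
    lo, hi = 0, len(sa)  # half-open range of suffix-array rows
    for ch in reversed(pattern):  # extend the match one character leftward
        if ch not in first_col_start:
            return []
        lo = first_col_start[ch] + occ[ch][lo]
        hi = first_col_start[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))


reference = "GATTACAGATTACA"  # toy stand-in for a reference genome
sa, bwt = bwt_via_suffix_array(reference)
first_col_start, occ = build_fm_index(bwt)
print(find_exact("ATTA", sa, first_col_start, occ))  # [1, 8]
```

Each step of the search narrows a range of rows using two table lookups, so the work per query grows with the read length rather than with the size of the reference.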
Ben Langmead is a Research Associate in the Department of Biostatistics at the Johns Hopkins Bloomberg School of Public Health. He completed his Ph.D. in Computer Science in February 2012 at the University of Maryland, advised by Steven L. Salzberg. His research addresses problems at the intersection of computer science and genomics, and he is the author of several open source software tools for analysis of high-throughput genomics data, including Bowtie, Bowtie 2, Crossbow and Myrna. His paper describing Bowtie won the Genome Biology award for outstanding paper published in 2009. At Johns Hopkins, he collaborates with biostatisticians, biomedical engineers, biologists, and other computer scientists to develop methods for analyzing second-generation DNA sequencing data.
Refreshments will be offered starting 15 minutes prior to the scheduled start of the talk.