French fries, well-done hamburgers, a glass of wine, your cell phone, and hair dye. If you read the newspaper, you will recognize these as reported causes of cancer. Every week the science section runs a new story to scare you, and every week you eliminate one more pleasure from your life. But what are these claims based on? What can we believe?
The goal of this course will be to develop powerful and reliable computational algorithms for personal genomics, drug and vaccine discovery, and association studies. Many genome- and systems-biology-based techniques (from both academic and commercial research) have proven inadequate, prompting some to dub this field "recreational genomics." We will address the question of how to augment genomic and transcriptomic data with other information that enables better inference.
For this purpose, we need to understand what it means for one thing to cause another. In biology, one way to find the genetic causes of a particular behavior is to knock out individual genes and observe the effect on the system's behavior. However, the relationships governing such systems can be far more complex than one gene regulating another. For example, two genes might remain up-regulated for some period until a third gene is also up-regulated, and it is this combination that suppresses a fourth gene. But we must first identify the hypotheses to be tested. Frequently we begin by observing a system's activity and from that attempt to discover how it works. With high-throughput experiments, such as time-course microarrays that measure the activity of thousands of genes, we cannot do this by hand. We need automated approaches that draw our attention to the most plausible hypotheses, which can later be validated through experimental testing. However, these approaches must be able to distinguish causation from mere correlation.
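To make the correlation-versus-causation point concrete, here is a minimal Python sketch with made-up numbers (the gene labels A, B, C and every probability are hypothetical, not taken from any experiment). An upstream gene A independently raises the chance that genes B and C are up-regulated. B then appears to raise the probability of C, which in Suppes' terms (reference 3) makes it a prima facie cause, yet once we condition on the state of A the dependence disappears: A screens B off from C, so the apparent link is spurious.

```python
from fractions import Fraction
from itertools import product

# Hypothetical parameters, chosen only for illustration.
P_A = Fraction(1, 2)                 # P(A up-regulated)
P_ON_GIVEN_A = {1: Fraction(9, 10),  # P(target up | A up)
                0: Fraction(1, 10)}  # P(target up | A down)


def joint():
    """Exact joint distribution over (a, b, c): A drives B and C independently."""
    dist = {}
    for a, b, c in product((0, 1), repeat=3):
        p_a = P_A if a else 1 - P_A
        p_b = P_ON_GIVEN_A[a] if b else 1 - P_ON_GIVEN_A[a]
        p_c = P_ON_GIVEN_A[a] if c else 1 - P_ON_GIVEN_A[a]
        dist[(a, b, c)] = p_a * p_b * p_c
    return dist


def prob(dist, event, given=lambda x: True):
    """Conditional probability P(event | given), summing over matching outcomes."""
    num = sum(p for x, p in dist.items() if event(x) and given(x))
    den = sum(p for x, p in dist.items() if given(x))
    return num / den


def c_up(x):
    return x[2] == 1


def b_up(x):
    return x[1] == 1


def screened_off(dist):
    """True if, within each state of A, learning B leaves P(C) unchanged."""
    for a in (0, 1):
        def in_a(x, a=a):
            return x[0] == a

        def in_a_and_b(x, a=a):
            return x[0] == a and x[1] == 1

        if prob(dist, c_up, in_a_and_b) != prob(dist, c_up, in_a):
            return False
    return True


D = joint()
p_c = prob(D, c_up)
p_c_given_b = prob(D, c_up, b_up)

print("P(C) =", p_c)                            # P(C) = 1/2
print("P(C | B) =", p_c_given_b)                # P(C | B) = 41/50
print("screened off by A:", screened_off(D))    # screened off by A: True
```

Exact fractions are used instead of sampled data so the comparison is not muddied by noise. With real time-course measurements these probabilities would have to be estimated from data, and the timing of events (which gene changes first, and for how long) would also have to be taken into account.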
Motivated by our desire to understand these claims and propose alternatives, we will examine each facet of the problem in turn. We will begin with the necessary biological background in order to assess the strengths and weaknesses of possible experiments (and to determine what is actually being measured). Then we will survey how causality can be inferred from data (including the necessary philosophical background) and introduce concepts from temporal logic and model checking. Finally, we will discuss how to apply these computer science tools and philosophical notions to problems in systems biology.
(1) Uri Alon, An Introduction to Systems Biology: Design Principles of Biological Circuits, Chapman & Hall/CRC, 2006.
(2) Edmund M. Clarke, Orna Grumberg, and Doron A. Peled, Model Checking, The MIT Press, 2001.
(3) Patrick Suppes, A Probabilistic Theory of Causality, 1970. Available at: http://suppes-corpus.stanford.edu/article.html?id=106-1
(4) Jon Williamson, Bayesian Nets and Causality: Philosophical and Computational Foundations, Oxford University Press, 2005.