Topics in Computational Biology:

Comparative and Functional Genomics

Professor B. Mishra

Office Hours: Friday, 4-5pm
Office Phone: 212.998.3464
Email Address:

Day and Time:
Thursdays, 5:00-6:50pm EST, Room 613, WWH (251 Mercer St.)

Credits for Course:

Prerequisites:
Mathematical Maturity, Statistics and Introductory Genomics

1. Comparative Genomics:
Evolutionary Models, Statistics
2. Phylogeny:
Models, Algorithms and Complexity
3. Comparing Genome Structure among the Species:
Genomic Rearrangements
4. Comparing Genome Structure within a Species:
Gene Duplications, Gene Families, Pseudogenes, etc.
5. Transcription Maps:
Gene Finding, Regulatory Sequences
6. Functional Genomics:
Genetic Networks, Gene Expression Arrays,
Clustering Algorithms, Ideas from Learning Theory
7. Gene Expression Arrays and Their Effectiveness:
8. Combining with Other Data:
9. Proteomics:
10. Population Genomics:
SNPs, Linkage Analysis
11. Cancer Genomics:

Required Text(s):

Course Description:

The genes of all cells are composed of DNA. Proteins serve as structural components and as enzymes within cells, but the genes contain the blueprint for each protein and the program for controlling protein production. Genes are transcribed to produce complementary molecules of mRNA (messenger RNA), and the mRNA is translated into proteins. There is an (almost) one-to-one correspondence between genes and proteins. Proteins perform the work of cells, such as energy production, reaction catalysis, inter-cellular signaling, transcription and translation, and cell reproduction. All cells of an organism contain the same DNA. The level of production of each type of protein specifies the state of a cell. This state is determined by spatial and temporal variables such as tissue location and extra-cellular stimuli. The level of production of a protein is determined primarily by the level of transcription of the corresponding gene into mRNA. This picture seems to be at the core of a universal story of life!
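
As a rough illustration of the transcription and translation steps described above, the following Python sketch builds the mRNA complementary to a DNA template strand and then reads it codon by codon. The function names, the example sequence, and the deliberately partial codon table are assumptions made for illustration; a real genetic code table has 64 entries.

```python
# Base pairing used when a DNA template strand is transcribed into mRNA.
COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}

# Partial codon table (illustrative only); "*" marks a stop codon.
CODON_TABLE = {"AUG": "M", "UUU": "F", "GGC": "G", "GCU": "A", "UAA": "*"}


def transcribe(template_dna):
    """Return the mRNA complementary to a DNA template strand."""
    return "".join(COMPLEMENT[base] for base in template_dna)


def translate(mrna):
    """Translate mRNA into one-letter amino acids, stopping at a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical template strand, written 3' to 5'.
mrna = transcribe("TACAAACCGCGAATT")
print(mrna)             # AUGUUUGGCGCUUAA
print(translate(mrna))  # MFGA
```

The sketch ignores splicing, reading-frame detection and regulation, which is where the "almost" in the gene-to-protein correspondence comes from.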

With the recent availability of DNA sequence data and proteomics data, and the development of tools for whole-genome assays (e.g., gene expression arrays), it has become possible to study the basic biology of cells, the identification and function of genes, and how a common, universal theme varies across all life.

The greatest hurdles to the effective development and use of new tools for "post-genomic informatics" are problems of mathematics and statistics: difficult problems of combinatorial mathematics, statistics, modeling and algorithm design.

There are challenging problems of how to elucidate genetic networks based on time-sequenced gene expression data. There are important problems of how to classify cells based on expression patterns and how to develop diagnostic disease classification systems. Because of the high dimensionality of the data, there are many challenging problems of multiplicity and multivariate analysis that must be addressed.
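
To make the multiplicity problem concrete: if 10,000 genes are each tested at a significance level of 0.05, roughly 500 are expected to look significant by chance alone. The Python sketch below compares two standard corrections, Bonferroni and Benjamini-Hochberg. The course description does not name a specific method, so the choice of these two, the function names, and the example p-values are assumptions made for illustration.

```python
def expected_false_positives(num_tests, alpha=0.05):
    """Expected number of chance 'hits' when every null hypothesis is true."""
    return num_tests * alpha


def bonferroni(p_values, alpha=0.05):
    """Reject only p-values at or below alpha divided by the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Reject the k smallest p-values, where k is the largest rank
    (1-based, in sorted order) with p_(k) <= (k / m) * alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for ten genes.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(sum(bonferroni(p)))          # 1
print(sum(benjamini_hochberg(p)))  # 2
```

Bonferroni controls the chance of any false positive and is therefore conservative; Benjamini-Hochberg controls the expected fraction of false positives among the rejected hypotheses, which is often a more practical target when thousands of genes are screened at once.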

Midterm Date:
No Midterm.
Final Date:
Class Project.
Class Presentation.

Bud Mishra
January 1 2001