Computation In Biology


  Course description
      Copyright Notice


Important Notice
   Some of the slides used in class contain images that may be subject to copyright. Keep this in mind if you intend to use these images for non-educational purposes, and check the copyright link associated with each lecture.
   You can freely use all other class material as long as you attach the following note:

"material reproduced by permission from class notes of the course 'Computation in Biology', Department of Computer Science, New York University, Spring 2000."



About downloads

   All downloadable files on this page have been compressed with the "gzip" program. Some downloads (most notably the "jpeg" version of each presentation) contain collections of files. In that case, unzipping the downloaded file creates a new file with the extension ".tar". You must then un-tar this new file (using the Unix command "tar") to recover the original files. For information on the "gzip" and "tar" commands, see the corresponding Unix manual pages.
   Windows 95/98/NT users can unzip and untar these files with the program WinZip.
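
   If neither "gzip"/"tar" nor WinZip is at hand, the same two steps can be sketched with Python's standard library. This is only an illustration: the function name unpack_download and the file names in the comments are hypothetical, and the sketch assumes the downloaded file name ends in ".gz".

```python
import gzip
import shutil
import tarfile
from pathlib import Path


def unpack_download(gz_path, dest_dir):
    """Gunzip gz_path into dest_dir; if the result is a tar archive, un-tar it too.

    Returns the list of regular files recovered. Assumes the name ends in ".gz".
    """
    gz_path = Path(gz_path)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # Step 1: gunzip. A hypothetical "slides.tar.gz" becomes "slides.tar",
    # and a hypothetical "notes.pdf.gz" becomes "notes.pdf".
    unzipped = dest_dir / gz_path.stem
    with gzip.open(gz_path, "rb") as src, open(unzipped, "wb") as dst:
        shutil.copyfileobj(src, dst)

    # Step 2: un-tar only when the gunzipped file really is a tar archive.
    if not tarfile.is_tarfile(unzipped):
        return [unzipped]
    with tarfile.open(unzipped) as archive:
        members = [m for m in archive.getmembers() if m.isfile()]
        archive.extractall(dest_dir)
    unzipped.unlink()  # the intermediate ".tar" file is no longer needed
    return [dest_dir / m.name for m in members]
```

   Calling unpack_download("slides.tar.gz", "slides") would leave the original files in the directory "slides"; a single gzipped file is simply decompressed there.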




Material: Introduction to molecular biology: 
  • chemical bonds, DNA structure, RNA structure, protein structure
  • genes
  • transcription of DNA to mRNA.
Reading: Brown, chapters 1, 2, 3, 4, 5.
Visuals: copyright


Material: Introduction to molecular biology (continued): 
  • Ribosomes, tRNA, translation of mRNA to protein.
  • DNA replication.
  • Structure and shape of cells.
  • Phylogenetic domains and properties. Archaea, bacteria, eukaryotes.

Miller's and Pasteur's experiments. Evolution.

Reading: Brown, chapters 6, 7, 8, 9, 11.
Visuals: copyright


Material: Technologies used in genome sequencing: 
  • Varieties of molecule labeling: radioactive, chromogenic, chemiluminescent.
  • Denaturation of DNA.
  • Hybridization.
  • Gel electrophoresis.
  • Chain-termination sequencing (the Sanger-Coulson method).
  • Restriction endonucleases and their role in consistent DNA cleaving.
  • Building a genomic library: vector DNA and host cells. Example system: E. coli and its lambda phages.
Reading: Brown, chapters 20, 21, 22


Material: Introduction to Computer Science concepts:
  • O-notation.
  • Recurrence equations and recursion.
  • Sorting (bubblesort, quicksort) and searching.
  • Graphs and trees.

Physical maps: restriction site mapping and hybridization mapping.

Reading: Any introductory Computer Science text. We suggest the following: A.V. Aho, J.E. Hopcroft and J.D. Ullman, "Data Structures and Algorithms", Addison-Wesley (see chapters 3, 6, 7, 8 and 9).

For the physical maps, start with chapter 5 of Setubal & Meidanis.


  • Physical mapping: models and algorithms.
  • Fragment assembly.
  • The complexity classes P and NP (and sample problems).
Reading: Setubal & Meidanis, chapters 1, 4 and 5.
  • Slides: pdf format.


Material: Dynamic programming, Part I
  • Algorithms.
Reading: Setubal & Meidanis, sections 3.1 - 3.3
Gusfield, chapters 10, 11 and sections 15.1 - 15.3
  • Slides: pdf format.


Material: Dynamic Programming, Part II
  • Use in comparing biological sequences.
  • Determining the appropriate cost functions: PAM and BLOSUM matrices.
  • Multiple sequence alignment.
Reading: Setubal & Meidanis, sections 3.4, 3.6
Gusfield, chapter 14 and sections 15.7-15.10
  • Slides: pdf format.


Material: Gene finding, Part I
  • Learning algorithms. Sensitivity/Specificity.
  • Elementary probability. Bayes rule.
  • Geography of prokaryotic and eukaryotic genomes.
  • Gene finding using cDNA libraries and ESTs.
  • ORFs (Open Reading Frames) as the starting point in prokaryotic gene finding.
  • Training simple probabilistic models.
Reading: Eddy/Krogh/Mitchison/Durbin, Sections 1.3 and 3.1
  • Slides: pdf format.
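
To make the ORF bullet above concrete, here is a minimal sketch of an ORF scan. It is not the method presented in the lecture: it assumes the standard start codon ATG and stop codons TAA, TAG and TGA, scans only the three forward-strand reading frames, and uses hypothetical names (find_orfs, min_codons).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def find_orfs(dna, min_codons=2):
    """Return (start, end, frame) tuples for ORFs on the forward strand.

    start and end are 0-based and end-exclusive; the span includes the stop codon.
    min_codons counts codons from ATG up to, but not including, the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG since the last stop codon
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


print(find_orfs("CCATGAAATAGGG"))  # → [(2, 11, 2)]
```

A real prokaryotic gene finder would also scan the reverse complement and use a much larger length threshold; the ORFs found this way are only candidates that a probabilistic model then scores.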

Material: Gene finding, Part II
  • Training probabilistic models: 
    • maximum likelihood estimation.
    • impact of training set size
  • Modeling sequence homologies with patterns
  • Modeling sequence homologies with profiles; log-likelihood ratios.
  • Higher order Markov chains.
  • GeneMark
Reading: Eddy/Krogh/Mitchison/Durbin, Section 3.5.

The GeneMark papers:

  • Borodovsky, M. and J. McIninch. (1993) "GeneMark: Parallel Gene Recognition for both DNA Strands", Computers & Chemistry 17: 123-133.
  • Besemer, J. and M. Borodovsky (1999) "Heuristic approach to deriving models for gene finding", Nucleic Acids Research 27: 3911-3920. [PDF version from author's site]


Material: Sequence similarity in database searching
  • Quantification of sequence similarity through alignment and scoring.
  • Dynamic programming revisited: the effect of boundary conditions in the alignment semantics.
  • The Smith-Waterman algorithm: an O(mn) exact solution to the optimal alignment problem and its restrictions in searching large databases.
  • Inexact but fast pairwise alignment: FASTP, BLAST.
Reading: Gusfield, sections 15.1 - 15.6.
Setubal & Meidanis, section 3.5.

For more details on FASTP and BLAST, you can look at the original papers:

  • Lipman, D.J. and Pearson, W.R., "Rapid and sensitive protein similarity searches", Science, 227:1435-1441, 1985.
  • Altschul, S., Gish, W., Miller, W., Myers, E.W. and Lipman, D., "A basic local alignment search tool", J. Mol. Biology, 215:403-410, 1990.
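
As a companion to the Smith-Waterman bullet above, the following sketch fills the O(mn) dynamic programming table and traces back one optimal local alignment. It makes simplifying assumptions that are ours rather than the lecture's: a linear gap penalty and a flat match/mismatch score (real searches would use PAM or BLOSUM scores), with illustrative parameter values.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score of a local alignment ending at a[i-1] and b[j-1].
    # Row 0 and column 0 stay at zero: an alignment may start anywhere.
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # a[i-1] aligned to a gap
            left = score[i][j - 1] + gap    # b[j-1] aligned to a gap
            score[i][j] = max(0, diag, up, left)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("TTACGTAA", "GGACGTCC"))  # → (8, 'ACGT', 'ACGT')
```

The two nested loops touch every cell of the table once, which is where the O(mn) cost comes from. Repeating that work for every sequence in a large database is what motivates the faster, inexact heuristics of FASTP and BLAST.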

Lectures 11 & 12

Material: Pattern Discovery
  • Definitions and examples.
  • The TEIRESIAS algorithm.
  • Bio-Dictionary: Definition & Applications.
  • Association discovery.
  • Gene expression analysis.

Invariant representations for 2 dimensions: rotation, translation, rigid transformations, scaling, affine transformations.

  • Brazma A, Jonassen I, Eidhammer I, Gilbert D., "Approaches to the automatic discovery of patterns in biosequences", J Comput Biol, 5(2):279-305, 1998.
  • Rigoutsos, I. and A. Floratos, "Motif Discovery Without Alignment Or Enumeration", Proceedings 2nd Annual ACM International Conference on Computational Molecular Biology (RECOMB 98), 1998.
  • Slides: pdf format.



Guest Lecture by Barry Robson

Issues in protein folding.

Reading: Branden & Tooze: chapters 1-6.
Setubal & Meidanis: chapter 8.
  • Slides (courtesy of Barry Robson): pdf format.