Algorithms and Statistics for Nucleic Acid Secondary Structure Prediction
Friday September 20, 2002
Host: Tamar Schlick, email@example.com, 212-995-0049
Secondary structures (foldings) for single-stranded nucleic acids can be predicted for single sequences, or for groups of homologous sequences that can be reliably aligned.
The foldings of individual sequences are predicted using dynamic programming methods borrowed and adapted from algorithms used to align molecular sequences. Prediction is based on the minimization of free energy, and nearest-neighbor parameters derived from physical chemistry are used to assign these energies.
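The dynamic programming recursion can be illustrated with a much simpler objective than free energy. The sketch below is a Nussinov-style base-pair maximization, not the nearest-neighbor energy minimization described above; the function name `max_base_pairs`, the allowed pair set, and the minimum hairpin loop size of 3 are assumptions made for illustration only.

```python
def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in seq (Nussinov-style sketch).

    Assumption: Watson-Crick and G-U wobble pairs all score 1, and a
    hairpin loop must enclose at least `min_loop` unpaired bases.
    """
    allowed = {("A", "U"), ("U", "A"), ("G", "C"),
               ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] holds the optimum for the subsequence seq[i..j], inclusive.
    best = [[0] * n for _ in range(n)]
    # Fill the table by increasing subsequence length.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: base j is unpaired.
            score = best[i][j - 1]
            # Case 2: base j pairs with some k, splitting the problem in two.
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in allowed:
                    left = best[i][k - 1] if k > i else 0
                    inner = best[k + 1][j - 1]
                    score = max(score, left + 1 + inner)
            best[i][j] = score
    return best[0][n - 1]


print(max_base_pairs("GGGAAAUCC"))
```

An energy-based method fills a table of the same shape, but replaces the constant score of 1 per pair with free energy contributions from stacked pairs and loops, and minimizes rather than maximizes.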
Minimum energy folding works very well on some sequences, poorly on others, and within a given sequence is often more reliable in some regions than in others. The prediction of multiple secondary structures close to the minimum predicted energy mitigates the uncertainty. Dot plots that display all possible foldings close to the minimum energy or base pair probabilities also help. These dot plots convey "well-definedness" information that may be mapped onto individual foldings to show what regions are better predicted than others.
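One way to picture well-definedness is to count, for each base, how many different partners it takes across a set of near-optimal foldings. The sketch below assumes the foldings are already available as dot-bracket strings of equal length (a notation not used in this abstract); the helper names `pair_table` and `partner_counts` are hypothetical, and the measure is a simplified stand-in rather than the exact quantity shown in the dot plots.

```python
def pair_table(structure):
    """Return partner[i] for a dot-bracket string, or None if i is unpaired."""
    stack = []
    partner = [None] * len(structure)
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            open_pos = stack.pop()
            partner[open_pos] = pos
            partner[pos] = open_pos
    return partner


def partner_counts(structures):
    """Count the distinct pairing partners of each position over all foldings.

    A low count suggests a region that is folded consistently; a high
    count suggests a region where near-optimal foldings disagree.
    """
    n = len(structures[0])
    seen = [set() for _ in range(n)]
    for structure in structures:
        for pos, p in enumerate(pair_table(structure)):
            if p is not None:
                seen[pos].add(p)
    return [len(partners) for partners in seen]


# Two hypothetical near-optimal foldings of the same 8-base sequence.
print(partner_counts(["((....))", "(.(...))"]))
```

In this toy input, position 6 pairs with two different bases across the two foldings, so it would be flagged as less well-defined than its neighbors.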
When a number of aligned, homologous sequences are available, mutual information computations allow one to determine base pairs that are conserved in evolution even though the sequences themselves vary. Phylogenetic information and energy methods may be used together, each helping the other, when only a few aligned sequences are available. Large databases of aligned RNAs now exist where secondary structure has been predicted with confidence. Statistical analysis of the frequencies of small structural motifs can lead to folding rules that are analogous to the energy rules currently used. Such studies may eventually lead to independent, statistically derived rules. Even if this is not achieved, the statistical folding parameters can point out significant motifs that can then be investigated by physical chemists or structural biologists.
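The mutual information computation mentioned above can be sketched for a pair of alignment columns. This is a minimal illustration that assumes a gap-free alignment given as equal-length strings and uses plain observed frequencies; the function name `mutual_information` and the toy alignment are hypothetical, not data from the talk.

```python
from collections import Counter
from math import log2


def mutual_information(alignment, i, j):
    """Mutual information, in bits, between columns i and j of an alignment.

    Assumption: every sequence has the same length and contains no gaps.
    """
    col_i = [seq[i] for seq in alignment]
    col_j = [seq[j] for seq in alignment]
    n = len(alignment)
    freq_i = Counter(col_i)
    freq_j = Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        p_a = freq_i[a] / n
        p_b = freq_j[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


# Toy alignment: columns 0 and 3 covary as complementary pairs,
# while columns 1 and 2 are fully conserved.
toy = ["GAAC", "CAAG", "AAAU", "UAAA"]
print(mutual_information(toy, 0, 3))
print(mutual_information(toy, 1, 2))
```

Columns that vary while staying complementary score high, whereas fully conserved columns score zero because they carry no covariation signal, which is why the method finds pairs that are conserved even though the sequences themselves vary.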