One reason to think that trees do not give an accurate representation of evolution is that different gene trees for a particular set of species vary. The reasons for these discrepencies could be that:
Phylogenetic Networks:
Reticulation events can be represented using Directed Acyclic Graphs (DAGs)
of a certain form. There are two kinds of nodes in these
special DAGs: tree nodes and reticulation nodes. Tree nodes have one edge entering them and two or more edges leaving them. Reticulation nodes have many edges entering them and one edge leaving them. Reticulation nodes give phylogenetic networks the ability to depict reticulation events.
At present there is no universally accepted network reconstruction algorithm.
Back to top
Project Goals
The goals of my project were to study various proposed algorithms and to compare these algorithms and their results.
Back to top
Algorithm 1:Galled network reconstruction (Nakhleh et. al)
Definitions:
- Reticulation Cycle: Suppose there is a node w in a network from which two directed paths come out, and that the two paths meet at a node x. This node x is called a reticulation node, and the 2 directed paths (that start at w and end at x) together form a reticulation cycle.
- Gall: A reticulation cycle that shares no nodes with any other reticulation cycle is known as a gall
- Galled Network: A network in which all the reticulation cycles are galls is called a galled network
- Partition: Every edge e in a tree has two sets of nodes associated with it: the first set contains all those nodes below the edge and the second set contain all the remaining nodes. These two sets of nodes together
constitute a partition of e
- Incompatible Edges: An edge e1 in tree T1 is said to be incompatible with tree T2 if there are no edges in tree T2 with the same partition as e1
- Ends of a path: These are the subtrees attached to the first or last nodes of a path.
See the
example to see these relationships. In the diagram, there are two reticulation cycles: one made up of the edges (1,2)(2,4)(1,3)(3,4) and the second made up of the edges (5,8)(8,10)(5,9)(9,10). Both of these are galls since none of the nodes in the first reticulation cycle are in the second and vicce versa. Since there are only two reticulation cycles in the graph, and both of them are galls, the network in the example is a galled network.
The partition of the dashed edge (3, 6) is {{g,h},{1,2,4,5,7,8,9,10,11,a,b,c,d,e,f}}
Basic concept:
-
Given two different gene trees T1 and T2, we first find the incompatible edges in the two trees.
- For each path p made up of the incompatible edges, we find the subtree X in one of the ends of p such that when X is removed from both trees T1 and T2, we get a common tree T3.
- Once you find such a subtree X which is always attached to one of the ends of p, we can build the network by adding a new edge from the other end of p to the root of X.
Assumptions:
- An ortholog of a gene can only appear once in a gene tree (i.e there is no convergence)
- The true Phylogenetic nework is a galled network
Reasons given to justify assumption 2
True evolutionary history is likely to be a galled network if:
- there are moderate levels of recombination
- most recombination events are recent
- we are studying a single recombination event
To download this algorithm, please contact
chandni@brandeis.edu
Back to top
Algorithm 2: Ancestral Network Reconstruction (Dennis Shasha)
This algorithm takes as input a set of different gene trees for a particular set of species and constructs a network.
Notation:
If A is a gene then A
x is a variant or ortholog of A, and A
xy is a descendant of A
x.
Suppose the gene tree for a set of species {Sp1, Sp2, Sp3} for gene A is (A1, (A21, A22)) where
Sp1:A
1Sp2:A
21 Sp3:A
22
Suppose also that the gene tree for the same set of species for gene B is (B
11, (B
21,B
22)) where
Sp1: B
22Sp2:B
21Sp3:B
11
Combining this data we get that
Sp1:A
1B
22Sp2:A
21B
21 Sp3:A
22B
11
We can already see a conflict in that Sp1 and Sp2 must have an ancestor AB
2 (which means that gene B must have changed before gene A) whereas Sp2 and Sp3 must have an ancestor A
2B (which means that gene A must have changed before B). This algorithm constructs the maximal base tree containing as many nodes as possible that satisfy tree conditions from the given data. Once this base tree is obtained, the algorithm adds the remaining nodes and edges to incorporate all the evolutionary information available.
Assumptions:
- An ortholog of a gene can only appear once in a gene tree (i.e
the phylogeny of a gene is a tree)
- For each gene A in the phylogenetic network
for species, the ortholog of A must be the direct descendent of its parent/s or equal to the ortholog of the parent
To download this program, please contact
shasha@cs.nyu.edu
Back to top
Results
I ran the two algorithms on the two gene trees for genes
YBL015W and
YCL054W.
Here are the results for
algorithm 1 and
algorithm 2
Back to top
References:
- B.M.E. Moret, L. Nakhleh, T. Warnow, C.R. Linder, A. Tholse, A. Padolina, J. Sun, and R. Timme, "Phylogenetic networks: modeling, reconstructibility, and accuracy." To appear in the IEEE/ACM Transactions on Computational Biology and Bioinformatics, 1(1), 2004. PDF
- L. Nakhleh, T. Warnow, and C.R. Linder, "Reconstructing Reticulate Evolution in Species -- Theory and Practice." Proceedings of the Eighth Annual International Conference on Research in Computational Molecular Biology (RECOMB 2004), 337-346.PDF
- D. Gusfield, S. Eddhu, and C. Langley. "Efficient Reconstruction of Phylogenetic Networks with Constrained Recombination." Proceedings of Computational Systems Bioinformatics (CSB 03), 2003. PDF
- W.P. Maddison. "Gene Trees in Species Trees" systematic biology, 46(3):523-536, 1997.
-
Genome-scale approaches to resolving incongruence in molecular phylogenies
- Deriving the genomic tree of life in the prescence of Horizontal Gene Transfer: Conditioned reconstruction
- Lake 1994 -
Reconstructing evolutionary trees from DNA and protein sequences: paralinear distances
- NeighborNet: An agglomerative method for the construction of planar phylogenetic networks
Back to top