JANSEN, ROBERT K.1*, DAVID BADER2, BERNARD M. E. MORET2, LINDA A. RAUBESON3, LI-SAN WANG4, TANDY WARNOW4, and STACIA WYMAN4. 1Section of Integrative Biology, University of Texas, Austin, TX 78712; 2Computer Science, University of New Mexico, Albuquerque, NM 87131; 3Biological Sciences, Central Washington University, Ellensburg, WA 98926; 4Computer Science, University of Texas, Austin, TX 78712. - New approaches for using gene order data in phylogeny reconstruction.
The rapid accumulation of whole-genome sequences for a wide diversity
of taxa is generating a huge amount of comparative data for
biologists. These sequences provide a new set of molecular characters
for phylogenetic reconstruction, including gene order, which are
especially useful for resolving deep branches of the tree of life.
Changes in gene order are caused primarily by inversions,
transpositions, and inverted transpositions (transversions). One of
the major challenges for
using genomic changes is the development of computational methods for
handling these types of characters, especially in groups with large
numbers of genes and highly rearranged genomes. We have been
developing and testing a variety of methods for reconstructing
phylogenies based on gene order data using both real and synthetic
data. Two of the primary methods we have developed and tested are
Maximum Parsimony on Binary Encodings (MPBE) and methods for
correcting previously published distance measures (inversion and
transposition distances). Our simulations show that all methods
perform very well when the rates of change are low relative to the
number of genes and that all methods perform poorly when rates of
change are high relative to the number of genes. Furthermore,
corrected distance measures greatly improve the accuracy of
phylogenies. We have applied these new methods to a data set for the
highly rearranged chloroplast genomes of the Campanulaceae. In this
group, which generally has low rates of change relative to the number
of genes, all methods recover congruent tree topologies. Analyses of
this data set using our new program GRAPPA (Genome Rearrangements
Analysis under Parsimony and other Phylogenetic Algorithms) also
produced a congruent tree topology. Gene order phylogenies for the
Campanulaceae are considered accurate because they are largely
congruent with trees generated from three chloroplast gene sequences
(atpB, matK, and rbcL).
Key words: Campanulaceae, comparative genomics, computational biology, gene order, phylogenetic theory
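
The two kinds of gene order analysis named in the abstract, binary encodings and distance measures, can be illustrated with a minimal sketch. It is not the authors' implementation and is not taken from GRAPPA. It assumes circular genomes written as lists of signed integers (the sign gives the strand), assumes that a binary encoding records the presence or absence of each gene adjacency, and uses the uncorrected breakpoint distance as a stand-in for a distance measure; the corrected distances described above are not reproduced. All function names are hypothetical.

```python
def adjacencies(genome):
    """Return the set of adjacencies in a circular, signed gene order.

    An adjacency (a, b) read on one strand is the same as (-b, -a)
    read on the other, so each pair is stored in one canonical form.
    """
    n = len(genome)
    pairs = set()
    for i in range(n):
        a, b = genome[i], genome[(i + 1) % n]  # wrap around: circular genome
        pairs.add(min((a, b), (-b, -a)))
    return pairs


def binary_encoding(genomes):
    """Encode each genome as a 0/1 row, one column per observed adjacency.

    Assumption: this mirrors the general idea of a binary encoding of
    gene orders; the resulting matrix could be passed to any standard
    parsimony program.
    """
    columns = sorted(set().union(*(adjacencies(g) for g in genomes.values())))
    rows = {
        name: [1 if adj in adjacencies(g) else 0 for adj in columns]
        for name, g in genomes.items()
    }
    return columns, rows


def breakpoint_distance(g1, g2):
    """Count adjacencies of g1 that are absent from g2 (uncorrected)."""
    return len(adjacencies(g1) - adjacencies(g2))


# Toy data: genome B differs from A by one inversion of genes 2..3.
toy_genomes = {
    "A": [1, 2, 3, 4, 5],
    "B": [1, -3, -2, 4, 5],
}
columns, rows = binary_encoding(toy_genomes)
```

In this toy case a single inversion disrupts two adjacencies, so the breakpoint distance between A and B is 2. When rates of change are high relative to the number of genes, such raw counts underestimate the true number of events, which is the motivation for correcting distance measures.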