Please use this identifier to cite or link to this item: http://theses.ncl.ac.uk/jspui/handle/10443/4783
Title: Information geometry for phylogenetic trees
Authors: Garba, Maryam Kashia
Issue Date: 2019
Publisher: Newcastle University
Abstract: Phylogenetic trees represent evolutionary relationships between existing organisms, and are fundamental to many applications in molecular biology. These applications often require comparisons to be made between different phylogenetic trees, and this is generally achieved by using a metric or distance defined on pairs of trees. Distances between trees are used for hypothesis testing, clustering trees to identify differing patterns of evolution, averaging trees, and postprocessing the results of phylogenetic analysis, among many other applications. Most existing measures of distance between phylogenetic trees are based purely on the branching structure and edge lengths of the trees, and thus ignore the fact that phylogenetic trees represent probability models for gene sequence data. This project concerns the development of distance metrics and geodesics between trees based on the underlying probability distributions on genetic sequence data induced by trees. The field of information geometry offers specific methods for constructing distance metrics and geodesics on spaces of probability distributions, and hence on spaces of phylogenetic trees. The opening chapters of the thesis give background information on phylogenetic models, inference of phylogenies from sequence data, various notions of tree space, and the fundamental ideas of information geometry. Two main areas are then developed in the rest of the thesis. First, we present methods for computing distances between trees based on the probability distributions on genetic sequence data they induce. This enables metrics such as the Hellinger distance and Jensen-Shannon distance to be pulled back from the space of distributions on sequence data to tree space. Approximate calculation of these metrics on tree space involves Monte Carlo simulation methods.
We compare these probabilistic metrics to existing metrics on trees, and describe various interesting properties, such as their behaviour when trees have some leaves which are not in common. The second area concerns the construction of geodesics between trees using methods from information geometry. In the most widely studied tree space, Billera-Holmes-Vogtmann tree space, the local metric is taken to be Euclidean, and this metric extends to give a well-defined global geodesic geometry on the whole space. Existence of geodesics enables basic statistical procedures such as computation of means and variances, or principal component analysis, to be carried out in Billera-Holmes-Vogtmann tree space. This part of the thesis is motivated by the aim of reproducing such methods using an alternative and more meaningful geometry on the space of trees. As an alternative to the local Euclidean metric, we consider the metric and corresponding geodesics on trees induced by embedding tree space in the space of n×n symmetric positive definite matrices where n is the number of leaves on each tree. Equivalently, this corresponds to the information geometry arising when a certain multivariate normal distribution is associated to each phylogenetic tree. Geodesics in the space of symmetric positive definite matrices can be computed via existing exact methods. We describe algorithms for constructing geodesics with respect to the metric on tree space induced by the embedding. These are based on projecting geodesics between symmetric positive definite matrices down into the embedded tree space. In addition to the change in local geometry relative to Billera-Holmes-Vogtmann tree space, it is necessary to change the underlying topology of tree space by gluing together parts of tree space corresponding to edges with infinite length.
The resulting space is known as the phylogenetic orange space, or edge-product space, and the computational tools we have developed are used to explore our proposed geometry for this space. Many open questions remain about this geometry, and the thesis closes with a discussion of future work.
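As a rough illustration of the geodesics between symmetric positive definite matrices mentioned in the abstract, the following Python sketch assumes the affine-invariant metric, which up to a constant factor is the Fisher information metric for zero-mean multivariate normal distributions. The function names and the example matrices are hypothetical and are not taken from the thesis, and the projection of geodesics down into the embedded tree space is not shown.

```python
import numpy as np


def _spd_power(m, p):
    """Raise a symmetric positive definite matrix to a real power p."""
    m = (m + m.T) / 2.0          # remove tiny asymmetries from rounding
    w, v = np.linalg.eigh(m)     # eigenvalues w > 0, orthonormal eigenvectors v
    return (v * w**p) @ v.T


def spd_geodesic(a, b, t):
    """Point at time t on the affine-invariant geodesic from a (t=0) to b (t=1).

    Assumed closed form: a^{1/2} (a^{-1/2} b a^{-1/2})^t a^{1/2}.
    """
    a_half = _spd_power(a, 0.5)
    a_inv_half = _spd_power(a, -0.5)
    middle = _spd_power(a_inv_half @ b @ a_inv_half, t)
    return a_half @ middle @ a_half


def spd_distance(a, b):
    """Affine-invariant distance: norm of the log-eigenvalues of a^{-1/2} b a^{-1/2}."""
    a_inv_half = _spd_power(a, -0.5)
    m = a_inv_half @ b @ a_inv_half
    w = np.linalg.eigvalsh((m + m.T) / 2.0)
    return float(np.sqrt(np.sum(np.log(w) ** 2)))


# Hypothetical 3x3 example matrices (not derived from any real tree).
A = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.2],
              [0.0, 0.2, 1.5]])
B = np.array([[1.0, 0.1, 0.3],
              [0.1, 3.0, 0.0],
              [0.3, 0.0, 2.0]])

midpoint = spd_geodesic(A, B, 0.5)
```

Under this metric the midpoint should lie at half the distance from each endpoint, which gives a simple sanity check on the sketch.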
Description: PhD Thesis
URI: http://theses.ncl.ac.uk/jspui/handle/10443/4783
Appears in Collections: School of Mathematics and Statistics

Files in This Item:
File                Description  Size      Format
Garba MK 2019.pdf                17.18 MB  Adobe PDF
dspacelicence.pdf                43.82 kB  Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.