Please use this identifier to cite or link to this item:
Title: Bayesian phylogenetic modelling of lateral gene transfers
Authors: Vieira, Rute Gomes Velosa
Issue Date: 2015
Publisher: Newcastle University
Abstract: Phylogenetic trees represent the evolutionary relationships between a set of species. Inferring these trees from data can be particularly challenging, since genetic material can be transferred not only from parents to their offspring but also between organisms via lateral gene transfers (LGTs). Thus, the presence of LGTs means that genes in a genome can each have different evolutionary histories, represented by different gene trees. A few statistical approaches have been introduced to explore non-vertical evolution through collections of Markov-dependent gene trees. In 2005 Suchard described a Bayesian hierarchical model for joint inference of gene trees and an underlying species tree, where a layer in the model linked gene trees to the species tree via a sequence of unknown lateral gene transfers. In his model LGT was modelled via a random walk in tree space derived from the subtree prune and regraft (SPR) operator on unrooted trees. However, using SPR moves on an unrooted tree to represent LGT is problematic: the transfer of DNA between two organisms requires that both exist at the same time, and unconstrained SPR moves can therefore allow unrealistic LGTs. This thesis describes a related hierarchical Bayesian phylogenetic model for reconstructing phylogenetic trees which imposes a temporal constraint on LGTs, namely that they can only occur between species which exist concurrently. This is achieved by taking into account possible time orderings of divergence events in trees, without explicitly modelling divergence times. An extended version of the SPR operator is introduced as a more adequate mechanism to represent the effect of LGT on a tree. The extended SPR operation respects the time ordering. It additionally differs from regular SPR in that it maintains a 1-to-1 correspondence between points on the species tree and points on each gene tree. Each point on a gene tree represents the existence of a population containing that gene at some point in time.
Hierarchical phylogenetic models were used in the reconstruction of each gene tree from its corresponding gene alignment, enabling the pooling of information across genes. In addition to Suchard's approach, we assume variation in the rate of evolution between different sites. The species tree is assumed to be fixed. A Markov chain Monte Carlo (MCMC) algorithm was developed to fit the model in a Bayesian framework. A novel MCMC proposal mechanism for jointly proposing the gene tree topology and branch lengths, LGT distance and LGT history has been developed, as well as a novel graphical tool to represent LGT history, the LGT Biplot. Our model was applied to simulated and experimental datasets. More specifically, we analysed the presence of LGT/reassortment in the evolution of the 2009 Swine-Origin Influenza Type A virus. Future improvements of our model and algorithm should include joint inference of the species tree, improved computational efficiency of the MCMC algorithm, and better consideration of other factors that can cause discordance between gene trees and species trees, such as gene loss.
Description: PhD Thesis
Appears in Collections: School of Modern Languages

Files in This Item:
File | Description | Size | Format
Vieira, R.V. 2015.pdf | Thesis | 4 MB | Adobe PDF
dspacelicence.pdf | Licence | 43.82 kB | Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.