Integrating distributed post-genomic data to infer the molecular basis of bacterial phenotypes

Craddock, Tracy

Please use this identifier to cite or link to this item: http://theses.ncl.ac.uk/jspui/handle/10443/2065

Title:	Integrating distributed post-genomic data to infer the molecular basis of bacterial phenotypes
Authors:	Craddock, Tracy
Issue Date:	2008
Publisher:	Newcastle University
Abstract:	The aim of the project described in this thesis is to understand and predict the characteristics and behaviour of a family of bacteria through an analysis of genome-wide data from a variety of sources. The focus of the research is a family of bacteria, Bacillus, whose members show a diverse range of phenotypes, from the non-pathogenic B. subtilis to B. anthracis, the causative agent of anthrax. Specifically, the focus was on the genomic scale identification and characterisation of secreted proteins from Bacillus species. Firstly, the application of Grid-based computational approaches to problems in genomic analysis and annotation was investigated, applying myGrid technology to a biological problem not previously addressed using this approach. e-Science workflows and a service-oriented approach were developed and applied to predict and characterise secreted proteins, and the results automatically integrated into a custom relational database. An associated Web portal was also developed to facilitate expert curation, results browsing and querying over the database. Workflow technology was also used to classify the putative secreted proteins into families and to study the relationships between and within these families. The design of the workflows, the architecture and the reasoning behind the approach used to build this system, called BaSPP, are discussed. Analysis of the putative Bacillus secretomes revealed clear distinctions between proteins present in the pathogens and those in the non-pathogens. The properties of the protein families present in all Bacillus secretomes, as well as those specific either to the pathogens or to the non-pathogens, were investigated. Many of the protein families contained members of unknown function. In the second part of the project, these families were investigated in more depth, using additional data integration strategies not previously applied to these organisms.
The secretomes were modelled at the system level, in the broader context of interactomes. A system called SubtilNet was therefore developed, using B. subtilis as the reference organism. As part of SubtilNet, a toolkit and architecture were developed and implemented for building and analysing probabilistic functional integrated networks (PFINs). The PFINs built for each Bacillus species using this system were subsequently used to delve further into the interactions specific to the secreted proteins by extracting and exploring the cross-species PFINs of these proteins. The cross-species PFINs for the protein families specific to the pathogens and non-pathogens were explored, with particular emphasis on the core PrsA-like protein family, which acted as a use case to show how the PFINs can be used to shed light on protein function. The addition of orthologous links between species was demonstrated to facilitate network clustering and analysis, enabling putative annotations to be applied to proteins previously of unknown function.
Description:	PhD Thesis
URI:	http://hdl.handle.net/10443/2065
Appears in Collections:	School of Computing

Files in This Item:

File	Description	Size	Format
Craddock T. 2007.pdf	Thesis	33.23 MB	Adobe PDF
dspacelicence.pdf	Licence	43.82 kB	Adobe PDF