Please use this identifier to cite or link to this item: http://theses.ncl.ac.uk/jspui/handle/10443/1265
Title: Comparative genomics for studying the proteomes of mucosal microorganisms
Authors: Nakjang, Sirintra
Issue Date: 2011
Publisher: Newcastle University
Abstract: A tremendous number of microorganisms are known to interact with their animal hosts. The outcome of the interactions between microbes and their animal hosts range from modulating the maintenance of homeostasis to the establishment of processes leading to pathogenesis. Of the numerous species known to inhabit humans, the great majority live on mucosal surfaces which are highly defended. Despite their importance in human health, little is known about the molecular and cellular basis of most host-microbe interactions across the tremendous diversity of mucosal-adapted microorganisms. The ever-increasing availability of genome sequence data allows systematic comparative genomics studies to identify proteins with potential important molecular functions at the host-microbe interface. In this study, a genome-wide analysis was performed on 3,021,490 protein sequences derived from 867 complete microbial genome sequences across the three domains of cellular life. The ability of microbes to thrive successfully in a mucosal environment was examined in relation to functional genomics data from a range of publicly available databases. Particular emphasis was placed on the extracytoplasmic proteins of microorganisms that thrive on human mucosal surfaces. These proteins form the interface between the complex host-microbe and microbe-microbe interactions. The large amounts of data involved, combined with the numerous analytical techniques that need to be performed makes the study intractable with conventional bioinformatics. The lack of habitat annotations for microorganisms further compounds the problem of identifying the microbial extracytoplasmic proteins playing important roles in the mucosal environments. In order to address these problems, a distributed high throughput computational workflow was developed, and a system for mining biomedical literature was trained to automatically identify microorganisms’ habitats. The workflow integrated existing bioinformatics tools to identify and characterise protein-targeting signals, cell surface-anchoring features, protein domains and protein families. This study successfully demonstrated a large-scale comparative genomics approach utilising a system called Microbase to harness Grid and Cloud computing technologies. A number of conserved protein domains and families that are significantly associated with a speiii iv cific set of mucosa-inhabiting microorganisms were identified. These conserved protein regions of which their functions were either characterised or unknown, were quite narrow in their coverage of taxa distribution, with only a few protein domains more widely distributed, suggesting that mucosal microorganisms evolved different solutions in their strategies and mechanisms for their survival in the host mucosal environments. Metabolic and biological processes common to many mucosal microorganisms included: carbohydrate and amino acid metabolisms, signal transduction, adhesion to host tissues or contents in mucosal environments (e.g. food remnants, mucins), and resistance to host defence mechanisms. Invasive or virulence factors were also identified in pathogenic strains. Several extracytoplasmic protein families were shared among prominent bacterial members of gut microbiota and microbial eukaryotes known to thrive in the same environment, suggesting that the ability of microbes to adapt to particular niches can be influenced by lateral gene transfer. A large number of conserved regions or protein families that potentially play important roles in the mucosa-microbe interactions were revealed by this study. Several of these candidates were proteins of unknown function. The identified candidates were subjected to more detailed computational analysis providing hypothesis for their function that will be tested experimentally in order to contribute to our understanding of the complex host-microbe interactions. Among the candidates of unknown function, a novel M60-like domain was identified. The domain was deposited in the Pfam database with accession number PF13402. The M60-like domain is shared amongst a broad range of mucosal microorganisms as well as their vertebrate hosts. Bioinformatics analyses of the M60-like domain suggested a potential catalytic function of the conserved motif as gluzincins metalloproteases. Targeting signals were detected across microbial M60-likecontaining proteins. Mucosa-related carbohydrate-binding modules (CBMs), CBM32 was also identified on several proteins containing M60-like domains encoded by known mucosal commensals and pathogens. The co-occurrence of the CBMs and M60-like domain, as well as annotated potential peptidase function unveiled a new functional context for the CBM, which is typically connected with carbohydrate processing enzymes but not proteases. The CBM domains linked with members of different protease families are likely to enable these proteases to bind to specific glycoproteins from host animals further highlighting the importance of proteases and CBMs (CBM32 and CBM5_12) in host-microbe interactions.
Description: PhD Thesis
URI: http://hdl.handle.net/10443/1265
Appears in Collections:Institute for Cell and Molecular Biosciences

Files in This Item:
File Description SizeFormat 
Nakjang 11.pdfThesis21.7 MBAdobe PDFView/Open
dspacelicence.pdfLicence43.82 kBAdobe PDFView/Open


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.