Knowledge derivation and data mining strategies for probabilistic functional integrated networks

James, Katherine

Please use this identifier to cite or link to this item: http://theses.ncl.ac.uk/jspui/handle/10443/1436

Title:	Knowledge derivation and data mining strategies for probabilistic functional integrated networks
Authors:	James, Katherine
Issue Date:	2012
Publisher:	Newcastle University
Abstract:	One of the fundamental goals of systems biology is the experimental verification of the interactome: the entire complement of molecular interactions occurring in the cell. Vast amounts of high-throughput data have been produced to aid this effort. However, these data are incomplete and contain high levels of both false positives and false negatives. In order to combat these limitations in data quality, computational techniques have been developed to evaluate the datasets and integrate them in a systematic fashion using graph theory. The result is an integrated network which can be analysed using a variety of network analysis techniques to draw new inferences about biological questions and to guide laboratory experiments. Individual research groups are interested in specific biological problems and, consequently, network analyses are normally performed with regard to a specific question. However, the majority of existing data integration techniques are global and do not focus on specific areas of biology. Currently this issue is addressed by using known annotation data (such as that from the Gene Ontology) to produce process-specific subnetworks. However, this approach discards useful information and is of limited use in poorly annotated areas of the interactome. Therefore, there is a need for network integration techniques that produce process-specific networks without loss of data. The work described here addresses this requirement by extending one of the most powerful integration techniques, probabilistic functional integrated networks (PFINs), to incorporate a concept of biological relevance. Initially, the available functional data for the baker’s yeast Saccharomyces cerevisiae were evaluated to identify areas of bias and specificity which could be exploited during network integration. This information was used to develop an integration technique which emphasises interactions relevant to specific biological questions, using yeast ageing as an exemplar.
The integration method improves performance during network-based protein functional prediction in relation to this process. Further, the process-relevant networks complement classical network integration techniques and significantly improve network analysis in a wide range of biological processes. The method developed has been used to produce novel predictions for 505 Gene Ontology biological processes. Of these predictions, 41,610 are consistent with existing computational annotations, and 906 are consistent with known expert-curated annotations. The approach significantly reduces the hypothesis space for experimental validation of genes hypothesised to be involved in the oxidative stress response. Therefore, incorporation of biological relevance into network integration can significantly improve network analysis with regard to individual biological questions.
Description:	PhD
URI:	http://hdl.handle.net/10443/1436
Appears in Collections:	School of Computing

Files in This Item:

File	Description	Size	Format
James K 12.pdf	Thesis	43.83 MB	Adobe PDF
dspacelicence.pdf	Licence	43.82 kB	Adobe PDF
