Please use this identifier to cite or link to this item:
http://theses.ncl.ac.uk/jspui/handle/10443/6237
Title: | Machine learning for actionable knowledge discovery in synthetic biology |
Authors: | Huang, Yiming |
Issue Date: | 2023 |
Description: | This thesis focuses on the development of machine learning algorithms to discover valuable knowledge in omics data, with the purpose of informing actionable decisions in synthetic biology. In synthetic biology, genetic circuits play a crucial role in designing and constructing synthetic biological systems for producing high-value materials. These genetic circuits, consisting of engineered networks of genes regulated at the transcriptional and post-transcriptional levels, are inserted into a host organism to introduce desired properties. To optimise the performance of synthetic circuits and maximise product yields, it is crucial to understand the dynamic behaviour of genes and proteins within the cells and to recognise harmful cellular states. Transcriptomics technologies enable the measurement of gene expression profiles in different organisms under various conditions. However, fully unlocking the potential of these high-throughput transcriptomics datasets requires data mining techniques. Therefore, this work aims to propose a portfolio of computational methods for analysing transcriptomics data, specifically designed to identify detrimental cellular states in bioengineering host organisms that can lead to lower yields of target products. The proposed methods include unsupervised learning to detect distinct gene expression profiles, statistical tests to characterise detrimental states, feature selection models to identify key biomarkers, and a recommendation system to prioritise robust biomarkers. These methods are applied to multiple transcriptomics datasets to study two commonly used organisms in synthetic biology, Bacillus subtilis and Escherichia coli. For B. subtilis, 10 distinct cellular growth states relevant to various stress conditions (e.g. stationary phase, anaerobic, temperature perturbation, salinity) are discovered and a minimal biomarker panel consisting of 10 genes indicative of these states is identified.
For E. coli, pairs of biomarker genes for sensing load stress states specific to heterologous gene expression are discovered. The contributions of this thesis are twofold. Firstly, it offers machine learning methods to extract meaningful biological knowledge from high-dimensional transcriptomics data. Secondly, it enables the identification of distinct cellular states and robust biomarker panels to distinguish the detrimental states in host organisms B. subtilis and E. coli. These identified biomarkers have the potential to inform the design of synthetic systems for monitoring and alleviating stress states in bacterial cells. |
URI: | http://hdl.handle.net/10443/6237 |
Appears in Collections: | School of Computing |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
Huang Y 2023.pdf | | 17.6 MB | Adobe PDF |
dspacelicence.pdf | | 43.82 kB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.