Machine learning for actionable knowledge discovery in synthetic biology

Huang, Yiming

Please use this identifier to cite or link to this item: http://theses.ncl.ac.uk/jspui/handle/10443/6237

Full metadata record

DC Field	Value	Language
dc.contributor.author	Huang, Yiming	-
dc.date.accessioned	2024-07-18T10:52:15Z	-
dc.date.available	2024-07-18T10:52:15Z	-
dc.date.issued	2023	-
dc.identifier.uri	http://hdl.handle.net/10443/6237	-
dc.description	PhD Thesis	en_US
dc.description.abstract	This thesis focuses on the development of machine learning algorithms to discover valuable knowledge in omics data, with the purpose of informing actionable decisions in synthetic biology. In synthetic biology, genetic circuits play a crucial role in designing and constructing synthetic biological systems for producing high-value materials. These genetic circuits, consisting of engineered networks of genes regulated at the transcriptional and post-transcriptional levels, are inserted into a host organism to introduce desired properties. To optimise the performance of synthetic circuits and maximise product yields, it is crucial to understand the dynamic behaviour of genes and proteins within the cells and to recognise harmful cellular states. Transcriptomics technologies enable the measurement of gene expression profiles in different organisms under various conditions. However, fully unlocking the potential of these high-throughput transcriptomics datasets requires data mining techniques. Therefore, this work proposes a portfolio of computational methods for analysing transcriptomics data, specifically designed to identify detrimental cellular states in bioengineering host organisms that can lead to lower yields of target products. The proposed methods include unsupervised learning to detect distinct gene expression profiles, statistical tests to characterise detrimental states, feature selection models to identify key biomarkers, and a recommendation system to prioritise robust biomarkers. These methods are applied to multiple transcriptomics datasets to study two organisms commonly used in synthetic biology, Bacillus subtilis and Escherichia coli. For B. subtilis, 10 distinct cellular growth states relevant to various stress conditions (e.g. stationary phase, anaerobic growth, temperature perturbation, salinity) are discovered, and a minimal biomarker panel of 10 genes indicative of these states is identified. For E. coli, pairs of biomarker genes for sensing load stress states specific to heterologous gene expression are discovered. The contributions of this thesis are twofold. First, it offers machine learning methods to extract meaningful biological knowledge from high-dimensional transcriptomics data. Second, it enables the identification of distinct cellular states and robust biomarker panels to distinguish the detrimental states in the host organisms B. subtilis and E. coli. These identified biomarkers have the potential to inform the design of synthetic systems for monitoring and alleviating stress states in bacterial cells.	-
dc.description.sponsorship	Engineering and Physical Sciences Research Council (EPSRC) ‘Synthetic Portabolomics’ Project	en_US
dc.language.iso	en	en_US
dc.title	Machine learning for actionable knowledge discovery in synthetic biology	en_US
dc.type	Thesis	en_US
Appears in Collections:	School of Computing

Files in This Item:

File	Description	Size	Format
Huang Y 2023.pdf		17.6 MB	Adobe PDF
dspacelicence.pdf		43.82 kB	Adobe PDF
