|Title: ||Knowledge extraction from biomedical data using machine learning|
|Authors: ||Lazzarini, Nicola|
|Issue Date: ||2017 |
|Publisher: ||Newcastle University|
|Abstract: ||Thanks to the breakthroughs in biotechnologies that have occurred in recent
years, biomedical data is accumulating at an unprecedented pace. In the field of
biomedicine, decades-old statistical methods are still commonly used to analyse such
data. However, the simplicity of these approaches often limits the amount of useful
information that can be extracted from the data. Machine learning methods represent
an important alternative due to their ability to capture complex patterns within the
data that simpler methods are likely to miss.
This thesis focuses on the extraction of useful knowledge from biomedical data using
machine learning. Within the biomedical context, the vast majority of machine learning
applications focus their effort on the generation and validation of prediction models.
Rarely are the inferred models used to discover meaningful biomedical knowledge. The
work presented in this thesis goes beyond this scenario and devises new methodologies
to mine machine learning models for the extraction of useful knowledge.
The thesis targets two important and challenging biomedical analytic tasks: (1) the
inference of biological networks and (2) the discovery of biomarkers. The first task
aims to identify associations between different biological entities, while the second one
tries to discover sets of variables that are relevant for specific biomedical conditions.
Successful solutions for both problems rely on the ability to recognise complex interactions
within the data, hence the use of multivariate machine learning methods. The
network inference problem is addressed with FuNeL: a protocol to generate networks
based on the analysis of rule-based machine learning models. The second task,
biomarker discovery, is studied with RGIFE, a heuristic that exploits the information
extracted from machine learning models to guide its search for minimal subsets of
The extensive analysis conducted for this dissertation shows that the networks inferred
with FuNeL capture relevant knowledge complementary to that extracted by standard
inference methods. Furthermore, the associations defined by FuNeL are shown to be
more pertinent in a disease context. The biomarkers selected by RGIFE are found to
be disease-relevant and to have a high predictive power. When applied to osteoarthritis
data, RGIFE confirmed the importance of previously identified biomarkers, whilst also
extracting novel biomarkers with possible future clinical applications.
Overall, the thesis shows new effective methods to leverage the information, often
remaining buried, encapsulated within machine learning models and discover useful
|Description: ||PhD Thesis|
|Appears in Collections:||School of Computing Science|
Items in eTheses are protected by copyright, with all rights reserved, unless otherwise indicated.