Machine learning for digital phenotyping from accelerometer data with applications to type-2 diabetes

Lam, Benjamin Puitong

Please use this identifier to cite or link to this item: http://theses.ncl.ac.uk/jspui/handle/10443/6124

Title:	Machine learning for digital phenotyping from accelerometer data with applications to type-2 diabetes
Authors:	Lam, Benjamin Puitong
Issue Date:	2023
Publisher:	Newcastle University
Abstract:	This thesis presents our research on using machine learning for digital phenotyping. We attempt to address some of the data science challenges associated with using digital phenotypes to build a vision for Predictive, Personalised, Participatory and Preventative (P4) Medicine. Some of the most significant of these challenges lie in identifying and engineering feature representations for raw digital data and then identifying and learning predictive models for disease-related outcomes. This is difficult because of the variety and volume of Big Health Data generated by digital technologies, but also because of the diversity of disease-related phenotypes that can be associated with these data. Our research used accelerometry data generated by wearable digital activity trackers, applied to Type-2 Diabetes, as a case study. Type-2 Diabetes is a chronic disease that is becoming more prevalent in the UK and other countries and is known to be closely associated with low levels of physical activity, poor sleeping patterns and a sedentary lifestyle. Accelerometry data are an objective and reliable measure of physical activity. The key hypothesis of our research is that physical activity traces collected by digital accelerometers can be used to build machine learning models that predict Type-2 Diabetes related outcomes and to characterise a person’s digital phenotype. The goal of this case study is to demonstrate and address some of the data science challenges of predictive modelling using ubiquitous digital devices. Two accelerometer datasets were used as experimental test beds: UK Biobank and IMI DIRECT. We begin by using features extracted with GGIR, a state-of-the-art tool for accelerometer data analysis, to learn predictive models for Type-2 Diabetes related clinical target outcomes. This gave us a baseline on which to improve.
In the first stage of our research, we adopted methods from neuroscience and gait analysis to build a new representation from the features generated by Human Activity Recognition (HAR), and used it to learn predictive models for Type-2 Diabetes outcomes. Human Activity Recognition is one of the main areas in which machine learning models are applied to accelerometry data. Furthermore, we devised a strategy for refining the training data, which helped to enhance predictive performance and address some of the associated data science challenges. In the next stage, advanced deep learning techniques were used to construct a latent representation with an unsupervised autoencoder approach, thereby removing the need for manual feature engineering. This latent representation was also used to learn predictive models for Type-2 Diabetes related clinical target outcomes; however, it produced a poor level of performance. We then sought to validate the representations of digital accelerometry data developed in our research. We used the longitudinal studies in the DIRECT dataset to evaluate whether changes in physical activity are reflected in clinical disease progression over time. The goal of this set of analyses was to determine whether these two phenotypes are associated with one another, since Type-2 Diabetes is known to be closely associated with physical activity levels. The final set of experiments focused on unsupervised machine learning approaches to explore and validate our representations. The goal of this work was to demonstrate that the representations produce meaningful characterisations of physical activity patterns when those patterns are clustered into groups that exhibit similar behaviours.
This would demonstrate a challenge in the data-driven P4 vision for medicine by illustrating how important it is to choose the optimal set of clinical target outcomes when learning predictive models for disease. Overall, our research developed and evaluated various representations of digital accelerometer data. Although the prediction results were weak for our case study of Type-2 Diabetes, we were able to validate our representations through unsupervised machine learning approaches. We have succeeded in illustrating the significance of the data science challenges that still need to be addressed before a truly data-driven, predictive and personalised vision for the future of healthcare can be fully realised.
Description:	PhD Thesis
URI:	http://hdl.handle.net/10443/6124
Appears in Collections:	School of Computing

Files in This Item:

File	Description	Size	Format
Lam B P 2023.pdf		19.29 MB	Adobe PDF
dspacelicence.pdf		43.82 kB	Adobe PDF
