Please use this identifier to cite or link to this item: http://theses.ncl.ac.uk/jspui/handle/10443/6410
Full metadata record
DC Field                   Value                               Language
dc.contributor.author      Crane, Kirsten Nicole               -
dc.date.accessioned        2025-03-21T09:44:19Z                -
dc.date.available          2025-03-21T09:44:19Z                -
dc.date.issued             2024                                -
dc.identifier.uri          http://hdl.handle.net/10443/6410    -
dc.description             PhD Thesis                          en_US
dc.description.abstract    Using an autonomous underwater vehicle to film marine animals such as dolphins in their natural habitat can greatly aid monitoring, health assessment and animal-behaviour research. Having a vehicle autonomously follow and orient toward a species of interest, without the need for tagging, presents a challenging visual active tracking (VAT) problem using image data from the onboard camera. This thesis investigates the model-free deep reinforcement learning (DRL) algorithm Soft Actor-Critic (SAC) as a potential solution. The utility of this approach is demonstrated in simulation; a follow-up robotics project would then need to integrate the simulation-trained control policy with the real vehicle. DRL was selected because it can support accurate, real-time tracking without needing to model the complexities of a marine environment. In the VAT literature, research divides into end-to-end and task-separated solutions, depending on whether or not the state-estimation and control sub-tasks are jointly optimised. The benefit of joint optimisation is that state estimation can respond to control performance, and control can adapt to imperfect state estimation. The challenge is that this requires a network large enough to learn rich representations, whilst also limiting the number of network parameters because of the difficult credit-assignment problem faced by the DRL agent. This thesis explores an approach to VAT which is end-to-end but alleviates some of this burden by learning the majority of perceptual skills prior to agent training, with a separate model, a variational autoencoder (VAE). Furthermore, the task-relevance of these perceptual skills is ensured through the use of a multi-part loss function fed by three auxiliary tasks of target-state prediction. This approach to a constrained VAE was presented by Bonatti et al. (2020) in the aerial-navigation space, upstream of imitation learning.
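The multi-part loss described above can be sketched as follows. This is a minimal illustration, not the thesis's exact formulation: the mean-squared-error terms, the `beta` KL weight and the `aux_weights` are hypothetical choices standing in for whatever weighting the thesis actually uses.

```python
import numpy as np

def constrained_vae_loss(x, x_recon, mu, logvar, aux_preds, aux_targets,
                         beta=1.0, aux_weights=(1.0, 1.0, 1.0)):
    """Multi-part VAE loss sketch: reconstruction + KL divergence,
    plus three auxiliary target-state prediction terms that constrain
    the latent features to be task-relevant (weights hypothetical)."""
    # Standard VAE terms: pixel reconstruction error and KL to a unit Gaussian.
    recon = np.mean((x - x_recon) ** 2)
    kl = -0.5 * np.mean(1.0 + logvar - mu ** 2 - np.exp(logvar))
    # Auxiliary terms: one per target-state prediction task
    # (e.g. target position, distance, orientation).
    aux = sum(w * np.mean((p - t) ** 2)
              for w, p, t in zip(aux_weights, aux_preds, aux_targets))
    return recon + beta * kl + aux
```

Because the auxiliary terms share the gradient path through the encoder, minimising this joint objective pushes the latent code toward features that are predictive of the target's state, rather than only those useful for pixel reconstruction.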
This thesis extends the approach to DRL and VAT, with a new framework called T2FO (tracking with task-relevant feature observations). T2FO achieves a mean episodic return of 2,057 from a possible 3,000, across 100 inference runs of the trained policy. The framework outperforms three baseline SAC policies trained with raw image observations (1,049), unconstrained VAE features (1,198) and target state predictions from the auxiliary networks (1,987). Neither agent training nor VAE training was possible without first developing a custom environment for the custom problem. This thesis additionally presents three environments developed using the commercial game engine Unity and OpenAI's widely used library Gym: a toy environment CubeTrack, a car environment DonkeyTrack, and an application-focused underwater environment SWiMM DEEPeR. For supplementary videos see https://www.youtube.com/channel/UCA4fgSfe2IctRv5N-Gr0OrQ.en_US
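Environments like those above expose the standard Gym interface (`reset`, `step`, episodic reward). A minimal self-contained sketch of that interface, using a toy 1-D tracking task rather than any of the thesis's actual environments (the class name, dynamics and reward shaping here are all hypothetical):

```python
import numpy as np

class ToyTrackEnv:
    """Gym-style tracking environment sketch (hypothetical, not CubeTrack):
    the agent moves along a line and is rewarded for staying close to a
    randomly drifting target, mirroring the follow-and-orient objective."""

    def __init__(self, episode_len=100, seed=0):
        self.episode_len = episode_len
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.t = 0
        self.agent = 0.0
        self.target = float(self.rng.uniform(-1.0, 1.0))
        return self._obs()

    def _obs(self):
        # A real VAT environment would return an image; here the
        # observation is just the two positions.
        return np.array([self.agent, self.target], dtype=np.float32)

    def step(self, action):
        # Continuous velocity command in [-1, 1], as SAC assumes.
        self.agent += float(np.clip(action, -1.0, 1.0)) * 0.1
        self.target += float(self.rng.normal(0.0, 0.05))  # target drifts
        self.t += 1
        # Dense reward in [0, 1]: higher when the agent is near the target.
        reward = 1.0 - min(abs(self.agent - self.target), 1.0)
        done = self.t >= self.episode_len
        return self._obs(), reward, done, {}
```

With a per-step reward capped at 1.0, a 3,000-step episode (or 1,000 steps at reward up to 3.0) yields the kind of bounded episodic-return scale against which figures like the 2,057-of-3,000 result can be read.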
dc.language.isoenen_US
dc.publisherNewcastle Universityen_US
dc.titleVisual Active Tracking in Simulation with Task-Relevant Features and Deep Reinforcement Learningen_US
dc.typeThesisen_US
Appears in Collections:School of Computing

Files in This Item:
File               Description  Size      Format
CraneKN2024.pdf    Thesis       31.61 MB  Adobe PDF
dspacelicence.pdf  Licence      43.82 kB  Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.