Advanced informatics for event detection and temporal localization

Chan, Teck Kai

Please use this identifier to cite or link to this item: http://theses.ncl.ac.uk/jspui/handle/10443/5559

Full metadata record

DC Field	Value	Language
dc.contributor.author	Chan, Teck Kai	-
dc.date.accessioned	2022-09-02T15:30:47Z	-
dc.date.available	2022-09-02T15:30:47Z	-
dc.date.issued	2022	-
dc.identifier.uri	http://hdl.handle.net/10443/5559	-
dc.description	PhD Thesis	en_US
dc.description.abstract	The primary objective of a Sound Event Detection (SED) system is to detect the presence of an acoustic event (i.e., audio tagging) and to return the onset and offset of the identified acoustic event within an audio clip (i.e., temporal localization). Such a system can be promising in wildlife and biodiversity monitoring, surveillance, and smart-home applications. However, developing a system that is adept at both subtasks is not trivial. It can be hindered by the need for a large amount of strongly labeled data, where the event tags and the corresponding onsets and offsets are known with certainty. This is a limiting factor, as strongly labeled data is challenging to collect and is prone to annotation errors due to the ambiguity in the perception of onsets and offsets. In this thesis, we propose to address the lack of strongly labeled data by using pseudo strongly labeled data, where the event tags are known with certainty while the corresponding onsets and offsets are estimated. Although Nonnegative Matrix Factorization can be used directly for SED, its accuracy is limited; we show that it can nonetheless be a useful tool for pseudo labeling. We further show that pseudo strongly labeled data estimated using our proposed methods can improve the accuracy of a SED system developed using deep learning approaches. Subsequent work then focused on improving a SED system as a whole rather than a single subtask. This leads to the proposal of a novel student-teacher training framework that incorporates a noise-robust loss function, a new cyclic training scheme, an improved depthwise separable convolution, a triple instance-level temporal pooling approach, and an improved Transformer encoding layer. Together with synthetic strongly labeled data and a large corpus of unlabeled data, we show that a SED system developed using our proposed method is capable of producing state-of-the-art performance.	en_US
dc.language.iso	en	en_US
dc.publisher	Newcastle University	en_US
dc.title	Advanced informatics for event detection and temporal localization	en_US
dc.type	Thesis	en_US
Appears in Collections:	School of Engineering

Files in This Item:

File	Description	Size	Format
Chan T K 2022.pdf		9.81 MB	Adobe PDF
dspacelicence.pdf		43.82 kB	Adobe PDF
