Title: Advanced informatics for event detection and temporal localization
Authors: Chan, Teck Kai
Issue Date: 2022
Publisher: Newcastle University
Abstract: The primary objective of a Sound Event Detection (SED) system is to detect the presence of an acoustic event (i.e., audio tagging) and to return the onset and offset of the identified acoustic event within an audio clip (i.e., temporal localization). Such a system can be promising in wildlife and biodiversity monitoring, surveillance, and smart-home applications. However, developing a system that is adept at both subtasks is not trivial. It can be hindered by the need for a large amount of strongly labeled data, where the event tags and the corresponding onsets and offsets are known with certainty. This is a limiting factor, as strongly labeled data is challenging to collect and is prone to annotation errors due to the ambiguity in the perception of onsets and offsets. In this thesis, we propose to address the lack of strongly labeled data by using pseudo strongly labeled data, where the event tags are known with certainty while the corresponding onsets and offsets are estimated. Although Nonnegative Matrix Factorization achieves only limited accuracy when used directly for SED, we show that it can be a useful tool for pseudo labeling. We further show that pseudo strongly labeled data estimated using our proposed methods can improve the accuracy of a SED system developed using deep learning approaches. Subsequent work then focused on improving a SED system as a whole rather than a single subtask. This led to the proposal of a novel student-teacher training framework that incorporates a noise-robust loss function, a new cyclic training scheme, an improved depthwise separable convolution, a triple instance-level temporal pooling approach, and an improved Transformer encoding layer. Together with synthetic strongly labeled data and a large corpus of unlabeled data, we show that a SED system developed using our proposed method is capable of producing state-of-the-art performance.
Description: PhD Thesis
Appears in Collections: School of Engineering

Files in This Item:
File | Description | Size | Format
Chan T K 2022.pdf | | 9.81 MB | Adobe PDF
dspacelicence.pdf | | 43.82 kB | Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.