Please use this identifier to cite or link to this item:
http://theses.ncl.ac.uk/jspui/handle/10443/5559
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Chan, Teck Kai | - |
dc.date.accessioned | 2022-09-02T15:30:47Z | - |
dc.date.available | 2022-09-02T15:30:47Z | - |
dc.date.issued | 2022 | - |
dc.identifier.uri | http://hdl.handle.net/10443/5559 | - |
dc.description | PhD Thesis | en_US |
dc.description.abstract | The primary objective of a Sound Event Detection (SED) system is to detect the presence of an acoustic event (i.e., audio tagging) and to return the onset and offset of the identified acoustic event within an audio clip (i.e., temporal localization). Such a system can be promising in wildlife and biodiversity monitoring, surveillance, and smart-home applications. However, developing a system to be adept at both subtasks is not a trivial task. It can be hindered by the need for a large amount of strongly labeled data, where the event tags and the corresponding onsets and offsets are known with certainty. This is a limiting factor as strongly labeled data is challenging to collect and is prone to annotation errors due to the ambiguity in the perception of onsets and offsets. In this thesis, we propose to address the lack of strongly labeled data by using pseudo strongly labeled data, where the event tags are known with certainty while the corresponding onsets and offsets are estimated. While Nonnegative Matrix Factorization can be used directly for SED but with limited accuracy, we show that it can be a useful tool for pseudo labeling. We further show that pseudo strongly labeled data estimated using our proposed methods can improve the accuracy of a SED system developed using deep learning approaches. Subsequent work then focused on improving a SED system as a whole rather than a single subtask. This leads to the proposal of a novel student-teacher training framework that incorporates a noise-robust loss function, a new cyclic training scheme, an improved depthwise separable convolution, a triple instance-level temporal pooling approach, and an improved Transformer encoding layer. Together with synthetic strongly labeled data and a large corpus of unlabeled data, we show that a SED system developed using our proposed method is capable of producing state-of-the-art performance. | en_US |
dc.language.iso | en | en_US |
dc.publisher | Newcastle University | en_US |
dc.title | Advanced informatics for event detection and temporal localization | en_US |
dc.type | Thesis | en_US |
Appears in Collections: | School of Engineering |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Chan T K 2022.pdf | | 9.81 MB | Adobe PDF | View/Open |
dspacelicence.pdf | | 43.82 kB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.