Please use this identifier to cite or link to this item:
http://theses.ncl.ac.uk/jspui/handle/10443/4053
Title: Single channel audio separation using deep neural networks and matrix factorizations
Authors: Wu, Di
Issue Date: 2017
Publisher: Newcastle University
Abstract: Source separation has become a significant research topic in the signal processing and machine learning communities. Owing to numerous applications, such as automatic speech recognition and speech communication, separating target speech from a mixed signal is of great importance, and in many practical settings separation from a single-channel recording is the most desirable configuration. In this thesis, two novel approaches are proposed to address this single channel audio separation problem. The thesis first reviews traditional approaches for single channel source separation and then introduces a more general class of approaches with stronger feature-learning capability, namely deep graphical models. In the first part of the thesis, a novel approach based on matrix factorization and a hierarchical model is proposed. In this work, an artificial stereo mixture is formulated to provide extra information. In addition, a hybrid framework that combines the generalized Expectation-Maximization algorithm with a multiplicative update rule is proposed to optimize the parameters of a matrix factorization based approach and approximately separate the mixture. Furthermore, a hierarchical model based on an extreme learning machine is developed to check the validity of the approximately separated sources, followed by an energy minimization method that further improves the quality of the separated sources by generating a time-frequency mask. Various experiments have been conducted, and the results show that the proposed approach outperforms conventional approaches not only in computational complexity but also in separation performance. In the second part, a deep neural network based ensemble system is proposed. In this work, the complementary properties of different features are fully explored by a ‘wide’ and ‘forward’ ensemble system. In addition, instead of using the features learned from the output layer, the features learned from the penultimate layer are investigated. The final embedded features are classified with an extreme learning machine to generate a binary mask that separates the mixed signal. The experiments focus on speech in the presence of music, and the results demonstrate that the proposed ensemble system exploits the complementary properties of different features under a range of conditions and achieves promising separation performance.
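Two building blocks mentioned in the abstract, matrix factorization with multiplicative updates and separation through a binary time-frequency mask, can be illustrated with a minimal sketch. This is not the thesis implementation: the plain NMF cost, the random stand-in spectrogram, and names such as `nmf`, `binary_mask` and `n_speech` are illustrative assumptions, and the thesis additionally uses a generalized EM hybrid, an artificial stereo mixture and an extreme learning machine stage that are not shown here.

```python
# Minimal sketch (assumed, not the thesis code): NMF with multiplicative
# updates on a magnitude spectrogram, then a binary time-frequency mask.
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-12):
    """Factorise a non-negative matrix V (freq x time) as W @ H."""
    F, T = V.shape
    rng = np.random.default_rng(0)
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, T)) + eps
    for _ in range(n_iter):
        # Multiplicative updates for the Euclidean (Frobenius) cost.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def binary_mask(W, H, n_speech):
    """Binary T-F mask from the first n_speech basis vectors (assumed split)."""
    speech = W[:, :n_speech] @ H[:n_speech, :]
    other = W[:, n_speech:] @ H[n_speech:, :]
    return (speech >= other).astype(float)

# Usage: V_mix would normally be |STFT| of the mixture; a random stand-in here.
V_mix = np.abs(np.random.randn(513, 100))
W, H = nmf(V_mix, rank=20)
mask = binary_mask(W, H, n_speech=10)   # 1 where the speech model dominates
speech_est = mask * V_mix               # masked magnitude estimate of speech
```

The mask-based step mirrors the abstract's description of generating a binary mask to separate the mixed signal; in the thesis the mask is produced by the learned classifiers rather than by this simple energy comparison.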
Description: PhD Thesis
URI: http://hdl.handle.net/10443/4053
Appears in Collections: School of Electrical and Electronic Engineering
Files in This Item:
File | Description | Size | Format
---|---|---|---
Wu, D. 2017.pdf | Thesis | 3.75 MB | Adobe PDF
dspacelicence.pdf | Licence | 43.82 kB | Adobe PDF