Enhancement of Noisy Speech using Deep Neural Networks and Time-Frequency Masking

Dr. Vicsi Klára
Department of Telecommunications and Media Informatics

Removing background noise from a speech signal is considered one of the most challenging research topics in speech processing.

The objective of my work is to implement a noise reduction system that uses deep neural networks to estimate time-frequency masks.

In the first step, I create a database of noisy speech signals by combining different voice recordings with different noises. I then transform the speech signals into the frequency domain. Furthermore, I extract the corresponding features, for example mel-frequency cepstral coefficients (MFCC) and linear prediction coefficients (LPC), as well as the optimal real-valued and complex-valued time-frequency masks, which can be used to train the deep neural network.
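The data preparation described above can be illustrated with a minimal sketch. It is not the implementation used in the thesis: the function names, the frame length, the hop size, the Hann window and the toy signals are all assumptions made for illustration, and the real-valued mask follows one common definition of the ideal ratio mask. MFCC and LPC extraction are not shown. Only the Python standard library is used so that the example is self-contained; in practice a numerical library would replace the naive DFT.

```python
import cmath
import math
import random

def mix_at_snr(speech, noise, snr_db):
    """Scale the noise to reach the requested SNR, then add it to the speech."""
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled_noise)]
    return mixture, scaled_noise

def stft(x, frame_len=64, hop=32):
    """Hann-windowed short-time Fourier transform (naive DFT, one list per frame)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        seg = [x[start + n] * window[n] for n in range(frame_len)]
        frames.append([
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len))
            for k in range(frame_len // 2 + 1)  # non-negative frequencies only
        ])
    return frames

def ideal_ratio_mask(S, N, eps=1e-12):
    """Real-valued mask in [0, 1]: sqrt(|S|^2 / (|S|^2 + |N|^2)) per bin."""
    return [[math.sqrt(abs(s) ** 2 / (abs(s) ** 2 + abs(n) ** 2 + eps))
             for s, n in zip(sf, nf)] for sf, nf in zip(S, N)]

def complex_ideal_ratio_mask(S, Y, eps=1e-12):
    """Complex-valued mask M with M * Y = S per bin (0 where Y is close to 0)."""
    return [[s / y if abs(y) > eps else 0j for s, y in zip(sf, yf)]
            for sf, yf in zip(S, Y)]

# Toy data (assumption): a sinusoid stands in for speech, white noise for the noise.
random.seed(0)
speech = [math.sin(2 * math.pi * 5 * n / 64) for n in range(512)]
noise = [random.gauss(0.0, 1.0) for _ in range(512)]
mixture, scaled_noise = mix_at_snr(speech, noise, snr_db=5.0)

S, N, Y = stft(speech), stft(scaled_noise), stft(mixture)
irm = ideal_ratio_mask(S, N)           # real-valued training target
cirm = complex_ideal_ratio_mask(S, Y)  # complex-valued training target
```

Multiplying the mixture spectrum by the complex-valued mask recovers the clean spectrum exactly, whereas the real-valued mask only rescales magnitudes and leaves the noisy phase unchanged.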

Using a state-of-the-art machine learning framework, I will design and train a deep neural network. I will then evaluate the model with several instrumental evaluation criteria and a listening test, using a subset of the database that is not used in the training process.
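The evaluation protocol can be sketched as follows, under stated assumptions. The abstract does not name the instrumental criteria, so the signal-to-noise ratio against the clean reference is used here purely as an example measure; the function names, the 20 % test fraction and the file names are hypothetical.

```python
import math
import random

def split_database(items, test_fraction=0.2, seed=0):
    """Shuffle deterministically and hold out a fraction for testing only."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def snr_db(clean, estimate):
    """SNR of an estimate relative to the clean reference, in dB.

    Assumes the estimate is not identical to the reference
    (the error power must be non-zero).
    """
    signal_power = sum(c * c for c in clean)
    error_power = sum((c - e) ** 2 for c, e in zip(clean, estimate))
    return 10 * math.log10(signal_power / error_power)

# Hypothetical recording identifiers standing in for database entries.
recordings = [f"utt_{i:03d}.wav" for i in range(10)]
train_set, test_set = split_database(recordings)
```

Keeping the held-out recordings out of every training step is what makes the reported scores a measure of generalisation rather than memorisation.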

The research questions addressed in this work concern the overall speech enhancement performance in matched training and testing scenarios.

