Deep neural networks in speech recognition

Dr. Mihajlik Péter
Department of Telecommunications and Media Informatics

In my thesis project I solved a classic speech recognition problem, phoneme classification, by applying artificial neural networks. I placed particular emphasis on how the structure of the neural network affects classification accuracy. I briefly introduce the basics of speech recognition and review the structure and principles of MLP (Multi-Layer Perceptron) neural networks.
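As a rough illustration of what an MLP phoneme classifier computes for a single feature frame, the following minimal sketch runs one forward pass in plain Python. It is not the thesis implementation (which was written in MATLAB): the layer sizes, the tanh hidden activation, the random weights and all function names are assumptions chosen for the example.

```python
import math
import random

def softmax(z):
    """Turn raw output scores into probabilities that sum to 1."""
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def dense(x, weights, biases):
    """Affine layer: one weighted sum plus bias per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def mlp_forward(x, layers):
    """layers is a list of (weights, biases) pairs.
    Hidden layers use tanh (an assumption); the output layer uses softmax."""
    h = x
    for weights, biases in layers[:-1]:
        h = [math.tanh(v) for v in dense(h, weights, biases)]
    weights, biases = layers[-1]
    return softmax(dense(h, weights, biases))

def random_layer(n_in, n_out, rng):
    """Untrained layer with small random weights, for illustration only."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

rng = random.Random(0)
# Hypothetical sizes: 39 input features, two hidden layers of 16 units,
# 5 phoneme classes.
layers = [random_layer(39, 16, rng),
          random_layer(16, 16, rng),
          random_layer(16, 5, rng)]
frame = [rng.gauss(0.0, 1.0) for _ in range(39)]
posteriors = mlp_forward(frame, layers)
predicted = max(range(len(posteriors)), key=posteriors.__getitem__)
```

Changing the number of entries in `layers`, or the sizes passed to `random_layer`, varies the depth and width of the network, which is the kind of structural change whose effect the thesis examines.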

I ran the tests on two databases, one Hungarian and one English. When converting the corpora, I used the file formats defined by HTK (the Hidden Markov Model Toolkit from Cambridge) as intermediate formats between the corpora and the neural networks.
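For readers unfamiliar with HTK, the sketch below parses the 12-byte header of an HTK parameter (feature) file. It is only an illustration and makes several assumptions: that the intermediate files were feature files rather than label files, that they use HTK's default big-endian byte order, and that the features are stored as uncompressed 4-byte floats. The header bytes are synthetic, not taken from the thesis corpora.

```python
import struct

def read_htk_header(data):
    """Parse the 12-byte header of an HTK parameter file.

    Fields, in order: number of frames (int32), sample period in
    100 ns units (int32), bytes per frame (int16), parameter kind (int16).
    """
    n_samples, samp_period, samp_size, parm_kind = struct.unpack(
        ">iihh", data[:12])
    return {
        "n_samples": n_samples,
        "frame_shift_ms": samp_period / 10000,  # 100 ns units -> ms
        "n_coeffs": samp_size // 4,             # assumes float32 values
        "base_kind": parm_kind & 0x3F,          # low 6 bits; 6 means MFCC
    }

# Synthetic header: 100 frames, 10 ms frame shift, 39 float32
# coefficients per frame, base kind MFCC.
raw = struct.pack(">iihh", 100, 100000, 39 * 4, 6)
header = read_htk_header(raw)
```

With a real file, the same function would be applied to the first 12 bytes read in binary mode, followed by `n_samples` frames of `n_coeffs` values each.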

To make my results comparable with those of standard techniques, I also implemented phoneme classification with a GMM (Gaussian Mixture Model) as a reference.
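A GMM reference classifier of this kind typically fits one mixture per phoneme class and assigns a frame to the class whose mixture gives it the highest likelihood. The following minimal sketch shows only that decision step. The diagonal covariances, equal class priors, two-dimensional features and hand-set parameters are all assumptions made for illustration; they are not values from the thesis.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def gmm_log_likelihood(x, components):
    """components is a list of (weight, mean, var) triples.
    Uses log-sum-exp so that small likelihoods do not underflow."""
    terms = [math.log(w) + log_gauss_diag(x, mean, var)
             for w, mean, var in components]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def classify(x, models):
    """Pick the class whose GMM scores the frame highest
    (equal class priors assumed)."""
    return max(models, key=lambda label: gmm_log_likelihood(x, models[label]))

# Hypothetical two-class toy models with two mixture components each.
models = {
    "a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "s": [(0.7, [5.0, 5.0], [1.0, 1.0]), (0.3, [6.0, 4.0], [2.0, 2.0])],
}
label = classify([0.4, 0.6], models)
```

In practice the mixture parameters would be estimated from training data, for example with the EM algorithm, rather than set by hand.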

Both the neural network experiments and the reference experiments were carried out in MATLAB (training as well as testing). I selected the most suitable training algorithm from the available options through numerous training-test iterations. With practical usability in mind, I also took running times into account.

I evaluated the test results against the standard method. I examined the effect of the neural network’s structure on classification accuracy and training time. Finally, I found an interesting relationship between the depth of the hidden layers and the size of the training set.

