Currently, one of the most promising directions in automatic speech recognition is the use of acoustic models based on deep artificial neural networks. Although these models significantly outperform traditional generative acoustic models, little research has been conducted on their application to Hungarian large vocabulary continuous speech recognition tasks. This thesis has three aims. The first is to examine the possibilities of using neural network based models built on Hungarian speech databases for industrial speech recognition tasks. The second is to analyze the improvement they obtain over traditional models. The third is to find the optimal neural network architecture and parameter configuration for each task.
Chapter 1 describes the theoretical background of speech recognition. It covers traditional approaches, the characteristics of neural networks, and the acoustic modeling methods based on them. It also reviews some of the international results achieved on similar tasks. Chapter 2 describes the toolkit used to conduct the experiments. Chapters 3 and 4 analyze the performance of acoustic models built for two different Hungarian speech recognition tasks. Chapter 3 describes and examines systems built on television broadcast media. The models introduced and evaluated in Chapter 4 are based on databases of telephone helpdesk conversations. In both cases the goal is to build acoustic models with optimal accuracy that can be used to create subtitles or audio transcriptions automatically. Chapter 5 summarizes the computational resources required to train the neural networks. Finally, the r