Improvements of Hungarian hidden Markov-model based TTS

Dr. Tóth Bálint Pál
Department of Telecommunications and Media Informatics

Text-to-speech (TTS) systems are speech synthesis systems that transform text written in a given language into speech. The general goal is to produce synthetic speech that is intelligible and as similar to human speech as possible. Several approaches to this problem have been studied and used so far. Recently, the statistical approach has become increasingly popular. Such statistical methods are used by hidden Markov model (HMM) based speech synthesizers, which are among the best-quality text-to-speech systems.

The greatest advantage of this approach is that good speech quality can be achieved with a small runtime database. Furthermore, it makes it possible to synthesize speech in various speaking styles and with various emotions. The runtime database is small because the system does not store sound samples but parameters extracted from speech. These parameters are modelled by hidden Markov models. The system requires a training phase, during which these parameters are extracted from a training database. During synthesis, the parameters that best match the input text are determined from the trained models. The final waveform is then generated from these parameters using techniques borrowed from speech coders. Current efforts aim to improve the quality of the generated speech so that it sounds as natural and human-like as possible.
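The synthesis step above, determining the parameter sequence that best matches the trained models, is commonly carried out in HMM-based synthesis by maximum-likelihood parameter generation. The sketch below is a hypothetical one-dimensional illustration under simplifying assumptions, not the method or code of this thesis: it assumes each HMM state has a fixed duration in frames and a single Gaussian over one static parameter and its first difference (delta), and the names `StateModel`, `expand_states` and `generate_trajectory` are invented for illustration. Maximizing the likelihood of static and delta features jointly yields a tridiagonal linear system, solved here with the Thomas algorithm, so the trajectory moves smoothly between state means instead of jumping.

```python
from dataclasses import dataclass


@dataclass
class StateModel:
    """Hypothetical per-state statistics for one 1-D speech parameter."""
    duration: int            # number of frames spent in this state
    mean: float              # mean of the static parameter
    var: float               # variance of the static parameter
    delta_mean: float = 0.0  # mean of the delta c[t] - c[t-1]
    delta_var: float = 1.0   # variance of the delta


def expand_states(states):
    """Repeat each state's statistics for every frame it occupies."""
    means, variances, delta_means, delta_vars = [], [], [], []
    for s in states:
        for _ in range(s.duration):
            means.append(s.mean)
            variances.append(s.var)
            delta_means.append(s.delta_mean)
            delta_vars.append(s.delta_var)
    return means, variances, delta_means, delta_vars


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[i] = A[i][i-1], upper[i] = A[i][i+1]."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):  # forward elimination
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def generate_trajectory(means, variances, delta_means, delta_vars):
    """Maximize the Gaussian likelihood of statics and deltas jointly.

    Minimizes sum_t (c[t]-m[t])^2/v[t] + sum_{t>=1} (c[t]-c[t-1]-dm[t])^2/dv[t];
    the delta statistics of frame 0 are ignored. Needs at least one frame.
    """
    n = len(means)
    diag = [1.0 / v for v in variances]
    rhs = [m / v for m, v in zip(means, variances)]
    lower = [0.0] * n
    upper = [0.0] * n
    for t in range(1, n):
        w = 1.0 / delta_vars[t]  # precision of the delta at frame t
        # The delta term couples frames t-1 and t in the normal equations.
        diag[t] += w
        diag[t - 1] += w
        lower[t] = -w
        upper[t - 1] = -w
        rhs[t] += w * delta_means[t]
        rhs[t - 1] -= w * delta_means[t]
    return solve_tridiagonal(lower, diag, upper, rhs)


if __name__ == "__main__":
    states = [StateModel(2, 0.0, 1.0), StateModel(2, 1.0, 1.0)]
    print([round(x, 3) for x in generate_trajectory(*expand_states(states))])
    # prints [0.143, 0.286, 0.714, 0.857]
```

For two 2-frame states with means 0 and 1, the generated trajectory is 1/7, 2/7, 5/7, 6/7 rather than the step 0, 0, 1, 1. Real systems work with multi-dimensional spectral and excitation features, window-based deltas and delta-deltas, and pass the result to a vocoder.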

The starting point of my work was a functioning hidden Markov model based synthesizer for the Hungarian language. All of the work presented in my thesis aims to improve the quality of the synthetic speech, and to this end I modified several parts of the system. Some modifications are based on specific features of the Hungarian language; nevertheless, the methods could be applied to other languages in the same way. After making the changes, I evaluated the quality of the modified system. Objectively measuring the quality of a speech synthesizer is still an unsolved problem, so subjective tests are commonly used instead. I also used subjective listening tests to measure the effect of the modifications. The test results show that the modified system improves on the starting point.

