Applying modern speech coder in hidden Markov-model based TTS

Dr. Gyires-Tóth Bálint Pál
Department of Telecommunications and Media Informatics

The interface between a person and a computer is naturally imperfect, and many research efforts around the world aim to improve it by incorporating speech.

There are numerous speech synthesis methods, for example formant synthesis and corpus-based unit selection synthesis. Of these, corpus-based synthesis produces the best quality. That quality comes at a price: the corpus database is large, and the voice characteristics are fixed by the database and can be changed only through transformations that usually degrade quality significantly.

This thesis focuses on hidden Markov model (HMM) based text-to-speech (TTS) synthesis. The approach is statistical: in the training phase, parameters are extracted from the speech waveform and used to build the HMMs. To read a text aloud, the HMMs are then used to generate the best sequence of parameters.

HMM-based TTS systems have several advantages. They produce good-quality speech, and the database they need is small (a few megabytes), which makes them well suited to mobile devices. They can also synthesize speech with varied voice characteristics, such as different speaker identities and emotions.

In this thesis I review the steps of replacing the speech coder with a new one, which helps to improve the quality of the generated speech and reduce the time needed for synthesis.

