Speech synthesis with spontaneous characteristics

Dr. Csapó Tamás Gábor
Department of Telecommunications and Media Informatics

This thesis presents my research and development work on speech synthesis with spontaneous characteristics for the Hungarian language.

In the first part of the thesis, I review the international literature of the last decade on spontaneous-like speech synthesis. In doing so, I describe in detail three different algorithms that aim to generate spontaneous-like speech from text.

Next, I present my automated method, which transforms natural read speech into spontaneous-like speech by modifying its pitch and duration on the basis of spontaneous reference samples.

I also describe the design and development of a prototype system that extends the Profivox Text-To-Speech system, developed at the Department of Telecommunications and Media Informatics of the Budapest University of Technology and Economics. Given appropriate spontaneous reference samples, the prototype can synthesize spontaneous-like speech through pitch and duration modification.

Finally, to evaluate the success of the transformations subjectively, I describe the setup and the results of a listening test.

With further development, these results can lead to a spontaneous-like Text-To-Speech system suitable for online language teaching, for research on speech recognition systems for spontaneous speech, and for other applications where more natural human-machine interaction is important.


