Automatic evaluation of prosody - development of a nonlinear time warping

Dr. Vicsi Klára
Department of Telecommunications and Media Informatics

The general aim of my work is to enable the computational comparison of the prosodic features of different utterances. Further aims are to display these prosodic features correctly in visual form, to support programs that teach pronunciation audio-visually, and to solve these tasks in as language-independent a way as possible.

To display the prosodic features of two different utterances correctly, sentences of different length and temporal structure must be warped non-linearly in time. I implemented this by converting the phoneme segmenter developed by the Laboratory of Speech Acoustics at the Department of Telecommunications and Media Informatics of the Budapest University of Technology and Economics into a special, language-independent, phoneme-level segmenter. For this I generated acoustic models of 10 phonetic classes that I defined myself, so that the segmenter finds the acoustically homogeneous, phoneme-sized parts of the signal. I trained the acoustic models of the segmenter on Hungarian speech material and tested its operation on 4 European languages (Hungarian, English, German, and Finnish). After the automatic segmentation of the utterances (the standard and the sentence spoken by the user), I matched the individual vowels using a pattern-matching method that I developed myself, and I performed the non-linear time warping on the basis of this matching, achieving phoneme-level accuracy.
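
The warping step can be illustrated with a minimal sketch. It assumes that segmentation and matching have already produced pairs of corresponding time points (for example, the boundaries of matched vowels) in the user's sentence and in the standard, and it maps the user's time axis onto the standard's by piecewise-linear interpolation between those anchors. The piecewise-linear mapping and the names `warp_time` and `warp_contour` are illustrative assumptions, not the method or code of the thesis.

```python
from bisect import bisect_right


def _check_anchors(user_anchors, ref_anchors):
    """Validate that both anchor lists are matched and strictly increasing."""
    if len(user_anchors) != len(ref_anchors) or len(user_anchors) < 2:
        raise ValueError("need at least two matched anchor pairs")
    for anchors in (user_anchors, ref_anchors):
        if any(b <= a for a, b in zip(anchors, anchors[1:])):
            raise ValueError("anchors must be strictly increasing")


def warp_time(t, user_anchors, ref_anchors):
    """Map time t (seconds) from the user's time axis to the standard's.

    Assumption: user_anchors[i] and ref_anchors[i] mark the same event
    (e.g. a matched vowel boundary) in the two recordings.
    """
    _check_anchors(user_anchors, ref_anchors)
    # Clamp times that fall outside the anchored region.
    if t <= user_anchors[0]:
        return ref_anchors[0]
    if t >= user_anchors[-1]:
        return ref_anchors[-1]
    # Find the anchor interval containing t, then interpolate linearly.
    i = bisect_right(user_anchors, t) - 1
    u0, u1 = user_anchors[i], user_anchors[i + 1]
    r0, r1 = ref_anchors[i], ref_anchors[i + 1]
    return r0 + (t - u0) * (r1 - r0) / (u1 - u0)


def warp_contour(contour, user_anchors, ref_anchors):
    """Warp a prosodic contour given as a list of (time, value) pairs."""
    return [(warp_time(t, user_anchors, ref_anchors), v) for t, v in contour]
```

With the hypothetical anchors `[0.0, 1.0, 3.0]` in the user's sentence and `[0.0, 2.0, 3.0]` in the standard, the first segment is stretched and the second compressed, so each segment is scaled by its own factor. This is what makes the warping non-linear over the whole sentence.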

I tested the system on sentence intonation and on the dynamics of sentence intensity by displaying and comparing them visually, and I calculated the mean squared distance between the standard and the user's utterances. The results clearly show that non-linear time warping is necessary both for comparing the prosodic features of utterances correctly and for displaying them visually.
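
The distance measure can be sketched as follows, assuming that both contours (for example, intonation or intensity) have already been time-aligned and sampled on the same frame grid. The function name `mean_squared_distance` and the convention of marking frames without a value (such as unvoiced frames in a pitch contour) with `None` are assumptions made for this illustration only.

```python
def mean_squared_distance(reference, user):
    """Mean of squared differences between two aligned contours.

    Assumptions: both sequences have the same length, and frames where
    either contour has no value are marked with None and skipped.
    """
    if len(reference) != len(user):
        raise ValueError("contours must be sampled on the same grid")
    squared = [
        (r - u) ** 2
        for r, u in zip(reference, user)
        if r is not None and u is not None
    ]
    if not squared:
        raise ValueError("no frames with values in both contours")
    return sum(squared) / len(squared)
```

A smaller value means that the user's contour follows the standard more closely. The value is only meaningful when the two contours are aligned in time, which is why the warping has to come first.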

