The general aim of my work is to enable the computational comparison of the prosodic features of different utterances. My further aims are to display these prosodic features correctly in visual form, to support programs that teach speech pronunciation audio-visually, and to solve these tasks in an increasingly language-independent way.
To display the prosodic features of two different utterances correctly, sentences of different length and timing must be warped non-linearly in time. I implemented this by transforming the phoneme segmenter developed by the Laboratory of Speech Acoustics at the Department of Telecommunications and Media Informatics of the Budapest University of Technology and Economics into a special, language-independent, phoneme-level segmenter. To do so, I generated acoustic models for 10 phonetic classes that I defined myself, so that the segmenter divides speech into acoustically homogeneous, phone-sized parts. I trained the segmenter's acoustic models on Hungarian speech material and tested its operation on four European languages (Hungarian, English, German and Finnish). After the automatic segmentation of the two utterances (the standard, i.e. the reference sentence, and the sentence spoken by the user), I matched the individual vowels using a pattern-matching method of my own, and performed the non-linear time warping on the basis of this matching, achieving phoneme-level accuracy.
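As a rough illustration of the warping step, the sketch below assumes that the vowel matching has already produced pairs of corresponding time points (anchors) in the standard and in the user's utterance. It then warps the user's prosodic contour (for example F0 or intensity) onto the standard's time axis by piecewise-linear interpolation between those anchors. The function names and the piecewise-linear mapping are illustrative assumptions; the author's own pattern-matching method is not reproduced here.

```python
from bisect import bisect_right
from typing import List, Sequence


def piecewise_linear(x: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Linearly interpolate the points (xs, ys) at x.

    xs must be strictly increasing; values outside [xs[0], xs[-1]] are clamped
    to the first/last y value.
    """
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1          # segment index with xs[i] <= x < xs[i + 1]
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


def warp_to_reference(
    user_times: Sequence[float],
    user_values: Sequence[float],
    user_anchors: Sequence[float],
    ref_anchors: Sequence[float],
    ref_times: Sequence[float],
) -> List[float]:
    """Resample a user contour on the reference time axis (hypothetical helper).

    user_anchors[k] and ref_anchors[k] are matched time points (e.g. vowel
    boundaries); both lists are assumed strictly increasing and equally long.
    """
    warped = []
    for t_ref in ref_times:
        # Map the reference time to the corresponding user time.
        t_user = piecewise_linear(t_ref, ref_anchors, user_anchors)
        # Read the user's contour value at that time.
        warped.append(piecewise_linear(t_user, user_times, user_values))
    return warped


if __name__ == "__main__":
    # Reference lasts 1.0 s, the user's sentence 2.0 s; one matched anchor inside.
    result = warp_to_reference(
        user_times=[0.0, 1.2, 2.0],
        user_values=[100.0, 200.0, 150.0],
        user_anchors=[0.0, 1.2, 2.0],
        ref_anchors=[0.0, 0.4, 1.0],
        ref_times=[0.0, 0.2, 0.4, 0.7, 1.0],
    )
    print([round(v, 6) for v in result])  # [100.0, 150.0, 200.0, 175.0, 150.0]
```

Because the mapping is linear only between anchors, each matched segment is stretched or compressed by its own factor, which is what makes the overall warp non-linear.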
I tested the operation of the system on sentence intonation and on the dynamics of sentence intensity through visual display and comparison, and I calculated the mean squared distance between the standard and the user's utterances. My results clearly show that non-linear time warping is necessary both for correctly comparing the prosodic features of utterances and for displaying them visually.
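The mean squared distance can be sketched as follows, assuming both contours have already been sampled on the same (standard) time axis, for example after time warping. Skipping frames marked `None` (such as unvoiced frames where F0 is undefined) is an added assumption and not something stated in the text; the function name is hypothetical.

```python
from typing import Optional, Sequence


def mean_squared_distance(
    reference: Sequence[Optional[float]],
    user: Sequence[Optional[float]],
) -> float:
    """Average of squared frame-wise differences between two aligned contours.

    Frames where either contour is None are ignored (assumed unvoiced).
    """
    if len(reference) != len(user):
        raise ValueError("contours must be sampled on the same time axis")
    diffs = [
        (r - u) ** 2
        for r, u in zip(reference, user)
        if r is not None and u is not None
    ]
    if not diffs:
        raise ValueError("no frames where both contours are defined")
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    # The middle frame is skipped because the reference has no value there.
    print(mean_squared_distance([100.0, None, 200.0], [110.0, 150.0, 190.0]))  # 100.0
```

Computing this distance once on the unwarped contours and once on the warped ones is one simple way to quantify how much the alignment changes the comparison.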