Automatic speech detection and phrase-level segmentation in spontaneous speech

Supervisor:
Dr. Vicsi Klára
Department of Telecommunications and Media Informatics

In this thesis, prepared in the Laboratory of Speech Acoustics, my main goal is to summarize and review speech detection systems. Speech detection is often a basic component of complex speech recognition systems, but my work is directly connected to the development of emotion recognition. Instead of recognizing the spoken content, I focus on detecting sentence parts and phrases and on separating them from background noise, which eases the work of the actual emotion recognizer.

According to the plan, my tasks are the following: to study, apply and compare present-day research on emotion and speech detection, and to assess its results. For speech detection systems in particular, the task is to become familiar with the latest developments and the systems currently in use.

To prepare for the next steps, the previously collected phone sample database was reviewed and corrected with the help of annotation software. For the practical application of the detectors, I use the non-real-time Hidden Markov Model Toolkit system and an early version of the real-time speech recognizer of the Emotion Recognition Application, which will use the database previously collected in the laboratory.

The detection systems currently used in the Laboratory of Speech Acoustics are compared by feeding both detectors the same sets of speech samples and comparing their output with the results of manual detection.

An evaluation system is already in use in the laboratory for comparing a variety of recognition tasks. My job is to find, plan and develop the best joint evaluation method to support the examination of detection tests. I set two main requirements for the evaluation method: it should run as a single software application, and it should report its result as a single percentage.
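
As an illustration only, and not the evaluation method developed in the thesis, the idea of reducing a comparison against manual detection to one percentage can be sketched as a frame-level agreement score. The function names, the representation of segments as (start, end) pairs in seconds, and the 10 ms frame step are all assumptions made for this example.

```python
def segments_to_frames(segments, duration, frame_step=0.01):
    """Convert (start, end) speech segments in seconds to per-frame booleans.

    Hypothetical helper: True marks a frame labelled as speech.
    """
    n_frames = int(round(duration / frame_step))
    frames = [False] * n_frames
    for start, end in segments:
        first = max(0, int(round(start / frame_step)))
        last = min(n_frames, int(round(end / frame_step)))
        for i in range(first, last):
            frames[i] = True
    return frames


def frame_agreement(reference, detected, duration, frame_step=0.01):
    """Return the percentage of frames where the detector matches the manual labels.

    Assumed metric for illustration: speech and non-speech frames count equally.
    """
    ref = segments_to_frames(reference, duration, frame_step)
    det = segments_to_frames(detected, duration, frame_step)
    if not ref:
        raise ValueError("duration must cover at least one frame")
    matches = sum(r == d for r, d in zip(ref, det))
    return 100.0 * matches / len(ref)


if __name__ == "__main__":
    manual = [(1.0, 2.0)]       # manually annotated speech segment
    automatic = [(1.0, 1.5)]    # detector output for the same recording
    print(frame_agreement(manual, automatic, duration=4.0, frame_step=0.5))
```

Running both detectors on the same recordings and scoring each against the same manual segmentation in this way would give one comparable number per detector. A real evaluation would likely need to weight missed speech and false alarms separately before combining them.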

Finally, the prospects of speech detection and emotion recognition are discussed.
