Automatic punctuation of speech transcripts based on speech prosody

Supervisor:
Dr. Szaszák György József
Department of Telecommunications and Media Informatics

Prosodic features of speech, such as tone and intonation, carry a great deal of relevant information. Because prosody, together with grammatical factors such as syntax, plays an important role in segmenting fluent speech, a growing body of research since the beginning of the 21st century has aimed at extending speech recognition with automatic punctuation detection. However, current automatic speech recognition systems either do not use automatic punctuation detection at all or use it only to a limited extent, so their output is difficult to interpret because punctuation marks are missing. This is also a problem for various downstream tasks, such as case grammar analysis.

At the time of writing this thesis, there is no solution for Hungarian that approximates punctuation using phonological phrases, so my work covers the initial steps of automatic punctuation detection.

My tasks included understanding the theoretical background of this field, training an HMM (hidden Markov model) based system for prosodic segmentation, evaluating the automatic segmentation with precision and recall metrics, and comparing the segmentation output with the punctuated text by means of statistical analyses. In this comparison I examined the phrases before and after the punctuation marks.
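
The precision and recall evaluation of a boundary segmentation can be illustrated with a minimal sketch. It is not the evaluation code of the thesis: the function name `boundary_precision_recall`, the representation of boundaries as time stamps in seconds, and the 0.1 s tolerance window are all assumptions made for illustration.

```python
def boundary_precision_recall(reference, detected, tolerance=0.1):
    """Score detected phrase boundaries against reference boundaries.

    Both inputs are lists of boundary times in seconds (an assumed format).
    A detected boundary counts as a hit if it lies within `tolerance`
    seconds of a reference boundary that has not been matched yet.
    """
    ref = sorted(reference)
    det = sorted(detected)
    matched = 0
    i = j = 0
    # Greedy one-to-one matching over the two sorted lists.
    while i < len(ref) and j < len(det):
        diff = det[j] - ref[i]
        if abs(diff) <= tolerance:
            matched += 1
            i += 1
            j += 1
        elif diff < 0:
            j += 1  # detected boundary precedes the reference: false alarm
        else:
            i += 1  # reference boundary has no nearby detection: miss
    precision = matched / len(det) if det else 0.0
    recall = matched / len(ref) if ref else 0.0
    return precision, recall


# Hypothetical example data, not taken from the thesis.
reference_boundaries = [1.0, 2.0, 3.0]
detected_boundaries = [1.05, 2.5, 3.02, 4.0]
precision, recall = boundary_precision_recall(
    reference_boundaries, detected_boundaries
)
```

In this made-up example two of the four detected boundaries fall within the tolerance window of a reference boundary, so precision is 2/4 and recall is 2/3. The choice of tolerance strongly affects both figures and would have to match the one used in the actual evaluation.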

Based on these analyses, I propose an approach to automatic punctuation detection for automatic speech recognition systems, and I discuss possible improvements and areas of application.
