Influence of non-linear parameters on the classification of pathological and healthy speech

Dr. Vicsi Klára
Department of Telecommunications and Media Informatics

The goal of my thesis is to help establish a suitable parameter set for building an efficient classification model. The model is meant to determine, from continuous speech, whether the patient has an otolaryngological disorder. A database of healthy and pathological samples was available for this work, and I expanded it during my research. The quality of the samples was rated by a phoniatrician using the RBH code (Roughness, Breathiness, Hoarseness), a subjective scale of voice quality. In total I worked with 148 healthy and 222 pathological samples.

During preprocessing I extracted the segments of the vowel 'E' from each sample and computed 20 different acoustic parameters from them. Averaging each parameter over the segments left me with a 20-element vector for every sample. I grouped the parameters into three categories: classical, non-linear and modern. I performed a Mann-Whitney U test between the healthy and pathological groups, examining male and female voices separately. Although the acoustic parameters differed significantly between the healthy and pathological groups in many cases, the mean values did not yield the expected result.

To determine how the parameters affect classification, I used SVM classification. I tried several variations using four parameter-selection procedures, and I examined the four resulting parameter-set variants with both linear and RBF kernels. The best accuracies were 88% for female and 87% for male samples, while using both groups together gave 84%. In this case the parameter-selection procedure chose the following acoustic parameters: jitter_ddp, shimmer_apq3C, mean_hnrC, mfcc01, GQ->std_cycle_closed, GNE->SNR_TKEO and GNE->NSR_TKEO.
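The per-sample averaging step described above can be sketched as follows. This is a minimal illustration, not the thesis code: it assumes that each extracted 'E' segment yields one value per parameter, stored as a dictionary keyed by parameter name. The function name `sample_vector` and this data layout are hypothetical, and only four of the 20 parameter names (taken from the selected set listed above) are shown.

```python
from statistics import fmean

# Illustrative subset: the thesis uses 20 parameters; these four names come
# from its selected set, and the per-segment dictionary layout is assumed.
PARAMETER_NAMES = ["jitter_ddp", "shimmer_apq3C", "mean_hnrC", "mfcc01"]


def sample_vector(segments: list[dict[str, float]],
                  names: list[str] = PARAMETER_NAMES) -> list[float]:
    """Average each parameter over all 'E' segments of one recording.

    `segments` holds one dict per extracted 'E' segment, mapping parameter
    name to the value measured on that segment. The result lists the mean of
    each parameter in the fixed order of `names`.
    """
    if not segments:
        raise ValueError("recording contains no 'E' segments")
    return [fmean(seg[name] for seg in segments) for name in names]
```

Every recording then maps to a vector of the same fixed length, regardless of how many 'E' segments it contains, which is the fixed-size input a classifier such as an SVM expects.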
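The Mann-Whitney U test mentioned above compares two groups by ranks rather than raw values, so it does not assume normally distributed parameters. The sketch below computes the U statistic and a two-sided p-value with the normal approximation and a tie-corrected variance. It is a hypothetical stdlib-only illustration (`mann_whitney_u` is not from the thesis); statistical packages such as SciPy's `scipy.stats.mannwhitneyu` also offer exact p-values and a continuity correction, and the abstract does not say which variant was used.

```python
import math


def mann_whitney_u(x: list[float], y: list[float]) -> tuple[float, float]:
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U, p), where U is the statistic for group x. Tied values share
    their average rank. No continuity correction is applied, so p is an
    approximation that is most reliable for moderately large groups.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    n = n1 + n2
    values = sorted(x + y)

    # Assign each distinct value the mean of the 1-based ranks it occupies.
    rank_of: dict[float, float] = {}
    tie_term = 0.0  # sum of (t^3 - t) over tie groups, used in the variance
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[j + 1] == values[i]:
            j += 1
        t = j - i + 1
        rank_of[values[i]] = (i + j) / 2 + 1
        tie_term += t**3 - t
        i = j + 1

    # U for group x: rank sum of x minus its smallest possible value.
    u1 = sum(rank_of[v] for v in x) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every observation identical: no evidence of a difference
        return u1, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    return u1, math.erfc(abs(z) / math.sqrt(2))
```

Applied to one parameter at a time, for example the `jitter_ddp` values of the healthy and pathological female groups, a small p-value indicates a significant difference between the two groups for that parameter.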
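The linear and RBF kernels mentioned above differ in how an SVM measures similarity between two parameter vectors. The sketch below writes out the standard definitions, K(x, z) = x . z and K(x, z) = exp(-gamma * ||x - z||^2), in plain Python; the default `gamma` is an arbitrary placeholder, not a setting from the thesis. It also includes z-score standardization, a common preprocessing step because acoustic parameters such as HNR (typically in dB) and jitter (typically a small ratio) lie on very different scales, and the RBF distance would otherwise be dominated by the larger-valued one. The abstract does not state whether the thesis standardized its features, and all function names here are hypothetical.

```python
import math
from statistics import fmean, pstdev


def linear_kernel(x: list[float], z: list[float]) -> float:
    """K(x, z) = <x, z>: the SVM separates classes with a hyperplane."""
    if len(x) != len(z):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(x, z))


def rbf_kernel(x: list[float], z: list[float], gamma: float = 0.1) -> float:
    """K(x, z) = exp(-gamma * ||x - z||^2): similarity decays with distance."""
    if len(x) != len(z):
        raise ValueError("vectors must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def standardize(vectors: list[list[float]]) -> list[list[float]]:
    """Z-score each feature column so no single parameter dominates distances."""
    columns = list(zip(*vectors))
    means = [fmean(col) for col in columns]
    # A constant column has zero spread; divide by 1.0 instead of 0.0.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(vec, means, stds)]
            for vec in vectors]
```

The linear kernel grows with the dot product of the two vectors, while the RBF kernel equals 1 for identical vectors and decays toward 0 as they move apart, which lets the SVM form non-linear decision boundaries.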

It can be concluded that in continuous speech the non-linear and modern acoustic parameters behave differently from what previous work, which examined samples of a single sustained vowel, has described. From the classification results we can also conclude that using these parameters alongside the classical acoustic parameters can improve classification accuracy. However, this question requires further, more detailed research.

