Dependence of Pathological Speech Recognition on the Number of Training Examples

Dr. Vicsi Klára
Department of Telecommunications and Media Informatics

For my thesis work I joined an ongoing project at the Laboratory of Speech Acoustics on the automatic recognition of impaired speech. My main task was to investigate how increasing the number of speech samples affects the accuracy of separating healthy and impaired speech.

For the experiments I used the already available speech database and added 49 new samples from MRBA, the Hungarian Reference Speech Database. In the end I had a reference database of 302 speech samples. I also worked with new recordings that had never been processed before, and I made an annotation file for 15 of these samples.

I grouped all the speech recordings according to the H parameter of the RBH classification. Altogether there were 127 healthy (H0) and 175 impaired (H1, H2, H3) samples. The jitter, shimmer and MFCC1 parameters of the ‘E’ and ‘O’ sounds were used for the experiments.
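The grouping and two of the features named above can be illustrated with a minimal sketch. The abstract does not say which jitter and shimmer variants were used, so the code assumes the common "local" definition (mean absolute difference between consecutive cycles, divided by the mean). The function names and the example values are hypothetical and are not taken from the project.

```python
from statistics import mean


def rbh_to_binary(h):
    """Map the H value of an RBH code to the two groups used in the experiments.

    H0 is treated as healthy; H1, H2 and H3 are treated as impaired.
    """
    if h not in (0, 1, 2, 3):
        raise ValueError("H must be 0, 1, 2 or 3")
    return "healthy" if h == 0 else "impaired"


def local_perturbation(values):
    """Mean absolute difference of consecutive values, relative to their mean.

    Assumption: this is the 'local' perturbation measure; the thesis may
    have used a different variant.
    """
    if len(values) < 2:
        raise ValueError("at least two cycles are needed")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def jitter_local(periods):
    """Cycle-to-cycle variation of the pitch periods (in seconds)."""
    return local_perturbation(periods)


def shimmer_local(amplitudes):
    """Cycle-to-cycle variation of the peak amplitudes."""
    return local_perturbation(amplitudes)


# Made-up pitch periods and amplitudes for a short stretch of a vowel.
periods = [0.0100, 0.0102, 0.0098, 0.0101]
amplitudes = [0.80, 0.82, 0.79, 0.81]

print(rbh_to_binary(2))
print(jitter_local(periods))
print(shimmer_local(amplitudes))
```

Both measures are relative values, so a perfectly periodic signal with constant amplitude gives 0 for each.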

The best recognition rate was achieved with the sound ‘E’: on a set of 119 healthy and 152 impaired speech samples, the separation accuracy was 94%. This was the first time in this project that the accuracy exceeded 90%.

I also examined whether separating the male and female samples has any influence on the results. The measurements showed that, with the samples separated by sex and the test sets in the right proportion, an accuracy above 90% can be achieved with as few as half of the samples used in the earlier mixed measurement.

The H0-1 speech samples are hard to classify, so I also set up a three-class SVM measurement: besides the healthy and impaired sets, an interim class was introduced. With a small number of samples the classification gave an accuracy above 90%, but increasing the number of healthy samples made the result drop below 85%.

In the future I want to increase the recognition accuracy and also investigate the detection of articulation problems. The latter requires a new classification to replace the RBH codes, and the database needs to be improved and extended with new recordings that make this classification possible.

