Influence of increasing the number of samples on the classification of pathological and healthy speech samples

Dr. Vicsi Klára
Department of Telecommunications and Media Informatics

My thesis addresses the automatic separation of pathological and healthy speech samples. I reviewed the results and measurement setups of students who had worked on this topic earlier, and drew on their experience in solving my own tasks.

As a first step I compiled a reference database from the sound database of the Speech Acoustics Laboratory and the Hungarian Reference Speech Database (MRBA). I selected 211 speech samples from the former and 49 from the latter. Alongside the 15 sound files I also prepared the annotation files.

The two classes were the healthy and the unhealthy group. I defined them using the RBH scale used in phoniatrics, taking the H component into account. The H (hoarseness) index is rated on a four-point scale. Speech samples rated H0 were classified as healthy; those rated H1-H3 were assigned to the unhealthy class.
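The class assignment described above can be sketched in a few lines of Python. This is only an illustration of the H0 versus H1-H3 rule; the function name `class_from_h` and the string labels are assumptions made for this example, not names taken from the thesis.

```python
def class_from_h(h_grade: int) -> str:
    """Map an RBH hoarseness grade (H0..H3) to one of the two classes.

    Hypothetical helper: H0 is treated as healthy, H1-H3 as unhealthy.
    """
    if h_grade not in (0, 1, 2, 3):
        raise ValueError(f"H grade must be between 0 and 3, got {h_grade}")
    return "healthy" if h_grade == 0 else "unhealthy"


# Apply the rule to every grade on the four-point scale.
labels = [class_from_h(h) for h in (0, 1, 2, 3)]
```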

On this reference database I ran a series of classification experiments with a two-class SVM (Support Vector Machine) classifier. Examining the vowel "E" in continuous speech, I used full cross-validation and an RBF (Radial Basis Function) kernel in all cases. The best result, 85.38%, was obtained with jitter ddp, shimmer dda and mfcc1, using the mean and the standard deviation of each as separate elements of the feature vector.
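As a rough illustration of the ingredients named above, the sketch below builds a six-element feature vector from the mean and standard deviation of three measures and evaluates an RBF kernel between two such vectors. It rests on several assumptions that the abstract does not confirm: that each recording yields several values per measure (for example one per occurrence of the vowel), that the population standard deviation is used, and that `gamma` takes the value shown. All function names and numbers are hypothetical, and the SVM training itself is not shown.

```python
import math
from statistics import mean, pstdev


def summarize(values):
    """Return [mean, standard deviation] of one measure for one recording."""
    return [mean(values), pstdev(values)]


def feature_vector(jitter_ddp, shimmer_dda, mfcc1):
    """Concatenate mean and deviation of the three measures (6 elements)."""
    return summarize(jitter_ddp) + summarize(shimmer_dda) + summarize(mfcc1)


def rbf_kernel(x, y, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance between x and y)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Hypothetical per-vowel measurements for two recordings (not thesis data).
sample_a = feature_vector([1.0, 3.0], [2.0, 2.0], [0.0, 4.0])
sample_b = feature_vector([1.5, 2.5], [2.0, 3.0], [1.0, 3.0])
similarity = rbf_kernel(sample_a, sample_b, gamma=0.5)
```

The kernel value is 1 for identical vectors and decays towards 0 as they move apart, which is what lets the SVM draw a non-linear boundary between the two classes.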

My aim in the classification experiments was to reach 90% recognition accuracy, and I tried to estimate the amount of training data needed for this. I ran a series of classification experiments and derived a curve from the measurement results. The curve shows the recognition rate achieved in classifying pathological and healthy samples as a function of the number of speakers (the number of speech samples). For the curve fitting I used a logarithmic trend, because it gave the most reliable fit. According to my results, with the feature vector I compiled, 90% recognition accuracy can be reached in two-class classification with 400-500 speech samples.
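The extrapolation step can be sketched as a least-squares fit of a logarithmic trend, accuracy = a * ln(n) + b, followed by solving for the n that gives the target accuracy. The data points below are synthetic, generated from an arbitrary curve purely so the example runs; they are not the measurements of the thesis, and the function names are assumptions made for this example.

```python
import math


def fit_log_trend(sample_counts, accuracies):
    """Least-squares fit of accuracy = a * ln(n) + b; returns (a, b)."""
    xs = [math.log(n) for n in sample_counts]
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(accuracies) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracies))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def samples_needed(a, b, target_accuracy):
    """Invert the fitted trend: n = exp((target - b) / a)."""
    return math.exp((target_accuracy - b) / a)


# Synthetic points on an arbitrary curve (NOT the thesis results).
counts = [50, 100, 150, 200, 260]
accuracies = [62.0 + 4.0 * math.log(n) for n in counts]

a, b = fit_log_trend(counts, accuracies)
n_for_90 = samples_needed(a, b, 90.0)
```

Because a logarithmic trend flattens as n grows, the estimated sample count is sensitive to small changes in the fitted slope, so an estimate of this kind is better read as a range than as a single number.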

