Classification possibilities of pathological voices by acoustical parameters

Dr. Vicsi Klára
Department of Telecommunications and Media Informatics

This essay focuses on distinguishing pathological voices from healthy ones. I used a sound database created by the Laboratory of Speech Acoustic (LSA), which contains voices of male and female patients. Most of the patients suffered from a phonation disorder, but some of the subjects were healthy or fully recovered. A specialist assessed the quality of each voice on RBH, a four-point subjective voice-quality scale (0 = least abnormal, 3 = most abnormal).

Jitter, shimmer, mean Harmonics-to-Noise Ratio (HNR) and Mel Frequency Cepstral Coefficients (MFCC) were used to analyze sustained and running vowels from the LSA sound database. This research showed that running vowels are better suited to separating pathological from healthy voices.
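
The abstract does not say how jitter and shimmer were computed. As an illustration only, the sketch below uses one common definition (the "local" variant): the mean absolute difference between consecutive cycles, divided by the mean value. The function name and the sample measurements are hypothetical and are not taken from the thesis.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute difference between consecutive cycles, relative to the mean.

    Applied to glottal cycle lengths this gives local jitter;
    applied to per-cycle peak amplitudes it gives local shimmer.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


# Hypothetical per-cycle measurements from one vowel segment (assumed data).
periods_ms = [5.0, 5.1, 4.9, 5.0]
peak_amplitudes = [0.80, 0.82, 0.78, 0.80]

jitter = local_perturbation(periods_ms)        # relative period perturbation
shimmer = local_perturbation(peak_amplitudes)  # relative amplitude perturbation
```

Both values are dimensionless ratios; multiplying by 100 gives the percentage form in which jitter and shimmer are usually reported.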

During my work I segmented the database repeatedly with an automatic segmentation tool, which sped up labeling. I divided the voice samples into two classes: patients whose voices were rated H1, H2 or H3 form the pathological group, while subjects rated H0 or H0-1 form the healthy group. This gave 32 healthy samples (15 male, 17 female) and 41 pathological samples (16 male, 25 female) for training and testing a Support Vector Machine (SVM) classifier with a radial basis function (RBF) kernel.
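
The grouping rule and the kernel named above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the function names are hypothetical, and the `gamma` default is an arbitrary assumption because the abstract gives no kernel parameters.

```python
import math

# Grouping rule as stated in the abstract.
PATHOLOGICAL_GRADES = {"H1", "H2", "H3"}
HEALTHY_GRADES = {"H0", "H0-1"}


def to_class(grade):
    """Map a specialist's H grade to one of the two class labels."""
    if grade in PATHOLOGICAL_GRADES:
        return "pathological"
    if grade in HEALTHY_GRADES:
        return "healthy"
    raise ValueError(f"unknown grade: {grade!r}")


def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2).

    gamma is an assumed value; the abstract does not report it.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)
```

The kernel equals 1 for identical feature vectors and decays toward 0 as they move apart, so an SVM using it compares each sample with its support vectors by similarity.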

The SVM was evaluated with various combinations of jitter, shimmer, HNR and MFCC. For all samples together, the best result was obtained using the means of jitter, shimmer and MFCC, which gave an accuracy of 79.45% with the vowel "a". Full cross-validation showed classification rates of up to 80.95% when only female voices were tested with the vowel "a" using the means of jitter, shimmer, HNR and MFCC. The male results reached up to 77.42% with "a", "e" and "i" when minimum, maximum, mean, median and standard deviation were used as statistical parameters.
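
The five statistical parameters listed above can be computed per acoustic measure as in the sketch below. The function name is hypothetical, and the use of the sample standard deviation is an assumption. The second part is an arithmetic check: the reported accuracies match whole-sample counts for the group sizes given earlier (73 in total, 42 female, 31 male). The correct counts 58, 34 and 24 are inferred from the percentages and are not stated in the abstract.

```python
from statistics import mean, median, stdev


def summary_stats(values):
    """Return [min, max, mean, median, standard deviation] for one measure.

    stdev() is the sample standard deviation; the thesis may have used
    the population form instead (assumption).
    """
    return [min(values), max(values), mean(values), median(values), stdev(values)]


def accuracy_percent(correct, total):
    """Classification accuracy as a percentage, rounded to two decimals."""
    return round(100.0 * correct / total, 2)


# Inferred counts (not given in the abstract) against the stated group sizes.
all_samples = accuracy_percent(58, 73)     # 32 healthy + 41 pathological
female_samples = accuracy_percent(34, 42)  # 17 healthy + 25 pathological
male_samples = accuracy_percent(24, 31)    # 15 healthy + 16 pathological
```

With groups this small, a single sample changes the accuracy by about 1.4 to 3.2 percentage points, so the differences between the three figures amount to only a few samples.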

In the future I would like to increase the classification accuracy; my further goal is to develop software that can diagnose a patient from his or her voice.

