Development of an acoustical preprocessor for the automatic detection of pathological speech

Dr. Vicsi Klára
Department of Telecommunications and Media Informatics

In the current health care system, people with speech production disorders have to wait weeks or months for a specific diagnosis because of the number of medical examinations they must undergo. This lost time can lead to abnormalities of the speech organs (laryngitis) or to more serious consequences (throat cancer). Medical and acoustical research has shown that vowels have physical parameters by which healthy and pathological voices can be distinguished.

My task was to create a prototype of an interactive acoustical analysis program that can examine continuous speech in real time. The idea is that the patient reads a sentence or text aloud to the software, which then finds the boundaries of the vowels and calculates four basic parameters: fundamental frequency, fluctuation of the fundamental frequency (jitter), fluctuation of the amplitude (shimmer) and the ratio of the energy of the harmonic and noise components (HNR). From the calculated values, the program then classifies the voice by disorder. This makes it possible to inform the patient about possible disorders and to start the required treatment as early as the first visit.
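The four parameters can be illustrated with a minimal sketch. It assumes the common "local" definitions of jitter and shimmer (mean absolute difference between consecutive cycles, divided by the mean) and an autocorrelation-based HNR; the abstract does not state which variants the thesis uses. The function names and the input values are hypothetical, and cycle periods and peak amplitudes are assumed to have been measured already.

```python
import math


def local_perturbation(values):
    """Mean absolute difference of consecutive values, relative to their mean.

    Applied to cycle periods this gives local jitter; applied to cycle
    peak amplitudes it gives local shimmer (both as fractions, not percent).
    """
    if len(values) < 2:
        raise ValueError("need at least two glottal cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_diff / mean_value


def mean_f0(periods_s):
    """Mean fundamental frequency in Hz, taken as the reciprocal of the mean period."""
    return len(periods_s) / sum(periods_s)


def hnr_db(r_max):
    """Harmonics-to-noise ratio in dB from the normalized autocorrelation
    peak r_max found at the fundamental period (0 < r_max < 1)."""
    if not 0.0 < r_max < 1.0:
        raise ValueError("r_max must lie strictly between 0 and 1")
    return 10.0 * math.log10(r_max / (1.0 - r_max))


# Hypothetical measurements from one vowel segment.
periods = [0.0100, 0.0102, 0.0098, 0.0101]   # cycle lengths in seconds
amplitudes = [1.0, 0.9, 1.1, 1.0]            # peak amplitude per cycle

f0 = mean_f0(periods)
jitter = local_perturbation(periods)
shimmer = local_perturbation(amplitudes)
hnr = hnr_db(0.99)
```

A perfectly periodic signal would give zero jitter and shimmer, so larger values indicate a less regular voice; a lower HNR indicates a larger noise component.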

First, I implemented the algorithms used by a non-real-time analysis program (Praat) in my program’s preprocessing module to calculate the acoustical parameters. I compared my program’s algorithms with Praat’s using statistical tests and their statistical parameters. The results were promising: my implementations calculated the four parameters mentioned above correctly and consistently.

After that, I developed the program further to work with real-time continuous speech. The user reads a predetermined sentence, from which my program’s forced alignment module determines the positions of the vowels (I examined the vowel sound ‘e’). Finally, the preprocessing module calculates the parameters and visualizes their values for the user, who gets feedback just a few seconds after reading.
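The hand-off from forced alignment to preprocessing can be sketched as follows. This is an illustration only: it assumes the alignment module returns a list of (label, start, end) tuples with times in seconds, which the abstract does not specify, and the function name `vowel_segments` is hypothetical.

```python
def vowel_segments(samples, sample_rate, alignment, vowel="e"):
    """Return the sample slices that the alignment labels as the given vowel.

    alignment is assumed to be a list of (label, start_s, end_s) tuples.
    """
    segments = []
    for label, start_s, end_s in alignment:
        if label != vowel:
            continue
        first = int(round(start_s * sample_rate))
        last = int(round(end_s * sample_rate))
        segments.append(samples[first:last])
    return segments


# Hypothetical one-second recording at 100 samples per second.
samples = list(range(100))
alignment = [("k", 0.0, 0.1), ("e", 0.1, 0.3), ("t", 0.3, 0.4), ("e", 0.5, 0.6)]
segments = vowel_segments(samples, 100, alignment)
```

Each returned segment would then be passed to the preprocessing module for parameter calculation.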

The most important plan for the future is the implementation of the classification module.

