Real-time speech-diagnostic system improvement

Dr. Vicsi Klára
Department of Telecommunications and Media Informatics

The topic of my thesis is the improvement and testing of a speech diagnostic module and its integration into a web application. The module processes speech files, calculates speech parameters, and uses them to make decisions about the medical state of the speaker. Human speech is the product of a very complex brain process, so the speech of a subject who suffers from a neurological or psychological disease, such as Parkinson’s disease or depression, differs from that of a healthy speaker. The speech of a sick person has a lower tempo and deteriorating articulation, which makes it more monotonous and lowers the rate of rapid and significant changes.

During my thesis I created an algorithm that calculates the ratio of transients (RoT) in a speech file. I analyzed whether RoT is affected by depression, dysphonia or Parkinson’s disease, and found that the mean value of RoT was 6 to 8% lower in speech files recorded from patients with these diseases than in those recorded from healthy speakers.
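As an illustration only: this summary does not state how transient segments are detected, so the sketch below assumes a simple criterion in which a frame boundary counts as transient when the short-time energy changes by more than a relative threshold. The function name `ratio_of_transients`, the frame length, and the threshold are hypothetical choices, not the method used in the thesis.

```python
def ratio_of_transients(samples, frame_len=160, threshold=0.5):
    """Fraction of frame boundaries with a large relative energy change.

    ASSUMPTION: the energy-change criterion, frame_len and threshold are
    illustrative; the thesis does not specify its transient detector here.
    """
    # Split the signal into non-overlapping frames, dropping a trailing partial frame.
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    # Short-time energy (mean squared amplitude) of each frame.
    energies = [sum(s * s for s in frame) / frame_len for frame in frames]
    if len(energies) < 2:
        return 0.0  # not enough frames to observe any change

    transient = 0
    for prev, cur in zip(energies, energies[1:]):
        larger = max(prev, cur)
        # Relative change with respect to the larger of the two energies.
        if larger > 0 and abs(cur - prev) / larger > threshold:
            transient += 1
    return transient / (len(energies) - 1)
```

With this definition, a slower and more monotonous recording yields fewer large energy changes and therefore a lower ratio, which is the direction of the effect reported above.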

In the second part of my thesis, I created a second algorithm, which calculates the auto- and cross-correlation matrices of vectors. According to several studies, the correlation structure of two or more speech parameters can be used successfully to separate healthy speech from the speech of people suffering from dysphonia, Parkinson’s disease or depression. I tested the algorithm on the first three formant frequencies and showed that the resulting correlation structure can be used effectively for speech categorization.
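One possible way to build such a correlation structure is sketched below. It assumes a time-delay construction: each parameter track (for example a formant frequency per frame) is shifted by 0 to `num_delays - 1` frames, and the Pearson correlation is computed between every pair of shifted tracks. The diagonal blocks of the result are then auto-correlation matrices and the off-diagonal blocks are cross-correlation matrices. The names `pearson` and `delay_correlation_matrix` and the delay scheme are assumptions made for illustration, not necessarily the construction used in the thesis.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0  # a constant track carries no correlation information
    return cov / sqrt(var_a * var_b)

def delay_correlation_matrix(tracks, num_delays):
    """Correlation matrix of all tracks at all delays (ASSUMED construction).

    tracks: list of equal-length sequences, one per speech parameter.
    Returns a square matrix with len(tracks) * num_delays rows.
    """
    if len({len(t) for t in tracks}) != 1:
        raise ValueError("all tracks must have the same length")
    length = len(tracks[0]) - (num_delays - 1)
    if length < 2:
        raise ValueError("tracks are too short for the requested delays")
    # Row k * num_delays + d holds track k delayed by d frames.
    delayed = [track[d:d + length] for track in tracks for d in range(num_delays)]
    return [[pearson(a, b) for b in delayed] for a in delayed]

# Toy example with two made-up tracks and two delays (values are not real data).
f1 = [1, 2, 3, 4, 5, 6]
f2 = [6, 5, 4, 3, 2, 1]
matrix = delay_correlation_matrix([f1, f2], num_delays=2)
```

For three formant tracks and `num_delays` delays the matrix has `3 * num_delays` rows and columns; features for categorization could then be derived from it, for example from its eigenvalues.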

After testing, the algorithms could be used in the speech diagnostic module. The module calculates several speech parameters and, based on the calculated values, predicts whether the speaker is healthy or is suffering from one of the known diseases. The module is used in the business logic tier of a web application. The web application has to be able to create speech recordings, process the speech with the diagnostic module, and display the results.

In the first part of my thesis, I started planning and designing the web application, while in the second part I implemented it and conducted several different tests.

