Application of Generative Adversarial Networks for Ultrasound Tongue Image and speech processing

Dr. Csapó Tamás Gábor
Department of Telecommunications and Media Informatics

A Silent Speech Interface (SSI) is a technology that enables the synthesis of speech in the
absence of an audible acoustic signal. It can be applied in many settings, such as restoring
communication for laryngectomy patients, communicating in noisy environments, or making
silent calls. This thesis addresses the particular case of an SSI that uses ultrasound images
of the tongue as its input signal.

To achieve this goal, we chose Generative Adversarial Networks (GANs) [2], a family of
unsupervised machine learning techniques that learn to mimic a data distribution and
generate new samples resembling it. Among the GAN variants, we chose the conditional GAN
in order to map the ultrasound images to the Mel-generalized cepstral coefficients required
to synthesize speech.
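
As a point of reference, the mapping described above can be written as the standard
conditional GAN objective. This is a sketch under assumptions, not necessarily the exact
loss used in the thesis: x denotes an ultrasound tongue image (the condition), y the
corresponding vector of spectral coefficients, z a noise input, G the generator and D the
discriminator. These symbols are introduced here for illustration only.

```latex
\min_{G} \max_{D} \;
  \mathbb{E}_{x,y}\bigl[\log D(x, y)\bigr]
  + \mathbb{E}_{x,z}\bigl[\log\bigl(1 - D\bigl(x, G(x, z)\bigr)\bigr)\bigr]
```

In this formulation, D is trained to tell real (image, coefficient) pairs from generated
ones, while G is trained to produce coefficients that D cannot distinguish from real ones
given the same image.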

We prepared the dataset used for training and testing the network, and we describe the GAN
architecture that we adopted. Finally, we present the objective and subjective evaluation
of our approach.

