Enhanced Text-to-Speech features on mobile devices

Dr. Gyires-Tóth Bálint Pál
Department of Telecommunications and Media Informatics

The widespread use of mobile operating systems on smart devices has made access to information a near-real-time activity. However, everyday tasks often prevent us from using our mobile devices properly (e.g. using a mobile phone while driving). To address this problem, text-to-speech (TTS) applications are commonly used on the latest generation of mobile devices. This thesis proposes new, software-based features that extend the speech synthesis capabilities of current TTS applications without modifying the complex algorithms of the TTS engine itself. The concept is presented as a standalone Android application that works together with a TTS application developed by the BME-TMIT Speechlab.

Many successful mobile applications benefit from community-based contributions. The first part of the thesis introduces a new, community-based 'special word dictionary', to which users can propose new pairs, each consisting of a special word and its candidate resolution. The entries are synchronized with a common server, which distributes the content in real time. The dictionary entries are used to modify the text input of the TTS application in order to achieve the best possible output. The second part describes a platform where users can easily try out and download voice packs through an FTP connection built into the Android application. A wider selection of voice packs can improve the user experience while providing a way to test different business models for TTS applications.
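
To illustrate how dictionary entries might modify the text before it reaches the TTS application, the following is a minimal Java sketch, not the implementation described in the thesis. The class name `SpecialWordDictionary`, its methods `put` and `expand`, and the matching rules (case-insensitive, whole-token, longest entry first) are all assumptions made for this example. Server synchronization is left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical in-memory special word dictionary (illustration only). */
final class SpecialWordDictionary {
    // Lower-cased special word -> text the synthesizer should read instead.
    private final Map<String, String> entries = new HashMap<>();

    /** Adds or overwrites one special word / resolution pair. */
    void put(String specialWord, String resolution) {
        entries.put(specialWord.toLowerCase(Locale.ROOT), resolution);
    }

    /** Returns the text with every known special word replaced by its resolution. */
    String expand(String text) {
        if (entries.isEmpty()) {
            return text;
        }
        // Try longer entries first so that an entry is not shadowed by a shorter prefix.
        List<String> keys = new ArrayList<>(entries.keySet());
        keys.sort(Comparator.comparingInt(String::length).reversed());

        StringBuilder alternation = new StringBuilder();
        for (String key : keys) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(key));
        }
        // Match only whole tokens: no letter or digit directly before or after.
        Pattern pattern = Pattern.compile(
                "(?<![\\p{L}\\p{N}])(?:" + alternation + ")(?![\\p{L}\\p{N}])",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

        Matcher matcher = pattern.matcher(text);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String resolution = entries.get(matcher.group().toLowerCase(Locale.ROOT));
            // Fall back to the original token if case folding produced no lookup hit.
            String replacement = (resolution != null) ? resolution : matcher.group();
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

In this sketch, the application would call `expand` on the input text and pass the result to the TTS application, so the synthesizer itself stays unchanged.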

As part of the work, an online survey was conducted to learn more about current TTS usage habits and user interest. Based on the results, the thesis highlights some of the potential of TTS applications and offers guidance for future research in this direction.

