Voice activity detection based on prosodic features

Supervisor:
Dr. Szaszák György József
Department of Telecommunications and Media Informatics

In my work I focused on Voice Activity Detection (VAD). VAD separates speech from non-speech segments such as noise and silence. It is used in speech communication systems, speech decoders, speech recognizers, mobile communication systems and real-time speech communication over the Internet. Although VAD algorithms are generally considered reliable and efficient, their reliability depends on the level of environmental noise. My work addresses VAD for speech recognition based on mel-frequency cepstral coefficients (MFCC) and prosodic features.

As a first step, I studied VAD in general and then prosodic features. Afterwards I implemented a VAD algorithm with two different software tools. The first was MKBF, the university’s own development, which runs on Windows. The second was HTK, which is available for both Windows and Linux. I carried out the VAD experiments with the Linux version.

The basic VAD mechanism associates low-energy signal segments with non-speech and high-energy segments with speech. To improve performance in noisy environments, this basic algorithm has been modified in many ways, but the use of prosodic features for VAD has rarely been addressed. My work therefore focuses on prosodic features for VAD in speech recognition. To achieve my goals, I had to study VAD in detail; an overview is provided in the Introduction.
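
The energy-based baseline described above can be illustrated with a minimal sketch. This is not the MKBF or HTK implementation used in the thesis; the function names, the 160-sample frame length and the -30 dB threshold are assumptions chosen only for illustration.

```python
import math


def frame_energies_db(samples, frame_len=160):
    """Return the log energy (in dB) of each non-overlapping frame.

    frame_len=160 is an assumed value (10 ms at 16 kHz), not taken from the thesis.
    """
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        # A small constant avoids log10(0) on all-zero frames.
        energies.append(10.0 * math.log10(power + 1e-12))
    return energies


def energy_vad(samples, frame_len=160, threshold_db=-30.0):
    """Label a frame as speech (True) when its energy exceeds the threshold.

    threshold_db is a hypothetical fixed threshold; practical systems
    usually adapt it to the estimated noise level.
    """
    return [e > threshold_db for e in frame_energies_db(samples, frame_len)]


# Synthetic input: two low-energy frames followed by two high-energy frames.
quiet = [0.001 * math.sin(0.3 * n) for n in range(320)]
loud = [0.5 * math.sin(0.3 * n) for n in range(320)]
labels = energy_vad(quiet + loud)
print(labels)
```

A fixed threshold like this one fails as soon as the background noise approaches the speech level, which is the weakness that motivates adding further features.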

In the second step, I implemented VAD with HTK in three versions: first with prosodic features, then without prosodic features, and finally without both energy and prosodic features.

The implemented algorithm was tested and its results were compared with those of the basic algorithm. Finally, by comparing the VAD variants, I assess the results and suggest further development of the implemented system.
