Speech-based controller development

Dr. Berényi Richárd
Department of Electronics Technology

My task was to create an embedded speech recognition system for voice command control. This paper consists of five main chapters.

Chapter 3 outlines the theoretical background needed to understand how a speech recognition system works and what makes it possible to decode human speech into words. Speech recognizers fall into two types: continuous speech recognition (CSR), used for dictation, and isolated word recognition (IWR), used for control tasks.

Recognition techniques can also be grouped by whether the identity of the speaker matters. The speaker-independent (SI) method is very flexible, so it can be used in many ways. The speaker-dependent (SD) method is much more rigid than SI, which makes it well suited to biometric authentication such as a spoken password.

My dissertation is mainly about control systems that use SI recognition, so some mathematical abstractions and algorithms are indispensable for understanding how they work. These are the hidden Markov model (HMM) and, for decoding it, artificial neural networks (ANN) and the Viterbi algorithm.
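To illustrate what decoding an HMM with the Viterbi algorithm means, here is a minimal sketch in Python. It is not taken from the dissertation: the two states (`silence`, `speech`), the three observation symbols (`low`, `mid`, `high`, standing in for frame energy levels), and all probabilities are invented toy values, chosen only to keep the example small.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely hidden state sequence."""
    # best[t][s] = (probability of the best path ending in state s at step t,
    #               predecessor state on that path)
    first = observations[0]
    best = [{s: (start_p[s] * emit_p[s][first], None) for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            # Pick the predecessor that maximizes the path probability into s.
            p, r = max((prev[r][0] * trans_p[r][s], r) for r in states)
            column[s] = (p * emit_p[s][obs], r)
        best.append(column)
    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    prob = best[-1][last][0]
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return prob, path


# Toy model (all values are assumptions made up for this sketch).
STATES = ["silence", "speech"]
START_P = {"silence": 0.6, "speech": 0.4}
TRANS_P = {
    "silence": {"silence": 0.7, "speech": 0.3},
    "speech": {"silence": 0.4, "speech": 0.6},
}
EMIT_P = {
    "silence": {"low": 0.5, "mid": 0.4, "high": 0.1},
    "speech": {"low": 0.1, "mid": 0.3, "high": 0.6},
}

prob, path = viterbi(["low", "mid", "high"], STATES, START_P, TRANS_P, EMIT_P)
print(path, prob)
```

For the observation sequence `low, mid, high`, the most likely state sequence under this toy model is `silence, silence, speech`. A real recognizer works on acoustic feature vectors and far larger models, and typically uses log probabilities to avoid numerical underflow.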

Chapter 4 presents ideas for how speech recognition can help in our daily routine, some professional uses, and applications that ease the lives of people with disabilities.

Chapter 5 covers speech recognition software and the speech modules that use Sensory’s RSC-4128 speech processor: the VR-Stamp and the VoiceGP.

Chapter 6 covers the specific usage and setup of the VR-Stamp and the VoiceGP, along with the additional electronic components and software they require.

The final chapter, Chapter 7, describes my working application, which uses the VoiceGP. The speech recognition module listens for color words; when one is recognized, the development board lights the LED in the matching color and sends a character to the computer over the serial port. A Python application running on the computer watches for these characters and displays the word that was spoken.
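The computer-side part of this setup could look roughly like the following sketch. It is not the code from the dissertation: the one-character codes in `COLOR_BY_CODE` and the function name `decode_stream` are hypothetical, and an in-memory byte stream stands in for the serial port so the example runs without hardware.

```python
import io

# Hypothetical protocol: the board sends one ASCII character per recognized
# color word. The actual characters used in the thesis are not given here.
COLOR_BY_CODE = {"r": "red", "g": "green", "b": "blue"}


def decode_stream(port, color_by_code=COLOR_BY_CODE):
    """Yield the color word for each known character read from `port`.

    `port` is any object with a `read(size)` method that returns bytes.
    Reading stops when `read` returns no data.
    """
    while True:
        raw = port.read(1)
        if not raw:
            break
        code = raw.decode("ascii", errors="ignore")
        word = color_by_code.get(code)
        if word is not None:  # silently skip unknown characters
            yield word


# Simulated serial traffic: three valid codes and one unknown byte.
fake_port = io.BytesIO(b"rg?b")
words = list(decode_stream(fake_port))
for word in words:
    print(word)
```

With real hardware, an opened serial port object that offers the same `read(size)` method (for example from the pyserial package) could be passed in place of `fake_port`. Note that a read that times out returns no data and would end the loop in this sketch, so a long-running monitor would need to keep polling instead.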

