Voice conversion using modern machine learning

Dr. Csapó Tamás Gábor
Department of Telecommunications and Media Informatics

Voice conversion (VC) is a relatively new research topic whose history dates back to the end of the 1980s. The goal of VC is to change the voice of one specific speaker so that it sounds like the voice of another specific speaker. Researchers have approached the topic from different angles as it developed. Some have focused on the representation of the speech signal, that is, on its encoding and decoding, which is one of the most important stages of voice conversion. Others have treated the problem as a feature-mapping task and concentrated mostly on that part. Aware of the limitations of their methods, they built voice conversion systems that work well only for certain applications, so there is still no system that can be used in every case. Thanks to technological progress, researchers can now use algorithms with far fewer limitations than earlier methods, and computers are able to handle the large amounts of data these algorithms require. These are machine learning based algorithms. At present, one modern machine learning method, deep learning, attracts more interest from researchers than the others. The deep neural network based voice conversion used in this thesis relies on the open-source Merlin toolkit. Our contribution to this toolkit is the integration of another vocoder, known as Ahocoder. In this thesis, I worked with Merlin using the integrated vocoder. My expectation of deep neural network based voice conversion was that it would reduce the dependence on the source speaker and on the quality of the voice. To check this dependence, I compared the approach with earlier methods, using both subjective and objective evaluations.

