Psychoacoustical models in audio compression

Supervisor:
Dr. Rucz Péter
Department of Networked Systems and Services

The objective of this thesis is to create a lossy audio encoder in the Matlab environment. To achieve this, I had to learn how such encoders work and how they are implemented.

This document first characterizes human hearing, focusing on the deficiencies that psychoacoustics-based lossy audio compression methods can exploit.

The next step is to implement the encoder in Matlab. After reviewing the relevant literature, I found suitable documentation, on which I based the lossy encoder. Using this encoder, I demonstrate the aforementioned properties of human hearing and the applied psychoacoustical signal processing methods on simple test signals and a short music excerpt.

I also modified and improved the encoder so that it can encode not only short time slices but also arbitrarily long sound samples.

The subjective results of the listening tests did not prove sufficiently informative to rate the encoder algorithm; thus, an objective quality measurement method had to be established. PEAQ (Perceptual Evaluation of Audio Quality), an audio quality analyzer model, is suitable for this purpose. The thesis covers the history of the development of PEAQ and discusses the operation of an existing implementation. After minor modifications, this Matlab code became suitable for testing my encoder. Multiple voice samples encoded with the same encoding parameters can be tested using the quality measures calculated by PEAQ in order to examine the capabilities of the encoder algorithm.

I identified a significant deficiency of the reference encoder, proposed an improvement to overcome this limitation, and implemented my solution. Using the modified encoder, the same sound samples can be re-encoded, and objective measurements can be used to validate the test results.

The completed audio encoder and this document, which discusses it in detail, can also be useful for further educational purposes, as every step of the encoding can be followed clearly, and the intermediate results of each step can be visualized and examined separately.
