Deep neural networks for object recognition in image database

Dr. Gyires-Tóth Bálint Pál
Department of Telecommunications and Media Informatics

Research on machine vision dates back decades. One of its cornerstones is enabling systems to detect, recognize and segment visible objects. Detection means determining the presence and position of an object, while recognition means identifying the object's category or type. Segmentation, building on the results of these two operations, means finding the boundaries of the objects in an image. Such capabilities can already be found in mobile phones, video surveillance software and autonomous cars.

The purpose of my thesis is to become familiar with modern object detection methods based on deep convolutional neural networks and to develop my own system with the same functionality. The implemented system consists of two distinct parts. The first is a command-line scaling tool that can merge data from different image databases into a single, unified database. The second is a convolutional neural network for object detection, trained on this unified data.
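The merging step can be illustrated with a minimal Python sketch. The thesis does not describe its input formats, so everything here is an assumption: two hypothetical sources, one storing boxes as `[x, y, width, height]` (as COCO does) and one storing corner coordinates (as Pascal VOC does), plus a hypothetical `LABEL_MAP` that maps each source's class names onto a shared vocabulary. `Box`, `from_xywh`, `from_corners` and `merge` are illustrative names, not part of the thesis.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

@dataclass(frozen=True)
class Box:
    """Hypothetical unified annotation: one object, corner coordinates in pixels."""
    image: str
    label: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float

# Hypothetical per-source class-name mappings onto one shared vocabulary.
# Source classes missing from a mapping are dropped during conversion.
LABEL_MAP = {
    "coco_style": {"person": "person", "car": "car", "motorcycle": "motorbike"},
    "voc_style": {"person": "person", "car": "car", "motorbike": "motorbike"},
}

def from_xywh(image: str, label: str, bbox) -> Optional[Box]:
    """Convert an [x, y, width, height] box (COCO-style) to corner format."""
    name = LABEL_MAP["coco_style"].get(label)
    if name is None:
        return None
    x, y, w, h = bbox
    return Box(image, name, x, y, x + w, y + h)

def from_corners(image: str, label: str, xmin, ymin, xmax, ymax) -> Optional[Box]:
    """Corner-format boxes (Pascal VOC-style) only need their label remapped."""
    name = LABEL_MAP["voc_style"].get(label)
    if name is None:
        return None
    return Box(image, name, xmin, ymin, xmax, ymax)

def merge(*sources: Iterable[Optional[Box]]) -> List[Box]:
    """Concatenate converted sources in order, skipping dropped (None) and duplicate records."""
    seen = set()
    merged = []
    for records in sources:
        for box in records:
            if box is not None and box not in seen:
                seen.add(box)
                merged.append(box)
    return merged
```

Classes without an entry in the mapping are dropped rather than guessed, which is one possible design choice when the source databases only partly overlap in their categories.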

I started my work by finding and studying the most recent and important papers on convolutional neural networks. After examining the theoretical side of the topic, I began the implementation with a prototype of the scaling script. After completing its final version, I worked on parallelizing the software. For the convolutional network, I first created a compact version of an existing, moderately complex method; then, based on the results of this version, I changed my approach and developed a simpler design based on an older object detection network.
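Because each image or annotation record can usually be processed independently, this kind of preprocessing parallelizes naturally. The following sketch assumes a hypothetical per-record task (`normalize`, which converts pixel-corner boxes to coordinates relative to the image size) and distributes it with the standard library's `concurrent.futures`. The record layout and the names `normalize` and `process_all` are illustrative assumptions, not the thesis's actual implementation.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from typing import Callable, Iterable, List, Tuple, Type

# Hypothetical record: (image name, width, height, (xmin, ymin, xmax, ymax)) in pixels.
Record = Tuple[str, int, int, Tuple[float, float, float, float]]

def normalize(record: Record) -> Tuple[str, Tuple[float, float, float, float]]:
    """Per-record work unit: convert pixel corners to [0, 1] relative coordinates."""
    name, width, height, (xmin, ymin, xmax, ymax) = record
    return name, (xmin / width, ymin / height, xmax / width, ymax / height)

def process_all(records: Iterable[Record],
                fn: Callable = normalize,
                max_workers: int = 4,
                executor_cls: Type[Executor] = ProcessPoolExecutor,
                chunksize: int = 64) -> List:
    """Apply fn to every record in parallel; Executor.map keeps the input order."""
    with executor_cls(max_workers=max_workers) as pool:
        # chunksize batches records per task for process pools; thread pools ignore it.
        return list(pool.map(fn, records, chunksize=chunksize))
```

With the default `ProcessPoolExecutor`, call `process_all` from under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. For I/O-bound work such as reading image files, passing `executor_cls=ThreadPoolExecutor` is a lighter alternative.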

Besides successfully developing the scaling script, I implemented a metric commonly used for object detection. In addition, I laid the groundwork for the redesigned system by making the necessary changes to the data processing part, and then parallelized these changes for better performance.
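The abstract does not name the metric. Many object detection metrics, such as average precision, are built on the intersection over union (IoU) between predicted and ground-truth boxes, so the following Python sketch shows IoU together with a simplified greedy precision/recall computation at a fixed IoU threshold. The `(xmin, ymin, xmax, ymax)` box format, the 0.5 default threshold and the matching rule are assumptions for illustration; the matching is simplified and differs in detail from benchmark evaluation code such as that of Pascal VOC or COCO.

```python
from typing import Sequence, Tuple

BoxT = Tuple[float, float, float, float]  # assumed format: (xmin, ymin, xmax, ymax)

def iou(a: BoxT, b: BoxT) -> float:
    """Intersection over union of two corner-format boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def precision_recall(preds: Sequence[Tuple[BoxT, float]],
                     truths: Sequence[BoxT],
                     threshold: float = 0.5) -> Tuple[float, float]:
    """Simplified greedy matching: predictions are visited highest score first, and each
    claims the still-unmatched ground-truth box it overlaps most, provided IoU >= threshold."""
    matched = [False] * len(truths)
    tp = 0
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best, best_iou = -1, threshold
        for i, gt in enumerate(truths):
            overlap = iou(box, gt)
            if not matched[i] and overlap >= best_iou:
                best, best_iou = i, overlap
        if best >= 0:
            matched[best] = True
            tp += 1
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(truths) if truths else 0.0
    return precision, recall
```

Average precision extends this idea by sweeping the score threshold and summarizing the resulting precision/recall curve.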

