Deep Learning for Real Time Applications

Hadházi Dániel
Department of Measurement and Information Systems

In recent years, deep learning methods have outperformed previous state-of-the-art
techniques in several fields, with the most prominent results in computer vision. Accuracy is the
major goal of most research in computer vision; however, the correctness of real-time applications
depends not only on accuracy but also on response time. The primary research question of the
presented thesis is: “How can deep learning be applied to real-time applications?” It aims at
understanding recent advances in the field of computer vision, focusing especially on object
detection using deep learning techniques. Sub-questions such as “What type of network architecture
can be used?”, “Which framework is better for development?”, and “How can data be pre-processed and
different architectures modified to achieve real-time performance?” are answered through a
literature survey and the implementation results of the thesis.

The presented thesis includes a literature study of computer vision tasks such as image
classification, segmentation, and localization, with more emphasis on object detection.
Convolutional neural networks (CNNs), a category of neural networks specially designed to process
images, are better suited for such tasks. The thesis also contains a brief study of CNNs along
with different CNN architectures such as R-CNN, Fast R-CNN, Faster R-CNN and SSD. The performance
of these architectures in terms of accuracy and speed is studied and compared. This study extends
to understanding the ecosystem around AI development, including different software frameworks,
hardware resources, open-source tools and developer community support. A number of experiments
involving modifications to the default YOLO architecture are performed and their results are
presented. Finally, a modified CNN architecture is developed which is better suited for a custom
dataset provided by “Ericsson R&D”. The presented model is intended to better suit the memory and
speed requirements of real-time applications. A separate model is trained on the MobileNet
architecture using the TensorFlow framework, and a comparison of the two frameworks and the
ecosystems around them is also presented as part of this thesis.

