In recent years, deep learning methods have outperformed previous state-of-the-art
techniques in several fields, most prominently in computer vision. Accuracy is the major
goal of most research in computer vision; however, the correctness of real-time applications
depends not only on accuracy but also on response time. The primary research question of this
thesis is: “How can deep learning be applied to real-time applications?” It aims at
understanding recent advances in the field of computer vision, focusing especially on object
detection using deep learning techniques. Sub-questions such as “What type of network architecture
can be used?”, “Which framework is better for development?”, and “How can data be pre-processed and
different architectures modified to achieve real-time performance?” are answered through a literature
survey and the implementation results of the thesis.
The thesis includes a literature study of computer vision tasks
such as image classification, segmentation, and localization, with particular emphasis on object
detection. Convolutional neural networks (CNNs) are a category of neural networks
specially designed to process images; hence they are better suited to computer vision tasks.
The thesis also contains a brief study of CNNs along with different CNN architectures such as
R-CNN, Fast R-CNN, Faster R-CNN and SSD. The performance of these architectures in terms of accuracy
and speed is studied and compared. The study extends to the ecosystem around AI
development, including software frameworks, hardware resources, open-source tools, and
developer community support. Several experiments involving modifications to the default
YOLO architecture are performed and their results are presented. Finally, a modified CNN
architecture is developed that is better suited to a custom dataset provided by “Ericsson R&D”.
The presented model is intended to better meet the memory and speed requirements of real-time
applications. A separate model is trained on the MobileNet architecture using the TensorFlow framework,
and a comparison of the two frameworks and the ecosystems around them is also presented as part of
this thesis.