Object recognition and tracking on video recordings with deep neural networks

Supervisor:
Dr. Gyires-Tóth Bálint Pál
Department of Telecommunications and Media Informatics

The role of artificial intelligence is growing: it affects more and more areas of our lives, and humans and digital devices are becoming increasingly intertwined in modern society. With machine vision, a branch of artificial intelligence, our devices can perceive the world around them, carry out automated tests, and execute tasks that require coordination. Object detection algorithms serve the same purpose, but their real-time use is limited by high hardware requirements.

In this work, I attempt to develop a deep learning based solution that can reduce this resource requirement. The solution builds on an existing object detection algorithm, which provides the input for the system. The proposed recurrent network, built from Long Short-Term Memory (LSTM) cells, tries to predict the future locations of objects from their class and their previous locations. As a result, the object detection algorithm would not have to run continuously; it would only need to be re-run after a certain period. The proposed network is an instance of LSTM-based time series prediction.
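A time series predictor of this kind is usually trained on sliding windows cut from each object's track. The following is a minimal sketch of that data preparation step, not the thesis implementation: the function name `make_windows`, the `(x1, y1, x2, y2)` box layout, and the parameters `n_in` (input length) and `horizon` (prediction distance in steps) are all assumptions made for illustration.

```python
def make_windows(track, n_in, horizon):
    """Cut one object's track into (inputs, target) training pairs.

    track   -- list of boxes (x1, y1, x2, y2), one per time step (assumed layout)
    n_in    -- how many previous boxes serve as input
    horizon -- how many steps after the last input box the target lies
    """
    samples = []
    last_start = len(track) - n_in - horizon
    for start in range(last_start + 1):
        inputs = track[start:start + n_in]
        target = track[start + n_in + horizon - 1]
        samples.append((inputs, target))
    return samples


# Hypothetical track: a box moving 2 px right per step.
track = [(2 * t, 0, 2 * t + 10, 10) for t in range(6)]
samples = make_windows(track, n_in=3, horizon=2)
```

Each pair holds `n_in` consecutive boxes as input and the box `horizon` steps after the last input as the target, so both quantities can be varied independently when searching for a good model.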

To find the best model, I will use two metrics: MABO (Mean Average Best Overlap) and mAP (Mean Average Precision). The task is complicated by several variable factors, such as the number of classes to train on, the number of input samples (how many previous locations serve as input), and how far into the future to predict.
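As an illustration of the overlap-based metric, the sketch below computes intersection over union (IoU) for two boxes and a MABO score in its common form: for each class, average the best IoU achieved for every ground-truth box, then average over classes. The `(x1, y1, x2, y2)` box layout, the per-class dictionaries, and the choice to match predictions only within the same class are assumptions; the evaluation used in the thesis may differ in detail.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mabo(ground_truth, predictions):
    """Mean over classes of the average best overlap per ground-truth box.

    Both arguments map a class name to a list of boxes (assumed format).
    """
    per_class = []
    for cls, gt_boxes in ground_truth.items():
        if not gt_boxes:
            continue
        preds = predictions.get(cls, [])
        best = [max((iou(g, p) for p in preds), default=0.0) for g in gt_boxes]
        per_class.append(sum(best) / len(best))
    return sum(per_class) / len(per_class) if per_class else 0.0


# Hypothetical example: one perfect match and one partial overlap.
gt = {"car": [(0, 0, 2, 2)], "person": [(0, 0, 2, 2)]}
pred = {"car": [(0, 0, 2, 2)], "person": [(1, 1, 3, 3)]}
score = mabo(gt, pred)
```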

In brief, I will predict the movement of objects, which forms a time series, with an LSTM network and evaluate it with the MABO and mAP metrics. The model's predictions are compared with those obtained by linear regression.
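One plausible form of the linear regression baseline is to fit a least-squares line to each box coordinate over the observed time steps and extrapolate it. The sketch below assumes that form, along with the `(x1, y1, x2, y2)` box layout and the hypothetical names `fit_line` and `extrapolate_box`; the abstract does not specify how the baseline was actually set up.

```python
def fit_line(ys):
    """Least-squares slope and intercept for values observed at steps 0..n-1."""
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx if sxx else 0.0  # a single point gives a flat line
    return slope, mean_y - slope * mean_x


def extrapolate_box(history, horizon):
    """Predict the box `horizon` steps after the last box in `history`."""
    t = len(history) - 1 + horizon
    coords = []
    for k in range(4):
        slope, intercept = fit_line([box[k] for box in history])
        coords.append(slope * t + intercept)
    return tuple(coords)


# Hypothetical history: a box moving 2 px right and 1 px down per step.
history = [(0, 0, 10, 10), (2, 1, 12, 11), (4, 2, 14, 12)]
predicted = extrapolate_box(history, horizon=2)
```

A baseline like this captures only constant-velocity motion, which makes it a useful reference point for judging whether a recurrent model learns anything beyond straight-line extrapolation.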
