Active learning methods in image classification

Dr. Szűcs Gábor
Department of Telecommunications and Media Informatics

Comparing machine learning with human learning is an exciting topic. In this work, the two processes were compared by measuring image classification accuracy as a function of the number of training images. Machine learning requires a large amount of labeled training data, while humans are able to learn from far fewer images, or even from a single one. The most fundamental difference between the two processes is the amount of prior information available to the learner. One of the central aims of our time is to maximize machine efficiency and thereby save a great deal of human effort. In machine learning, it therefore matters greatly how useful each labeled image is for learning. Active learning iteratively queries the unlabeled data set to select the image whose label is most desired by the learning algorithm. Numerous query strategies exist, and they are still under active development today.

My work had two objectives. The first was to measure and compare the performance of human and machine learning in the classification of visual content. To do this, I elaborated a precise measurement plan that allows the comparison to be made on equal terms, independent of people's prior knowledge and experience. The second was to create an active learning system that uses a two-dimensional query strategy. The proposed method is based on uncertainty sampling, complemented by an analysis of the class distribution of the labeled data set.
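The idea of combining uncertainty sampling with a distribution analysis of the labeled set can be illustrated with a minimal sketch. The abstract does not specify how the two criteria are combined, so everything below is an assumption made for illustration: the function names, the use of normalized entropy as the uncertainty measure, the "rarity" weighting of under-represented classes, and the trade-off parameter `alpha` are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def class_rarity(labeled_labels, num_classes):
    """Weight per class: larger when the class is rare in the labeled set."""
    counts = Counter(labeled_labels)
    total = len(labeled_labels)
    # Add-one smoothing, so classes with no labeled image get the largest weight.
    freq = [(counts.get(c, 0) + 1) / (total + num_classes)
            for c in range(num_classes)]
    return [1.0 - f for f in freq]


def select_query(pool_probs, labeled_labels, alpha=0.5):
    """Return the index of the unlabeled image to send for labeling next.

    pool_probs     : per-image class probabilities from the current classifier
    labeled_labels : class indices of the images labeled so far
    alpha          : hypothetical trade-off between the two criteria
                     (1.0 = pure uncertainty sampling, 0.0 = pure class balance)
    Assumes at least two classes and a non-empty pool.
    """
    num_classes = len(pool_probs[0])
    rarity = class_rarity(labeled_labels, num_classes)
    max_entropy = math.log(num_classes)

    best_index, best_score = -1, -1.0
    for i, probs in enumerate(pool_probs):
        # Dimension 1: how unsure the classifier is, scaled to [0, 1].
        uncertainty = entropy(probs) / max_entropy
        # Dimension 2: expected rarity of the image's class in the labeled set.
        balance = sum(p * r for p, r in zip(probs, rarity))
        score = alpha * uncertainty + (1.0 - alpha) * balance
        if score > best_score:
            best_index, best_score = i, score
    return best_index


if __name__ == "__main__":
    pool = [[0.90, 0.05, 0.05], [0.34, 0.33, 0.33], [0.05, 0.05, 0.90]]
    labeled = [0, 0, 0, 1]  # class 2 has no labeled image yet
    print(select_query(pool, labeled, alpha=1.0))
    print(select_query(pool, labeled, alpha=0.0))
```

In this sketch, setting `alpha` to 1.0 reduces the strategy to plain uncertainty sampling, while lower values increasingly favor images that are likely to belong to classes under-represented among the labeled images. In an active learning loop, the selected image would be labeled, added to the training set, and the classifier retrained before the next query.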

In my paper I present the measurement plan for the comparison and the image classification system, which was first implemented with passive learning and into which an active learning module was then integrated. Human learning and machine learning were tested on five different image sets, which were manually collected and labeled. The results of these tests were then evaluated and compared. Finally, for a more extensive examination, further tests were carried out on larger data sets (PASCAL VOC2010 and Caltech 101) to compare and study the passive and active learning systems separately.

