Searching for groups in large image sets using passive and active learning

Supervisor:
Dr. Szűcs Gábor
Department of Telecommunications and Media Informatics

Nowadays photography is more popular than ever: almost everyone has a smartphone with a built-in camera that lets them capture every moment of their lives. These photos can be uploaded to photo-sharing websites such as Instagram or Facebook, or to the cloud. We all like to keep our photos organized (e.g. by category: photos of friends, pets, food), but managing a large photo collection is time-consuming, so there is strong demand for algorithms that sort photos automatically. Such algorithms gather photos with similar features, producing organized collections of similar photo pairs (or larger groups).
In this paper I summarize my research on finding similar photo pairs and groups in large datasets. This can help users who want to delete near-duplicate images or find similar ones, and thus manage their images more easily. I present how I defined the similarity function so that the similarity measure is high for photo pairs (groups) and low in every other case. Before the similarity value can be computed, the information in the images must be compressed; in other words, features have to be extracted from the images. I used the Scale-Invariant Feature Transform (SIFT) for this task. I then introduce a pair-searching algorithm I developed, which helps correct mistakes of the similarity function and can find hidden similar pairs, thus making sorting and organizing more effective.
To evaluate my method, we entered an international image recognition competition, where I tested my solution on individual whale recognition. In this competition we had to collect image pairs from a large unlabeled set of images, returning pairs that show the same specimen. Recognition was based on the unique pattern of the whale's fluke, which is characteristic of each individual.
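The similarity function and the pair-searching step described above can be illustrated with a minimal sketch. It assumes each image has already been reduced to a list of SIFT-like descriptor vectors (the extraction itself, e.g. with OpenCV, is not shown), scores a pair by the fraction of descriptors that pass Lowe's ratio test, and merges every pair above a threshold with a union-find structure so that groups form transitively. The names `similarity`, `pair_similarity` and `group_images`, the `ratio` of 0.75 and the `threshold` are hypothetical choices for illustration, not the author's actual similarity function or pair-searching algorithm.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def similarity(desc_a, desc_b, ratio=0.75):
    """Fraction of descriptors in desc_a that pass Lowe's ratio test against desc_b.

    A descriptor counts as matched when its nearest neighbour in desc_b is
    clearly closer than the second-nearest one (distance ratio below `ratio`).
    Assumption: a ratio-test match count serves as the similarity measure.
    """
    if not desc_a or len(desc_b) < 2:
        return 0.0
    matched = 0
    for d in desc_a:
        nearest, second = sorted(euclidean(d, e) for e in desc_b)[:2]
        if nearest < ratio * second:
            matched += 1
    return matched / len(desc_a)


def pair_similarity(desc_a, desc_b):
    """Symmetric score: average of the two directional similarities."""
    return (similarity(desc_a, desc_b) + similarity(desc_b, desc_a)) / 2


def group_images(descriptors, threshold=0.5):
    """Group images whose pairwise score reaches `threshold`, transitively.

    Union-find merges every pair above the threshold, so images A and C end
    up together if both match B, even when A and C do not match directly.
    Returns only groups containing at least two images.
    """
    parent = list(range(len(descriptors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i, j in combinations(range(len(descriptors)), 2):
        if pair_similarity(descriptors[i], descriptors[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(descriptors)):
        groups.setdefault(find(i), []).append(i)
    return [g for g in groups.values() if len(g) > 1]


if __name__ == "__main__":
    # Toy 2-D "descriptors"; real SIFT descriptors are 128-dimensional.
    a = [(0, 0), (100, 0)]
    b = [(0.1, 0), (100.1, 0), (50.1, 100), (50.1, -100)]
    c = [(50, 100), (50, -100)]
    print(pair_similarity(a, c))                   # 0.0: no direct match
    print(group_images([a, b, c], threshold=0.6))  # [[0, 1, 2]]
```

In the toy run, images 0 and 2 share no matching descriptors, yet both match image 1 with a score of 0.75, so all three land in one group: transitive grouping surfaces a pair the direct score misses. The same transitivity can also chain unrelated images together through a single false match, so the threshold has to balance missed pairs against such chains.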
Despite the difficulties caused by varying lighting conditions and foaming water, I achieved the best result in the competition. I further improved my solution using active learning, in which a human expert helps the system find groups in the database.
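The active-learning step can likewise be sketched with uncertainty sampling, one common active-learning strategy: the pairs whose similarity lies closest to the decision threshold are the ones the system is least sure about, so they are shown to the expert first, and the expert's answer overrides the score. The source does not say which query strategy was used; the function names, the `budget` parameter and the `ask_expert` callback (standing in for the human expert) are hypothetical.

```python
def select_queries(pair_scores, threshold, budget):
    """Return up to `budget` pairs whose score is closest to the threshold.

    These are the pairs the similarity function is least certain about.
    """
    ranked = sorted(pair_scores, key=lambda pair: abs(pair_scores[pair] - threshold))
    return ranked[:budget]


def active_learning_round(pair_scores, threshold, budget, ask_expert, labels):
    """Ask the expert about the most uncertain pairs that are still unlabeled.

    pair_scores: dict mapping (i, j) -> similarity score
    ask_expert:  callable (i, j) -> bool, a stand-in for the human expert
    labels:      dict of expert answers, updated in place and returned
    """
    unlabeled = {p: s for p, s in pair_scores.items() if p not in labels}
    for pair in select_queries(unlabeled, threshold, budget):
        labels[pair] = ask_expert(*pair)
    return labels


def decide(pair_scores, threshold, labels):
    """Same-group decision per pair: expert label if given, else thresholded score."""
    return {p: labels.get(p, s >= threshold) for p, s in pair_scores.items()}


if __name__ == "__main__":
    scores = {(0, 1): 0.9, (0, 2): 0.52, (1, 2): 0.1, (2, 3): 0.45}
    truth = {(0, 1), (2, 3)}  # hypothetical ground truth known only to the "expert"
    labels = active_learning_round(scores, 0.5, 2, lambda i, j: (i, j) in truth, {})
    print(decide(scores, 0.5, labels))
    # {(0, 1): True, (0, 2): False, (1, 2): False, (2, 3): True}
```

Here the expert is asked only about the two borderline pairs: (0, 2) scores just above the threshold but is rejected, and (2, 3) scores just below it but is accepted. Spending the expert's limited attention on such borderline cases is the usual rationale for uncertainty sampling; repeated rounds would move on to the next most uncertain pairs.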
