Classification of visual contents in a non-artificial environment is considered a hard problem in image processing and machine learning. Besides the numerous achievements in research, a lot of applications have been developed in the past few years using new algorithms (obstacle detection in cars, computer aided medical systems, face recognition, lie-detector using infrared camera, etc. ), which show the importance of this subject. The human brain can tell easily whether some property holds for a photo, should it be an object or a concept (winter, people on holiday, happy moment etc.). However, the today known, computationally complex algorithms still have a great error rate, because of the visual variety of these objects and concepts. The recognition of hundreds of thousands of categories at an appropriate precision would provide opportunity for numerous applications (such as more precise content based image retrieval, digital animal identification, recognition of the event or environment on photos, detection of rotten food etc.), therefore in the past few years the interest in image classification research has been increasing. Since the task is well defined, the solutions can be compared to each other in given circumstances. Thus prestigious image classifying competitions have been organized for more than five years (e.g. Pascal VOC, ImageCLEF Photo Annotation).
Results of the competitions show that major progress has been achieved in image classification methods. The reason of this, besides the increasing computational capabilities, is the improvement of low-level image features and bag-of-words methods. In my thesis I present the techniques used in state-of-the-art systems, both in the field of low-level image features and high-level semantic descriptors (e.g. the Fisher-vector based on Gaussian Mixture Model and the K-means Super-Vector).
The individual methods provide variable results in different categories, their advantages should be combined. I suggest a method for this, in which we create an aggregated kernel matrix from the different semantic descriptors. This general approach provides opportunity to combine information from any kind of modality, optimally for each category. For the evaluation of the method, also the implementation of new techniques in image classification is required. I test our system on the database of Pascal VOC 2011.