When annotating multimedia content, it is very useful to have a system that can automatically annotate and classify individual content pieces. This paper introduces such a system, which processes videos, classifies them by genre, detects advertisements in them, and builds indices that support subsequent search.
The first step of the implementation is to split the video into its visual and audio tracks and then segment it into scenes, which form the basis of the subsequent detection and annotation tasks. The system also expects a text-based summary of the video's content, from which it builds the indices used for text-based video search.
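The paper does not specify how scene boundaries are found; a minimal sketch, assuming each scene cut is detected by thresholding the difference between consecutive frames (the difference scores themselves would come from the actual video decoder), might look like:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal scene-segmentation sketch: a new scene starts wherever the
// difference between consecutive frames exceeds a fixed threshold.
// The frame-difference scores are plain doubles here for illustration.
public class SceneSegmenter {

    // Returns the frame indices at which a new scene begins
    // (frame 0 always starts the first scene).
    public static List<Integer> sceneStarts(double[] frameDiffs, double threshold) {
        List<Integer> starts = new ArrayList<>();
        starts.add(0);
        for (int i = 1; i < frameDiffs.length; i++) {
            if (frameDiffs[i] > threshold) {
                starts.add(i);
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        double[] diffs = {0.0, 0.05, 0.82, 0.03, 0.91, 0.02};
        // The two large spikes mark scene boundaries.
        System.out.println(sceneStarts(diffs, 0.5));
    }
}
```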
For the genre-based scene classification I used the visual features of the frames as well as audio features extracted from the audio track with Matlab functions. To describe the audio I calculated thirty-five feature vectors, and for each of them the mean value was used as input to the classification. To connect Matlab with my system, which is written in Java, I used MatlabConsolCtr, a freely usable Java library.
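The aggregation step above can be sketched in a few lines: each of the thirty-five audio features yields one value per frame, and the per-feature mean over the scene becomes the classifier input. The feature extraction itself (done in Matlab in the paper) is outside this snippet.

```java
// Collapses a [feature][frame] matrix into one mean value per feature,
// as used for the audio side of the genre classification.
public class AudioFeatureAggregator {

    public static double[] meanPerFeature(double[][] featureFrames) {
        double[] means = new double[featureFrames.length];
        for (int f = 0; f < featureFrames.length; f++) {
            double sum = 0.0;
            for (double v : featureFrames[f]) {
                sum += v;
            }
            means[f] = sum / featureFrames[f].length;
        }
        return means;
    }

    public static void main(String[] args) {
        // Two toy features over four frames stand in for the real thirty-five.
        double[][] frames = {{1.0, 2.0, 3.0, 4.0}, {0.5, 0.5, 0.5, 0.5}};
        System.out.println(java.util.Arrays.toString(meanPerFeature(frames)));
    }
}
```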
To implement the genre-based classification I used ensemble learning, choosing the Random Forest algorithm. Before classification, the video can optionally be submitted to advertisement detection, which is done by template-based matching. If advertisement detection is active, scenes detected as advertisements are not classified by the Random Forest.
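The paper does not detail the template-matching criterion; a minimal sketch, assuming each scene is reduced to a grayscale key-frame vector and compared against known advertisement templates by mean absolute pixel difference, could look like:

```java
// Template-based advertisement detection sketch: a scene is flagged as
// an advertisement if its key frame is close enough to any stored template.
public class AdDetector {

    // Mean absolute pixel difference between a key frame and a template.
    static double distance(double[] frame, double[] template) {
        double sum = 0.0;
        for (int i = 0; i < frame.length; i++) {
            sum += Math.abs(frame[i] - template[i]);
        }
        return sum / frame.length;
    }

    public static boolean isAdvertisement(double[] keyFrame,
                                          double[][] templates,
                                          double maxDistance) {
        for (double[] t : templates) {
            if (distance(keyFrame, t) <= maxDistance) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        double[][] templates = {{0.9, 0.9, 0.1, 0.1}};
        // A frame near the template, then one far from it.
        System.out.println(isAdvertisement(new double[]{0.88, 0.92, 0.1, 0.12}, templates, 0.05));
        System.out.println(isAdvertisement(new double[]{0.1, 0.1, 0.9, 0.9}, templates, 0.05));
    }
}
```

Scenes flagged this way would simply be skipped when the remaining scenes are passed to the Random Forest.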
The application also supports searching the stored information, either by text or by picture. In a text-based search the system looks for videos containing the search expression given by the user; the result is the list of documents containing the expression, or part of it, ordered by descending relevance. In a picture-based search the system extracts features from the picture given by the user, compares them to the stored features of the already processed videos, and returns the scenes with the highest similarity.
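The picture-based search amounts to ranking stored scene feature vectors by similarity to the query. A minimal nearest-neighbour sketch follows; the Euclidean distance metric is an assumption, as the paper does not name the similarity measure used:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Ranks stored scenes by feature-space distance to a query picture.
public class PictureSearch {

    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Returns the scene indices ordered from most to least similar.
    public static List<Integer> rankScenes(double[] query, double[][] sceneFeatures) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < sceneFeatures.length; i++) {
            order.add(i);
        }
        order.sort(Comparator.comparingDouble(i -> euclidean(query, sceneFeatures[i])));
        return order;
    }

    public static void main(String[] args) {
        double[][] scenes = {{1.0, 0.0}, {0.0, 1.0}, {0.9, 0.1}};
        // Scene 0 matches the query exactly, scene 2 is close, scene 1 is far.
        System.out.println(rankScenes(new double[]{1.0, 0.0}, scenes));
    }
}
```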