More and more content is being shared on social networks, making this content a significant source of information. Despite this, information retrieval on these systems is highly limited. Web search engines cannot access the content of these pages because they require user authentication, and the social networks themselves offer search only among users, not within the shared content. To solve this problem, I have designed and implemented a search system, with a crawler specialized for content generated on social networks, which returns a ranked list of results for the user's query.
This document briefly summarizes the theory of information retrieval, defining the steps and tasks needed to put it into practice. The two most common models, the Boolean model and the vector space model, are introduced. For the latter, the tf-idf weighting method, which I used in the implementation, is described in more detail.
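As an illustration of tf-idf weighting in the vector space model, the following minimal sketch ranks a few toy documents against a query. It is not the implementation described in this work. The function names (`tokenize`, `tf_idf_vectors`, `cosine`, `search`), the whitespace tokenizer, the raw term frequency, the `log(N / df)` form of idf, and the use of cosine similarity for ranking are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split stands in for real preprocessing.
    return text.lower().split()


def tf_idf_vectors(docs):
    """Return ({term: weight} per document, idf table) using tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    idf = {term: math.log(n_docs / count) for term, count in df.items()}

    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term frequency
        vectors.append({term: tf[term] * idf[term] for term in tf})
    return vectors, idf


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, docs):
    """Return indices of matching documents, best match first."""
    vectors, idf = tf_idf_vectors(docs)
    q_tf = Counter(tokenize(query))
    # Query terms that never occur in the collection are ignored.
    q_vec = {term: q_tf[term] * idf[term] for term in q_tf if term in idf}
    scored = [(cosine(q_vec, vec), i) for i, vec in enumerate(vectors)]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in ranked if score > 0.0]


docs = [
    "concert tickets for sale",
    "photos from the concert last night",
    "new recipe for apple pie",
]
print(search("concert photos", docs))
```

In this toy collection the second document ranks first because it contains both query terms, and the rarer term "photos" carries a higher idf weight than "concert", which occurs in two of the three documents.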
To realize information retrieval on social networks successfully, it is necessary to understand the structure of the content shared on them. For this reason, the theoretical part of this document presents the search methods currently available on social networks, the data structure of shared content, and users' behavior. All of these are needed to develop an efficient information retrieval method.
The implementation is specialized for Facebook. It covers the decomposition of the shared content into documents and the crawling and searching algorithms that operate on them. In designing the algorithms and data structures that provide the search functionality, reusability received particular attention, so that these functions can be adapted more easily to other social networks, not only Facebook.