Social data enrichment

OData support
Nagy István
Department of Telecommunications and Media Informatics

In recent years, data mining solutions based on information gathered from social networks have gained considerable ground. Analysing written content is a key part of building personal profiles, which make it possible to learn about a person in detail and to predict their behavior. Social networks on the Internet are an inexhaustible source for this kind of activity. The resulting personal profiles can in turn support other, more complex data mining projects.

The goal of my thesis is to gather data from social networks in order to create personal profiles that match the information in the given input. Both the processing and the data mining are specialized for English and Hungarian profiles. My task was to gather and analyse information, uncover relationships, and describe potential directions for further development.

Three social network sites were used during the data mining process: Twitter, LinkedIn and Instagram. The project was a good opportunity to become familiar with these sites, the structure of their user profiles, and the methods for accessing their information. I designed and implemented data mining software that gathers and sorts information from all three sites. I then applied data preparation steps to improve the quality of the gathered information, taking care to keep only the profiles that contain a sufficient amount of information and to filter out the rest.
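The abstract does not state how "sufficient quantity of information" is measured. A minimal sketch of such a completeness filter could look like the following; the `Profile` class, the field names (`location`, `bio`) and the threshold of three filled fields are hypothetical choices for illustration, not the criteria used in the thesis.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    # Hypothetical profile record; the real schema differs per site.
    source: str  # e.g. "twitter", "linkedin", "instagram"
    name: str
    fields: dict = field(default_factory=dict)


def is_informative(profile, required=("location", "bio"), min_filled=3):
    """Assumed rule: every required field is non-empty and at least
    `min_filled` fields in total carry a non-blank value."""
    filled = {k for k, v in profile.fields.items() if v and str(v).strip()}
    return all(r in filled for r in required) and len(filled) >= min_filled


def filter_profiles(profiles, **kwargs):
    """Keep only the profiles that pass the completeness check."""
    return [p for p in profiles if is_informative(p, **kwargs)]
```

Requiring a few identifying fields, rather than just counting them, keeps a profile with many empty or decorative fields from passing the filter.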

I created formulas to uncover relationships, which help connect the gathered information with the input name. Using heuristics, I ensured that the software can decide between candidates when several profiles match the input name at the same time. Next, I tested the efficiency of my data enrichment software on real data and located occasional weak points and bottlenecks. By comparing several alternatives, I found the most suitable way to connect the data to the given person.
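The formulas and heuristics themselves are not given in the abstract. As a purely illustrative sketch, the snippet below scores candidate profiles against the input name with a Jaccard similarity over accent-stripped name tokens, so that the Hungarian family-name-first order ("Nagy István") and the English order ("Istvan Nagy") match. When several candidates pass the threshold, it adds a bonus for each attribute that agrees with known context and refuses to guess on a tie. The function names, the 0.5 threshold and the bonus rule are all hypothetical, not the method developed in the thesis.

```python
import unicodedata


def normalize(name: str) -> frozenset:
    """Lower-case, strip accents (á -> a, ő -> o) and split into tokens."""
    decomposed = unicodedata.normalize("NFKD", name.casefold())
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return frozenset(stripped.split())


def name_score(input_name: str, candidate_name: str) -> float:
    """Jaccard similarity of the token sets; word order is ignored."""
    a, b = normalize(input_name), normalize(candidate_name)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def pick_profile(input_name, candidates, context=None, threshold=0.5):
    """Hypothetical heuristic for several profiles matching one name.

    candidates: list of dicts with a "name" key and optional attributes.
    context: attributes already known about the person,
             e.g. {"location": "Budapest"}.
    Returns the single best candidate, or None if it cannot decide.
    """
    context = context or {}
    scored = []
    for c in candidates:
        s = name_score(input_name, c["name"])
        if s < threshold:
            continue  # name too different: not a candidate at all
        # Each context attribute that agrees adds one bonus point.
        bonus = sum(1 for k, v in context.items()
                    if str(c.get(k, "")).casefold() == str(v).casefold())
        scored.append((s + bonus, c))
    if not scored:
        return None
    scored.sort(key=lambda t: t[0], reverse=True)
    if len(scored) > 1 and scored[0][0] == scored[1][0]:
        return None  # tie: refuse to guess rather than attach wrong data
    return scored[0][1]
```

Returning `None` on a tie reflects one possible design choice: for enrichment, leaving a person unmatched is usually cheaper than attaching another person's data.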

Finally, I reviewed possible directions for developing the finished software further, covering the gathering module, the data preparation and the analysis itself. I proposed fixes for the deficiencies and inaccuracies uncovered in the prototype, as well as ways to increase its effectiveness.


