Classification problems in large-scale networks

Supervisor:
Dr. Buza Krisztián Antal
Department of Computer Science and Information Theory

Complex networks are everywhere. Over the last decade, the structural and dynamic properties of real-world complex networks have been studied across many scientific disciplines (e.g. physics, social science, computer science, molecular biology and neurology).

Besides attracting remarkable research interest, complex networks have become mainstream in business intelligence as well. Churn prediction in the telecommunications industry, fraud detection in the financial sector and online behavioral targeting all require social network analysis.

The task of the 2013 KDD Cup, provided by Microsoft Academic Search, challenges participants to determine which papers in an author profile were truly written by that author. The ability to search the literature is fundamental to modern research: academic and industry researchers rely on it to understand what has been published and by whom. The dataset includes more than 50 million publications and 19 million authors, so the task can be viewed as an edge classification problem in a large-scale bipartite graph of authors and papers.

As the captain of the beluga&razgon&ivo team, I present our solution to the author-paper identification challenge, which achieved a Mean Average Precision of 0.976 and ranked 11th on the Private Leaderboard. Although the full Cross Industry Standard Process for Data Mining (CRISP-DM) is followed, the Data Understanding, Data Preparation and Modeling steps are highlighted.
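
For readers unfamiliar with the evaluation metric, Mean Average Precision can be sketched as follows. This is a minimal illustration of the standard definition, not the competition's official scoring code or the team's implementation. The function names, the toy author and paper IDs, and the assumption that each author's candidate papers are submitted as one ranked list are all hypothetical.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[int], relevant: Set[int]) -> float:
    """Average precision of one ranked list of paper IDs.

    `relevant` holds the papers truly written by the author (assumed input).
    Relevant papers missing from `ranked` count as misses.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, paper_id in enumerate(ranked, start=1):
        if paper_id in relevant:
            hits += 1
            # Precision at this rank, accumulated only at relevant positions.
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(
    rankings: Dict[int, List[int]], truth: Dict[int, Set[int]]
) -> float:
    """Mean of the per-author average precision values."""
    if not rankings:
        return 0.0
    scores = [
        average_precision(ranked, truth.get(author_id, set()))
        for author_id, ranked in rankings.items()
    ]
    return sum(scores) / len(scores)


# Toy example with made-up IDs: two authors, each with a ranked paper list.
rankings = {1: [10, 11, 12], 2: [20, 21]}
truth = {1: {10, 12}, 2: {21}}
map_score = mean_average_precision(rankings, truth)
```

In this toy case the first author scores (1/1 + 2/3) / 2 and the second scores 1/2, so the mean is 2/3. A perfect ranking, with every confirmed paper placed ahead of every unconfirmed one, yields 1.0.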
