Complex networks exist everywhere. In the last decade, the structural and dynamic properties
of real-world complex networks have been studied across many scientific disciplines (e.g.,
physics, social science, computer science, molecular biology, and neurology).
Besides attracting remarkable research interest, complex networks have become mainstream in
business intelligence as well. Churn prediction in the telecommunications industry, fraud
detection in the financial sector, and online behavioral targeting all require social network analysis.
The task of the 2013 KDD Cup provided by Microsoft Academic Search challenges
participants to determine which papers in an author profile were truly written by the
author. The ability to search is fundamental to modern research. Academic and industry
researchers rely on it to understand what has been published and by whom. The dataset
includes more than 50 million publications and 19 million authors, so the task can be
considered an edge classification problem in a large-scale bipartite graph.
As the captain of the beluga&razgon&ivo team, I present our solution to the author-paper
identification challenge, which achieves a Mean Average Precision of 0.976 and ranks
11th on the Private Leaderboard. Although the Cross Industry Standard Process for Data
Mining is followed, the Data Understanding, Data Preparation, and Modeling steps are