Today there is a growing trend of online social network operators sharing sensitive information about their users and relationships with advertising companies, application developers and researchers. In order to protect users’ privacy the given information is anonymized, i.e., the names and other identifiers are sanitized, and the network structure is slightly modified. Unfortunately, this level of anonimization can be bypassed, as recent experiments show that data provided this way can be efficiently de-anonymized.
A social network can be represented with a graph structure where nodes of the graph represent people or other social entities and edges represent their relationships. There are several algoritmhs that can de-anonymize data only using the structural information of the graph, some even at a large-scale. In this thesis, I describe multiple de-anonymization attacks and analyze them in order to choose the state-of-the-art algorithm.
Then I analize the fault tolerance of the selected attack, and discuss related features, including my measurements concerning these features. This algorithm has two phases, the first one being an initialization phase, for which I propose various improved alternatives in my work, and after comparing them with the original procedure I show the new methods’ benefits.
Results of my experiments show that although, regardless of the initialization method, the success of the attack is primarily determined by network topology, however, the initialization algorithm does matter, and with the use of my proposed procedures the „cost” of the method can be reduced significantly. The number of de-anonymized nodes needed during initialization can be reduced to fifth-tenth of the original, as it is shown by the comparison of the Top- and the original method. Finally, I recommend further improvements to the algorithm and propose future work for further analysis in the area.