The main topic of this thesis is privacy-preserving data mining. It opens with a comprehensive analysis of the current state of the field: the reasons the topic is necessary and an introduction to privacy from the legal, social, and information-technology perspectives. Measures of anonymity are then presented in detail from multiple perspectives.
Most of the algorithms and methods currently in use are also covered. First, the randomization methods, both perturbative and non-perturbative, are examined thoroughly. These methods try to protect anonymity through data manipulation and query restrictions. Then the principles and algorithms of secure multiparty computation (SMC) are presented. These methods address distributed data mining on both vertically and horizontally partitioned data, in different settings and for different data mining problems.
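As an illustration of the secure multiparty computation principle described above, the following is a minimal Python sketch of a secure sum over horizontally distributed parties, based on additive secret sharing. It is not taken from the thesis: the names `make_shares`, `secure_sum`, and `MODULUS` are hypothetical, all parties are simulated inside one process, and the inputs are assumed to be non-negative integers whose total stays below the modulus.

```python
import secrets

# Hypothetical public modulus; assumed to be larger than any possible total.
MODULUS = 2**61 - 1


def make_shares(value, n_parties, modulus=MODULUS):
    """Split value into n_parties random shares that sum to value mod modulus."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    # The last share is chosen so that all shares add up to the original value.
    shares.append((value - sum(shares)) % modulus)
    return shares


def secure_sum(private_values, modulus=MODULUS):
    """Simulate the protocol in one process and return the joint total."""
    n = len(private_values)
    # Row i holds the shares that party i hands out, one per party.
    sent = [make_shares(v, n, modulus) for v in private_values]
    # Party j adds the shares it received and publishes only this partial sum.
    partial = [sum(sent[i][j] for i in range(n)) % modulus for j in range(n)]
    # The published partial sums reveal the total but no individual input.
    return sum(partial) % modulus


if __name__ == "__main__":
    # Three parties, each holding a private local count.
    print(secure_sum([120, 45, 310]))
```

In this sketch each party sees only random-looking shares from the others, so a single input stays hidden unless all the other parties pool what they received. A real deployment would also need authenticated channels between the parties, which are omitted here.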
The practical work carried out and documented in the thesis is twofold. The first part is an evaluation of different randomization techniques, based on the level of anonymity they reach and the efficiency of the mining procedure. The measures used, the methods implemented, and the test results are all included in the document.
The second task was to build an SMC data mining system, together with an SMC data mining algorithm, that can solve classification problems in a horizontally distributed environment where multiple parties collaborate on a joint mining project. The test dataset, the testing method, and the results are presented, along with the specification and implementation of both the algorithm and the SMC system. The SMC method in question is a new approach to the problem: it uses K-means as a supervised learning algorithm for classification while preserving a high level of anonymity and providing k-anonymity, where k is a user-defined parameter.
The thesis closes with the conclusions and a summary of the results.