Privacy Preserving Data Mining

Supervisor:
Dr. Szűcs Gábor
Department of Telecommunications and Media Informatics

Today we know that all data is worth storing, since any data may carry useful information and the knowledge retrieved from it can be beneficial if the data is handled intelligently. As a result, a tremendous quantity of data is available to different organizational units, and various data mining tasks can be carried out on these huge databases, yielding useful models. However, handling and using such data and models can conflict with privacy protection laws. The data cannot be published in its original form and must therefore be anonymized: expectations regarding the privacy of individuals' data stored in databases are highly topical today.

Within the scope of my thesis I present the possibilities of storing data in data warehouses, the privacy-preserving methods I have studied in the professional literature, and the implementation and testing of these methods on a publicly available dataset. By combining the anonymization methods, I designed and implemented my own method, which brings together the advantages of the individual algorithms. I evaluate the efficiency of these methods with accuracy measures of the resulting classification models and by quantifying the privacy achieved. The central question of my research (and of privacy-preserving data mining in general) is whether an accurate model can be built while preserving privacy, for example by distorting the data. I show that decision trees can be built on perturbed datasets with only a slight loss of classification accuracy, so anonymizing datasets is worthwhile when data mining results are to be published.
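The central question above can be illustrated with a small experiment. The following Python sketch is not the method developed in the thesis: it assumes additive uniform noise as the data-distortion step, replaces a full decision tree with a one-level tree (a decision stump), and uses a synthetic dataset with a hypothetical class rule (label 1 when x > 50) instead of the public dataset. The names `perturb`, `fit_stump`, `accuracy` and `experiment` are illustrative only. As a simple privacy measure it prints 2a, the width of the interval in which the original value is only known to lie when the noise is drawn from [-a, a].

```python
import random


def perturb(values, half_width, rng):
    """Data distortion (assumed): add independent uniform noise from [-half_width, half_width]."""
    return [v + rng.uniform(-half_width, half_width) for v in values]


def fit_stump(xs, ys):
    """Fit a one-level decision tree with the rule 'predict 1 if x > t'.

    Sweeps every split point of the sorted data and keeps the threshold t
    with the highest training accuracy.
    """
    pairs = sorted(zip(xs, ys))
    ones_right = sum(y for _, y in pairs)  # initially every point is predicted 1
    zeros_left = 0
    best_correct, best_t = ones_right, pairs[0][0] - 1.0
    for i in range(1, len(pairs) + 1):
        x_prev, y_prev = pairs[i - 1]  # this point moves to the "predict 0" side
        if y_prev == 0:
            zeros_left += 1
        else:
            ones_right -= 1
        correct = zeros_left + ones_right
        if correct > best_correct:
            # place t midway between the last point predicted 0 and the first predicted 1
            t = x_prev if i == len(pairs) else (x_prev + pairs[i][0]) / 2
            best_correct, best_t = correct, t
    return best_t


def accuracy(t, xs, ys):
    """Fraction of points classified correctly by the stump with threshold t."""
    return sum((x > t) == bool(y) for x, y in zip(xs, ys)) / len(xs)


def experiment(half_width, n=2000, seed=0):
    """Train on perturbed values, evaluate on clean held-out values (synthetic data)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 100) for _ in range(n)]
    ys = [1 if x > 50 else 0 for x in xs]  # hypothetical class rule
    split = 3 * n // 4
    train_x = perturb(xs[:split], half_width, rng)  # only the training data is distorted
    t = fit_stump(train_x, ys[:split])
    return accuracy(t, xs[split:], ys[split:])


if __name__ == "__main__":
    for a in (0.0, 10.0, 25.0):
        # with noise from [-a, a], the original value is only known to lie in a width-2a interval
        print(f"noise half-width {a:5.1f}  privacy interval {2 * a:5.1f}  "
              f"test accuracy {experiment(a):.3f}")
```

Because the assumed noise is symmetric, the best split found on the perturbed values tends to stay close to the true threshold, which is the intuition behind a small accuracy loss; the exact figures printed depend on the seed and the noise level.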
