Anonymization using Spark

Supervisor:
Dr. Dudás Ákos
Department of Automation and Applied Informatics

The technological advances of our time have produced data-driven systems: most of the applications available today collect data from their users, often without the users being aware of it. Most users have grown used to the fact that companies hold their personal data. The problem arises, however, when this data is also passed on to a third party. In that case, the data owner must ensure that data leaves the system only in anonymized form, so that no personal information can be disclosed to any party.

In this thesis I address the subject of data anonymization. I introduce the field of anonymity and the concept of k-anonymity, implement the Mondrian anonymization algorithm, and discuss the challenges of anonymization in Big Data and Fast Data environments.
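
For readers unfamiliar with the technique, the following is a minimal in-memory sketch of the greedy Mondrian idea: recursively cut the data at the median of a quasi-identifier until no cut leaves both halves with at least k records, then generalize each resulting equivalence class to value ranges. It is not the implementation described in this thesis and does not use Spark; the record format, the attribute names, and the functions `mondrian`, `is_k_anonymous` and `generalize` are hypothetical, only numeric quasi-identifiers are handled, and the range normalization across attributes used in the original algorithm is omitted.

```python
from typing import Dict, List, Sequence

Record = Dict[str, float]  # hypothetical record format: attribute name -> numeric value


def is_k_anonymous(groups: List[List[Record]], k: int) -> bool:
    """Every equivalence class (group of indistinguishable records) must hold >= k records."""
    return all(len(g) >= k for g in groups)


def mondrian(records: List[Record], qi: Sequence[str], k: int) -> List[List[Record]]:
    """Greedy Mondrian-style partitioning of a non-empty record list on numeric quasi-identifiers."""
    def span(attr: str) -> float:
        values = [r[attr] for r in records]
        return max(values) - min(values)

    # Try attributes from widest to narrowest range (no normalization, for brevity).
    for attr in sorted(qi, key=span, reverse=True):
        ordered = sorted(records, key=lambda r: r[attr])
        median = ordered[len(ordered) // 2][attr]
        left = [r for r in ordered if r[attr] < median]
        right = [r for r in ordered if r[attr] >= median]
        # A cut is allowable only if both halves still contain at least k records.
        if len(left) >= k and len(right) >= k:
            return mondrian(left, qi, k) + mondrian(right, qi, k)
    # No allowable cut on any attribute: this partition becomes one equivalence class.
    return [records]


def generalize(groups: List[List[Record]], qi: Sequence[str]) -> List[Dict[str, str]]:
    """Replace each quasi-identifier with the [min, max] range of its equivalence class."""
    out = []
    for g in groups:
        bounds = {a: (min(r[a] for r in g), max(r[a] for r in g)) for a in qi}
        for r in g:
            row = {key: str(value) for key, value in r.items()}
            for a, (lo, hi) in bounds.items():
                row[a] = f"[{lo}, {hi}]"
            out.append(row)
    return out


# Usage with made-up (age, zip) records:
data = [{"age": a, "zip": z} for a, z in
        [(23, 1011), (25, 1012), (31, 1013), (35, 1021),
         (41, 1022), (44, 1023), (52, 1031), (58, 1032)]]
groups = mondrian(data, ["age", "zip"], k=2)
print([len(g) for g in groups])               # [2, 2, 2, 2]
print(generalize(groups, ["age", "zip"])[0])  # {'age': '[23, 25]', 'zip': '[1011, 1012]'}
```

Because each recursive call works only on its own partition, the subproblems are independent of each other, which is what makes the algorithm a natural candidate for parallel execution.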

In addition, I examine parallelization and compare different parallelization techniques. I use the Apache Spark framework to implement the Mondrian algorithm and evaluate the advantages and disadvantages of this approach.

As the final result of the thesis, I aim to present an anonymization system that can anonymize large data sets and can also be applied to continuous input data.
