User and customer data is sensitive and protected by personality (privacy) rights. As long as the data is stored in a closed system, the data controller (who collected the data) has free access to customer-identifying information (e.g. name, address). However, before the data is passed to a third party (for example, for analysis purposes), it must be anonymized. The essence of anonymization is that sensitive data is not disclosed to the third party.
This issue is particularly important in a big data environment, where anonymization is challenging because the entire data set is never fully available (either because the data arrives continuously, or because the volume is so large that a full scan is infeasible). In practice, the task can be addressed on the Azure platform in such a way that the algorithm never accesses the entire data set but processes it in increments.
At the beginning of my thesis, I examine why anonymization is needed and illustrate this with a concrete case study. I then present the theoretical basis of anonymization (including k-anonymity, l-diversity, and t-closeness) and give a brief overview of the Azure platform.
My main goal is to develop a model and software that is easily customizable and configurable, works with any data set, provides optimal performance, can anonymize data incrementally, and alters the data set as little as possible, so that analyses performed on the anonymized data set are distorted as little as possible.
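
To make the k-anonymity notion mentioned above more concrete, the following minimal Python sketch checks whether a small data set is k-anonymous with respect to a chosen set of quasi-identifiers. It is only an illustration and not the software developed in the thesis: the function name `is_k_anonymous`, the column names, and the sample records are all hypothetical.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    occurs in at least k records (illustrative helper, not thesis code)."""
    # Count how many records share each quasi-identifier combination.
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return all(count >= k for count in groups.values())


# Hypothetical, already generalized sample records.
records = [
    {"age": "30-39", "zip": "11**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "11**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "12**", "diagnosis": "flu"},
]

print(is_k_anonymous(records, ["age", "zip"], 2))  # False
```

The third record is the only one in its group, so the sample is not 2-anonymous. An incremental variant would have to maintain such group counts as new data arrives instead of rescanning the whole data set.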