Improvement of Data Cleansing Algorithms

Dr. Szikora Béla
Department of Electronics Technology

High-quality data gives a company a competitive edge. Achieving it is often a lengthy process, one part of which is the retroactive cleansing of existing data. Completing this task efficiently requires different procedures depending on the type of data.

In this thesis I set out to develop and present procedures that can be used effectively for the following data cleaning problems: correcting company names, person names, and addresses, and filtering out duplicate records. Doing so requires familiarity with the typical data quality problems and with the basic, widely used data cleaning processes that counter them.

After a close study of these subjects (cleaning of company names; correction of person names and of other elements only loosely connected to them; basic cleaning of addresses from the UK and Germany; duplicate detection based on partial matching), I developed algorithms in PL/SQL and tested them on a representative sample to verify their operation.
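
To give a feel for what duplicate detection based on partial matching can look like, here is a minimal sketch in Python. It is not the thesis's PL/SQL implementation. The legal-form list, the function names `normalize_company` and `is_probable_duplicate`, and the 0.85 threshold are all assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical, deliberately short list of legal-form tokens (assumption);
# a production list would cover many more forms per country.
LEGAL_FORMS = {"ltd", "limited", "gmbh", "ag", "plc", "inc", "co"}


def normalize_company(name: str) -> str:
    """Lowercase the name, replace punctuation with spaces, drop legal-form tokens."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_FORMS)


def is_probable_duplicate(a: str, b: str, threshold: float = 0.85) -> bool:
    """Flag two names as probable duplicates when their normalized forms
    have a similarity ratio at or above the (assumed) threshold."""
    ratio = SequenceMatcher(None, normalize_company(a), normalize_company(b)).ratio()
    return ratio >= threshold


if __name__ == "__main__":
    print(is_probable_duplicate("Acme Widgets Ltd.", "ACME Widgets Limited"))
    print(is_probable_duplicate("Acme Widgets Ltd.", "Globex GmbH"))
```

Normalizing before comparing keeps differences in case, punctuation, and legal form from hiding an otherwise identical name, while the similarity ratio tolerates small spelling variations.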

