Model-based intelligent data cleaning and analysis

Gönczy László
Department of Measurement and Information Systems

When processing and analysing data from information systems, exploratory data analysis is an important step that lays the groundwork for subsequent statistical examination. Knowledge of the domain under analysis is a prerequisite for efficient data cleaning and analysis. However, the data analyst and the domain expert may be different people.

Therefore, my research focuses on how the knowledge of the data analyst and the domain expert can be used to simplify and speed up the difficult data cleaning process and to facilitate data analysis.

High-level domain models can be captured by ontologies, which also support checking the formal description of these models. In my work I investigate how knowledge of a given domain (e.g. correlations between metrics, knowledge of topology connections) stored in an ontology can support data processing, how further analysis steps can be supported, and how the knowledge base can be extended with information gathered during data analysis.
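As an illustration only, the idea of letting domain knowledge drive data cleaning can be sketched as follows. This is a minimal sketch and not the software described in the thesis: plain Python dictionaries stand in for the ontology, and the names `METRIC_RANGES`, `TOPOLOGY`, `check` and `clean`, as well as the metric and node names, are hypothetical.

```python
# Hypothetical stand-in for ontology content: the valid value range of each metric.
# An upper bound of None means "no upper limit".
METRIC_RANGES = {
    "cpu_usage_percent": (0.0, 100.0),
    "memory_used_mb": (0.0, None),
}

# Hypothetical topology knowledge: which virtual machine runs on which host.
TOPOLOGY = {"vm1": "hostA", "vm2": "hostA"}


def check(record):
    """Return None if the record agrees with the domain knowledge, else a reason."""
    if record["node"] not in TOPOLOGY:
        return "unknown node"
    bounds = METRIC_RANGES.get(record["metric"])
    if bounds is None:
        return "unknown metric"
    value = record["value"]
    if value is None:
        return "missing value"
    low, high = bounds
    if value < low or (high is not None and value > high):
        return "out of range"
    return None


def clean(records):
    """Split records into valid ones and (record, reason) pairs for rejected ones."""
    valid, rejected = [], []
    for record in records:
        reason = check(record)
        if reason is None:
            valid.append(record)
        else:
            rejected.append((record, reason))
    return valid, rejected


if __name__ == "__main__":
    sample = [
        {"node": "vm1", "metric": "cpu_usage_percent", "value": 42.0},
        {"node": "vm1", "metric": "cpu_usage_percent", "value": 140.0},
        {"node": "vm9", "metric": "cpu_usage_percent", "value": 10.0},
    ]
    valid, rejected = clean(sample)
    print(len(valid), [reason for _, reason in rejected])
```

Keeping the rules in data rather than in the cleaning code means the domain expert can extend them without touching the analysis logic, which is the separation of roles the abstract argues for.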

In my research I evaluate a case study of the performance analysis of virtualized infrastructures and IP multimedia subsystems, based on real measurement data. I use graph databases to store the knowledge base and the R statistical environment for data processing. I examine how to clean and prepare data independently of the data source and the database technology in order to explore new information during data analysis.

The implemented software is independent of the data representation and can be used with other analysis tools as well. Moreover, it supports investigating the effect of changes in the analysed system or the measurement methods by providing traceability mechanisms between system models and measurement data.

