Application of semantic technologies to design and implement data warehouses

Dr. Gajdos Sándor
Department of Telecommunications and Media Informatics

Data analytics is one of the most important and rapidly developing topics in computing today. Data warehouses, which store data in a historized and integrated manner, are one of the principal components of analytical infrastructure. As data sources become more heterogeneous, both semantically and technically, data integration becomes more complicated. Although integration techniques and methods are already mature, their automation has not been a focus of the technology.

Another relevant topic today is knowledge management: most organizational knowledge is represented informally, so knowledge transfer and distribution are inefficient or almost impossible. This also matters for building analytic systems because, as practice shows, knowledge transfer between business experts and developers, together with semantic data integration, takes up a significant part of data warehouse projects.

The approach presented in my thesis offers a solution to both of these problems. I designed and implemented a system that facilitates the automated generation of the data model and of the loading and transformation procedures for the normalized detailed storage of a data warehouse, based on a formalized semantic description of a business area and the source database. The system uses OWL ontologies as the formalism for describing the subject area and the technical data it needs. The ontologies describing business areas can serve as a basis for an organizational knowledge management infrastructure and can also reduce the resources needed for data warehouse projects. The subject area of my work was dosimetry, a specialized field dealing with radiation and its effects on people and the environment.

In the first part of the thesis I present Inmon's data warehouse architecture and show how it differs from the approaches of Kimball and Linstedt. I also present the most important semantic technologies and their applications in analytic systems, and introduce the most important concepts of dosimetry.

In the second part of the thesis I present the design and implementation of the system: I describe the most important algorithms, techniques and particularities of automating the data model generation and the transformation, and present the ontologies, packages and classes that implement the system. Finally, I evaluate and summarize my work and outline possible improvements to the system.

