Evaluating clustering algorithms

Supervisor: Kovács Ferenc
Department of Automation and Applied Informatics

Intensive research on clustering algorithms started in the middle of the 20th century. Development has been continuous since then, and a wide variety of clustering algorithms exists today. However, these algorithms differ in the size, dimensionality and type of data that they can handle, so deciding which algorithm best fits a given problem is a non-trivial task. Moreover, parameterizing the algorithm and choosing the distance function can also be challenging.

The topic of this thesis is the introduction, implementation and testing of clustering algorithms. These tasks require a system that can load, display, save and process data. Therefore, the algorithms are implemented in the Knowledge Modeling and Data Mining (KMDM) system, which is data analysis and modeling software. The KMDM is a working application under development at the Department of Automation and Applied Informatics. This thesis is the first to implement clustering algorithms, together with cluster validation and visualization methods, in the KMDM system. As the KMDM is an ideal environment for these tasks, I consider adding clustering to the system very useful.

An advantage of the KMDM system is that it keeps computational details, such as the data flow between the different processes, in the background, and it provides a graphical interface for running projects. In keeping with this design, I took care to implement the routines that are not directly related to clustering in such a way that other tasks can reuse them. When implementing the clustering, I aimed to create an environment that hides the details of the KMDM system, which makes implementing further clustering algorithms easier. Over the past year I developed the clustering algorithms as well as the different means of displaying the results, while focusing on the general goals defined above.
