Automatic selection of optimal clustering or classification algorithm and parametrization based on dataset features

OData support
Dr. Pataki Béla
Department of Measurement and Information Systems

The evolution of informatics has led to ever-growing datasets. Understanding them comprehensively and extracting useful information from them has become a hard task that calls for automation. One such task is the classification of the instances in a dataset, a procedure that assigns each instance (or case) to a class. Grouping similar instances into classes greatly helps in discovering regularities in the data and in making predictions. Classifying complex datasets has proved hard even for human experts. One reason is the sheer quantity of data. Another is that many automatic classification methods have been developed over the years, and each works effectively on different types of datasets. This created the need for an expert system that can fully automate the selection of the optimal algorithm, or at least support that selection with automated procedures.

My thesis covers the design, implementation and testing of such a system, which can automate the selection of a classification algorithm for a given input dataset. Important design criteria were the system's extensibility and portability. Since the system's main purpose is to select between algorithms, the algorithms themselves were taken from publicly available libraries.
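As a rough illustration of how a selection "based on dataset features" can work, the following minimal sketch uses a simple meta-learning approach. It computes a few meta-features of the input dataset and recommends the algorithm that performed best on the most similar previously seen dataset (nearest neighbour in meta-feature space). The chosen meta-features, the `KNOWLEDGE_BASE` entries and the algorithm names are hypothetical placeholders. This is not the method implemented in the thesis.

```python
import math
from collections import Counter


def meta_features(X, y):
    """Compute a few simple dataset meta-features (an illustrative choice).

    Assumes a non-empty dataset: X is a list of feature rows, y the class labels.
    """
    n_instances = len(X)
    n_features = len(X[0])
    counts = Counter(y)
    # Shannon entropy of the class distribution, in bits
    entropy = -sum((c / n_instances) * math.log2(c / n_instances)
                   for c in counts.values())
    return {
        "log_instances": math.log10(n_instances),
        "log_features": math.log10(n_features),
        "n_classes": float(len(counts)),
        "class_entropy": entropy,
    }


# Hypothetical knowledge base: meta-features of previously evaluated datasets,
# each paired with the algorithm that (hypothetically) performed best on it.
KNOWLEDGE_BASE = [
    ({"log_instances": 2.0, "log_features": 0.6,
      "n_classes": 2.0, "class_entropy": 1.0}, "k-nearest neighbours"),
    ({"log_instances": 4.5, "log_features": 1.2,
      "n_classes": 2.0, "class_entropy": 0.5}, "decision tree"),
    ({"log_instances": 3.0, "log_features": 3.0,
      "n_classes": 10.0, "class_entropy": 3.3}, "linear SVM"),
]


def distance(a, b):
    """Euclidean distance between two meta-feature dictionaries."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))


def recommend(X, y, knowledge_base=KNOWLEDGE_BASE):
    """Return the algorithm of the nearest known dataset in meta-feature space."""
    mf = meta_features(X, y)
    _, best_algorithm = min(knowledge_base,
                            key=lambda entry: distance(mf, entry[0]))
    return best_algorithm


X = [[0.1, 1.0], [0.2, 0.9], [0.9, 0.1], [0.8, 0.2]]
y = [0, 0, 1, 1]
print(recommend(X, y))  # prints "k-nearest neighbours"
```

The meta-features are deliberately unscaled to keep the sketch short. A practical meta-learner would normalise them so that, for example, the number of classes does not dominate the distance. It would also fill the knowledge base with measured results, such as cross-validated accuracies of library implementations on benchmark datasets.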

