Developing a Data Mining Framework

Kovács Ferenc
Department of Automation and Applied Informatics

Research into data mining and other machine learning algorithms has attracted increasing interest over the past decade. When implementing these algorithms, developers repeatedly have to rewrite the same general-purpose functions, such as reading input data, measuring performance and handling configuration, which slows down development. One possible solution is a framework designed specifically to take over these tasks.

Combining existing algorithms can produce new, more efficient data mining methods. Data flow models offer a comprehensible way to do this: a model is assembled by connecting stand-alone processing components, which yields a representation at a high level of abstraction.
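As an illustration of the data flow idea, the following is a minimal sketch in Python, not the framework's actual API. The names Node, connect and run are hypothetical and were chosen only for this example. Each node wraps a stand-alone function, and a model is built by connecting nodes to their upstream inputs.

```python
class Node:
    """A stand-alone processing component wrapping a plain function (hypothetical)."""

    def __init__(self, name, func):
        self.name = name
        self.func = func
        self.inputs = []  # upstream nodes whose outputs feed this node

    def connect(self, *upstream):
        """Attach upstream nodes and return self so calls can be chained."""
        self.inputs.extend(upstream)
        return self


def run(node, cache=None):
    """Evaluate a node, evaluating each upstream node at most once."""
    if cache is None:
        cache = {}
    if node.name not in cache:
        args = [run(up, cache) for up in node.inputs]
        cache[node.name] = node.func(*args)
    return cache[node.name]


# Assemble a three-step model: read data, remove duplicates, count rows.
source = Node("source", lambda: [3, 1, 4, 1, 5, 9, 2, 6])
dedup = Node("dedup", lambda rows: sorted(set(rows))).connect(source)
count = Node("count", lambda rows: len(rows)).connect(dedup)

result = run(count)
```

Because each component only sees the outputs of its upstream nodes, the same component can be reused in different models without modification.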

In my thesis, I design and implement a data flow-based, extensible framework, which allows creating, editing and running data-processing models.

Additional processing components can be implemented and imported into the framework easily through the developer API. The framework takes care of recurring tasks such as serialization and connecting components, so developers can focus on implementing the core functionality. Beyond these basic features, the framework provides additional services: metadata handling, validation and performance measurement.
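The division of labour described above could look roughly like the sketch below. It is an assumption made for illustration only: Component, Framework, execute, input_type and Normalize are hypothetical names, not the thesis API. The developer writes only process(), while the host code performs a simple input validation and measures the running time.

```python
import time
from abc import ABC, abstractmethod


class Component(ABC):
    """Hypothetical base class: a developer implements only process()."""

    input_type = object  # declared input type, checked by the framework

    @abstractmethod
    def process(self, data):
        ...


class Framework:
    """Hypothetical host that validates inputs and records run times."""

    def __init__(self):
        self.timings = {}  # component class name -> seconds of the last run

    def execute(self, component, data):
        # Validation: reject inputs that do not match the declared type.
        if not isinstance(data, component.input_type):
            raise TypeError(
                f"{type(component).__name__} expects {component.input_type.__name__}"
            )
        # Performance measurement: time the component's own logic.
        start = time.perf_counter()
        result = component.process(data)
        self.timings[type(component).__name__] = time.perf_counter() - start
        return result


class Normalize(Component):
    """Example component: scales values by the largest element."""

    input_type = list

    def process(self, data):
        top = max(data)
        return [x / top for x in data]


fw = Framework()
scaled = fw.execute(Normalize(), [2, 4, 8])
```

In this sketch the component contains no timing or validation code of its own, which mirrors the goal of leaving only the main functionality to the developer.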

During the development of the framework, I also implemented several data-processing components that can be used in real-world scenarios. Their main purpose was to serve as examples of how to use the API. Another important objective was to adapt the API to developer needs: building these components helped uncover usability issues and further requirements for the API.

Models can be created through an intuitive graphical interface. Configuring a model involves several aspects, which are presented to the user clearly and separately.

