Developing an operator in RapidMiner to manage Python-based analytical methods

Nagy Gábor
Department of Telecommunications and Media Informatics

Two data mining technologies are particularly popular: RapidMiner, the world-leading open source system for data mining, and Python, a general-purpose high-level programming language with a wide range of data mining modules. My task was to combine these two technologies in order to provide a new approach to data mining with RapidMiner. I had to find an interface that bridges RapidMiner and Python, so that RapidMiner can integrate widely used Python modules into its data mining processes.

My work consisted of finding the right technology for an appropriate interface between the two. Developing a RapidMiner operator in Java and developing the corresponding objects in Python were also core parts of the task. In addition, because the approach is new and the results were to be published, my work included correspondence with foreign developers of RapidMiner and of other projects. During my work I took advantage of the open source code of RapidMiner, the source code of another open source project called Pyrolite, and the analytical methods of a Python machine learning library called scikits-learn. As a result, RapidMiner's data mining features have been expanded, missing functions can now be implemented, and, thanks to the high performance of these third-party CPython modules, execution time may decrease. Because the project and its documentation will be published, the documentation can serve as valuable know-how and the project as a useful extension of RapidMiner's features.
