Data mining framework and parallel data mining algorithms

Supervisor:
Dr. Dudás Ákos
Department of Automation and Applied Informatics

John Naisbitt's famous phrase from his book Megatrends, "We are drowning in information but starved for knowledge," is perhaps more relevant now than it was in 1982, when the author wrote it.

The role of data has changed over the past few decades, and the industry must change with it to keep up with an ever-changing market. Gaining insight into huge amounts of data (extracting information from it) requires cleverly designed algorithms and immense computational effort.

This thesis examines how certain data mining algorithms can be parallelized, and proposes a fairly simple application framework built on the .NET Framework that can serve as a dynamically extensible host for data mining libraries. Besides the host, we also provide a few libraries with the framework that implement well-known data mining algorithms. Each method comes in a single-threaded and a multi-threaded version, demonstrating how the computational power of multiple processor cores can be leveraged.

Depending on the nature of each algorithm, we also examined its suitability for massive parallelization through general-purpose GPU programming on NVIDIA's CUDA platform.

The proposed framework also lets users try out a selected method on a simple data set, which they enter after choosing the appropriate designer view. The set of designer views is extensible in the same way as the set of methods.

We validated our implementations against data sets available online (e.g., accidents and iris) to make sure that they are correct and work as expected.
