The application of information theory based methods in data mining using a Bayesian approach

Supervisor:
Dr. Hullám Gábor István
Department of Measurement and Information Systems

In the software industry, data mining is a new and rapidly developing field. Its applications range from telecommunications through automation to medical engineering. Its main purpose is to process large data sets and, in doing so, discover new, non-trivial dependencies. Various methods and metrics can be used to find these dependencies; one of them is mutual information, a quantity from information theory.

Mutual information is a symmetric measure which, as its name suggests, quantifies how much information two variables share. Like many other estimated quantities, its quality depends heavily on the size of the input data set, and on small samples it can be inaccurate.
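To make the small-sample problem concrete, the Java sketch below computes the plug-in (maximum-likelihood) estimate I(X;Y) = Σ p(x,y) log[p(x,y) / (p(x) p(y))] from a contingency table of joint counts, in nats. The class and method names are hypothetical and are not taken from the thesis implementation.

```java
/**
 * Plug-in (maximum-likelihood) estimate of mutual information, in nats.
 * Hypothetical helper for illustration; not the thesis implementation.
 */
final class PlugInMutualInformation {

    /** counts[i][j] = number of samples with X = i and Y = j; assumes a rectangular table with a non-zero total. */
    static double estimate(int[][] counts) {
        int rows = counts.length;
        int cols = counts[0].length;
        double[] rowSum = new double[rows];
        double[] colSum = new double[cols];
        double total = 0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                rowSum[i] += counts[i][j];
                colSum[j] += counts[i][j];
                total += counts[i][j];
            }
        }
        double mi = 0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                if (counts[i][j] == 0) {
                    continue; // 0 * log 0 is taken as 0
                }
                double pJoint = counts[i][j] / total;
                // p(x,y) / (p(x) p(y)) simplifies to n_ij * n / (n_i * n_j)
                mi += pJoint * Math.log(counts[i][j] * total / (rowSum[i] * colSum[j]));
            }
        }
        return mi;
    }

    public static void main(String[] args) {
        // Perfectly dependent binary variables: I = ln 2
        System.out.println(estimate(new int[][] {{5, 0}, {0, 5}}));
        // Empirically independent table: I = 0
        System.out.println(estimate(new int[][] {{5, 5}, {5, 5}}));
        // A small table that does not factorize exactly: I > 0
        System.out.println(estimate(new int[][] {{2, 1}, {1, 2}}));
    }
}
```

The plug-in estimate is zero only when the empirical joint distribution factorizes exactly; for a small sample drawn from independent variables this rarely happens, so the estimate is usually strictly positive, which is one form of the small-sample inaccuracy described above.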

One possible remedy is the Bayesian approach, in which the quantity being estimated is treated as a random variable, while the data are regarded as given and fixed.
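One known closed form in this spirit, due to Hutter, gives the posterior mean of mutual information when the cell probabilities have a Dirichlet posterior: E[I] = (1/n) Σ_ij n_ij [ψ(n_ij + 1) − ψ(n_i+ + 1) − ψ(n_+j + 1) + ψ(n + 1)], where n_ij are the observed counts plus prior pseudo-counts, n_i+ and n_+j are the row and column sums, n is the total, and ψ is the digamma function. The sketch below assumes a symmetric prior with one pseudo-count `alpha` per cell. It is not necessarily the exact method evaluated in the thesis, and the names are hypothetical.

```java
/**
 * Posterior mean of mutual information under a Dirichlet posterior (Hutter's closed form), in nats.
 * Hypothetical helper; assumes a symmetric prior with pseudo-count alpha added to every cell.
 */
final class BayesianMutualInformation {

    /** Digamma function for x > 0: shift x up with the recurrence, then use the asymptotic series. */
    static double digamma(double x) {
        double result = 0;
        while (x < 6) {
            result -= 1.0 / x; // psi(x) = psi(x + 1) - 1/x
            x += 1;
        }
        double inv = 1.0 / x;
        double inv2 = inv * inv;
        // ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6) + 1/(240x^8) - 1/(132x^10)
        result += Math.log(x) - 0.5 * inv
                - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 * (1.0 / 252
                - inv2 * (1.0 / 240 - inv2 / 132))));
        return result;
    }

    /**
     * E[I | data]. counts[i][j] are observed joint counts; alpha is the per-cell prior count.
     * alpha = 0 uses the observed counts alone and is only meaningful when no cell is empty.
     */
    static double posteriorMean(int[][] counts, double alpha) {
        int rows = counts.length;
        int cols = counts[0].length;
        double[][] n = new double[rows][cols];
        double[] rowSum = new double[rows];
        double[] colSum = new double[cols];
        double total = 0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                n[i][j] = counts[i][j] + alpha; // posterior Dirichlet parameters
                rowSum[i] += n[i][j];
                colSum[j] += n[i][j];
                total += n[i][j];
            }
        }
        double sum = 0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                sum += n[i][j] * (digamma(n[i][j] + 1) - digamma(rowSum[i] + 1)
                        - digamma(colSum[j] + 1) + digamma(total + 1));
            }
        }
        return sum / total;
    }

    public static void main(String[] args) {
        int[][] table = {{1, 1}, {1, 1}};
        // The plug-in estimate is 0 for this table; the posterior mean is 1/12 nats
        System.out.println(posteriorMean(table, 0.0));
        // With one pseudo-count per cell
        System.out.println(posteriorMean(table, 1.0));
    }
}
```

For the 2×2 table with one observation per cell and `alpha = 0`, the plug-in value is exactly 0 while this posterior mean is 1/12 ≈ 0.083 nats, reflecting the remaining uncertainty about the true dependence. As the counts grow, the two estimates converge.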

This document presents a previously proposed method for calculating mutual information within the Bayesian approach, describes an implementation on the Java platform, and evaluates its possible uses in structure learning on generated sample data sets.
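As one illustration of how pairwise mutual information can feed into structure learning, the classic Chow-Liu procedure keeps the maximum-weight spanning tree over the variables, using mutual information as edge weights. The abstract does not say which structure-learning algorithm was evaluated, so this Java sketch (Prim's algorithm on a precomputed mutual information matrix, with hypothetical names) only shows the general idea.

```java
import java.util.Arrays;

/**
 * Chow-Liu style tree: maximum-weight spanning tree over variables, using
 * pairwise mutual information as edge weights (Prim's algorithm, O(n^2)).
 * Hypothetical helper for illustration.
 */
final class ChowLiuTree {

    /** mi is a symmetric matrix of pairwise MI values; returns parent[v], with root 0 having parent -1. */
    static int[] build(double[][] mi) {
        int n = mi.length;
        int[] parent = new int[n];
        double[] best = new double[n]; // strongest known link from the tree to each variable
        boolean[] inTree = new boolean[n];
        Arrays.fill(parent, -1);
        Arrays.fill(best, Double.NEGATIVE_INFINITY);
        best[0] = 0; // start the tree at variable 0
        for (int step = 0; step < n; step++) {
            // Pick the variable outside the tree with the strongest link to it
            int u = -1;
            for (int v = 0; v < n; v++) {
                if (!inTree[v] && (u == -1 || best[v] > best[u])) {
                    u = v;
                }
            }
            inTree[u] = true;
            // Update the best links of the remaining variables through u
            for (int v = 0; v < n; v++) {
                if (!inTree[v] && mi[u][v] > best[v]) {
                    best[v] = mi[u][v];
                    parent[v] = u;
                }
            }
        }
        return parent;
    }

    public static void main(String[] args) {
        double[][] mi = {
            {0.0, 0.5, 0.1},
            {0.5, 0.0, 0.3},
            {0.1, 0.3, 0.0}
        };
        // Edges 0-1 (0.5) and 1-2 (0.3) are kept; 0-2 (0.1) is dropped
        System.out.println(Arrays.toString(build(mi))); // prints [-1, 0, 1]
    }
}
```

The matrix could be filled with either the plug-in or a Bayesian estimate; with small samples the choice of estimator can change which edges survive.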
