Development of categorical data analysis software

Dr. Hosszú Gábor
Department of Electron Devices

My task was to create mathematical and statistical software for categorical data in Matlab.

First, I surveyed the fields of data analysis and data mining: what kinds of data types are used in information technology today, and what subtypes and ranges of application they have.

Next, I studied the mathematical side of data analysis, in particular the algorithms and procedures that can be applied to categorical and quantitative data. I gained a deeper understanding of discriminant analysis, cluster analysis and contingency table analysis.

My software is based on contingency table analysis. The first goal was to enable the software to load external vectors, from which it builds a contingency table (of size n×m) with the observed frequencies.
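
As a minimal illustration of this step (a Python sketch, not the Matlab code of the thesis), two categorical vectors of equal length can be cross-tabulated into an n×m table of counts. The function name `contingency_table` and the sample data are assumptions made up for this example.

```python
from collections import Counter


def contingency_table(x, y):
    """Cross-tabulate two equal-length categorical vectors into an n x m table of counts."""
    if len(x) != len(y):
        raise ValueError("both vectors must have the same length")
    rows = sorted(set(x))  # the n distinct categories of the first variable
    cols = sorted(set(y))  # the m distinct categories of the second variable
    counts = Counter(zip(x, y))  # how often each (row, column) pair occurs
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table


# Made-up sample vectors, for illustration only
smoker = ["yes", "no", "no", "yes", "no", "yes", "no", "no"]
sport = ["low", "high", "low", "low", "high", "high", "high", "low"]

rows, cols, table = contingency_table(smoker, sport)
for label, row in zip(rows, table):
    print(label, row)
```

Sorting the category labels only fixes the order of rows and columns; it does not affect any statistic computed from the table.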

From this table, using the chi-square test, I can determine several coefficients (the chi-square coefficient, Tschuprow's T coefficient, etc.). With these coefficients, the software can run a goodness-of-fit test on a population, or a test of independence or homogeneity at a given significance level.
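
The two coefficients named above can be sketched as follows. This is a Python illustration using the textbook definitions, not the thesis implementation: the chi-square statistic sums (observed − expected)² / expected over all cells, with expected counts computed from the row and column totals, and Tschuprow's T is the square root of χ² / (N · √((n−1)(m−1))). The function names and the sample table are assumptions, and the sketch assumes that no row or column total is zero.

```python
import math


def chi_square_statistic(table):
    """Return (chi-square statistic, total count) for an n x m table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count of the cell if the two variables were independent
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2, total


def tschuprow_t(table):
    """Tschuprow's T: sqrt(chi2 / (N * sqrt((n - 1) * (m - 1))))."""
    chi2, total = chi_square_statistic(table)
    n, m = len(table), len(table[0])
    return math.sqrt(chi2 / (total * math.sqrt((n - 1) * (m - 1))))


# Made-up 2 x 2 table of observed counts
example = [[10, 20], [20, 10]]
chi2, total = chi_square_statistic(example)
t = tschuprow_t(example)
print(chi2, t)
```

T lies between 0 and 1, where 0 corresponds to no association in the table.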

In addition to the coefficients, the software calculates the overall probability, the number of items in the table and the degrees of freedom for the user. To improve the user experience, I added further functions to the main code that calculate the critical chi-square values (which can be used for reference). To better illustrate the distribution of the variables, the software can display them as three-dimensional bar graphs.
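
How a critical chi-square value can be obtained is sketched below in Python, using only the standard library. This is an illustrative approach, not necessarily the one used in the thesis software: the chi-square distribution function is evaluated with the series expansion of the regularized lower incomplete gamma function, and the critical value is found by bisection. The function names are assumptions, and the degrees of freedom use the usual (n−1)(m−1) rule for an n×m contingency table.

```python
import math


def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom of an n x m contingency table."""
    return (n_rows - 1) * (n_cols - 1)


def chi2_cdf(x, df):
    """Chi-square CDF via the series for the regularized lower incomplete gamma function."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term, total, k = 1.0, 1.0, 0
    # Sum z^k / ((a + 1)(a + 2)...(a + k)) until the terms become negligible
    while term > 1e-15 * total:
        k += 1
        term *= z / (a + k)
        total += term
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a + 1))


def chi2_critical(alpha, df):
    """Critical value c with P(X > c) = alpha, found by bisection on the CDF."""
    target = 1.0 - alpha
    lo, hi = 0.0, max(1.0, float(df))
    while chi2_cdf(hi, df) < target:  # widen the bracket until it contains c
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


df = degrees_of_freedom(2, 2)
critical = chi2_critical(0.05, df)
print(df, round(critical, 3))
```

A test statistic larger than the critical value leads to rejecting the null hypothesis at the chosen significance level. The series is adequate for the moderate values typical of such tables; a production implementation would normally rely on a tested statistics library.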

I implemented the functions described above in such a way that the user does not need Matlab's mathematical command interface; the analysis can be run from a task-oriented graphical interface.

I tested the software with six vectors that I generated exclusively for testing, and compared the results with those of an online contingency table application. This comparison confirms the correct operation of the software.

I also compared the software with other similar mathematical and statistical packages, and outlined the possible directions for further development.

