The goal of my thesis was to compare local causal discovery algorithms and to apply them to real-world data.
Finding the optimal structure from observational data is NP-hard, so I used a heuristic Bayesian score-based algorithm and compared its results with those of chi-square statistical tests. Local causal discovery consists of two steps. First, the dependencies between the variables are discovered with chi-square tests or score-based algorithms. Second, the causal direction of these dependencies is identified. I implemented three non-exclusive algorithms for this purpose: the constraint-based LCD and V-structure finder tests, and the Y-structure finder algorithm. I also made it possible to incorporate a priori knowledge.
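The two steps can be illustrated with a minimal Python sketch. It is not the implementation from the thesis: the function names, the list-of-dicts record format, the 0.05 significance level and the toy data are all assumptions made for illustration. The sketch pairs a Pearson chi-square test (optionally stratified by a conditioning variable) with the LCD rule as it is usually stated, where `w` is known a priori to have no measured causes, and with a V-structure check.

```python
import math
from collections import Counter, defaultdict
from itertools import product

def chi2_sf(stat, df):
    """Upper tail probability of a chi-square distribution, integer df >= 1."""
    if stat <= 0:
        return 1.0
    y = stat / 2.0
    # Closed forms for df = 1 and df = 2, then the recurrence
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    a = 0.5 if df % 2 else 1.0
    q = math.erfc(math.sqrt(y)) if df % 2 else math.exp(-y)
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)

def chi2_stat(pairs):
    """Pearson chi-square statistic and degrees of freedom for (x, y) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    rows = Counter(x for x, _ in pairs)
    cols = Counter(y for _, y in pairs)
    stat = 0.0
    for x, y in product(rows, cols):
        expected = rows[x] * cols[y] / n
        stat += (joint[(x, y)] - expected) ** 2 / expected
    return stat, (len(rows) - 1) * (len(cols) - 1)

def independent(data, x, y, given=None, alpha=0.05):
    """True if the test does not reject independence of x and y.

    With `given`, statistics and degrees of freedom are summed over the
    strata of the conditioning variable.
    """
    strata = defaultdict(list)
    for rec in data:
        key = rec[given] if given is not None else None
        strata[key].append((rec[x], rec[y]))
    stat, df = 0.0, 0
    for pairs in strata.values():
        s, d = chi2_stat(pairs)
        stat, df = stat + s, df + d
    if df == 0:  # a constant variable gives no evidence of dependence
        return True
    return chi2_sf(stat, df) > alpha

def lcd(data, w, x, y, alpha=0.05):
    """LCD rule: w has no measured causes (assumed a priori knowledge).

    Supports 'x causes y' if w-x and x-y are dependent and w is
    independent of y given x.
    """
    return (not independent(data, w, x, alpha=alpha)
            and not independent(data, x, y, alpha=alpha)
            and independent(data, w, y, given=x, alpha=alpha))

def v_structure(data, x, z, y, alpha=0.05):
    """Supports x -> z <- y: both parents depend on z, are marginally
    independent of each other, and become dependent given z."""
    return (not independent(data, x, z, alpha=alpha)
            and not independent(data, y, z, alpha=alpha)
            and independent(data, x, y, alpha=alpha)
            and not independent(data, x, y, given=z, alpha=alpha))

# Toy chain W -> X -> Y built from exact counts (no randomness).
counts = [(0, 0, 0, 64), (0, 0, 1, 16), (0, 1, 0, 4), (0, 1, 1, 16),
          (1, 0, 0, 16), (1, 0, 1, 4), (1, 1, 0, 16), (1, 1, 1, 64)]
chain = [{"W": w, "X": x, "Y": y} for w, x, y, c in counts for _ in range(c)]

# Toy collider X -> Z <- Y with Z = X OR Y and independent, uniform X and Y.
collider = [{"X": x, "Y": y, "Z": x | y}
            for x, y in product((0, 1), repeat=2) for _ in range(50)]

chain_result = lcd(chain, "W", "X", "Y")
collider_result = v_structure(collider, "X", "Z", "Y")
```

On these toy data sets the LCD rule supports X causing Y in the chain, and the V-structure check supports X and Y both pointing into Z in the collider. On real data the outcome depends on sample size and on the chosen significance level, and a score-based dependency search could replace `independent` in the first step.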
My goal was to combine these algorithms in a single compact, well-optimized, parallelized program with an extensible framework that can analyze large data files efficiently.