One of the main drivers of the information revolution is the emergence of rapidly growing datasets, which enable us to extract more information from a given field than was previously possible. This development has had a major impact on the biological sciences as well: a good example is the mapping of the human genome.
My research focuses on gene regulatory networks. In the last few years, DNA microarray technology has made large gene expression datasets publicly available through online repositories such as the Gene Expression Omnibus.
The software I developed processes data acquired from such sources. The application attempts to reconstruct the underlying regulatory connections using continuous Bayesian networks. I implemented a scoring function that computes the likelihood of candidate parent sets for each node given the data. To search for the true structure, I use two of the most common local search algorithms: hill climbing and simulated annealing. In addition, I introduced a regularization technique that keeps the number of edges in the graph low.
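The scoring and search steps described above can be illustrated with a minimal sketch. It assumes a linear Gaussian model for each node (one common choice for continuous Bayesian networks; the model actually used is not stated here), a fixed penalty of `lam` per edge as the regularization term, and greedy hill climbing over single-edge additions and removals. All function names, parameter values, and the toy data are hypothetical and are not taken from the application itself.

```python
import math
import random

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting.
    Assumes a is non-singular (no perfectly collinear parents)."""
    k = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]

def local_score(data, child, parents, lam):
    """Maximized Gaussian log-likelihood of `child` regressed on `parents`,
    minus a penalty of `lam` per incoming edge (assumed regularizer)."""
    n = len(data)
    rows = [[1.0] + [row[p] for p in parents] for row in data]  # intercept + parents
    y = [row[child] for row in data]
    k = len(parents) + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)  # least squares via the normal equations
    rss = sum((t - sum(c * v for c, v in zip(coef, r))) ** 2
              for r, t in zip(rows, y))
    var = max(rss / n, 1e-12)  # guard against log(0)
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0) - lam * len(parents)

def creates_cycle(parents, src, dst):
    """True if adding the edge src -> dst would close a directed cycle,
    i.e. if dst is already an ancestor of src (or src == dst)."""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False

def hill_climb(data, lam, max_iter=100):
    """Greedy search: repeatedly apply the single edge addition or removal
    that improves the total score the most, until no move helps."""
    d = len(data[0])
    parents = {i: set() for i in range(d)}
    scores = {i: local_score(data, i, [], lam) for i in range(d)}
    for _ in range(max_iter):
        best_gain, best_move = 1e-9, None
        for src in range(d):
            for dst in range(d):
                if src == dst:
                    continue
                if src in parents[dst]:
                    candidate = parents[dst] - {src}   # try removing the edge
                elif not creates_cycle(parents, src, dst):
                    candidate = parents[dst] | {src}   # try adding the edge
                else:
                    continue
                gain = local_score(data, dst, sorted(candidate), lam) - scores[dst]
                if gain > best_gain:
                    best_gain, best_move = gain, (dst, candidate)
        if best_move is None:
            break  # local optimum reached
        dst, candidate = best_move
        parents[dst] = candidate
        scores[dst] += best_gain
    return parents

# Toy data: variable 1 depends on variable 0, variable 2 is independent.
rng = random.Random(0)
data = []
for _ in range(200):
    x0 = rng.gauss(0, 1)
    data.append([x0, 2 * x0 + rng.gauss(0, 0.5), rng.gauss(0, 1)])

graph = hill_climb(data, lam=10.0)
edges = {(p, c) for c, ps in graph.items() for p in ps}
print(edges)
```

Under this Gaussian score the two orientations of a single edge fit the data equally well, so the sketch recovers the link between variables 0 and 1 but not necessarily its direction. Simulated annealing would use the same score and the same moves, but would also accept score-decreasing moves with a probability that shrinks as a temperature parameter is lowered.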
To assess the performance of the algorithms, I introduced several evaluation metrics and examined how the algorithms perform with different amounts of data. I compared the two search algorithms and concluded that, for this particular problem, hill climbing may perform better. Furthermore, I analyzed the effect of regularization on the test results and proposed a method for finding the ideal weight of the regularization term.
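As an illustration of what such an evaluation might look like, the sketch below compares a learned edge set with a known reference network using precision, recall, and F1. These particular metrics are an assumption made for the example; the metrics actually used are not named here, and the function name `edge_metrics` is hypothetical.

```python
def edge_metrics(predicted, reference):
    """Precision, recall and F1 of predicted directed edges against a
    reference edge set. Edges are (parent, child) tuples."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# One edge is recovered correctly, one is predicted in the wrong direction.
result = edge_metrics({(0, 1), (1, 2)}, {(0, 1), (2, 1)})
print(result)  # (0.5, 0.5, 0.5)
```

Evaluating such a metric over a range of regularization weights, for example on held-out or synthetic data with a known network, is one possible way to select the weight.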